Advanced Clock Gating with Power Compiler

Wolfgang Embacher
Christian Bosch
Martin Embacher
Frank Trautmann

National Semiconductor GmbH

Livry-Gargan-Str. 10
D – 82256 Fuerstenfeldbruck, Germany

ABSTRACT

The popularity of wireless devices and the need for longer battery life make low power a highly
important design goal for recent applications. Synopsys Power Compiler is a tool that assists
designers in achieving this goal in a minimum amount of development time.

In synchronous digital designs with ever increasing clock speeds, clock gating is one of the most
efficient design techniques to reduce the dynamic power consumption. With the aid of Power
Compiler's automatic mechanisms, digital designers can instantly apply clock gating to their
existing design. Depending on structure and type of the design, tremendous savings in the
switching-power consumption can be achieved without any negative impact on performance, timing
or area.

This paper provides a trade-off analysis of Power Compiler’s clock gating along with examples and
recommendations for its usage and optimizations.

Table of Contents

1.0 Introduction
2.0 Basic Clock Gating Strategies
2.1 Abstraction Levels of Clock Gating
2.2 Benefits
3.0 Gating Logic and Timing Requirements
3.1 Combinational Gating Logic
3.2 Latch Based Gating Logic
4.0 Power Compiler Clock Gating
4.1 Principle
4.2 Clock Gating Style Controls
4.3 Design Flow
4.4 Limitations and Workarounds
5.0 Analysis and Recommendations
5.1 Power and Area Savings
5.2 Coding Style
5.3 System Level Improvements
6.0 Conclusion
7.0 Acknowledgements
8.0 References

Table of Figures

Figure 2.1: Clock Gating from a System View
Figure 2.2: Clock Gating Principle
Figure 3.1: Illegal Timing (AND gate)
Figure 3.2: Correct Timing (NAND gate)
Figure 3.3: Latch based clock gating
Figure 4.1: Design Flow with Clock Gating
Figure 5.1: Area Comparison
Figure 5.2: Low Activity Power Savings
Figure 5.3: High Activity Power Savings

Table of Tables

Table 3.1: Active enable values and clock gate hold mode
Table 3.2: Required stable period of enable signal
Table 4.1: Clock gating options
Table 5.1: Number of instantiated clock gates
Table 5.2: Switching activity statistics

SNUG Europe 2005 2 Advanced Clock Gating with Power Compiler

1.0 Introduction
In synchronous large-scale integration (LSI) designs, the permanent switching of the huge clock
tree causes continuous power consumption. This dynamic power consumption is proportional to the
switching frequency and the switched capacitive load. Thus, the high capacitance of the clock tree
in conjunction with its high frequency accounts for a very large part of the total power
consumption in today’s LSIs.
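
The proportionality stated above is commonly written as the standard CMOS dynamic power relation. The activity factor \alpha and the supply voltage V_{DD} are not named in this paper; they are introduced here only as assumptions for illustration:

```latex
P_{\mathrm{dyn}} = \alpha \, C_{L} \, V_{DD}^{2} \, f_{\mathrm{clk}}
```

Here C_L is the switched capacitive load and f_clk the switching frequency. In these terms, clock gating lowers the effective product of \alpha and C_L seen by the clock tree.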

Clock gating makes use of the fact that, typically, not all parts of a digital design are in use
simultaneously, and thus do not need to receive an active clock signal all the time. Gating elements
can be inserted into the clock tree, to split its capacitive load into smaller pieces. So, the separate
branches of the tree can be switched on or off individually, depending on whether they are needed
or not. Effectively, the average switching load is reduced.

One of the possible negative impacts of clock gating is the increase of the clock skew by the
propagation delay of the gating elements, which needs to be balanced by the clock tree synthesis
(CTS).

2.0 Basic Clock Gating Strategies

Clock gating represents a design technique that has developed over years. This chapter introduces
the basic techniques and discusses the most common clock gating styles. For better readability, the
further considerations assume that all considered flip-flops are positive edge triggered, and only
single edge triggered designs are used.

2.1 Abstraction Levels of Clock Gating

Generally, clock gating can be seen from different levels of abstraction, i.e. at different hierarchical
levels. At system level, entire functional blocks are enabled or disabled using system-wide enable
signals. At module level, the clock of single register banks is gated locally inside a block. Thus, the
original synchronous load-enable flip-flops (flip-flops with multiplexed input) are replaced with
simple flip-flops without enable pin. The enable functionality of several flip-flops can be attained
with a single clock gate.

System Level

The system level clock gating has a strong relation to the architecture of a design. In most cases, a
separate block of control logic generates the system-wide available activation signals that are used
to control the branches of the clock tree, and thereby enable logical or functional blocks. This kind
of clock gating, or controlled clock distribution is very power efficient in designs like a CPU’s core,
where only small pieces of a huge amount of available logic are active, depending on the desired
function. Figure 2.1 illustrates the controlled clock distribution provided by system level clock
gating.

Figure 2.1: Clock Gating from a System View

At the current state, Power Compiler does not provide the necessary mechanisms to perform this
kind of complex system level clock gating. Nevertheless, a design optimized at system level, is a
very good starting point for power optimization with Power Compiler in view of ultra low power
consumption. Power Compiler performs clock gating at register level.

In most cases, clock gating at register level means a replacement for synchronous load-enable
flip-flops. The classical implementation of load-enable flip-flops uses a multiplexed data feedback
to keep the existing information in a flip-flop while it is disabled (see Figure 2.2). Even though
the flip-flop is inactive, it adds switching activity to the design with every clock cycle, so
power is consumed in the respective register and in its preceding clock network.

Figure 2.2 also illustrates the clock gated equivalent to the classical implementation. The clock is
provided to the register only if its enable signal is active. Thus, any switching activity during
inactive cycles is stopped at the register and the respective part of the clock network.

Figure 2.2: Clock Gating Principle.

”Classical” register with mux-bank (left) and clock gated register (right)
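
The equivalence of the two implementations in Figure 2.2 can be illustrated with a small behavioural model. This is a minimal sketch, not Power Compiler's algorithm: the function names, the cycle-level abstraction, the random stimulus and the 10% enable activity are all assumptions made for illustration.

```python
import random

def run_mux_register(data, enable):
    """Load-enable register: clocked every cycle, a mux feeds back the old value."""
    q, trace, clock_events = 0, [], 0
    for d, en in zip(data, enable):
        clock_events += 1          # the flip-flop sees every clock edge
        q = d if en else q         # multiplexed feedback keeps the old value
        trace.append(q)
    return trace, clock_events

def run_gated_register(data, enable):
    """Clock-gated register: plain flip-flop that is clocked only when enabled."""
    q, trace, clock_events = 0, [], 0
    for d, en in zip(data, enable):
        if en:                     # the clock gate passes this edge
            clock_events += 1
            q = d
        trace.append(q)
    return trace, clock_events

random.seed(0)
data = [random.randint(0, 255) for _ in range(1000)]
enable = [random.random() < 0.1 for _ in range(1000)]   # assumed 10% activity

mux_trace, mux_clocks = run_mux_register(data, enable)
gated_trace, gated_clocks = run_gated_register(data, enable)
print("same register contents:", mux_trace == gated_trace)
print("clock edges at the register:", mux_clocks, "vs", gated_clocks)
```

Both models hold identical register contents in every cycle, while the gated version sees a clock edge only in enabled cycles.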

The use of multi-layer clock gating can further split up the clock tree into smaller pieces if the
design’s structure is suited. An example of such a structure is any address decoder with a global
write enable and an address signal that selects the register to be actually written, as in a cache
controller module. The first layer of clock gates selects the active address space, the second the
actually addressed registers. As shown later, this is not only more efficient in power reduction,
it also results in additional area savings.

2.2 Benefits

The main benefit of clock gating is dynamic power savings. Both the net switching power and the
cell-internal power are proportional to the switching frequency and the switched capacitive load.
As clock gating reduces the effective switching load, tremendous power savings can be achieved.
Depending on the design, the savings can be 50 percent and more.

Clock gating does not only save power, it also reduces the die area, because the bank of
multiplexers in front of the register becomes redundant: “One multiplexer per bit” is replaced by
“one gate per bank”. This is the reason why clock gating is not efficient below a minimum number
of bits gated with the same enable signal. The recommended value is typically between three and
five. There is no optimum that is true for all designs, so the user might want to experiment with this
number.

The main downside of clock gating is the additional clock skew caused by the clock gate itself.
Theoretically, this could cause problems in very high speed designs. The second problem with clock
gating occurs where FPGA prototyping is desired. Today’s FPGAs do not provide the necessary
logic to implement extensive use of clock gates. Unlike designs with handcrafted clock gates,
Power Compiler’s clock gating avoids this problem, because it can temporarily be disabled with a
simple switch when compiling for FPGA.

3.0 Gating Logic and Timing Requirements

Figures 2.1 and 2.2 use only a generic symbol to represent the clock gate. This chapter introduces
the logic behind this symbol and illustrates special timing requirements. There are two different
approaches to the clock gating logic: one uses combinational logic only, the other uses latches.

Please note that depending on the phase of the clock, type of flip-flops and the guaranteed stable
period of the enable signal, all four basic types of gates can be used: AND, NAND, OR, NOR.
Power Compiler supports all these possible combinations.

3.1 Combinational Gating Logic

Latch-free clock gating uses a basic two-input gate to control the clock. One of the inputs is
connected to the clock source being gated, the other one to the controlling enable signal. The output
signal is the gated clock, provided to the flip-flop(s).

The enable signal’s value that produces an active clock is HIGH for AND and NAND gates, and
LOW for OR and NOR gates (“active enable value”). The logic value that the output clock is forced
to by an inactive enable signal is LOW for AND and NOR gates, HIGH for NAND and OR gates.
This inactive clock value is also referred to as the “clock gate hold mode”. These considerations are
summarized in Table 3.1.

Gating element   Active enable value   Clock gate hold mode
AND              high                  low
NAND             high                  high
OR               low                   high
NOR              low                   low

Table 3.1: Active enable value and clock gate hold mode
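
The entries of Table 3.1 follow directly from the truth tables of the four gates. The following minimal sketch derives them; the dictionary `GATES` and the helper `characterize` are hypothetical names introduced for illustration.

```python
GATES = {
    "AND":  lambda clk, en: clk & en,
    "NAND": lambda clk, en: 1 - (clk & en),
    "OR":   lambda clk, en: clk | en,
    "NOR":  lambda clk, en: 1 - (clk | en),
}

def characterize(gate):
    """Return (active enable value, hold value) for a two-input clock gate."""
    active = hold = None
    for en in (0, 1):
        outputs = {gate(clk, en) for clk in (0, 1)}
        if len(outputs) == 2:      # output follows the clock: gate is active
            active = en
        else:                      # output is constant: gate holds the clock
            hold = outputs.pop()
    return active, hold

table = {name: characterize(fn) for name, fn in GATES.items()}
for name, (active, hold) in table.items():
    print(f"{name:5s} active enable = {active}, hold value = {hold}")
```

A value of 1 corresponds to "high" and 0 to "low" in Table 3.1.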

Because NAND and NOR gates invert the phase of the gated clock, it is advised to use an inverted
input clock for these elements, to provide the original phase of the clock to the register. This is
important for single edge triggered designs.

To ensure a clean, non-glitching output clock, the enable signal must be stable during the phase
of the clock being gated in which the gate output depends on the enable. Any transition or glitch
of the enable signal during that phase results in an illegal transition on the output clock.
Table 3.2 summarizes, for each of the possible gating elements, the phase of the clock being gated
during which the enable signal must be stable.

Gating element   Requisite stable period (of clock being gated)
AND              high phase
NAND             high phase
OR               low phase
NOR              low phase

Table 3.2: Required stable period of enable signal

Figure 3.1 illustrates the corrupted waveforms that result from an illegally timed enable signal at
an AND gate. Figure 3.2 shows the respective (clean) waveforms from the same enable signal
controlling a NAND gate. For the AND gate, the enable signal violates the requirements from
Table 3.2. For the NAND gate the requirements are met, because the clock actually being gated at
the NAND gate is the inverted clock, not clk. For this reason the NAND gate is often preferred over
the AND gate for the gating logic.

Figure 3.1: Illegal timing (AND gate)

Figure 3.2: Correct timing (NAND gate)

The bent arrows from the rising (triggering) clock edge to the transition of the enable signal in
Figure 3.1 and Figure 3.2 indicate the time that is required for the enable signal to be evaluated and
to become stable. For the NAND gate, the enable signal needs to be stable before the falling edge of
clk. In other words, the evaluation of the enable signal may take up to half a clock cycle. For the
AND gate, the enable signal would have to be stable right before the next rising clock edge, but
then keep its value for more than half a clock cycle. In realistic scenarios, these requirements can
hardly be met. This is why latch-free clock gating with AND gates is usually avoided.
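
The difference between the two latch-free styles can be reproduced with a sampled-waveform model. This is a sketch under stated assumptions: each clock cycle is split into four samples (two low, two high), the enable changes only during the high phase of clk, and the function names and waveforms are hypothetical.

```python
HALF_PERIOD = 2                               # samples per clock phase (assumed)
clk = [0, 0, 1, 1] * 3                        # three clock cycles
en  = [0, 0, 0, 1,  1, 1, 1, 0,  0, 0, 0, 0]  # changes while clk is high

def and_gate(clk, en):
    """Latch-free AND gate on clk, as in Figure 3.1."""
    return [c & e for c, e in zip(clk, en)]

def nand_gate_inverted_clk(clk, en):
    """Latch-free NAND gate fed with the inverted clock, as in Figure 3.2."""
    return [1 - ((1 - c) & e) for c, e in zip(clk, en)]

def run_lengths(wave, level):
    """Lengths of all consecutive runs of `level` in the waveform."""
    lengths, run = [], 0
    for v in wave:
        if v == level:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths

and_high_pulses = run_lengths(and_gate(clk, en), 1)
nand_low_pulses = run_lengths(nand_gate_inverted_clk(clk, en), 0)
print("AND high pulses:", and_high_pulses)    # shorter than HALF_PERIOD
print("NAND low pulses:", nand_low_pulses)    # exactly HALF_PERIOD wide
```

With this enable waveform the AND gate emits two truncated pulses, whereas the NAND gate emits a single full-width low pulse followed by one clean rising edge.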

For the NAND type clock gate, the half cycle path requirement can be a limitation for the maximum
design speed or the maximum number of logical layers for the enable signal’s evaluation logic. The
half-cycle requirement can be avoided by the use of latches, as explained in the next section.

3.2 Latch Based Gating Logic

Latches in the clock gating logic can help to ensure a clean, glitch-free clock signal, even if the
enable signal is only stable at the clock’s triggering edge (rising for positive edge triggered flip-
flops). Thus, the half-cycle path requirement for the enable signal is no longer valid, and the enable
signal’s timing is relaxed.

Figure 3.3 shows the latch-based clock gating logic with an AND gate. Because the latch is
transparent only during the low phase of the clock clk (high phase of clk_inv), transitions of the
enable signal during the high phase of clk (the required stable period for AND gates according to
Table 3.2) are not propagated to the AND gate’s enable pin, and thus cannot result in glitches on
clk_en.

Figure 3.3: Latch based clock gating
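
The filtering effect of the latch can be sketched with a sampled-waveform model of the circuit in Figure 3.3. The sampling resolution (four samples per clock cycle), the enable waveform with its late transitions and one glitch, and all function names are assumptions made for illustration.

```python
clk = [0, 0, 1, 1] * 3                        # three clock cycles, 4 samples each
en  = [0, 0, 0, 1,  1, 1, 1, 0,  0, 0, 1, 0]  # late changes plus a glitch at sample 10

def latch_free_and(clk, en):
    """AND gate driven directly by the enable signal."""
    return [c & e for c, e in zip(clk, en)]

def latch_based_and(clk, en):
    """AND gate driven by a latch that is transparent while clk is low."""
    latched, out = 0, []
    for c, e in zip(clk, en):
        if c == 0:                 # transparent phase: latch follows the enable
            latched = e
        out.append(c & latched)    # during clk high the latch holds its value
    return out

def pulse_widths(wave):
    """Widths (in samples) of all high pulses in the waveform."""
    widths, run = [], 0
    for v in wave + [0]:           # the trailing 0 flushes a final pulse
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths

free = latch_free_and(clk, en)
latched = latch_based_and(clk, en)
print("latch-free pulse widths: ", pulse_widths(free))
print("latch-based pulse widths:", pulse_widths(latched))
```

The latch-free gate produces three runt pulses, while the latch-based gate produces a single pulse that spans the full high phase of clk.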

4.0 Power Compiler Clock Gating

Power Compiler provides powerful algorithms to automatically implement clock gating in an
existing design. It supports all the described styles of clock gates including further aspects like
controllability and observability for design for test. This offers full control over all additionally
implemented logic.

4.1 Principle

When a digital design is read in by Power Compiler, it is translated into the GTECH level, a generic
technology library description. At this level, Power Compiler analyses the design’s structure and
performs logically equivalent transformations to simplify, flatten or clock gate the design as
required. The clock gating opportunity is determined by the design’s logical structure. Power
Compiler looks for synchronous load-enable flip-flops (a flip-flop with its output fed back to its
input through a multiplexer that is controlled by an enable signal). It analyses flip-flops with
common enable conditions and groups them under one or more clock gates, depending on the clock
gating settings. Clock gates are only inserted if the enable signal meets the required timing
conditions. Power Compiler also takes care of the required observation and control logic.

Until version 2003.12, clock gating opportunities were recognized by analyzing the HDL code
rather than the GTECH description. That is why older versions of Power Compiler are likely to
miss some of the possible opportunities in rare cases.

4.2 Clock Gating Style Controls

All of Power Compiler’s clock gating options are controlled with one specific dc_shell command:
set_clock_gating_style. This command must be called before the clock gate insertion is invoked. It
manages the type of clock gating logic for positive and negative edge triggered flip-flops
separately, and independently of the choice between latch-based and latch-free clock gating.
Different types of control and observation points can be added and configured to increase the
observability for design for test. The clock gating can be fine-tuned with the options
minimum_bitwidth and num_stages. Table 4.1 explains the most common options.

Option                    Possible values           Effects
minimum_bitwidth          minsize_value             Minimum number of flip-flops with the same enable condition to be gated.
num_stages                1, 2, ...                 Number of clock gating stages.
positive_edge_logic       {and}, {nand}, ...        Specifies the two-input clock gate for positive edge triggered flip-flops.
negative_edge_logic       {or}, {nor}, ...          Specifies the two-input clock gate for negative edge triggered flip-flops.
sequential_cell           latch | none              Specifies whether to use latches or not.
control_point             none | before | after     Location of control point.
control_signal            scan_enable | test_mode   Type of test control signal.
observation_point         true | false              Whether to use observation points or not.
observation_logic_depth   depth_value               Depth of XOR tree in observability circuit.

Table 4.1: Clock Gating Options
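
As an illustration of how the options in Table 4.1 combine into one call, the following sketch assembles a set_clock_gating_style command line as a string. The helper `clock_gating_style_command` is hypothetical, and the leading dash on each option name and the chosen values are assumptions based on common dc_shell usage; the exact syntax should be checked against the documentation of the installed tool version.

```python
# Enumerated values as listed in Table 4.1; other options are passed through.
ALLOWED = {
    "sequential_cell": {"latch", "none"},
    "control_point": {"none", "before", "after"},
    "control_signal": {"scan_enable", "test_mode"},
    "observation_point": {"true", "false"},
}

def clock_gating_style_command(options):
    """Render a set_clock_gating_style call from an option dictionary."""
    parts = ["set_clock_gating_style"]
    for name, value in options.items():
        value = str(value)
        if name in ALLOWED and value not in ALLOWED[name]:
            raise ValueError(f"{name}: {value!r} is not listed in Table 4.1")
        parts.append(f"-{name} {value}")   # assumed dash-prefixed option syntax
    return " ".join(parts)

command = clock_gating_style_command({
    "sequential_cell": "latch",
    "positive_edge_logic": "{and}",
    "control_point": "before",
    "control_signal": "scan_enable",
    "minimum_bitwidth": 5,
    "num_stages": 2,
})
print(command)
```

The example values (latch-based AND gating, control point before, minimum bit width five, two stages) mirror settings discussed elsewhere in this paper and are not a general recommendation.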

4.3 Design Flow

Automatic clock gating requires only minimal changes to the common digital design flow. Clock
gating is applied before the (first) compile step, once all the design source code is read in (e.g.
read_verilog) and the clock gating options have been set (set_clock_gating_style). There are several
options to observe the results. The report_power command gives a rough idea of the design’s power
consumption. However, the numbers might be of poor accuracy. For higher accuracy some kind of
switching information is required. This information can originate from either a gate-level or an
RTL simulation. In both cases the switching information is stored in a so-called SAIF file
(Switching Activity Interchange Format) that needs to be annotated to the design in the Synopsys
environment prior to the power analysis. With this information the accuracy is good enough for most
common cases. If the results don’t meet the expectations, the clock gating settings can be further
adjusted (e.g. number of stages, minimum bit width) to optimize the estimated power consumption.

Figure 4.1 shows a complete sample clock gating flow.

Figure 4.1: Design Flow with Clock Gating

The performance of Power Compiler depends strongly on the clock gating settings, for both runtime
and power savings. In particular, the values for the minimum bit width and the number of stages
have to be chosen wisely, as they have a strong influence on the results. Experience shows that a
minimum bit width of at least five and two clock gating stages are a good start for most designs.
A few iterations with these variables might be needed if the power or area results are much worse
than expected.

4.4 Limitations and Workarounds

This section points out a few issues and limitations that were found during the work with Power
Compiler version 2004.06-SP2. Some of them are already solved in later versions. Synopsys is aware
of all mentioned issues.

Clock Gate Replacement

Clock gating replacement is an essential feature wherever the style of existing clock gating needs
to be converted into another, for instance when adapted code from an external source needs to be
adjusted to the existing in-house clock gating style. Clock gating replacement is performed with
the switch “insert_clock_gating -module_level”. Power Compiler follows strict rules: after
analyzing the existing clock gating, it determines whether a logic and clock equivalent
replacement with the new clock gating settings can be performed. Note that Power Compiler replaces
the clock gate only if the result is both logic and clock equivalent.

Combinational Clock Gating Logic

Power Compiler refuses to apply latch-free clock gating whenever the enable signal originates
from a design input port. The reason is that Power Compiler does not take user constraints into
account when determining enable signals. To guarantee glitch-free signals it always assumes the
worst case, and therefore refuses to add these kinds of clock gates.

In versions before 2004.12 Power Compiler also treats signals that cross module boundaries as
direct inputs. Thus, these signals cannot be used as enable either. A workaround for this behavior is
to get rid of the design hierarchy. Once the design is flattened with an “ungroup -all” command, the
sub-module boundaries vanish and Power Compiler is able to determine the correct enable signals.
However, this problem does not apply in recent versions.

CTS Discrete Clock Gating Cells

The use of discrete clock gating logic (not integrated clock gating cells) with latch-based clock
gating causes problems with some layout tools. Astro/Apollo, for instance, synchronize the latch
with the flip-flops, which leads to hold violations. Furthermore, the latch and the AND gate are
not treated as if they belong together, so they could be placed far apart during layout, which
introduces a high clock skew.

Possible workarounds are the use of integrated clock gating cells or a replacement of the clock
gating cells before layout as described in [6]. Solvnet article 003097 describes some more
suggested solutions.

Overriding of clock gating settings

The command set_clock_gating_registers is used to explicitly include or exclude registers from
clock gating. This command should be used very carefully, as it overrides Power Compiler’s
selection of registers to be gated. This can result in unwanted logic overhead, like a clock gate
before a register that is enabled all the time.

5.0 Analysis and Recommendations

To analyze Power Compiler’s efficiency, the tool was run on three different designs. All designs
exist in a plain version without clock gating, and in a version that provides a manual
implementation of clock gating. Power Compiler’s clock gating was applied to the plain designs
with the gating style set to single and double layer clock gating.

5.1 Power and Area Savings

In the following section, the location and distribution of the automatically inserted clock gates
are compared against their manual counterparts. One important fact for the following considerations
is that all manually inserted clock gates use latch-free gating logic. Because of the limitation
concerning latch-free clock gating with signals from module boundaries, the latch-based style had
to be used for the automatically instantiated clock gates. This has a slight influence on the power
and area consumption of the automatic clock gating results.

Placement and Quantity of Clock Gates

With a size of 976 gates, design one is an example of a small, simply structured design. In this
case the placement of the automatically inserted clock gates coincides with the location of the
manually implemented ones. Manual clock gates are used in this module for those registers that
provide an explicit enable condition in the respective non-gated version. Thus, the correspondence
of the two clock gating approaches is no surprise.

The second design (two), with a size of 16600 gates, is a case of special interest. This module
uses the two-layer clock gating approach in the manual version. Power Compiler is able to
implement a one- or two-layer approach. The manual version uses 326 clock gates, four of which are
in the first layer and the rest in the second. As in the first design, all manually gated
registers provide a load-enable. Power Compiler implements 333 clock gates for the two-layer
solution and 326 clock gates for the one-layer solution. The additional gates are inserted at
registers where manual (latch-free) clock gating cannot be applied, because the evaluation logic
of the enable signal is too extensive to meet the half-cycle path requirement of latch-free clock
gating. Because of the latches, the late-arriving (after the trailing edge) enable signals can
still be used for clock gating in the automatic version.

The differences in the clock gate placement at design three, with a size of 29500 gates, are similar
to the ones in design two. Power Compiler uses the advantages of the latch-based clock gating and
inserts slightly more clock gates than the manual approach.

Table 5.1 summarizes the numbers of instantiated clock gates for the tested designs for both the
manual and the automatic approach as discussed above.

Design Nr.   Manual clock gating   Automatic clock gating
                                   One layer   Two layers
1            9                     9           10
2            326                   326         333
3            145                   148         150

Table 5.1: Number of Instantiated Clock Gates

Effects on Design Area

The insertion of clock gates affects the design’s area in several ways. The main factors are:

- Omitted multiplexers in redundant feedback loops (major effect).
- Changes in required driver strength due to omitted multiplexers (medium effect).
- Additional area consumed by clock gates, control points, observation points (minor effect).

The overall effect of clock gating on the design area is positive under all tested conditions.
This is illustrated in Figure 5.1, where the total cell area savings are displayed for all test
cases. The 100% values correspond to the area consumption without clock gating.

Figure 5.1: Area Comparison

The change in the design area caused by clock gating depends on various factors like the clock
gating style and the number of registers gated with one clock gate.

In the technology used, a typical clock gate driving 32 flip-flops occupies 203 µm² (latch-based,
control point before). The 32 multiplexers, each driving a single flip-flop, occupy 864 µm². In
the best case this means an area reduction of more than 20 µm² per gated flip-flop. The minimum
bit width setting of the automatic clock gating has a great effect here. For instance, if only
three flip-flops are gated per clock gating cell (100 µm²), the area increases by about 6 µm² per
gated flip-flop. Therefore the bit width setting must be adapted to the internal structure of the
design to obtain an optimal area.
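
The arithmetic behind these figures can be checked with a few lines. This is a sketch using only the numbers quoted above; the helper name and the assumption that a 100 µm² clock gate is available for any small bank size are introduced for illustration.

```python
MUX_AREA = 864 / 32        # um^2 per multiplexer (864 um^2 for 32 multiplexers)

def area_saving_per_flop(gate_area, flops_per_gate):
    """Net area saved per gated flip-flop in um^2 (negative means area grows)."""
    return MUX_AREA - gate_area / flops_per_gate

best_case = area_saving_per_flop(203, 32)   # 32 flip-flops behind one 203 um^2 gate
small_bank = area_saving_per_flop(100, 3)   # 3 flip-flops behind one 100 um^2 gate
break_even = 100 / MUX_AREA                 # bank size at which a 100 um^2 gate pays off

print(f"best case:  {best_case:.2f} um^2 saved per flip-flop")
print(f"small bank: {small_bank:.2f} um^2 per flip-flop")
print(f"break-even: {break_even:.1f} flip-flops per gate")
```

Under these assumptions the break-even point lies between three and four flip-flops per clock gate, which is consistent with the minimum bit width range recommended in section 2.2.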

The above calculation is not 100% precise for several reasons. The clock gate’s area differs
depending on the clock gating logic style, and any change in the required driver strength of the
preceding logic. Additionally required control and observation points also increase the area for
clock gating. That’s why the effective savings might differ from the expectations calculated above.

For all designs tested, the savings of manual and automatic gating are approximately identical, as
expected. More than 98% of the manually gated flip-flops provide the load enable, and thus, can
also be gated automatically. The eye-catching huge savings at design two (Figure 5.1) are due to the
high number of flip-flops compared to the size of the design.

The two-layer clock gating of Power Compiler shows positive results only on the area of design
two, where the figure matches that of the manual approach. At design three, almost no effect on
the area is visible: because of the large size of the design, the 5 additional clock gates have
only a very small effect. The area of design one clearly increased due to the additional clock
gate. The reason is that synthesis uses bigger cells with higher driver strengths in the
neighborhood of the clock gates and the respective registers, in order to compensate for the
additional delay inserted by the clock gate.

The overall size of the three designs was reduced by more than 12%. These results show clearly the
positive effects of clock gating on the area of a design.

Speed and Timing

The latch-based clock gating (automatic) has an advantage over the latch-free (manual) one with
regard to the design’s timing. This leads to two positive effects:

- Clock gating can also be used in timing critical paths

- The maximum design speed can be increased

The effect is visible at design two and three, where automatic clock gating is able to insert more
clock gates, on paths where the manual, latch-free counterpart cannot be used because of the late
arriving enable signals.

There are designs (not the tested ones) where the half-cycle path of the enable signal is the major
limitation for a design’s maximum speed. In these cases, the design’s speed can dramatically be
increased by avoiding the half-cycle requirements with the use of latches.

Theoretically, manual clock gating could make use of latches, too. But for testability and
controllability concerns for the DFT, most designers try to avoid latches in their designs. Thus,
latches are usually not used in manual clock gating logic. Nevertheless, if the use of latches is
controlled by an automatic tool, usually all possible negative side effects are avoided or
compensated for, as is done in the case of automatic clock gating.

The negative effects on the design’s timing due to the additional clock skew can easily be
compensated by buffer insertion during the CTS. Thus, it does not have any noticeable effects on
the timing.

Power Saving Aspects

The power savings that can be achieved with automated clock gating depend on a number of
different conditions. Besides the design structure and the number of flip-flops that can be gated, the
stimulation pattern and the activity distribution across the design have the biggest influence on the
resulting power consumption and the possible power savings.

Apart from design one, the static power consumption is less than 1% of the total power
consumption in the tested designs. In design one, the actual savings in static power are less than
0.5% of the total power consumption. This part of the power is below the accuracy of the analysis
and is therefore neglected; only dynamic power is analyzed in the following.

Stimulation Pattern for Power Analysis

The careful choice of the stimulation pattern used for the power analysis is important for the
significance of the resulting power values. It is useful to distinguish between cases of low and
high switching activity. Between these two cases, the efficiency of clock gating can differ
widely. Under real circumstances, a mixture of both cases will usually be seen.

Low activity means that there should be as little activity on the design’s inputs as possible.
Additionally, any available low-power mode can be used if the specific design provides one. Designs
two and three support such low-power modes.


The high activity pattern should cause as much switching activity in the respective design as
possible (Clock gating without extra observation/control logic).

Table 5.2 presents the percentage of switching nodes in the tested designs for both the high and
the low activity patterns used (implementation without clock gating).

                                        Design 1   Design 2   Design 3

Nodes switching in low activity [%]       0.7        0.01       0.33
Nodes switching in high activity [%]     26.68      28.5       17.5
Table 5.2: Switching Activity Statistics

Low Activity Power Savings

In the case of low switching activity, the potential power savings from clock gating are generally
comparatively high. Because large parts of the designs are inactive, the clock gates can turn off
the respective parts of the clock tree, which would otherwise continuously consume capacitive
power. The potential power savings for low switching activity can be 80% or more.
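
The dependence of the savings on the activity can be sketched with a first-order model of the
clock network alone. The function, its parameters and all numbers below are hypothetical
illustrations and are not derived from the measured designs. The model assumes that the gated part
of the clock network consumes power only in cycles in which its enable is active.

```python
def clock_power_saving(gated_share: float, enable_duty: float,
                       gate_overhead: float = 0.0) -> float:
    """First-order estimate of the fraction of clock-network power saved.

    gated_share   -- fraction of the clock-network power located behind clock gates
    enable_duty   -- fraction of cycles in which the gate enables are active
    gate_overhead -- power of the gating cells, as a fraction of the ungated
                     clock-network power (hypothetical parameter)
    """
    for name, value in (("gated_share", gated_share),
                        ("enable_duty", enable_duty),
                        ("gate_overhead", gate_overhead)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Only the gated share is switched off, and only while the enable is inactive.
    return gated_share * (1.0 - enable_duty) - gate_overhead


# Hypothetical example values for a mostly idle and a busy design.
low_activity = clock_power_saving(gated_share=0.9, enable_duty=0.05, gate_overhead=0.02)
high_activity = clock_power_saving(gated_share=0.9, enable_duty=0.60, gate_overhead=0.02)
print(f"low activity: {low_activity:.1%}, high activity: {high_activity:.1%}")
```

The model covers only the clock network. It ignores other contributions, such as the removed
feedback multiplexers, so it should not be compared directly with the measured total savings.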

There are two major reasons for the differences between manual and automatic clock gating. The
less important one is the use of latches in the automatic version and the power they consume.
This effect is primarily noticeable for design one in Figure 5.2, where the automatic version is
approximately 15% worse than the manual one.

The manual savings for designs two and three also appear better than the automatic ones. In
design three, the divergence is caused by a wait signal that switches off large parts of the design.
Here it can help to combine automatic clock gating with some additional manual clock gates that take
care of the opportunities not recognized by Power Compiler. The power values of design two show the
advantage of the manual two-layer clock gating. The two-layer clock gating of Power Compiler also
shows much better values than the one-layer version, but it is not as effective as the manual
version.


Figure 5.2: Low Activity Power Analysis

High Activity Savings

The average power savings under high activity circumstances are generally smaller than under low
activity circumstances. This is because the registers can no longer be “turned off” for long periods,
as they are frequently being accessed. Nevertheless, the achieved power savings are still larger than
60% under all tested conditions.

For all designs, the differences between manual and automatic clock gating are still noticeable,
but significantly smaller than in the case of low activity. The main reason for this is that
system-level clock gating (wait/low-power) has no influence while the design is active.
This can be seen in Figure 5.3.

Figure 5.3: High Activity Power Analysis


5.2 Coding Style

Older versions of Power Compiler, which determine clock gating opportunities by analyzing the
HDL code, show a stronger dependency on the coding style. For instance, these versions do not
recognize the load-enable functionality of a flip-flop if the multiplexer is hidden in a “cloud of
logic”.

However, the only coding style requirement that could be found with the latest version of Power
Compiler is that the multiplexer needs to be directly connected to the flip-flops. Power Compiler
doesn’t recognize the opportunity if the multiplexer is hidden behind a piece of logic, e.g. an adder
with a fixed value. Although such a structure could easily be equipped with clock gating by hand,
it occurs in only a very small fraction of cases.
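
The load-enable structure discussed above can be illustrated with a small behavioural model. The
sketch below is plain Python rather than HDL, and it does not reproduce Power Compiler’s detection
algorithm. The function names and the stimulus are hypothetical. It only shows that a register with
a feedback multiplexer directly in front of its data input behaves like the same register behind a
clock gate, while receiving far fewer clock edges.

```python
def run_mux_feedback(data, enable, q0=0):
    """Register with feedback multiplexer: clocked in every cycle."""
    q, trace, clock_edges = q0, [], 0
    for d, en in zip(data, enable):
        clock_edges += 1        # the flip-flop sees every clock edge
        q = d if en else q      # multiplexer directly in front of the D input
        trace.append(q)
    return trace, clock_edges


def run_gated_clock(data, enable, q0=0):
    """Same register with the multiplexer replaced by a clock gate."""
    q, trace, clock_edges = q0, [], 0
    for d, en in zip(data, enable):
        if en:                  # the clock gate passes the edge only when enabled
            clock_edges += 1
            q = d
        trace.append(q)
    return trace, clock_edges


# Hypothetical stimulus: the register is loaded in three of eight cycles.
data = [3, 7, 7, 1, 4, 4, 9, 2]
enable = [1, 0, 0, 1, 0, 0, 0, 1]
mux_trace, mux_edges = run_mux_feedback(data, enable)
gated_trace, gated_edges = run_gated_clock(data, enable)
print(mux_trace == gated_trace, mux_edges, gated_edges)
```

Both versions produce the same register contents in every cycle. The gated version needs neither
the multiplexer nor the clock edges in the cycles where the enable is inactive.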

5.3 System Level Improvements

The low activity results in Figure 5.2 indicate an advantage of the manual approach over the
automatic one. The in-depth analysis of the clock gating in designs two and three shows that the
manual approach uses internal wait and low-power signals that are not apparent to the algorithms
of Power Compiler. Since all clock gating activities represent a trade-off between effort and power
savings, combining the fast automatic Power Compiler clock gating with the most effective
hand-crafted system-level clock gating becomes the main focus.

This approach splits the clock gating of a design into two independent steps. Firstly, the designer
adds a small number of system level clock gates, with sophisticated enable conditions that are
unknown to Power Compiler. Secondly, Power Compiler is run on the system-level gated RTL to
implement clock gates at register level.
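
The effect of the two steps can be sketched by counting clock edges in a behavioural model. The
signal names, the function and the stimulus below are hypothetical. In this trace the register
enable is inactive while the wait signal is asserted, so the register contents are identical in
both configurations.

```python
def count_clock_edges(wait, reg_enable, system_gate):
    """Count clock edges on the clock branch and at the register clock pins.

    wait        -- per-cycle system-level wait signal (1 = block is idle)
    reg_enable  -- per-cycle load enable used by the register-level clock gate
    system_gate -- True if a system-level clock gate driven by 'wait' is present
    """
    branch_edges = 0     # clock net between system-level gate and register-level gates
    register_edges = 0   # clock pins of the registers behind the register-level gate
    for w, en in zip(wait, reg_enable):
        branch_active = not (system_gate and w)
        if branch_active:
            branch_edges += 1
            if en:
                register_edges += 1
    return branch_edges, register_edges


# Hypothetical stimulus: the block waits during four of eight cycles.
wait = [0, 0, 1, 1, 1, 1, 0, 0]
reg_enable = [1, 0, 0, 0, 0, 0, 1, 1]
register_level_only = count_clock_edges(wait, reg_enable, system_gate=False)
two_level = count_clock_edges(wait, reg_enable, system_gate=True)
print(register_level_only, two_level)
```

In this model the registers receive the same number of clock edges in both cases. The additional
saving of the system-level gate comes from the clock branch in front of the register-level gates,
which no longer toggles while the block waits.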

Due to the incremental strategy, the optimum power results are achieved with a minimum of design
time. However, the above procedure suffers from two minor drawbacks. Firstly, the additional level
of clock gates increases the clock skew to be balanced in CTS. Secondly, the flexibility of Power
Compiler with regard to removing the clock gating for FPGA use and changing the clock gating
style by simple settings is lost.

The advanced strategy presented above uses the best parts of handcrafted and automatic clock
gating, and thereby addresses the main driving factors in modern designs: reduced time to market
and optimum power consumption.

6.0 Conclusion
Clock gating is a sophisticated design technique that has developed over years. Consequently, it
is no surprise that Power Compiler is able to save up to 80 percent of the power dissipation in
certain cases with clock gating. But it is outstanding that Synopsys has developed clock gating into
a push-button technology that can be applied within seconds to any kind of design. Even with no
experience in this area, enormous power savings can be achieved without the risk of violating any
design rule.

Power Compiler is a good aid to speed up the design process and to shorten the time to market,
even for designers who already have experience with clock gating. The results show that Power
Compiler’s automatic clock gating is comparable to a handcrafted implementation. The only area
where a human designer can further enhance automatic clock gating is system-level clock gating.


Because automatic clock gating has several implications, using Power Compiler in an existing
environment with an individual design flow might cause unexpected complications, e.g. with CTS or
layout. The only solution for this is an evaluation on individual test cases and close
collaboration with Synopsys.

7.0 Acknowledgments
We would like to thank all friends and colleagues from National Semiconductor who contributed to
our work on this document. Furthermore, we would like to express special thanks to Andy Chaggar,
Dr. Th. Mahnke and Dr. W. Stechele for their support and friendly collaboration.

8.0 References
[1] Analysis of automated Power Saving Techniques using Power Compiler, Wolfgang
Embacher, TU Munich, 2004.
[2] Designing low-power circuits: practical recipes, L. Benini, G. De Micheli, E. Macii, IEEE
Circuits and Systems Magazine, vol. 1, no. 1, 2001.
[3] Low Power ASIC Design Using Voltage Scaling at the Logic Level, Th. Mahnke, TU
Munich, 2003.
[4] Low Power Digital CMOS Design, A. Chandrakasan and R. Brodersen, Kluwer Academic
Publishers, 1995.
[5] How To Successfully Use Gated Clocking in ASIC Design, Darren Jones, SNUG 2002.
[6] Automatic Clock Gating for Power Reduction, Zia Kahn, Guarav Meth, SNUG 1999.
[7] “Power Compiler Reference Manual”, Synopsys
