Partially-Adiabatic Sequential Circuit, Low Power Clock Distribution-1
• Introduction
• Partially adiabatic sequential circuit
• Flip flop style adiabatic charging logic gate
• Power dissipation in clock distribution
• Synchronization and Power Dissipation in Clock Distribution
• Challenges and considerations in low-power clock distribution
• Single driver vs distributed driver
• Conclusion
INTRODUCTION
•Power consumption in digital circuits is a critical concern, leading to research in low-power design
techniques.
•Adiabatic computing minimizes energy dissipation by reusing charge instead of dissipating it as heat.
•Partially-adiabatic sequential circuits offer a balance between traditional CMOS and fully adiabatic
designs, improving power efficiency.
•Low-power clock distribution techniques play a vital role in reducing overall power consumption in
large-scale circuits.
PARTIALLY ADIABATIC SEQUENTIAL CIRCUIT
FLIP FLOP STYLE ADIABATIC CHARGING LOGIC GATE
Figure 1: A gate implementing an AND/NAND function. Both inputs and outputs are dual-rail-encoded. Load capacitances are not shown. The supply terminal is connected to the pulsed-power supply.
•Structure of the Flip-Flop Gate
•Comprises two inverters arranged in a cross-coupled flip-flop configuration.
•Includes two pull-down networks implementing complementary logic functions.
•Uses dual-rail encoding for both inputs and outputs.
•Initial State (Before Activation)
•Load capacitances (representing inputs of other gates and wiring) and the supply/clock line are initially at 0V.
•Inputs are assigned valid logic values, connecting one of the outputs to ground due to complementary logic
behavior.
•Charging Process
•Supply voltage ramps from 0V to Vdd.
•Initially, no significant current flows because PMOS transistors are off.
•When the supply voltage reaches Vth (PMOS threshold voltage), the transistor whose gate is grounded by a
pull-down network turns on, causing the corresponding load capacitance to start charging.
•Dissipation Characteristics
•The initial charging incurs a non-adiabatic dissipation of approximately (1/2) C_L Vth².
•The remainder of the charging up to Vdd is adiabatic, meaning the energy dissipation reduces with increased
charging time.
•Completion of Charging and Output Stability
•Once charging is complete, the output values are valid, and high inputs transition to low.
•Pull-down networks are disconnected to prevent unwanted discharge.
•The cross-coupled inverters provide feedback to maintain the output state.
•Effect on Subsequent Gates
•The outputs of this gate serve as inputs to other gates, which can be energized while the current gate’s inputs
are ramped down.
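As a rough numerical illustration of the dissipation figures above, the sketch below compares the non-adiabatic threshold loss (1/2) C_L Vth² with the (1/2) C_L Vdd² that a conventional CMOS gate dissipates when charging the same load. The load capacitance and voltages are illustrative assumptions, not values from these slides.

```python
def threshold_loss(c_load, v_th):
    """Non-adiabatic energy lost at the start of charging: (1/2) * C_L * Vth^2."""
    return 0.5 * c_load * v_th ** 2


def conventional_loss(c_load, v_dd):
    """Energy lost in a conventional CMOS charging event: (1/2) * C_L * Vdd^2."""
    return 0.5 * c_load * v_dd ** 2


# Assumed example values (hypothetical): 100 fF load, Vth = 0.7 V, Vdd = 3.3 V.
C_LOAD = 100e-15
V_TH = 0.7
V_DD = 3.3

e_th = threshold_loss(C_LOAD, V_TH)
e_conv = conventional_loss(C_LOAD, V_DD)
print(f"threshold (non-adiabatic) loss: {e_th:.3e} J")
print(f"conventional CMOS loss        : {e_conv:.3e} J")
print(f"ratio, equal to (Vth/Vdd)^2   : {e_th / e_conv:.4f}")
```

The ratio depends only on (Vth/Vdd)², so the non-adiabatic part shrinks quickly as the threshold voltage becomes small relative to the supply swing.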
• De-energizing the Gate:
•After output values are sampled, the gate can be de-energized by reducing the supply voltage.
•The charge and energy from the load capacitance return to the power supply via the PMOS device.
•The PMOS device turns off when the supply voltage reaches Vth, stopping further energy recovery.
Figure 2: Charging an output capacitance of a buffer, implemented in the style of Figure 1.
• Charge Leakage & Recharging:
•Any remaining charge on the load capacitance leaks away slowly through the turned-off devices.
•If the supply voltage is ramped up before much charge leaks, charging resumes close to Vth with minimal
dissipation.
•Low leakage ensures near-zero non-adiabatic dissipation in ideal conditions.
•If the other pull-down network is used, both load capacitances switch values non-adiabatically, causing
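•Efficiency Analysis of Buffer Charging:
•Charging process:
The resistance of the charging device can be modelled as
$R = \frac{1}{K_p (V_{ch} - V_{th})}$ ….(1)
•The channel resistance depends on the channel voltage in a highly non-linear manner.
•As Vch approaches Vth, the resistance increases indefinitely.
We therefore consider a specific case:
•Dissipation is analyzed for a sinusoidal current case.
•This current would typically be produced by an inductive power supply with very low output impedance.
Finding current:
The charge C_L (Vdd − Vth) must be delivered by a half-sine current pulse of duration T. Charging from Vth to Vdd will then require a current given by:
$I(t) = \frac{\pi C_L (V_{dd} - V_{th})}{2T} \sin\left(\frac{\pi t}{T}\right)$ ….(2)
Finding channel voltage:
The voltage drop across the device is small when the charging time is long, so the channel voltage is approximately the same as the voltage on the load capacitance:
$V_{ch}(t) \approx V_{th} + \frac{V_{dd} - V_{th}}{2} \left(1 - \cos\frac{\pi t}{T}\right)$ ….(3)
Finding resistance:
Substituting (3) into (1):
$R(t) \approx \frac{2}{K_p (V_{dd} - V_{th}) \left(1 - \cos\frac{\pi t}{T}\right)}$ ….(4)
where 0 ≤ t ≤ T and K_p is the transconductance parameter of the PMOS charging device. The instantaneous power dissipation during the charging is given by:
$P(t) = I^2(t) R(t) = \frac{\pi^2 C_L^2 (V_{dd} - V_{th})}{2 K_p T^2} \left(1 + \cos\frac{\pi t}{T}\right)$ ….(9)
The total adiabatic energy dissipation is:
$E = \int_0^T P(t)\,dt = \frac{\pi^2 C_L^2 (V_{dd} - V_{th})}{2 K_p T}$ ….(10)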
• The adiabatic part of the dissipation for the flip-flop driver grows linearly with the voltage
swing above the threshold voltage.
• Unlike charging through a T-gate, there is no "optimum" voltage: a lower swing always yields a lower dissipation.
• According to Eq. (10), the dissipation is proportional to 1/K_p, where K_p is the transconductance parameter of the PMOS device. This is because the flip-flop driver, as described here, charges and discharges its load capacitances through PMOS devices.
• Since the carrier mobility is larger for NMOS devices (and therefore K_n > K_p), the dissipation may be decreased by "turning the circuit upside down" so that the load capacitances are charged through NMOS devices instead.
Assumptions:
• The load capacitance is large compared to the gate capacitance of the driver.
• The voltage drop across the driver device is small
• The body effect is neglected.
• Current will stay sinusoidal despite the severe resistance variations
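The analysis above can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the charging device is modelled as R = 1/(K_p (V_ch − V_th)), the current is a half-sine pulse of duration T that delivers the charge C_L (Vdd − Vth), and the channel voltage tracks the load voltage. Under those assumptions the integral of I²R should match the closed form π² C_L² (Vdd − Vth) / (2 K_p T). All function names and component values are hypothetical.

```python
import math


def adiabatic_energy_numeric(c_load, k_p, v_dd, v_th, t_charge, steps=20_000):
    """Integrate P(t) = I(t)^2 * R(t) over 0..T with the midpoint rule."""
    swing = v_dd - v_th
    i_peak = math.pi * c_load * swing / (2.0 * t_charge)
    dt = t_charge / steps
    energy = 0.0
    for n in range(steps):
        t = (n + 0.5) * dt  # midpoints avoid t = 0, where the modelled R diverges
        phase = math.pi * t / t_charge
        current = i_peak * math.sin(phase)
        v_over = 0.5 * swing * (1.0 - math.cos(phase))  # V_ch - V_th
        resistance = 1.0 / (k_p * v_over)  # assumed device model
        energy += current ** 2 * resistance * dt
    return energy


def adiabatic_energy_closed_form(c_load, k_p, v_dd, v_th, t_charge):
    """Closed form under the same assumptions: pi^2 C_L^2 (Vdd - Vth) / (2 K_p T)."""
    return math.pi ** 2 * c_load ** 2 * (v_dd - v_th) / (2.0 * k_p * t_charge)


# Assumed example values (hypothetical): 100 fF, K_p = 100 uA/V^2, 3.3 V, 0.7 V, 100 ns.
params = dict(c_load=100e-15, k_p=1e-4, v_dd=3.3, v_th=0.7, t_charge=100e-9)
e_num = adiabatic_energy_numeric(**params)
e_ref = adiabatic_energy_closed_form(**params)
print(f"numeric    : {e_num:.4e} J")
print(f"closed form: {e_ref:.4e} J")
```

Doubling the charging time halves the result, and the energy scales linearly with the swing above threshold, which is the behaviour described in the bullets above.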
POWER DISSIPATION IN CLOCK
DISTRIBUTION
•Power dissipation is a major limitation in integrating more transistors on a single chip.
•Performance-driven design is shifting toward power-driven design due to advances in portable wireless systems.
•Physical Design Considerations:
•Traditionally, physical design optimizes performance and cost.
•Increasing demand for low-power designs has led to a focus on power-efficient physical design.
•Low-power considerations are now integrated into the floorplan stage of chip layout.
•Placement and routing techniques are studied for minimizing power consumption.
•Iterative methods for transistor sizing and wire length minimization are applied in semi-custom designs.
SYNCHRONIZATION AND POWER
DISSIPATION IN CLOCK DISTRIBUTION
Synchronization in Digital Systems:
• A digital system requires reference signals for proper sequencing of operations.
•Fully synchronous operation with a common clock is the dominant design approach.
•A clock tree is used to distribute the clock signal globally to all system modules.
•Clock transitions provide reference timing for:
•Latching data
•Triggering operations
•Transmitting outputs
•The clock is the most important signal affecting both system performance and total power consumption.
Power Dissipation in CMOS Circuits:
Power dissipation mainly consists of two parts:
1.Dynamic Power:
•Caused by charging and discharging capacitive loads in every clock cycle.
2.Short-Circuit Power:
•Caused by short-circuit current flowing through PMOS and NMOS transistors during switching.
•Carrying large loads and switching at high frequencies makes the clock a significant contributor to dynamic
power dissipation.
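•The dynamic power dissipated by switching the clock can be given by:
$P = C_L V_{dd}^2 f$ ….(11)
where C_L is the total load on the clock, V_dd is the supply voltage, and f is the clock frequency.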
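A small worked example may help with the scale involved. The sketch below assumes the standard full-swing CMOS expression P = C_L · Vdd² · f for the clock net; the load, supply voltages, and frequency are hypothetical values chosen only for illustration.

```python
def clock_dynamic_power(c_load, v_dd, freq):
    """Dynamic power of a full-swing clock net: P = C_L * Vdd^2 * f."""
    return c_load * v_dd ** 2 * freq


# Assumed example values (hypothetical): 1 nF total clock load, 100 MHz clock.
C_CLOCK = 1e-9
F_CLOCK = 100e6

p_high = clock_dynamic_power(C_CLOCK, 3.3, F_CLOCK)
p_low = clock_dynamic_power(C_CLOCK, 2.5, F_CLOCK)
print(f"Vdd = 3.3 V: {p_high:.3f} W")
print(f"Vdd = 2.5 V: {p_low:.3f} W")
```

Because the supply voltage enters quadratically, lowering Vdd has a larger effect than a proportional reduction in load capacitance or frequency.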
Total load on the clock
Given the total number of clock terminals N, the nominal input capacitance at each terminal, the unit length wire capacitance, and the chip dimension D, and assuming an H-tree based global clock routing of h levels, the total load on the clock can be given by:
….(12)
where the second and third terms are the global and local wiring capacitance respectively, and α is an estimation factor depending on the algorithm used for local clock routing.
Eq. (11) and Eq. (12) suggest that the dynamic power dissipated by the clock increases as the number of clocked devices and the chip dimensions increase.
•The global clock may account for up to 40% of the total system power dissipation.
•For low-power clock distribution, measures should be taken to:
•Reduce the clock terminal load.
•Minimize the routing capacitance.
•Decrease the driver capacitance.
•Clock skew refers to the variation in delays from the clock source to the clock terminals.
•To achieve optimal performance, clock skew must be controlled within small or tolerable values.
•Clock phase delay (the longest delay from source to sinks) also needs to be controlled to maximize system
throughput.
CHALLENGES AND CONSIDERATIONS IN
LOW-POWER CLOCK DISTRIBUTION
•Clock capacitive load is mainly due to clock terminal (sink) capacitance.
•Device sizes are shrinking in deep-submicron technology, reducing clock terminal capacitance.
•Increasing chip dimensions make interconnect capacitance more significant.
•High-frequency requirements drive clock tree construction methods like adjusting wire lengths/widths to reduce
clock skew.
•These methods increase interconnect capacitance, making it a dominant part of total clock load.
•Low-power systems with reduced supply voltage require larger device sizes to maintain speed.
•Larger clock drivers are needed for fast clock transitions, increasing both dynamic and short-circuit power
dissipation.
SINGLE DRIVER VS DISTRIBUTED DRIVER
Clock Driving Schemes
•Buffers are required to drive large load capacitance for fast clock transitions.
•Two common clock driving schemes:
•Single Driver Scheme:
•Uses a chain of cascaded buffers.
•The chain, which ends in a very large buffer, is placed at the clock source. No buffers are used elsewhere.
•Avoids the adjustment of intermediate buffer delays.
•Widening branches near the clock source reduces skew from asymmetric clock tree loads and wire width
deviations.
•Wire sizing helps reduce clock phase delay.
•Distributed Buffers Scheme:
•Intermediate buffers are inserted at various points in the clock tree.
•At each insertion location, one or more buffers may be cascaded.
•Suitable for large clock trees with long path lengths.
•Intermediate buffers (repeaters) help reduce clock phase delay.
•Uses relatively small buffers that can be flexibly placed across the chip, saving layout area.
Figure 3: Two clock tree driving schemes: (a) Single driver scheme, where drivers are at the clock source and there are no buffers elsewhere; (b) Distributed buffers scheme, where intermediate buffers are distributed in the clock tree.
EFFECTS OF WIRE WIDENING AND
INTERMEDIATE BUFFER INSERTION ON
DELAY REDUCTION
Figure 4: (a) A long clock line can be considered as a distributed RC line. Two ways to reduce the delay of the long clock line: (b) by wire widening; (c) by intermediate buffer insertion.
•A long clock path can be modeled as a distributed RC delay line.
•Widening the clock line reduces line resistance but increases capacitance.
•Adjusting buffer sizes at the source helps reduce line delay.
•Inserting intermediate buffers along the line partitions it into shorter segments, each with smaller line
resistance.
•Using intermediate buffers makes delay more linear with line length.
•Widening wires requires larger buffers at the source, leading to higher short-circuit power dissipation.
•Small intermediate buffers driving short wire segments impose minimal power dissipation penalties.
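The contrast between an unbuffered line and a line partitioned by intermediate buffers can be sketched with a simple delay estimate. This is a minimal illustration under stated assumptions: the distributed RC line delay is approximated as 0.5 · r · c · L² (an Elmore-style estimate), each buffer contributes a fixed delay, and the buffer's own input capacitance and output resistance are ignored. The function names and all numbers are hypothetical.

```python
def unbuffered_delay(r_per_mm, c_per_mm, length_mm):
    """Distributed RC line estimate: 0.5 * (r*L) * (c*L), quadratic in length."""
    return 0.5 * r_per_mm * c_per_mm * length_mm ** 2


def buffered_delay(r_per_mm, c_per_mm, length_mm, segment_mm, t_buf):
    """Line split into equal segments, each driven by one buffer of fixed delay t_buf."""
    n_seg = max(1, round(length_mm / segment_mm))
    seg_len = length_mm / n_seg
    return n_seg * (unbuffered_delay(r_per_mm, c_per_mm, seg_len) + t_buf)


# Assumed example values (hypothetical): 100 ohm/mm, 0.2 pF/mm, 50 ps per buffer, 1 mm segments.
R_MM, C_MM, T_BUF, SEG_MM = 100.0, 0.2e-12, 50e-12, 1.0

for length in (2.0, 4.0, 8.0):
    plain = unbuffered_delay(R_MM, C_MM, length)
    buffered = buffered_delay(R_MM, C_MM, length, SEG_MM, T_BUF)
    print(f"{length:4.1f} mm  unbuffered {plain * 1e12:7.1f} ps  buffered {buffered * 1e12:7.1f} ps")
```

With these assumed numbers the buffered delay grows linearly with length while the unbuffered delay grows quadratically, so buffering only pays off once the line is long enough for the quadratic term to dominate the added buffer delays.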
POWER OPTIMIZATION AND SKEW
REDUCTION IN CLOCK TREES
• The distributed buffers scheme is preferred over the single driver scheme
for power minimization in a clock tree.
Figure 5: (a) An equal path-length clock tree; (b) The delay model.
• Figure 5 shows an example of an equal path-length clock tree and its delay model.
….(13)
where r is the sheet resistance of the wire. The skew variation in terms of wire width variations can be stated as:
• Assuming the maximum width variations Δw=±0.15w, the worst-case additional skew is:
….(14)
•Eq. (13) and Eq. (14) indicate that without wire width variations, skew is a linear function of path length.
•With wire width variations, the additional skew depends on the product of path length and total load
capacitance.
•Increasing wire widths reduces skew but increases capacitance and power dissipation.
•Reducing both path length and load capacitance minimizes skew while allowing the use of minimum wire
width to keep wiring capacitance low.
•Inserting buffers to partition a large clock tree into sub-trees with short path-lengths and small loads reduces
skew caused by asymmetric loads and wire width variations.
BUFFER INSERTION IN CLOCK TREE
•Different buffer delays cause phase delay variations on different source-to-sink paths.
•The given tolerable skew of a buffered clock tree is divided into two components:
•The tolerable skew for buffer delays
•The skew allowed for asymmetric loads and wire width deviations after buffer insertion
….(15)
• Objectives of the Buffer Insertion Scheme
•Balance buffer delays on source-to-sink paths.
•Work independently of the clock tree topology.
•Clock tree is divided into levels using cut-lines.
•Buffer Insertion Points (BIPs) are determined at these cut-lines.
• Properties of the Resulting Clock Tree
•Each source-to-sink path has the same number of buffer levels.
•Sub-trees at the same level have equal path-lengths.
Iso-Radius Levels
•Cut-lines are chosen to form iso-radius levels. An iso-radius level is a circular boundary centered at the clock source. This ensures that all paths from the source to BIPs within the same level have the same length.
Calculation of Cut-Line Radius
•For a clock tree with path length L, the radius θ of the first cut-line (nearest to the clock source) is:
$\theta = \frac{L}{\phi + 1}$ ….(16)
•Here, ϕ is the number of buffer levels.
•The radius for subsequent iso-radius levels:
•First level: θ
•Second level: 2θ
•ϕ-th level: ϕθ
• Determining Minimum Buffer Levels (ϕ*)
• Iteratively evaluate the worst-case skew of the clock tree.
• Start with ϕ=1,2,... and find the smallest ϕ∗ that keeps the worst-case skew below the tolerable skew bound.
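The level search described above can be sketched as a simple loop. This is an illustration under stated assumptions, not the authors' implementation: the first cut-line radius is taken as θ = L/(ϕ+1), so that the ϕ cut-lines split each path into ϕ+1 equal segments, and the worst-case skew evaluation is passed in as a callable. The `toy_skew` model below is a hypothetical stand-in for a real skew analysis of the clock tree.

```python
from typing import Callable, List


def cut_line_radii(path_length: float, levels: int) -> List[float]:
    """Radii theta, 2*theta, ..., levels*theta, assuming theta = L / (levels + 1)."""
    theta = path_length / (levels + 1)
    return [k * theta for k in range(1, levels + 1)]


def min_buffer_levels(path_length: float,
                      skew_bound: float,
                      worst_case_skew: Callable[[float, int], float],
                      max_levels: int = 32) -> int:
    """Try levels = 1, 2, ... and return the first count whose skew is below the bound."""
    for levels in range(1, max_levels + 1):
        if worst_case_skew(path_length, levels) < skew_bound:
            return levels
    raise ValueError("no buffer level count up to max_levels meets the skew bound")


def toy_skew(path_length: float, levels: int) -> float:
    """Hypothetical skew model: grows with the square of the sub-tree path length."""
    segment = path_length / (levels + 1)
    return 1e-12 * segment ** 2  # seconds, with lengths in mm (assumed scaling)


# Assumed example: 12 mm path length, 10 ps tolerable skew.
best = min_buffer_levels(12.0, 10e-12, toy_skew)
print("minimum buffer levels:", best)
print("cut-line radii (mm):", cut_line_radii(12.0, best))
```

In practice the callable would evaluate the actual clock tree with asymmetric loads and wire width variations; only the search structure is meant to carry over.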
Figure 6: Buffer levels are increased until the tolerable skew bound is satisfied.
Figure 7: An example of buffer insertion in an equal path-length clock tree: (a) using the balanced buffer insertion method; (b) using the level-by-level method.
Figure 8: Comparison of buffer insertion in a general equal path-length clock tree using: (a) the balanced buffer insertion method; (b) the level-by-level method.
• An example of the buffer insertion scheme is shown in Figure 7(a).
• Previous methods insert buffers level-by-level at the branch split points of the clock tree as shown in Figure
7(b).
• This works well in a full binary tree where all sinks have the same number of levels.
• In a general equal path-length tree, such as the one in Figure 8, the level-by-level method inserts different numbers of buffers on different source-to-sink paths.
• Depending on the clock tree topology, some large sub-trees may still require wire widening to reduce skew.
CONCLUSION
•The combination of partially-adiabatic logic and optimized clock distribution significantly reduces power
dissipation.
•Adiabatic circuits introduce design complexities, requiring careful trade-offs in performance, area, and
efficiency.
•Ongoing research in hybrid low-power techniques can further improve energy efficiency and practicality.
•Implementing these approaches can lead to substantial energy savings in modern digital systems.