Copyright
© Attribution Non-Commercial (BY-NC)
Lecture-2

Low Power VLSI Design


Instructor: Rajesh Bathija,
HOD-ECE,
Mewar University,
Chittorgarh
Motivation for Low Power Design
Low power design is important for three different reasons:
• Device temperature
  - Failure rate, cooling and packaging costs
• Life of the battery
  - Mean time between charging, system cost
• Environment
  - Overall energy consumption

12/08/21 Rajesh Bathija, Mewar University, Chittorgarh 2


Low Power or Low Energy Design
• Power
  - Direct impact on instantaneous energy consumption and temperature
• Energy
  - Energy is power integrated over time; it impacts battery life and the environment

$$E(T) = \int_{0}^{T} P(t)\,dt$$
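As a numerical illustration of the integral, energy can be approximated from sampled power values with the trapezoidal rule. This is a minimal sketch; the function name and the power trace are hypothetical, not from the lecture.

```python
def energy_from_samples(times_s, power_w):
    """Approximate E(T), the integral of P(t) dt, from sampled power (trapezoidal rule)."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching time/power samples")
    energy_j = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j


# Hypothetical trace: 2 W for the first second, then 0.5 W for the next second.
times = [0.0, 1.0, 1.0, 2.0]
power = [2.0, 2.0, 0.5, 0.5]
print(energy_from_samples(times, power))  # 2.5 (joules)
```

The same energy can come from high power for a short time or low power for a long time, which is why power (temperature) and energy (battery life) are treated as separate targets.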



Power Consumption
• Dynamic
  - Transition
  - Short circuit
• Leakage
  - Sub-threshold leakage
  - Diode/drain leakage
  - Gate leakage
At 250 nm, leakage accounted for only 5% of total power, but its share is increasing rapidly as geometries shrink.



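Dynamic Energy Consumption: Transition Power
[Figure: CMOS inverter (input Vin, output Vout, supply Vdd) driving load capacitance CL]

Energy/transition: $E = C_L \cdot V_{DD}^2 \cdot P_{0\to1}$

Power: $P = C_L \cdot V_{DD}^2 \cdot f \cdot P_{0\to1}$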



Modification for Circuits with Reduced Swing
[Figure: driver powered from Vdd that charges load CL only up to Vdd - Vt]

$E_{0\to1} = C_L \cdot V_{DD} \cdot (V_{DD} - V_t)$

Reduced swing can be exploited to lower power (e.g., reduced bit-line swing in memory).
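The sketch below compares the reduced-swing energy with the full-swing value C_L·Vdd². The bit-line capacitance, Vdd and Vt values are hypothetical.

```python
def reduced_swing_energy(c_load_f, vdd_v, vt_v):
    """Energy drawn from Vdd per 0->1 transition when the node charges only to Vdd - Vt."""
    return c_load_f * vdd_v * (vdd_v - vt_v)


def full_swing_energy(c_load_f, vdd_v):
    """Energy drawn from Vdd per 0->1 transition for a rail-to-rail swing."""
    return c_load_f * vdd_v ** 2


# Hypothetical values: 100 fF bit line, Vdd = 1.2 V, Vt = 0.4 V
ratio = reduced_swing_energy(100e-15, 1.2, 0.4) / full_swing_energy(100e-15, 1.2)
print(round(ratio, 3))  # 0.667
```

The saving is only linear in the swing, since the charge is still taken from the full Vdd supply.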



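Dynamic Energy Consumption: Short-circuit Power
[Figure: CMOS inverter (input Vin, output Vout, supply Vdd) driving load capacitance CL; both transistors conduct briefly during an input transition]

Energy/transition: $E_{sc} = t_{sc} \cdot V_{DD} \cdot I_{peak} \cdot P_{0\to1/1\to0}$

Power: $P_{sc} = t_{sc} \cdot V_{DD} \cdot I_{peak} \cdot f \cdot P_{0\to1/1\to0}$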



Impact of Logic Function
Example: static 2-input NOR gate

A B | Out
0 0 |  1
0 1 |  0
1 0 |  0
1 1 |  0

Assume signal probabilities $p_{A=1} = 1/2$ and $p_{B=1} = 1/2$.
Then the transition probability is
$p_{0\to1} = p_{Out=0} \times p_{Out=1} = \frac{3}{4} \times \frac{1}{4} = \frac{3}{16}$
If the inputs switch every cycle, $\alpha_{NOR} = 3/16$.

A NAND gate yields a similar result.
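The calculation above can be automated by enumerating the truth table. A minimal sketch, assuming independent inputs and a fresh, independent input vector every cycle; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def output_one_probability(gate, p_one):
    """P(out = 1) for a combinational gate whose inputs are independent."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(p_one)):
        prob = Fraction(1)
        for bit, p in zip(bits, p_one):
            prob *= p if bit else 1 - p
        if gate(*bits):
            total += prob
    return total


def static_activity(gate, p_one):
    """alpha(0->1) = P(out=0) * P(out=1) for a static gate."""
    p1 = output_one_probability(gate, p_one)
    return (1 - p1) * p1


def nor2(a, b):
    return int(not (a or b))


def nand2(a, b):
    return int(not (a and b))


half = Fraction(1, 2)
print(static_activity(nor2, [half, half]))   # 3/16
print(static_activity(nand2, [half, half]))  # 3/16
```

Passing an XOR function instead gives 1/4.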


Impact of Logic Function
Example: static 2-input XOR gate

A B | Out
0 0 |  0
0 1 |  1
1 0 |  1
1 1 |  0

Assume signal probabilities $p_{A=1} = 1/2$ and $p_{B=1} = 1/2$.
Then the transition probability is
$p_{0\to1} = p_{Out=0} \times p_{Out=1} = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$
If the inputs switch every cycle, $\alpha_{XOR} = p_{0\to1} = 1/4$.



Transition Probabilities for Basic Gates
As a function of the input probabilities:

Gate | p0→1
AND  | $(1 - p_A p_B)\,p_A p_B$
OR   | $(1 - p_A)(1 - p_B)\,(1 - (1 - p_A)(1 - p_B))$
XOR  | $(1 - (p_A + p_B - 2p_A p_B))\,(p_A + p_B - 2p_A p_B)$

Activity for static CMOS gates: $\alpha = p_0 p_1$
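The closed forms in the table can be written directly as functions. This sketch (function names hypothetical) reproduces the earlier 3/16 and 1/4 results at pA = pB = 1/2, and shows how skewed input probabilities change the activity.

```python
from fractions import Fraction


def p01_and(pa, pb):
    """AND: P(out=1) = pA * pB, alpha = p0 * p1."""
    p1 = pa * pb
    return (1 - p1) * p1


def p01_or(pa, pb):
    """OR: P(out=0) = (1 - pA)(1 - pB), alpha = p0 * p1."""
    p0 = (1 - pa) * (1 - pb)
    return p0 * (1 - p0)


def p01_xor(pa, pb):
    """XOR: P(out=1) = pA + pB - 2 pA pB, alpha = p0 * p1."""
    p1 = pa + pb - 2 * pa * pb
    return (1 - p1) * p1


half = Fraction(1, 2)
print(p01_and(half, half), p01_or(half, half), p01_xor(half, half))  # 3/16 3/16 1/4
tenth = Fraction(1, 10)
print(p01_and(tenth, tenth), p01_or(tenth, tenth))  # 99/10000 1539/10000
```

With inputs that are rarely 1, the AND output almost never switches, while the OR output switches far more often.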
Leakage Energy
[Figure: OFF transistor holding output Vout, showing drain-junction leakage, sub-threshold current, and gate leakage]
Leakage is independent of switching activity.
Dynamic vs Static Power
[Figure: power density (W/cm^2, log scale 1E-8 to 1E+4) vs. gate length (0.01 to 10 microns) for active power and sub-threshold power, showing a shrinking margin between the two]
Source: Leon Stok, DAC 42©



Low Power Design Needs
• Low power design techniques
  - Effectiveness
  - Effect/trade-off with other design parameters such as area (cost), performance, reliability, manufacturability, etc.
• Power modeling and estimation
  - Accuracy of the models
  - Time for estimation



Design Approaches
• System design: top down
  - Effective low power transformations in synthesis
  - Fast estimation techniques for effective exploration of a large design space
• Cell library design: bottom up
  - Low power circuit design techniques
  - Accurate estimation
  - Effective models for synthesis tools



Design Levels
• System
• Algorithmic/module
• RTL
• Gate
• Circuit
• Device technology



System Level Design
The same MP3 application running on different systems can consume significantly different amounts of power. Relevant factors include:

• System partitioning
• Buses/memory/IO devices/interfaces
• Choice of components
• Coding
• System states (sleep/snooze, etc.)
• DVS/DFS/.. (dynamic voltage scaling, dynamic frequency scaling, etc.)
Algorithmic/sub-system Level
• Choice of algorithm (operation count, etc.)
• Word length choices
• Module interfaces
• Implementation technology
  - SW: processor selection
  - HW: ASIC/FPGA/..
• Behavioral synthesis constraints and trade-offs



RTL
• Pipelining/retiming
• Module selection
• Multiple frequency and voltage islands
• Reduction in switching activity through transformations



Gate Level
• Clock gating
• Power gating
• Clock tree optimization
• Logic level transformations to reduce switching activity



Device Technology
• Multi-oxide devices
• Multiple “cell types” on a single substrate
  - Logic, SRAM, Flash, etc.
• Support for many other low power design techniques (multiple thresholds, multiple voltages, multiple frequencies, etc.)



Impact of Technology Scaling



Goals of Technology Scaling
• Make things cheaper:
  - Sell more functions (transistors) per chip for the same money
  - Build the same products cheaper and sell the same part for less money
  - The price of a transistor has to be reduced
• But also make things faster, smaller, and lower power



Technology Scaling
• Goals of scaling the dimensions by 30%:
  - Reduce gate delay by 30% (increase operating frequency by 43%)
  - Double transistor density
  - Reduce energy per transition by 65% (50% power savings at a 43% increase in frequency)
• Die size used to increase by 14% per generation
• A technology generation spans 2-3 years
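The percentages above follow from a scale factor of 0.7 per generation. This sketch reproduces them; note that the energy and power lines assume the supply voltage also scales by 0.7 (constant-field scaling), an assumption made here rather than stated on the slide.

```python
s = 0.7  # each linear dimension (and, assumed here, the supply voltage) scaled to 70%

delay = s                    # gate delay ~ s: 30% faster
frequency = 1 / s            # ~1.43x, the "43% increase"
density = 1 / s ** 2         # ~2.04x transistors per unit area
energy = s ** 3              # E ~ C * V^2 with C ~ s and V ~ s: ~0.34x, i.e. ~65% less
power = energy * frequency   # ~0.49x even at the higher frequency, i.e. ~50% less

print(f"delay x{delay:.2f}, frequency x{frequency:.2f}, density x{density:.2f}, "
      f"energy x{energy:.3f}, power x{power:.2f}")
# delay x0.70, frequency x1.43, density x2.04, energy x0.343, power x0.49
```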



Technology Evolution (2000 data)
International Technology Roadmap for Semiconductors

Year of introduction          1999     2000     2001     2004     2008     2011     2014
Technology node [nm]          180      -        130      90       60       40       30
Supply [V]                    1.5-1.8  1.5-1.8  1.2-1.5  0.9-1.2  0.6-0.9  0.5-0.6  0.3-0.6
Wiring levels                 6-7      6-7      7        8        9        9-10     10
Max frequency [GHz],          1.2      1.6-1.4  2.1-1.6  3.5-2    7.1-2.5  11-3     14.9-3.6
  local-global
Max µP power [W]              90       106      130      160      171      177      186
Battery power [W]             1.4      1.7      2.0      2.4      2.1      2.3      2.5

Node years: 2007/65nm, 2010/45nm, 2013/33nm, 2016/23nm

Technology Evolution (1999)



Technology Scaling (1)
[Figure: minimum feature size (microns, log scale 10^-2 to 10^2) vs. year, 1960 to 2010]


Technology Scaling (2)

[Figure: number of components per chip]



Technology Scaling (3)

[Figure: propagation delay vs. year]
tp decreases by 13% per year, i.e., by 50% every 5 years!
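The two figures on this slide are consistent: a 13% yearly reduction compounds to roughly a halving over five years, as this small check shows.

```python
import math

annual_factor = 1 - 0.13            # delay keeps 87% of its value each year
five_year_factor = annual_factor ** 5
years_to_halve = math.log(0.5) / math.log(annual_factor)

print(f"after 5 years: x{five_year_factor:.3f}")  # after 5 years: x0.498
print(f"years to halve: {years_to_halve:.1f}")    # years to halve: 5.0
```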
Technology Scaling Models

• Full scaling (constant electric field)
  Ideal model: dimensions and voltage scale together by the same factor S

• Fixed-voltage scaling
  Most common model until recently: only dimensions scale, voltages remain constant

• General scaling
  Most realistic for today's situation: voltages and dimensions scale with different factors



Scaling Relationships for Long Channel Devices
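The table for this slide did not survive extraction. As a hedged substitute, the sketch below derives the standard first-order long-channel factors, assuming dimensions (W, L, tox) scale by 1/S and voltages (VDD, VT) by 1/U, where U is a symbol introduced here. Full scaling is U = S and fixed-voltage scaling is U = 1.

```python
def long_channel_scaling(s, u):
    """First-order long-channel scaling factors (values relative to the old node).

    Assumes dimensions scale by 1/s, voltages by 1/u, and the square-law
    drain current I ~ (W/L) * Cox * V^2 with Cox ~ 1/tox ~ s.
    """
    current = s / u ** 2            # W/L unchanged, Cox ~ s, V^2 ~ 1/u^2
    gate_cap = 1 / s                # Cg ~ Cox * W * L ~ s * (1/s^2)
    device_power = (1 / u) * current
    return {
        "device_current": current,
        "delay": gate_cap * (1 / u) / current,   # tp ~ C * V / I = u / s^2
        "device_power": device_power,            # P ~ V * I = s / u^3
        "power_density": s ** 2 * device_power,  # devices per area ~ s^2 -> s^3 / u^3
    }


s = 1 / 0.7
full = long_channel_scaling(s, u=s)
fixed = long_channel_scaling(s, u=1.0)
print(round(full["power_density"], 6), round(fixed["power_density"], 2))  # 1.0 2.92
```

Under full scaling the power density stays constant, while fixed-voltage scaling raises it by S³, which is why fixed-voltage scaling could not continue indefinitely.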



Transistor Scaling
(velocity-saturated devices)



Processor Scaling

P.Gelsinger: Processors for the New Millenium, ISSCC 2001



Processor Power

P.Gelsinger: Processors for the New Millenium, ISSCC 2001



Processor Performance

P.Gelsinger: Processors for the New Millenium, ISSCC 2001



2010 Outlook
• Performance 2X/16 months
  - 1 TIP (tera-instructions/s)
  - 30 GHz clock
• Size
  - No. of transistors: 2 billion
  - Die: 40 × 40 mm
• Power
  - 10 kW!!
  - Leakage: 1/3 of active power

P. Gelsinger: Processors for the New Millennium, ISSCC 2001



Low Power RTL Synthesis Techniques
• Module selection
• Retiming
• Pipelining
• Parallelism
• Bus data encoding
• FSM encoding
• Transformations for switching activity reduction



Module Selection
• Modules are used for implementing functional units, small memory modules, etc.
• Different implementations can differ significantly in power consumption
• The word length as well as the number coding technique employed can play a significant role



Ripple Carry Adder

[Figure: ripple carry adder]
Carry signal switching propagates through all the stages and consumes power.
ACTEL: MAPLD2004



Carry Look Ahead Adder

[Figure: carry look-ahead adder]
Carry signal switching propagates through far fewer stages, which not only reduces delay but can also reduce power consumption.
ACTEL: MAPLD2004



Carry Select Adder

[Figure: carry select adder]
Carry signal switching propagates through far fewer stages, which reduces delay, but at the cost of considerable circuit duplication.
ACTEL: MAPLD2004



Brent & Kung Adder
• Brent and Kung, in their 1982 paper [6], proposed an area-efficient adder
• It is basically a restructured carry-look-ahead adder
• For details, refer to their paper



Adder Architectures: Area-Delay Trade-offs
[Figure: area (# tiles) and delay (ns) vs. bit width (4 to 32 bits) for ripple (RPL), carry look-ahead (CLA), forward carry look-ahead (CLF), and Brent and Kung (BK) adders]

• Forward Carry Look-Ahead (CLF): fastest but also largest
• Brent and Kung (BK): almost the same speed as CLF but drastically smaller
• Carry Look-Ahead (CLA): relatively small and slow
• Ripple (RPL): smallest but slowest

Brent and Kung: best area/speed trade-off
ACTEL: MAPLD2004
Adder Architectures: Power Comparison
[Figure: "Power Consumption of 32 bit Adder (Speed)": power (mW) of RPL, CLA, BK, and CLF adders vs. frequency]

Brent and Kung also has the lowest power dissipation.
ACTEL: MAPLD2004
Multiplier Architectures
• Considerable variation with carry-save adders
• Wallace tree structures have good area-time performance
• Pipelining for throughput becomes important



Other Operations and Operators
• ALUs
  - Traditional method: perform all operations and use a select for the output; very inefficient in terms of switching activity
  - Permit switching activity only in the operator required in the current cycle
• Complex operators like MAC
• CORDIC functions
• Look-up table vs. computation



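Alternative ALU Structures
[Figure: two ALU structures. In one, the inputs drive all function units F1, F2, ..., Fn and a function-select mux picks the output. In the other, a demux steers the inputs only to the selected function unit, and a function-select mux picks its output.]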



Operator Power Estimation
[Figure: flow from the application model, through parameter extraction from high-level simulation, to extracted parameters; the extracted parameters are fed to the operator models, which produce the power estimates]
Retiming
• Leiserson [1] first proposed retiming for optimizing synchronous circuits, and Monteiro [2] modified it for low power design
• The basic observation is that positioning a flip-flop can stop the propagation of “glitches” and thus unnecessary transitions
• This implies that flip-flops can be positioned not only to minimize delays (classical retiming) but also to reduce transitions



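Positioning a Flip-flop and Power Consumption
[Figure: (1) logic with output switching Eg drives load CL directly; (2) the same logic output (switching Eg) drives a flip-flop with input capacitance CR, and the flip-flop output (switching ER) drives CL]

$P_1 = k \cdot E_g \cdot C_L \qquad P_2 = k \cdot (E_g \cdot C_R + E_R \cdot C_L)$

P2 can be less than P1.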
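To see when P2 < P1, the sketch below evaluates both expressions with hypothetical numbers. It assumes Eg and ER are switching activities (transitions per cycle, where Eg includes glitches and ER is at most one, since a flip-flop output changes at most once per clock), the C terms are capacitances, and k is a constant. The flip-flop's own clock power is ignored.

```python
def power_without_ff(k, e_g, c_l):
    """P1 = k * Eg * CL: the glitchy logic output drives the load directly."""
    return k * e_g * c_l


def power_with_ff(k, e_g, c_r, e_r, c_l):
    """P2 = k * (Eg * CR + ER * CL): the flip-flop absorbs glitches before the load."""
    return k * (e_g * c_r + e_r * c_l)


# Hypothetical numbers: 3 transitions per cycle at the logic output (glitches included),
# 1 at the flip-flop output; flip-flop input capacitance small compared with the load.
k = 1.0
p1 = power_without_ff(k, e_g=3.0, c_l=50.0)
p2 = power_with_ff(k, e_g=3.0, c_r=5.0, e_r=1.0, c_l=50.0)
print(p1, p2)  # 150.0 65.0
```

The gain disappears when the logic output is glitch-free (Eg close to ER) or when CR is comparable to CL.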



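Retiming and Power Consumption
[Figure: Case 1: Logic → FF → Logic → Logic, with switching E0, E1, E2 on the nets loaded by CR, C1, C2. Case 2: Logic → Logic → FF → Logic, with switching E0, E2, E3 on the nets loaded by C1, CR, C2.]

$P_1 = k \cdot (E_0 C_R + E_1 C_1 + E_2 C_2)$

$P_2 = k \cdot (E_0 C_1 + E_2 C_R + E_3 C_2)$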



Retiming: Methodology and Results
• Each possible stage is evaluated for potential power savings, i.e., the transitions generated there compared with those that actually need to propagate
• This is done by finding the difference between the transition counts in zero-delay and timed (delay-aware) simulation
• Based on this computation, flip-flops are placed either to minimize power or to minimize power and timing together (weighted by some factor)
• Results show a 10% to 25% reduction in transition count on a number of benchmark circuits



Pipelining
• Pipelining affects power in two different ways:
  - One factor is similar to retiming: flip-flops can cut down on glitches
  - Pipelining shortens the critical path, which gives higher frequency and performance (throughput); alternatively, the supply voltage can be lowered while keeping the given throughput, which reduces power



Effect of Pipelining
Case 1: No pipelining
  Logic in a single stage; freq: f0, voltage: v0
Case 2: Pipelining for performance
  Logic split into stages separated by flip-flops; f1 > f0, v1 = v0
Case 3: Pipelining for low power
  Logic split into stages separated by flip-flops; f2 = f0, v2 < v0
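Case 3 can be quantified with P = C·V²·f. The sketch below uses assumed numbers (pipeline registers add 15% switched capacitance, and the supply can drop to 0.6·v0 at the original frequency); the real values depend on the design and technology.

```python
def dynamic_power(c, v, f):
    """Dynamic power P = C * V^2 * f (switching activity folded into C)."""
    return c * v ** 2 * f


case1 = dynamic_power(c=1.0, v=1.0, f=1.0)   # no pipelining, normalized to 1
# Case 3 with assumed numbers: registers add 15% capacitance, and the shorter
# critical path lets the supply drop to 0.6 * v0 while keeping f2 = f0.
case3 = dynamic_power(c=1.15, v=0.6, f=1.0)
print(f"Case 3 / Case 1 = {case3 / case1:.3f}")  # Case 3 / Case 1 = 0.414
```

The quadratic voltage term outweighs the extra register capacitance, which is the whole point of pipelining for low power.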
Increasing Parallelism/Concurrency
• Chandrakasan [4] first showed that concurrency can be used to reduce power instead of increasing performance
• The primary idea is to reduce the frequency of operation and/or the voltage while meeting a required throughput
• The power consumed by the additional logic required to distribute the computation and multiplex the results must be accounted for



Effect of Parallelism
Case 1: Single FU
  freq: f0, voltage: v0, throughput: T0
Case 2: Two FUs for enhanced performance
  Each FU fed by its own input register, outputs combined by a mux; f1 = f0, v1 = v0, T1 > T0
Case 3: Two FUs for reducing power
  Same structure as Case 2; f2 < f0, v2 < v0, T2 = T0
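A rough estimate for Case 3, again using P = C·V²·f. The numbers are assumptions for illustration: duplicating the FU plus the registers and mux raises switched capacitance to 2.15×, each FU runs at f0/2 (so two FUs keep T2 = T0), and the relaxed timing lets the supply drop to 0.58·v0.

```python
def dynamic_power(c, v, f):
    """Dynamic power P = C * V^2 * f (switching activity folded into C)."""
    return c * v ** 2 * f


single = dynamic_power(c=1.0, v=1.0, f=1.0)      # Case 1: one FU at f0, v0
# Case 3 (assumed): 2.15x capacitance incl. the extra registers and mux,
# each FU at f0 / 2, supply lowered to 0.58 * v0.
parallel = dynamic_power(c=2.15, v=0.58, f=0.5)
print(f"{parallel / single:.2f}")  # 0.36
```

The overhead term (2.15 rather than 2.0) is the "additional logic" cost that the slide says must be accounted for.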



Bus Data Encoding
• Buses are known to consume up to 30% of the power in many systems
• Bus transitions can be reduced by encoding the data sent on the bus
• Encode such that value pairs corresponding to frequent transitions have a smaller Hamming distance
• The power consumed by the encoders and decoders must be accounted for
• Encoding can also be applied to only part of the bits
• Applicable to both data and address buses; for address buses, patterns can be encoded
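One well-known scheme of this kind is bus-invert coding (Stan and Burleson), sketched below as an illustration; the lecture does not say which encoding it has in mind. An extra "invert" line is added, and a word is sent complemented whenever more than half of the data lines would otherwise toggle. The function names are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Bus-invert coding: drive each word or its complement, plus one invert line.

    The complement is sent when more than half of the data lines would toggle.
    Returns (bus_value, invert_bit) pairs; the bus starts at all zeros.
    """
    mask = (1 << width) - 1
    prev_bus = 0
    encoded = []
    for word in words:
        word &= mask
        if hamming(word, prev_bus) > width // 2:
            bus, inv = ~word & mask, 1
        else:
            bus, inv = word, 0
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words by undoing the inversion where flagged."""
    mask = (1 << width) - 1
    return [(~bus & mask) if inv else bus for bus, inv in encoded]


def bus_transitions(encoded):
    """Toggles on the data lines plus the invert line, starting from an all-zero bus."""
    total, prev_bus, prev_inv = 0, 0, 0
    for bus, inv in encoded:
        total += hamming(bus, prev_bus) + (inv != prev_inv)
        prev_bus, prev_inv = bus, inv
    return total


data = [0x00, 0xFF, 0x00, 0xFF]
raw = [(w, 0) for w in data]                 # unencoded bus, invert line unused
coded = bus_invert_encode(data, width=8)
assert bus_invert_decode(coded, width=8) == data
print(bus_transitions(raw), bus_transitions(coded))  # 24 3
```

This is a worst-case pattern chosen to show the effect; on random data the saving is much smaller, and the encoder, decoder and extra line consume power of their own.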



Bus Data Encoding
[Figure: logic blocks on each side of the bus communicate through an encoder/decoder at each end]



FSM State Encoding
• FSM state encoding evolved for logic synthesis
• Encoding techniques are primarily based on reducing the Hamming distance of the codes assigned to “neighboring” states
• The assumption is that a lower Hamming distance implies less logic for implementing the transitions
• This has to be combined with the input and output states as well
• The approach to low power state assignment is similar, except that a “neighbour” is defined by the frequency of transitions rather than just by connectivity in the state machine



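FSM State Encoding
[Figure: state s0 with transitions to s1, s2, s3, s4, annotated with transition frequencies 10, 5, 30, 80]

State | Encoding for logic minimization | Encoding for power minimization
s0    | 000                             | 000
s1    | 001                             | 100
s2    | 010                             | 101
s3    | 100                             | 010
s4    | 101                             | 001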
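The benefit of the power-oriented encoding can be scored as the frequency-weighted Hamming distance of the state-register transitions. This sketch assumes the figure's frequencies 10, 5, 30 and 80 belong to the s0 ↔ s1, s2, s3, s4 transitions in that order; that mapping is a reading of the figure, not something the text states.

```python
def hamming(a, b):
    """Number of state-register bits that toggle between two codes."""
    return bin(a ^ b).count("1")


# Assumed reading of the figure: relative frequencies of the s0 <-> s1..s4 transitions.
freq = {"s1": 10, "s2": 5, "s3": 30, "s4": 80}

logic_min = {"s0": 0b000, "s1": 0b001, "s2": 0b010, "s3": 0b100, "s4": 0b101}
power_min = {"s0": 0b000, "s1": 0b100, "s2": 0b101, "s3": 0b010, "s4": 0b001}


def weighted_switching(encoding):
    """Sum over transitions of (frequency x Hamming distance between state codes)."""
    return sum(f * hamming(encoding["s0"], encoding[s]) for s, f in freq.items())


print(weighted_switching(logic_min), weighted_switching(power_min))  # 205 130
```

The power-oriented code gives the only two-bit distance to the least frequent transition (s2), instead of the most frequent one (s4).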



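Other Transformations
[Figure: two cascaded 2:1 mux chains producing y. Case 1: a and b feed the first mux, and its output and c feed the second mux. Case 2: a and c feed the first mux, and its output and b feed the second mux.]

If activity(b) >> activity(a) and activity(b) >> activity(c), the power consumption of Case 2 is significantly less than that of Case 1, because the high-activity signal b then passes through only one mux instead of two.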



References
1. Leiserson et al., “Optimizing Synchronous Circuits by Retiming”, Proc. of 3rd Caltech Conference on VLSI, March 1983, pp. 23-36
2. J. Monteiro et al., “Retiming Sequential Circuits for Low Power”, ICCAD, Nov. 1993, pp. 398-402
3. Devadas & Malik, “A Survey of Optimization Techniques Targeting Low Power VLSI Circuits”, DAC 32, 1995, pp. 242-247
4. A. P. Chandrakasan, “Optimizing Power Using Transformations”, IEEE TCAD, vol. 14, no. 1, Jan. 1995, pp. 12-31
5. Koegst et al., “State Assignment for FSM Low Power Design”, EDAC 1995, pp. 28-33
6. Brent & Kung, “A Regular Layout for Parallel Adders”, IEEE Trans. on Computers, vol. C-31, no. 3, pp. 260-264



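Glitching in Static CMOS
The analysis so far did not include timing effects.

[Figure: inputs A and B drive a gate with output X; X and C drive a second gate with output Z. When ABC changes from 101 to 000, Z glitches for about one gate delay before settling.]

Also known as dynamic hazards: “A single input change causing multiple changes in the output”.
The result is correct, but extra power is dissipated.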
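A unit-delay simulation makes the glitch visible, and contrasting it with zero-delay evaluation is the same comparison the retiming methodology uses. The gate types are not readable in this text version, so this sketch assumes both gates are 2-input NOR gates, a hypothetical choice that does produce a glitch on Z for the 101 → 000 input change.

```python
def nor(a, b):
    return int(not (a or b))


def simulate_unit_delay(abc_before, abc_after, steps=4):
    """Unit-delay simulation of the assumed circuit X = NOR(A, B), Z = NOR(X, C).

    Each gate output at time t+1 is computed from its inputs at time t.
    Returns the waveform of Z, starting from the steady state for abc_before.
    """
    a, b, c = abc_before
    x = nor(a, b)
    z = nor(x, c)                    # steady state before the input change
    a, b, c = abc_after              # all inputs switch at t = 0
    wave = [z]
    for _ in range(steps):
        x, z = nor(a, b), nor(x, c)  # both gates update together from the old values
        wave.append(z)
    return wave


wave = simulate_unit_delay((1, 0, 1), (0, 0, 0))
print(wave)  # [0, 1, 0, 0, 0]
transitions = sum(w0 != w1 for w0, w1 in zip(wave, wave[1:]))
print(transitions)  # 2, whereas zero-delay evaluation sees Z stay at 0
```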



What Causes Glitches?
[Figure: two ways of combining inputs A, B, C, D into Z: a chain (X from A, B; Y from X, C; Z from Y, D) and a balanced tree (X from A, B; Y from C, D; Z from X, Y), with timing waveforms for A,B, C,D, X, Y, and Z]

Cause: uneven arrival times of a gate's input signals, due to unbalanced delay paths.
Solution: balancing delay paths!
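A simple way to spot unbalanced paths is to compute unit-delay arrival times and the spread (skew) between each gate's earliest and latest input. This sketch assumes the figure's chain and tree structures as described above and a delay of one unit per gate; nonzero skew marks a gate that can glitch.

```python
def arrival_times(netlist, primary_inputs):
    """Unit-delay arrival time of each signal: 1 + latest input arrival.

    netlist maps each gate output to a tuple of its input names and must be
    listed in topological order (dicts keep insertion order in Python 3.7+).
    """
    t = {name: 0 for name in primary_inputs}
    for out, ins in netlist.items():
        t[out] = 1 + max(t[i] for i in ins)
    return t


def input_skew(netlist, times):
    """For each gate, latest minus earliest input arrival time."""
    return {out: max(times[i] for i in ins) - min(times[i] for i in ins)
            for out, ins in netlist.items()}


chain = {"X": ("A", "B"), "Y": ("X", "C"), "Z": ("Y", "D")}
tree = {"X": ("A", "B"), "Y": ("C", "D"), "Z": ("X", "Y")}
inputs = ("A", "B", "C", "D")
for name, net in (("chain", chain), ("tree", tree)):
    times = arrival_times(net, inputs)
    print(name, input_skew(net, times), "Z arrives at", times["Z"])
# chain {'X': 0, 'Y': 1, 'Z': 2} Z arrives at 3
# tree {'X': 0, 'Y': 0, 'Z': 0} Z arrives at 2
```

The balanced tree removes the skew at every gate and also shortens the critical path.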
