Chap4 Lect11 Logical Effort
Logical Effort
CMPE 413
Logic Gate Delay
Chip designers need to choose:
  What is the best circuit topology for a function?
  How many stages of logic produce the least delay?
  How wide should the transistors be?
Logical Effort helps make these decisions:
  Uses a simple delay model
  Allows easy hand calculations
  Makes it easy to compare alternative designs
Express delay in process-independent terms: $d = d_{abs} / \tau$
Delay has two components: d = f + p, where
  f = effort delay (stage effort) = gh
  p = parasitic delay
Logic Gate Delay
g: logical effort
  Measures the relative ability of a gate to deliver current
  g = 1 for an inverter
h: electrical effort = $C_{out} / C_{in}$
  Ratio of output to input capacitance
  Sometimes called fanout
p: parasitic delay
  Represents the delay of a gate driving no load
  Set by internal parasitic capacitance
Again, d = f + p = gh + p
[Figure: normalized delay d of a 2-input NAND, with the parasitic delay p marked]
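The linear model d = gh + p can be tabulated directly. The sketch below is only an illustration: the function name `stage_delay` is hypothetical, and the 2-input NAND values g = 4/3 and p = 2 are taken from the logical effort and parasitic delay tables on the later slides.

```python
from fractions import Fraction


def stage_delay(g, h, p):
    """Normalized stage delay d = g*h + p."""
    return g * h + p


# Assumed values for a 2-input NAND (from the tables on later slides).
g_nand2 = Fraction(4, 3)
p_nand2 = 2

# Delay grows linearly with electrical effort h; at h = 0 only p remains.
for h in range(0, 6):
    d = stage_delay(g_nand2, h, p_nand2)
    print(f"h = {h}: d = {float(d):.2f}")
```

The slope of the resulting line is the logical effort g and its value at h = 0 is the parasitic delay p.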
Logical Effort
Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current.
  Can be measured from delay vs. fanout plots
  Or estimated by counting transistor widths
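[Figure: transistor widths of an inverter, a 2-input NAND, and a 2-input NOR with inputs A, B and output Y]
  Inverter:     Cin = 3, g = 3/3
  2-input NAND: Cin = 4, g = 4/3
  2-input NOR:  Cin = 5, g = 5/3

Gate Type        Number of Inputs
                 1     2      3         4            n
Inverter         1
NAND                   4/3    5/3       6/3          (n+2)/3
NOR                    5/3    7/3       9/3          (2n+1)/3
Tristate, Mux    2     2      2         2            2
XOR, XNOR              4,4    6,12,6    8,16,16,8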
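The general-n entries for NAND and NOR gates can be checked against the 2-, 3- and 4-input columns. This is a minimal sketch; the function names are hypothetical and only the two closed-form expressions from the table are encoded.

```python
from fractions import Fraction


def nand_logical_effort(n):
    """Logical effort of an n-input NAND: (n + 2) / 3."""
    return Fraction(n + 2, 3)


def nor_logical_effort(n):
    """Logical effort of an n-input NOR: (2n + 1) / 3."""
    return Fraction(2 * n + 1, 3)


# Fraction reduces automatically, so 6/3 and 9/3 print as 2 and 3.
for n in (2, 3, 4):
    print(f"n = {n}: NAND g = {nand_logical_effort(n)}, NOR g = {nor_logical_effort(n)}")
```

For every n the NOR value is larger than the NAND value, which matches the summary point that NANDs are faster than NORs in CMOS.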
Parasitic Delay
Count the diffusion capacitance on the output, assuming contacted diffusions.
Inverter: 3 units of diffusion capacitance, so the parasitic delay is $3RC = \tau$.
The normalized parasitic delay of the inverter is $p_{inv}$.
  $p_{inv}$ is the ratio of diffusion capacitance to gate capacitance for a particular process.
  It is taken to be close to 1 for simplicity.
More refined parasitic delay estimates can be made using the Elmore delay.
  When internal diffusion capacitances are considered, the delay grows quadratically rather than linearly as the crude method estimates.

Parasitic delay for common gates using the crude method

Gate Type        Number of Inputs
                 1     2     3     4     n
Inverter         1
NAND                   2     3     4     n
NOR                    2     3     4     n
Tristate, Mux    2     4     6     8     2n
Example: Ring Oscillator and FO4 Inverter
Estimate the frequency of an N-stage ring oscillator
  Logical effort g = 1, electrical effort h = 1, parasitic delay p = 1
  Stage delay d = gh + p = 2
  Period T = 2Nd = 4N (an edge has to propagate twice through the ring to regain its original polarity)
  Frequency $f_{osc} = 1 / (2Nd) = 1 / (4N)$
  A 31-stage ring oscillator in a 0.6 μm technology has a frequency of ~200 MHz.
Estimate the delay of a fanout-of-4 (FO4) inverter
  Logical effort g = 1, electrical effort h = 4, parasitic delay p = 1
  Stage delay d = gh + p = 5
  The FO4 delay is 200 ps in 0.6 μm, 60 ps in 180 nm, and roughly f/3 ns in an f μm process.
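The two estimates can be cross-checked against each other. The sketch below assumes the delay unit for the 0.6 μm process can be inferred from the quoted FO4 delay (200 ps divided by the normalized FO4 delay of 5); that inference and the function name are illustrative assumptions, not values stated on the slide.

```python
def ring_oscillator_frequency(n_stages, tau_seconds, g=1.0, h=1.0, p=1.0):
    """Frequency of an n-stage ring oscillator, f = 1 / (2 * N * d * tau)."""
    d = g * h + p  # normalized stage delay
    return 1.0 / (2 * n_stages * d * tau_seconds)


# Assumption: one delay unit = FO4 delay / 5, using the 200 ps FO4 figure.
FO4_DELAY_SECONDS = 200e-12
tau = FO4_DELAY_SECONDS / 5

freq = ring_oscillator_frequency(31, tau)
print(f"delay unit = {tau * 1e12:.0f} ps, f_osc = {freq / 1e6:.0f} MHz")
```

Under this assumption the 31-stage oscillator comes out at roughly 200 MHz, in line with the figure quoted above.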
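Multistage Logic Networks
Logical Effort generalizes to multistage networks
Path Logical Effort: $G = \prod g_i$
Path Electrical Effort: $H = C_{out(path)} / C_{in(path)}$
Path Effort: $F = \prod f_i = \prod g_i h_i$

[Figure: four-stage path with input capacitance 10, internal node capacitances x, y, z, and load 20]
  Stage 1: g1 = 1,   h1 = x/10
  Stage 2: g2 = 5/3, h2 = y/x
  Stage 3: g3 = 4/3, h3 = z/y
  Stage 4: g4 = 1,   h4 = 20/z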
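For the four-stage path above, the per-stage electrical efforts telescope, so their product depends only on the end capacitances. The sketch below illustrates this; the function name and the trial values chosen for x, y and z are hypothetical.

```python
from fractions import Fraction
from math import prod


def path_efforts(g, caps):
    """Return (G, H, F) for an unbranched path.

    g    -- logical effort of each stage, input to output
    caps -- [C_in, internal node capacitances..., C_load]
    """
    h = [Fraction(caps[i + 1]) / Fraction(caps[i]) for i in range(len(g))]
    G = prod(g)                                # path logical effort
    H = prod(h)                                # product of stage electrical efforts
    F = prod(gi * hi for gi, hi in zip(g, h))  # path effort
    return G, H, F


g = [Fraction(1), Fraction(5, 3), Fraction(4, 3), Fraction(1)]

# Two arbitrary (hypothetical) choices of x, y, z give the same G, H, F.
for x, y, z in ((8, 12, 16), (5, 7, 11)):
    G, H, F = path_efforts(g, [10, x, y, z, 20])
    print(f"x, y, z = {x}, {y}, {z}: G = {G}, H = {H}, F = {F}")
```

With no branching, H equals the load capacitance over the input capacitance (20/10) whatever the internal sizes are, and F = GH.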
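[Figure: an inverter with input capacitance 5 drives two inverters of input capacitance 15; each of these drives a load of 90]
G = 1
H = 90/5 = 18
GH = 18
h1 = (15 + 15) / 5 = 6
h2 = 90/15 = 6
F = g1 g2 h1 h2 = 36 = 2GH
Thus we need to introduce branching effort.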
Multistage Delay
Branching Effort: accounts for branching between stages in a path
  $b = (C_{onpath} + C_{offpath}) / C_{onpath}$
Path Branching Effort: $B = \prod b_i$
Path Effort: F = GBH
Path Effort Delay: $D_F = \sum f_i$
Path Parasitic Delay: $P = \sum p_i$
Path Delay: $D = \sum d_i = D_F + P$
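Applying the branching effort to the earlier branching example (input capacitance 5, two branches of 15, loads of 90) recovers the path effort of 36 that GH alone missed. A minimal sketch with a hypothetical helper name:

```python
from fractions import Fraction


def branching_effort(c_onpath, c_offpath):
    """b = (C_onpath + C_offpath) / C_onpath."""
    return Fraction(c_onpath + c_offpath, c_onpath)


# Stage 1 drives the on-path inverter (15) plus one off-path inverter (15).
b1 = branching_effort(15, 15)
# Stage 2 drives only its own load of 90, so there is no branching.
b2 = branching_effort(90, 0)

G = 1                 # two inverters
B = b1 * b2           # path branching effort
H = Fraction(90, 5)   # path electrical effort
F = G * B * H         # path effort

print(f"B = {B}, H = {H}, F = GBH = {F}")
```

Here B = 2, which is exactly the factor by which the product of stage efforts exceeded GH.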
Designing Fast Circuits
Delay is smallest when each stage bears the same effort. With N stages in the path:
  $\hat{f} = g_i h_i = F^{1/N}$
Thus, the minimum delay of an N-stage path is
  $D = N F^{1/N} + P$
This equation gives the fastest possible delay without calculating gate sizes.
The capacitance transformation is used to calculate gate widths:
  $\hat{f} = g h = g \, C_{out} / C_{in}$, so $C_{in_i} = g_i C_{out_i} / \hat{f}$
Working backward, apply the capacitance transformation to find the input capacitance of each gate given the load it drives.
Check the work by verifying that the input capacitance specification is met.
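Logical Effort Example: 3-stage path
[Figure: path from A (input capacitance 8) to B (load 45): a 2-input NAND drives three branches of capacitance x, a 3-input NAND drives two branches of capacitance y, and a 2-input NOR drives the load of 45]
Logical Effort G = (4/3) * (5/3) * (5/3) = 100/27
Electrical Effort H = 45/8
Branching Effort B = 3 * 2 = 6
Path Effort F = GBH = 125
Best Stage Effort $\hat{f} = \sqrt[3]{F} = 5$
Parasitic Delay P = 2 + 3 + 2 = 7
Delay D = 3 * 5 + 7 = 22 = 4.4 FO4
For best sizes, work backward:
  y = 45 * (5/3) / 5 = 15
  x = (15 * 2) * (5/3) / 5 = 10
Check the path input capacitance: (10 + 10 + 10) * (4/3) / 5 = 8
Sizes chosen to get equal rise-fall times:
  2-input NAND (Cin = 8):  P: 4,  N: 4
  3-input NAND (Cin = 10): P: 4,  N: 6
  2-input NOR (Cin = 15):  P: 12, N: 3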
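The backward sizing pass in this example can be written as a short loop. This is a sketch under the example's own numbers; the function name and the (g, b) stage encoding are hypothetical choices made for illustration.

```python
from fractions import Fraction


def size_path(stages, c_load, f_hat):
    """Work backward from the load, returning each stage's input capacitance.

    stages -- list of (g, b) per stage from input to output, where b is the
              number of identical branches that stage drives
    """
    sizes = []
    c_out = Fraction(c_load)
    for g, b in reversed(stages):
        # Capacitance transformation: C_in = g * C_out / f_hat,
        # where C_out counts every branch the stage drives.
        c_in = g * (c_out * b) / f_hat
        sizes.append(c_in)
        c_out = c_in
    return list(reversed(sizes))


stages = [
    (Fraction(4, 3), 3),  # 2-input NAND driving three branches of x
    (Fraction(5, 3), 2),  # 3-input NAND driving two branches of y
    (Fraction(5, 3), 1),  # 2-input NOR driving the load
]

c_a, x, y = size_path(stages, c_load=45, f_hat=5)
print(f"input capacitance = {c_a}, x = {x}, y = {y}")
```

The first returned value is the path input capacitance, so comparing it with the specified 8 units is the same check the slide performs by hand.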
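Best Number of Stages
Another important choice is the number of stages in a path.
The minimum number of stages does not provide the best delay in all cases.
E.g. drive a 64-bit datapath with a unit inverter:
  $D = N F^{1/N} + P = N (64)^{1/N} + N$

[Figure: initial driver of size 1 driving a datapath load of 64 through N = 1 to 4 inverter stages; stage sizes are 1 for N = 1; 1, 8 for N = 2; 1, 4, 16 for N = 3; 1, 2.8, 8, 23 for N = 4]

  N    f      D
  1    64     65
  2    8      18
  3    4      15
  4    2.8    15.3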
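The comparison for the 64-bit datapath example can be reproduced by evaluating the delay expression for several stage counts. A minimal sketch, assuming every stage is an inverter with parasitic delay 1 as in the example; the function name is hypothetical.

```python
def inverter_chain_delay(path_effort, n_stages, p_inv=1.0):
    """D = N * F^(1/N) + N * p_inv for a chain of N inverters."""
    return n_stages * path_effort ** (1.0 / n_stages) + n_stages * p_inv


F = 64
delays = {n: inverter_chain_delay(F, n) for n in range(1, 7)}

for n, d in delays.items():
    print(f"N = {n}: f = {F ** (1.0 / n):.2f}, D = {d:.1f}")

best_n = min(delays, key=delays.get)
print(f"fastest stage count: N = {best_n}")
```

The single-stage design is by far the slowest, and the delay changes little between three and four stages.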
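Best Number of Stages
Consider adding inverters at the end of the path. How many produce the best delay?

[Figure: logic block with n1 stages and path effort F, followed by N - n1 extra inverters]

  $D = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1) p_{inv}$

Setting $\partial D / \partial N = 0$ and writing the best stage effort as $\rho = F^{1/N}$ gives
  $p_{inv} + \rho (1 - \ln \rho) = 0$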
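The step from the delay expression to the stage-effort condition can be filled in as follows. This is a sketch of the differentiation, treating N as a continuous variable and using $\rho$ as the symbol for the best stage effort $F^{1/N}$; the parasitic delays of the n1 logic stages do not depend on N and drop out.

```latex
\begin{align*}
D &= N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1)\, p_{inv} \\
\frac{\partial D}{\partial N}
  &= F^{1/N} - \frac{\ln F}{N}\, F^{1/N} + p_{inv}
   = F^{1/N}\left(1 - \ln F^{1/N}\right) + p_{inv} = 0 \\
\rho = F^{1/N} \;&\Rightarrow\; p_{inv} + \rho\,(1 - \ln \rho) = 0
\end{align*}
```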
Best Stage Effort
The best stage effort $\rho$ satisfies $p_{inv} + \rho (1 - \ln \rho) = 0$, which has no closed-form solution.
  Neglecting parasitics ($p_{inv} = 0$), $\rho = 2.718$ (e)
  For $p_{inv} = 1$, solve numerically for $\rho = 3.59$
Sensitivity analysis: how sensitive is the delay to using exactly the best number of stages?

[Figure: $D(N) / D(\hat{N})$, with the points $\rho = 6$ and $\rho = 2.4$ marked]

  $2.4 < \rho < 6.0$ gives delay within 15% of optimal
  $\rho = 4$ is a convenient choice
Because of this simplification, the FO4 inverter is used as the 'representative' logic gate delay in a particular process.
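The numerical solution mentioned above can be obtained with a few lines of bisection. The sketch below is one possible way to do it, not necessarily how the quoted values were produced; the function name and the search bracket are assumptions.

```python
import math


def best_stage_effort(p_inv, lo=1.0, hi=50.0, iterations=100):
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho by bisection."""

    def residual(rho):
        return p_inv + rho * (1.0 - math.log(rho))

    # The residual is positive at rho = 1 and decreases for rho > 1,
    # so the root is bracketed for any small non-negative p_inv.
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for p_inv in (0.0, 1.0):
    print(f"p_inv = {p_inv}: best stage effort = {best_stage_effort(p_inv):.3f}")
```

Bisection is used here because the residual is monotonic on the bracket, so no derivative or starting guess is needed.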
Larger Example: Register File Decoder
Decoder specifications
  16-word register file
  Each word is 32 bits wide
  Each bit presents a load of 3 unit-sized transistors on the word line
  True and complementary versions of address bits A[3:0] are available
  Each address input can drive 10 unit-sized transistors
How many stages to use?
How large should each gate be?

[Figure: a 4:16 decoder drives the 16 word lines of the 16-word register file]
Larger Example: Register File Decoder
Decoder effort is mainly electrical and branching
  Electrical Effort H = (32 * 3) / 10 = 9.6
  Branching Effort B = 8 (each true or complementary address line feeds 8 of the 16 word-line gates)
If we neglect logical effort G (assume G = 1)
  Path Effort F = GBH = 76.8
  Number of stages N = $\log_4 F$ = 3.1
Three-stage design:
  Logical Effort G = 1 * 6/3 * 1 = 2
  Path Effort F = GBH = 154
  Stage Effort $\hat{f} = F^{1/3} = 5.36$
  Path Delay $D = 3\hat{f} + 1 + 4 + 1 = 22.1$
  Gate sizes: z = 96 * 1 / 5.36 = 18, y = 18 * 2 / 5.36 = 6.7

[Figure: three-stage decoder with true and complementary address inputs A[3:0], each presenting a capacitance of 10, driving word[15]]
Larger Example: Register File Decoder
Compare alternatives with a spreadsheet

Design                                   N    G       P    D
NAND4 - INV                              2    2       5    29.8
NAND2 - NOR2                             2    20/9    4    30.1
INV - NAND4 - INV                        3    2       6    22.1
NAND4 - INV - INV - INV                  4    2       7    21.1
NAND2 - NOR2 - INV - INV                 4    20/9    6    20.5
NAND2 - INV - NAND2 - INV                4    16/9    6    19.7
INV - NAND2 - INV - NAND2 - INV          5    16/9    7    20.4
NAND2 - INV - NAND2 - INV - INV - INV    6    16/9    8    21.6
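The spreadsheet comparison can be reproduced with a short script. The sketch below assumes the decoder's B = 8 and H = 9.6 from the previous slide and evaluates $D = N (GBH)^{1/N} + P$ for each row; the variable and function names are hypothetical.

```python
from fractions import Fraction

# Assumed from the decoder specification: branching effort 8, electrical effort 9.6.
BH = 8 * 9.6

# (design, N, G, P) copied from the comparison table.
designs = [
    ("NAND4 - INV", 2, Fraction(2), 5),
    ("NAND2 - NOR2", 2, Fraction(20, 9), 4),
    ("INV - NAND4 - INV", 3, Fraction(2), 6),
    ("NAND4 - INV - INV - INV", 4, Fraction(2), 7),
    ("NAND2 - NOR2 - INV - INV", 4, Fraction(20, 9), 6),
    ("NAND2 - INV - NAND2 - INV", 4, Fraction(16, 9), 6),
    ("INV - NAND2 - INV - NAND2 - INV", 5, Fraction(16, 9), 7),
    ("NAND2 - INV - NAND2 - INV - INV - INV", 6, Fraction(16, 9), 8),
]


def path_delay(n_stages, G, P, bh=BH):
    """Minimum path delay D = N * (G*B*H)^(1/N) + P."""
    F = float(G) * bh
    return n_stages * F ** (1.0 / n_stages) + P


results = [(name, path_delay(n, G, P)) for name, n, G, P in designs]
for name, d in results:
    print(f"{name:40s} D = {d:.1f}")

fastest = min(results, key=lambda item: item[1])[0]
print(f"fastest design: {fastest}")
```

Keeping the designs as data rows makes it easy to add another candidate topology and see where it lands.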
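Logical Effort: Recap of Definitions

Term                 Stage                                          Path
number of stages     1                                              N
logical effort       g                                              $G = \prod g_i$
electrical effort    $h = C_{out} / C_{in}$                         $H = C_{out(path)} / C_{in(path)}$
branching effort     $b = (C_{onpath} + C_{offpath}) / C_{onpath}$  $B = \prod b_i$
effort               f = gh                                         F = GBH
effort delay         f                                              $D_F = \sum f_i$
parasitic delay      p                                              $P = \sum p_i$
delay                d = f + p                                      $D = \sum d_i = D_F + P$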
Logical Effort: Method Recap
  Compute path effort F = GBH
  Estimate best number of stages $N = \log_4 F$
  Sketch path with N stages
  Estimate least delay $D = N F^{1/N} + P$
  Determine best stage effort $\hat{f} = F^{1/N}$
  Find gate sizes using the capacitance transformation $C_{in_i} = g_i C_{out_i} / \hat{f}$
Logical Effort Summary
  Provides a mechanism for designing and discussing fast circuits
  NANDs are faster than NORs in CMOS
  Paths are fastest when effort delay is ~4
  Path delay is weakly sensitive to stages, sizes
  Using fewer stages doesn't mean a faster circuit
  Inverters and NAND2 are best for driving large loads (caps)
  BUT REQUIRES PRACTICE TO MASTER !!!
Limitations of Logical Effort
Chicken and egg problem
  Need path to compute G
  But don't know number of stages without G
Simplistic delay model
  Neglects input rise time effects and input arrival times
  Gate-source capacitance approximation
  Bootstrapping due to gate-to-drain capacitance coupling
  Ignores secondary effects: velocity saturation, body effect, etc.
Does not account for interconnect
  More applicable to datapath circuits with regular layout structure, e.g. adders, multipliers, etc.
  Iterations required in designs with significant interconnect delay
Design for maximum speed only; no information about minimum area/power
Paths with complex branching are difficult to analyze by hand