Delay: Introduction To CMOS VLSI Design

The document discusses delay definitions and models for CMOS VLSI circuits. It defines propagation delay, rise/fall times, and contamination delay. It presents simulated inverter delay waveforms and introduces RC delay models using effective resistances. RC values for capacitance and resistance are provided. Equivalent RC circuits are shown for an inverter and 3-input NAND gate. Elmore delay is used to estimate delays. Contamination delay is defined as the best-case delay.

Uploaded by

Huy Tran
Copyright
© © All Rights Reserved
Introduction to CMOS VLSI Design

Chapter 4
Delay
Delay Definitions
 tpdr: rising propagation delay
– From input to rising output crossing VDD/2
 tpdf: falling propagation delay
– From input to falling output crossing VDD/2
 tpd: average propagation delay
– tpd = (tpdr + tpdf)/2
 tr: rise time
– From output crossing 0.2 VDD to 0.8 VDD
 tf: fall time
– From output crossing 0.8 VDD to 0.2 VDD

Chapter 4 CMOS VLSI Design 2


Delay Definitions
 tcdr: rising contamination delay
– From input to rising output crossing VDD/2
 tcdf: falling contamination delay
– From input to falling output crossing VDD/2
 tcd: contamination delay
– tcd = (tcdr + tcdf)/2 ??
– tcd = min(tcdr , tcdf)



Simulated Inverter Delay
 Solving differential equations by hand is too hard
 SPICE simulator solves the equations numerically
– Uses more accurate I-V models too!
 But simulations take time to write, may hide insight
[Figure: simulated inverter transient response, Vin and Vout in volts (0.0 to 2.0) versus t from 0 to 1 ns; tpdf = 66 ps, tpdr = 83 ps]


Delay Estimation
 We would like to be able to easily estimate delay
– Not as accurate as simulation
– But easier to ask “What if?”
 The step response usually looks like a 1st order RC
response with a decaying exponential.
 Use RC delay models to estimate delay
– C = total capacitance on output node
– Use effective resistance R
– So that tpd = RC
 Characterize transistors by finding their effective R
– Depends on average current as gate switches



Effective Resistance
 Shockley models have limited value
– Not accurate enough for modern transistors
– Too complicated for much hand analysis
 Simplification: treat transistor as resistor
– Replace Ids(Vds, Vgs) with effective resistance R
• Ids = Vds/R
– R averaged across switching of digital gate
 Too inaccurate to predict current at any given time
– But good enough to predict RC delay



RC Delay Model
 Use equivalent circuits for MOS transistors
– Ideal switch + capacitance and ON resistance
– Unit nMOS has resistance R, capacitance C
– Unit pMOS has resistance 2R (lower carrier
mobility), capacitance C
 Capacitance proportional to width (k)
 Resistance inversely proportional to width
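[Figure: equivalent circuits for transistors of width k. nMOS: resistance R/k between source and drain, capacitance kC on gate, source, and drain. pMOS: resistance 2R/k between source and drain, capacitance kC on gate, source, and drain.]
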
RC Values
 Capacitance
– C = Cg = Cs = Cd = 2 fF/µm of gate width in a 0.6 µm process
– Gradually declines to 1 fF/µm in nanometer technologies (1 fF = 1.0E-15 F)
 Resistance
– R ≈ 6 kΩ·µm in a 0.6 µm process
– Improves with shorter channel lengths
 Unit transistors
– May refer to minimum contacted device (4/2 λ)
– Or maybe 1 µm wide device
– Doesn’t matter as long as you are consistent
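
As a rough illustration of how these per-width values scale, the sketch below computes R and C for an nMOS of a given width. The constants are the 0.6 µm figures quoted on this slide; the function name `nmos_rc` and the choice of a 1 µm unit device are assumptions made only for this example.

```python
# Per-width values quoted on the slide for a 0.6 um process (assumed here).
R_PER_UM = 6e3      # ohm*um: effective nMOS resistance times width
C_PER_UM = 2e-15    # F/um: gate or diffusion capacitance per um of width

def nmos_rc(width_um):
    """Return (R, C) for an nMOS of the given width in um (hypothetical helper)."""
    r = R_PER_UM / width_um      # resistance is inversely proportional to width
    c = C_PER_UM * width_um      # capacitance is proportional to width
    return r, c

r, c = nmos_rc(1.0)
rc_ps = r * c * 1e12             # RC product in picoseconds

r4, c4 = nmos_rc(4.0)
rc4_ps = r4 * c4 * 1e12          # widening the device leaves RC unchanged
```

Because R falls and C rises by the same factor, the RC product does not depend on the width chosen for the unit transistor, which is why the slide says the choice does not matter as long as it is used consistently.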
Equivalent RC circuits



Inverter Delay Estimate
 Estimate the delay of a fanout-of-1 inverter
 A=1, consider output capacitance
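[Figure: fanout-of-1 inverter (pMOS width 2, nMOS width 1) and its equivalent circuit, with 2C on each pMOS terminal, C on each nMOS terminal, and pulldown resistance R]
d = 6RC (3C of diffusion capacitance plus 3C of load gate capacitance on the output, driven through R)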
Delay Model Comparison
(Example 4.1, p.145)



Example: 3-input NAND
 Sketch a 3-input NAND with transistor widths chosen to
achieve effective rise and fall resistances equal to a unit
inverter (R).

[Figure: 3-input NAND with three parallel pMOS transistors of width 2 and three series nMOS transistors of width 3]


3-input NAND Caps
 Annotate the 3-input NAND gate with gate and diffusion
capacitance.

[Figure: 3-input NAND with each pMOS (width 2) annotated with 2C on gate, source, and drain, and each nMOS (width 3) annotated with 3C on gate, source, and drain]


3-input NAND Caps
 Annotate the 3-input NAND gate with gate and diffusion
capacitance.

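[Figure: 3-input NAND with lumped capacitances: 9C on the output, 5C on each input, and 3C on each internal node of the nMOS stack]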
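
The lumped values in the figure can be tallied from the transistor widths. The following is a minimal sketch in units of C; the variable names are assumptions, and the internal-node value simply follows the slide, which lumps one 3C diffusion per node.

```python
# Widths from the NAND3 sketch: parallel pMOS of width 2, series nMOS of width 3.
PMOS_WIDTH, NMOS_WIDTH, N_INPUTS = 2, 3, 3

# Each input drives one pMOS gate and one nMOS gate.
input_cap = PMOS_WIDTH + NMOS_WIDTH                 # in units of C

# The output node touches all three pMOS drains and the top nMOS drain.
output_cap = N_INPUTS * PMOS_WIDTH + NMOS_WIDTH     # in units of C

# The slide lumps a single width-3 diffusion on each internal stack node.
internal_cap = NMOS_WIDTH                           # in units of C
```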



Elmore Delay
 ON transistors look like resistors (output capacitor)
 Pullup or pulldown network modeled as RC ladder
 Elmore delay of RC ladder
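$$t_{pd} \approx \sum_{\text{nodes } i} R_{i\text{-to-source}} C_i = R_1 C_1 + (R_1 + R_2) C_2 + \dots + (R_1 + R_2 + \dots + R_N) C_N$$

[Figure: RC ladder with series resistors R1, R2, R3, ..., RN and node capacitors C1, C2, C3, ..., CN]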
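
The Elmore sum for an RC ladder is short enough to write out directly. This is a sketch only; the function name `elmore_delay` and the list-based interface are assumptions for illustration.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder.

    resistances[i] is the resistor between the previous node (or the source)
    and node i; capacitances[i] is the capacitor on node i.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitor per resistor")
    delay = 0.0
    r_to_source = 0.0
    for r, c in zip(resistances, capacitances):
        r_to_source += r              # total resistance from the source to this node
        delay += r_to_source * c      # each capacitor sees all upstream resistance
    return delay

# Two-node ladder: R1*C1 + (R1 + R2)*C2 = 1*3 + (1 + 2)*4
two_stage = elmore_delay([1.0, 2.0], [3.0, 4.0])
```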



Example: 3-input NAND
 Estimate worst-case rising and falling delay of 3-input NAND
driving h identical gates.
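[Figure: 3-input NAND (pMOS width 2, nMOS width 3, inputs A, B, C) driving h copies of itself; output Y carries 9C plus a 5hC load, and internal nodes n1 and n2 carry 3C each]

Rising output (charging the load capacitance), A=1, B=1, C=0:
$$t_{pdr} = (9 + 5h)RC$$

Falling output (discharging the load capacitance), A=1, B=1, C=1:
$$t_{pdf} = (3C)\left(\frac{R}{3}\right) + (3C)\left(\frac{R}{3} + \frac{R}{3}\right) + ((9 + 5h)C)\left(\frac{R}{3} + \frac{R}{3} + \frac{R}{3}\right) = (12 + 5h)RC$$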


Delay Components
 Delay has two parts
– Parasitic delay
• 9 or 12 RC
• Independent of load
– Effort delay
• 5h RC
• Proportional to load capacitance
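
The split into parasitic and effort delay can be checked by evaluating the NAND3 expressions with and without load. This is a sketch in units of RC; `nand3_delays` is a hypothetical helper, and it evaluates the Elmore sum term by term (R/3 per series nMOS, R for a width-2 pMOS).

```python
from fractions import Fraction

def nand3_delays(h):
    """Worst-case NAND3 delays in units of RC when driving h identical gates."""
    r_n = Fraction(1, 3)              # each series nMOS of width 3 has resistance R/3
    c_out = 9 + 5 * h                 # 9C of diffusion plus 5C per driven gate
    # Falling output: Elmore delay through the three-transistor pulldown stack.
    t_pdf = 3 * r_n + 3 * (2 * r_n) + c_out * (3 * r_n)
    # Rising output: a single pMOS of width 2 (resistance R) charges the output.
    t_pdr = c_out * 1
    return t_pdr, t_pdf

parasitic = nand3_delays(0)                        # delay with no load attached
t_pdr_1, t_pdf_1 = nand3_delays(1)
effort_per_gate = t_pdf_1 - parasitic[1]           # added delay per unit of fanout
```

With h = 0 the terms of the falling-delay sum are 1 + 2 + 9, so the unloaded delays are 9 RC (rising) and 12 RC (falling), and each additional driven gate adds 5 RC.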



Contamination Delay
 Best-case (contamination) delay (minimum delay) can be
substantially less than propagation delay.
 Ex: If all three inputs fall simultaneously (rising output)

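[Figure: 3-input NAND driving h copies; output Y carries 9C plus a 5hC load, and internal nodes n1 and n2 carry 3C each]

A=0, B=0, C=0

$$t_{cdr} = (9 + 5h)C\left(\frac{R}{3}\right) = \left(3 + \frac{5}{3}h\right)RC$$
rising output (charging the load capacitance)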
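
To compare the best-case and worst-case rising delays, the sketch below varies how many parallel pMOS transistors conduct. The helper name and the example fanout h = 4 are assumptions for illustration; delays are in units of RC.

```python
from fractions import Fraction

def nand3_rising_delay(h, pmos_on):
    """Rising delay in units of RC with `pmos_on` parallel pMOS conducting."""
    r_pullup = Fraction(1, pmos_on)   # each width-2 pMOS has resistance R
    return (9 + 5 * h) * r_pullup

h = 4                                          # example fanout (an assumption)
t_pdr = nand3_rising_delay(h, pmos_on=1)       # worst case: one pMOS conducts
t_cdr = nand3_rising_delay(h, pmos_on=3)       # best case: all three conduct
```

With all three inputs falling together the pullup resistance drops to R/3, so the contamination delay is one third of the worst-case rising delay.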



Diffusion Capacitance
 We assumed contacted diffusion on every s / d.
 Good layout minimizes diffusion area
 Ex: NAND3 layout shares one diffusion contact
– Reduces output capacitance by 2C
– Merged uncontacted diffusion might help too
[Figure: NAND3 layout labeling shared contacted diffusion, isolated contacted diffusion, and merged uncontacted diffusion; the output capacitance becomes 7C and the internal nodes carry 3C]


Layout Comparison
 Which layout is better?

[Figure: two alternative layouts of the same gate with inputs A and B and output Y between VDD and GND]


Logical Effort Review
 Logical Effort
 Delay in a Logic Gate
 Multistage Logic Networks
 Choosing the Best Number of Stages
 Example
 Summary



Introduction
 Chip designers face a bewildering array of choices
– What is the best circuit topology for a function?
– How many stages of logic give least delay?
– How wide should the transistors be?

 Logical effort is a method to make these decisions


– Uses a simple model of delay
– Allows back-of-the-envelope calculations
– Helps make rapid comparisons between alternatives
– Emphasizes remarkable symmetries



Example
 A memory designer for an embedded automotive processor needs help designing the decoder for a register file.
[Figure: 4:16 decoder driven by A[3:0] and its complement, selecting one of 16 words in a register file that is 32 bits wide]
 Decoder specifications:
– 16 word register file
– Each word is 32 bits wide
– Each bit presents load of 3 unit-sized transistors
– True and complementary address inputs A[3:0]
– Each input may drive 10 unit-sized transistors
 The designer needs to decide:
– How many stages to use?
– How large should each gate be?
– How fast can decoder operate?

Delay in a Logic Gate
 Express delays in process-independent unit: $d = d_{abs} / \tau$
– $\tau = 3RC$
– $\tau \approx$ 3 ps in 65 nm process, 60 ps in 0.6 µm process
 Delay has two components: d = f + p
 f: effort delay = gh (a.k.a. stage effort)
– Again has two components
 g: logical effort
– Measures relative ability of gate to deliver current
– g ≡ 1 for inverter
 h: electrical effort = Cout / Cin
– Ratio of output to input capacitance
– Sometimes called fanout
 p: parasitic delay (normally ~1)
– Represents delay of gate driving no load
– Set by internal parasitic capacitance
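
The delay model on this slide is a one-line formula. The sketch below evaluates it for an inverter driving four copies of itself and converts the result to picoseconds using the 65 nm value of τ quoted above; both function names are assumptions made for this example.

```python
def stage_delay(g, h, p):
    """Normalized stage delay d = gh + p, in units of tau."""
    return g * h + p

def absolute_delay_ps(d, tau_ps):
    """Convert a normalized delay to picoseconds for a given tau."""
    return d * tau_ps

d_inv = stage_delay(g=1, h=4, p=1)             # inverter with fanout of 4
d_abs = absolute_delay_ps(d_inv, tau_ps=3.0)   # tau of about 3 ps in 65 nm
```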



Delay Plots
d = f + p = gh + p
[Figure: normalized delay d (0 to 6) versus electrical effort h = Cout / Cin (0 to 5), with the effort delay f and parasitic delay p marked. Inverter: g = 1, p = 1, d = h + 1. 2-input NAND: g = 4/3, p = 2, d = (4/3)h + 2.]
 What about NOR2?


Computing Logical Effort
 DEF: Logical effort (g) is the ratio of the input
capacitance of a gate to the input capacitance of an
inverter delivering the same output current.
 Measure from delay vs. fanout plots
 Or estimate by counting transistor widths
[Figure: transistor widths for three gates. Inverter (pMOS 2, nMOS 1): Cin = 3, g = 3/3. NAND2 (pMOS 2 and 2, nMOS 2 and 2): Cin = 4, g = 4/3. NOR2 (pMOS 4 and 4, nMOS 1 and 1): Cin = 5, g = 5/3.]


Catalog of Gates
 Logical effort of common gates

Number of inputs:
Gate type         1     2       3           4               n
Inverter          1     -       -           -               -
NAND              -     4/3     5/3         6/3             (n+2)/3
NOR               -     5/3     7/3         9/3             (2n+1)/3
Tristate / mux    2     2       2           2               2
XOR, XNOR         -     4, 4    6, 12, 6    8, 16, 16, 8    -


Catalog of Gates
 Parasitic delay of common gates
– In multiples of pinv (≈ 1)
Number of inputs:
Gate type         1     2     3     4     n
Inverter          1     -     -     -     -
NAND              -     2     3     4     n
NOR               -     2     3     4     n
Tristate / mux    2     4     6     8     2n
XOR, XNOR         -     4     6     8     -


Example: Ring Oscillator
 Estimate the frequency of an N-stage ring oscillator

Logical Effort: g = 1
Electrical Effort: h = 1
Parasitic Delay: p = 1
Stage Delay: d = 2
Frequency: fosc = 1/(2*N*d) = 1/4N
 A 31-stage ring oscillator in a 0.6 µm process has a frequency of ~200 MHz
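
The frequency estimate can be written as a small function in normalized units (multiples of 1/τ). This is a sketch; the function name is an assumption, and the odd-stage check reflects the fact that an inverter ring only oscillates with an odd number of stages.

```python
def ring_oscillator_frequency(n_stages, g=1.0, h=1.0, p=1.0):
    """Oscillation frequency in units of 1/tau for an N-stage inverter ring."""
    if n_stages % 2 == 0:
        raise ValueError("an inverter ring needs an odd number of stages")
    d = g * h + p                       # stage delay; d = 2 when g = h = p = 1
    return 1.0 / (2 * n_stages * d)     # one period is two trips around the ring

f_31 = ring_oscillator_frequency(31)    # 1 / (4 * 31) in units of 1/tau
```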



Example: FO4 Inverter
 Estimate the delay of a fanout-of-4 (FO4) inverter
Logical Effort: g = 1
Electrical Effort: h = 4
Parasitic Delay: p = 1
Stage Delay: d = 5
 The FO4 delay is about 300 ps in a 0.6 µm process and 15 ps in a 65 nm process


Multistage Logic Networks
 Logical effort generalizes to multistage networks
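 Path Logical Effort: $G = \prod g_i$
 Path Electrical Effort: $H = C_{out\text{-}path} / C_{in\text{-}path}$
 Path Effort: $F = \prod f_i = \prod g_i h_i$

[Figure: four-stage path with input capacitance 10, intermediate gate input capacitances x, y, and z, and load capacitance 20]
g1 = 1 g2 = 5/3 g3 = 4/3 g4 = 1
h1 = x/10 h2 = y/x h3 = z/y h4 = 20/z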


Multistage Logic Networks
 Logical effort generalizes to multistage networks
 Path Logical Effort: $G = \prod g_i$
 Path Electrical Effort: $H = C_{out\text{-}path} / C_{in\text{-}path}$
 Path Effort: $F = \prod f_i = \prod g_i h_i$

 Can we write F = GH?


Paths that Branch
 No! Consider paths that branch:
[Figure: an inverter with input capacitance 5 drives two inverters with input capacitance 15 each; each of those drives a load of 90]
G = 1
H = 90 / 5 = 18
GH = 18
h1 = (15 + 15) / 5 = 6
h2 = 90 / 15 = 6
F = g1g2h1h2 = 36 = 2GH


Branching Effort
 Introduce branching effort
– Accounts for branching between stages in path
$$b = \frac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}}$$
$$B = \prod b_i$$
Note: $\prod h_i = BH$

 Now we compute the path effort
– F = GBH
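
The branching example from the previous slide can be replayed numerically to confirm that F = GBH. This is a sketch using exact fractions; `branching_effort` is a hypothetical helper name.

```python
from fractions import Fraction

def branching_effort(c_on_path, c_off_path):
    """b = (C_on-path + C_off-path) / C_on-path for one stage."""
    return Fraction(c_on_path + c_off_path, c_on_path)

# Two-inverter path: input capacitance 5, two branches of 15, load of 90.
G = 1                                   # both stages are inverters
H = Fraction(90, 5)                     # path electrical effort
B = branching_effort(15, 15)            # one on-path and one off-path branch
F = G * B * H

h1 = Fraction(15 + 15, 5)               # stage 1 drives both branches
h2 = Fraction(90, 15)                   # stage 2 drives the load
```

The product of the stage electrical efforts, h1 * h2, equals BH rather than H, which is why the branching term is needed.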



Multistage Delays
 Path Effort Delay: $D_F = \sum f_i$

 Path Parasitic Delay: $P = \sum p_i$

 Path Delay: $D = \sum d_i = D_F + P$


Designing Fast Circuits
$$D = \sum d_i = D_F + P$$
 Delay is smallest when each stage bears same effort

$$\hat{f} = g_i h_i = F^{1/N}$$

 Thus minimum delay of N stage path is

$$D = N F^{1/N} + P$$

 This is a key result of logical effort


– Find fastest possible delay
– Doesn’t require calculating gate sizes



Gate Sizes
 How wide should the gates be for least delay?

$$\hat{f} = gh = g \frac{C_{out}}{C_{in}} \quad\Rightarrow\quad C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$$

 Working backward, apply capacitance
transformation to find input capacitance of each gate
given load it drives.
 Check work by verifying input cap spec is met.



Example: 3-stage path
 Select gate sizes x and y for least delay from A to B

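[Figure: three-stage path from input A (input capacitance 8) to output B; the first stage drives gates of input capacitance x, which drive gates of input capacitance y, each loaded by 45]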


Example: 3-stage path
Logical Effort: G = (4/3)*(5/3)*(5/3) = 100/27
Electrical Effort: H = 45/8
Branching Effort: B = 3*2 = 6
Path Effort: F = GBH = 125
Best Stage Effort: $\hat{f} = \sqrt[3]{F} = 5$
Parasitic Delay: P = 2 + 3 + 2 = 7
Delay: D = 3*5 + 7 = 22 = 4.4 FO4


Example: 3-stage path
 Work backward for sizes
y = 45 * (5/3) / 5 = 15
x = (15*2) * (5/3) / 5 = 10

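[Figure: the sized path from A to B. Stage 1 (input capacitance 8): P: 4, N: 4. Stage 2 (x = 10): P: 4, N: 6. Stage 3 (y = 15): P: 12, N: 3. Each output drives a load of 45.]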
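
The whole 3-stage example can be replayed in a few lines, including the final check that the computed sizes reproduce the input capacitance of 8. This is a sketch; the list layout and variable names are assumptions, and the numbers are the ones given on the slides.

```python
g = [4 / 3, 5 / 3, 5 / 3]        # logical effort of each stage, from the slide
b = [3, 2]                        # branching after stage 1 and after stage 2
p = [2, 3, 2]                     # parasitic delay of each stage
c_in, c_load = 8.0, 45.0

G = g[0] * g[1] * g[2]            # path logical effort
B = b[0] * b[1]                   # path branching effort
H = c_load / c_in                 # path electrical effort
F = G * B * H                     # path effort
f_hat = F ** (1 / 3)              # best stage effort for three stages
D = 3 * f_hat + sum(p)            # minimum path delay in units of tau

# Work backward from the load: C_in = g * C_out / f_hat.
y = g[2] * c_load / f_hat
x = g[1] * (b[1] * y) / f_hat
c_in_check = g[0] * (b[0] * x) / f_hat   # should match the input spec of 8
```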



Best Number of Stages
 How many stages should a path use?
– Minimizing number of stages is not always fastest
 Example: drive 64-bit datapath with unit inverter
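[Figure: inverter chains from an initial driver of size 1 to a datapath load of 64, using N = 1 to 4 stages; intermediate inverter sizes are 8 for N = 2; 4 and 16 for N = 3; 2.8, 8, and 23 for N = 4]

$$D = N F^{1/N} + P = N(64)^{1/N} + N$$

N:    1     2     3     4
f:    64    8     4     2.8
D:    65    18    15    15.3
Fastest: N = 3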
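
The table can be regenerated by sweeping N. This is a sketch assuming a chain of inverters with pinv = 1, as in the example; `path_delay` is a hypothetical helper name.

```python
def path_delay(n_stages, path_effort, p_inv=1.0):
    """D = N * F**(1/N) + N * p_inv for a chain of N inverters."""
    return n_stages * path_effort ** (1.0 / n_stages) + n_stages * p_inv

# Sweep the number of stages for a path effort of 64.
delays = {n: path_delay(n, 64) for n in range(1, 7)}
best_n = min(delays, key=delays.get)
```

Extending the sweep past N = 4 shows the delay rising again, so the minimum at N = 3 is not an artifact of stopping the table early.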



Derivation
 Consider adding inverters to end of path
– How many give least delay?
[Figure: logic block with n1 stages and path effort F, followed by N - n1 extra inverters]

$$D = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1) p_{inv}$$
$$\frac{\partial D}{\partial N} = -F^{1/N} \ln F^{1/N} + F^{1/N} + p_{inv} = 0$$
 Define best stage effort $\rho = F^{1/N}$

$$p_{inv} + \rho (1 - \ln \rho) = 0$$


Best Stage Effort
 $p_{inv} + \rho (1 - \ln \rho) = 0$ has no closed-form solution

 Neglecting parasitics (pinv = 0), we find ρ = 2.718 (e)

 For pinv = 1, solve numerically for ρ = 3.59
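
One way to solve the equation numerically is bisection, sketched below. The bracket [e, 20] and the tolerance are assumptions chosen for this example; the residual equals pinv at ρ = e and decreases for ρ > 1, so a sign change lies inside the bracket for the small pinv values considered here.

```python
import math

def best_stage_effort(p_inv, lo=math.e, hi=20.0, tol=1e-9):
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho by bisection."""
    def residual(rho):
        return p_inv + rho * (1.0 - math.log(rho))

    # residual(lo) >= 0 and residual(hi) < 0, with residual decreasing between.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho_no_parasitics = best_stage_effort(0.0)
rho_unit_parasitic = best_stage_effort(1.0)
```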



Sensitivity Analysis
 How sensitive is delay to using exactly the best number of stages?
[Figure: $D(N)/D(\hat{N})$ (0.0 to 1.6) versus $N/\hat{N}$ (0.5 to 2.0), with marked values 1.15, 1.26, and 1.51 and the points ρ = 6 and ρ = 2.4]

 2.4 < ρ < 6 gives delay within 15% of optimal
– We can be sloppy!
– Harris uses ρ = 4, exact ρ is process dependent


Optimal Power?
 Switching vs. Crowbar



Example, Revisited
 A memory designer for an embedded automotive processor needs help designing the decoder for a register file.
[Figure: 4:16 decoder driven by A[3:0] and its complement, selecting one of 16 words in a register file that is 32 bits wide]
 Decoder specifications:
– 16 word register file
– Each word is 32 bits wide
– Each bit presents load of 3 unit-sized transistors
– True and complementary address inputs A[3:0]
– Each input may drive 10 unit-sized transistors
 The designer needs to decide:
– How many stages to use?
– How large should each gate be?
– How fast can decoder operate?

Number of Stages
 Decoder effort is mainly electrical and branching
Electrical Effort: H = (32*3) / 10 = 9.6
Branching Effort: B=8

 If we neglect logical effort (assume G = 1)


Path Effort: F = GBH = 76.8

Number of Stages: N = log4F = 3.1

 Try a 3-stage design (but G ≠ 1 and ρ ≠ 4)



Gate Sizes & Delay
Logical Effort: G = 1 * 6/3 * 1 = 2
Path Effort: F = GBH = 154
Stage Effort: $\hat{f} = F^{1/3} = 5.36$
Path Delay: $D = 3\hat{f} + 1 + 4 + 1 = 22.1$
Gate sizes: z = 96*1/5.36 = 18, y = 18*2/5.36 = 6.7
[Figure: 3-stage decoder. True and complementary address inputs A[3] through A[0], each with input capacitance 10, drive gates of size y and then gates of size z to produce word[0] through word[15]; each word line presents 96 units of wordline capacitance]


Comparison
 Compare many alternatives with a spreadsheet
 $D = N(76.8\,G)^{1/N} + P$
Design                         N    G       P    D
NOR4                           1    3       4    234
NAND4-INV                      2    2       5    29.8
NAND2-NOR2                     2    20/9    4    30.1
INV-NAND4-INV                  3    2       6    22.1
NAND4-INV-INV-INV              4    2       7    21.1
NAND2-NOR2-INV-INV             4    20/9    6    20.5
NAND2-INV-NAND2-INV            4    16/9    6    19.7
INV-NAND2-INV-NAND2-INV        5    16/9    7    20.4
NAND2-INV-NAND2-INV-INV-INV    6    16/9    8    21.6
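
In place of a spreadsheet, the same comparison can be sketched in a few lines. The N, G, and P values are copied from the table; the names `DESIGNS` and `min_path_delay` are assumptions for this example.

```python
# (design, number of stages N, path logical effort G, path parasitic delay P)
DESIGNS = [
    ("NOR4", 1, 3.0, 4),
    ("NAND4-INV", 2, 2.0, 5),
    ("NAND2-NOR2", 2, 20 / 9, 4),
    ("INV-NAND4-INV", 3, 2.0, 6),
    ("NAND4-INV-INV-INV", 4, 2.0, 7),
    ("NAND2-NOR2-INV-INV", 4, 20 / 9, 6),
    ("NAND2-INV-NAND2-INV", 4, 16 / 9, 6),
    ("INV-NAND2-INV-NAND2-INV", 5, 16 / 9, 7),
    ("NAND2-INV-NAND2-INV-INV-INV", 6, 16 / 9, 8),
]

BH = 8 * 9.6    # branching effort times electrical effort for the decoder

def min_path_delay(n, g, p, bh=BH):
    """D = N * (B*H*G)**(1/N) + P, in units of tau."""
    return n * (bh * g) ** (1.0 / n) + p

table = {name: min_path_delay(n, g, p) for name, n, g, p in DESIGNS}
fastest = min(table, key=table.get)
```

Sorting `table` by value reproduces the ranking in the table, with the four-stage NAND2-INV-NAND2-INV design giving the smallest estimated delay.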



Review of Definitions
Term                 Stage                                                                Path
number of stages     1                                                                    N
logical effort       g                                                                    $G = \prod g_i$
electrical effort    $h = C_{out} / C_{in}$                                               $H = C_{out\text{-}path} / C_{in\text{-}path}$
branching effort     $b = (C_{on\text{-}path} + C_{off\text{-}path}) / C_{on\text{-}path}$    $B = \prod b_i$
effort               $f = gh$                                                             $F = GBH$
effort delay         f                                                                    $D_F = \sum f_i$
parasitic delay      p                                                                    $P = \sum p_i$
delay                $d = f + p$                                                          $D = \sum d_i = D_F + P$


Method of Logical Effort
1) Compute path effort: $F = GBH$
2) Estimate best number of stages: $N = \log_4 F$
3) Sketch path with N stages
4) Estimate least delay: $D = N F^{1/N} + P$
5) Determine best stage effort: $\hat{f} = F^{1/N}$
6) Find gate sizes: $C_{in_i} = g_i C_{out_i} / \hat{f}$


Limits of Logical Effort
 Chicken and egg problem
– Need path to compute G
– But don’t know number of stages without G
 Simplistic delay model
– Neglects input rise time effects
 Interconnect
– Iteration required in designs with wire
 Maximum speed only
– Not minimum area/power for constrained delay



Summary
 Logical effort is useful for thinking of delay in circuits
– Numeric logical effort characterizes gates
– NANDs are faster than NORs in CMOS
– Paths are fastest when effort delays are ~4
– Path delay is weakly sensitive to stages, sizes
– But using fewer stages doesn’t mean faster paths
– Delay of path is about log4F FO4 inverter delays
– Inverters and NAND2 best for driving large caps
 Provides language for discussing fast circuits
– But requires practice to master
