Adders: Cmos Vlsi Design Cmos Vlsi Design

This document discusses various types of adders used in digital circuits. It begins with an outline of adder topics including carry-ripple adders, carry-skip adders, carry-lookahead adders, and others. It then provides motivation for studying adders, as arithmetic units are core components of data paths in microprocessors, signal processors, and other digital circuits. The document reviews principles of binary number systems and computer arithmetic relevant to adder design.

Uploaded by

Sudhan Krish

Lecture 17:

Adders

CMOS VLSI Design 4th Ed.


Outline
 Datapath
 Computer Arithmetic Principles
 Single-bit Addition
 Carry-Ripple Adder
 Carry-Skip Adder
 Carry-Lookahead Adder
 Carry-Select Adder
 Carry-Increment Adder
 Tree Adder



A Generic Digital Processor



Building Blocks for Digital Architectures

 Arithmetic unit
– Bit sliced data path – adder, multiplier, shifter,
comparator, etc.
 Memory
– RAM, ROM, buffers, shift registers
 Control
– Finite state machine (PLA, random logic)
– Counters
 Interconnect
– Switches, arbiters, bus



An Intel Microprocessor



Bit-Sliced Design



Bit-Sliced Datapath



Itanium Integer Datapath



Motivation
 Arithmetic units are, among other things, the core of every data path and
addressing unit.
 The data path is at the core of
– microprocessors (CPU)
– signal processors (DSP)
– data-processing application-specific ICs (ASIC) and
programmable ICs (FPGA)
 Standard arithmetic units available from libraries
 Design of arithmetic units necessary for
– non-standard operations
– high performance components
– library development



Naming Conventions
 Signal busses: $A$ (1-D), $A_i$ (2-D), $a_{i:k}$ (sub-bus, 1-D)
 Signals: $a$, $a_i$ (1-D), $a_{i,k}$ (2-D), $A_{i:k}$ (group signal)
 Circuit complexity measures: A (area), T (cycle time,
delay), AT (area-time product), L (latency, number of
cycles).
 Arithmetic operators: +, -, •, /, log (= $\log_2$)
 Logic operators: OR, AND, XOR, NOT, …



Circuit Complexity Measures
 Unit gate model
– Inverter, buffer: A = 0, T = 0
– Simple monotonic 2-input gates (AND, OR,
NAND, NOR): A = 1, T = 1
– Simple non-monotonic 2-input gates (XOR,
XNOR): A = 2, T = 2
– Simple m-input gates: A = m – 1, T = ⎡log m⎤
– Wiring not considered
– Only for estimation purposes



Recursive Function Evaluation

 Given: inputs $a_i$, outputs $z_i$, function $f$ (graph symbol •)

 Non-recursive functions (n.)
– Output $z_i$ is a function of input $a_i$
$z_i = f(a_i, x);\ i = 0, \dots, n-1$
$A = O(n),\ T = O(1)$
– Parallel structure



Recursive Function Evaluation

 Recursive functions (r.)
– Output $z_i$ is a function of all inputs $a_k$, $k \le i$
• with a single output $z = z_{n-1}$ (r.s.):
$t_i = f(a_i, t_{i-1});\ i = 0, \dots, n-1$
$t_{-1} = 0/1,\ z = t_{n-1}$
– f is non-associative (r.s.n)
» serial structure
$A = O(n),\ T = O(n)$
– f is associative (r.s.a)
» serial or single-tree structure
$A = O(n),\ T = O(\log n)$

Recursive Function Evaluation

– Output $z_i$ is a function of all inputs $a_k$, $k \le i$
• multiple outputs $z_i$ (r.m.) (=> prefix problem)
$z_i = f(a_i, z_{i-1});\ i = 0, \dots, n-1,\ z_{-1} = 0/1$
– f is non-associative (r.m.n)
» serial structure
$A = O(n),\ T = O(n)$
– f is associative (r.m.a)
» serial or multi-tree structure
$A = O(n^2),\ T = O(\log n)$
» shared-tree structure
$A = O(n \log n),\ T = O(\log n)$



Arithmetic Operations
 Overview



Overview of Arithmetic Operations

 Direct implementation of dedicated units


– always: 1 – 5
– in most cases: 6
– sometimes: 7, 8
 Sequential implementation using simpler units and several
clock cycles (decomposition)
– sometimes: 6
– in most cases: 7, 8, 9
 Table look-up techniques using ROMs
– universal: simple application to all operations
– efficient only for single-operand operations of high
complexity (8 - 12) and small word length.



Overview of Arithmetic Operations

 Approximation using simpler units: 7 – 12


– Taylor series expansion
– polynomial and rational approximations
– convergence of recursive equation systems
– CORDIC (COordinate Rotation DIgital Computer)



Binary Number Systems
 Radix-2, binary number system (BNS): irredundant,
weighted, positional, monotonic.
 An n-bit number is an ordered sequence of bits (binary
digits): $A = (a_{n-1}, a_{n-2}, \dots, a_0)_2,\ a_i \in \{0, 1\}$
 Simple and efficient implementation in digital circuits
 MSB/LSB (most/least significant bit): $a_{n-1}$ / $a_0$
 Represents an integer or fixed-point number, exact.
 Fixed-point numbers: $(a_{m-1}, \dots, a_0 \,.\, a_{-1}, \dots, a_{m-n})$,
with an m-bit integer part and an (n-m)-bit fraction




Binary Number Systems
 Unsigned: positive or natural numbers
– Value: $A = a_{n-1} 2^{n-1} + \dots + a_1 2 + a_0 = \sum_{i=0}^{n-1} a_i 2^i$
– Range: $[0,\ 2^n - 1]$
 Two’s (2’s) complement: standard representation of
signed or integer numbers
– Value: $A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$
– Range: $[-2^{n-1},\ 2^{n-1} - 1]$


Binary Number Systems
– Complement: $-A = 2^n - A = \bar{A} + 1$, where $\bar{A} = (\bar{a}_{n-1}, \bar{a}_{n-2}, \dots, \bar{a}_0)$
– Sign: $a_{n-1}$
– Properties: asymmetric range, compatible with
unsigned numbers in many arithmetic operations
(same treatment of positive and negative numbers)
 One’s (1’s) complement: similar to 2’s complement
– Value: $A = -a_{n-1}(2^{n-1} - 1) + \sum_{i=0}^{n-2} a_i 2^i$
– Range: $[-(2^{n-1} - 1),\ 2^{n-1} - 1]$


Binary Number Systems
– Complement: $-A = 2^n - A - 1 = \bar{A}$
– Sign: $a_{n-1}$
– Properties: double representation of zero,
symmetric range, modulo $(2^n - 1)$ number system.
 Sign-magnitude: alternative representation of signed
numbers
– Value: $A = (-1)^{a_{n-1}} \cdot \sum_{i=0}^{n-2} a_i 2^i$
– Range: $[-(2^{n-1} - 1),\ 2^{n-1} - 1]$
– Complement: $-A = (\bar{a}_{n-1}, a_{n-2}, \dots, a_0)$



Binary Number Systems
 Sign: $a_{n-1}$
 Properties: double representation of zero, symmetric
range, different treatment of positive and negative
numbers in arithmetic operations, no MSB toggles at
sign changes around 0 (=> low power)
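
The value formulas for the four representations can be compared on a single bit pattern. The following is a minimal Python sketch; the function names and the MSB-first list convention are assumptions made for illustration and do not come from the slides.

```python
def unsigned_value(bits):
    """Unsigned value; bits[0] is the MSB a_{n-1}, bits[-1] is the LSB a_0."""
    return sum(b << i for i, b in enumerate(reversed(bits)))

def twos_complement_value(bits):
    """A = -a_{n-1} * 2^(n-1) + sum of the remaining weighted bits."""
    n = len(bits)
    return -bits[0] * (1 << (n - 1)) + unsigned_value(bits[1:])

def ones_complement_value(bits):
    """A = -a_{n-1} * (2^(n-1) - 1) + sum of the remaining weighted bits."""
    n = len(bits)
    return -bits[0] * ((1 << (n - 1)) - 1) + unsigned_value(bits[1:])

def sign_magnitude_value(bits):
    """A = (-1)^a_{n-1} * magnitude of the remaining bits."""
    return (-1) ** bits[0] * unsigned_value(bits[1:])

pattern = [1, 0, 1, 1]
print(unsigned_value(pattern),         # 11
      twos_complement_value(pattern),  # -5
      ones_complement_value(pattern),  # -4
      sign_magnitude_value(pattern))   # -3
```

The same pattern 1011 reads as four different values, which is why the representation has to be fixed before any adder is designed.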



Gray Numbers
 Gray numbers (code): binary, irredundant, non-
weighted, non-monotonic.
– Property: unit-distance coding. Exactly one bit
toggles between adjacent numbers.
– Applications: counters with low output toggle rate
(low power busses), representation of continuous
signals for low-error sampling (no false numbers
due to switching of different bits at different
times).
– Non-monotonic numbers: difficult arithmetic
operations (addition, comparison).



Gray Numbers
– Binary to Gray conversion
$g_i = b_{i+1} \oplus b_i,\ b_n = 0;\ i = 0, \dots, n-1$ (n.)

– Gray to binary conversion
$b_i = b_{i+1} \oplus g_i,\ b_n = 0;\ i = n-1, \dots, 0$ (r.m.a)
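
The two conversions differ in structure: binary to Gray is non-recursive (each output bit depends on two input bits), while Gray to binary is a recursive prefix computation from the MSB down. A minimal Python sketch under the assumption that numbers are held as non-negative integers; the function names are illustrative.

```python
def binary_to_gray(b: int) -> int:
    """g_i = b_{i+1} XOR b_i with b_n = 0: all bits computed in parallel."""
    return b ^ (b >> 1)

def gray_to_binary(g: int, n: int) -> int:
    """b_i = b_{i+1} XOR g_i with b_n = 0: serial evaluation from the MSB."""
    b = 0
    prev = 0  # holds b_{i+1}
    for i in range(n - 1, -1, -1):
        prev ^= (g >> i) & 1
        b |= prev << i
    return b

for value in range(8):
    gray = binary_to_gray(value)
    print(value, format(gray, "03b"), gray_to_binary(gray, 3))
```

Because XOR is associative, the serial loop in `gray_to_binary` could be replaced by a tree-structured prefix evaluation, as the (r.m.a) classification indicates.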



Redundant Number Systems
 Non-binary, redundant, weighted number systems.
 Digit set larger than radix (typically radix 2) => multiple
representations of the same number => redundancy.
 No carry propagation in adders => more efficient
implementation of adder-based units (multipliers, dividers, etc.)
 Redundancy => no direct implementation of relational
operators => conversion to irredundant numbers.
 Several bits used to represent one digit => higher storage
requirements.
 Expensive conversion to irredundant numbers. Not necessary
if redundant input operands are allowed.



Delayed-Carry Representation
 Delayed-carry or half-adder representation
$r_i \in \{0, 1, 2\},\ c_i, s_i, a_i, b_i \in \{0, 1\}$
$r_i = (c_{i+1}, s_i) = 2c_{i+1} + s_i = a_i + b_i,\ c_{i+1} s_i = 0$
$R = \sum_{i=0}^{n-1} r_i 2^i = (C, S) = C + S = A + B$

 One digit holds the sum of 2 bits (no carry-out)

 Example: 01 + 01 = (0,0) (1,0) = 2



Carry-Save Representation
$r_i \in \{0, 1, 2, 3\},\ c_i, s_i, a_i, b_i, d_i \in \{0, 1\}$
$r_i = (c_{i+1}, s_i) = 2c_{i+1} + s_i = a_i + b_i + d_i = a_i + r'_i$
$R = \sum_{i=0}^{n-1} r_i 2^i = (C, S) = C + S = A + R'$

 One digit holds the sum of 3 bits, or of 1 digit and 1 bit.
No carry-out digit; the carry is saved.

 Standard redundant number system for fast addition.
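
The carry-save idea can be sketched in Python with bitwise operations on integers: each step is one row of independent full adders, and only the final conversion to an irredundant number needs carry propagation. The function names below are hypothetical, and Python's unbounded integers stand in for a sufficiently wide datapath.

```python
def carry_save_add(c, s, a):
    """Add the binary number a to the carry-save number (c, s).

    Every bit position is an independent full adder, so no carry ripples.
    """
    new_s = c ^ s ^ a                               # per-bit sum
    new_c = ((c & s) | (c & a) | (s & a)) << 1      # per-bit carry, weight 2
    return new_c, new_s

def carry_save_sum(operands):
    """Accumulate operands in carry-save form, then resolve once."""
    c, s = 0, 0
    for a in operands:
        c, s = carry_save_add(c, s, a)
    return c + s  # the single carry-propagate addition

print(carry_save_sum([3, 5, 7, 9]))  # 24
```

The invariant is that `c + s` always equals the sum of the operands absorbed so far.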



Signed-Digit Representation
 Signed-digit (SD) or redundant digit (RD) number
representation.
$r_i, s_i, t_i \in \{-1, 0, 1\} \equiv \{\bar{1}, 0, 1\},\ R = \sum_{i=0}^{n-1} r_i 2^i$

 No carry propagation in S = R + T
$r_i + t_i = (c_{i+1}, u_i) = 2c_{i+1} + u_i,\ c_{i+1}, u_i \in \{\bar{1}, 0, 1\}$
$(c_{i+1}, u_i)$ is redundant (e.g., $0 + 1 = 01 = 1\bar{1}$)
$\forall i\ \exists (c_i, u_i)$ s.t. $s_i \in \{\bar{1}, 0, 1\}$

 One digit holds the sum of two digits. No carry-out.



Signed-Digit Representation
 Minimal SD representation: minimal number of non-
zero digits.
$\dots 011\{1\}10 \dots \rightarrow \dots 100\{0\}\bar{1}0 \dots$
– Applications: sequential multiplication (fewer
cycles), filters with constant coefficients (less
hardware).
– Example:
$7 = (0111,\ 1\bar{1}11,\ 10\bar{1}1,\ 100\bar{1},\ 1\bar{1}\bar{1}11,\ \dots)$, where $100\bar{1}$ is minimal



Signed-Digit Representation
 Canonical SD representation: minimal SD with no two
non-zero digits in sequence.
$\dots 01\{1\}10 \dots \rightarrow \dots 10\{0\}\bar{1}0$
 SD -> binary: carry propagation necessary => adder.
 Other applications: high-speed multipliers.
 Similar to carry-save; simple use for signed
numbers.
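
One common way to obtain the canonical SD form of an integer is the non-adjacent-form recoding sketched below in Python. This is a standard technique brought in for illustration, not necessarily the procedure the lecture has in mind; the function name and the LSB-first digit list are assumptions, and the input is assumed non-negative.

```python
def to_canonical_sd(value: int):
    """Return canonical signed digits of a non-negative integer, LSB first.

    Each digit is in {-1, 0, 1} and no two adjacent digits are non-zero.
    """
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 -> digit +1, value mod 4 == 3 -> digit -1
            d = 2 - (value & 3)
            value -= d          # remaining value is now divisible by 4
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits

print(to_canonical_sd(7))  # [-1, 0, 0, 1], i.e. 8 - 1
```

For 7 this yields the minimal form with two non-zero digits instead of the three of plain binary 0111.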



Residue Number Systems
 Non-binary, irredundant, non-weighted number
system.
 Carry-free and fast additions and multiplications.
 Complex and slow other arithmetic operations (e.g.
comparison, sign, and overflow detection) because
digits are not weighted. Conversion to weighted
mixed-radix or binary system required.
 Codes for error correction and detection.
 Possible applications (but hardly used)
– Digital filters
– Error detection and correction



Residue Number Systems
 Base is an n-tuple of integers $(m_{n-1}, m_{n-2}, \dots, m_0)$,
residues (or moduli). These $m_i$ are pairwise relatively prime.
$A = (a_{n-1}, a_{n-2}, \dots, a_0)_{m_{n-1}, m_{n-2}, \dots, m_0}$
$a_i \in \{0, 1, \dots, m_i - 1\}$
Range: $M = \prod_{i=0}^{n-1} m_i$, anywhere in $\mathbb{Z}$
$a_i = A \bmod m_i = |A|_{m_i},\ A = m_i \cdot q_i + a_i$

 Arithmetic operations: each digit computed separately.



Residue Number Systems
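$z_i = |Z|_{m_i} = |f(A)|_{m_i} = |f(|A|_{m_i})|_{m_i} = |f(a_i)|_{m_i}$
$|A + B|_{m_i} = \big| |A|_{m_i} + |B|_{m_i} \big|_{m_i} = |a_i + b_i|_{m_i}$
$|A \cdot B|_{m_i} = \big| |A|_{m_i} \cdot |B|_{m_i} \big|_{m_i} = |a_i \cdot b_i|_{m_i}$
$|-a_i|_{m_i} = |m_i - a_i|_{m_i}$
$|a_i^{-1}|_{m_i} = |a_i^{m_i - 2}|_{m_i}$ (Fermat's theorem, valid for prime $m_i$)

 Best moduli $m_i$ are $2^k$ and $2^k - 1$.
– High storage efficiency with k bits.
– Simple modular addition: k-bit adder without $c_{out}$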



Residue Number Systems
 Example:

$(m_1, m_0) = (3, 2),\ M = 6$
$|5|_6 = A = (a_1, a_0) = (|5|_3, |5|_2) = (2, 1)$
$|4 + 5|_6 = (1, 0) + (2, 1) = (|1 + 2|_3, |0 + 1|_2) = (0, 1) = |3|_6$
$|4 \cdot 5|_6 = (1, 0) \cdot (2, 1) = (|1 \cdot 2|_3, |0 \cdot 1|_2) = (2, 0) = |2|_6$
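
The example can be reproduced with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and the conversion back to an integer uses a brute-force search that is only adequate for a small range M (a real design would use the Chinese remainder theorem or mixed-radix conversion).

```python
from math import prod

MODULI = (3, 2)  # (m_1, m_0) from the example, M = 6

def to_rns(a, moduli=MODULI):
    """Residue digits a_i = A mod m_i."""
    return tuple(a % m for m in moduli)

def rns_add(x, y, moduli=MODULI):
    """Digit-wise modular addition, no carries between digits."""
    return tuple((xi + yi) % m for xi, yi, m in zip(x, y, moduli))

def rns_mul(x, y, moduli=MODULI):
    """Digit-wise modular multiplication."""
    return tuple((xi * yi) % m for xi, yi, m in zip(x, y, moduli))

def from_rns(x, moduli=MODULI):
    """Brute-force conversion back to the integer in [0, M)."""
    return next(a for a in range(prod(moduli)) if to_rns(a, moduli) == tuple(x))

print(to_rns(5))                                  # (2, 1)
print(from_rns(rns_add(to_rns(4), to_rns(5))))    # 3, i.e. 9 mod 6
print(from_rns(rns_mul(to_rns(4), to_rns(5))))    # 2, i.e. 20 mod 6
```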



Floating-Point Numbers
 Larger range, smaller precision than fixed-point
representation, inexact, real numbers.
 Double-number form => discontinuous precision.
 Format: S | biased exponent E | unsigned normalized mantissa M
$F = (-1)^S \cdot M \cdot \beta^E = (-1)^S \cdot M \cdot 2^{E - \text{bias}}$

 Basic arithmetic operations
$A \cdot B = (-1)^{S_A \oplus S_B} \cdot M_A \cdot M_B \cdot \beta^{E_A + E_B}$
$A + B = \big( (-1)^{S_A} \cdot M_A + (-1)^{S_B} \cdot (M_B >> (E_A - E_B)) \big) \cdot \beta^{E_A}$



Floating-Point Numbers
 Basic arithmetic operations are based on fixed-point add,
multiply, and shift operations. Post-normalization is
required.
 Applications:
– Processors: real floating point formats (e.g. IEEE
standard), large range due to universal use.
– ASICs: usually simplified floating-point formats
with small exponents, smaller range. Used for
range extension of normal fixed-point numbers.
 IEEE floating point format:



Logarithmic Number System
 Alternative representation to floating point (mantissa
+ integer exponent -> only fixed point exponent).
 Single number form => continuous precision =>
higher accuracy, more reliable.
 Format: S | biased fixed-point exponent E
$L = (-1)^S \cdot \beta^E = (-1)^S \cdot 2^{E - \text{bias}}$
 Basic arithmetic operations:
– $(A < B) = (E_A < E_B)$; additionally consider sign
– A + B by approximation, or by addition in a
conventional number system and double
conversion.
Logarithmic Number System
 Basic arithmetic operations
$A \cdot B = (-1)^{S_A \oplus S_B} \cdot \beta^{E_A + E_B}$
$A^y = (-1)^{S_A} \cdot \beta^{y \cdot E_A},\quad \sqrt[y]{A} = (-1)^{S_A} \cdot \beta^{E_A / y}$

– Simpler multiplication, exponentiation. More
complex addition.

– Expensive conversion: (anti)logarithms probably
by table look-up.
– Applications: real-time digital filters.



Antitetrational Number System
 Tetration ($t.x = 2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}$, with x twos) and antitetration (a.t.x)
 Larger range, but smaller precision than logarithmic
representation. Otherwise, analogous.
 Note that all these systems can be mixed in
composite arithmetic.
 Choice of number representation should be hidden
from the user. The compiler should handle it.
 Rational numbers can also be represented in floating
slash notation.



Round-Off Schemes
 Intermediate results with d additional lower bits. This
results in higher accuracy: $A = (a_{n-1}, \dots, a_0, a_{-1}, \dots, a_{-d})$
 Rounding: keeping error ε small during final word
length reduction: $R = (r_{n-1}, \dots, r_0) = A - \varepsilon$
 Trade-off: numerical accuracy vs. implementation
cost.
 Truncation: $R_{TRUNC} = (a_{n-1}, \dots, a_0)$
– bias $= -\frac{1}{2} + \frac{1}{2^{d+1}}$ (= average error ε)
 Round to nearest (normal rounding)
$R_{ROUND} = (a'_{n-1}, \dots, a'_0),\ A' = A + \frac{1}{2} = A + 0.1_2$
Round-Off Schemes
 Round to nearest
– The error is nearly symmetric: bias $= \frac{1}{2^{d+1}}$
– $+0.1_2$ can often be included in a previous operation.
 Round to nearest even/odd
$R_{ROUND\text{-}EVEN} = \begin{cases} R_{ROUND} & \text{if } (a'_{-1}, \dots, a'_{-d}) \ne 0 \dots 0 \\ (a'_{n-1}, \dots, a'_1, 0) & \text{otherwise} \end{cases}$
– bias = 0 (symmetric)
– Mandatory in IEEE floating-point standard
 3 guard bits for rounding after floating-point operations: guard
bit G (postnormalization), round bit R (round to nearest), sticky
bit S (round to nearest even)
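
The three schemes can be compared on fixed-point values stored as integers with d fractional bits. The Python sketch below is illustrative only: the function names are assumptions, d is assumed to be at least 1, and the value represented by an integer `a` is `a / 2**d`.

```python
def truncate(a, d):
    """Drop the d fractional bits."""
    return a >> d

def round_nearest(a, d):
    """Add 0.1_2 (half an LSB of the result), then truncate."""
    return (a + (1 << (d - 1))) >> d

def round_nearest_even(a, d):
    """Round to nearest; on an exact tie, force the result LSB to 0."""
    r = round_nearest(a, d)
    frac = a & ((1 << d) - 1)
    if frac == 1 << (d - 1):   # fraction is exactly 0.1_2: a tie
        r &= ~1
    return r

d = 2
for a in (10, 11, 14):         # 2.5, 2.75, 3.5
    print(a / 2**d, truncate(a, d), round_nearest(a, d), round_nearest_even(a, d))
```

Averaged over all fraction patterns with d = 2, truncation has a mean error of -0.375, round to nearest +0.125, and round to nearest even 0, which agrees with the bias formulas on the slides.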



Addition



Single-Bit Addition
Half adder: $S = A \oplus B$, $C_{out} = A \cdot B$
Full adder: $S = A \oplus B \oplus C$, $C_{out} = MAJ(A, B, C)$

Half adder truth table
A B | Cout S
0 0 | 0    0
0 1 | 0    1
1 0 | 0    1
1 1 | 1    0

Full adder truth table
A B C | Cout S
0 0 0 | 0    0
0 0 1 | 0    1
0 1 0 | 0    1
0 1 1 | 1    0
1 0 0 | 0    1
1 0 1 | 1    0
1 1 0 | 1    0
1 1 1 | 1    1



1-Bit Adders
 Add up m bits of same magnitude
 Output the sum as a k-bit number ($k = \lfloor \log m \rfloor + 1$)
 Or count 1’s at inputs => (m,k) counter,
a combinational counter.
 A half adder is a (2,2) counter
$(c_{out}, s) = 2c_{out} + s = a + b$; $A = 3,\ T = 2\ (1)$
$s = a \oplus b$ (sum)
$c_{out} = ab$ (carry-out)


1-Bit Adders



1-Bit Adders
 A full adder is a (3,2) counter.
$(c_{out}, s) = 2c_{out} + s = a + b + c_{in}$; $A = 7,\ T = 4\ (2)$
$g = ab$ (generate), $p = a \oplus b$ (propagate)
$c^0 = ab$, $c^1 = a + b$
$s = a \oplus b \oplus c_{in} = p \oplus c_{in}$
$c_{out} = ab + ac_{in} + bc_{in} = ab + (a \oplus b)c_{in}$
$= g + pc_{in} = \bar{p}g + pc_{in} = \bar{p}a + pc_{in}$
$= \bar{c}_{in}c^0 + c_{in}c^1$
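
The equivalence of the different carry-out forms can be checked exhaustively. A short Python sketch, with illustrative function names and bits represented as the integers 0 and 1:

```python
from itertools import product

def half_adder(a, b):
    """(2,2) counter: returns (c_out, s)."""
    return a & b, a ^ b

def full_adder(a, b, c_in):
    """(3,2) counter built from generate and propagate: returns (c_out, s)."""
    g = a & b            # generate
    p = a ^ b            # propagate
    s = p ^ c_in
    c_out = g | (p & c_in)
    return c_out, s

for a, b, c in product((0, 1), repeat=3):
    c_out, s = full_adder(a, b, c)
    assert 2 * c_out + s == a + b + c                 # counter property
    majority = (a & b) | (a & c) | (b & c)
    c0, c1 = a & b, a | b                             # carry for c_in = 0 / 1
    mux_form = c1 if c else c0
    assert c_out == majority == mux_form
print("all carry-out forms agree")
```

The multiplexer form (selecting between c0 and c1 with the carry-in) is the one that carry-select style adders exploit.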



PGK
 For a full adder, define what happens to carries
(in terms of A and B)
– Generate: Cout = 1 independent of C
• G = A • B
– Propagate: Cout = C
• P = A ⊕ B
– Kill: Cout = 0 independent of C
• K = ~A • ~B



Full Adder Design I
 Brute-force implementation from the equations
$S = A \oplus B \oplus C$
$C_{out} = MAJ(A, B, C)$
[Figure: transistor-level schematics of the S circuit and the MAJ (Cout) circuit with inputs A, B, C]



Full Adder Design II
 Factor S in terms of Cout
S = ABC + (A + B + C)(~Cout)
 Critical path is usually C to Cout in ripple adder
[Figure: schematic with a MINORITY gate on inputs A, B, C, followed by the circuits producing Cout and S]



Full Adder Design II
 Same circuit with sized transistors



Layout
 Clever layout circumvents usual line of diffusion
– Use wide transistors on critical path
– Eliminate output inverters



Full Adder Design III
 Complementary Pass Transistor Logic (CPL)
– Slightly faster, but more area
[Figure: CPL full adder schematic with inputs A, B, C and outputs S, Cout]



Full Adder Design III
 Transmission gates



Full Adder Design IV
 Dual-rail domino
– Very fast, but large and power hungry
– Used in very fast multipliers
[Figure: dual-rail domino full adder schematic with inputs A_h/A_l, B_h/B_l, C_h/C_l and outputs S_h/S_l, Cout_h/Cout_l]



(m,k) Counters
$(s_{k-1}, \dots, s_0) = \sum_{j=0}^{k-1} s_j 2^j = \sum_{i=0}^{m-1} a_i$

 Usually built from full adders.
 Associativity of addition allows conversion from
linear to tree structure => faster at the same number
of FAs.
$A = 7 \sum_{k=1}^{\log m} \lfloor m 2^{-k} \rfloor \approx 7(m - \log m)$
$T_{LIN} = 4m + 2\lfloor \log m \rfloor,\ T_{TREE} = 4\lceil \log_3 m \rceil + 2\lfloor \log m \rfloor$
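
As an illustration, a (7,3) counter can be assembled from four full adders arranged as a small tree. The wiring below is one possible arrangement written as a Python sketch; it is an assumption for illustration and not necessarily the structure drawn in the lecture.

```python
from itertools import product

def full_adder(a, b, c):
    """Returns (carry, sum) of three bits."""
    return (a & b) | (a & c) | (b & c), a ^ b ^ c

def counter_7_3(bits):
    """Count the ones among 7 input bits; returns (s2, s1, s0)."""
    a0, a1, a2, a3, a4, a5, a6 = bits
    c1, t1 = full_adder(a0, a1, a2)    # first level, weight-1 inputs
    c2, t2 = full_adder(a3, a4, a5)
    c3, s0 = full_adder(t1, t2, a6)    # combine the weight-1 partial sums
    s2, s1 = full_adder(c1, c2, c3)    # combine the three weight-2 carries
    return s2, s1, s0

assert all(4 * s2 + 2 * s1 + s0 == sum(bits)
           for bits in product((0, 1), repeat=7)
           for s2, s1, s0 in [counter_7_3(bits)])
print(counter_7_3((1, 1, 1, 0, 1, 1, 0)))  # (1, 0, 1): five ones
```

Four full adders at 7 unit gates each give A = 28 in the unit gate model.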




(7,3) Counter
 Example: two implementations, with A = 28, T = 14 and with A = 28, T = 10



Carry Propagate Adders
 Add two n-bit operands A and B and an optional
carry-in $c_{in}$ by performing carry propagation.
 Sum $(c_{out}, S)$ is an irredundant (n+1)-bit number
$(c_{out}, S) = c_{out} 2^n + S = A + B + c_{in}$
$2c_{i+1} + s_i = a_i + b_i + c_i;\ i = 0, 1, \dots, n-1$
$c_0 = c_{in},\ c_{out} = c_n$ (r.m.a)


Carry Propagate Adders
 N-bit adder called CPA
– Each sum bit depends on all previous carries
– How do we compute all these carries quickly?

[Figure: N-bit CPA block with inputs A_{N...1}, B_{N...1}, Cin and outputs S_{N...1}, Cout]

Two 4-bit examples:
carries   00000    11111
A4...1     1111     1111
B4...1    +0000    +0000
S4...1     1111     0000



Ripple-Carry Adder(RCA)
 Serial arrangement of n full adders.
 Simplest, smallest, and slowest CPA structure.

$A = 7n,\ T = 2n,\ AT = 14n^2$



Carry-Ripple Adder
 Simplest design: cascade full adders
– Critical path goes from Cin to Cout
– Design full adder to have fast carry delay

[Figure: 4-bit carry-ripple adder; full adders for bits 1 to 4 chained from Cin through C1, C2, C3 to Cout, producing S1 to S4]



Carry Ripple Adder
 Note that worst case delay is linear with number of
bits.

$t_{adder} = (N - 1)\,t_{carry} + t_{sum}$


 Goal: Make the fastest possible carry path circuit.
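
The serial dependency behind the linear delay is visible in a bit-level model: each carry is needed before the next stage can finish. The Python sketch below is a behavioural illustration with hypothetical helper names; bit lists are LSB first.

```python
def to_bits(x, n):
    """Integer to list of n bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """List of bits (LSB first) back to an integer."""
    return sum(b << i for i, b in enumerate(bits))

def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Chain of full adders; the carry ripples from stage to stage."""
    s_bits = []
    c = c_in
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        s_bits.append(p ^ c)        # sum of this stage
        c = (a & b) | (p & c)       # carry into the next stage
    return s_bits, c

s, c_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), c_out)  # 1 1, since 11 + 6 = 17 = 16 + 1
```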



A Full Adder Circuit



Inversion Property



Inversions
 Critical path passes through majority gate
– Built from minority + inverter
– Eliminate inverter and use inverting full adder

[Figure: 4-bit ripple adder built from inverting full adders, with carries C1, C2, C3 between Cin and Cout and sums S1 to S4]



Mirror Adder



Mirror Adder



Mirror Adder
 The NMOS and PMOS chains are completely
symmetrical. A maximum of two series transistors
can be observed in the carry generation circuit.
 When laying out the cell, the most critical issue is the
minimization of the capacitance at node Co. The
reduction of the diffusion capacitances is particularly
important.
 The capacitance at node Co is composed of four
diffusion capacitances, two internal gate
capacitances, and six gate capacitances in the
connecting adder cell.



Mirror Adder
 The transistors connected to Ci are placed closest to
the input.
 Only the transistors in the carry stage have to be
optimized for optimal speed. All transistors in the
sum stage can be minimal size.



Transmission Gate FA



Carry Propagation Speed-up
 Concatenation of partial CPA’s with fast cin -> cout.

 Fast carry look-ahead logic for entire range of bits.



Generate / Propagate
 Equations often factored into G and P
 Generate and propagate for groups spanning i:j
$G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j}$
$P_{i:j} = P_{i:k} \cdot P_{k-1:j}$

 Base case
$G_{i:i} \equiv G_i = A_i \cdot B_i$, $G_{0:0} \equiv G_0 = C_{in}$
$P_{i:i} \equiv P_i = A_i \oplus B_i$, $P_{0:0} \equiv P_0 = 0$
 Sum:
$S_i = P_i \oplus G_{i-1:0}$
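
These equations translate directly into a three-step adder: bitwise PG, group PG, and sum. The Python sketch below follows the slide convention that bit 0 carries Cin (G_0 = Cin, P_0 = 0) and that operand bits are numbered 1 to N; the function names are illustrative, and the group PG step is done as a simple ripple.

```python
def combine(upper, lower):
    """(G_{i:k}, P_{i:k}) with (G_{k-1:j}, P_{k-1:j}) -> (G_{i:j}, P_{i:j})."""
    g_hi, p_hi = upper
    g_lo, p_lo = lower
    return g_hi | (p_hi & g_lo), p_hi & p_lo

def pg_add(a, b, c_in, n):
    """n-bit addition via PG logic; returns (sum, c_out)."""
    # 1: bitwise PG logic, index 0 holds the carry-in
    g = [c_in] + [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [0] + [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    # 2: group PG logic, prefix[i] = (G_{i:0}, P_{i:0})
    prefix = [(g[0], p[0])]
    for i in range(1, n + 1):
        prefix.append(combine((g[i], p[i]), prefix[i - 1]))
    # 3: sum logic, S_i = P_i xor G_{i-1:0}
    s = sum((p[i] ^ prefix[i - 1][0]) << (i - 1) for i in range(1, n + 1))
    return s, prefix[n][0]

print(pg_add(11, 6, 0, 4))  # (1, 1)
```

Only step 2 changes between the adder architectures that follow; steps 1 and 3 stay the same.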



PG Logic
[Figure: three-stage adder structure for inputs A4..A1, B4..B1, Cin]
1: Bitwise PG logic, producing G4..G0 and P4..P0
2: Group PG logic, producing G3:0, G2:0, G1:0, G0:0 (the carries C3, C2, C1, C0)
3: Sum logic, producing S4..S1 and Cout = C4



PG Logic



Carry-Ripple Revisited
$G_{i:0} = G_i + P_i \cdot G_{i-1:0}$
[Figure: 4-bit carry-ripple adder drawn with PG logic; bitwise G4..G0 and P4..P0 feed a chain producing G0:0, G1:0, G2:0, G3:0 (C0 to C3), then S4..S1 and Cout = C4]



Carry-Ripple PG Diagram
[Figure: carry-ripple PG diagram; bit position 15 to 0 on the horizontal axis, delay on the vertical axis, outputs 15:0 down to 0:0]

$t_{ripple} = t_{pg} + (N - 1)\,t_{AO} + t_{xor}$



PG Diagram Notation

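Black cell: inputs $(G_{i:k}, P_{i:k})$ and $(G_{k-1:j}, P_{k-1:j})$; outputs $G_{i:j}$ and $P_{i:j}$
Gray cell: inputs $(G_{i:k}, P_{i:k})$ and $G_{k-1:j}$; output $G_{i:j}$
Buffer: inputs $G_{i:j}$ and $P_{i:j}$; outputs $G_{i:j}$ and $P_{i:j}$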



Manchester Carry Chain



Manchester Carry Chain



Manchester Carry Chain



Carry-Skip Adder
 Carry-ripple is slow through all N stages
 Carry-skip allows carry to skip over groups of n bits
– Decision based on n-bit propagate signal

[Figure: 16-bit carry-skip adder with four 4-bit groups; the group propagate signals P4:1, P8:5, P12:9, P16:13 control multiplexers that select each group's carry-out (C4, C8, C12, Cout)]



Carry-Skip Adder



Carry-Skip Adder



Carry-Skip Adder



Carry-Skip PG Diagram
[Figure: carry-skip PG diagram, bit positions 16 to 0, outputs 16:0 down to 0:0]

For k n-bit groups (N = nk)

$t_{skip} = t_{pg} + [2(n - 1) + (k - 1)]\,t_{AO} + t_{xor}$



Variable Group Size
[Figure: carry-skip PG diagram with variable group sizes, bit positions 16 to 0, outputs 16:0 down to 0:0]

Delay grows as O(sqrt(N))


Carry-Skip Adder
 Partial CPA with fast $c_k \to c_i$
$c_i = \bar{P}_{i-1:k}\,c'_i + P_{i-1:k}\,c_k$ (bit group $(a_{i-1}, \dots, a_k)$)
$P_{i-1:k} = p_{i-1} p_{i-2} \dots p_k$ (group propagate)
 If $P_{i-1:k} = 0$: $c_k$ does not propagate to $c'_i$, and $c'_i$ is selected,
becoming $c_i$.

 If $P_{i-1:k} = 1$: $c_k$ propagates to $c'_i$, but $c'_i$ is skipped and $c_k$ is selected directly.
 Path ck -> c’i -> ci never sensitized => fast ck -> ci
 False path => inherent logic redundancy =>
problems in circuit optimization, timing analysis, and
testing.
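
A behavioural Python sketch of the skip multiplexer is given below. It is only a functional model under stated assumptions: integer addition stands in for the partial CPA of each group, the function name and the uniform `group` width are hypothetical, and no timing is modelled, so it shows what is selected rather than how fast.

```python
def carry_skip_add(a, b, c_in, n, group):
    """n-bit carry-skip addition with fixed group width; returns (sum, c_out)."""
    s, c = 0, c_in
    for k in range(0, n, group):
        width = min(group, n - k)
        mask = (1 << width) - 1
        a_g, b_g = (a >> k) & mask, (b >> k) & mask
        total = a_g + b_g + c              # stand-in for the partial CPA
        s |= (total & mask) << k
        c_ripple = total >> width          # c'_i from the partial CPA
        group_p = (a_g ^ b_g) == mask      # P_{i-1:k}: all bits propagate
        c = c if group_p else c_ripple     # skip multiplexer
    return s, c

print(carry_skip_add(0b01111111, 0b00000001, 0, 8, 4))  # (128, 0)
```

When the group propagate is 1, the rippled carry equals the group carry-in anyway, which is why the path through the partial CPA is a false path.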



Carry-Skip Adder
 Variable group sizes are faster.
– Use larger groups in the middle
– Minimize delays $a_0 \to c_k \to s_{i-1}$ and $a_k \to c_i \to s_{n-1}$
 Partial CPA type is RCA or CSKA (multilevel CSKA)
 Medium speed-up at small hardware overhead (+
AND/bit + MUX/group)
$A \approx 8n,\ T \approx 4n^{1/2},\ AT \approx 32n^{3/2}$



CSKA + Manchester



Carry-Select Adder
 Trick for critical paths dependent on late input X
– Precompute two possible outputs for X = 0, 1
– Select proper output when X arrives
 Carry-select adder precomputes n-bit sums
– For both possible carries into n-bit group
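[Figure: 16-bit carry-select adder with 4-bit groups; each upper group has two adders, one with carry-in 0 and one with carry-in 1, and multiplexers controlled by C4, C8, C12 select S8:5, S12:9, S16:13 and the next carry]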



Carry-Select Adder
 Partial CPA with fast $c_k \to c_i$ and $c_k \to s_{i-1:k}$
$s_{i-1:k} = \bar{c}_k\,s^0_{i-1:k} + c_k\,s^1_{i-1:k}$
$c_i = \bar{c}_k\,c^0_i + c_k\,c^1_i$
 Two CPAs compute two possible results ($c_{in}$ = 0/1);
group carry-in $c_k$ selects the correct one afterwards.
 Variable group sizes are faster; use larger groups at
end (MSB). Balance delays $a_0 \to c_k$ and $a_k \to c^0_i$
 Partial CPA type is RCA, CSLA (multilevel CSLA) or
CLA.
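
The selection equations can be modelled behaviourally in Python. In this sketch integer addition stands in for the two partial CPAs of each group; the function name and the fixed `group` width are assumptions for illustration, and the model shows the selection logic only, not the timing benefit.

```python
def carry_select_add(a, b, c_in, n, group):
    """n-bit carry-select addition with fixed group width; returns (sum, c_out)."""
    s, c = 0, c_in
    for k in range(0, n, group):
        width = min(group, n - k)
        mask = (1 << width) - 1
        a_g, b_g = (a >> k) & mask, (b >> k) & mask
        t0 = a_g + b_g          # precomputed result for group carry-in 0
        t1 = a_g + b_g + 1      # precomputed result for group carry-in 1
        chosen = t1 if c else t0            # multiplexers driven by c_k
        s |= (chosen & mask) << k           # s_{i-1:k}
        c = chosen >> width                 # c_i
    return s, c

print(carry_select_add(200, 100, 1, 8, 4))  # (45, 1): 301 = 256 + 45
```

In hardware both `t0` and `t1` are available before the group carry-in arrives, so only the multiplexer delay lies on the carry path.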



Carry-Select Adder
 High speed-up at high hardware overhead.
– + MUX/bit + (CPA + MUX)/group
$A \approx 14n,\ T \approx 2.8n^{1/2},\ AT \approx 39n^{3/2}$



Carry-Select Adder



Carry-Select Adder



Linear Carry-Select



Square-Root Carry-Select



Delay Comparison



Carry-Increment Adder
 Partial CPA with fast $c_k \to c_i$ and $c_k \to s_{i-1:k}$
$s_{i-1:k} = s'_{i-1:k} + c_k,\ c_i = c'_i + P_{i-1:k}\,c_k$
$P_{i-1:k} = p_{i-1} p_{i-2} \dots p_k$ (group propagate)
 Result is incremented after addition if $c_k = 1$
 Variable group sizes are faster; use larger groups at
end (MSB). Balance delays $a_0 \to c_k$ and $a_k \to c'_i$
 Partial CPA could be RCA, CIA (multilevel CIA) or
CLA.
 High speed-up at medium hardware overhead
(+AND/bit + (incrementer + AND/OR)/group).
 Logic of CPA and incrementer could be merged.
Carry-Increment Adder



Carry-Increment Adder
 Example: gate-level schematic of carry-increment
adder (CIA)
– Only two different logic cells (bit-slices): IHA and
IFA



Carry-Increment Adder
 Factor initial PG and final XOR out of carry-select
[Figure: carry-increment PG diagram, bit positions 15 to 0 in 4-bit groups; intermediate group signals 13:12, 14:12, 15:12, 9:8, 10:8, 11:8, 5:4, 6:4, 7:4; outputs 15:0 down to 0:0]

$t_{increment} = t_{pg} + [(n - 1) + (k - 1)]\,t_{AO} + t_{xor}$



Variable Group Size
 Also buffer noncritical signals
[Figure: two carry-increment PG diagrams with variable group sizes, bit positions 15 to 0; group signals include 12:11, 13:11, 14:11, 15:11, 8:7, 9:7, 10:7, 5:4, 6:4, 3:2, and in the second diagram also 1:0, 3:0, 6:0; outputs 15:0 down to 0:0]



Conditional-Sum Adder
 Optimized multilevel CSLA with log n levels
 Correct sum bits $s^0_{i-1:k}$ or $s^1_{i-1:k}$ are conditionally
selected through log n levels of multiplexers.
 Bit groups of size $2^l$ at level l.
 Higher parallelism, more balanced signal paths.
 Highest speed-up at highest hardware overhead
(2 RCA + more than log n MUX/bit)



Conditional-Sum Adder



Conditional-Sum Adder



Conditional-Sum Adder



Carry-Lookahead Adder
 Carries look ahead before sum bits are computed
$c_0 = c'_0$
$c_1 = g_0 + p_0 c'_0$
$c_2 = g_1 + p_1 g_0 + p_1 p_0 c'_0$
$c_3 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c'_0$
$g'_3 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$
$p'_3 = p_3 p_2 p_1 p_0$
 Hierarchical arrangement using $\frac{1}{2} \log n$ levels: $(g'_3, p'_3)$
passed up, $c'_0$ passed down between levels.
 High speed-up at medium hardware overhead.


Carry-Lookahead Adder
A ≈ 14n , T ≈ 4 log n , AT ≈ 56n log n
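
The 4-bit lookahead equations can be written out literally and checked against ordinary addition. A minimal Python sketch; the function names are illustrative, bit lists are indexed LSB first, and only a single lookahead level is shown (the hierarchical arrangement would feed the returned group signals into a second instance of the same logic).

```python
def cla4(g, p, c0):
    """4-bit lookahead unit: returns carries [c0..c3] and group (G, P)."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    group_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    group_p = p[3] & p[2] & p[1] & p[0]
    return [c0, c1, c2, c3], (group_g, group_p)

def add4(a, b, c0=0):
    """4-bit addition using the lookahead unit; returns (sum, c_out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    carries, (group_g, group_p) = cla4(g, p, c0)
    s = sum((p[i] ^ carries[i]) << i for i in range(4))
    return s, group_g | (group_p & c0)

print(add4(9, 7))  # (0, 1): 16
```

Every carry is a two-level AND-OR function of the g, p, and c0 inputs, so no carry waits for another carry.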



Carry-Lookahead Adder



Carry-Lookahead Adder
 Carry-lookahead adder computes Gi:0 for many bits
in parallel.
 Uses higher-valency cells with more than two inputs.

[Figure: 16-bit carry-lookahead adder with 4-bit groups; each group produces G and P signals (G4:1/P4:1, G8:5/P8:5, G12:9/P12:9, G16:13/P16:13) that determine C4, C8, C12, Cout, and adders produce S4:1 to S16:13]



CLA PG Diagram



Carry-Lookahead



Lookahead Tree



Lookahead Tree



Higher-Valency Cells

[Figure: valency-4 black cell combining $(G_{i:k}, P_{i:k})$, $(G_{k-1:l}, P_{k-1:l})$, $(G_{l-1:m}, P_{l-1:m})$, $(G_{m-1:j}, P_{m-1:j})$ into $(G_{i:j}, P_{i:j})$]



Higher Valency PG Diagram

[Figure: higher-valency PG diagram, bit positions 16 to 0, outputs 16:0 down to 0:0]



Tree Adder
 If lookahead is good, lookahead across lookahead!
– Recursive lookahead gives O(log N) delay
 Many variations on tree adders



Parallel Prefix Adders
 Universal adder architecture comprising RCA, CIA,
CLA, and more (entire range of area-delay trade-offs
from slowest RCA to fastest CLA).
 Preprocessing, carry-lookahead, and postprocessing
step.
 Carries calculated using parallel-prefix algorithms
– High regularity: suitable for synthesis and layout
– High flexibility: special adders, other arithmetic
operations, exchangeable prefix algorithms.
– High performance: smallest and fastest adders



Parallel Prefix Adders
A ≈ 5n + 3A• , T = 4 + 2T•



Prefix Problem
 Inputs $(x_{n-1}, \dots, x_0)$, outputs $(y_{n-1}, \dots, y_0)$, associative
binary operator •
$(y_{n-1}, \dots, y_0) = (x_{n-1} \bullet \cdots \bullet x_0, \dots, x_1 \bullet x_0, x_0)$ or
$y_0 = x_0,\ y_i = x_i \bullet y_{i-1};\ i = 1, \dots, n-1$ (r.m.a)

 Associativity of • => tree structures for evaluation

$\underbrace{x_3 \bullet \underbrace{\big(x_2 \bullet \underbrace{(x_1 \bullet x_0)}_{y_1 = Y_{1:0}}\big)}_{y_2 = Y_{2:0}}}_{y_3 = Y_{3:0}} = \underbrace{\underbrace{(x_3 \bullet x_2)}_{Y_{3:2}} \bullet \underbrace{(x_1 \bullet x_0)}_{y_1 = Y_{1:0}}}_{y_3 = Y_{3:0}}$



Prefix Problem
 Group variables $Y^l_{i:k}$: covers bits $(x_k, \dots, x_i)$ at level l.
 Carry propagation is a prefix problem: $Y^l_{i:k} = (G^l_{i:k}, P^l_{i:k})$
$(G^0_{i:i}, P^0_{i:i}) = (g_i, p_i)$
$(G^l_{i:k}, P^l_{i:k}) = (G^{l-1}_{i:j+1}, P^{l-1}_{i:j+1}) \bullet (G^{l-1}_{j:k}, P^{l-1}_{j:k});\ k \le j \le i$
$= (G^{l-1}_{i:j+1} + P^{l-1}_{i:j+1} G^{l-1}_{j:k},\ P^{l-1}_{i:j+1} P^{l-1}_{j:k})$
$c_{i+1} = G^m_{i:0};\ i = 0, \dots, n-1,\ l = 1, \dots, m$
 Parallel-prefix algorithms:
– Multi-tree structures: T = O(n) -> O(log n)
– Sharing subtrees: A = O(n^2) -> O(n log n)
– Different algorithms trade area vs. delay. Also consider
wiring and fanout.
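
The claim that carry propagation is a prefix problem rests on the associativity of the • operator on (G, P) pairs. The Python sketch below checks that property exhaustively and evaluates the prefix serially; `dot`, `serial_prefix`, and `carries` are illustrative names, and a carry-in of 0 is assumed.

```python
from itertools import product

def dot(upper, lower):
    """The prefix operator on (G, P) pairs; upper covers the higher bits."""
    (g_hi, p_hi), (g_lo, p_lo) = upper, lower
    return g_hi | (p_hi & g_lo), p_hi & p_lo

# Associativity: x . (y . z) == (x . y) . z for all 64 combinations
pairs = list(product((0, 1), repeat=2))
assert all(dot(x, dot(y, z)) == dot(dot(x, y), z)
           for x in pairs for y in pairs for z in pairs)

def serial_prefix(x, op):
    """y_0 = x_0, y_i = x_i op y_{i-1}; x[0] is the least significant input."""
    y = [x[0]]
    for xi in x[1:]:
        y.append(op(xi, y[-1]))
    return y

def carries(a, b, n):
    """Carries c_1..c_n of a + b (carry-in 0), using c_{i+1} = G_{i:0}."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    return [g for g, _ in serial_prefix(gp, dot)]

print(carries(0b1011, 0b0110, 4))  # [0, 1, 1, 1]
```

Because the operator is associative, `serial_prefix` can be swapped for any tree-structured evaluation without changing the result.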



Prefix Algorithms
 Algorithms visualized by directed acyclic graphs
(DAG) with array structure (n bits x m levels).
 Graph vertex symbols

 Performance measures:
– A• : graph size (number of black nodes)
– T• : graph depth (number of black nodes on
critical path)



Prefix Algorithms
 Serial prefix algorithm (RCA)
A• = n −1 , T• = n −1 , FOmax = 2



Prefix Algorithms
 Sklansky parallel-prefix algorithm (PPA-SK)
– Tree-like collection, parallel redistribution of
carries
A• ≈ ½ n log n , T• = ⎡log n⎤ , FOmax ≈ ½ n



Sklansky

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

15:14 13:12 11:10 9:8 7:6 5:4 3:2 1:0

15:12 14:12 11:8 10:8 7:4 6:4 3:0 2:0

15:8 14:8 13:8 12:8

15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
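The Sklansky graph can be generated and measured with a short Python sketch. The node representation (i, j), meaning column i absorbs the group held in column j, and the function names are assumptions made for this illustration. Fanout is counted here as the number of black nodes that read the same column in one level.

```python
from collections import Counter

def sklansky(n: int):
    """Per-level lists of black nodes (i, j) of the Sklansky prefix graph."""
    levels = []
    l = 0
    while (1 << l) < n:
        # Columns whose bit l is set take the top column of the block below them.
        levels.append([(i, ((i >> l) << l) - 1) for i in range(n) if (i >> l) & 1])
        l += 1
    return levels

def analyze(n: int, levels):
    """Return (size, depth, max fanout) and check that every column ends as Y[i:0]."""
    lo = list(range(n))               # column i currently holds group Y[i:lo[i]]
    depth = [0] * n
    max_fanout = 0
    for nodes in levels:
        new_lo, new_depth = lo[:], depth[:]
        for i, j in nodes:
            assert lo[i] == j + 1, "combined groups must be adjacent"
            new_lo[i] = lo[j]
            new_depth[i] = max(depth[i], depth[j]) + 1
        max_fanout = max(max_fanout, max(Counter(j for _, j in nodes).values()))
        lo, depth = new_lo, new_depth
    assert all(v == 0 for v in lo)
    return sum(len(nodes) for nodes in levels), max(depth), max_fanout

print(analyze(16, sklansky(16)))
```

For n = 16 this gives 32 black nodes, depth 4, and 8 nodes driven by column 7 in the last level, in line with ½ n log n, ⎡log n⎤, and ½ n.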



Prefix Algorithms
 Brent-Kung parallel-prefix algorithm (PPA-BK)
– Traditional CLA is PPA-BK with 4-bit groups
– Tree-like redistribution of carries (fan-out tree)
A• = 2n − ⎡log n⎤ − 2 , T• = 2⎡log n⎤ − 2
FOmax ≈ log n



Brent-Kung
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

15:14 13:12 11:10 9:8 7:6 5:4 3:2 1:0

15:12 11:8 7:4 3:0

15:8 7:0

11:0

13:0 9:0 5:0

15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
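A Python sketch that builds the Brent-Kung graph and checks its size and depth, assuming n is a power of two. The (i, j) node representation and the function names are illustrative assumptions. Depth is measured as the longest chain of black nodes, matching the definition of T•.

```python
def brent_kung(n: int):
    """Per-level lists of black nodes (i, j): column i absorbs column j's group."""
    levels = []
    d = 1
    while d < n:                       # up-sweep: tree-like collection
        levels.append([(i, i - d) for i in range(2 * d - 1, n, 2 * d)])
        d *= 2
    d = n // 4
    while d >= 1:                      # down-sweep: tree-like redistribution
        levels.append([(i, i - d) for i in range(3 * d - 1, n, 2 * d)])
        d //= 2
    return levels

def measure(n: int, levels):
    """Return (size, depth) and check that every column ends as Y[i:0]."""
    lo = list(range(n))                # column i currently holds group Y[i:lo[i]]
    depth = [0] * n
    for nodes in levels:
        new_lo, new_depth = lo[:], depth[:]
        for i, j in nodes:
            assert lo[i] == j + 1, "combined groups must be adjacent"
            new_lo[i] = lo[j]
            new_depth[i] = max(depth[i], depth[j]) + 1
        lo, depth = new_lo, new_depth
    assert all(v == 0 for v in lo)
    return sum(len(nodes) for nodes in levels), max(depth)

print(measure(16, brent_kung(16)), len(brent_kung(16)))
```

For n = 16 the sketch yields 26 black nodes and a longest path of 6, agreeing with A• = 2n − ⎡log n⎤ − 2 and T• = 2⎡log n⎤ − 2. The graph occupies 7 rows, because the node 15:0 sits alone in the last up-sweep row.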



Prefix Algorithms
 Kogge-Stone parallel-prefix algorithm (PPA-KS)
– very high wiring requirements
A• ≈ n log n − n +1 , T• = ⎡log n⎤ , FOmax = 2



Kogge-Stone

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

15:14 14:13 13:12 12:11 11:10 10:9 9:8 8:7 7:6 6:5 5:4 4:3 3:2 2:1 1:0

15:12 14:11 13:10 12:9 11:8 10:7 9:6 8:5 7:4 6:3 5:2 4:1 3:0 2:0

15:8 14:7 13:6 12:5 11:4 10:3 9:2 8:1 7:0 6:0 5:0 4:0

15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0
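A Python sketch of the Kogge-Stone graph, again with hypothetical names and the (i, j) node convention (column i absorbs the group in column j). The track count is a simple proxy chosen for this illustration: the largest number of distinct source wires crossing any boundary between two adjacent columns within one level.

```python
def kogge_stone(n: int):
    """Per-level lists of black nodes (i, j) with stride 1, 2, 4, ..."""
    levels = []
    d = 1
    while d < n:
        levels.append([(i, i - d) for i in range(d, n)])
        d *= 2
    return levels

def size_and_coverage(n: int, levels) -> int:
    """Return the number of black nodes and check that every column ends as Y[i:0]."""
    lo = list(range(n))               # column i currently holds group Y[i:lo[i]]
    for nodes in levels:
        new_lo = lo[:]
        for i, j in nodes:
            assert lo[i] == j + 1, "combined groups must be adjacent"
            new_lo[i] = lo[j]
        lo = new_lo
    assert all(v == 0 for v in lo)
    return sum(len(nodes) for nodes in levels)

def max_tracks(n: int, levels) -> int:
    """Largest number of distinct source wires crossing one column boundary in a level."""
    worst = 0
    for nodes in levels:
        for b in range(n - 1):        # boundary between columns b and b+1
            worst = max(worst, len({j for i, j in nodes if j <= b < i}))
    return worst

ks = kogge_stone(16)
print(size_and_coverage(16, ks), len(ks), max_tracks(16, ks))
```

For n = 16 this gives 49 black nodes (n log n − n + 1), 4 levels, and 8 parallel wires in the last level, which illustrates the high wiring requirement.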



Prefix Algorithms
 Carry-increment parallel-prefix algorithm
A• ≈ 2n − 1.4 n^(1/2) , T• ≈ 1.4 n^(1/2) , FOmax ≈ 1.4 n^(1/2)



Prefix Algorithms
 Mixed serial/parallel-prefix algorithm (RCA+PPA)
– Linear size-depth trade-off using parameter k:

0 ≤ k ≤ n − 2⎡log n⎤+ 2

– k = 0 : serial prefix graph
– k = n − 2⎡log n⎤ + 1 : Brent-Kung parallel-prefix
graph
– Fills the gap between RCA and PPA-BK (CLA) in
steps of single •-operations.





Prefix Algorithms
 Example: 4-bit PPA-SK
– Efficient AND-OR-prefix circuit for the generate
and AND-prefix circuit for the propagate signals
– Optimization: alternate AOI and OAI gates for the
generate signals and NAND and NOR gates for the propagate
signals (inverting gates are smaller and faster).
– Can also be realized using two MUX-prefix
circuits





Prefix Algorithms
 Prefix adders can be designed by hand or synthesized
by computer.
 Starting from a serial structure, one can use
compression rules and expansion rules to obtain
new graphs.
 Can generate all previous graphs except PPA-KS.
 Universal adder synthesis approach.



Tree Adder Taxonomy
 Ideal N-bit tree adder would have
– L = log2 N logic levels
– Fanout never exceeding 2
– No more than one wiring track between levels
 Describe adder with 3-D taxonomy (l, f, t)
– Logic levels: L + l
– Fanout: 2^f + 1
– Wiring tracks: 2^t
 Known tree adders sit on plane defined by
l + f + t = L − 1
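The taxonomy can be evaluated numerically with a small Python sketch. It assumes the reading L = log2 N, fanout 2^f + 1, and 2^t wiring tracks; the function name tree_adder_metrics is hypothetical.

```python
from math import log2

def tree_adder_metrics(n_bits: int, l: int, f: int, t: int) -> dict:
    """Logic levels, maximum fanout, and wiring tracks for a taxonomy point (l, f, t)."""
    L = int(log2(n_bits))
    assert l + f + t == L - 1, "known tree adders sit on the plane l + f + t = L - 1"
    return {"logic_levels": L + l, "max_fanout": 2 ** f + 1, "wire_tracks": 2 ** t}

# The three corner points of the plane for N = 16 (L = 4).
for name, point in [("Brent-Kung", (3, 0, 0)),
                    ("Sklansky", (0, 3, 0)),
                    ("Kogge-Stone", (0, 0, 3))]:
    print(name, tree_adder_metrics(16, *point))
```

Each corner pays for its ideal behavior in two dimensions with the worst case in the third: 7 logic levels for Brent-Kung, fanout 9 for Sklansky, and 8 wiring tracks for Kogge-Stone.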



Tree Adder Taxonomy
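[Figure: 3-D taxonomy plot. Axis ticks, with the resulting value in parentheses: l (Logic Levels) 0 (4), 1 (5), 2 (6), 3 (7); f (Fanout) 0 (2), 1 (3), 2 (5), 3 (9); t (Wire Tracks) 0 (1), 1 (2), 2 (4), 3 (8). Brent-Kung sits at the end of the l axis, Sklansky at the end of the f axis, and Kogge-Stone at the end of the t axis.]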



Han-Carlson
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

15:14 13:12 11:10 9:8 7:6 5:4 3:2 1:0

15:12 13:10 11:8 9:6 7:4 5:2 3:0

15:8 13:6 11:4 9:2 7:0 5:0

15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0



Knowles [2, 1, 1, 1]

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

15:14 14:13 13:12 12:11 11:10 10:9 9:8 8:7 7:6 6:5 5:4 4:3 3:2 2:1 1:0

15:12 14:11 13:10 12:9 11:8 10:7 9:6 8:5 7:4 6:3 5:2 4:1 3:0 2:0

15:8 14:7 13:6 12:5 11:4 10:3 9:2 8:1 7:0 6:0 5:0 4:0

15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0



Ladner-Fischer
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

15:14 13:12 11:10 9:8 7:6 5:4 3:2 1:0

15:12 11:8 7:4 3:0

15:8 13:8 7:0 5:0

15:8 13:0 11:0 9:0

15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0



Taxonomy Revisited
[Figure: the taxonomy plot with axes l (Logic Levels), f (Fanout), and t (Wire Tracks), surrounded by the 16-bit prefix graphs of (a) Brent-Kung, (b) Sklansky, (c) Kogge-Stone, (d) Han-Carlson, (e) Knowles [2,1,1,1], and (f) Ladner-Fischer. Points marked on the plot: Brent-Kung, Sklansky, Kogge-Stone, Ladner-Fischer, Han-Carlson, Knowles [2,1,1,1], Knowles [4,2,1,1], and New (1,1,1).]



More Adder Issues
 Multilevel adders
– Multilevel versions of adders possible
• CSKA, CSLA, CIA
 Hybrid adders
– Arbitrary combination of speed-up techniques possible.
– Often used combinations: CLA – CSLA
 Transistor level adders
– Influence of logic styles (dynamic logic, pass transistor logic)
– Efficient transistor level implementation of ripple-carry chains
(Manchester chain)
– Combinations of speed-up techniques make sense.
• Much higher design effort
– Many efficient implementations exist in the literature.
 Higher valency (radix) also possible.



More Adder Issues
 Higher valency is a poor choice in static CMOS logic
since each stage has higher delay.
 However, if the stages are built using domino logic, it
could prove to be an advantage.
 Nodes with large fanouts or long wires could use
buffers.
 The prefix trees can also be internally pipelined.



Transistor Level



Higher Valency Adders



Sparse Trees
 Building a prefix tree to compute carries in every bit
is expensive in terms of power.
 An alternative is to compute carries into short groups
such as s = 2,3,8, or 16 bits.
 Meanwhile, pairs of s-bit adders precompute the
sums assuming both carries-in of 0 and 1 to each
group.
 It is a hybrid between a prefix adder and carry select
adder.
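A behavioral Python sketch of the sparse-tree idea, assuming carry-in 0 and a word length n that is a multiple of s. The function name is hypothetical. The loop computes the group carries one after another for clarity; in hardware those carries would come from the sparse prefix tree, in parallel with the two precomputed sums per group.

```python
def sparse_tree_add(a: int, b: int, n: int, s: int) -> int:
    """n-bit add where carries exist only at s-bit group boundaries (carry-select groups)."""
    assert n % s == 0, "sketch assumes whole groups"
    mask = (1 << s) - 1
    result, carry = 0, 0                   # carry: carry into the current group
    for base in range(0, n, s):
        ga = (a >> base) & mask
        gb = (b >> base) & mask
        sum0 = ga + gb                     # precomputed assuming carry-in 0
        sum1 = ga + gb + 1                 # precomputed assuming carry-in 1
        chosen = sum1 if carry else sum0   # the select multiplexer
        result |= (chosen & mask) << base
        group_g = sum0 >> s                # group generates a carry by itself
        group_p = int((ga ^ gb) == mask)   # group propagates an incoming carry
        carry = group_g | (group_p & carry)
    return result | (carry << n)

print(sparse_tree_add(200, 100, 8, 4))
```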



Valency-3 BK Adder
 Sparse tree adder with s = 3



Carry-Select Implementation



Sparse Tree Adders
 Intel Valency-2 Sklansky sparse tree adder with s=4



Sparse Tree Adders
 Valency-3 Kogge-Stone sparse tree adder with s=3



Ling Adders
 Ling discovered a technique to remove one series
transistor from the critical group generate path at the
expense of another XOR gate in the sum
precomputation.
 Define a pseudo-generate $H_{i:j} = G_i + G_{i-1:j}$. This is a
simpler computation.
$K_i H_{i:j} = K_i G_i + K_i G_{i-1:j} = G_i + K_i G_{i-1:j} = G_{i:j}$
 Define a pseudo-propagate signal I that is a shifted
version of propagate: $I_{i:j} = K_{i-1:j-1}$

$H_{i:j} = H_{i:k} + I_{i:k} H_{k-1:j}$
$I_{i:j} = I_{i:k} I_{k-1:j}$



Ling Adders
 Finally, the sums are computed by
$S_i = P_i \oplus (K_{i-1} H_{i-1:0})$
$S_i = H_{i-1:0} [P_i \oplus K_{i-1}] + \overline{H_{i-1:0}} [P_i]$
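The Ling relations can be checked bit by bit with a Python sketch. The signal definitions are assumptions of this sketch: G_i = A_i·B_i, K_i = A_i + B_i (so that K_i·G_i = G_i), P_i = A_i ⊕ B_i, and carry-in 0. The multiplexer form takes P_i ⊕ K_{i-1} when H_{i-1:0} is 1 and P_i otherwise. The function name is hypothetical.

```python
def ling_sum_bits(a: int, b: int, n: int):
    """Return the n sum bits (LSB first) computed from pseudo-generate signals."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    k = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, H = [], []                      # G[i] = G_{i:0}, H[i] = H_{i:0}
    for i in range(n):
        g_below = G[i - 1] if i > 0 else 0
        H.append(g[i] | g_below)               # H_{i:0} = G_i + G_{i-1:0}
        G.append(g[i] | (k[i] & g_below))      # conventional group generate
        assert (k[i] & H[i]) == G[i]           # K_i H_{i:0} = G_{i:0}
    s = [p[0]]                                 # no carry into bit 0
    for i in range(1, n):
        direct = p[i] ^ (k[i - 1] & H[i - 1])
        muxed = (p[i] ^ k[i - 1]) if H[i - 1] else p[i]
        assert direct == muxed                 # both sum forms agree
        s.append(direct)
    return s

print(ling_sum_bits(0b1011, 0b0110, 4))
```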





Comparison
 Standard-cell implementation, 0.8 µm technology





Summary
Adder architectures offer area / power / delay tradeoffs.
Choose the best one for your application.
Architecture       Classification   Logic Levels   Max Fanout   Tracks   Cells
Carry-Ripple                        N-1            1            1        N
Carry-Skip (n=4)                    N/4 + 5        2            1        1.25N
Carry-Inc. (n=4)                    N/4 + 2        4            1        2N
Brent-Kung         (L-1, 0, 0)      2log2N – 1     2            1        2N
Sklansky           (0, L-1, 0)      log2N          N/2 + 1      1        0.5 Nlog2N
Kogge-Stone        (0, 0, L-1)      log2N          2            N/2      Nlog2N
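To get a feel for the trade-offs, the table's formulas can be evaluated for a concrete word length. The Python sketch below simply transcribes the table, assuming N is a power of two; the function name summary_row is hypothetical.

```python
from math import log2

def summary_row(arch: str, N: int):
    """(logic levels, max fanout, tracks, cells) taken from the summary table."""
    L = int(log2(N))
    table = {
        "Carry-Ripple": (N - 1, 1, 1, N),
        "Carry-Skip (n=4)": (N // 4 + 5, 2, 1, 1.25 * N),
        "Carry-Inc. (n=4)": (N // 4 + 2, 4, 1, 2 * N),
        "Brent-Kung": (2 * L - 1, 2, 1, 2 * N),
        "Sklansky": (L, N // 2 + 1, 1, 0.5 * N * L),
        "Kogge-Stone": (L, 2, N // 2, N * L),
    }
    return table[arch]

for arch in ("Carry-Ripple", "Brent-Kung", "Sklansky", "Kogge-Stone"):
    print(arch, summary_row(arch, 64))
```

At N = 64 the three tree adders need 11, 6, and 6 logic levels, against 63 for carry-ripple, but Sklansky pays with a fanout of 33 and Kogge-Stone with 32 wiring tracks.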



E vs Delay Trade-off



E vs Delay Tradeoff

90 nm, 64-bit domino Kogge-Stone Ling adder with various valency and s


Area vs Delay

Synthesized Adders

