EE224 Handout

Fast Adders
Madhav P. Desai
February 7, 2018

1 The problem
We are given two binary numbers
$a_{n-1} a_{n-2} \ldots a_0$
$b_{n-1} b_{n-2} \ldots b_0$
and we wish to compute their sum (modulo $2^n$)
$s_{n-1} s_{n-2} \ldots s_0$

2 A simple, but slow implementation


It is easy to see that the addition algorithm can be translated to the following
Boolean formulas.
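$s_0 = a_0 \oplus b_0$
$c_1 = a_0 \cdot b_0$
$s_1 = a_1 \oplus b_1 \oplus c_1$
$c_2 = (a_1 \oplus b_1) \cdot c_1 + a_1 \cdot b_1$
$s_2 = a_2 \oplus b_2 \oplus c_2$
$c_3 = (a_2 \oplus b_2) \cdot c_2 + a_2 \cdot b_2$
...
$s_{n-1} = a_{n-1} \oplus b_{n-1} \oplus c_{n-1}$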

Figure 1: Full Adder (FA), with inputs a, b, cin, internal signals p, r, g,
and outputs sum, cout.

In these formulas, $c_i$ is the carry into bit position $i$, that is, the
carry generated out of bit position $i-1$.
We design a full-adder (FA) which has inputs a, b, cin and outputs sum, cout
and implements the formulas
$sum = a \oplus b \oplus cin$
$cout = (a \oplus b) \cdot cin + a \cdot b$
which can be implemented by the circuit shown in Figure 1. Note the
following: the full-adder can be implemented with five logic gates (two XOR,
two AND and one OR), and if we assume that the delay of each gate is 1 unit,
then the maximum delay through the full-adder is 3 units (the path a, p, r,
cout) and the minimum delay through the full-adder is 1 unit (the path cin,
sum).
Using the full-adder, we can construct a simple ripple-carry adder as
shown in Figure 2. Note the following: an n-bit adder constructed in this
fashion needs 5n logic gates, and the maximum delay through the adder is
determined by the path $a_0, c_1, c_2, \ldots, c_{n-1}, s_{n-1}$. With unit
gate delays, producing $c_1$ takes 3 units, each later carry adds 2 units
(the path cin, r, cout), and the final XOR adds 1 unit, so this path takes
about 2n units. The cost of the adder measured in number of gates needed is
O(n) and the delay of the adder is also O(n).
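
The full-adder and the ripple-carry construction can be modelled in a few lines of Python. This is only a behavioural sketch, not part of the handout's design: the function names and the little-endian bit-list representation (index 0 holds the least significant bit) are assumptions made for illustration, and the signal names p, g, r follow Figure 1.

```python
def full_adder(a, b, cin):
    """One-bit full adder built from five gates (2 XOR, 2 AND, 1 OR)."""
    p = a ^ b        # XOR gate
    g = a & b        # AND gate
    r = p & cin      # AND gate
    s = p ^ cin      # XOR gate: sum output
    cout = r | g     # OR gate: carry output
    return s, cout


def ripple_carry_add(a_bits, b_bits):
    """Add two equal-length little-endian bit lists, modulo 2**n."""
    carry = 0                      # carry into bit 0 is 0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry ripples to the next bit
        sum_bits.append(s)
    return sum_bits                # the carry out of the top bit is dropped


def to_bits(x, n):
    """Little-endian list of the n low-order bits of x (helper for the sketch)."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `from_bits(ripple_carry_add(to_bits(11, 4), to_bits(6, 4)))` evaluates to 1, since 11 + 6 = 17 and 17 modulo 16 is 1. The loop makes the source of the O(n) delay visible: each iteration needs the carry produced by the previous one.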

3 Faster addition
Figure 2: Ripple-carry Adder (a chain of n full-adders; the FA at bit i
takes $a_i$, $b_i$ and the carry $c_i$, with the carry into bit 0 tied to 0,
and produces $s_i$ and the next carry).

We would like to reduce the time required by an n-bit adder to O(log n) if
possible (Why log n? Why not smaller?). The chief difficulty is the
propagation of the carry from the lowest bit position to the highest bit
position. Let's see if we can do this faster.
Introduce the following intermediate signals

$p_i = a_i \oplus b_i$
$g_i = a_i \cdot b_i$

It is easy to convince ourselves that

$c_{i+1} = p_i \cdot c_i + g_i$

which is why $p_i$ is called the propagate signal at position $i$ and $g_i$
is called the generate signal at position $i$.
In terms of $p_i$, $g_i$ we can then write

$c_{i+1} = p_i \cdot c_i + g_i$
$= p_i \cdot (p_{i-1} \cdot c_{i-1} + g_{i-1}) + g_i$
$= (p_i \cdot p_{i-1}) \cdot c_{i-1} + (p_i \cdot g_{i-1} + g_i)$

Continuing to substitute for $c_{i-1}$ until we reach $c_0$, we find that

$c_{i+1} = (p_i \cdot p_{i-1} \cdot p_{i-2} \cdots p_0) \cdot c_0 + (g_i + p_i \cdot g_{i-1} + p_i \cdot p_{i-1} \cdot g_{i-2} + \ldots + p_i \cdot p_{i-1} \cdots p_1 \cdot g_0)$

which can be rewritten as

$c_{i+1} = bp_i \cdot c_0 + bg_i$    (1)

where
$bp_i = (p_i \cdot p_{i-1} \cdot p_{i-2} \cdots p_0)$
$bg_i = (g_i + p_i \cdot g_{i-1} + p_i \cdot p_{i-1} \cdot g_{i-2} + \ldots + p_i \cdot p_{i-1} \cdots p_1 \cdot g_0)$
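
As a sanity check on Equation 1, the sketch below evaluates the expanded sum-of-products for every carry and compares it with the carries obtained by rippling. It is a behavioural model under assumed names (`lookahead_carries`, `ripple_carries`, `bits`) and an assumed little-endian bit-list representation; the nested loops stand in for gates that would all operate in parallel in hardware.

```python
def bits(x, n):
    """Little-endian list of the n low-order bits of x (helper for the sketch)."""
    return [(x >> i) & 1 for i in range(n)]


def lookahead_carries(a_bits, b_bits, c0=0):
    """Return [c_0, c_1, ..., c_n], each c_{i+1} computed directly from Equation 1."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate signals
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate signals
    carries = [c0]
    for i in range(len(p)):
        # bp_i = p_i . p_{i-1} ... p_0
        bp = 1
        for j in range(i + 1):
            bp &= p[j]
        # bg_i = g_i + p_i.g_{i-1} + ... + p_i.p_{i-1}...p_1.g_0
        bg = 0
        for j in range(i + 1):
            term = g[j]
            for m in range(j + 1, i + 1):
                term &= p[m]
            bg |= term
        carries.append((bp & c0) | bg)            # Equation 1
    return carries


def ripple_carries(a_bits, b_bits, c0=0):
    """Reference: the same carries from c_{i+1} = p_i.c_i + g_i, one bit at a time."""
    carries = [c0]
    for a, b in zip(a_bits, b_bits):
        carries.append(((a ^ b) & carries[-1]) | (a & b))
    return carries
```

For instance, `lookahead_carries(bits(0b0111, 4), bits(0b0001, 4))` returns `[0, 1, 1, 1, 0]`, the same list that `ripple_carries` produces. Note that no carry in `lookahead_carries` depends on another carry, only on the p, g signals and on c0.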
Thus, it follows that if we can calculate $bp_i$ and $bg_i$ quickly, then we
can do fast addition. Using two-input gates (each with delay 1), it is easy
to see that:
• $bp_i$ can be calculated in O(log i) units of time, using O(i) gates.
• $bg_i$ can be calculated in O(log i) time, but using O(i^2) gates.
Thus, we can construct a fast adder by calculating the $bp_i$ and $bg_i$
signals in O(log i) time. The computation of the carries then requires one
additional unit of delay and the final sum requires an additional unit of
delay. Thus, addition is performed in O(log n) time. But there is a catch:
summed over all n bit positions, we will need O(n^3) gates, which is not
affordable.
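
To see where the cubic growth comes from, the sketch below counts two-input gates under one simple assumption that the handout does not state: every product term of every $bg_i$ is built separately, with no gates shared between terms or between bit positions. The function names are illustrative only.

```python
def gates_for_bg(i):
    """Two-input gates for bg_i when each product term is built on its own."""
    # The term ending in g_j is a product of (i - j) p's and one g,
    # which needs (i - j) two-input AND gates.
    and_gates = sum(i - j for j in range(i + 1))
    or_gates = i                  # OR together the i + 1 product terms
    return and_gates + or_gates


def gates_for_bp(i):
    """Two-input AND gates for bp_i = p_i . p_{i-1} ... p_0."""
    return i


def total_lookahead_gates(n):
    """Gates for all bp_i and bg_i, i = 0 .. n-1 (p, g and sum gates excluded)."""
    return sum(gates_for_bg(i) + gates_for_bp(i) for i in range(n))
```

Under this assumption the total works out to n(n-1)(n+7)/6, which grows as n^3. For n = 16 that is 920 gates for the lookahead signals alone, compared with 5n = 80 gates for the whole ripple-carry adder.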

4 A practical fast adder


A practical carry-lookahead adder (CLA) is constructed using the
observations made in the previous section, but using far fewer gates (see
Figure 3).
The different blocks in the CLA can be described as follows:
• The PG-block calculates $p_i$, $g_i$ from $a_i$, $b_i$ for
i = 0, 1, 2, . . . , n − 1. This requires an XOR gate and an AND gate per
bit and takes 1 unit of delay.
• The BP,BG block combines the $p_i$ and $g_i$ in blocks of size k. For the
lowest block, the signals $bp_k$, $bg_k$ are calculated as
$bp_k = (p_{k-1} \cdot p_{k-2} \cdot p_{k-3} \cdots p_0)$
$bg_k = (g_{k-1} + p_{k-1} \cdot g_{k-2} + p_{k-1} \cdot p_{k-2} \cdot g_{k-3} + \ldots + p_{k-1} \cdot p_{k-2} \cdots p_1 \cdot g_0)$
and similarly for the higher blocks. Using log-depth circuits, this can be
done in O(log k) time, with O(k^2) gates.
• The Block-carry stage calculates the carries at bits k, 2k etc. using
the $bp_k, bp_{2k}, \ldots$ and $bg_k, bg_{2k}, \ldots$ signals calculated
in the previous stage. This can be done using Equation 1, in O(log(n/k))
time, using O((n/k)^2) gates.

Figure 3: Carry-lookahead Adder. Stages, from inputs to outputs: the P,G
block calculates $p_i$, $g_i$ from a, b taken in groups of k bits (constant
delay, O(n) gates); the BP, BG blocks calculate bp, bg at the
most-significant bit of each group (O(log k) delay, each has O(k^2) gates);
the Block Carry Stage calculates the carry-in at each group, with
$c_0 = 0$ (O(log(n/k)) delay, O((n/k)^2) gates); the carry-chains calculate
the carries within each group from the incoming carry to the block (O(k)
delay, each has O(k) gates); the final XOR combines the p and c values
(constant delay, O(k) gates).


• The carry-chains calculate the carries within each group of k bits using
the block carry-in calculated in the previous stage. This can be done
in a rippled manner and takes O(k) time, with O(k) gates.

• Finally, in the XOR stage, each $p_i$ is XOR-ed with the corresponding
carry $c_i$ computed in the carry chain to produce the sum bit $s_i$. This
takes n gates and is done in 1 unit of time.
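
The five stages listed above can be strung together in a small behavioural model. The sketch below is an illustration under stated assumptions, not the gate-level design: the names `block_cla_add`, `to_bits` and `from_bits` are hypothetical, bits are held in little-endian lists, and the block signals are accumulated in sequential loops, so the model reproduces what each stage computes but says nothing about gate counts or delays.

```python
def to_bits(x, n):
    """Little-endian list of the n low-order bits of x (helper for the sketch)."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def block_cla_add(a_bits, b_bits, k):
    """Add two n-bit little-endian bit lists (modulo 2**n) using groups of k bits."""
    n = len(a_bits)
    if n % k != 0:
        raise ValueError("n must be a multiple of the block size k")

    # PG block: per-bit propagate and generate.
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]

    # BP, BG block: one (bp, bg) pair per group of k bits.
    block_bp, block_bg = [], []
    for start in range(0, n, k):
        bp, bg = 1, 0
        for i in range(start, start + k):
            bg = g[i] | (p[i] & bg)   # generate out of bits start..i
            bp = bp & p[i]            # propagate across bits start..i
        block_bp.append(bp)
        block_bg.append(bg)

    # Block-carry stage: carry into each group, starting from c0 = 0.
    block_cin = [0]
    for bp, bg in zip(block_bp, block_bg):
        block_cin.append((bp & block_cin[-1]) | bg)

    # Carry chains and final XOR stage, one group at a time.
    sum_bits = []
    for blk, start in enumerate(range(0, n, k)):
        c = block_cin[blk]
        for i in range(start, start + k):
            sum_bits.append(p[i] ^ c)     # final XOR: s_i = p_i xor c_i
            c = (p[i] & c) | g[i]         # ripple within the group
    return sum_bits
```

For example, `from_bits(block_cla_add(to_bits(0x0FFF, 16), to_bits(1, 16), 4))` evaluates to `0x1000`: the carry generated in the lowest group is passed through the next two groups by their block propagate signals.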

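Putting it all together, if $k = \sqrt{n}$, we see that we can compute an
addition in close to O(log n) time (for small values of n), with O(n) gates!
The qualification matters because the carry chains ripple across k bits and
so contribute O($\sqrt{n}$) delay.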

5 Homework
Assume that you have at your disposal two-input XOR, AND, OR gates and
NOT gates. Let n = 16 and k = 4. Complete the design of each block in
the CLA of Figure 3. How many gates does the final circuit require? If the
delay of each gate is 1 unit, what is the maximum delay through the CLA?

6 Further Reading
There are many other ways of building fast adders: carry-select addition,
carry-skip addition, Ling adders, and hybrid strategies. Those who are
interested can consult the book [1], which is an excellent resource for
arithmetic circuits.

References
[1] I. Koren, Computer Arithmetic Algorithms, second edition, Universities
Press, Hyderabad, 2002.
