DICD-Fall-2024-Lecture-09-Arithmetic-Circuits

This lecture covers the design of arithmetic circuits, focusing on basic and fast adders and multipliers. It discusses various types of adders, including full adders, ripple carry adders, and carry-skip adders, along with their implementations and performance comparisons. The lecture also touches on subtraction using two's complement and introduces advanced adder designs like carry lookahead and tree adders.
EE-808 Fall 2024

Digital Integrated Circuit Design

Lecture # 09
Arithmetic Circuits

Muhammad Imran
Acknowledgement

• Content from the following resources has been used in these lectures:
  - Digital Integrated Circuits, Adam Teman, BIU
  - Jan M. Rabaey, Digital Integrated Circuits, 2nd Ed.
Contents

• Introduction
• Basic Adders
• Fast Adders
• Basic Multipliers
• Fast Multipliers
Data Path and Functional Units
Functional Units

• A complex processor may have multiple functional units working in parallel:

Source: Kuchuk, 2003


Basic Adders
Serial Addition

• At time i:
  - Read $a_i$ and $b_i$
  - Produce $s_i$ and $c_{i+1}$
  - Internal state stores $c_i$
  - Carry bit $c_0$ is set as $c_{in}$

Source: Gate Overflow
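The bit-serial procedure can be sketched in a few lines of Python. This is only an illustrative model: the function name `serial_add` and the LSB-first list representation are assumptions of this sketch, not part of the lecture.

```python
def serial_add(a_bits, b_bits, c_in=0):
    """Bit-serial addition; a_bits and b_bits are LSB-first lists of 0/1."""
    carry = c_in                      # internal state, c_0 = c_in
    s_bits = []
    for a_i, b_i in zip(a_bits, b_bits):
        s_bits.append(a_i ^ b_i ^ carry)               # s_i
        carry = (a_i & b_i) | (carry & (a_i ^ b_i))    # c_{i+1}
    return s_bits, carry


# 3 + 5 with 4-bit operands, written LSB first
s, c_out = serial_add([1, 1, 0, 0], [1, 0, 1, 0])
```

One bit of each operand is consumed per step, so an N-bit addition takes N steps with a single full adder and one bit of state.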


Basic Addition Unit – Full Adder

X  Y  Cin | S  Cout
0  0  0   | 0  0
0  0  1   | 1  0
0  1  0   | 1  0
0  1  1   | 0  1
1  0  0   | 1  0
1  0  1   | 0  1
1  1  0   | 0  1
1  1  1   | 1  1

$S = x \oplus y \oplus C_{in} = P \oplus C_{in}$
$C_{out} = xy + xC_{in} + yC_{in} = G + P \cdot C_{in}$

Kill: $K = \bar{x} \cdot \bar{y}$
Generate: $G = x \cdot y$
Propagate: $P = x \oplus y$

Cout=MAJ(X,Y,Cin)
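As a quick check of the truth table, the following Python sketch computes the sum and carry from generate and propagate signals and compares them against plain integer addition. The function name `full_adder` and the returned tuple layout are choices made for this example only.

```python
from itertools import product


def full_adder(x, y, c_in):
    """Return (S, Cout, (G, P, K)) for one-bit inputs."""
    g = x & y                  # generate
    p = x ^ y                  # propagate
    k = (1 - x) & (1 - y)      # kill
    s = p ^ c_in               # S = P xor Cin
    c_out = g | (p & c_in)     # Cout = G + P*Cin
    return s, c_out, (g, p, k)


# Compare against arithmetic for all eight input combinations.
for x, y, c in product((0, 1), repeat=3):
    s, c_out, _ = full_adder(x, y, c)
    assert 2 * c_out + s == x + y + c
    assert c_out == int(x + y + c >= 2)   # majority function
```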
Full-Adder Implementation

• A full adder is therefore a majority gate and a 3-input XOR:

Total: 32 Transistors

Source: CMOS VLSI Design


Ripple Carry Adder

• The Cout output of the full adder is clearly on the critical path.
• Can we exploit this to improve the design?

$S = A \oplus B \oplus C_{in} = ABC_{in} + (A + B + C_{in})\,\overline{C_{out}}$

$t_{adder} = (N-1)\,t_{carry} + t_{sum}$, so $t_{pd} = O(N)$
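A minimal behavioural sketch of an N-bit ripple-carry adder follows, together with the delay expression from this slide. The function names and the example gate delays are hypothetical; the assertion inside `full_adder` checks the sum identity that reuses the inverted carry.

```python
def full_adder(a, b, c_in):
    c_out = (a & b) | (a & c_in) | (b & c_in)
    s = a ^ b ^ c_in
    # Same sum, expressed through the inverted carry output.
    assert s == (a & b & c_in) | ((a | b | c_in) & (1 - c_out))
    return s, c_out


def ripple_carry_add(a, b, n, c_in=0):
    """Add two n-bit integers by chaining n full adders."""
    carry, total = c_in, 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


def ripple_delay(n, t_carry, t_sum):
    """t_adder = (N-1) * t_carry + t_sum."""
    return (n - 1) * t_carry + t_sum
```

For example, `ripple_carry_add(11, 6, 4)` yields sum bits `0001` with a carry out of 1, and the delay grows linearly with the word length.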

Source: CMOS VLSI Design


Full Adder Implementation

[Figure: complementary static CMOS full-adder schematic with inputs A, B, Ci, internal node X, and outputs S and Co]

$C_{out} = AB + AC_i + BC_i$
$S = ABC_{in} + (A + B + C_{in})\,\overline{C_{out}}$

28 Transistors
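Full Adder Implementation

[Figure: full-adder schematic annotated with kill (K), generate (G), and propagate (P) paths for inputs A and B]

$C_{out} = AB + AC_i + BC_i = G + P \cdot C_{in}$
$S = ABC_{in} + (A + B + C_{in})\,\overline{C_{out}} = P \oplus C_{in}$

24 Transistors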
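Exploiting Inversion Property

[Figure: a full adder (inputs A, B, Ci; outputs S, Co) shown next to the same cell with all inputs and outputs inverted; below it, a 4-bit ripple-carry chain of FA cells alternating between even and odd cells, with inputs A0..A3, B0..B3, Ci,0, carries Co,0..Co,3, and sums S0..S3]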
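The inversion property named in the slide title says that inverting every input of a full adder inverts both of its outputs, which is what lets alternating cells in the chain drop the inverter on the carry path. A short Python check over all input combinations (function names are illustrative only):

```python
from itertools import product


def full_adder(a, b, c_in):
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out


def inversion_property_holds():
    """True if FA(~a, ~b, ~c) == (~S, ~Cout) for every input combination."""
    for a, b, c in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c)
        if full_adder(1 - a, 1 - b, 1 - c) != (1 - s, 1 - c_out):
            return False
    return True
```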
Subtraction

• To subtract in two's complement, just remember that:

  $-x = \bar{x} + 1 \;\Rightarrow\; A - B = A + \bar{B} + 1$

• So, to subtract:
  - Invert one of the operands
  - Add a carry in to the first bit

• Therefore, to provide an adder/subtractor:
  - Add an XOR gate to the B-input
  - Connect the sub/add selector to both the XOR gates and the carry in
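The adder/subtractor idea can be modelled in Python as below. This is a behavioural sketch under assumed names (`add_sub`, `sub`); the XOR with an all-ones mask stands in for one XOR gate per bit of B.

```python
def add_sub(a, b, n, sub):
    """n-bit adder/subtractor: computes a + b when sub=0, a - b when sub=1.

    Returns (result, carry_out), with the result in n-bit two's complement.
    """
    mask = (1 << n) - 1
    b_xor = (b ^ (mask if sub else 0)) & mask        # XOR gate on every B bit
    total = (a & mask) + b_xor + (1 if sub else 0)   # sub also drives carry in
    return total & mask, (total >> n) & 1
```

For 4-bit operands, `add_sub(9, 5, 4, 1)` gives 4, and `add_sub(3, 5, 4, 1)` gives 14, which is the 4-bit two's complement encoding of -2.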
Fast Adders
Ripple-Carry using P and G

$G_{i:0} = G_i + P_i \cdot G_{i-1:0}$
$G_{0:0} = C_{in}, \quad P_{0:0} = 0$
$C_{out,i} = G_{i:0}$

[Figure: 4-bit adder with inputs A4..A1 and B4..B1; a setup stage producing G4,P4 .. G0,P0; a carry chain producing G3:0, G2:0, G1:0, G0:0 as carries C3..C0 plus C4 (Cout); and sum outputs S4..S1]

$t_{adder} = t_{setup} + (N-1)\,t_{carry} + \max(t_{carry}, t_{sum})$
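The same ripple adder can be written directly in terms of group generate signals. In this sketch, position 0 holds the carry in (G0 = Cin, P0 = 0) and the operand bits occupy positions 1..n; that indexing and the function name are assumptions made for the example.

```python
def pg_ripple_add(a, b, n, c_in=0):
    """Ripple-carry addition expressed with G_{i:0} = G_i + P_i * G_{i-1:0}."""
    # Setup: position 0 is the carry-in, positions 1..n are the operand bits.
    g = [c_in] + [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [0] + [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]

    # Carry chain: each group generate depends on the previous one.
    group_g = [g[0]]
    for i in range(1, n + 1):
        group_g.append(g[i] | (p[i] & group_g[i - 1]))

    # Sum: S_i = P_i xor G_{i-1:0}
    total = 0
    for i in range(1, n + 1):
        total |= (p[i] ^ group_g[i - 1]) << (i - 1)
    return total, group_g[n]
```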


Carry-Skip (Carry Bypass) Adder

M Sections of (N/M) Bits Each

[Figure: 16-bit carry-skip adder built from four 4-bit blocks (bits 0–3, 4–7, 8–11, 12–15); each block has Setup, Carry propagation, and Sum stages, and the critical path is annotated with t_setup, t_bypass, and t_sum]
Carry-Skip (Carry Bypass) Adder

• Example

[Figure: 16-bit worked example with rows A, B, G/P, Ci, and Sum; one bit position generates a carry (G) and the remaining fifteen propagate it (P), with numbered time steps along the carry row; annotations: "Make Fast" and "Need not be Fast"]
Carry-Skip (Carry Bypass) Adder
Carry-Select Adder

[Figure: carry-select blocks producing S0–3 and S4–7]

• Let's guess the answer for each value of the carry.
• N-bit input with M CSA blocks:

  $t_{select} = t_{p/g} + \frac{N}{M}\,t_{carry} + M\,t_{mux} + t_{sum}$
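"Guessing the answer for each value of the carry" can be modelled as follows: every block computes its sum twice, once per possible carry in, and a multiplexer picks the right one when the real carry arrives. This is a behavioural sketch with assumed names and a default of four 4-bit blocks; `n` is assumed to be a multiple of `block`.

```python
def block_add(a, b, width, c_in):
    """Add two width-bit values; return (sum, carry_out)."""
    total = a + b + c_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, n=16, block=4, c_in=0):
    mask = (1 << block) - 1
    carry, result = c_in, 0
    for lo in range(0, n, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        sum0, c0 = block_add(a_blk, b_blk, block, 0)   # guess: carry in = 0
        sum1, c1 = block_add(a_blk, b_blk, block, 1)   # guess: carry in = 1
        s, carry = (sum1, c1) if carry else (sum0, c0)  # multiplexer
        result |= s << lo
    return result, carry
```

In hardware the two guesses for all blocks are computed in parallel, so only the multiplexer delay accumulates from block to block.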
Linear Carry Select

[Figure: 16-bit linear carry-select adder in four 4-bit blocks (bits 0–3, 4–7, 8–11, 12–15). Each block has a Setup stage, a "0" carry chain and a "1" carry chain, a multiplexer, and sum generation producing S0–3, S4–7, S8–11, and S12–15. Numbered time annotations run from (1) at setup, through (5) at the carry chains and (6) to (9) along the multiplexers, to (10) at the final sum. A worked example with rows A, B, G/P, Ci, and Sum accompanies the diagram.]
Square Root Carry Select

If an N-bit adder contains P stages and the first stage adds M bits:

$N = M + (M+1) + (M+2) + \cdots + (M+P-1) = MP + \frac{P(P-1)}{2}$

For $M \ll N$: $N \approx \frac{P^2}{2}$, so $P \approx \sqrt{2N}$

$t_{sqrt} = t_{p/g} + M\,t_{carry} + \sqrt{2N}\,t_{mux} + t_{sum}$
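To get a feel for the numbers, the sketch below builds the growing stage widths M, M+1, M+2, ... and evaluates three delay expressions. All function names and gate delay values are hypothetical, and the linear carry-select expression (with M blocks of N/M bits) is this sketch's reading of the slide rather than a confirmed formula.

```python
import math


def sqrt_select_stages(n, m):
    """Stage widths m, m+1, m+2, ... until at least n bits are covered."""
    widths, covered = [], 0
    while covered < n:
        widths.append(m + len(widths))
        covered += widths[-1]
    return widths


def t_ripple(n, t_carry, t_sum):
    return (n - 1) * t_carry + t_sum


def t_linear_select(n, m_blocks, t_pg, t_carry, t_mux, t_sum):
    # Assumed form: M blocks of N/M bits each.
    return t_pg + (n / m_blocks) * t_carry + m_blocks * t_mux + t_sum


def t_sqrt_select(n, m, t_pg, t_carry, t_mux, t_sum):
    return t_pg + m * t_carry + math.sqrt(2 * n) * t_mux + t_sum
```

With unit delays and N = 32, the ripple adder takes 32 units, a linear carry-select with 8 blocks takes 14, and the square-root version with M = 2 takes 12.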
Adder Delay Comparison
Carry Lookahead Adder – Basic Idea

• Problem – $C_{out,k}$ takes approximately k gate delays to ripple.
• Question – can we calculate the carry without any ripple?

$C_{out,k} = f(A_k, B_k, C_{out,k-1}) = G_k + P_k \cdot C_{out,k-1}$
$\phantom{C_{out,k}} = G_k + P_k\,(G_{k-1} + P_{k-1} \cdot C_{out,k-2})$
$\phantom{C_{out,k}} = G_k + P_k\,(G_{k-1} + P_{k-1}\,(\cdots + P_1\,(G_0 + P_0 \cdot C_{in,0})))$

where $G_i = A_i \cdot B_i$ and $P_i = A_i \oplus B_i$.

[Figure: carry/propagation computation logic block with inputs A0,B0, A1,B1, ..., AN-1,BN-1; it supplies Ci,0 and P0, Ci,1 and P1, ..., Ci,N-1 and PN-1 to the sum gates producing S0, S1, ..., SN-1]
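Expanding the recursion gives every carry as a flat sum of products of the bitwise G and P signals and the carry in. The Python sketch below evaluates that expansion directly, without using any previously computed carry; the function name and LSB-first list output are assumptions of this example.

```python
def cla_carries(a, b, n, c_in=0):
    """Return [C_out,0, ..., C_out,n-1], each computed without rippling."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    carries = []
    for k in range(n):
        # C_out,k = G_k + P_k G_{k-1} + ... + (P_k ... P_1) G_0 + (P_k ... P_0) C_in
        c, prod = 0, 1
        for j in range(k, -1, -1):
            c |= prod & g[j]      # term (P_k ... P_{j+1}) * G_j
            prod &= p[j]
        c |= prod & c_in          # term (P_k ... P_0) * C_in
        carries.append(c)
    return carries
```

The number of product terms grows with k, which is why practical lookahead blocks are kept small and combined hierarchically.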
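Tree Adders (Logarithmic CLA)

$G = A \cdot B, \quad P = A \oplus B, \quad S = P \oplus C_{in}, \quad C_{out} = G + P \cdot C_{in}$

• Can we reduce the complexity of calculating Pi, Gi?

$P_{1:0} = P_1 \cdot P_0, \quad G_{1:0} = G_1 + P_1 \cdot G_0 \;\Rightarrow\; C_{out,1} = G_{1:0} + P_{1:0}\,C_{in,0}$

$P_{3:2} = P_3 \cdot P_2, \quad G_{3:2} = G_3 + P_3 \cdot G_2 \;\Rightarrow\; C_{out,3} = G_{3:2} + P_{3:2}\,C_{in,2}$

$P_{3:0} = P_{3:2} \cdot P_{1:0}, \quad G_{3:0} = G_{3:2} + P_{3:2} \cdot G_{1:0} \;\Rightarrow\; C_{out,3} = G_{3:0} + P_{3:0}\,C_{in,0}$

$t_{tree} = t_{p/g} + \lceil \log_2 N \rceil\,t_{AND/OR} + t_{sum} = O(\log_2 N)$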
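The group equations all use one merge rule: an upper group (G_hi, P_hi) and the adjacent lower group (G_lo, P_lo) combine into (G_hi + P_hi·G_lo, P_hi·P_lo). A small Python sketch that reduces the bitwise pairs in a binary tree and counts the levels (all names are illustrative):

```python
def bit_pg(a, b, n):
    """Bitwise (G, P) pairs, LSB first."""
    return [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
            for i in range(n)]


def merge(hi, lo):
    """Combine the (G, P) pair of an upper group with the adjacent lower group."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def group_pg_tree(pg):
    """Pairwise tree reduction; returns ((G, P) of the whole word, levels)."""
    levels = 0
    while len(pg) > 1:
        nxt = [merge(pg[i + 1], pg[i]) for i in range(0, len(pg) - 1, 2)]
        if len(pg) % 2:
            nxt.append(pg[-1])    # odd group out passes to the next level
        pg, levels = nxt, levels + 1
    return pg[0], levels
```

For a 4-bit word the reduction takes two levels, and the carry out is `G | (P & c_in)` on the resulting pair.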
Tree Adders (Logarithmic CLA)

• There are many ways to construct these CLA or tree adders, based on:
  - Radix: how many bits are combined in each gate
  - Tree depth: how many stages of logic to the final carry ($\geq \log_{radix} N$)
  - Fanout: maximal logic branching in the tree
PG Diagram Notation

[Figure: PG diagram notation.
 Black cell: takes (Pk:j+1, Gk:j+1) and (Pj:i, Gj:i) and produces (Pk:i, Gk:i); for example (P2, G2) and (P1, G1) give (P2:1, G2:1).
 Gray cell: takes Gk:j+1, Pk:j+1, and Gj:i and produces Gk:i only.
 Buffer: passes Gk:i and Pk:i through unchanged.]
Kogge-Stone Adder

[Figure: 16-bit Kogge-Stone prefix network over bit positions 15..0. The first row (2's) forms spans 15:14, 14:13, ..., 1:0; the second row (4's) forms 15:12, 14:11, ..., 3:0, 2:0; the third row (8's) forms 15:8, 14:7, ..., 7:0, 6:0, 5:0, 4:0; the last row holds 15:0, 14:0, ..., 1:0, 0:0. The depth is log2(n).
 Example for the full word: (15:0) = (15:8) + (7:0) = (15:12) + (11:8) + (7:4) + (3:0) = (15:14) + (13:12) + (11:10) + (9:8) + (7:6) + (5:4) + (3:2) + (1:0).
 Example for S9, which needs C8 = (8:0): (8:0) = (8:1) + (0:0) = (8:5) + (4:1) + (0:0) = (8:7) + (6:5) + (4:3) + (2:1) + (0:0).]

• log2(n) stages
• Fastest adder, but with high power
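A behavioural Python sketch of a Kogge-Stone style prefix computation is given below. At each stage every position combines its span with the span `dist` positions lower, and `dist` doubles per stage. Folding the carry in into the bit-0 generate signal is a simplification chosen for this example, and all names are illustrative.

```python
def kogge_stone_add(a, b, n=16, c_in=0):
    """Return (sum, carry_out, stages) using a parallel-prefix carry network."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & c_in)          # fold the carry in into bit 0

    dist, stages = 1, 0
    while dist < n:
        new_g, new_p = G[:], P[:]
        for i in range(dist, n):
            new_g[i] = G[i] | (P[i] & G[i - dist])
            new_p[i] = P[i] & P[i - dist]
        G, P = new_g, new_p
        dist *= 2
        stages += 1

    # G[i] is now the carry out of bit i.
    total = 0
    for i in range(n):
        carry_in = c_in if i == 0 else G[i - 1]
        total |= (p[i] ^ carry_in) << i
    return total, G[n - 1], stages
```

For 16 bits the loop runs four stages, matching the log2(n) depth, while every stage updates nearly all positions, which hints at the wiring and power cost.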
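Manchester Carry-Chain Adder

[Figure: Manchester carry chain in static and dynamic form. One schematic shows a single cell with Pi, Gi, carry input Ci, and carry output Co; the other shows a 4-bit chain with propagate signals P0–P3, generate signals G0–G3, input Ci,0, and carry nodes C0–C3.]

$t_P = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right) = 0.69\,\frac{N(N+1)}{2}\,RC$

where every stage has resistance $R_j = R$ and capacitance $C_i = C$.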
Basic Multipliers
How is multiplication done?

    1 2 3 4
  ×     1 2
Multiplication using Addition

        1 0 1 0 1 0     Multiplicand
×           1 0 1 1     Multiplier
  -----------------
        1 0 1 0 1 0
      1 0 1 0 1 0       Partial
    0 0 0 0 0 0         Products
+ 1 0 1 0 1 0
  -----------------
  1 1 1 0 0 1 1 1 0     Result
Binary Multiplication

[Figure: an N-bit multiplicand times an N-bit multiplier; the partial product array, whose rows can all be formed in parallel, sums to a double-precision 2N-bit product]
Multiplication using Shift and Add

• Concept:
  - Multiplying by '1' is copying the multiplicand
  - Multiplying by '0' is a row of zeros
  - Select multiplicand or zeros according to the multiplier bit
  - Add to result
  - Shift multiplier and accumulated result
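The shift-and-add concept maps onto a short loop. In this sketch the accumulator plays the role of a 2n-bit product register: the selected row is added into its upper half and the register is then shifted right, one multiplier bit per iteration. The function name and register arrangement are assumptions of this example.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Multiply two n-bit unsigned integers with n add-and-shift steps."""
    acc = 0
    for _ in range(n):
        row = multiplicand if multiplier & 1 else 0   # select multiplicand or zeros
        acc += row << n        # add into the upper half of the accumulator
        acc >>= 1              # shift the accumulated result
        multiplier >>= 1       # shift the multiplier
    return acc
```

With 6-bit operands, `shift_add_multiply(0b101010, 0b1011, 6)` returns 462, which is `0b111001110`.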
Array Multiplier

• Calculate the final product in a single combinational calculation (potentially one cycle)
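Array Multiplier Implementation

• Stack 2-input adders

[Figure: 4×4 array multiplier. Each row ANDs X3..X0 with one multiplier bit (Y0, Y1, Y2, Y3). Three rows of adders sum the partial products: HA FA FA HA, then FA FA FA HA, then FA FA FA HA. The rows produce Z0, Z1, and Z2 on the right, and the last row produces Z7..Z3.]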
Many Critical Paths

$t_{mult} = t_{AND} + \left[(M-1) + (N-2)\right] t_{carry} + (N-1)\,t_{sum} = O(N + M)$
Can we do better?

Source: CMOS VLSI Design


Carry-Save Multiplier

$t_{mult} = t_{AND} + (N-1)\,t_{carry} + t_{merge} = O(N + \log_2 N)$
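A rough numeric comparison of the array and carry-save multiplier delays is sketched below. It assumes the array form t_AND + [(M-1) + (N-2)]·t_carry + (N-1)·t_sum and the carry-save form t_AND + (N-1)·t_carry + t_merge; the function names and unit gate delays are hypothetical and only meant to show the trend.

```python
def array_multiplier_delay(m, n, t_and, t_carry, t_sum):
    """Critical path of an M x N array multiplier (assumed form)."""
    return t_and + ((m - 1) + (n - 2)) * t_carry + (n - 1) * t_sum


def carry_save_multiplier_delay(n, t_and, t_carry, t_merge):
    """Critical path of a carry-save multiplier with a final merging adder."""
    return t_and + (n - 1) * t_carry + t_merge


# 8x8 example with unit gate delays and a merging adder of 3 units
t_array = array_multiplier_delay(8, 8, 1, 1, 1)
t_csa = carry_save_multiplier_delay(8, 1, 1, 3)
```

Under these assumed numbers the array multiplier needs 21 units and the carry-save multiplier 11, because carries are no longer rippled within each row.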


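Multiplier Floorplan

[Figure: floorplan of a 4×4 carry-save multiplier. Columns X3..X0; rows for Y0..Y3 built from cells with carry (C) and sum (S) outputs; Z0, Z1, and Z2 leave on the right, and the vector-merging row produces Z7..Z3. Legend: half adder cell, full adder cell, vector merging cell.]

X and Y signals are broadcast through the complete array.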
Fast Multipliers
Booth Recoding

• Multiplying by '0' is redundant
• Can we reduce the number of partial products?
• Based on the observation that $\sum_{i=0}^{n-1} 2^i = 2^n - 1$:
  - We can turn sequences of 1's into sequences of 0's
  - For example: 0111 = 1000 - 0001
• So, we can introduce a '-1' bit and recode the multiplier:
  - For example, the number 56
Radix-2 Booth Recoding

• Parse multiplier from left to right
  - For each change from 0 to 1, encode a '1'
  - For each change from 1 to 0, encode a '-1'
  - For bit 0, assume bit i=-1 is a 0
• Example: 0011 0111 0011 = 0x373

    0 +1  0 -1 | +1  0  0 -1 |  0 +1  0 -1    recoded digits

    0  1  0  0 |  1  0  0  0 |  0  1  0  0    0x484  (positions of +1)
  - 0  0  0  1 |  0  0  0  1 |  0  0  0  1    0x111  (positions of -1)
                                            = 0x373
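The recoding rule can be written as d_i = b_{i-1} - b_i with b_{-1} = 0, which produces a digit in {-1, 0, +1} for every bit position. A minimal Python sketch (the function name and LSB-first output order are choices made for this example):

```python
def booth_radix2(multiplier, n):
    """Radix-2 Booth digits, LSB first: d_i = b_{i-1} - b_i, with b_{-1} = 0."""
    digits, prev = [], 0
    for i in range(n):
        bit = (multiplier >> i) & 1
        digits.append(prev - bit)
        prev = bit
    return digits


def signed_digit_value(digits, radix=2):
    """Value of an LSB-first signed-digit number."""
    return sum(d * radix ** i for i, d in enumerate(digits))


digits = booth_radix2(0x373, 12)
```

Reading `digits` from the most significant end gives 0 +1 0 -1, +1 0 0 -1, 0 +1 0 -1, and `signed_digit_value(digits)` recovers 0x373. The identity holds whenever the top bit of the n-bit multiplier is 0.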
Modified (Radix-4) Booth Recoding

• Radix-2 Booth recoding doesn't work for parallel hardware implementations:
  - A worst case (010101010101010) doesn't reduce the number of partial products.
  - Variable length recoders (according to the length of '1' strings) cannot be implemented efficiently.
• Instead, just assume a constant length recoder.
  - First apply standard Booth recoding.
  - Next encode each pair of bits.
• This can be summarized in a truth table:

Partial Product Selection Table

Multiplier Bits | Recoded Bits
000             | 0
001             | + Multiplicand
010             | + Multiplicand
011             | +2 × Multiplicand
100             | -2 × Multiplicand
101             | - Multiplicand
110             | - Multiplicand
111             | 0
Modified (Radix-4) Booth Recoding

• For example, let's take our previous example:
  - 0011 0111 0011 = 01 0-1 10 0-1 01 0-1
  - This comes out: 1 -1 2 -1 1 -1

• We could have done this by using the table:
  - 001101110011

Source: CMOS VLSI Design


Tree Multipliers

• Can we further reduce the multiplier delay by employing logarithmic (tree) structures?

[Figure: partial products PP0–PP8 summed by a tree of adders (+), followed by a CLA that produces the result]
Wallace-Tree Multiplier

[Figure: two arrangements of full adders (FA) that reduce the inputs y0–y5 of one bit slice to a carry (C) and sum (S) pair, with Ci-1 inputs from and Ci outputs to the neighbouring slices]
Wallace-Tree Multiplier
Wallace-Tree Multiplier

[Figure: dot diagram of the reduction over bit positions 6..0 in four panels: (a) partial products, (b) first stage, (c) second stage, (d) final adder; the legend marks full adder (FA) and half adder (HA) groupings]
Pipelining Multipliers

• Pipelining can be applied to most multiplier structures:


Relevant Reading

• Jan M. Rabaey, Digital Integrated Circuits, 2nd Ed.
  - Chapter 11