
Lect 05 PDF

The document discusses custom single-purpose processors and their design. It begins with an introduction to processors in general, including both general-purpose and single-purpose varieties. It then provides examples of how a custom single-purpose processor may be implemented using CMOS transistors and basic logic gates like inverters, AND, OR and NAND gates. The document proceeds to explain combinational logic design, showing the process from a problem description, to a truth table, to minimized output equations and final logic gate implementation.

Uploaded by

Vijendra Pandey
Copyright
© All Rights Reserved

LECTURE 5: CUSTOM SINGLE-PURPOSE PROCESSORS

Dr. ASSAF M. H.

EE326 - EMBEDDED SYSTEMS

USP AUGUST 2018


TOPICS

• Introduction

• Combinational logic

• Sequential logic

• Custom single-purpose processor design

• RT-level custom single-purpose processor design

Embedded Systems Design: A Unified Hardware/Software Introduction, (c) 2002 Vahid/Givargis

Introduction
Processor
• Digital circuit that performs computation tasks
• Controller and datapath
• General-purpose: variety of computation tasks
• Single-purpose: one particular computation task
• Custom single-purpose: non-standard task
A custom single-purpose processor may be
• Fast, small, low power
• But, high NRE, longer time-to-market, less flexible
[Figure: digital camera chip with blocks lens, CCD, A2D, CCD preprocessor, Pixel coprocessor, D2A, JPEG codec, Microcontroller, Multiplier/Accum, DMA controller, Display ctrl, Memory controller, ISA bus interface, UART, LCD ctrl]

CMOS transistor on silicon
Transistor
• The basic electrical component in digital systems
• Acts as an on/off switch
• Voltage at "gate" controls whether current flows from source to drain
• Don't confuse this "gate" with a logic gate
[Figure: transistor symbol with source, gate and drain (conducts if gate=1), and a cross-section of an IC package showing gate, oxide, source, channel and drain on a silicon substrate]

CMOS transistor implementations
Complementary Metal Oxide Semiconductor
We refer to logic levels
• Typically 0 is 0V, 1 is 5V
Two basic CMOS types
• nMOS conducts if gate=1
• pMOS conducts if gate=0
• Hence "complementary"
Basic gates
• Inverter, NAND, NOR
[Figure: nMOS symbol (conducts if gate=1) and pMOS symbol (conducts if gate=0); transistor-level inverter F = x', NAND gate F = (xy)', NOR gate F = (x+y)']
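The switch behaviour above can be sketched in C. This is a minimal switch-level model, not a circuit simulator. It assumes the standard CMOS topology (pMOS pull-up network to 1, nMOS pull-down network to 0; parallel pMOS and series nMOS for NAND, the reverse for NOR). All function names are illustrative.

```c
#include <stdbool.h>

/* Switch-level transistor models: true means the transistor conducts. */
static bool nmos_conducts(bool gate) { return gate; }   /* conducts if gate=1 */
static bool pmos_conducts(bool gate) { return !gate; }  /* conducts if gate=0 */

/* Resolve an output node from its pull-up and pull-down networks.
   Returns 1 or 0, or -1 if the node would be floating or shorted. */
static int cmos_output(bool pull_up, bool pull_down) {
    if (pull_up && !pull_down) return 1;
    if (pull_down && !pull_up) return 0;
    return -1;
}

/* Inverter: one pMOS to 1, one nMOS to 0. F = x' */
int cmos_inverter(bool x) {
    return cmos_output(pmos_conducts(x), nmos_conducts(x));
}

/* NAND: pMOS in parallel, nMOS in series. F = (xy)' */
int cmos_nand(bool x, bool y) {
    bool up = pmos_conducts(x) || pmos_conducts(y);
    bool down = nmos_conducts(x) && nmos_conducts(y);
    return cmos_output(up, down);
}

/* NOR: pMOS in series, nMOS in parallel. F = (x+y)' */
int cmos_nor(bool x, bool y) {
    bool up = pmos_conducts(x) && pmos_conducts(y);
    bool down = nmos_conducts(x) || nmos_conducts(y);
    return cmos_output(up, down);
}
```

Because the two networks are complementary, exactly one of them conducts for every input combination, so the -1 case is never reached for these three gates.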

Basic logic gates

One-input gates
Gate       Equation          x=0  x=1
Driver     F = x              0    1
Inverter   F = x'             1    0

Two-input gates
Gate       Equation          xy=00  xy=01  xy=10  xy=11
AND        F = xy              0      0      0      1
OR         F = x+y             0      1      1      1
XOR        F = x ⊕ y           0      1      1      0
NAND       F = (xy)'           1      1      1      0
NOR        F = (x+y)'          1      0      0      0
XNOR       F = (x ⊕ y)'        1      0      0      1

Combinational logic design

A) Problem description
y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.

B) Truth table
Inputs     Outputs
a b c      y z
0 0 0      0 0
0 0 1      0 1
0 1 0      0 1
0 1 1      1 0
1 0 0      1 0
1 0 1      1 1
1 1 0      1 1
1 1 1      1 1

C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc

D) Minimized output equations
K-map for y
      bc
a     00 01 11 10
0      0  0  1  0
1      1  1  1  1
y = a + bc

K-map for z
      bc
a     00 01 11 10
0      0  1  0  1
1      0  1  1  1
z = ab + b'c + bc'

E) Logic Gates
[Figure: gate-level circuit with inputs a, b, c and outputs y, z]
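A quick way to check step D is to compare the minimized equations with the truth table from step B on all eight input rows. The sketch below does this in C. The table encoding (index = a, b, c read as a 3-bit number) and the function names are choices made for this example.

```c
#include <stdbool.h>

/* Truth table from step B, indexed by (a<<2)|(b<<1)|c. */
static const int Y_TABLE[8] = {0, 0, 0, 1, 1, 1, 1, 1};
static const int Z_TABLE[8] = {0, 1, 1, 0, 0, 1, 1, 1};

/* Minimized equations from step D. */
bool y_min(bool a, bool b, bool c) { return a || (b && c); }
bool z_min(bool a, bool b, bool c) {
    return (a && b) || (!b && c) || (b && !c);
}

/* Returns 1 if both minimized equations agree with the truth table
   on every input row, 0 otherwise. */
int minimized_matches_table(void) {
    for (int i = 0; i < 8; i++) {
        bool a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        if (y_min(a, b, c) != Y_TABLE[i]) return 0;
        if (z_min(a, b, c) != Z_TABLE[i]) return 0;
    }
    return 1;
}
```

Exhaustive checking is practical here because three inputs give only eight rows.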

Combinational components

n-bit, m x 1 Multiplexor
• Inputs: I(m-1) … I1 I0 (n bits each), select S(log m) … S0
• Output: O (n bits)
• O = I0 if S=0..00, I1 if S=0..01, …, I(m-1) if S=1..11

log n x n Decoder
• Inputs: I(log n -1) … I0
• Outputs: O(n-1) … O1 O0
• O0 =1 if I=0..00, O1 =1 if I=0..01, …, O(n-1) =1 if I=1..11
• With enable input e: all O's are 0 if e=0

n-bit Adder
• Inputs: A, B (n bits each)
• Outputs: carry, sum (n bits)
• sum = A+B (first n bits), carry = (n+1)'th bit of A+B
• With carry-in input Ci: sum = A + B + Ci

n-bit Comparator
• Inputs: A, B (n bits each)
• Outputs: less, equal, greater
• less = 1 if A<B, equal =1 if A=B, greater=1 if A>B

n bit, m function ALU
• Inputs: A, B (n bits each), select S(log m) … S0
• Output: O (n bits)
• O = A op B, op determined by S.
• May have status outputs carry, zero, etc.
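The component definitions above can be modelled as small C functions. The sketch below fixes n = 8 and m = 4 for concreteness. Those sizes, the struct types and the function names are assumptions made for this example.

```c
#include <stdint.h>

/* 8-bit, 4x1 multiplexor: O = I[S] for S = 0..3. */
uint8_t mux4(const uint8_t in[4], unsigned s) {
    return in[s & 3u];
}

/* 2x4 decoder with enable: one output bit is 1, or all are 0 if e=0.
   Bit k of the result models output O(k). */
uint8_t decoder2x4(unsigned i, unsigned e) {
    return e ? (uint8_t)(1u << (i & 3u)) : 0;
}

/* 8-bit adder with carry-in: sum is the low 8 bits of A+B+Ci,
   carry is the 9th bit. */
typedef struct { uint8_t sum; unsigned carry; } add_result;

add_result adder8(uint8_t a, uint8_t b, unsigned ci) {
    unsigned full = (unsigned)a + b + (ci & 1u);
    add_result r = { (uint8_t)(full & 0xFFu), (full >> 8) & 1u };
    return r;
}

/* 8-bit comparator: exactly one of the three outputs is 1. */
typedef struct { unsigned less, equal, greater; } cmp_result;

cmp_result comparator8(uint8_t a, uint8_t b) {
    cmp_result r = { a < b, a == b, a > b };
    return r;
}
```

For example, `adder8(200, 100, 0)` gives sum 44 with carry 1, since 300 does not fit in 8 bits.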

Sequential components

n-bit Register
• Inputs: I (n bits), load, clear
• Output: Q (n bits)
• Q = 0 if clear=1,
  I if load=1 and clock=1,
  Q(previous) otherwise.

n-bit Shift register
• Inputs: I, shift
• Output: Q
• Q = lsb
  - Content shifted
  - I stored in msb

n-bit Counter
• Output: Q (n bits)
• Q = 0 if clear=1,
  Q(prev)+1 if count=1 and clock=1.
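The sequential components can be modelled in C as "step" functions, where one call stands for one rising clock edge and returns the new stored value. The sketch below fixes n = 8. The function names are illustrative, and clear is evaluated at the clock edge here, which simplifies an asynchronous clear.

```c
#include <stdint.h>

/* 8-bit register: Q = 0 if clear=1, I if load=1, Q(previous) otherwise. */
uint8_t register_step(uint8_t q, uint8_t i, int load, int clear) {
    if (clear) return 0;
    if (load) return i;
    return q;
}

/* 8-bit shift register: content shifts right by one position and the
   serial input bit is stored in the msb. */
uint8_t shift_step(uint8_t content, int in_bit, int shift) {
    if (!shift) return content;
    return (uint8_t)((content >> 1) | ((in_bit & 1) << 7));
}

/* Serial output of the shift register: Q = lsb of the content. */
int shift_out(uint8_t content) {
    return content & 1;
}

/* 8-bit counter: Q = 0 if clear=1, Q(prev)+1 if count=1.
   The count wraps from 255 back to 0. */
uint8_t counter_step(uint8_t q, int count, int clear) {
    if (clear) return 0;
    if (count) return (uint8_t)(q + 1);
    return q;
}
```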

Sequential logic design

A) Problem Description
You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.

B) State Diagram
[Figure: four states 0, 1, 2, 3. x=0 in states 0, 1 and 2; x=1 in state 3. Each state advances to the next when a=1 (state 3 returns to state 0) and stays where it is when a=0.]

C) Implementation Model
[Figure: combinational logic with inputs a, Q1, Q0 and outputs x, I1, I0; a state register loads I1 I0 and holds Q1 Q0]

D) State Table (Moore-type)
Inputs       Outputs
Q1 Q0 a      I1 I0 x
0  0  0      0  0  0
0  0  1      0  1  0
0  1  0      0  1  0
0  1  1      1  0  0
1  0  0      1  0  0
1  0  1      1  1  0
1  1  0      1  1  1
1  1  1      0  0  1

Given this implementation model
• Sequential logic design quickly reduces to combinational logic design

Sequential logic design (cont.)

E) Minimized Output Equations
K-map for I1
      Q1Q0
a     00 01 11 10
0      0  0  1  1
1      0  1  0  1
I1 = Q1'Q0a + Q1a' + Q1Q0'

K-map for I0
      Q1Q0
a     00 01 11 10
0      0  1  1  0
1      1  0  0  1
I0 = Q0a' + Q0'a

K-map for x
      Q1Q0
a     00 01 11 10
0      0  0  1  0
1      0  0  1  0
x = Q1Q0

F) Combinational Logic
[Figure: gate-level circuit computing x, I1 and I0 from a, Q1 and Q0]
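The minimized equations can be exercised in C by treating the state register as a two-bit struct and the combinational logic as two pure functions. This is a sketch of the clock divider under the assumption that the register starts in state 00. The names `divider_state`, `divider_next`, `divider_output` and `count_pulses` are made up for this example.

```c
#include <stdbool.h>

/* State register contents: Q1 Q0. */
typedef struct { bool q1, q0; } divider_state;

/* Moore output: x = Q1Q0, so x depends on the state only. */
bool divider_output(divider_state s) {
    return s.q1 && s.q0;
}

/* Next-state logic from the minimized equations:
   I1 = Q1'Q0a + Q1a' + Q1Q0'
   I0 = Q0a' + Q0'a */
divider_state divider_next(divider_state s, bool a) {
    divider_state n;
    n.q1 = (!s.q1 && s.q0 && a) || (s.q1 && !a) || (s.q1 && !s.q0);
    n.q0 = (s.q0 && !a) || (!s.q0 && a);
    return n;
}

/* Clock the divider for `cycles` cycles with a held at 1, starting in
   state 00, and count the cycles in which x is 1. */
int count_pulses(int cycles) {
    divider_state s = { false, false };
    int pulses = 0;
    for (int i = 0; i < cycles; i++) {
        if (divider_output(s)) pulses++;
        s = divider_next(s, true);
    }
    return pulses;
}
```

With a held at 1, x is high in one cycle out of every four, which is the divide-by-four behaviour asked for in the problem description.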

Custom single-purpose processor basic model
[Figure: controller and datapath. Controller signals: external control inputs, datapath control inputs, datapath control outputs, external control outputs. Datapath signals: external data inputs, external data outputs.]
[Figure: a view inside the controller and datapath. Controller: next-state and control logic, state register. Datapath: registers, functional units.]

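Example: greatest common divisor
First create algorithm
Convert algorithm to "complex" state machine
• Known as FSMD: finite-state machine with datapath
• Can use templates to perform such conversion

(a) black-box view
[Figure: GCD block with inputs go_i, x_i, y_i and output d_o]

(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

(c) state diagram
[Figure: states 1, 2, 2-J, 3 (x = x_i), 4 (y = y_i), 5, 6, 7 (y = y - x), 8 (x = x - y), 6-J, 5-J, 9 (d_o = x), 1-J. State 1 goes to 2 on 1 (exit on !1). State 2 goes to 2-J on !go_i and to 3 on !(!go_i); 2-J returns to 2. State 3 goes to 4, and 4 to 5. State 5 goes to 6 on x!=y and to 9 on !(x!=y). State 6 goes to 7 on x<y and to 8 on !(x<y). States 7 and 8 go to 6-J, then 5-J, then back to 5. State 9 goes to 1-J, which returns to 1.]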
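State diagram templates

Assignment statement
  a = b
  next statement
Template: one state performing a = b, followed by the state for the next statement.

Loop statement
  while (cond) {
    loop-body-statements
  }
  next statement
Template: condition state C. On cond, go to the loop-body-statements, then to join state J, which returns to C. On !cond, go to the next statement.

Branch statement
  if (c1)
    c1 stmts
  else if c2
    c2 stmts
  else
    other stmts
  next statement
Template: condition state C. On c1, go to c1 stmts; on !c1*c2, go to c2 stmts; on !c1*!c2, go to others. All branches meet at join state J, followed by the next statement.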

Creating the datapath
Create a register for any declared variable
Create a functional unit for each arithmetic operation
Connect the ports, registers and functional units
• Based on reads and writes
• Use multiplexors for multiple sources
Create unique identifier
• for each datapath component control input and output

[Figure: FSMD state diagram of the GCD example, repeated]
[Figure: datapath. Inputs x_i and y_i feed two n-bit 2x1 multiplexors (selects x_sel, y_sel), which load registers 0: x and 0: y (loads x_ld, y_ld). The registers feed a != comparator (5: x!=y, output x_neq_y), a < comparator (6: x<y, output x_lt_y) and two subtractors (8: x-y and 7: y-x), whose results return to the multiplexors. Register 9: d (load d_ld) drives d_o.]
Creating the controller's FSM
Same structure as FSMD
Replace complex actions/conditions with datapath configurations

[Figure: FSMD state diagram and datapath from the previous slides, repeated]

Controller FSM
Code   State   Datapath configuration    Next state
0000   1:                                2:
0001   2:                                2-J: if !go_i, 3: if go_i
0010   2-J:                              2:
0011   3:      x_sel = 0, x_ld = 1       4:
0100   4:      y_sel = 0, y_ld = 1       5:
0101   5:                                6: if x_neq_y, 9: if !x_neq_y
0110   6:                                7: if x_lt_y, 8: if !x_lt_y
0111   7:      y_sel = 1, y_ld = 1       6-J:
1000   8:      x_sel = 1, x_ld = 1       6-J:
1001   6-J:                              5-J:
1010   5-J:                              5:
1011   9:      d_ld = 1                  1-J:
1100   1-J:                              1:

Splitting into a controller and datapath
[Figure (a): controller implementation model. Combinational logic with inputs go_i, x_neq_y, x_lt_y and Q3 Q2 Q1 Q0, and outputs x_sel, y_sel, x_ld, y_ld, d_ld and I3 I2 I1 I0. A state register loads I3 I2 I1 I0 and holds Q3 Q2 Q1 Q0. The controller FSM is the one from the previous slide, with its conditions written as x_neq_y=0, x_neq_y=1, x_lt_y=1 and x_lt_y=0.]
[Figure (b): datapath, as on the previous slides]

Controller state table for the GCD example

Inputs                                  Outputs
Q3 Q2 Q1 Q0  x_neq_y  x_lt_y  go_i      I3 I2 I1 I0  x_sel  y_sel  x_ld  y_ld  d_ld
0  0  0  0   *        *       *         0  0  0  1   X      X      0     0     0
0  0  0  1   *        *       0         0  0  1  0   X      X      0     0     0
0  0  0  1   *        *       1         0  0  1  1   X      X      0     0     0
0  0  1  0   *        *       *         0  0  0  1   X      X      0     0     0
0  0  1  1   *        *       *         0  1  0  0   0      X      1     0     0
0  1  0  0   *        *       *         0  1  0  1   X      0      0     1     0
0  1  0  1   0        *       *         1  0  1  1   X      X      0     0     0
0  1  0  1   1        *       *         0  1  1  0   X      X      0     0     0
0  1  1  0   *        0       *         1  0  0  0   X      X      0     0     0
0  1  1  0   *        1       *         0  1  1  1   X      X      0     0     0
0  1  1  1   *        *       *         1  0  0  1   X      1      0     1     0
1  0  0  0   *        *       *         1  0  0  1   1      X      1     0     0
1  0  0  1   *        *       *         1  0  1  0   X      X      0     0     0
1  0  1  0   *        *       *         0  1  0  1   X      X      0     0     0
1  0  1  1   *        *       *         1  1  0  0   X      X      0     0     1
1  1  0  0   *        *       *         0  0  0  0   X      X      0     0     0
1  1  0  1   *        *       *         0  0  0  0   X      X      0     0     0
1  1  1  0   *        *       *         0  0  0  0   X      X      0     0     0
1  1  1  1   *        *       *         0  0  0  0   X      X      0     0     0
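The state table can be checked by simulating the controller and datapath together in C. In this sketch the controller is a pure function transcribed from the table and the datapath is a struct of three registers. Several things are assumptions made for the example: the type and function names, the 32-bit register width, go_i held at 1, and don't-care (X) select outputs driven as 0. Inputs are assumed positive, since the subtraction algorithm does not terminate when exactly one input is 0.

```c
#include <stdint.h>

/* Controller outputs for one clock cycle. */
typedef struct {
    unsigned next;                      /* I3 I2 I1 I0 */
    int x_sel, y_sel, x_ld, y_ld, d_ld;
} ctrl_out;

/* Combinational logic of the controller, one case per table row group.
   q is the state register value Q3 Q2 Q1 Q0. */
ctrl_out controller(unsigned q, int x_neq_y, int x_lt_y, int go_i) {
    ctrl_out o = { 0, 0, 0, 0, 0, 0 };
    switch (q) {
    case 0x0: o.next = 0x1; break;                           /* 1:   */
    case 0x1: o.next = go_i ? 0x3 : 0x2; break;              /* 2:   */
    case 0x2: o.next = 0x1; break;                           /* 2-J: */
    case 0x3: o.next = 0x4; o.x_sel = 0; o.x_ld = 1; break;  /* 3:   */
    case 0x4: o.next = 0x5; o.y_sel = 0; o.y_ld = 1; break;  /* 4:   */
    case 0x5: o.next = x_neq_y ? 0x6 : 0xB; break;           /* 5:   */
    case 0x6: o.next = x_lt_y ? 0x7 : 0x8; break;            /* 6:   */
    case 0x7: o.next = 0x9; o.y_sel = 1; o.y_ld = 1; break;  /* 7:   */
    case 0x8: o.next = 0x9; o.x_sel = 1; o.x_ld = 1; break;  /* 8:   */
    case 0x9: o.next = 0xA; break;                           /* 6-J: */
    case 0xA: o.next = 0x5; break;                           /* 5-J: */
    case 0xB: o.next = 0xC; o.d_ld = 1; break;               /* 9:   */
    default:  o.next = 0x0; break;          /* 1-J: and unused codes */
    }
    return o;
}

/* Datapath registers x, y and d. */
typedef struct { uint32_t x, y, d; } datapath;

/* Clock the processor until d is loaded, then return d_o. */
uint32_t gcd_processor(uint32_t x_i, uint32_t y_i) {
    datapath dp = { 0, 0, 0 };
    unsigned q = 0;
    for (;;) {
        /* Comparator outputs feed the controller. */
        int x_neq_y = dp.x != dp.y;
        int x_lt_y = dp.x < dp.y;
        ctrl_out c = controller(q, x_neq_y, x_lt_y, 1);

        /* Multiplexors: 0 selects the external input, 1 the subtractor. */
        uint32_t x_src = c.x_sel ? dp.x - dp.y : x_i;
        uint32_t y_src = c.y_sel ? dp.y - dp.x : y_i;
        uint32_t x_now = dp.x;

        /* Clock edge: registers load according to their load inputs. */
        if (c.x_ld) dp.x = x_src;
        if (c.y_ld) dp.y = y_src;
        if (c.d_ld) { dp.d = x_now; return dp.d; }
        q = c.next;
    }
}
```

Calling `gcd_processor(42, 8)` walks the same state sequence as the FSM and returns 2.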

Completing the GCD custom single-purpose processor design
We finished the datapath
We have a state table for the next state and control logic
• All that's left is combinational logic design
This is not an optimized design, but we see the basic steps
[Figure: a view inside the controller and datapath. Controller: next-state and control logic, state register. Datapath: registers, functional units.]

RT-level custom single-purpose processor design
We often start with a state machine
• Rather than algorithm
• Cycle timing often too central to functionality
Example
• Bus bridge that converts 4-bit bus to 8-bit bus
• Start with FSMD
• Known as register-transfer (RT) level
• Exercise: complete the design

Problem Specification
A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.
[Figure: Sender, Bridge and Receiver sharing a clock. The Sender drives rdy_in and data_in(4) into the Bridge; the Bridge drives rdy_out and data_out(8) into the Receiver.]

FSMD (Bridge)
Inputs
  rdy_in: bit; data_in: bit[4];
Outputs
  rdy_out: bit; data_out: bit[8]
Variables
  data_lo, data_hi: bit[4];

State             Action                                   Transitions
WaitFirst4                                                 rdy_in=0: WaitFirst4; rdy_in=1: RecFirst4Start
RecFirst4Start    data_lo=data_in                          RecFirst4End
RecFirst4End                                               rdy_in=1: RecFirst4End; rdy_in=0: WaitSecond4
WaitSecond4                                                rdy_in=0: WaitSecond4; rdy_in=1: RecSecond4Start
RecSecond4Start   data_hi=data_in                          RecSecond4End
RecSecond4End                                              rdy_in=1: RecSecond4End; rdy_in=0: Send8Start
Send8Start        data_out=data_hi & data_lo, rdy_out=1    Send8End
Send8End          rdy_out=0                                WaitFirst4

RT-level custom single-purpose processor design (cont')

(a) Controller (Bridge)
Same states and rdy_in transitions as the FSMD, with the actions replaced by datapath control signals:
• RecFirst4Start: data_lo_ld=1
• RecSecond4Start: data_hi_ld=1
• Send8Start: data_out_ld=1, rdy_out=1
• Send8End: rdy_out=0

(b) Datapath
[Figure: data_in(4) feeds registers data_hi and data_lo (loads data_hi_ld, data_lo_ld). Their outputs together feed register data_out (load data_out_ld), which drives data_out. clk goes to all registers. The controller has input rdy_in and output rdy_out.]
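The bridge can be simulated in C with one function call per clock cycle. The sketch below follows the state names of the FSMD and reads the "&" in data_hi & data_lo as concatenation, with data_hi in the upper four bits. The struct layout and the helper `bridge_send_nibble` are hypothetical and exist only for this example. The helper holds rdy_in high for three cycles and then low for one.

```c
#include <stdint.h>

typedef enum {
    WAIT_FIRST4, REC_FIRST4_START, REC_FIRST4_END,
    WAIT_SECOND4, REC_SECOND4_START, REC_SECOND4_END,
    SEND8_START, SEND8_END
} bridge_state;

typedef struct {
    bridge_state state;
    uint8_t data_lo, data_hi;   /* 4-bit variables */
    uint8_t data_out;           /* 8-bit output */
    int rdy_out;
} bridge;

void bridge_init(bridge *b) {
    b->state = WAIT_FIRST4;
    b->data_lo = b->data_hi = b->data_out = 0;
    b->rdy_out = 0;
}

/* One clock cycle: perform the action of the current state, then
   take the transition selected by rdy_in. */
void bridge_clock(bridge *b, int rdy_in, uint8_t data_in) {
    switch (b->state) {
    case WAIT_FIRST4:
        if (rdy_in) b->state = REC_FIRST4_START;
        break;
    case REC_FIRST4_START:
        b->data_lo = data_in & 0xF;
        b->state = REC_FIRST4_END;
        break;
    case REC_FIRST4_END:
        if (!rdy_in) b->state = WAIT_SECOND4;
        break;
    case WAIT_SECOND4:
        if (rdy_in) b->state = REC_SECOND4_START;
        break;
    case REC_SECOND4_START:
        b->data_hi = data_in & 0xF;
        b->state = REC_SECOND4_END;
        break;
    case REC_SECOND4_END:
        if (!rdy_in) b->state = SEND8_START;
        break;
    case SEND8_START:
        b->data_out = (uint8_t)((b->data_hi << 4) | b->data_lo);
        b->rdy_out = 1;
        b->state = SEND8_END;
        break;
    case SEND8_END:
        b->rdy_out = 0;
        b->state = WAIT_FIRST4;
        break;
    }
}

/* Hypothetical sender: present one nibble with a rdy_in pulse. */
void bridge_send_nibble(bridge *b, uint8_t nibble) {
    for (int i = 0; i < 3; i++) bridge_clock(b, 1, nibble);
    bridge_clock(b, 0, 0);
}
```

Sending 0xA and then 0x5 leaves the bridge in Send8Start. The next clock produces data_out = 0x5A with rdy_out = 1, and the clock after that drops rdy_out again.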

Optimizing single-purpose processors
Optimization is the task of making design metric values the best possible
Optimization opportunities
• original program
• FSMD
• datapath
• FSM

Optimizing the original program
Analyze program attributes and look for areas of possible improvement
• number of computations
• size of variable
• time and space complexity
• operations used
  • multiplication and division very expensive

Optimizing the original program (cont')
Replace the subtraction operation(s) with modulo operation in order to speed up program

original program
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }
GCD(42, 8) - 9 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).

optimized program
0: int x, y, r;
1: while (1) {
2:   while (!go_i);
     // x must be the larger number
3:   if (x_i >= y_i) {
4:     x = x_i;
5:     y = y_i;
     }
6:   else {
7:     x = y_i;
8:     y = x_i;
     }
9:   while (y != 0) {
10:    r = x % y;
11:    x = y;
12:    y = r;
     }
13:  d_o = x;
   }
GCD(42, 8) - 3 iterations to complete the loop
x and y values evaluated as follows: (42, 8), (8, 2), (2, 0).
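The two programs can be compared directly in C. The sketch below wraps each loop in a function and counts the (x, y) pairs at which the loop condition is evaluated, which is how the 9 and 3 quoted for GCD(42, 8) are obtained. The loop body itself runs one time fewer in each case. The function names and the `gcd_trace` struct are made up for this example, the go_i handshake is left out, and `gcd_subtract` assumes positive inputs.

```c
#include <stdint.h>

typedef struct {
    uint32_t gcd;
    unsigned pairs;   /* (x, y) pairs seen by the loop condition */
} gcd_trace;

/* Original program: repeated subtraction. */
gcd_trace gcd_subtract(uint32_t x_i, uint32_t y_i) {
    uint32_t x = x_i, y = y_i;
    unsigned pairs = 1;               /* the initial pair */
    while (x != y) {
        if (x < y) y = y - x;
        else x = x - y;
        pairs++;
    }
    gcd_trace t = { x, pairs };
    return t;
}

/* Optimized program: modulo, with x set to the larger input first. */
gcd_trace gcd_modulo(uint32_t x_i, uint32_t y_i) {
    uint32_t x, y, r;
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }
    unsigned pairs = 1;               /* the initial pair */
    while (y != 0) {
        r = x % y;
        x = y;
        y = r;
        pairs++;
    }
    gcd_trace t = { x, pairs };
    return t;
}
```

The saving depends on a modulo unit being available. As noted earlier, division-like operations are expensive in hardware, so fewer iterations do not automatically mean a smaller or faster design.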

Optimizing the FSMD
Areas of possible improvements
• merge states
  • states with constants on transitions can be eliminated, transition taken is already known
  • states with independent operations can be merged
• separate states
  • states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
• scheduling
Optimizing the FSMD (cont.)

original FSMD
[Figure: int x, y; states 1, 2, 2-J, 3 (x = x_i), 4 (y = y_i), 5, 6, 7 (y = y - x), 8 (x = x - y), 6-J, 5-J, 9 (d_o = x), 1-J, as in the GCD example]

Optimizations
• eliminate state 1: transitions have constant values
• merge state 2 and state 2J: no loop operation in between them
• merge state 3 and state 4: assignment operations are independent of one another
• merge state 5 and state 6: transitions from state 6 can be done in state 5
• eliminate state 5J and 6J: transitions from each state can be done from state 7 and state 8, respectively
• eliminate state 1-J: transition from state 1-J can be done directly from state 9

optimized FSMD
[Figure: int x, y; State 2 stays on !go_i and goes to 3 on go_i. State 3: x = x_i, y = y_i. State 5 goes to 7 on x<y and to 8 on x>y, otherwise to 9. State 7: y = y - x. State 8: x = x - y. States 7 and 8 return to 5. State 9: d_o = x, then back to 2.]
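The optimized FSMD has six states and can be written as a switch statement in C. In this sketch one loop iteration stands for one clock cycle, go_i is held at 1, and the function stops once state 9 has written d_o. The enum names, the result struct and the cycle counter are additions for this example. Inputs are assumed positive.

```c
#include <stdint.h>

/* States of the optimized FSMD, named after the original numbering. */
typedef enum { S2, S3, S5, S7, S8, S9 } opt_state;

typedef struct { uint32_t d_o; unsigned cycles; } fsmd_result;

fsmd_result gcd_fsmd_optimized(uint32_t x_i, uint32_t y_i) {
    uint32_t x = 0, y = 0;
    opt_state s = S2;
    fsmd_result r = { 0, 0 };
    for (;;) {
        r.cycles++;
        switch (s) {
        case S2:                      /* wait for go_i (assumed 1) */
            s = S3;
            break;
        case S3:                      /* merged states 3 and 4 */
            x = x_i;
            y = y_i;
            s = S5;
            break;
        case S5:                      /* merged states 5 and 6 */
            s = (x < y) ? S7 : (x > y) ? S8 : S9;
            break;
        case S7:
            y = y - x;
            s = S5;
            break;
        case S8:
            x = x - y;
            s = S5;
            break;
        case S9:
            r.d_o = x;
            return r;
        }
    }
}
```

Each subtraction now costs two cycles (state 5 plus state 7 or 8) instead of the five states visited per loop pass in the original FSMD (5, 6, 7 or 8, 6-J, 5-J).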

Optimizing the datapath
Sharing of functional units
• one-to-one mapping, as done previously, is not necessary
• if the same operation occurs in different states, those states can share a single functional unit
Multi-functional units
• ALUs support a variety of operations and can be shared among operations occurring in different states
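As an illustration of sharing, the two subtractors of the GCD datapath (x - y in state 8, y - x in state 7) could be replaced by one subtractor with a multiplexor on each operand. The C sketch below models that idea. The control input `swap` is hypothetical and does not appear in the original datapath.

```c
#include <stdint.h>

/* One subtractor shared by states 7 and 8.
   swap = 0 computes x - y (state 8), swap = 1 computes y - x (state 7). */
uint32_t shared_subtractor(uint32_t x, uint32_t y, int swap) {
    uint32_t a = swap ? y : x;   /* 2x1 multiplexor on the left operand */
    uint32_t b = swap ? x : y;   /* 2x1 multiplexor on the right operand */
    return a - b;                /* the single subtractor */
}
```

The trade is one subtractor for two multiplexors and one more controller output, so whether it pays off depends on the relative sizes of those components.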

Optimizing the FSM
State encoding
• task of assigning a unique bit pattern to each state in an FSM
• size of state register and combinational logic vary
• can be treated as an ordering problem
State minimization
• task of merging equivalent states into a single state
  • two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state
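The equivalence condition can be checked mechanically on a state table. The C sketch below applies it to a small Moore-type FSM with one input bit. The table layout (`fsm_row`) and the function name are assumptions made for this example. It tests only the one-step condition stated above; a full minimization procedure would repeat the check after each merge.

```c
#include <stdbool.h>

#define NUM_INPUTS 2   /* one input bit, so two input combinations */

/* One row of a Moore-type state table. */
typedef struct {
    int next[NUM_INPUTS];   /* next state for input 0 and input 1 */
    int output;             /* output generated in this state */
} fsm_row;

/* True if states s and t generate the same output and go to the same
   next state for every input combination. */
bool states_equivalent(const fsm_row *table, int s, int t) {
    if (table[s].output != table[t].output) return false;
    for (int in = 0; in < NUM_INPUTS; in++) {
        if (table[s].next[in] != table[t].next[in]) return false;
    }
    return true;
}
```

In the GCD controller, for instance, states 0000 (1:) and 0010 (2-J:) both go to 0001 with all load outputs at 0, so they meet this condition.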
Summary
Custom single-purpose processors
• Straightforward design techniques
• Can be built to execute algorithms
• Typically start with FSMD
• CAD tools can be of great assistance
