
DSD Chapter 5

Digital System Design Notes

Uploaded by

Hamza Javed
Copyright
© © All Rights Reserved

Digital Design

Chapter 5:
Register-Transfer Level
(RTL) Design
Slides to accompany the textbook Digital Design, First Edition,
by Frank Vahid, John Wiley and Sons Publishers, 2007.
http://www.ddvahid.com

Copyright © 2007 Frank Vahid


Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities, subject to keeping this copyright notice in place and unmodified. These slides may be posted as unanimated pdf versions on publicly-accessible course websites. PowerPoint source (or pdf with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means. Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors may obtain PowerPoint source or obtain special use permissions from Wiley – see http://www.ddvahid.com for information.
Digital Design
Copyright © 2006 Frank Vahid
5.1

Introduction
• Chpt 2
– Capture comb. behavior: Equations, truth tables
– Convert to circuit: AND + OR + NOT → Comb. logic
• Chpt 3
– Capture sequential behavior: FSMs
– Convert to circuit: Register + Comb. logic → Controller
• Chpt 4
– Datapath components, simple datapaths
• Chpt 5
– Capture behavior: High-level state machine
– Convert to circuit: Controller + Datapath → Processor
– Known as “RTL” (register-transfer level) design
• Processors:
– Programmable (microprocessor)
– Custom
[Figure: Levels of digital design abstraction: transistor level, logic level, register-transfer level (RTL), higher levels]
Note: Slides with animation are denoted with a small red "a" near the animated items
RTL Design: Capture Behavior, Convert to Circuit
• Recall
– Chapter 2: Combinational Logic Design
• First step: Capture behavior (using equation or truth table)
• Remaining steps: Convert to circuit
– Chapter 3: Sequential Logic Design
• First step: Capture behavior (using FSM)
• Remaining steps: Convert to circuit
• RTL Design (the method for creating custom processors)
– First step: Capture behavior (using high-level state machine, to be introduced)
– Remaining steps: Convert to circuit
5.2

RTL Design Method

RTL Design Method: “Preview” Example
• Soda dispenser
– c: bit input, 1 when coin deposited
– a: 8-bit input having value of deposited coin
– s: 8-bit input having cost of a soda
– d: bit output, processor sets to 1 when total value of deposited coins equals or exceeds cost of a soda
[Figure: soda dispenser processor block with inputs c, a, s and output d, with an animated example of tot accumulating coin values]
How can we precisely describe this processor’s behavior?
Preview Example: Step 1 -- Capture High-Level State Machine
Inputs: c (bit), a (8 bits), s (8 bits)
Outputs: d (bit)
Local registers: tot (8 bits)
• Declare local register tot
• Init state: Set d=0, tot=0
• Wait state: wait for coin
– If see coin, go to Add state
• Add state: Update total value: tot = tot + a
– Remember, a is present coin’s value
– Go back to Wait state
• In Wait state, if tot >= s, go to Disp(ense) state
• Disp state: Set d=1 (dispense soda)
– Return to Init state
State machine (from figure):
  Init: d=0, tot=0; go to Wait
  Wait: c go to Add; c’*(tot<s) stay in Wait; c’*(tot<s)’ go to Disp
  Add: tot=tot+a; go to Wait
  Disp: d=1; go to Init
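The high-level state machine above can be sketched as a cycle-by-cycle software model. This is only an illustration, not part of the slides: the names SodaDispenser, soda_reset, and soda_clock are made up here, each call to soda_clock stands for one clock cycle, and d is modeled as 1 only during the cycle spent in Disp. Note that tot wraps at 8 bits, as the declared register would.

```c
#include <stdbool.h>
#include <stdint.h>

/* States of the soda dispenser high-level state machine. */
typedef enum { INIT, WAIT, ADD, DISP } SodaState;

typedef struct {
    SodaState state; /* state register */
    uint8_t tot;     /* local register tot (8 bits) */
} SodaDispenser;

/* Hypothetical helper: start in Init, as the state machine does. */
void soda_reset(SodaDispenser *p) {
    p->state = INIT;
    p->tot = 0;
}

/* One clock cycle: performs the current state's actions and picks the
 * next state from inputs c, a, s. Returns output d for this cycle. */
bool soda_clock(SodaDispenser *p, bool c, uint8_t a, uint8_t s) {
    bool d = false;
    switch (p->state) {
    case INIT:                      /* d=0, tot=0 */
        p->tot = 0;
        p->state = WAIT;
        break;
    case WAIT:
        if (c)               p->state = ADD;   /* coin seen */
        else if (p->tot < s) p->state = WAIT;  /* c'*(tot<s) */
        else                 p->state = DISP;  /* c'*(tot<s)' */
        break;
    case ADD:                       /* tot = tot + a */
        p->tot = (uint8_t)(p->tot + a);
        p->state = WAIT;
        break;
    case DISP:                      /* d=1 */
        d = true;
        p->state = INIT;
        break;
    }
    return d;
}
```

With s = 50 and two 25-cent coins, the model passes through Init, Wait, Add, Wait, Wait, Add, Wait, Disp, and d is 1 only in that last cycle.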
Preview Example: Step 2 -- Create Datapath
Inputs: c (bit), a (8 bits), s (8 bits)
Outputs: d (bit)
Local registers: tot (8 bits)
• Need tot register
• Need 8-bit comparator to compare s and tot
• Need 8-bit adder to perform tot = tot + a
• Wire the components as needed for above
• Create control inputs/outputs, give them names
[Figure: datapath with 8-bit inputs s and a; register tot with control inputs tot_ld (ld) and tot_clr (clr); 8-bit comparator (<) producing tot_lt_s; 8-bit adder computing tot + a]
Preview Example: Step 3 – Connect Datapath to a Controller
• Controller’s inputs
– External input c (coin detected)
– Input from datapath comparator’s output, which we named tot_lt_s
• Controller’s outputs
– External output d (dispense soda)
– Outputs to datapath to load and clear the tot register
[Figure: controller (input c, output d) connected to the datapath (inputs s and a, 8 bits each) by tot_ld, tot_clr, and tot_lt_s]
Preview Example: Step 4 – Derive the Controller’s FSM
• Same states and arcs as high-level state machine
• But set/read datapath control signals for all datapath operations and conditions
Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)
Controller FSM (from figure):
  Init: d=0, tot_clr=1; go to Wait
  Wait: c go to Add; c’*tot_lt_s stay in Wait; c’*tot_lt_s’ go to Disp
  Add: tot_ld=1; go to Wait
  Disp: d=1; go to Init
Preview Example: Completing the Design
• Implement the FSM as a state register and logic
– As in Ch3
– Table shown below
Inputs: c, tot_lt_s (bit)
Outputs: d, tot_ld, tot_clr (bit)

       s1 s0  c  tot_lt_s | n1 n0  d  tot_ld tot_clr
Init    0  0  0     0     |  0  1  0    0      1
        0  0  0     1     |  0  1  0    0      1
        0  0  1     0     |  0  1  0    0      1
        0  0  1     1     |  0  1  0    0      1
Wait    0  1  0     0     |  1  1  0    0      0
        0  1  0     1     |  0  1  0    0      0
        0  1  1     0     |  1  0  0    0      0
        0  1  1     1     |  1  0  0    0      0
Add     1  0  0     0     |  0  1  0    1      0
Disp    1  1  0     0     |  0  0  1    0      0
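As a rough illustration of "state register and logic", the rows of the controller table can be collapsed into Boolean equations. The equations below are not given on the slide; they are one possible reading of the table, assuming the encoding Init=00, Wait=01, Add=10, Disp=11 and assuming the Add and Disp rows that are not shown behave the same for all input values. The names SodaCtrlOut and soda_ctrl_logic are hypothetical.

```c
#include <stdbool.h>

/* Outputs of the controller's combinational logic. */
typedef struct {
    bool n1, n0;              /* next-state bits */
    bool d, tot_ld, tot_clr;  /* controller outputs */
} SodaCtrlOut;

/* Combinational logic derived (as an assumption) from the state table.
 * s1 s0 is the current state; c and tot_lt_s are the controller inputs. */
SodaCtrlOut soda_ctrl_logic(bool s1, bool s0, bool c, bool tot_lt_s) {
    SodaCtrlOut o;
    o.d       = s1 && s0;     /* Disp */
    o.tot_ld  = s1 && !s0;    /* Add */
    o.tot_clr = !s1 && !s0;   /* Init */
    /* Only Wait sets n1: to Add on c, to Disp on c'*tot_lt_s'. */
    o.n1 = !s1 && s0 && (c || !tot_lt_s);
    /* n0 is 1 leaving Init, leaving Add, and from Wait when c=0. */
    o.n0 = (!s1 && !s0) || (!s1 && s0 && !c) || (s1 && !s0);
    return o;
}
```

Feeding n1 n0 back through a 2-bit state register on each clock edge would complete the controller.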
Step 1: Create a High-Level State Machine
• Let’s consider each step of the RTL design process in more detail
• Step 1
– Soda dispenser example
– Not an FSM because:
• Multi-bit (data) inputs a and s
• Local register tot
• Data operations tot=0, tot<s, tot=tot+a
– Useful high-level state machine:
• Data types beyond just bits
• Local registers
• Arithmetic equations/expressions
[Figure: soda dispenser high-level state machine with Inputs: c (bit), a (8 bits), s (8 bits); Outputs: d (bit); Local registers: tot (8 bits)]
Example: Laser-Based Distance Measurer
• Laser-based distance measurement – pulse laser, measure time T to sense reflection
– Laser light travels at speed of light, 3×10^8 m/sec
– Round trip: 2D = T sec × 3×10^8 m/sec
– Distance is thus D = (T sec × 3×10^8 m/sec) / 2
[Figure: laser and sensor at distance D from the object of interest; T (in seconds) is the round-trip time]
Example: Laser-Based Distance Measurer
• Inputs/outputs
– B: bit input, from button, to begin measurement
– L: bit output, activates laser
– S: bit input, senses laser reflection
– D: 16-bit output, to display computed distance
[Figure: laser-based distance measurer block: B from button, L to laser, S from sensor, D (16 bits) to display]
Example: Laser-Based Distance Measurer
DistanceMeasurer from button B Laser-
L
to laser
InputsB
: (bit), S (bit) based
OutputsL: (bit), D (16 bits) distance
D 16 measurer S
Local storage: Dreg(16) to display from sensor
(required)
a
S0 ?
(first state usually
L := '0' // laser initializes the system)
Dreg := 0off //distance
is 0

• Declare inputs, outputs, and local storage


– Dreg required for multi-bit output
• Create initial state, name it S0
– Initialize laser to off (L:='0') Recall: '0' means single bit,
– Initialize displayed distance to 0 (Dreg:=0) 0 means integer

Digital Design
Copyright © 2006 14
Frank Vahid
Example: Laser-Based Distance Measurer
DistanceMeasurer
S0: L := '0', Dreg := 0; go to S1
S1: B' (button not pressed) stay in S1; B (button pressed) go to next state
• Add another state, S1, that waits for a button press
– B' – stay in S1, keep waiting
– B – go to a new state S2
Q: What should S2 do? A: Turn on the laser
Example: Laser-Based Distance Measurer
DistanceMeasurer
S0: L := '0', Dreg := 0
S1: B' stay in S1; B go to S2
S2: L := '1' // laser on
S3: L := '0' // laser off
• Add a state S2 that turns on the laser (L:='1')
• Then turn off laser (L:='0') in a state S3
Q: What to do next? A: Start timer, wait to sense reflection
Example: Laser-Based Distance Measurer
DistanceMeasurer
Inputs: B (bit), S (bit)
Outputs: L (bit), D (16 bits)
Local storage: Dreg, Dctr (16 bits)
S0: L := '0', Dreg := 0
S1: Dctr := 0 // reset cycle count; B' stay in S1; B go to S2
S2: L := '1'
S3: L := '0', Dctr := Dctr + 1 // count cycles; S' (no reflection) stay in S3; S (reflection) go to next state
• Stay in S3 until sense reflection (S)
• To measure time, count cycles while in S3
– To count, declare local storage Dctr
– Initialize Dctr to 0 in S1. In S2 would have been O.K. too.
• Don't forget to initialize local storage (a common mistake)
– Increment Dctr each cycle in S3
Example: Laser-Based Distance Measurer
DistanceMeasurer
Inputs: B (bit), S (bit)
Outputs: L (bit), D (16 bits)
Local storage: Dreg, Dctr (16 bits)
S0: L := '0', Dreg := 0
S1: Dctr := 0; B' stay in S1; B go to S2
S2: L := '1'
S3: L := '0', Dctr := Dctr+1; S' stay in S3; S go to S4
S4: Dreg := Dctr/2 // calculate D
• Once reflection detected (S), go to new state S4
– Calculate distance
– Assuming clock frequency is 3×10^8 Hz, Dctr holds number of meters, so Dreg:=Dctr/2
• After S4, go back to S1 to wait for button again
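The finished distance-measurer state machine can be sketched in C as one function call per clock cycle. This is a minimal model under stated assumptions: the names DistanceMeasurer, dm_reset, and dm_clock are invented for illustration, L is treated as 1 only during S2, and the clock is assumed to be 3×10^8 Hz so that each count of Dctr corresponds to one meter of round-trip travel.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { S0, S1, S2, S3, S4 } DmState;

typedef struct {
    DmState state;
    uint16_t dctr; /* local storage Dctr: cycle counter */
    uint16_t dreg; /* local storage Dreg: drives output D */
} DistanceMeasurer;

/* Hypothetical helper: begin in the initial state S0. */
void dm_reset(DistanceMeasurer *m) {
    m->state = S0;
    m->dctr = 0;
    m->dreg = 0;
}

/* One clock cycle with inputs b (button) and s (sensor).
 * Returns output L (laser on) for this cycle. */
bool dm_clock(DistanceMeasurer *m, bool b, bool s) {
    bool l = false;
    switch (m->state) {
    case S0:                       /* L := '0', Dreg := 0 */
        m->dreg = 0;
        m->state = S1;
        break;
    case S1:                       /* Dctr := 0, wait for button */
        m->dctr = 0;
        m->state = b ? S2 : S1;
        break;
    case S2:                       /* L := '1' */
        l = true;
        m->state = S3;
        break;
    case S3:                       /* L := '0', count cycles */
        m->dctr = (uint16_t)(m->dctr + 1);
        m->state = s ? S4 : S3;
        break;
    case S4:                       /* Dreg := Dctr / 2 */
        m->dreg = (uint16_t)(m->dctr >> 1);
        m->state = S1;
        break;
    }
    return l;
}
```

If the reflection arrives during the tenth cycle spent in S3, Dctr reaches 10 and the model loads Dreg with 5.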
Step 2: Create a Datapath
• Datapath must
– Implement data storage
– Implement data computations
• Look at high-level state machine, do three substeps
– (a) Make data inputs/outputs be datapath inputs/outputs
– (b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
– (c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
Instantiate: to introduce a new component into a design.
Step 2 Example: Laser-Based Distance Measurer
Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local registers: Dctr (16 bits)
S0: L=0, D=0
S1: Dctr = 0; B’ stay; B go to S2
S2: L=1
S3: L=0, Dctr = Dctr + 1; S’ stay; S go to S4
S4: D = Dctr / 2 (calculate D)
(a) Make data inputs/outputs be datapath inputs/outputs
(b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
(c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
[Figure: datapath with Dctr (16-bit up-counter; controls Dctr_clr, Dctr_cnt) and Dreg (16-bit register; controls Dreg_clr, Dreg_ld) driving the 16-bit output D]
Step 2 Example: Laser-Based Distance Measurer
(c) (continued) Examine every state and transition, and instantiate datapath components and connections to implement any data computations
[Figure: same state machine and datapath as the previous slide, with a shift-right-by-1 component (>>1) added between the Dctr output Q and the Dreg input I to compute Dctr / 2]
Step 2 Example Showing Mux Use
Local registers: E, F, G, R (16 bits)
T0: R = E + F
T1: R = R + G
• Introduce mux when one component input can come from more than one source
[Figure: (a) states T0 and T1; (b), (c) the adder connections each state needs; (d) a single adder whose A and B inputs are fed through 2×1 muxes with select lines add_A_s0 and add_B_s0]
Step 3: Connecting the Datapath to a Controller
• Laser-based distance measurer example
• Easy – just connect all control signals between controller and datapath
[Figure: controller (B from button, S from sensor, L to laser) connected to the datapath by Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt; datapath output D (16 bits) to display; 300 MHz clock]
Step 4: Deriving the Controller’s FSM
• FSM has same structure as high-level state machine
– Inputs/outputs all bits now
– Replace data operations by bit operations using datapath
Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
Transitions: S0 to S1; S1: B’ stay, B to S2; S2 to S3; S3: S’ stay, S to S4

            S0   S1   S2   S3   S4
L            0    0    1    0    0
Dreg_clr     1    0    0    0    0
Dreg_ld      0    0    0    0    1
Dctr_clr     0    1    0    0    0
Dctr_cnt     0    0    0    1    0

S0: laser off, clear D reg
S1: clear count
S2: laser on
S3: laser off, count up
S4: load D reg with Dctr/2, stop counting
Step 4: Deriving the Controller’s FSM
• Using shorthand: outputs not assigned are implicitly assigned 0
Inputs: B, S
Outputs: L, Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt
S0: L=0, Dreg_clr=1 (laser off, clear D reg)
S1: Dctr_clr=1 (clear count)
S2: L=1 (laser on)
S3: L=0, Dctr_cnt=1 (laser off, count up)
S4: Dreg_ld=1, Dctr_cnt=0 (load D reg with Dctr/2, stop counting)
Step 4
• Implement FSM as state register and logic (Ch3) to complete the design
[Figure: controller and datapath of the laser-based distance measurer, connected by Dreg_clr, Dreg_ld, Dctr_clr, Dctr_cnt, with the controller FSM in shorthand form; 300 MHz clock]
Timer Example: Laser Surgery System (DIY)
• Recall Chpt 3 laser surgery example
– Clock was 10 ns, wanted 30 ns, used 3 states.
– What if wanted 300 ms? Adding 30 million states is not reasonable.
• Use timer
– Controller FSM loads timer, enables, then waits for Q=1
Inputs: b, Q
Outputs: ld, en, x
Off: x=0, ld=1, en=0; b' stay in Off; b go to Strt
Strt: x=0, ld=0, en=1; go to On
On: x=1, ld=0, en=1; Q' stay in On; Q go to Off
[Figure: (a) laser surgery system with inputs b and clk and output x to the laser; (b) controller driving ld and en of a 32-bit timer with a 1 microsec tick, loaded with M = 300,000 (in binary), timer output Q; (c) controller FSM; (d) timing diagram with a 10 ns clock and x held at 1 for 300 ms]
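The numbers on this slide follow from dividing the desired interval by the tick period. The helper below is only a worked check of that arithmetic; the name ticks_for_interval is made up for this sketch and both arguments are in nanoseconds.

```c
#include <stdint.h>

/* Number of ticks of period tick_ns that fit in interval_ns.
 * Assumes tick_ns is nonzero. */
uint64_t ticks_for_interval(uint64_t interval_ns, uint64_t tick_ns) {
    return interval_ns / tick_ns;
}
```

For 300 ms (300,000,000 ns), a 10 ns clock gives 30,000,000 cycles, which is the "30 million states" figure, while a 1 microsecond timer tick gives the load value 300,000.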
5.3

RTL Design Examples and Issues
• We’ll use several more examples to illustrate RTL design
• Example: Bus interface
– Master processor can read register from any peripheral
• Each register has unique 4-bit address
• Assume 1 register/periph.
– Sets rd=1, A=address
– Appropriate peripheral places register data on 32-bit D lines
• Periph’s address provided on Faddr inputs (maybe from DIP switches, or another register)
[Figure: master processor connected to peripherals Per0, Per1, ..., Per15 by bus lines rd, D (32 bits), and A (4 bits); each peripheral contains a bus interface (inputs Faddr (4 bits) and Q (32 bits)) between the processor bus and its main part]
RTL Example: Bus Interface
Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)
Outputs: D (32 bits)
Local register: Q1 (32 bits)
WaitMyAddress: D = “Z”, Q1 = Q; ((A = Faddr) and rd)’ stay; (A = Faddr) and rd go to SendData
SendData: D = Q1; rd stay; rd’ go to WaitMyAddress
• Step 1: Create high-level state machine
– State WaitMyAddress
• Output “nothing” (“Z”) on D, store peripheral’s register value Q into local register Q1
• Wait until this peripheral’s address is seen (A=Faddr) and rd=1
– State SendData
• Output Q1 onto D, wait for rd=0 (meaning main processor is done reading the D lines)
RTL Example: Bus Interface
[Figure: timing diagram for the bus interface state machine. Input rd pulses twice; state sequence W, W, SD, W, W, SD, SD, W (W = WaitMyAddress, SD = SendData); output D is Z while in W and Q1 while in SD]
RTL Example: Bus Interface
• Step 2: Create a datapath
(a) Datapath inputs/outputs
(b) Instantiate declared registers
(c) Instantiate datapath components and connections
[Figure: bus interface datapath. Inputs A (4 bits), Faddr (4 bits), Q (32 bits); register Q1 with load control Q1_ld; 4-bit equality comparator (=) producing A_eq_Faddr; 32-bit output D gated by D_en]
RTL Example: Bus Interface
• Step 3: Connect datapath to controller
• Step 4: Derive controller’s FSM
Inputs: rd, A_eq_Faddr (bit)
Outputs: Q1_ld, D_en (bit)
WaitMyAddress: D_en = 0, Q1_ld = 1; (A_eq_Faddr and rd)’ stay; A_eq_Faddr and rd go to SendData
SendData: D_en = 1, Q1_ld = 0; rd stay; rd’ go to WaitMyAddress
RTL Example: Video Compression – Sum of Absolute Differences
• Video is a series of frames (e.g., 30 per second)
• Most frames similar to previous frame
– Compression idea: just send difference from previous frame
[Figure: (a) frame 1 and frame 2 (only difference: ball moving), digitized frame 1 and digitized frame 2 at 1 Mbyte each; (b) digitized frame 1 (1 Mbyte) plus difference of 2 from 1 (0.01 Mbyte): just send difference]
RTL Example: Video Compression – Sum of Absolute Differences
• Need to quickly determine whether two frames are similar enough to just send difference for second frame
– Compare corresponding 16x16 “blocks”
• Treat 16x16 block as 256-byte array
– Compute the absolute value of the difference of each array item
– Sum those differences – if above a threshold, send complete frame for second frame; if below, can use difference method (using another technique, not described)
[Figure: corresponding blocks of frame 1 and frame 2 being compared. Each is a pixel, assume represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel)]
RTL Example: Video Compression – Sum of Absolute Differences
• Want fast sum-of-absolute-differences (SAD) component
– When go=1, sums the differences of element pairs in arrays A and B, outputs that sum
[Figure: SAD component with inputs A (256-byte array), B (256-byte array), and go, and integer output sad]
RTL Example: Video Compression – Sum of Absolute Differences
Inputs: A, B (256 byte memory); go (bit)
Outputs: sad (32 bits)
Local registers: sum, sad_reg (32 bits); i (9 bits)
• S0: wait for go (!go stay in S0; go to S1)
• S1: initialize sum and index (sum = 0, i = 0)
• S2: check if done (i>=256): i<256 go to S3; (i<256)’ go to S4
• S3: add difference to sum, increment index (sum=sum+abs(A[i]-B[i]), i=i+1)
• S4: done, write to output sad_reg (sad_reg = sum)
RTL Example: Video Compression – Sum of Absolute Differences
• Step 2: Create datapath
[Figure: SAD datapath. Outputs AB_addr to the memories, inputs A_data and B_data (8 bits each); register i (9 bits) with controls i_inc and i_clr; comparator <256 producing i_lt_256; subtractor (–) and abs (8 bits) feeding a 32-bit adder (+); register sum (32 bits) with controls sum_ld and sum_clr; register sad_reg with control sad_reg_ld driving the 32-bit output sad]
RTL Example: Video Compression – Sum of Absolute Differences
• Step 3: Connect to controller
• Step 4: Replace high-level state machine by FSM
S0: go’ stay in S0; go to S1
S1: sum=0 becomes sum_clr=1; i=0 becomes i_clr=1
S2: i<256 becomes i_lt_256; !(i<256) becomes (i_lt_256)’
S3: sum=sum+abs(A[i]-B[i]) becomes sum_ld=1, AB_rd=1; i=i+1 becomes i_inc=1
S4: sad_reg=sum becomes sad_reg_ld=1
[Figure: controller (input go, output AB_rd) connected to the SAD datapath]
RTL Example: Video Compression – Sum of Absolute Differences
• Comparing software and custom circuit SAD
– Circuit: Two states (S2 & S3) for each i, 256 i’s → 512 clock cycles
– Software: Loop (for i = 1 to 256), but for each i, must move memory to local registers, subtract, compute absolute value, add to sum, increment i – say about 6 cycles per array item → 256*6 = 1536 cycles
– Circuit is about 3 times as fast (1536 / 512 = 3)
– Later, we’ll see how to build SAD circuit that is even faster
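The cycle estimate for the circuit can be checked with a small software model of states S1 through S4, counting one cycle per state visited. This is a sketch under assumptions: the function name sad_hlsm and the SadState enum are invented here, S0 (waiting for go) is skipped, and every state is assumed to take exactly one clock cycle.

```c
#include <stdint.h>
#include <stdlib.h>

typedef enum { SAD_S1, SAD_S2, SAD_S3, SAD_S4, SAD_DONE } SadState;

/* Steps through S1..S4 of the SAD high-level state machine.
 * Returns the value loaded into sad_reg and stores the number of
 * clock cycles (states visited) in *cycles. */
uint32_t sad_hlsm(const uint8_t A[256], const uint8_t B[256],
                  unsigned *cycles) {
    SadState state = SAD_S1;
    uint32_t sum = 0, sad_reg = 0;
    uint16_t i = 0;                /* 9-bit index in the circuit */
    *cycles = 0;
    while (state != SAD_DONE) {
        (*cycles)++;
        switch (state) {
        case SAD_S1:               /* sum = 0, i = 0 */
            sum = 0; i = 0; state = SAD_S2; break;
        case SAD_S2:               /* check if done */
            state = (i < 256) ? SAD_S3 : SAD_S4; break;
        case SAD_S3:               /* accumulate, increment index */
            sum += (uint32_t)abs((int)A[i] - (int)B[i]);
            i++;
            state = SAD_S2; break;
        case SAD_S4:               /* sad_reg = sum */
            sad_reg = sum; state = SAD_DONE; break;
        case SAD_DONE:
            break;
        }
    }
    return sad_reg;
}
```

The model visits S2 257 times and S3 256 times, so with S1 and S4 it reports 515 cycles, which is the slide's "two states per item" estimate of 512 plus three cycles of overhead.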
RTL Design Pitfalls and Good Practice
• Common pitfall: Assuming register is updated in the state it’s written
– Final value of Q?
– Final state?
– Answers may surprise you
• Value of Q unknown
• Final state is C, not D
– Why?
• State A: R=99 and Q=R happen simultaneously
• State B: R not updated with R+1 until next clock cycle, simultaneously with state register being updated
Local registers: R, Q (8 bits)
(a) A: R=99, Q=R; go to B. B: R=R+1; R<100 go to C; R>=100 go to D
(b) [Figure: timing diagram. States A, B, C; R: ?, 99, 100; Q: ?, ?, ?]
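The pitfall can be reproduced in software by computing every next value from the current register contents and committing them all at once, which is how the clock edge behaves. The sketch below is illustrative only: the names Pitfall, pit_reset, and pit_clock are made up, and the r_valid/q_valid flags are an added device to represent "unknown" register contents.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ST_A, ST_B, ST_C, ST_D } PitState;

typedef struct {
    PitState state;
    uint8_t r, q;           /* local registers R, Q (8 bits) */
    bool r_valid, q_valid;  /* false = contents unknown */
} Pitfall;

/* Hypothetical helper: start in state A with unknown R and Q. */
void pit_reset(Pitfall *m) {
    m->state = ST_A;
    m->r = 0; m->q = 0;
    m->r_valid = false; m->q_valid = false;
}

/* One clock edge: all next values are computed from the CURRENT
 * values in *m, then committed together. */
void pit_clock(Pitfall *m) {
    Pitfall next = *m;
    switch (m->state) {
    case ST_A:
        next.r = 99; next.r_valid = true;
        next.q = m->r; next.q_valid = m->r_valid; /* Q gets OLD R */
        next.state = ST_B;
        break;
    case ST_B:
        next.r = (uint8_t)(m->r + 1);
        next.r_valid = m->r_valid;
        /* The transition condition also sees the OLD R (99). */
        next.state = (m->r < 100) ? ST_C : ST_D;
        break;
    case ST_C:
    case ST_D:
        break;                /* final states: nothing changes */
    }
    *m = next;
}
```

After two clock edges the model is in state C with R = 100 and Q still marked unknown, matching the answers on the slide.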
RTL Design Pitfalls and Good Practice
• Solutions
– Read register in following state (Q=R)
– Insert extra state so that conditions use updated value
– Other solutions are possible, depends on the example
Local registers: R, Q (8 bits)
[Figure: (a) revised state machine with an extra state B2 inserted after B, before the R<100 / R>=100 test; (b) timing diagram. States A, B, B2, D; R: ?, 99, 100, 100; Q: ?, ?, 99, 99]
RTL Design Pitfalls and Good Practice
• Common pitfall: Reading outputs
– Outputs can only be written
– Solution: Introduce additional register, which can be written and read
(a) Inputs: A, B (8 bits); Outputs: P (8 bits)
    S: P=A; T: P=P+B
(b) Inputs: A, B (8 bits); Outputs: P (8 bits); Local register: R (8 bits)
    S: R=A, P=A; T: P=R+B
RTL Design Pitfalls and Good Practice
• Good practice: Register all data outputs
– In fig (a), output P would show spurious values as addition computes
• Furthermore, longest register-to-register path, which determines clock period, is not known until that output is connected to another component
– In fig (b), spurious outputs reduced, and longest register-to-register path is clear
[Figure: (a) registers B and R feed an adder whose result drives output P directly; (b) the adder result is stored in register Preg, which drives P]
Control vs. Data Dominated RTL Design
• Designs often categorized as control-dominated or data-
dominated
– Control-dominated design – Controller contains most of the
complexity
– Data-dominated design – Datapath contains most of the complexity
– General, descriptive terms – no hard rule that separates the two
types of designs
– Laser-based distance measurer – control dominated
– Bus interface– mix of control and data
– Now let’s do a data dominated design

Data Dominated RTL Design Example: FIR Filter
• Filter concept
– Suppose X is data from a temperature sensor, and particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle)
– That 240 is probably wrong!
• Could be electrical noise
– Filter should remove such noise in its output Y
– Simple filter: Output average of last N values
• Small N: less filtering
• Large N: more filtering, but less sharp output
[Figure: digital filter block with 12-bit input X, 12-bit output Y, and clk]
Data Dominated RTL Design Example: FIR Filter
• FIR filter
– “Finite Impulse Response”
– Simply a configurable weighted sum of past input values
– y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Above known as “3 tap”
• Tens of taps more common
• Very general filter – User sets the constants (c0, c1, c2) to define specific filter
– RTL design
• Step 1: Create high-level state machine
– But there really is none! Data dominated indeed.
• Go straight to step 2
Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Step 2: Create datapath
– Begin by creating chain of xt registers to hold past values of X
Suppose sequence is: 180, 181, 240
[Figure: 3-tap FIR filter. 12-bit input X feeds a chain of registers xt0, xt1, xt2 holding x(t), x(t-1), x(t-2); after the sequence above they hold 240, 181, 180]
Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Step 2: Create datapath (cont.)
– Instantiate registers for c0, c1, c2
– Instantiate multipliers to compute c*x values
[Figure: 3-tap FIR filter. Registers c0, c1, c2 alongside xt0, xt1, xt2; three multipliers compute c0*xt0, c1*xt1, c2*xt2]
Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Step 2: Create datapath (cont.)
– Instantiate adders
[Figure: 3-tap FIR filter. Two adders sum the three multiplier outputs to produce Y]
Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Step 2: Create datapath (cont.)
– Add circuitry to allow loading of particular c register
[Figure: 3-tap FIR filter. A 2x4 decoder with inputs Ca1, Ca0 and enable CL selects which of c0, c1, c2 loads the value on input C; the adder output passes through register yreg to Y]
Data Dominated RTL Design Example: FIR Filter
y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)
• Step 3 & 4: Connect to controller, Create FSM
– No controller needed
– Extreme data-dominated example
– (Example of an extreme control-dominated design – an FSM, with no datapath)
• Comparing the FIR circuit to a software implementation
– Circuit
• Assume adder has 2-gate delay, multiplier has 20-gate delay
• Longest path goes through one multiplier and two adders
– 20 + 2 + 2 = 24-gate delay
• 100-tap filter, following design on previous slide, would have about a 34-gate delay: 1 multiplier and 7 adders on longest path
– Software
• 100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per multiplication, 2 per addition. Say 10-gate delay per instruction.
• (100*2 + 100*2)*10 = 4000 gate delays
– Circuit is more than 100 times faster (4000 / 34 ≈ 118). Wow.
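For comparison with the datapath, here is a minimal software sketch of the same 3-tap equation, one call per clock cycle. It is an illustration under assumptions: the names Fir3, fir3_init, and fir3_clock are invented, the xt registers are assumed to start at 0, 32-bit integers stand in for the 12-bit signals, and the output register yreg is left out, so the value is returned in the same cycle the sample is shifted in.

```c
#include <stdint.h>

typedef struct {
    int32_t c[3];   /* constants c0, c1, c2 */
    int32_t xt[3];  /* xt0 = x(t), xt1 = x(t-1), xt2 = x(t-2) */
} Fir3;

/* Hypothetical helper: load the constants and clear the xt chain. */
void fir3_init(Fir3 *f, int32_t c0, int32_t c1, int32_t c2) {
    f->c[0] = c0; f->c[1] = c1; f->c[2] = c2;
    f->xt[0] = 0; f->xt[1] = 0; f->xt[2] = 0;
}

/* One clock cycle: shift x into the xt chain, then compute
 * y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2). */
int32_t fir3_clock(Fir3 *f, int32_t x) {
    f->xt[2] = f->xt[1];
    f->xt[1] = f->xt[0];
    f->xt[0] = x;
    return f->c[0] * f->xt[0]
         + f->c[1] * f->xt[1]
         + f->c[2] * f->xt[2];
}
```

With c0 = c1 = c2 = 1 the output is the sum of the last three samples; dividing by 3 would give the simple averaging filter described earlier. In software the three multiplications and two additions run one after another, whereas the circuit's multipliers all operate in parallel.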
5.4

Determining Clock Frequency
• Designers of digital circuits often want fastest performance
– Means want high clock frequency
• Frequency limited by longest register-to-register delay
– Known as critical path
– If clock is any faster, incorrect data may be stored into register
– Longest path in the figure is 2 ns
• Ignoring wire delays, and register setup and hold times, for simplicity
[Figure: registers a and b feed a component with 2 ns delay whose output is stored in register c, all clocked by clk]
Critical Path
• Example shows four paths
– a to c through +: 2 ns
– a to d through + and *: 7 ns
– b to d through + and *: 7 ns
– b to d through *: 5 ns
• Longest path is thus 7 ns
• Fastest frequency
– 1 / 7 ns ≈ 142 MHz
[Figure: registers a and b feed an adder (2 ns delay) and a multiplier (5 ns delay) into registers c and d; Max(2,7,7,5) = 7 ns]
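The calculation on this slide is a maximum over path delays followed by a reciprocal. The two helpers below are a small worked check; their names (critical_path_ns, max_clock_mhz) are made up for this sketch, and wire delays and setup/hold times are ignored, as on the slide.

```c
#include <stddef.h>

/* Longest register-to-register delay among n paths, in ns.
 * Returns 0.0 if n is 0. */
double critical_path_ns(const double *delays_ns, size_t n) {
    double longest = 0.0;
    for (size_t k = 0; k < n; k++) {
        if (delays_ns[k] > longest) {
            longest = delays_ns[k];
        }
    }
    return longest;
}

/* Fastest clock frequency in MHz for a given critical path in ns.
 * 1 / (1 ns) = 1 GHz = 1000 MHz. Assumes critical_ns > 0. */
double max_clock_mhz(double critical_ns) {
    return 1000.0 / critical_ns;
}
```

For the four paths 2, 7, 7, and 5 ns, the critical path is 7 ns and the fastest clock is about 142.86 MHz, which the slide rounds down to 142 MHz.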
Critical Path Considering Wire Delays
• Real wires have delay too
– Must include in critical path
• Example shows two paths
– Each is 0.5 + 2 + 0.5 = 3 ns
• Trend
– 1980s/1990s: Wire delays were tiny compared to logic delays
– But wire delays not shrinking as fast as logic delays
• Wire delays may even be greater than logic delays!
• Must also consider register setup and hold times, also add to path
• Then add some time to the computed path, just to be safe
– e.g., if path is 3 ns, say 4 ns instead
[Figure: registers a and b connected by 0.5 ns wires to an adder (2 ns), whose output travels over a 0.5 ns wire to register c; each path totals 3 ns]
A Circuit May Have Numerous Paths
• Paths can exist
– In the datapath
– In the controller
– Between the controller and datapath
– May be hundreds or thousands of paths
• Timing analysis tools that evaluate all possible paths automatically are very helpful
[Figure: soda dispenser controller (combinational logic with inputs c and tot_lt_s, outputs d, tot_ld, tot_clr, n1, n0, and state register s1 s0) and datapath (tot register, 8-bit comparator, 8-bit adder), with example paths labeled (a), (b), and (c)]
5.5

Behavioral Level Design: C to Gates
C code:
int SAD (byte A[256], byte B[256]) // not quite C syntax
{
   uint sum; short uint i;
   sum = 0;
   i = 0;
   while (i < 256) {
      sum = sum + abs(A[i] - B[i]);
      i = i + 1;
   }
   return sum;
}
High-level state machine:
S0: !go stay in S0; go to S1
S1: sum = 0, i = 0
S2: i<256 go to S3; (i<256)’ go to S4
S3: sum=sum+abs(A[i]-B[i]), i=i+1
S4: sad_reg = sum
• Earlier sum-of-absolute-differences example
– Started with high-level state machine
– C code is an even better starting point -- easier to understand
Behavioral-Level Design: Start with C (or Similar Language)
• Replace first step of RTL design method by two steps
– Capture in C, then convert C to high-level state machine
– How to convert from C to high-level state machine?
Step 1A: Capture in C
Step 1B: Convert to high-level state machine
Converting from C to High-Level State Machine
• Convert each C construct to equivalent states and transitions
• Assignment statement
– Becomes one state with assignment
    target = expression;
  becomes a state with action target=expression
• If-then statement
– Becomes state with condition check, transitioning to “then” statements if condition true, otherwise to ending state
• “then” statements would also be converted to states
    if (cond) {
       // then stmts
    }
  becomes a check state: cond go to (then stmts), then to (end); !cond go to (end)
Converting from C to High-Level State Machine
• If-then-else
– Becomes state with condition check, transitioning to “then” statements if condition true, or to “else” statements if condition false
    if (cond) {
       // then stmts
    }
    else {
       // else stmts
    }
  becomes a check state: cond go to (then stmts); !cond go to (else stmts); both then go to (end)
• While loop statement
– Becomes state with condition check, transitioning to while loop’s statements if true, then transitioning back to condition check
    while (cond) {
       // while stmts
    }
  becomes a check state: cond go to (while stmts), then back to the check; !cond go to (end)
Simple Example of Converting from C to High-Level State Machine
Inputs: uint X, Y
Outputs: uint Max

if (X > Y) {
   Max = X;
}
else {
   Max = Y;
}

[Figure: (a) the C code above; (b) the if-then-else converted to a condition-check state with transition X>Y to (then stmts) and transition !(X>Y) to (else stmts), both leading to (end); (c) the same machine with the assignments converted to states: Max=X on X>Y, Max=Y on !(X>Y)]

• Simple example: Computing the maximum of two numbers
– Convert if-then-else statement to states (b)
– Then convert assignment statements to states (c)

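To make the conversion concrete, the state machine in (c) can be simulated in software. The sketch below is one possible rendering; the state names ST_CHECK, ST_THEN, ST_ELSE, and ST_END are hypothetical, since the slide leaves the states unnamed.

```c
#include <assert.h>
#include <stdint.h>

/* One state per node of the converted if-then-else (names are illustrative). */
enum max_state { ST_CHECK, ST_THEN, ST_ELSE, ST_END };

/* Steps the state machine until it reaches the end state, then returns Max. */
uint32_t max_hlsm(uint32_t x, uint32_t y)
{
    enum max_state state = ST_CHECK;
    uint32_t max = 0;
    while (state != ST_END) {
        switch (state) {
        case ST_CHECK:              /* condition-check state */
            state = (x > y) ? ST_THEN : ST_ELSE;
            break;
        case ST_THEN:               /* assignment state: Max = X */
            max = x;
            state = ST_END;
            break;
        case ST_ELSE:               /* assignment state: Max = Y */
            max = y;
            state = ST_END;
            break;
        case ST_END:
            break;
        }
    }
    assert(max == x || max == y);
    return max;
}
```

Each pass through the switch corresponds to one clock cycle of the state machine, so computing the maximum takes two cycles here: one for the check and one for the assignment.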
Example: Converting Sum-of-Absolute-Differences C code to High-Level State Machine
• Convert each construct to states
– Simplify when possible, e.g., merge states
• From high-level state machine, follow RTL design method to create circuit
• Thus, can convert C to gates using straightforward automatable process
– Not all C constructs can be efficiently converted
– Use C subset if intended for circuit
– Can use languages other than C, of course

(a) C code:

Inputs: byte A[256], B[256]
        bit go;
Output: int sad
main()
{
   uint sum; short uint i;
   while (1) {
      while (!go);
      sum = 0;
      i = 0;
      while (i < 256) {
         sum = sum + abs(A[i] - B[i]);
         i = i + 1;
      }
      sad = sum;
   }
}

[Figure: (b) through (g) show the step-by-step conversion of the code in (a): first the outer while (1) loop and the while (!go) wait state, then the sum=0 and i=0 assignment states (later merged into one state), then the inner while loop with condition i<256 and its while stmts, then the loop body sum=sum+abs(A[i]-B[i]) and i=i+1, and finally the sad = sum state]

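The resulting high-level state machine can also be simulated directly, one state per clock tick. The sketch below follows the five-state machine (S0 to S4) from the earlier sum-of-absolute-differences slide; the struct layout, the transition from S4 back to S0, and the helper sad_run are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

enum sad_state { S0, S1, S2, S3, S4 };

struct sad_machine {
    enum sad_state state;
    uint32_t sum;       /* local register */
    uint16_t i;         /* local register, counts 0..256 */
    uint32_t sad_reg;   /* output register */
};

/* One clock tick: perform the current state's actions, then choose the next state. */
void sad_tick(struct sad_machine *m, bool go,
              const uint8_t a[256], const uint8_t b[256])
{
    switch (m->state) {
    case S0: m->state = go ? S1 : S0; break;              /* wait for go */
    case S1: m->sum = 0; m->i = 0; m->state = S2; break;  /* merged assignments */
    case S2: m->state = (m->i < 256) ? S3 : S4; break;    /* loop condition */
    case S3:                                              /* loop body */
        m->sum += (uint32_t)abs((int)a[m->i] - (int)b[m->i]);
        m->i += 1;
        m->state = S2;
        break;
    case S4: m->sad_reg = m->sum; m->state = S0; break;   /* assumed return to S0 */
    }
}

/* Hypothetical driver: assert go for one tick, then run until the machine
   is back in S0. Returns the number of ticks used. */
uint32_t sad_run(struct sad_machine *m, const uint8_t a[256], const uint8_t b[256])
{
    uint32_t ticks = 0;
    m->state = S0;
    sad_tick(m, true, a, b);
    ticks++;
    while (m->state != S0) {
        sad_tick(m, false, a, b);
        ticks++;
    }
    return ticks;
}
```

In this model one run takes 1 + 1 + 2*256 + 1 + 1 = 516 ticks: the go check, the initialization, two states per loop iteration, the final failing loop check, and the output state.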
4.10

Register Files
• MxN register file component provides efficient access to M N-bit-wide registers
– If we have many registers but only need to access one or two at a time, a register file is more efficient
– Ex: Above-mirror display (earlier example), but this time having 16 32-bit registers
• Too many wires, and big mux is too slow
[Figure: above-mirror display example redrawn with 16 32-bit registers (reg0 to reg15), each with a load input, written from the car's central computer through a decoder (i3-i0, load) and read through a 32-bit-wide 16x1 mux (s3-s0) driving the above-mirror display; annotations: too much fanout, congestion, huge mux]

Register File
• Instead, want component that has one data input and one data output, and allows us to specify which internal register to write and which to read
[Figure: 16×32 register file block symbol with 32-bit W_data and R_data, 4-bit W_addr and R_addr, and control inputs W_en and R_en]

Register File Timing Diagram
• Can write one register and read one register each clock cycle
– May be same register
[Figure: 4x32 register file block symbol (32-bit W_data and R_data, 2-bit W_addr and R_addr, W_en, R_en) with a timing diagram over clock cycles 1 to 6; the values readable from the diagram are listed below, the W_en and R_en waveforms are not reproduced]

cycle     1    2    3    4    5    6
W_data    9    22   X    X    177  555
W_addr    3    1    X    X    2    3
R_addr    X    X    3    X    1    3

R_data (in time order): Z, Z, Z, 9, Z, 22, 9, 555

Register contents over time (7 snapshots, left to right):
0:   ?    ?    ?    ?    ?    ?    ?
1:   ?    ?    22   22   22   22   22
2:   ?    ?    ?    ?    ?    177  177
3:   ?    9    9    9    9    9    555

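The write and read behavior in the timing diagram can be approximated with a small software model. This is a sketch under stated assumptions: writes take effect on a clock edge, reads are modeled as combinational, and an undriven output (Z) is represented by a false return value. The names regfile, rf_clock, and rf_read are invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RF_WORDS 4   /* 4x32 register file, as in the timing diagram */

struct regfile {
    uint32_t reg[RF_WORDS];
};

/* Clock edge: store w_data into reg[w_addr] only when w_en is 1. */
void rf_clock(struct regfile *rf, bool w_en, unsigned w_addr, uint32_t w_data)
{
    assert(w_addr < RF_WORDS);
    if (w_en) {
        rf->reg[w_addr] = w_data;
    }
}

/* Read port: when r_en is 1, copies reg[r_addr] to *r_data and returns true.
   When r_en is 0 the output is not driven (Z), modeled by returning false. */
bool rf_read(const struct regfile *rf, bool r_en, unsigned r_addr, uint32_t *r_data)
{
    assert(r_addr < RF_WORDS);
    if (!r_en) {
        return false;
    }
    *r_data = rf->reg[r_addr];
    return true;
}
```

Writing 9 to register 3 and later reading register 3 returns 9, and a write of 555 to the same register makes a following read return 555, which matches the register contents listed for the diagram.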
5.6

Memory Components
• Register-transfer level design instantiates datapath components to create datapath, controlled by a controller
– A few more components are often used outside the controller and datapath
• MxN memory
– M words, N bits wide each
• Several varieties of memory, which we now introduce
[Figure: M×N memory drawn as M words, each N bits wide]

Random Access Memory (RAM)
• RAM – Readable and writable memory
– “Random access memory”
• Strange name – Created several decades ago to contrast with sequentially-accessed storage like tape drives
– Logically same as register file – Memory with address inputs, data inputs/outputs, and control
• RAM usually just one port; register file usually two or more
– RAM vs. register file
• RAM typically larger than roughly 512 or 1024 words
• RAM typically stores bits using a bit storage approach that is more efficient than a flip flop
• RAM typically implemented on a chip in a square rather than rectangular shape – keeps longest wires (hence delay) short
[Figure: 16×32 register file block symbol from Chpt. 4 (W_data, W_addr, W_en, R_data, R_addr, R_en) next to the 1024 × 32 RAM block symbol (32-bit data, 10-bit addr, rw, en)]

RAM Internal Structure
• Internal structure similar to that of a register file
– Decoder enables appropriate word based on address inputs
– rw controls whether cell is written or read
– Let’s see what’s inside each RAM cell
[Figure: internal structure of the 1024x32 RAM. Let A = log2 M. An AxM decoder (inputs addr0 to addr(A-1), enable e derived from en and clk, outputs d0 to d(M-1)) drives one word enable line per word; each word is a row of bit storage blocks (aka “cells”) fed by wdata(N-1) to wdata0 and driving rdata(N-1) to rdata0; rw goes to all cells]

Static RAM (SRAM)
• “Static” RAM cell
– 6 transistors (recall inverter is 2 transistors)
– Writing this cell
• word enable input comes from decoder
• When 0, value d loops around inverters
– That loop is where a bit stays stored
• When 1, the data bit value enters the loop
– data is the bit to be stored in this cell
– data’ enters on other side
– Example shows a “1” being written into cell
[Figure: RAM internal structure (as on the previous slide) and the SRAM cell: a loop of two inverters holding d and d’, connected to the data and data’ lines through transistors controlled by word enable. With word enable = 0 the loop holds its value; with word enable = 1, data = 1 and data’ = 0 enter the loop, writing a 1]

Static RAM (SRAM)
• “Static” RAM cell
– Reading this cell
• Somewhat trickier
• When rw set to read, the RAM logic sets both data and data’ to 1
• The stored bit d will pull either the left line or the right line down slightly below 1
• “Sense amplifiers” detect which side is slightly pulled down
– The electrical description of SRAM is really beyond our scope – just general idea here, mainly to contrast with DRAM...
[Figure: RAM internal structure (as on the previous slide) and the SRAM cell being read: word enable = 1, data and data’ both set to 1, one side pulled slightly below 1 (<1) by the stored bit; both lines go to sense amplifiers]

Dynamic RAM (DRAM)
• “Dynamic” RAM cell
– 1 transistor (rather than 6)
– Relies on large capacitor to store bit
• Write: Transistor conducts, data voltage level gets stored on top plate of capacitor
• Read: Just look at value of d
• Problem: Capacitor discharges over time
– Must “refresh” regularly, by reading d and then writing it right back
[Figure: RAM internal structure (as on the previous slides); (a) DRAM cell: one transistor controlled by word enable connects the data line to a capacitor storing d, which is slowly discharging; (b) waveforms of data, enable, and d, showing d discharging after being written]

Comparing Memory Types
• Register file
– Fastest
– But biggest size
• SRAM
– Fast
– More compact than register file
• DRAM
– Slowest
• And refreshing takes time
– But very compact
• Use register file for small items, SRAM for large items, and DRAM for huge items
– Note: DRAM’s big capacitor requires a special chip design process, so DRAM is often a separate chip
[Figure: size comparison for same number of bits (not to scale): MxN memory implemented as a register file, SRAM, or DRAM]

Reading and Writing a RAM
• Writing
– Put address on addr lines, data on data lines, set rw=1, en=1
• Reading
– Set addr and en lines, but put nothing (Z) on data lines, set rw=0
– Data will appear on data lines
• Don’t forget to obey setup and hold times
– In short – keep inputs stable before and after a clock edge
[Figure: (a) timing diagram over clock cycles 1 to 3: addr = 9, 13, 9; data = 500, 999, then Z followed by 500; rw = 1 means write; after cycle 1 RAM[9] now equals 500, after cycle 2 RAM[13] now equals 999. (b) timing detail around a clock edge showing setup time and hold time for addr, data, and rw, and the access time before read data (500) appears]

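The read and write rules above can be captured in a small behavioral model of a single-port RAM. This is only a sketch: it ignores setup, hold, and access times, and the names ram and ram_access are invented here. The shared data lines are modeled as a pointer that the caller drives on a write and the RAM drives on a read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAM_WORDS 1024   /* 1024x32 RAM, as in the block symbol */

struct ram {
    uint32_t word[RAM_WORDS];
};

/* One clock edge of a single-port RAM.
   en=0:       nothing happens.
   en=1, rw=1: write *data into word[addr].
   en=1, rw=0: read word[addr] into *data (caller must not drive data). */
void ram_access(struct ram *r, bool en, bool rw, unsigned addr, uint32_t *data)
{
    assert(addr < RAM_WORDS);
    if (!en) {
        return;
    }
    if (rw) {
        r->word[addr] = *data;
    } else {
        *data = r->word[addr];
    }
}
```

Replaying the diagram's sequence (write 500 to address 9, write 999 to address 13, read address 9) leaves 500 on the data lines after the read.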
RAM Example: Digital Sound Recorder
• Behavior
– Record: Digitize sound, store as series of 4096 12-bit digital values in RAM
• We’ll use a 4096x16 RAM (12-bit wide RAM not common)
– Play back later
– Common behavior in telephone answering machine, toys, voice recorders
• To record, processor should read a-to-d, store read values into successive RAM words
– To play, processor should read successive RAM words and enable d-to-a
[Figure: microphone wired to an analog-to-digital converter (ad_ld, ad_buf); digital-to-analog converter (da_ld) wired to a speaker; both converters share a 16-bit data bus with the 4096×16 RAM (addr, data, rw, en); the processor drives ad_ld, ad_buf, Ra (12 bits), Rrw, Ren, and da_ld]

RAM Example: Digital Sound Recorder
• RTL design of processor
– Create high-level state machine
– Begin with the record behavior
– Keep local register a
• Stores current address, ranges from 0 to 4095 (thus need 12 bits)
– Create state machine that counts from 0 to 4095 using a
• For each a
– Read analog-to-digital conv.
» ad_ld=1, ad_buf=1
– Write to RAM at address a
» Ra=a, Rrw=1, Ren=1
[Figure: system block diagram (as on the previous slide) and the record behavior state machine. Local register: a (12 bits). State S: a=0. State T: ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1. State U: a=a+1. Transitions labeled a<4095 (continue the loop) and a=4095 (done)]

RAM Example: Digital Sound Recorder
– Now create play behavior
– Use local register a again, create state machine that counts from 0 to 4095 again
• For each a
– Read RAM
– Write to digital-to-analog conv.
• Note: Must write d-to-a one cycle after reading RAM, when the read data is available on the data bus
– The record and play state machines would be parts of a larger state machine controlled by signals that determine when to record or play
[Figure: system block diagram with the data bus highlighted, and the play behavior state machine. Local register: a (12 bits). State V: a=0. State W: ad_buf=0, Ra=a, Rrw=0, Ren=1. State X: da_ld=1, a=a+1. Transitions labeled a<4095 (continue the loop) and a=4095 (done)]

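In software terms, the record and play behaviors are two counting loops over the RAM. The sketch below models the converters as plain arrays (adc_stream and dac_stream are hypothetical stand-ins for the a-to-d and d-to-a hardware) so that the loops can be run and checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SAMPLES 4096

/* Record: for a = 0..4095, take one 12-bit sample from the a-to-d stream
   and write it to RAM[a]. The upper 4 bits of each 16-bit word stay unused. */
void record(uint16_t ram[NUM_SAMPLES], const uint16_t adc_stream[NUM_SAMPLES])
{
    assert(ram != NULL && adc_stream != NULL);
    for (uint16_t a = 0; a < NUM_SAMPLES; a++) {
        ram[a] = (uint16_t)(adc_stream[a] & 0x0FFF);
    }
}

/* Play: for a = 0..4095, read RAM[a], then load the value into the d-to-a
   stream. The two steps mirror states W (read RAM) and X (da_ld=1). */
void play(const uint16_t ram[NUM_SAMPLES], uint16_t dac_stream[NUM_SAMPLES])
{
    assert(ram != NULL && dac_stream != NULL);
    for (uint16_t a = 0; a < NUM_SAMPLES; a++) {
        uint16_t sample = ram[a];   /* value now on the data bus */
        dac_stream[a] = sample;     /* loaded into d-to-a one step later */
    }
}
```

A real processor would spend at least two clock cycles per address in each loop, as the state machines show; the loops here only capture the order of operations.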
Read-Only Memory – ROM
• Memory that can only be read from, not written to
– Data lines are output only
– No need for rw input
• Advantages over RAM
– Compact: May be smaller
– Nonvolatile: Saves bits even if power supply is turned off
– Speed: May be faster (especially than DRAM)
– Low power: Doesn’t need power supply to save bits, so can extend battery life
• Choose ROM over RAM if stored data won’t change (or won’t change often)
– For example, a table of Celsius to Fahrenheit conversions in a digital thermometer
[Figure: 1024× 32 RAM block symbol (32-bit data, 10-bit addr, rw, en) compared with 1024x32 ROM block symbol (32-bit data, 10-bit addr, en, no rw)]

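The thermometer example maps naturally onto a constant lookup table, which is the software analogue of a ROM. The sketch below assumes a deliberately tiny table covering 0 to 10 degrees Celsius, with Fahrenheit values rounded to the nearest degree; the table size and the names F_ROM and rom_read are choices made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ROM_WORDS 11   /* assumed range: 0..10 degrees Celsius */

/* ROM contents: word at address C holds C * 9/5 + 32, rounded to nearest. */
static const int16_t F_ROM[ROM_WORDS] = {
    32, 34, 36, 37, 39, 41, 43, 45, 46, 48, 50
};

/* Read-only access: the address is the Celsius value, the data is Fahrenheit. */
int16_t rom_read(uint8_t addr)
{
    assert(addr < ROM_WORDS);
    return F_ROM[addr];
}
```

Because the table never changes, no write path is needed, which is exactly the situation in which the slide recommends a ROM over a RAM.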
Read-Only Memory – ROM
• Internal logical structure similar to RAM, without the data input lines
[Figure: 1024x32 ROM block symbol and its internal structure. Let A = log2 M. An AxM decoder (inputs addr0 to addr(A-1), enable e derived from en, outputs d0 to d(M-1)) drives one word enable line per word; each word is a row of bit storage blocks (aka “cells”) driving rdata(N-1) to rdata0]

ROM Types
• If a ROM can only be read, how are the stored bits stored in the first place?
– Storing bits in a ROM known as programming
– Several methods
• Mask-programmed ROM
– Bits are hardwired as 0s or 1s during chip manufacturing
• 2-bit word on right stores “10”
• word enable (from decoder) simply passes the hardwired value through transistor
– Notice how compact, and fast, this memory would be
[Figure: ROM internal structure (decoder, word enable lines, cells, outputs data(N-1) to data0) and two mask-programmed cells sharing a word enable line: the left cell is hardwired to 1 and the right cell to 0, each passing its value to its data line]

ROM Types
• Fuse-Based Programmable ROM
– Each cell has a fuse
– A special device, known as a programmer, blows certain fuses (using higher-than-normal voltage)
• Those cells will be read as 0s (involving some special electronics)
• Cells with unblown fuses will be read as 1s
• 2-bit word on right stores “10”
– Also known as One-Time Programmable (OTP) ROM
[Figure: ROM internal structure and two fuse-based cells sharing a word enable line, each wired to 1: the left cell has an intact fuse, the right cell has a blown fuse]

ROM Types
• Erasable Programmable ROM (EPROM)
– Uses “floating-gate transistor” in each cell
– Special programmer device uses higher-than-normal voltage to cause electrons to tunnel into the gate
• Electrons become trapped in the gate
• Only done for cells that should store 0
• Other cells (without electrons trapped in gate) will be 1
– 2-bit word on right stores “10”
• Details beyond our scope – just general idea is necessary here
– To erase, shine ultraviolet light onto chip
• Gives trapped electrons energy to escape
• Requires chip package to have window
[Figure: ROM internal structure and two EPROM cells with floating-gate transistors sharing a word enable line: the left cell reads 1, the right cell has trapped electrons in its floating gate and reads 0]

ROM Types
• Electrically-Erasable Programmable ROM (EEPROM)
– Similar to EPROM
• Uses floating-gate transistor, electronic programming to trap electrons in certain cells
– But erasing done electronically, not using UV light
– Erasing done one word at a time
• Flash memory
– Like EEPROM, but all words (or large blocks of words) can be erased simultaneously
– Became common relatively recently (late 1990s)
• Both types are in-system programmable
– Can be programmed with new stored bits while in the system in which the ROM operates
• Requires bi-directional data lines, and write control input
• Also need busy output to indicate that erasing is in progress – erasing takes some time
[Figure: 1024x32 EEPROM block symbol with 32-bit data, 10-bit addr, en, write, and busy]

ROM Example: Digital Telephone Answering Machine Using a Flash Memory
• Want to record the outgoing announcement
– When rec=1, record digitized sound in locations 0 to 4095
– When play=1, play those stored sounds to digital-to-analog converter
• What type of memory?
– Should store without power supply – ROM, not RAM
– Should be in-system programmable – EEPROM or Flash, not EPROM, OTP ROM, or mask-programmed ROM
– Will always erase entire memory when reprogramming – Flash better than EEPROM
[Figure: microphone, analog-to-digital converter (ad_ld, ad_buf), 4096x16 Flash holding the announcement “We’re not home.” with a busy output, digital-to-analog converter (da_ld), and speaker on a 16-bit data bus; the processor has inputs rec (record button), play (play button), and bu, and drives ad_ld, ad_buf, Ra (12 bits), Rrw, Ren, er, and da_ld]

ROM Example: Digital Telephone Answering Machine Using a Flash Memory
• High-level state machine
– Once rec=1, begin erasing flash by setting er=1
– Wait for flash to finish erasing by waiting for bu=0
– Execute loop that sets local register a from 0 to 4095, reading analog-to-digital converter and writing to flash for each a
[Figure: system block diagram (as on the previous slide) and the record state machine. Local register: a (13 bits). State S: a=0, er=1, entered on rec. State T: er=0; stays in T while bu, leaves on bu’. State U: ad_ld=1, ad_buf=1, Ra=a, Rrw=1, Ren=1. State V: a=a+1. Transitions labeled a<4096 (continue the loop) and a=4096 (done)]

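The erase, wait, then write sequence can be sketched in software as follows. Everything about the flash model here is an assumption made for illustration: the erase duration ERASE_TICKS, the convention that erased words read as all 1s, and the names flash, flash_erase, and record_announcement. The a-to-d converter is again modeled as an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_WORDS 4096
#define ERASE_TICKS 8      /* assumed erase duration, in clock ticks */

struct flash {
    uint16_t word[FLASH_WORDS];
    unsigned busy_ticks;   /* models bu: busy while nonzero */
};

/* er=1: start erasing the whole memory (erased words modeled as all 1s). */
static void flash_erase(struct flash *f)
{
    for (unsigned k = 0; k < FLASH_WORDS; k++) {
        f->word[k] = 0xFFFF;
    }
    f->busy_ticks = ERASE_TICKS;
}

/* Records the announcement; returns how many ticks were spent waiting for bu=0. */
unsigned record_announcement(struct flash *f, const uint16_t adc_stream[FLASH_WORDS])
{
    assert(f != NULL && adc_stream != NULL);
    unsigned waited = 0;
    flash_erase(f);                      /* state S: a=0, er=1 */
    while (f->busy_ticks > 0) {          /* state T: wait while bu=1 */
        f->busy_ticks--;
        waited++;
    }
    for (uint16_t a = 0; a < FLASH_WORDS; a++) {   /* states U and V */
        f->word[a] = adc_stream[a];      /* Ra=a, Rrw=1, Ren=1, then a=a+1 */
    }
    return waited;
}
```

The important ordering is that no write is attempted until the busy flag has cleared, which is the reason the state machine has the extra waiting state T.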
Blurring of Distinction Between ROM and RAM
• We said that
– RAM is readable and writable
– ROM is read-only
• But some ROMs act almost like RAMs
– EEPROM and Flash are in-system programmable
• Essentially means that writes are slow
– Also, number of writes may be limited (perhaps a few million times)
• And, some RAMs act almost like ROMs
– Non-volatile RAMs: Can save their data without the power supply
• One type: Built-in battery, may work for up to 10 years
• Another type: Includes ROM backup for RAM – controller writes RAM contents to ROM before turning off
• New memory technologies evolving that merge RAM and ROM benefits
– e.g., MRAM
• Bottom line
– Lots of choices available to designer, must find best fit with design goals
[Figure: ROM, EEPROM, Flash, NVRAM, and RAM shown as overlapping categories]

Hierarchy and Abstraction
• Abstraction
– Hierarchy often involves not just grouping items into a new item, but also associating higher-level behavior with the new item, known as abstraction
• e.g., an 8-bit adder has an understandable high-level behavior – it adds two 8-bit binary numbers
– Frees designer from having to remember, or even from having to understand, the lower-level details
[Figure: 8-bit adder block symbol with inputs a7..a0, b7..b0, and ci, and outputs co and s7..s0]

Hierarchy and Composing Larger Components from Smaller Versions
• A common task is to compose smaller components into a larger one
– Gates: Suppose you have plenty of 3-input AND gates, but need a 9-input AND gate
• Can simply compose the 9-input gate from several 3-input gates
– Muxes: Suppose you have 4x1 and 2x1 muxes, but need an 8x1 mux
• s2 selects either top or bottom 4x1
• s1s0 select particular 4x1 input
• Implements 8x1 mux – 8 data inputs, 3 selects, one output
[Figure: 8x1 mux built from two 4× 1 muxes (inputs i0 to i3 and i4 to i7, both selected by s1 s0) whose outputs feed a 2× 1 mux selected by s2]

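Both compositions can be written out as function composition. The sketch below models each small component as a function and builds the larger one only from those; the function names and the 8-bit data width are choices made here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 3-input AND gate on single-bit values. */
static unsigned and3(unsigned a, unsigned b, unsigned c)
{
    return a & b & c & 1u;
}

/* 9-input AND composed from four 3-input AND gates. */
unsigned and9(const unsigned in[9])
{
    return and3(and3(in[0], in[1], in[2]),
                and3(in[3], in[4], in[5]),
                and3(in[6], in[7], in[8]));
}

/* 2x1 mux: s0 selects i[0] or i[1]. */
static uint8_t mux2(const uint8_t i[2], unsigned s0)
{
    assert(s0 < 2);
    return i[s0];
}

/* 4x1 mux: s1s0 (as a 2-bit number) selects one of i[0]..i[3]. */
static uint8_t mux4(const uint8_t i[4], unsigned s1s0)
{
    assert(s1s0 < 4);
    return i[s1s0];
}

/* 8x1 mux: s1s0 select an input within each 4x1, s2 picks top or bottom 4x1. */
uint8_t mux8(const uint8_t i[8], unsigned s2, unsigned s1, unsigned s0)
{
    unsigned s1s0 = ((s1 & 1u) << 1) | (s0 & 1u);
    uint8_t stage[2];
    stage[0] = mux4(&i[0], s1s0);   /* top 4x1: i0..i3 */
    stage[1] = mux4(&i[4], s1s0);   /* bottom 4x1: i4..i7 */
    return mux2(stage, s2 & 1u);
}
```

With select value s2s1s0 read as a 3-bit number k, mux8 returns input k, which is the behavior expected of an 8x1 mux.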
Hierarchy and Composing Larger Components from Smaller Versions
• Composing memory very common
• Making memory words wider
– Easy – just place memories side-by-side until desired width obtained
– Share address/control lines, concatenate data lines
– Example: Compose 1024x8 ROMs into 1024x32 ROM
[Figure: four 1024x8 ROMs side by side sharing the 10-bit addr input and the en input; their four 8-bit data outputs are concatenated into data(31..0), forming a 1024x32 ROM]

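Widening by placing memories side by side amounts to sending the same address to every ROM and concatenating the bytes that come back. The sketch below assumes that the first ROM supplies the most significant byte; that ordering, and the names rom8 and rom32_read, are assumptions rather than something the slide specifies.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 1024

/* One 1024x8 ROM. */
struct rom8 {
    uint8_t word[WORDS];
};

/* 1024x32 read built from four 1024x8 ROMs that share the address.
   rom[0] supplies data(31..24) and rom[3] supplies data(7..0) (assumed order). */
uint32_t rom32_read(const struct rom8 rom[4], unsigned addr)
{
    assert(addr < WORDS);
    return ((uint32_t)rom[0].word[addr] << 24) |
           ((uint32_t)rom[1].word[addr] << 16) |
           ((uint32_t)rom[2].word[addr] << 8)  |
            (uint32_t)rom[3].word[addr];
}
```

No decoder is involved: every ROM is enabled on every access, and only the data lines differ.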
Hierarchy and Composing Larger Components from Smaller Versions
• Creating memory with more words
– Put memories on top of one another until the desired number of words is achieved
– Use decoder to select among the memories
• Can use highest order address input(s) as decoder input
• Although actually, any address line could be used
– Example: Compose 1024x8 memories into 2048x8 memory
[Figure: 2048x8 ROM built from two 1024x8 ROMs. The 11-bit address is split: a9..a0 go to the addr inputs of both ROMs, and a10 drives a 1x2 decoder (enabled by en) whose outputs d0 and d1 enable the first and second ROM; the 8-bit data outputs are shared. An address table running from 00000000000 to 11111111111 shows that a10 just chooses which memory to access]
• To create memory with more words and wider words, can first compose to enough words, then widen.

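Adding words works differently from widening: the extra address bit picks which memory responds, and the remaining bits go to that memory unchanged. A minimal sketch, with the names rom8 and rom2048_read invented here:

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 1024

/* One 1024x8 ROM. */
struct rom8 {
    uint8_t word[WORDS];
};

/* 2048x8 read built from two 1024x8 ROMs.
   a10 plays the role of the 1x2 decoder input: it enables exactly one ROM.
   a9..a0 are passed to the selected ROM as its address. */
uint8_t rom2048_read(const struct rom8 rom[2], unsigned addr)
{
    assert(addr < 2 * WORDS);
    unsigned a10 = (addr >> 10) & 1u;
    unsigned a9_a0 = addr & 0x3FFu;
    return rom[a10].word[a9_a0];
}
```

Addresses 0 to 1023 therefore land in the first ROM and addresses 1024 to 2047 in the second, as in the slide's address table.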
Chapter Summary
– Modern digital design involves creating processor-level components
– Four-step RTL method can be used
• 1. High-level state machine 2. Create datapath 3. Connect datapath to controller 4. Derive controller FSM
– Several examples
• Control dominated, data dominated, and mix
– Determining fastest clock frequency
• By finding critical path
– Behavioral-level design – C to gates
• By using method to convert C (subset) to high-level state machine
– Additional RTL components
• Memory: RAM, ROM
• Queues
– Hierarchy: A key concept used throughout Chapters 2-5