Pipelining & Verilog
• Division
• Latency & Throughput
• Pipelining to increase throughput
• Retiming
• Verilog Math Functions
Choose minimum number for application
Circuit Latency (L): time between arrival of new input and generation
of corresponding output.
Latency depends on dividend width + fractional remainder width
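To illustrate why divider latency grows with operand width, here is a minimal sketch of an unsigned restoring divider that produces one quotient bit per clock. It is not the divider the slide refers to: the module name, ports, and WIDTH parameter are assumptions made for illustration, and it computes only an integer quotient and remainder, so its latency scales with the dividend width alone (no fractional remainder bits).

```verilog
// Illustrative sketch only: names and interface are hypothetical.
// Unsigned restoring division, one quotient bit per clock cycle.
module seq_divider #(parameter WIDTH = 8) (   // assumes WIDTH >= 2
    input  wire             clk,
    input  wire             start,      // pulse high for one cycle to begin
    input  wire [WIDTH-1:0] dividend,
    input  wire [WIDTH-1:0] divisor,    // assumed nonzero
    output reg  [WIDTH-1:0] quotient,
    output reg  [WIDTH-1:0] remainder,
    output reg              done        // high for one cycle when results are valid
);
    reg [WIDTH-1:0]           q;        // remaining dividend bits, then quotient bits
    reg [WIDTH-1:0]           r;        // partial remainder (always < divisor)
    reg [WIDTH-1:0]           d;        // latched divisor
    reg [$clog2(WIDTH+1)-1:0] count;    // iterations left
    reg                       busy;

    // Shift the next dividend bit into the partial remainder, then try to subtract.
    wire [WIDTH:0]   shifted = {r, q[WIDTH-1]};
    wire             fits    = (shifted >= {1'b0, d});
    wire [WIDTH:0]   next_r  = fits ? (shifted - {1'b0, d}) : shifted;
    wire [WIDTH-1:0] next_q  = {q[WIDTH-2:0], fits};

    initial begin
        busy = 1'b0;
        done = 1'b0;
    end

    always @(posedge clk) begin
        done <= 1'b0;
        if (start && !busy) begin
            q     <= dividend;
            r     <= 0;
            d     <= divisor;
            count <= WIDTH;
            busy  <= 1'b1;
        end else if (busy) begin
            r     <= next_r[WIDTH-1:0];
            q     <= next_q;
            count <= count - 1'b1;
            if (count == 1) begin       // last iteration: publish the results
                quotient  <= next_q;
                remainder <= next_r[WIDTH-1:0];
                busy      <= 1'b0;
                done      <= 1'b1;
            end
        end
    end
endmodule
```

With this structure the result appears WIDTH iteration cycles after the cycle in which start is sampled, so doubling the dividend width doubles the latency in clocks.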
[Figure: circuit with input X, intermediate values F(X) and G(X), and output P(X)]
Cutset retiming: A cutset is a set of edges whose removal splits the circuit graph into two
disjoint partitions. To retime, delays (registers) are moved from the edges entering one
partition to the edges leaving it, or vice versa.
Benefits of retiming:
• Modify critical path delay
• Reduce total number of registers
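Both benefits can be seen in a small sketch. The two modules below are hypothetical (names, widths, and the three-input sum are assumptions chosen for illustration); they have the same input-to-output behavior with a latency of two clocks, but the registers sit in different places.

```verilog
// Illustrative sketch only: module names and the operation are hypothetical.

// Before retiming: registers on the inputs, then two adders in series.
module sum3_before #(parameter W = 8) (
    input  wire         clk,
    input  wire [W-1:0] a, b, c,
    output reg  [W-1:0] y
);
    reg [W-1:0] a_r, b_r, c_r;          // three W-bit internal registers
    always @(posedge clk) begin
        a_r <= a;
        b_r <= b;
        c_r <= c;
        y   <= (a_r + b_r) + c_r;       // critical path: two adders
    end
endmodule

// After retiming: the registers on a and b are moved across the first adder.
module sum3_after #(parameter W = 8) (
    input  wire         clk,
    input  wire [W-1:0] a, b, c,
    output reg  [W-1:0] y
);
    reg [W-1:0] s_r, c_r;               // two W-bit internal registers
    always @(posedge clk) begin
        s_r <= a + b;                   // first adder now sits before the register
        c_r <= c;
        y   <= s_r + c_r;               // critical path: one adder
    end
endmodule
```

In both versions y equals the (truncated) sum of the a, b, and c values presented two clock edges earlier. The retimed version has one adder, not two, between any pair of registers and uses one fewer W-bit register.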
6.111 Fall 2016 Lecture 9 10
Retiming Combinational Circuits
aka “Pipelining”
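[Figure: combinational circuit with block delays 15, 20, and 25 computing P(X), next to its pipelined version, which outputs P(Xi-2) while Xi is applied; timing shown over cycles i, i+1, i+2, i+3]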
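As a worked example of latency and throughput, suppose (an assumption about the figure's wiring, which did not survive extraction) that the blocks with delays 15 and 20 both take X in parallel and feed the block with delay 25, and that registers are ideal (zero propagation delay and setup time). The time unit is not given in the source, and T for throughput is notation introduced here.

```latex
\[ L_{\text{unpipelined}} = \max(15, 20) + 25 = 45, \qquad T_{\text{unpipelined}} = \tfrac{1}{45} \]
\[ t_{CLK} = \max(20, 25) = 25, \qquad L_{\text{2-stage}} = 2 \cdot t_{CLK} = 50, \qquad T_{\text{2-stage}} = \tfrac{1}{25} \]
```

Under these assumptions, pipelining raises throughput from 1/45 to 1/25 while latency grows from 45 to 50, which matches the output P(Xi-2) appearing two clocks after Xi.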
CONVENTION:
Every pipeline stage, hence every K-Stage pipeline, has a register on its
OUTPUT (not on its input).
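A minimal sketch of this convention, assuming a hypothetical two-stage pipeline: the module name, the width parameter, and the three placeholder operations are illustrative and not taken from the lecture. Each stage ends in a register, and the input x is not registered.

```verilog
// Illustrative sketch only: names and operations are placeholders.
module pipe2 #(parameter W = 16) (
    input  wire         clk,
    input  wire [W-1:0] x,
    output reg  [W-1:0] p               // stage 2 output register
);
    reg [W-1:0] f_r, g_r;               // stage 1 output registers

    always @(posedge clk) begin
        // Stage 1: two placeholder functions of x, registered on the stage output
        f_r <= x + 1'b1;
        g_r <= x << 1;
        // Stage 2: placeholder combine step, registered on the stage output
        p   <= f_r ^ g_r;
    end
endmodule
```

Because every path from x to p crosses exactly two registers, p reflects the x value presented two clock edges earlier, and a new x can be accepted on every clock.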
ALWAYS:
The CLOCK common to all registers must have a period sufficient to
cover propagation over combinational paths PLUS (input) register tPD
PLUS (output) register tSETUP.
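Written as an inequality, the rule above reads as follows. The subscripts "reg" and "logic" and the symbol T for throughput are notation introduced here; the logic term is the longest combinational path between any two registers.

```latex
\[ t_{CLK} \;\ge\; t_{PD,\mathrm{reg}} + t_{PD,\mathrm{logic}} + t_{SETUP,\mathrm{reg}} \]
\[ L = K \cdot t_{CLK}, \qquad T = \frac{1}{t_{CLK}} \]
```

The second line applies to a well-formed K-stage pipeline: latency is K clock periods, and one result is produced per clock.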
[Figure: ill-formed pipeline with inputs X and Y, blocks A, B, and C, and labels 1 and 2]
Problem:
Successive inputs get mixed: e.g., B(A(Xi+1), Yi). This
happened because some paths from inputs to outputs
have 2 registers, and some have only 1!
This CAN’T HAPPEN on a well-formed K-stage pipeline!
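One way to avoid the mixing is to add a balancing register so that every input-to-output path crosses the same number of registers. The sketch below is hypothetical: the module name and the placeholder operations standing in for A and B are assumptions, not the circuit from the figure.

```verilog
// Illustrative sketch only: A and B are stand-in operations.
module balanced_pipe #(parameter W = 8) (
    input  wire         clk,
    input  wire [W-1:0] x,
    input  wire [W-1:0] y,
    output reg  [W-1:0] out
);
    reg [W-1:0] a_r;                    // stage 1: registered A(x)
    reg [W-1:0] y_r;                    // stage 1: delayed y, keeps y aligned with a_r

    always @(posedge clk) begin
        a_r <= ~x;                      // placeholder for A
        y_r <= y;                       // balancing register on the y path
        out <= a_r & y_r;               // placeholder for B, combining A(x_i) with y_i
    end
endmodule
```

Without y_r, the second stage would combine A(x) from one sample with y from the next, which is the mixing described above. With it, both the x path and the y path cross exactly two registers.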
[Figure: cut-set example with blocks D and F and delays of 4 ns and 5 ns; the legend marks registers]
65 MHz ≈ 27 MHz × 2.4
Synthesis report, Multiple: 7.251 ns
Total propagation delay: 34.8 ns
• Key questions:
– How to make logic blocks programmable?
(after chip has been fabbed!)
– What should the logic granularity be?
– How to make the wires programmable?
(after chip has been fabbed!)
– Specialized wiring structures for local
vs. long distance routes?
[Figure: logic block with n inputs and m outputs, containing logic and a D flip-flop with SET and CLR]
[Figure: input signals feed an AND array (programming of product terms), which feeds an OR array (programming of sum terms) that drives the output signals]
[Figure: FPGA structure with CLBs, switch matrix, programmable interconnect, and I/O blocks (IOBs); each IOB contains an output buffer, an input buffer, a delay element, and D flip-flops connected to the pad]
[Figure: configurable logic block (CLB) with control inputs C1 to C4 (H1, DIN, S/R, EC), function generators F, G, and H, two D flip-flops with S/R control and enable EC, clock K, and outputs X and Y]
4LUT example
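A 4-input LUT can be modeled as a 16-entry, 1-bit-wide truth table indexed by its inputs. The sketch below is a behavioral model under that reading; the module name, port names, and INIT parameter are assumptions for illustration and do not refer to any vendor primitive.

```verilog
// Illustrative sketch only: a behavioral model of a 4-input lookup table.
module lut4 #(parameter [15:0] INIT = 16'h8000) (
    input  wire [3:0] in,               // the four LUT inputs form a table index
    output wire       out
);
    wire [15:0] table_bits = INIT;      // configuration bits (the truth table)
    assign out = table_bits[in];        // output is the table entry selected by in
endmodule
```

With INIT = 16'h8000 only entry 15 is 1, so the LUT behaves as a 4-input AND; with INIT = 16'h6996 it behaves as a 4-input XOR. Changing the 16 configuration bits is all it takes to implement any function of four inputs.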
Configuring the CLB as a RAM
Memory is built using Latches not FFs
16x2
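Reading "16x2" as 16 words of 2 bits each (an assumption), a small memory of this kind can be sketched behaviorally as below. The module and port names are hypothetical. The sketch uses a clocked write with an asynchronous read, a common way to describe small LUT-based memories; it does not model the latch-based implementation mentioned above.

```verilog
// Illustrative sketch only: behavioral 16-word by 2-bit memory.
module ram16x2 (
    input  wire       clk,
    input  wire       we,               // write enable
    input  wire [3:0] addr,             // 16 words need 4 address bits
    input  wire [1:0] din,
    output wire [1:0] dout
);
    reg [1:0] mem [0:15];

    always @(posedge clk)
        if (we)
            mem[addr] <= din;           // synchronous write

    assign dout = mem[addr];            // asynchronous (combinational) read
endmodule
```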
[Figure: FPGA feature overview showing gigabit serial I/O, 18-bit × 18-bit multiplier (36-bit product), BRAM, clock management, programmable termination, impedance control, and VCCIO]
Courtesy of David B. Parlour, ISSCC 2004 Tutorial,
“The Reality and Promise of Reconfigurable Computing in Digital Signal Processing”
The Virtex II CLB (Half Slice Shown)
Y = A ⊕ B ⊕ Cin (inputs A, B, Cin)
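Reading Y as the XOR of A, B, and Cin, which is the sum bit of a full adder, the function can be sketched as below. The module and port names are hypothetical, and the carry-out expression is added here for completeness; it is not taken from the slide.

```verilog
// Illustrative sketch only: one-bit full adder.
module full_adder (
    input  wire a,
    input  wire b,
    input  wire cin,
    output wire sum,
    output wire cout
);
    assign sum  = a ^ b ^ cin;                  // Y = A xor B xor Cin
    assign cout = (a & b) | (cin & (a ^ b));    // carry out to the next bit
endmodule
```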
Gigabit Ethernet support
[Figure: logic with signals a, b, and c mapped onto LUTs and D flip-flops]
Routing – iterative process to connect CLB inputs/outputs and IOBs. Optimizes critical path
delay – can take hours or days for large, dense designs
Challenge! Cannot use full chip for reasonable speeds (wires are not ideal).
Typically no more than 50% utilization.
Example: Verilog to FPGA
• Logic emulation / prototyping
– Ensemble of gate arrays used to emulate a circuit to be manufactured
– Get more/better/faster debugging done than with simulation
• Reconfigurable hardware
– One hardware block used to implement more than one function
• Special-purpose computation engines
– Hardware dedicated to solving one problem (or class of problems)
– Accelerators attached to general-purpose computers (e.g., in a cell phone!)
[Figure: FPGA-based emulator (courtesy of IKOS)]
initial begin
    // Initialize inputs
    bit_in = 0;
    bus_in = 0;
end
endmodule