VLSI
Course: 2IMN35
www: http://www.win.tue.nl/~wsinmak/Education/2IMN35/
Lecture 1: Introduction
1 19/04/16
Introduction to VLSI Programming: goals
Contents
FPGA IC on a Xilinx XUP Board (Atlys)
FPGA: Xilinx Spartan 6
Atlys board, based on Xilinx Spartan 6
Lab work prerequisites
• Exceed
(can be obtained through the TU/e software distribution)
2015 in Tue: h5-h8; MF.07 out 2015 in Thu: h1-h4; Gemini-Z3A-08/10/13 out
19-Apr introduction, DSP graphs, bounds, … 21-Apr pipelining, retiming, transposition, T1
J-slow, unfolding + T2
26-Apr tools Introductions to L1: audio filter L1 28-Apr T1 unfolding, look-ahead, L1 cntd T3
installed FPGA and Verilog simulation L2 + T2 strength reduction + T4
3-May folding L2: audio filter 5-May
on XUP board
10-May T3 + T4 DSP processors L2 cntd L3 12-May L3:
sequential FIR + strength-reduced FIR
17-May L3 cntd 19-May L3 cntd L4
24-May systolic computation T5 26-May L4
31-May T5 L4: 2-Jun L3 L4 cntd L5
audio sample rate convertor
7-Jun L5: 9-Jun L4 L5 cntd
1024x audio sample rate convertor
14-Jun 16-Jun L5 deadline report L5
Course grading (provisional)
Note on course literature
Mandatory reading:
• Keshab K. Parhi. High-Level Algorithm and Architecture
Transformations for DSP Synthesis. Journal of VLSI Signal
Processing, 9, 121-143 (1995), Kluwer Academic Publishers.
Introduction
• Parhi, Chapters 1, 2
• DSP Representation Methods
• Iteration bounds
Some inspiration from the technology side
Vertical cut through VLSI circuit
Intel 4004 processor [1970]
• 1970
• 4-bit
• 2300 transistors
Apple A9 SoC (System on Chip)
• 2015
• Production: Samsung/TSMC
• 14/16 nm FinFET
• 96/104.5 mm²
• > 2B transistors
• Assuming 0.1$/mm² production costs
• ⇒ 5 nano$ / transistor
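The estimate above is plain arithmetic. A minimal sketch, assuming a die area rounded to 100 mm² (between the two quoted sizes) and exactly 2 billion transistors; the function name is illustrative only:

```python
# Back-of-the-envelope cost per transistor for the A9 figures quoted above.
# Assumptions (not stated on the slide): die area rounded to 100 mm^2,
# transistor count taken as exactly 2e9.

def cost_per_transistor(area_mm2, cost_per_mm2, transistors):
    """Return the production cost of one transistor in dollars."""
    return area_mm2 * cost_per_mm2 / transistors

a9_cost = cost_per_transistor(area_mm2=100.0, cost_per_mm2=0.1, transistors=2e9)
print(f"{a9_cost * 1e9:.1f} nano$ per transistor")
```

With these rounded inputs the die costs about $10, which spread over 2 billion transistors gives the 5 nano$ figure.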
Flash memory
• 32 GB = 256Gb
• ≈100G transistors => << 1 n$ per transistor
Xilinx Kintex7 FPGA
A 2016 “node”
Source: Samsung
Source: NVidia
Moore’s Law: 50th anniversary in 2015!
Cost per Transistor over Time for Intel MPUs
[Figure: cost per transistor (US$) versus time; trend of ×0.5 every 2 years, with the extrapolation marked "?"]
Rule of two [Hu, 1993]
ITRS: INTERNATIONAL TECHNOLOGY ROADMAP FOR SEMICONDUCTORS
• The overall objective of the ITRS is to present industry-wide
consensus on the “best current estimate” of the industry’s
research and development needs out to a 15-year horizon.
• As such, it provides a guide to the efforts of companies,
universities, governments, and other research providers/funders.
• The ITRS has improved the quality of R&D investment decisions
made at all levels and has helped channel research efforts to
areas that most need research breakthroughs.
2013 ITRS
MPU/ASIC Half Pitch and Gate Length Trends
• Flexible logic: 6,144 CLBs
• 500 MHz clock
• Programmable multi-port RAM: 320 × 18 kbit
• 512 DSP slices
Some inspiration from the application side
All things grand and small [Moravec ‘98]
Chess Machine Performance [Moravec ‘98]
Evolution computer power/cost [Moravec ‘98]
The Square Kilometer Array (SKA)
• Chip photos:
• http://www-vlsi.stanford.edu/group/chips.html
• ITRS Roadmap
• http://www.itrs.net/Links/2005ITRS/ExecSum2005.pdf
VLSI Digital Signal Processing Systems
DSP application classes
[Figure: application classes plotted by sample rate [Hz] (log scale, 1 to 10G) against complexity (# operations/sample, log scale). From high to low sample rate: radar, HDTV, radio, video, modems, audio, speech, seismic, control, modeling.]
Typical DSP algorithms
Typical DSP kernels: FIR Filters
\[ y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k) \]
• where
• x is the input sequence
• y is the output sequence
• h is the impulse response (filter coefficients)
• N is the number of taps (coefficients) in the filter
• Output sequence depends only on input sequence and impulse
response.
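A minimal sketch of an FIR filter, assuming the standard direct-form definition y(n) = Σ h(k)·x(n−k) for k = 0 … N−1 and taking x(n) = 0 for n < 0. The coefficient values are hypothetical:

```python
def fir_filter(x, h):
    """Direct-form FIR: y(n) = sum over k of h(k) * x(n - k), with x(n) = 0 for n < 0."""
    n_taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(n_taps):
            if n - k >= 0:          # samples before the start of the input count as zero
                acc += h[k] * x[n - k]
        y.append(acc)
    return y

# Feeding a unit impulse returns the impulse response itself.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
h = [0.5, 0.3, 0.2]                 # hypothetical coefficients, N = 3 taps
print(fir_filter(impulse, h))       # [0.5, 0.3, 0.2, 0.0, 0.0]
```

Each output sample costs N multiplications and N−1 additions, which is why the number of taps dominates the computation rate of such a filter.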
Typical DSP kernels: IIR Filters
Typical DSP kernels: DFT and FFT
The Discrete Fourier Transform (DFT) supports frequency
domain (“spectral”) analysis:
\[ y(k) = \sum_{n=0}^{N-1} W_N^{nk}\, x(n), \qquad W_N = e^{-2j\pi/N}, \qquad j = \sqrt{-1} \]
for k = 0, 1, … , N-1, where
• x is the input sequence in the time domain (real or complex)
• y is an output sequence in the frequency domain (complex)
The Inverse Discrete Fourier Transform (IDFT) is computed as
\[ x(n) = \frac{1}{N} \sum_{k=0}^{N-1} W_N^{-nk}\, y(k), \quad \text{for } n = 0, 1, \ldots, N-1 \]
The Fast Fourier Transform (FFT) and its inverse (IFFT) provide
an efficient method for computing the DFT and IDFT.
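A minimal sketch of the direct O(N²) evaluation of the DFT and its inverse, using only the standard library. The inverse includes the 1/N normalization that the round trip requires when the forward transform is unnormalized. An FFT computes the same values in O(N log N) operations and is not shown here:

```python
import cmath

def dft(x):
    """y(k) = sum over n of W_N^(n*k) * x(n), with W_N = exp(-2j*pi/N)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(W ** (n * k) * x[n] for n in range(N)) for k in range(N)]

def idft(y):
    """x(n) = (1/N) * sum over k of W_N^(-n*k) * y(k)."""
    N = len(y)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(W ** (-n * k) * y[k] for k in range(N)) / N for n in range(N)]

x = [1.0, 2.0, 0.0, -1.0]   # arbitrary test sequence
y = dft(x)
x_back = idft(y)            # equals x up to rounding error
```

Here y[0] is the sum of the input samples (the DC component), and x_back reproduces x to within floating-point rounding.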
Typical DSP kernels: DCT
The Discrete Cosine Transform (DCT) and its inverse IDCT are
frequently used in video (de-) compression (e.g., MPEG-2):
\[ y(k) = e(k) \sum_{n=0}^{N-1} \cos\left[\frac{(2n+1)k\pi}{2N}\right] x(n), \quad \text{for } k = 0, 1, \ldots, N-1 \]
\[ x(n) = \frac{2}{N} \sum_{k=0}^{N-1} e(k) \cos\left[\frac{(2n+1)k\pi}{2N}\right] y(k), \quad \text{for } n = 0, 1, \ldots, N-1 \]
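A minimal sketch of the DCT/IDCT pair. The slide does not define e(k); the code assumes the usual convention e(0) = 1/√2 and e(k) = 1 for k > 0, under which the pair is an exact inverse:

```python
import math

def e(k):
    # Assumed scale factor (not defined in the source): 1/sqrt(2) for k = 0, else 1.
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def dct(x):
    """y(k) = e(k) * sum over n of cos((2n+1)*k*pi / (2N)) * x(n)."""
    N = len(x)
    return [e(k) * sum(math.cos((2 * n + 1) * k * math.pi / (2 * N)) * x[n]
                       for n in range(N))
            for k in range(N)]

def idct(y):
    """x(n) = (2/N) * sum over k of e(k) * cos((2n+1)*k*pi / (2N)) * y(k)."""
    N = len(y)
    return [(2.0 / N) * sum(e(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N)) * y[k]
                            for k in range(N))
            for n in range(N)]

samples = [1.0, 2.0, 3.0, 4.0]     # arbitrary test sequence
restored = idct(dct(samples))      # equals samples up to rounding error
```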
Typical DSP kernels: distance calculation
\[ d = \frac{1}{N} \sum_{i=0}^{N-1} |x(i) - r_k(i)| \qquad d = \frac{1}{N} \sum_{i=0}^{N-1} [x(i) - r_k(i)]^2 \]
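A minimal sketch of the two distance measures, the mean absolute difference and the mean squared difference. It assumes r_k is the k-th reference pattern against which the input x is compared; the sample values are made up:

```python
def mean_abs_distance(x, r):
    """d = (1/N) * sum over i of |x(i) - r(i)|."""
    N = len(x)
    return sum(abs(x[i] - r[i]) for i in range(N)) / N

def mean_sq_distance(x, r):
    """d = (1/N) * sum over i of (x(i) - r(i))^2."""
    N = len(x)
    return sum((x[i] - r[i]) ** 2 for i in range(N)) / N

x = [1.0, 2.0, 3.0, 4.0]
r_k = [1.0, 0.0, 3.0, 8.0]          # hypothetical reference pattern
print(mean_abs_distance(x, r_k))    # (0 + 2 + 0 + 4) / 4 = 1.5
print(mean_sq_distance(x, r_k))     # (0 + 4 + 0 + 16) / 4 = 5.0
```

The absolute-difference form avoids multiplications, which makes it the cheaper of the two in hardware.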
Typical DSP kernels: matrix computations
Computation Rates
\[ R_C = R_S \cdot N_S \]
• where
• R_C is the computation rate
• R_S is the sampling rate
• N_S is the (average) number of operations per sample
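A minimal sketch of the computation-rate formula. The numbers are a hypothetical example (48 kHz audio through a 64-tap FIR filter, counting one multiplication and one addition per tap) and are not taken from the slides:

```python
def computation_rate(sample_rate_hz, ops_per_sample):
    """Computation rate = sampling rate * operations per sample (operations per second)."""
    return sample_rate_hz * ops_per_sample

# Hypothetical example: 48 kHz audio, 64-tap FIR filter,
# one multiplication and one addition per tap => 128 operations per sample.
rate = computation_rate(48_000, 2 * 64)
print(rate)  # 6144000 operations per second
```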
Computational Rates for FIR Filtering
DSP systems and programs
DSP SYSTEMS: GRAPHICAL REPRESENTATIONS
DSP systems: 3 graphical representations
• Block diagram:
• general block diagram
• loose semantics
• Data-flow graph:
• used for signal processing
• formal definition
• Signal-flow graph:
• linear time-invariant systems
• formal definition, still more theory
[Figure: nested sets labelled general, data flow graph, signal flow graph, LTI systems]
DSP system: block diagram
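[Figure: block diagram of a 3-tap filter. The input x(n) passes through two delay elements D, giving x(n-1) and x(n-2); the three signals are multiplied by a, b and c and summed by two adders to produce y(n).]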
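A minimal sketch that simulates the block diagram sample by sample. It assumes the diagram computes y(n) = a·x(n) + b·x(n-1) + c·x(n-2), which is a reading of the figure labels; the two variables d1 and d2 stand for the outputs of the two D elements:

```python
def fir3_stream(samples, a, b, c):
    """Simulate two delay elements, three multipliers and two adders."""
    d1 = 0.0  # output of the first D element: x(n-1)
    d2 = 0.0  # output of the second D element: x(n-2)
    out = []
    for x in samples:
        out.append(a * x + b * d1 + c * d2)
        # Clock edge: shift the delay line by one sample.
        d2 = d1
        d1 = x
    return out

# A unit impulse reads the coefficients back in order.
print(fir3_stream([1.0, 0.0, 0.0, 0.0], a=2.0, b=3.0, c=4.0))  # [2.0, 3.0, 4.0, 0.0]
```

Updating d2 before d1 matters: it mimics registers that all load on the same clock edge.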
DSP system: data-flow graph (DFG)
[Figure: data-flow graph with input x(n), two delays D, multiplier nodes a, b and c, and output y(n)]
Data-flow graph (DFG)
Data-flow graphs
Signal-flow graph (representation method 3)
input x, output y:
discrete system:
• x(n) results in y(n)
linear system:
• x1(n) + x2(n) results in y1(n) + y2(n)
• c1 x1(n) + c2 x2(n) results in c1 y1(n) + c2 y2(n)
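A minimal numerical check of the superposition property. The system used here, y(n) = 0.5·x(n) + 0.25·x(n-1), is a hypothetical example chosen only because it is linear; the inputs and constants are arbitrary:

```python
def system(x):
    """Hypothetical linear system: y(n) = 0.5*x(n) + 0.25*x(n-1), with x(-1) = 0."""
    return [0.5 * x[n] + (0.25 * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

x1 = [1.0, 2.0, 3.0]
x2 = [4.0, 0.0, -2.0]
c1, c2 = 2.0, -3.0

# Left side: response to the combined input c1*x1 + c2*x2.
combined_in = [c1 * u + c2 * v for u, v in zip(x1, x2)]
lhs = system(combined_in)

# Right side: the same combination of the individual responses.
rhs = [c1 * u + c2 * v for u, v in zip(system(x1), system(x2))]

print(lhs, rhs)  # both are [-5.0, -0.5, 7.0]
```

A system containing, say, a squaring operation would fail this check, which is a quick way to spot non-linearity.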
Linear Time-Invariant Systems
input x, output y:
Commutativity of LTI systems
[Figure: a cascade of two LTI systems] is equivalent to [Figure: the same two systems in swapped order]
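A minimal sketch of the commutativity property, assuming each LTI system is described by a finite impulse response so that passing a signal through it is a convolution. The impulse responses h1 and h2 are hypothetical:

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

h1 = [1.0, 0.5]            # hypothetical impulse response of the first system
h2 = [1.0, -1.0, 0.25]     # hypothetical impulse response of the second system
x = [1.0, 2.0, 3.0]

y12 = convolve(convolve(x, h1), h2)   # x -> system 1 -> system 2
y21 = convolve(convolve(x, h2), h1)   # x -> system 2 -> system 1
# y12 and y21 agree up to rounding error.
```

This freedom to reorder cascaded LTI blocks is what later transformations such as transposition rely on.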
LOOP BOUNDS AND ITERATION BOUNDS
Iteration of a Synchronous Flow Graph
Iteration period
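Iteration period =
the time required for the execution of one iteration of the SFG
Example, let
• Tm = 10 = multiplication time
• Ta = 4 = addition time
[Figure: SFG of y(n) = a·y(n-1) + x(n): an adder with input x(n), a delay D giving y(n-1), and a multiplier a in the feedback path]
• One iteration takes one multiplication and one addition, so the iteration period is Tm + Ta = 14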
Loop and loop bound
• A loop (cycle) in a DFG is a directed path that begins and ends
at the same node.
• The loop bound of loop j is defined as Tj/Wj, where
• Tj is the loop computation time (sum of all Ti of loop nodes i ),
• Wj is the number of delays (D-elements) in the loop.
Critical loop and Iteration bound
Iteration bound cntd
Example:
• TL1 = (10+2)/1 = 12
• TL2 = (2+3+5)/2 = 5
• TL3 = (10+2+3)/2 = 7.5
• Iteration bound = max (12, 5, 7.5) = 12
Notes:
• Delays are non-negative
(negative delay would imply non-causality).
• If loop weight equals 0 (no delay elements in loop)
then TL/0 = ∞ (deadlock).
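A minimal sketch of the iteration-bound calculation for the example above. It assumes the loops have already been identified and are given as (node computation times, number of delays) pairs; enumerating the loops of a graph is not shown. A loop without delays is reported as infinite, matching the deadlock note:

```python
import math

def loop_bound(node_times, delays):
    """Loop bound = total computation time of the loop / number of delays in it."""
    if delays == 0:
        return math.inf          # no delay element in the loop: deadlock
    return sum(node_times) / delays

def iteration_bound(loops):
    """Iteration bound = maximum loop bound over all loops."""
    return max(loop_bound(times, delays) for times, delays in loops)

# The three loops of the example: (node computation times, number of delays).
loops = [([10, 2], 1), ([2, 3, 5], 2), ([10, 2, 3], 2)]
print([loop_bound(t, w) for t, w in loops])  # [12.0, 5.0, 7.5]
print(iteration_bound(loops))                # 12.0
```

The loop that attains the maximum (here the first one) is the critical loop.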
4 types of delay paths; critical path
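• Redraw block diagram by partitioning nodes into
D-elements and combinational functions ("FSM view"):
[Figure: a block of combinational functions between inputs and outputs, with the D-elements (state) in a feedback path; the four path types are numbered 1 to 4]
path  from    to
1     inputs  state
2     state   outputs
3     inputs  outputs
4     state   state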
DSP references
Computer Architecture and DSP references
VLSI Programming:
VLSI Programming: Thursday April 21
Transformations:
• Transposition
• Pipelining
• Retiming
• K-slow transformation
• Parallel processing
(Parhi, Chapters 2, 3)
THANK YOU