
VLSI Programming 2016: Lecture 1

Course: 2IMN35

Teachers: Kees van Berkel [email protected]
          Rudolf Mak [email protected]

Lab: Kees van Berkel, Rudolf Mak, Alok Lele

www: http://www.win.tue.nl/~wsinmak/Education/2IMN35/

Lecture 1: Introduction

1 19/04/16
Introduction to VLSI Programming: goals

•  to acquire insight in the description, design, and optimization of fine-grained parallel computations;
•  to acquire insight in the (future) capabilities of VLSI as an implementation medium of parallel computations;
•  to acquire skills in the design of parallel computations and in their implementation on FPGAs.
Contents

Massive parallelism is needed to exploit the huge and still increasing computational capabilities of Very Large Scale Integrated (VLSI) circuits:

•  we focus on fine-grained parallelism (not on networks of computers);
•  we assume that parallelism is by design (not by compilation);
•  we draw inspiration from consumer applications, such as digital TV, 3D TV, image processing, mobile phones, etc.;
•  we will use Field Programmable Gate Arrays (FPGAs) as a fine-grained abstraction of VLSI for practical implementation.
FPGA IC on a Xilinx XUP Board (Atlys)

[Photo: Xilinx Spartan 6 FPGA]

Atlys board, based on Xilinx Spartan 6

[Photo: Atlys board with the Xilinx Spartan 6 FPGA]
Lab work prerequisites

•  Laptop, running Windows
•  Exceed (can be obtained through the TU/e software distribution)
•  Access to UNIX server Dept. W&I (can be obtained through BCF)
•  Lab work is by teams of two students, with at least 1 Windows laptop.
•  Have FPGA tools (SW) installed on your machine by Tuesday April 26.
•  Check website 2IMN35.
VLSI Programming (2IMN35): time table 2016

Tuesdays: h5-h8, MF.07; Thursdays: h1-h4, Gemini-Z3A-08/10/13.
("in" = assignment handed in, "out" = assignment handed out; T = theory assignment, L = lab assignment.)

19-Apr  introduction, DSP graphs, bounds, …
21-Apr  pipelining, retiming, transposition, J-slow, unfolding (out: T1 + T2)
26-Apr  tools installed; introductions to FPGA and Verilog simulation; L1: audio filter (out: L1, L2)
28-Apr  (in: T1 + T2) unfolding, look-ahead, strength reduction; L1 cntd (out: T3 + T4)
3-May   folding; L2: audio filter on XUP board
5-May
10-May  (in: T3 + T4) DSP processors; L2 cntd (out: L3)
12-May  L3: sequential FIR + strength-reduced FIR
17-May  L3 cntd
19-May  L3 cntd (out: L4)
24-May  systolic computation (out: T5)
26-May  L4
31-May  (in: T5) L4: audio sample rate convertor
2-Jun   (in: L3) L4 cntd (out: L5)
7-Jun   L5: 1024x audio sample rate convertor
9-Jun   (in: L4) L5 cntd
14-Jun
16-Jun  (in: L5) deadline report L5
Course grading (provisional)

Your course grade is based on:

•  the quality of your programs/designs [30%];
•  your final report on the design and evaluation of these programs (guidelines will follow) [30%];
•  a concluding discussion with you on the programs, the report, and the lecture notes [20%];
•  intermediate assignments [20%].

Credits: 5 points, based on 140 hours of work from your side.
Note on course literature

The VLSI programming lectures are loosely based on:

•  Keshab K. Parhi. VLSI Digital Signal Processing Systems: Design and Implementation. Wiley-Interscience, 1999.
•  This book is recommended, but not mandatory.

Accompanying slides can be found on:

•  http://www.ece.umn.edu/users/parhi/slides.html
•  http://www.win.tue.nl/~wsinmak/Education/2IMN35/

Mandatory reading:

•  Keshab K. Parhi. High-Level Algorithm and Architecture Transformations for DSP Synthesis. Journal of VLSI Signal Processing, 9, 121-143 (1995), Kluwer Academic Publishers.
Introduction

•  Some inspiration from the technology side
   •  VLSI
   •  FPGAs
•  Some inspiration from the application side
   •  Machine Intelligence
   •  BEE, SKA, SETI
   •  Digital Signal Processing (Software Defined Radio)
•  Parhi, Chapters 1, 2
   •  DSP representation methods
   •  Iteration bounds
Some inspiration
from the technology side

Vertical cut through VLSI circuit

[Cross-section micrograph of a VLSI circuit]
Intel 4004 processor [1970]

•  1970
•  4-bit
•  2300 transistors
Apple A9 SoC (System on Chip)

•  2015
•  Production: Samsung/TSMC
•  14/16 nm FinFET
•  96/104.5 mm²
•  > 2B transistors
•  Assuming $0.1/mm² production costs
•  ⇒ 5 nano-$ per transistor
Flash memory

•  32 GB = 256 Gb
•  ≈ 100G transistors ⇒ << 1 n$ per transistor
Xilinx Kintex7 FPGA

•  2G transistors
•  1920 DSP slices
•  165 mm²
Stratix 10 FPGA from Altera (Intel)

•  > 10,000 FLOPs per clock cycle
•  @ nearly 1 GHz
Exa-scale computing: 10^18 FLOPs/sec

A scenario (year 2021):

•  10^18 FLOPs/sec = 10^9 arithmetic units running at 10^9 Hz
•  10^9 arithmetic units = 10^4.5 nodes × 10^4.5 arithmetic units
•  1 node = 32 TFLOPs/s “X” + 1 TB DRAM + “CPU”
•  @ 10 MW

Today (2016: “petaflop” era):

•  #1: Tianhe-2 (China): 34 × 10^15 FLOPs/sec, 10^4.5 nodes @ 24 MW
•  GPU (Nvidia GM200): 6 TFLOPs/sec
•  FPGA (Altera Stratix 10, GX2800): 9 TFLOPs/sec
A 2016 “node”

[Figure; source: Samsung]

[Figure; source: NVidia]
Moore’s Law: 50th anniversary in 2015!

Cost per Transistor over Time for Intel MPUs

[Chart: US$ per transistor over time, falling ×0.5 every 2 years; will the trend continue?]
Rule of two [Hu, 1993]

Every 2 generations of IC technology (6 years):

•  device feature size 0.5×
•  chip size 2×
•  clock frequency 2× (no longer true)
•  number of I/O pins 2×
•  DRAM capacity 16×
•  logic-gate density 4×
ITRS: International Technology Roadmap for Semiconductors

•  The overall objective of the ITRS is to present industry-wide consensus on the “best current estimate” of the industry’s research and development needs out to a 15-year horizon.
•  As such, it provides a guide to the efforts of companies, universities, governments, and other research providers/funders.
•  The ITRS has improved the quality of R&D investment decisions made at all levels and has helped channel research efforts to areas that most need research breakthroughs.
•  Involves over 1000 technical experts, world-wide.
•  A self-fulfilling prophecy? … or wishful thinking?


ITRS 2013

[Figure: ITRS 2013 roadmap overview]

2013 ITRS: MPU/ASIC Half Pitch and Gate Length Trends

[Chart: half-pitch and gate-length scaling trends]


Virtex 4 FPGA: 4VSX55

FPGA = Field Programmable Gate Array

•  Flexible logic: 6,144 CLBs, 500 MHz clock
•  Programmable multi-port RAM: 320 × 18 kbit
•  512 DSP slices
•  PowerPC™ @ 450 MHz
•  Differential I/O @ 1 Gbps
•  Serial transceivers @ 0.6-11.1 Gbps
Some inspiration
from the application side

All things grand and small [Moravec ‘98]

[Figure from Moravec 1998]
Chess Machine Performance [Moravec ‘98]

[Chart: chess machine performance over time]
Evolution computer power/cost [Moravec ‘98]

[Chart: brain-power equivalent per $1000 of computer, over time]
The Square Kilometer Array (SKA)

... the ultimate exploration tool
... and the ultimate software defined radio
The Square Kilometer Array (SKA)

•  antenna surface: 1 km² (sensitivity 50×)
•  large physical extent (3000+ km)
•  wide frequency range: 50 MHz – 30 GHz
•  full design by 2016; phase 1: 2021; phase 2: 2026
•  phase 1: 250 dishes (12 m) in the central 5 km
•  + dense and/or sparse aperture arrays
•  connected to a massive data processor by an optical fibre network
•  Software Defined Radio Astronomy
•  computational load ≈ 1 exa FLOPs/sec (10^18 FLOPs/s)
•  power budget = 20 MW (≈ 20 pJ/FLOP “all-in”)
References

•  Chip photos:
   •  http://www-vlsi.stanford.edu/group/chips.html
•  ITRS Roadmap:
   •  http://www.itrs.net/Links/2005ITRS/ExecSum2005.pdf
•  When will computer hardware match the human brain?
   •  http://www.jetpress.org/volume1/moravec.htm
•  BEE & Square Kilometer Array:
   •  http://bwrc.eecs.berkeley.edu/Research/BEE/
   •  http://seti.berkeley.edu/casper/papers/BEE2_ska2004_poster.pdf
   •  http://www.skatelescope.org/
VLSI Digital Signal Processing
Systems

Parhi, Chapters 1&2

DSP applications classes

[Chart: sample rate (Hz, log scale, 1 Hz to 10 GHz) versus complexity (# operations/sample, log scale). Low-rate, high-complexity applications: seismic modeling, control, speech; mid-range: audio, modems, video; high-rate: HDTV, radio, radar.]
Typical DSP algorithms

•  speech (de-)coding
•  speech recognition
•  speech synthesis
•  speaker identification
•  Hi-fi audio en/decoding
•  noise cancellation
•  audio equalization
•  ambient acoustic emulation
•  sound synthesis
•  echo cancellation
•  modem (de-)modulation
•  vision
•  image (de-)compression
•  image composition
•  beam forming
•  spectral estimation
•  etc.
Typical DSP kernels: FIR filters

•  Filters reduce signal noise and enhance image or signal quality by removing unwanted frequencies.
•  Finite Impulse Response (FIR) filters compute y(i):

   y(i) = Σ_{k=0}^{N-1} h(k) x(i-k) = (h * x)(i)

•  where
   •  x is the input sequence
   •  y is the output sequence
   •  h is the impulse response (filter coefficients)
   •  N is the number of taps (coefficients) in the filter
•  The output sequence depends only on the input sequence and the impulse response.
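The FIR sum above can be checked in a few lines of Python. This is an illustrative sketch (the name `fir` and the zero-initial-state convention are choices made here, not from the lecture):

```python
def fir(h, x):
    """Direct-form FIR: y[i] = sum_{k=0}^{N-1} h[k]*x[i-k]; x is taken as 0 before index 0."""
    N = len(h)
    return [sum(h[k] * x[i - k] for k in range(N) if i - k >= 0)
            for i in range(len(x))]

# An impulse input reproduces the impulse response h, padded with zeros:
print(fir([1, 2, 3], [1, 0, 0, 0]))  # -> [1, 2, 3, 0]
```

As the name "impulse response" promises, feeding a unit impulse through the filter returns the coefficients themselves.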
Typical DSP kernels: IIR filters

•  Infinite Impulse Response (IIR) filters compute:

   y(i) = Σ_{k=1}^{M-1} a(k) y(i-k) + Σ_{k=0}^{N-1} b(k) x(i-k)

•  The output sequence depends on the input sequence and the impulse response, as well as on previous outputs.

•  Adaptive filters (FIR and IIR) update their coefficients to minimize the distance between the filter output and the desired signal.
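A direct transcription of the recurrence, again as a sketch (names and the zero initial state are assumptions, not from the slide; note the feedback sum starts at k = 1, so `a[0]` is unused):

```python
def iir(a, b, x):
    """y[i] = sum_{k=1}^{M-1} a[k]*y[i-k] + sum_{k=0}^{N-1} b[k]*x[i-k].
    a[0] is a placeholder: the feedback sum starts at k = 1, as on the slide."""
    y = []
    for i in range(len(x)):
        acc = sum(b[k] * x[i - k] for k in range(len(b)) if i - k >= 0)
        acc += sum(a[k] * y[i - k] for k in range(1, len(a)) if i - k >= 0)
        y.append(acc)
    return y

# One-pole filter y[i] = 0.5*y[i-1] + x[i]: the impulse response decays
# geometrically and never becomes exactly zero, hence "infinite" impulse response.
print(iir([0, 0.5], [1], [1, 0, 0, 0]))  # -> [1, 0.5, 0.25, 0.125]
```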
Typical DSP kernels: DFT and FFT

The Discrete Fourier Transform (DFT) supports frequency-domain (“spectral”) analysis:

   y(k) = Σ_{n=0}^{N-1} W_N^{nk} x(n),   W_N = e^{-2jπ/N},   j = √-1

for k = 0, 1, …, N-1, where
•  x is the input sequence in the time domain (real or complex)
•  y is an output sequence in the frequency domain (complex)

The Inverse Discrete Fourier Transform (IDFT) is computed as

   x(n) = (1/N) Σ_{k=0}^{N-1} W_N^{-nk} y(k),   for n = 0, 1, …, N-1

The Fast Fourier Transform (FFT) and its inverse (IFFT) provide an efficient method for computing the DFT and IDFT.
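A naive transcription of the two sums makes the O(N²) cost visible (an FFT computes the same result in O(N log N)). This is a sketch with names chosen here; the 1/N scale factor in `idft` follows the conventional normalization so that the round trip recovers x:

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT: y[k] = sum_n W_N^(nk) * x[n], with W_N = exp(-2j*pi/N)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(W ** (n * k) * x[n] for n in range(N)) for k in range(N)]

def idft(y):
    """IDFT with the conventional 1/N scale factor, so idft(dft(x)) == x."""
    N = len(y)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(W ** (-n * k) * y[k] for k in range(N)) / N for n in range(N)]

x = [1, 2, 3, 4]
assert max(abs(a - b) for a, b in zip(idft(dft(x)), x)) < 1e-9  # round trip
```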
Typical DSP kernels: DCT

The Discrete Cosine Transform (DCT) and its inverse (IDCT) are frequently used in video (de-)compression (e.g., MPEG-2):

   y(k) = e(k) Σ_{n=0}^{N-1} cos[ (2n+1)kπ / 2N ] x(n),   for k = 0, 1, …, N-1

   x(n) = (2/N) Σ_{k=0}^{N-1} e(k) cos[ (2n+1)kπ / 2N ] y(k),   for n = 0, 1, …, N-1

where e(k) = 1/√2 if k = 0; otherwise e(k) = 1.

An N-point 1D-DCT requires N² MAC operations.
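The forward/inverse pair above can be transcribed directly; the round trip confirms that the e(k) weighting and the 2/N factor make the two sums exact inverses (function names are chosen here for illustration):

```python
import math

def e(k):
    """Scale factor from the slide: 1/sqrt(2) for k = 0, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct(x):
    """y[k] = e(k) * sum_n cos((2n+1)k*pi/(2N)) * x[n]."""
    N = len(x)
    return [e(k) * sum(math.cos((2 * n + 1) * k * math.pi / (2 * N)) * x[n]
                       for n in range(N)) for k in range(N)]

def idct(y):
    """x[n] = (2/N) * sum_k e(k) * cos((2n+1)k*pi/(2N)) * y[k]."""
    N = len(y)
    return [(2 / N) * sum(e(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N)) * y[k]
                          for k in range(N)) for n in range(N)]

x = [1.0, 2.0, 3.0, 4.0]
assert max(abs(a - b) for a, b in zip(idct(dct(x)), x)) < 1e-9  # round trip
```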
Typical DSP kernels: distance calculation

•  Distance calculations are typically used in pattern recognition, motion estimation, and coding.

•  Problem: choose the vector r_k whose distance (see below) from the input vector x is minimum.

Mean Absolute Difference (MAD):   d = (1/N) Σ_{i=0}^{N-1} | x(i) - r_k(i) |

Mean Square Error (MSE):          d = (1/N) Σ_{i=0}^{N-1} [ x(i) - r_k(i) ]²
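Both metrics and the "pick the closest candidate" step fit in a few lines (a sketch; the example vectors are made up here):

```python
def mad(x, r):
    """Mean Absolute Difference."""
    return sum(abs(a - b) for a, b in zip(x, r)) / len(x)

def mse(x, r):
    """Mean Square Error."""
    return sum((a - b) ** 2 for a, b in zip(x, r)) / len(x)

# Choose the candidate r_k closest to x, as in motion estimation:
x = [1, 2, 3]
candidates = [[0, 0, 0], [1, 2, 4], [5, 5, 5]]
best = min(candidates, key=lambda r: mad(x, r))
print(best)  # -> [1, 2, 4]
```

MAD is popular in hardware because it needs no multiplications, while MSE penalizes large individual errors more heavily.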
Typical DSP kernels: matrix computations

Matrix computations are typically used to estimate parameters in DSP systems:

•  matrix-vector multiplication
•  matrix-matrix multiplication
•  matrix inversion
•  matrix triangularization

Matrices may be dense, sparse, band-structured, ….
Computation Rates

•  To estimate the hardware resources required, we can use the equation:

   R_C = R_S · N_S

•  where
   •  R_C is the computation rate
   •  R_S is the sampling rate
   •  N_S is the (average) number of operations per sample

•  For example, a 1-D FIR has N_S = 2N, and a 2-D FIR has N_S = 2N².
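The equation is a one-liner; the example below assumes N_S = 2N for a 1-D FIR, as stated above (note the table on the next slide may count operations differently):

```python
def computation_rate(sample_rate_hz, ops_per_sample):
    """R_C = R_S * N_S: required operations per second."""
    return sample_rate_hz * ops_per_sample

# 256-tap 1-D FIR on 48 kHz audio, assuming N_S = 2N ops/sample:
N = 256
print(computation_rate(48_000, 2 * N))  # -> 24576000 ops/sec, i.e. ~24.6 MOPs
```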
Computational Rates for FIR Filtering

Signal type    Frequency   # taps       Performance
Speech         8 kHz       N = 128      20 MOPs
Music          48 kHz      N = 256      240 MOPs
Video phone    6.75 MHz    N×N = 81     1,090 MOPs
TV             27 MHz      N×N = 81     4,370 MOPs
HDTV           144 MHz     N×N = 81     23,300 MOPs
DSP systems and programs

   x(n) → [ DSP System ] → y(n)

•  infinite input stream (samples): x(0), x(1), x(2), …
•  infinite output stream (samples): y(0), y(1), y(2), …
•  (there may be multiple input and/or output streams)
•  non-terminating program, e.g.:

   for n = 1 to ∞
       y(n) = a*x(n) + b*x(n-1) + c*x(n-2)
   end
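The non-terminating program above maps naturally onto a Python generator: one output sample per input sample, with two explicit delay registers. A sketch, assuming the delay line starts at zero (names chosen here):

```python
import itertools

def fir3(a, b, c, xs):
    """Streaming y(n) = a*x(n) + b*x(n-1) + c*x(n-2); xs may be an infinite iterator."""
    x1 = x2 = 0  # delay registers (the two D elements), assumed to hold 0 initially
    for x0 in xs:
        yield a * x0 + b * x1 + c * x2
        x1, x2 = x0, x1  # shift the delay line

ys = fir3(1, 2, 3, itertools.count(1))   # input stream x = 1, 2, 3, 4, ...
print(list(itertools.islice(ys, 4)))     # -> [1, 4, 10, 16]
```

The generator never terminates on its own; the consumer decides how many samples to draw, which mirrors the stream semantics of the block diagrams that follow.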
DSP SYSTEMS

GRAPHICAL REPRESENTATIONS
DSP systems: 3 graphical representations

•  Block diagram:
   •  general block diagram
   •  loose semantics
•  Data-flow graph:
   •  used for signal processing
   •  formal definition
   •  powerful tools, lots of theory
•  Signal-flow graph:
   •  linear time-invariant (LTI) systems
   •  formal definition, still more theory

[Diagram: nesting of model classes, from general block diagrams, via data-flow graphs (signal processing), down to signal-flow graphs (LTI systems).]
DSP system: block diagram

•  Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2)

[Block diagram: x(n) passes through two delay elements D, giving x(n-1) and x(n-2); the three signals are multiplied by constants a, b, c and summed by two adders to produce y(n).]

•  D : delay element = memory element = register
•  a× : multiply with constant a
•  + : adder; output value = sum of input values


DSP system: data-flow graph (DFG)

•  Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2)

[Data-flow graph: x(n) feeds multiplier nodes a, b, c through edges carrying delays D; adder nodes combine the products into y(n).]

•  D is a (non-negative) number of delays on an edge
•  multiplier node: output value = (constant a) × input value
•  adder node: output value = sum of input values


Data-flow graph (DFG)

•  Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2)

[Same data-flow graph as on the previous slide.]

Each edge describes a precedence constraint between two nodes:

•  D = 0: intra-iteration precedence constraint
•  D > 0: inter-iteration precedence constraint


Data-flow graphs

Tokens can represent numbers, vectors (blocks), matrices, ….
Nodes may be complex (coarse-grained) functions.

Single-rate data flow: each node:

•  consumes one token from each input edge;
•  performs its function (in T time units);
•  produces one token onto each output edge.
Data-flow graphs

Multi-rate data flow: each node:

•  consumes a fixed number of tokens from each input edge;
•  performs its function (in T time units);
•  produces a fixed number of tokens onto each output edge.
Signal-flow graph (representation method 3)

•  A join-node denotes an adder.
•  A label a next to an edge denotes multiplication by constant a.
•  z^-k denotes k units of delay.
•  Signal-flow graphs are used to represent Linear Time-Invariant (LTI) systems.
•  A signal-flow graph represents a so-called Z-transform (Laplace), a powerful LTI system theory.

(outside the scope of 2IMN35)
Linear Systems

Input x, output y.

Discrete system:
•  x(n) results in y(n)

Linear system:
•  x1(n) + x2(n) results in y1(n) + y2(n)
•  c1·x1(n) + c2·x2(n) results in c1·y1(n) + c2·y2(n), for arbitrary c1 and c2

Most of our examples will be linear systems.
Linear Time-Invariant Systems

Input x, output y.

•  x(n+k) = x(n) shifted by integer k sample periods

Time-invariant system:
•  x’(n) = x(n+k) results in y’(n) = y(n+k)

Most of our examples will be linear time-invariant systems, or LTI systems.
Commutativity of LTI systems

   x(n) → [ LTI System A ] → f(n) → [ LTI System B ] → y(n)

is equivalent to

   x(n) → [ LTI System B ] → g(n) → [ LTI System A ] → y(n)
LOOP BOUNDS AND ITERATION BOUNDS
Iteration of a Synchronous Flow Graph

•  In one iteration, each actor fires the minimum number of times needed to return the graph to a particular state.
•  Example of a multi-rate DFG (production/consumption rates on the edges):

   →(1) A (2)→(2) B (3)→(2) C (1)→

   # firings for 1 iteration:
   A: 2   B: 2   C: 3

   # tokens per edge for 1 iteration:
   →A: 2   A→B: 4   B→C: 6   C→: 3
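The firing counts are the smallest integer solution of the balance equations (tokens produced = tokens consumed per edge). A sketch of that calculation, assuming a connected, rate-consistent graph (names are chosen here; requires Python 3.9+ for `math.lcm`):

```python
from fractions import Fraction
from math import lcm

def repetitions(edges):
    """Minimal firing counts q solving q[u]*prod == q[v]*cons
    for every edge (u, prod, cons, v)."""
    q = {}
    def visit(node, rate):
        if node in q:
            return
        q[node] = rate
        for (u, p, c, v) in edges:
            if u == node:
                visit(v, rate * p / c)   # balance: q[v] = q[u]*p/c
            if v == node:
                visit(u, rate * c / p)
    visit(edges[0][0], Fraction(1))
    scale = lcm(*(r.denominator for r in q.values()))  # clear fractions
    return {n: int(r * scale) for n, r in q.items()}

# The slide's graph: A produces 2, B consumes 2; B produces 3, C consumes 2.
print(repetitions([("A", 2, 2, "B"), ("B", 3, 2, "C")]))  # -> {'A': 2, 'B': 2, 'C': 3}
```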
Iteration period

Iteration period = the time required for the execution of one iteration of the SFG.

Example (a loop: x(n) and the fed-back y(n-1) enter an adder, whose output passes through a multiplier a and a delay D), let
•  Tm = 10 = multiplication time
•  Ta = 4 = addition time

Iteration period = Tm + Ta = 14 [e.g. nsec]
= minimum sample period Ts; that is: Ts ≥ Tm + Ta

Iteration rate = (iteration period)^-1 [e.g. GHz]
Loop and loop bound

•  A loop (cycle) in a DFG is a directed path that begins and ends at the same node.
•  The loop bound of loop j is defined as Tj/Wj, where
   •  Tj is the loop computation time (sum of all Ti of loop nodes i),
   •  Wj is the number of delays (D-elements) in the loop.

•  Example (IIR filter whose loop contains a multiplier a, an adder, and 2D, feeding back y(n-2)):
   •  Tloop = Tm + Ta = 14 ns
   •  Wloop = 2
   •  Loop bound = Tloop / Wloop = 14 / 2 = 7 nsec
Critical loop and Iteration bound

•  The critical loop of a DFG is the loop with the maximum loop bound.
•  The iteration bound T∞ of a DFG is the loop bound of the critical loop:

   T∞ = max_{j ∈ L} ( Tj / Wj )

where
•  L is the set of loops of the DFG,
•  Tj is the computation time of loop j,
•  Wj is the weight of loop j, i.e. the number of delays D.
Iteration bound cntd

Example:
•  TL1 = (10+2)/1 = 12
•  TL2 = (2+3+5)/2 = 5
•  TL3 = (10+2+3)/2 = 7.5
•  Iteration bound = max(12, 5, 7.5) = 12

Notes:
•  Delays are non-negative (a negative delay would imply non-causality).
•  If the loop weight equals 0 (no delay elements in the loop), then TL/0 = ∞ (deadlock).
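Once the loops have been enumerated, the iteration bound itself is a one-line maximum. A sketch over (loop time, delay count) pairs, using the example above:

```python
def iteration_bound(loops):
    """T_inf = max over loops of T_j / W_j; W_j == 0 means deadlock (bound = infinity)."""
    return max(t / w if w > 0 else float("inf") for (t, w) in loops)

# The example above: (loop computation time, # delays) per loop.
loops = [(12, 1), (10, 2), (15, 2)]
print(iteration_bound(loops))  # -> 12.0
```

Enumerating all loops of a DFG can itself be expensive; algorithms that avoid explicit enumeration exist but are beyond this slide.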
4 types of delay paths; critical path

•  Redraw the block diagram by partitioning the nodes into D-elements and combinational functions (“FSM view”): inputs and state (the delay elements) feed the combinational functions, which produce the outputs and the next state.

   path   from     to
   1      inputs   state
   2      state    outputs
   3      inputs   outputs
   4      state    state

•  Paths do not contain delay elements.
•  The critical path is the path with the longest computation time and is a lower bound for the clock period.
Critical path cntd

Example (FIR filter):
•  Tm = 10 ns
•  Ta = 4 ns
•  No loops!

1.  1 path from input to state: 0 ns
2.  4 paths from state to outputs: 26, 22, 18, 14 ns
3.  1 path from input to output: 26 ns
4.  3 paths from state to state: 0, 0, 0 ns

The critical path is 26 ns. (It can be reduced by pipelining and parallel processing.)
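With the delay elements cut, what remains is a DAG, and the critical path is its longest weighted path. A sketch: the graph below models a 5-tap FIR (multipliers m0..m4 at 10 ns feeding an adder chain a1..a4 at 4 ns each, consistent with the path lengths 26, 22, 18, 14 ns above); the node names are chosen here:

```python
import functools

def critical_path(graph, delay):
    """Longest combinational path in a DAG whose delay elements were removed.
    graph: node -> list of successor nodes; delay: node -> computation time."""
    @functools.lru_cache(maxsize=None)
    def longest_from(n):
        return delay[n] + max((longest_from(s) for s in graph[n]), default=0)
    return max(longest_from(n) for n in graph)

graph = {"m0": ["a1"], "m1": ["a1"], "m2": ["a2"], "m3": ["a3"], "m4": ["a4"],
         "a1": ["a2"], "a2": ["a3"], "a3": ["a4"], "a4": []}
delay = {n: 10 if n.startswith("m") else 4 for n in graph}
print(critical_path(graph, delay))  # -> 26
```

The memoized recursion visits each node once, so the cost is linear in the size of the graph.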
DSP references

•  Keshab K. Parhi. VLSI Digital Signal Processing Systems: Design and Implementation. Wiley-Interscience, 1999.
•  Richard G. Lyons. Understanding Digital Signal Processing (2nd edition). Prentice Hall, 2004.
•  John G. Proakis and Dimitris K. Manolakis. Digital Signal Processing (4th edition). Prentice Hall, 2006.
•  Simon Haykin. Neural Networks, a Comprehensive Foundation (2nd edition). Prentice Hall, 1999.
Computer Architecture and DSP references

•  Hennessy and Patterson. Computer Architecture: A Quantitative Approach (3rd edition). Morgan Kaufmann, 2002.
•  Phil Lapsley, Jeff Bier, Amit Sholam, Edward Lee. DSP Processor Fundamentals. Berkeley Design Technology, Inc., 1994-199
•  Jennifer Eyre, Jeff Bier. The Evolution of DSP Processors. IEEE Signal Processing Magazine, 2000.
•  Kees van Berkel et al. Vector Processing as an Enabler for Software-Defined Radio in Handheld Devices. EURASIP Journal on Applied Signal Processing 2005:16, 2613-2625.
VLSI Programming:

Preparations for lab work, before Tuesday April 26:

•  team up (2 students/team), and
•  install FPGA tools.
VLSI Programming: Thursday April 21

Transformations:
•  Transposition
•  Pipelining
•  Retiming
•  K-slow transformation
•  Parallel processing

(Parhi, Chapters 2, 3)
THANK YOU
