CHUONG7_FPGA

Chapter 07 discusses Systolic Architecture Design, focusing on the methodology for designing systolic arrays, particularly for FIR filters and matrix-matrix multiplication. It outlines the characteristics, advantages, and disadvantages of systolic arrays, as well as their typical applications in signal processing and data structures. The chapter emphasizes the importance of mapping dependence graphs to lower-dimensional architectures and selecting appropriate scheduling vectors for efficient design.

Uploaded by

Quang Anh Vu

Ho Chi Minh City University of Technology (ĐHBK Tp HCM)

Department of Electronics (BMĐT)
Instructor: Hồ Trung Mỹ

Chapter 07
Systolic Architecture Design

References:
1. Slides from Prof. Parhi's book
2. Slides by Prof. Lan-Da Van
3. Slides by Prof. Rudolf Mak

Outline
7.1 Introduction
7.2 Systolic Array Design Methodology
7.3 FIR Systolic Arrays
7.4 Selection of Scheduling Vector
7.5 Matrix-Matrix Multiplication and 2D Systolic Array Design
7.6 Systolic Design for Space Representations Containing Delays
7.7 Conclusions
7.1 Introduction

• What is a Systolic Architecture?

• “A network of processing elements (PEs) that
computes and rhythmically passes data
through it”
• Systolic: by analogy with the heart rhythmically
pumping blood through the body

FSM reminder

Systolic system (Leiserson)

Systolic = Uniform Pipelined SDF

Characteristics of Systolic Arrays
• Synchronization
• Modularity
• Regularity
• Locality
• Finite Connection
• Parallel/Pipeline
• Extendibility
• Some relaxations are introduced to increase the
utility of systolic arrays
– Neighbor interconnection (near, but not nearest)
– Data broadcast operations
– Different PEs, especially at the boundaries

Relaxations

Introduction (cont’d)
• Like the flow of blood through the heart
• Systolic array: Moore machines

Introduction (cont’d)
• Advantages
– Simple and regular design
– Cost-effective
– Concurrency and communication
– Computation and I/O
• Disadvantages
– Not all algorithms can be implemented using a
systolic architecture
– Cost in hardware and area
– Cost in latency

Systolic Array Structures
• 1D
• 2D
• Multi-dimensional

Typical Applications
• Signal and Image Processing
– FIR and IIR filters
– Convolution and correlation
– DFT
– Interpolation
– Median filter
• Matrix Arithmetic
– Matrix-vector multiplication
– Matrix-matrix multiplication
– Matrix triangularisation
– Decomposition

Typical Applications (cont’d)

• Non-numerical Applications
– Graph algorithms
– Data structures
– Language and character recognition
– Encoders and decoders
– Relational database operations

Typical Applications: Example 1
• Systolic FIR Filter

Typical Applications: Example 2

Typical Applications: Example 3

7.2 Systolic Array Design Methodology
• Systolic architectures are designed by using linear mapping
techniques on regular dependence graphs (DG).
• Regular Dependence Graph: the presence of an edge in a
certain direction at any node in the DG implies the presence of
an edge in the same direction at all nodes in the DG.
– A DG corresponds to a space representation: no time instance is
assigned to any computation (t = 0).
• Systolic architectures have a space-time representation
where each node is mapped to a certain processing
element (PE) and is scheduled at a particular time instance.
• The systolic design methodology maps an N-dimensional DG to a
lower-dimensional systolic architecture.
– Here, the mapping of an N-dimensional DG to an (N-1)-dimensional
systolic array is considered.

DG vs DFG
• A DG is similar to a DFG, with these differences:
– DFG: covers only the computations in one iteration
(executed repetitively) and contains delay elements
– DG: contains the computations for all iterations in an
algorithm, has no delay elements, and does not consider
the HW architecture
• Example: y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2)

Ex: Regular DG for a 3-tap FIR Filter

y(n)=b0x(n) + b1x(n-1) + b2x(n-2)
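
A minimal sketch of how the DG of this 3-tap filter could be enumerated in index space. The indexing convention is an assumption, not taken from the slide: node (i, j) multiplies input x(i) by weight b_j and contributes to output y(i + j), which is consistent with the fundamental edge directions (1 0) for weights, (0 1) for inputs and (1 -1) for results used in the chapter's edge mapping. The function names `fir_dg` and `fir_from_dg` are hypothetical.

```python
from itertools import product

# Fundamental edges of the FIR dependence graph (directions in index space).
EDGES = {"weight": (1, 0), "input": (0, 1), "result": (1, -1)}


def fir_dg(num_inputs, taps=3):
    """Return (nodes, edges) of the DG; every node is one multiply-accumulate."""
    nodes = set(product(range(num_inputs), range(taps)))
    edges = []
    for (i, j) in sorted(nodes):
        for name, (di, dj) in EDGES.items():
            target = (i + di, j + dj)
            if target in nodes:  # edges leaving the index space are boundary I/O
                edges.append(((i, j), target, name))
    return nodes, edges


def fir_from_dg(x, b):
    """Evaluate the filter by visiting every DG node once (no notion of time)."""
    nodes, _ = fir_dg(len(x), len(b))
    y = [0] * (len(x) + len(b) - 1)
    for (i, j) in nodes:
        y[i + j] += b[j] * x[i]  # assumed convention: node (i, j) feeds y(i + j)
    return y
```

Because the same three edge directions appear at every interior node, the graph is regular; the order in which `fir_from_dg` visits nodes is irrelevant, which reflects that a DG carries no schedule.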


Means how many PEs we need in whole DG!

Define PE mappings of nodes in DG!

Means how many delays each edge needs!

Basic Vectors in Systolic Array Design

Feasibility Constraints

Space-Time Representation of the Graph

\[
\begin{bmatrix} i' \\ j' \\ t' \end{bmatrix}
= T \begin{bmatrix} i \\ j \\ t \end{bmatrix}
= \begin{bmatrix} 0 \;\; 0 & 1 \\ p^T & 0 \\ s^T & 0 \end{bmatrix}
\begin{bmatrix} i \\ j \\ t \end{bmatrix}
\]

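
As a rough illustration of the transformation T, the sketch below builds the 3x3 matrix from a processor space vector p and a scheduling vector s and applies it to one DG node. The helper names are hypothetical, and the sample vectors p^T = (0 1), s^T = (1 0) are the ones the chapter uses for Design B1.

```python
def space_time_matrix(p, s):
    """T = [[0 0 1], [p^T 0], [s^T 0]] for a two-dimensional dependence graph."""
    return [[0, 0, 1],
            [p[0], p[1], 0],
            [s[0], s[1], 0]]


def to_space_time(node, p, s, t=0):
    """Map DG node (i, j), which has t = 0, to (i', j', t')."""
    i, j = node
    T = space_time_matrix(p, s)
    return tuple(sum(a * b for a, b in zip(row, (i, j, t))) for row in T)
```

For example, `to_space_time((3, 2), p=(0, 1), s=(1, 0))` gives `(0, 2, 3)`: i' carries the original t = 0, j' = p^T I = 2 is the processor, and t' = s^T I = 3 is the execution time.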
Dependence graphs (DG)
1. The nodes of a dependence graph represent
(small) computations. There is a separate node
for each computation.

2. The edges of a dependence graph represent
causal dependencies between computations, i.e.,
an edge from node x to node y indicates that the
result of the computation performed by x is used
in the computation performed by y.

3. There is no notion of time in a dependence
graph. It is an (index-)space representation.

FIR: Dependence graph

Regular dependence graphs

A dependence graph G is regular when:

1. There is an injective mapping I from the nodes of
G to a grid of points in the N-dimensional index
space.

2. There exists a finite set E of vectors, called
fundamental edges, such that every pair of
neighboring nodes is mapped to a pair of grid
locations that differ by a fundamental edge e ∈ E.

FIR: DG in space representation

Systolic Array Design Methodology

7.3 FIR Systolic Arrays
• This section derives a family of systolic arrays
for FIR digital filters using the linear mapping
technique.
1. Design B1
2. Design B2
3. Design F
4. Design R1
5. Design R2
6. Design W1
7. Design W2

FIR Systolic Array (Design B1)
• B1 design is derived by selecting projection vector, processor vector
and scheduling vector as follows:
\[ d = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad p^T = (0 \;\; 1), \quad s^T = (1 \;\; 0). \]
• Any node with index I^T = (i, j) is mapped to processor
\[ p^T I = (0 \;\; 1) \begin{pmatrix} i \\ j \end{pmatrix} = j \]
⇒ all nodes on a horizontal line are mapped to the same processor
• Any node with index I^T = (i, j) is executed at time
\[ s^T I = (1 \;\; 0) \begin{pmatrix} i \\ j \end{pmatrix} = i \]
• Since \( s^T d = (1 \;\; 0) \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 1 \), then \( HUE = 1 / |s^T d| = 1 \)
• Edge mapping:
e^T                     p^T e    s^T e
Weight (wt), (1 0)        0        1
Input (i/p), (0 1)        1        0
Result, (1 -1)           -1        1

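
The B1 arithmetic can be checked mechanically. The following is a small sketch, with hypothetical helper names, that recomputes the processor assignment p^T I, the execution time s^T I, the hardware utilization efficiency HUE = 1/|s^T d|, and the edge mapping from the three B1 vectors.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Design B1 vectors as given in the slide.
D, P, S = (1, 0), (0, 1), (1, 0)
EDGES = {"weight": (1, 0), "input": (0, 1), "result": (1, -1)}


def map_node(node, p=P, s=S):
    """Return (processor, time) = (p^T I, s^T I) for DG node I."""
    return dot(p, node), dot(s, node)


def hue(s=S, d=D):
    """HUE = 1 / |s^T d|."""
    return Fraction(1, abs(dot(s, d)))


def edge_table(p=P, s=S, edges=EDGES):
    """For each edge e: (p^T e, s^T e) = (PE displacement, number of delays)."""
    return {name: (dot(p, e), dot(s, e)) for name, e in edges.items()}
```

Read this way, the table explains the B1 behavior: weights have displacement 0 (they stay in a PE), inputs have displacement 1 with 0 delays (they are broadcast), and results move one PE per delay.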
Systolic Array Design Methodology

DG of FIR Filter

Note: Wn is the n-th weight

Systolic Array Design Methodology

Applying Projection and Scheduling (1/2)

Applying Projection and Scheduling (2/2)

Systolic Array Design Methodology

Edge Mapping (1/3)

Edge Mapping (2/3)

Edge Mapping (3/3)

Systolic Array Design Methodology

Construct the Final Systolic Architecture

Alternative Designs

B2 – Broadcast Inputs, Move Weights, Results Stay

F – Fan-in Results, Move Inputs, Weights Stay

R1 – Results Stay, Inputs and Weights Move in Opposite Directions

R2 and Dual R2 – Results Stay, Inputs and Weights Move in the Same Direction but at Different Speeds

W1 – Weights Stay, Inputs and Results Move in Opposite Directions

W2 and Dual W2 – Weights Stay, Inputs and Results Move in the Same Direction but at Different Speeds

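
The design names above can be related to edge mappings with a short sketch. The (d, p^T, s^T) triples for B2 through W2 do not appear in the text here, so the values in `DESIGNS` are assumptions, chosen so that the computed data movement agrees with each design's title; only the B1 triple is taken from the chapter. A negative s^T e is treated as a reversed edge, which is also an assumption of this sketch.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


EDGES = {"weight": (1, 0), "input": (0, 1), "result": (1, -1)}

# name: (d, p, s). Only B1 is stated in the chapter; the rest are assumed.
DESIGNS = {
    "B1": ((1, 0), (0, 1), (1, 0)),
    "B2": ((1, -1), (1, 1), (1, 0)),
    "F": ((1, 0), (0, 1), (1, 1)),
    "R1": ((1, -1), (1, 1), (1, -1)),
    "R2": ((1, -1), (1, 1), (2, 1)),
    "W1": ((1, 0), (0, 1), (2, 1)),
    "W2": ((1, 0), (0, 1), (1, 2)),
}


def flow(name):
    """Return {variable: (PE displacement, delays)} for one design."""
    d, p, s = DESIGNS[name]
    assert dot(p, d) == 0 and dot(s, d) != 0  # feasibility of the mapping
    table = {}
    for var, e in EDGES.items():
        move, delay = dot(p, e), dot(s, e)
        if delay < 0:  # negative delay: reverse the edge direction
            move, delay = -move, -delay
        table[var] = (move, delay)
    return table


def label(move, delay):
    if move == 0:
        return "stays"
    if delay == 0:
        return "broadcast or fan-in"
    return f"moves {move:+d} PE per {delay} delay(s)"
```

With these assumed vectors, `flow("R1")` shows weights and inputs with displacements +1 and -1 (opposite directions), while `flow("W2")` shows inputs and results both moving in the +1 direction with 2 and 1 delays (same direction, different speeds).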
Relating Systolic Designs Using Transformations

Cutset Retiming Transformation

7.4 Selection of Scheduling Vector

Scheduling Vector
• Consider the dependence X → Y
• Y can start only after X has started and completed.
• We also have to take into consideration the
time it takes the data to travel from X to Y.
• These requirements impose constraints on the scheduling vector.

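
One common way to write these constraints down is sketched below, assuming a linear schedule. The symbols S_X and S_Y (start times), T_X (time needed by X, taken here to include any data travel time), the offsets γ_X and γ_Y, and the edge vector e_{X→Y} = I_Y − I_X are notation introduced for this sketch.

```latex
\begin{aligned}
S_X &= s^T I_X + \gamma_X, \qquad S_Y = s^T I_Y + \gamma_Y \\
S_Y &\ge S_X + T_X \\
\Longrightarrow\quad s^T e_{X \to Y} + \gamma_Y - \gamma_X &\ge T_X,
\qquad e_{X \to Y} = I_Y - I_X
\end{aligned}
```

Each edge of the graph contributes one such inequality, and any s (with offsets γ) satisfying all of them is a valid schedule.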
Scheduling Inequalities (1/3)

Scheduling Inequalities (2/3)

Scheduling Inequalities (3/3)

Regular Iterative Algorithm (RIA)

Scheduling Vector and Systolic Array Design Using RDG

• Construct the scheduling inequalities using the reduced
dependence graph (RDG)
• Determine the scheduling vector from the scheduling
inequalities
• Perform the systolic mapping using the scheduling vector
• This formulation is general enough to accommodate different
computation times for the various operations.

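
The steps above can be sketched as a brute-force search over small integer scheduling vectors. Everything numeric here is an assumption made for illustration: the constraint list `FIR_CONSTRAINTS` (s^T e ≥ 0 for the weight and input edges, s^T e ≥ 1 for the result edge, all offsets zero) is not taken from the slides, and `feasible_schedules` is a hypothetical helper.

```python
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def feasible_schedules(constraints, d, bound=2):
    """Integer vectors s with |s_k| <= bound such that
    s.e >= t_min for every (e, t_min) and s.d != 0."""
    found = []
    for s in product(range(-bound, bound + 1), repeat=len(d)):
        if dot(s, d) == 0:
            continue  # nodes sharing a PE would execute at the same time
        if all(dot(s, e) >= t_min for e, t_min in constraints):
            found.append(s)
    return found


# Assumed inequalities for the FIR edges: (edge vector, minimum delay).
FIR_CONSTRAINTS = [((1, 0), 0), ((0, 1), 0), ((1, -1), 1)]


def best_schedule(constraints, d, bound=2):
    """Pick a feasible s with the smallest |s.d|, i.e. the highest HUE."""
    return min(feasible_schedules(constraints, d, bound),
               key=lambda s: abs(dot(s, d)))
```

Under these assumed inequalities and d = (1, 0), the search returns s = (1, 0), (2, 0) and (2, 1); the first has |s^T d| = 1 and coincides with the B1 scheduling vector.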
Example 7.4.1 (1/4)

Example 7.4.1 (2/4)

Example 7.4.1 (3/4)

Example 7.4.1 (4/4)

7.5 Matrix-Matrix Multiplication and 2D Systolic Array Design

Reduced Iterative Algorithm for Matrix Multiplication

Scheduling Inequalities for Matrix Multiplication

Solutions for Matrix Multiplication (1/2)

Solutions for Matrix Multiplication (2/2)

7.6 Systolic Design for Space Representations Containing Delays

Multiprojection

Scheduling Inequality and Systolic Transformation

Example of DG with Delays (1/3)

Example of DG with Delays (2/3)

Example of DG with Delays (3/3)

Remark

• The performance of the resulting array is affected by
– The choice of a particular DG for an algorithm
– The directions of the projection and scheduling vectors

Conclusion
• Systolic architecture
– Massively parallel processing with limited I/O
communication with the host computer
– Suitable for many regular iterative operations
• Design methodology
– Map an N-dimensional DG to an (N-1)-dimensional
space-time representation
– Requires determining three critical vectors
• Projection vector
• Processor space vector
• Scheduling vector
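
To illustrate how the three vectors fit together for a three-dimensional DG, here is a sketch for n x n matrix multiplication mapped onto a 2D array. The specific choices d = (0 0 1), s^T = (1 1 1) and a two-row processor space matrix are assumptions made for this example, and the code is a functional check of the mapping, not a cycle-accurate simulation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


D = (0, 0, 1)               # projection vector (assumed)
P = ((1, 0, 0), (0, 1, 0))  # rows of the processor space matrix (assumed)
S = (1, 1, 1)               # scheduling vector (assumed)


def is_feasible(p_rows, s, d):
    """p^T d = 0 for every row of P, and s^T d != 0."""
    return all(dot(row, d) == 0 for row in p_rows) and dot(s, d) != 0


def systolic_matmul(a, b):
    """Visit DG node (i, j, k) on PE P*I at time S*I and accumulate C.
    Returns (C, number of time steps spanned by the schedule)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    busy = set()  # (pe, time) slots already used
    for i in range(n):
        for j in range(n):
            for k in range(n):
                node = (i, j, k)
                pe = tuple(dot(row, node) for row in P)
                t = dot(S, node)
                assert (pe, t) not in busy  # one operation per PE per step
                busy.add((pe, t))
                c[pe[0]][pe[1]] += a[i][k] * b[k][j]
    times = [t for _, t in busy]
    return c, max(times) - min(times) + 1
```

With this mapping each PE (i, j) accumulates one output element c_ij, and the conflict check confirms that no PE is asked to perform two operations in the same time step.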
