CHUONG7_FPGA
Department of Electronics (BMĐT)
Instructor: Hồ Trung Mỹ
Chapter 07
Systolic Architecture Design
References:
1. Slides from Prof. Parhi's book
2. Slides by Prof. Lan-Da Van
3. Slides by Prof. Rudolf Mak
Outline
7.1 Introduction
7.2 Systolic Array Design Methodology
7.3 FIR Systolic Arrays
7.4 Selection of Scheduling Vector
7.5 Matrix-Matrix Multiplication and 2D Systolic Array Design
7.6 Systolic Design for Space Representations Containing Delays
7.7 Conclusions
7.1 Introduction
FSM reminder
Systolic system (Leiserson)
Systolic = Uniform Pipelined SDF
Characteristics of Systolic Arrays
• Synchronization
• Modularity
• Regularity
• Locality
• Finite Connection
• Parallel/Pipeline
• Extendibility
• Some relaxations are introduced to increase the
utility of systolic arrays
– Neighbor interconnection (near, but not nearest)
– Data broadcast operations
– Different PEs, especially at the boundaries
Relaxations
Introduction (cont’d)
[Figure labels: 2D, MultiD]
Typical Applications
• Signal and Image Processing
– FIR and IIR filters
– Convolution and correlation
– DFT
– Interpolation
– Median filter
• Matrix Arithmetic
– Matrix-vector multiplication
– Matrix-matrix multiplication
– Matrix triangularisation
– Decomposition
Typical Applications (cont’d)
• Non-numerical Applications
– Graph algorithms
– Data structures
– Language and character recognition
– Encoders and decoders
– Relational database operations
Typical Applications: Example 1
• Systolic FIR Filter
Typical Applications: Example 2
Typical Applications: Example 3
7.2 Systolic Array Design Methodology
• Systolic architectures are designed by using linear mapping
techniques on regular dependence graphs (DG).
• Regular Dependence Graph: the presence of an edge in a
certain direction at any node in the DG implies the presence of
an edge in the same direction at all nodes in the DG.
– The DG corresponds to a space representation: no time instance is
assigned to any computation (t = 0).
• Systolic architectures have a space-time representation
where each node is mapped to a certain processing
element (PE) and is scheduled at a particular time instance.
• Systolic design methodology maps an N-dimensional DG to a
lower dimensional systolic architecture.
– Here, the mapping of an N-dimensional DG to an (N-1)-dimensional
systolic array is considered.
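The regularity condition above can be checked mechanically. The following is a minimal sketch under stated assumptions: a hypothetical 2-D index space of 5 samples by 3 taps, and the three FIR edge directions (1, 0), (0, 1) and (1, -1) that appear in the edge-mapping table of Design B1. The names `build_fir_dg` and `is_regular` are illustrative only.

```python
from itertools import product

def build_fir_dg(n_samples=5, n_taps=3):
    """Build a 2-D DG: one node per (i, j), up to three outgoing edges per node."""
    nodes = set(product(range(n_samples), range(n_taps)))
    directions = [(1, 0), (0, 1), (1, -1)]  # assumed FIR edge directions
    edges = set()
    for (i, j) in nodes:
        for (di, dj) in directions:
            target = (i + di, j + dj)
            if target in nodes:  # drop edges that would leave the index space
                edges.add(((i, j), target))
    return nodes, edges

def is_regular(nodes, edges):
    """True if every edge direction seen anywhere appears at every node
    whose target in that direction lies inside the index space."""
    directions = {(v[0] - u[0], v[1] - u[1]) for u, v in edges}
    for (i, j) in nodes:
        for (di, dj) in directions:
            target = (i + di, j + dj)
            if target in nodes and ((i, j), target) not in edges:
                return False
    return True

nodes, edges = build_fir_dg()
print(len(nodes), is_regular(nodes, edges))
```

Removing a single edge from `edges` makes `is_regular` return False, which is the sense in which an edge at one node must be repeated at all nodes.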
DG vs DFG
• A DG is similar to a DFG; the differences are:
– DFG: covers only the
computations in one iteration
(executed repetitively) and
contains delay elements
– DG: covers the computations
of all iterations and contains
no delay elements
Ex: Regular DG for a 3-tap FIR Filter
Means how many PEs we need in whole DG!
Basic Vectors in Systolic Array Design
Feasibility Constraints
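For the linear mapping used in this chapter, the feasibility conditions are usually stated as p^T d = 0 (nodes that differ by the projection vector land on the same processor) and s^T d ≠ 0 (two nodes on the same processor never execute at the same time). A minimal sketch of these two checks follows; `dot` and `check_design` are made-up helper names, and HUE is computed as 1/|s^T d|, the formula used for Design B1.

```python
def dot(a, b):
    """Inner product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))

def check_design(d, p, s):
    """Return (feasible, hue) for projection d, processor space p, schedule s."""
    same_pe = dot(p, d) == 0    # p^T d = 0
    no_clash = dot(s, d) != 0   # s^T d != 0
    hue = 1 / abs(dot(s, d)) if no_clash else None
    return same_pe and no_clash, hue

print(check_design((1, 0), (0, 1), (1, 0)))  # Design B1 vectors
print(check_design((1, 0), (0, 1), (0, 1)))  # s^T d = 0, so infeasible
```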
Space-Time Representation of the Graph
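\[
\begin{bmatrix} i' \\ j' \\ t' \end{bmatrix}
= T \begin{bmatrix} i \\ j \\ t \end{bmatrix}
= \begin{bmatrix} 0 \;\; 0 & 1 \\ p^T & 0 \\ s^T & 0 \end{bmatrix}
\begin{bmatrix} i \\ j \\ t \end{bmatrix}
\]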
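As a concrete instance of this transformation, the sketch below builds T from p^T and s^T and applies it to a node of a 2-D DG with t = 0. The vectors p^T = (0 1) and s^T = (1 0) are those of Design B1; the function name `space_time` is hypothetical.

```python
def space_time(node, p, s):
    """Map DG node (i, j) at t = 0 to (i', j', t') = T (i, j, t)^T."""
    i, j = node
    T = [
        [0, 0, 1],        # i' = t = 0
        [p[0], p[1], 0],  # j' = p^T I (processor index)
        [s[0], s[1], 0],  # t' = s^T I (execution time)
    ]
    vec = (i, j, 0)
    return tuple(sum(a * b for a, b in zip(row, vec)) for row in T)

print(space_time((3, 2), p=(0, 1), s=(1, 0)))  # (0, 2, 3)
```

Node (3, 2) therefore runs on processor 2 at time 3 under these vectors.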
Dependence graphs (DG)
1. The nodes of a dependence graph represent
(small) computations. There is a separate node
for each computation.
Regular dependence graphs
FIR: DG in space representation
Systolic Array Design Methodology
7.3 FIR Systolic Arrays
• This section derives a family of systolic arrays
for FIR digital filters using the linear mapping
technique.
1. Design B1
2. Design B2
3. Design F
4. Design R1
5. Design R2
6. Design W1
7. Design W2
FIR Systolic Array (Design B1)
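• B1 design is derived by selecting projection vector, processor vector
and scheduling vector as follows:
\[ d = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad p^T = (0 \;\; 1), \quad s^T = (1 \;\; 0). \]
• Any node with index $I^T = (i, j)$ is mapped to processor
\[ p^T I = (0 \;\; 1) \begin{bmatrix} i \\ j \end{bmatrix} = j, \]
so all nodes on a horizontal line are mapped to the same processor.
• Any node with index $I^T = (i, j)$ is executed at time
\[ s^T I = (1 \;\; 0) \begin{bmatrix} i \\ j \end{bmatrix} = i. \]
• Since
\[ s^T d = (1 \;\; 0) \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 1, \quad \text{then} \quad \mathrm{HUE} = \frac{1}{|s^T d|} = 1. \]
• Edge mapping:
Edge e^T              p^T e    s^T e
Weight, wt (1 0)        0        1
Input, i/p (0 1)        1        0
Result (1 -1)          -1        1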
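The B1 numbers can be reproduced with a few dot products. This is a sketch using the vectors given for Design B1; `edge_table` and `schedule` are illustrative helpers, and the four sample nodes are arbitrary.

```python
D = (1, 0)  # projection vector d
P = (0, 1)  # processor vector p^T
S = (1, 0)  # scheduling vector s^T
EDGES = {"weight": (1, 0), "input": (0, 1), "result": (1, -1)}

def dot(a, b):
    """Inner product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))

def edge_table(p=P, s=S, edges=EDGES):
    """Map each edge name to (p^T e, s^T e)."""
    return {name: (dot(p, e), dot(s, e)) for name, e in edges.items()}

def schedule(nodes, p=P, s=S):
    """Group nodes by processor p^T I, each group sorted by time s^T I."""
    by_pe = {}
    for node in nodes:
        by_pe.setdefault(dot(p, node), []).append((dot(s, node), node))
    return {pe: sorted(items) for pe, items in by_pe.items()}

hue = 1 / abs(dot(S, D))  # HUE = 1 / |s^T d|

print(edge_table())
print(schedule([(0, 0), (1, 0), (0, 1), (1, 1)]))
print(hue)
```

A row with p^T e = 0 means the data stays in its processor, and a row with s^T e = 0 means it reaches all processors in the same cycle (broadcast).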
Systolic Array Design Methodology
DG of FIR Filter
Applying Projection and Scheduling (1/2)
Applying Projection and Scheduling (2/2)
Systolic Array Design Methodology
Edge Mapping (1/3)
Edge Mapping (2/3)
Edge Mapping (3/3)
Systolic Array Design Methodology
Construct the Final Systolic Architecture
Alternative Designs
B2 – Broadcast Inputs, Move Weights, Results Stay
F – Fan-in Results, Move Inputs, Weights Stay
R1 – Results Stay, Inputs and Weights Move in
Opposite Directions
R2 and Dual R2 – Results Stay, Inputs and Weights Move
in the Same Direction but at Different Speeds
W1 – Weights Stay, Inputs and Results Move in
Opposite Directions
W2 and Dual W2 – Weights Stay, Inputs and Results
Move in the Same Direction but at Different Speeds
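The design names above describe how weights, inputs and results behave, and that behaviour follows from p^T e and s^T e for each edge. The sketch below classifies each edge as stay, broadcast, fan-in, or move (with a signed speed in processors per cycle). Apart from B1, the (d, p, s) triples are assumptions chosen so that the computed behaviour matches the design titles; they are not taken from the slides, and the dual designs are not covered. The helper names are illustrative.

```python
from fractions import Fraction

EDGES = {"weights": (1, 0), "inputs": (0, 1), "results": (1, -1)}

# Assumed (d, p, s) triples; only B1 is stated explicitly in the slides.
DESIGNS = {
    "B1": ((1, 0), (0, 1), (1, 0)),
    "B2": ((1, -1), (1, 1), (1, 0)),
    "F": ((1, 0), (0, 1), (1, 1)),
    "R1": ((1, -1), (1, 1), (1, -1)),
    "R2": ((1, -1), (1, 1), (2, 1)),
    "W1": ((1, 0), (0, 1), (2, 1)),
    "W2": ((1, 0), (0, 1), (1, 2)),
}

def dot(a, b):
    """Inner product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))

def behaviour(p, s, e, is_result=False):
    """Classify one edge from its processor hops p^T e and delay s^T e."""
    hops, delay = dot(p, e), dot(s, e)
    if hops == 0:
        return ("stay", None)
    if delay == 0:
        return ("fan-in" if is_result else "broadcast", None)
    # Signed speed; a negative delay flips the direction of travel.
    return ("move", Fraction(hops, delay))

def classify(name):
    """Return the behaviour of weights, inputs and results for one design."""
    d, p, s = DESIGNS[name]
    assert dot(p, d) == 0 and dot(s, d) != 0  # basic feasibility
    return {k: behaviour(p, s, e, k == "results") for k, e in EDGES.items()}

for name in DESIGNS:
    print(name, classify(name))
```

Speeds with opposite signs correspond to "opposite directions", and equal signs with different magnitudes correspond to "same direction but at different speeds".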
Relating Systolic Designs Using Transformations
Cutset Retiming Transformation
7.4 Selection of Scheduling Vector
Scheduling Vector
• Consider the dependence X → Y.
• Y can start only after X has started and completed.
• We also have to take into account the
time it takes the data to travel from X to Y.
• These requirements impose constraints on the scheduling vector.
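One way to make this constraint concrete is to require s^T e ≥ (delay needed on that edge) for every dependence edge e = I_y − I_x. The sketch below checks this for the FIR edges. The per-edge delays (0 for weights and inputs that are only forwarded, 1 for the accumulate step on the result edge) are assumptions for illustration, as is the helper name `violated_edges`.

```python
def dot(a, b):
    """Inner product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))

# name -> (edge vector e = I_y - I_x, assumed minimum delay on that edge)
FIR_EDGES = {
    "weight": ((1, 0), 0),
    "input": ((0, 1), 0),
    "result": ((1, -1), 1),
}

def violated_edges(s, edges=FIR_EDGES):
    """Return the names of edges for which s^T e is below the required delay."""
    return [name for name, (e, need) in edges.items() if dot(s, e) < need]

print(violated_edges((1, 0)))  # B1 scheduling vector
print(violated_edges((0, 1)))  # result edge would run backwards in time
```

An empty list means the candidate scheduling vector satisfies every inequality under the assumed delays.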
Scheduling Inequalities (1/3)
Scheduling Inequalities (2/3)
Scheduling Inequalities (3/3)
Regular Iterative Algorithm (RIA)
Scheduling Vector and Systolic Array Design Using RDG
Example 7.4.1 (1/4)
Example 7.4.1 (2/4)
Example 7.4.1 (3/4)
Example 7.4.1 (4/4)
7.5 Matrix-Matrix Multiplication and
2D Systolic Array Design
Regular Iterative Algorithm (RIA) for Matrix Multiplication
Scheduling Inequalities for Matrix Multiplication
Solutions for Matrix Multiplication (1/2)
Solutions for Matrix Multiplication (2/2)
7.6 Systolic Design for Space
Representations Containing Delays
Multiprojection
Scheduling Inequality and Systolic Transformation
Example of DG with Delays (1/3)
Example of DG with Delays (2/3)
Example of DG with Delays (3/3)
Remark
Conclusion
• Systolic architecture
– Massively parallel processing with limited I/O
communication with the host computer
– Suitable for many regular iterative operations
• Design methodology
– Maps an N-dimensional DG to an (N-1)-dimensional
space-time representation
– Requires determining three critical vectors
• Projection vector
• Processor space vector
• Scheduling vector