
CHUONG7_FPGA

Chapter 07 discusses Systolic Architecture Design, focusing on the methodology for designing systolic arrays, particularly for FIR filters and matrix-matrix multiplication. It outlines the characteristics, advantages, and disadvantages of systolic arrays, as well as their typical applications in signal processing and data structures. The chapter emphasizes the importance of mapping dependence graphs to lower-dimensional architectures and selecting appropriate scheduling vectors for efficient design.

Uploaded by

Quang Anh Vu
Copyright
© All Rights Reserved

Ho Chi Minh City University of Technology (ĐHBK Tp HCM)

Department of Electronics (BMĐT)
Instructor: Hồ Trung Mỹ

Chapter 07
Systolic Architecture Design

References:
1. Slides from Prof. Parhi's book
2. Slides by Prof. Lan-Da Van
3. Slides by Prof. Rudolf Mak
1
Outline
7.1 Introduction
7.2 Systolic Array Design Methodology
7.3 FIR Systolic Arrays
7.4 Selection of Scheduling Vector
7.5 Matrix-Matrix Multiplication and 2D Systolic Array Design
7.6 Systolic Design for Space Representations Containing Delays
7.7 Conclusions
7.1 Introduction

• What is a Systolic Architecture?


• “A network of processing elements (PEs) that
computes and rhythmically passes data
through it”
• Systolic: by analogy with blood pulsing rhythmically through the
veins

3
FSM reminder

4
Systolic system (Leiserson)

5
Systolic = Uniform Pipelined SDF

6
Characteristics of Systolic Arrays
• Synchronization
• Modularity
• Regularity
• Locality
• Finite Connection
• Parallel/Pipeline
• Extendibility
• Some relaxations are introduced to increase the
utility of systolic arrays
– Neighbor interconnection (near, but not nearest)
– Data broadcast operations
– Different PEs, especially at the boundaries
7
Relaxations

8
Introduction (cont’d)

Like the flow of blood through the heart

Systolic array: Moore machines


9
Introduction (cont’d)
• Advantages
– Simple and regular design
– Cost-effective
– Concurrency and communication
– Computation and I/O
• Disadvantages
– Not all algorithms can be implemented using a
systolic architecture
– Cost in hardware and area
– Cost in latency
10
Systolic Array Structures
[Figure: 1D, 2D, and multi-dimensional (MultiD) array structures]

11
Typical Applications
• Signal and Image Processing
– FIR and IIR filters
– Convolution and correlation
– DFT
– Interpolation
– Median filter
• Matrix Arithmetic
– Matrix-vector multiplication
– Matrix-matrix multiplication
– Matrix triangularisation
– Decomposition

12
Typical Applications (cont’d)

• Non-numerical Applications
– Graph algorithms
– Data structures
– Language and character recognition
– Encoders and decoders
– Relational database operations

13
Typical Applications: Example 1
• Systolic FIR Filter

14
Typical Applications: Example 2

15
Typical Applications: Example 3

16
7.2 Systolic Array Design Methodology
• Systolic architectures are designed by applying linear mapping
techniques to regular dependence graphs (DGs).
• Regular dependence graph: the presence of an edge in a
certain direction at any node in the DG implies the presence of
an edge in the same direction at all nodes in the DG.
– A DG corresponds to a space representation: no time instance is
assigned to any computation, i.e. t = 0.
• Systolic architectures have a space-time representation
in which each node is mapped to a certain processing
element (PE) and is scheduled at a particular time instance.
• The systolic design methodology maps an N-dimensional DG to a
lower-dimensional systolic architecture.
– Here, the mapping of an N-dimensional DG to an (N-1)-dimensional
systolic array is considered.
17
DG vs DFG
• A DG is similar to a DFG, with these differences:
– DFG: covers only the computations in one iteration
(executed repetitively) and contains delay elements
– DG: contains the computations for all iterations of an
algorithm, has no delay elements, and is drawn without
considering the HW architecture
y(n)=b0x(n) + b1x(n-1) + b2x(n-2)

18
Ex: Regular DG for a 3-tap FIR Filter

y(n)=b0x(n) + b1x(n-1) + b2x(n-2)
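
As a minimal sketch of what the DG for this filter contains, the Python below creates one node per multiply-accumulate and evaluates the filter by visiting every node once. The function names and the node indexing (i = input sample index, j = tap index, x(n) = 0 for n < 0) are assumptions made for illustration; the slide's figure is not reproduced here.

```python
def fir_dg_nodes(num_inputs, num_taps):
    """One DG node per multiply-accumulate: (i, j) = (input index, tap index)."""
    return [(i, j) for i in range(num_inputs) for j in range(num_taps)]


def fir_from_dg(x, b):
    """Evaluate y(n) = sum_j b[j] * x[n - j] by visiting every DG node once."""
    y = [0] * len(x)
    for i, j in fir_dg_nodes(len(x), len(b)):
        n = i + j  # node (i, j) contributes b[j] * x[i] to output y(i + j)
        if n < len(y):
            y[n] += b[j] * x[i]
    return y
```

For example, `fir_from_dg([1, 2, 3, 4], [1, 10, 100])` accumulates the three taps for each output sample; no time step is assigned to any node, which is what distinguishes the DG from a DFG.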

19
Indicates how many PEs are needed for the whole DG.

Defines the PE mapping of the nodes in the DG.

Indicates how many delays each edge needs.

20
Basic Vectors in Systolic Array Design

21
Feasibility Constraints

22
Space-Time Representation of the Graph

\[
\begin{bmatrix} i' \\ j' \\ t' \end{bmatrix}
= T \begin{bmatrix} i \\ j \\ t \end{bmatrix}
= \begin{bmatrix} 0 \;\; 0 & 1 \\ p^T & 0 \\ s^T & 0 \end{bmatrix}
\begin{bmatrix} i \\ j \\ t \end{bmatrix}
\]
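
The transformation can be sketched in a few lines of Python. This is an illustrative helper (the name `space_time` is an assumption), applied to a 2D DG node with t = 0 and with the processor space vector p and scheduling vector s passed in as tuples.

```python
def space_time(node, p, s):
    """Map a 2D DG node I = (i, j) to (i', j', t') = (0, p^T I, s^T I)."""
    i, j = node
    t = 0  # a DG carries no time information, so t = 0 for every node
    T = [
        [0, 0, 1],        # i' = t = 0
        [p[0], p[1], 0],  # j' = p^T I (processor axis)
        [s[0], s[1], 0],  # t' = s^T I (scheduling time)
    ]
    vec = (i, j, t)
    return tuple(sum(row[k] * vec[k] for k in range(3)) for row in T)
```

With p^T = (0 1) and s^T = (1 0), the vectors used for Design B1 in Section 7.3, node (3, 2) is mapped to processor 2 and executed at time 3.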

23
Dependence graphs (DG)
1. The nodes of a dependence graph represent
(small) computations. There is a separate node
for each computation.

2. The edges of a dependence graph represent
causal dependencies between computations, i.e.,
an edge from node x to node y indicates that the
result of the computation performed by x is used
in the computation performed by y.

3. There is no notion of time in a dependence
graph. It is an (index-) space representation.
24
FIR: Dependence graph

25
Regular dependence graphs

A dependence graph G is regular when:

1. There is an injective mapping I from the nodes of
G to a grid of points in the N-dimensional index
space.

2. There exists a finite set E of vectors, called
fundamental edges, such that every pair of
neighboring nodes is mapped to a pair of grid
locations that differ by a fundamental edge e ∈ E.
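
Condition 2 can be checked mechanically. The sketch below assumes the nodes are already given as grid points (so condition 1 holds by construction) and that the DG is supplied as a list of (source, destination) pairs; both function names and the FIR-style grid layout are illustrative assumptions.

```python
def is_regular(edges, fundamental):
    """True if every DG edge (src, dst) differs by one of the fundamental edges."""
    allowed = set(fundamental)
    return all(
        tuple(d - s for s, d in zip(src, dst)) in allowed
        for src, dst in edges
    )


def grid_edges(num_i, num_j, directions):
    """Build all edges of a num_i x num_j grid DG along the given directions."""
    nodes = {(i, j) for i in range(num_i) for j in range(num_j)}
    edges = []
    for i, j in sorted(nodes):
        for di, dj in directions:
            dst = (i + di, j + dj)
            if dst in nodes:  # edges leaving the grid are dropped
                edges.append(((i, j), dst))
    return edges
```

A graph that contains even one edge outside the fundamental set, such as (0, 0) to (2, 1) in a DG whose fundamental edges are (1, 0), (0, 1) and (1, -1), fails the check.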

26
FIR: DG in space representation

27
Systolic Array Design Methodology

28
7.3 FIR Systolic Arrays
• This section derives a family of systolic arrays
for FIR digital filters using the linear mapping
technique.
1. Design B1
2. Design B2
3. Design F
4. Design R1
5. Design R2
6. Design W1
7. Design W2
29
FIR Systolic Array (Design B1)
• B1 design is derived by selecting projection vector, processor vector
and scheduling vector as follows:
1
d   , pT  (0 1), sT  (1 0).
0
• Any node with index IT=(i, j) is mapped to processor
i
p T I  0 1
 j j
 
 all nodes on a horizontal line are mapped to the same processor
i
• Any node with index IT=(i, j) is executed at time s T
I  1 0    i
• Since T 1 then HUE  1  1  j
s d  1 0   1 | sT d |
 0
• Edge mapping:
eT pTe sTe
Weight (wt(1 0)) 0 1
Input ( i/p(0 1)) 1 0
Result (1 -1) -1 1
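
The edge-mapping table and the HUE value can be recomputed with two dot products per edge. The following is a small sketch; the helper names (`dot`, `edge_mapping`, `hue`) and the dictionary layout are assumptions chosen for readability, while the vectors are the B1 values given above.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def edge_mapping(edges, p, s):
    """For each named DG edge e, return the pair (p^T e, s^T e)."""
    return {name: (dot(p, e), dot(s, e)) for name, e in edges.items()}


def hue(s, d):
    """Hardware utilization efficiency, HUE = 1 / |s^T d|."""
    return 1 / abs(dot(s, d))


FIR_EDGES = {"weight": (1, 0), "input": (0, 1), "result": (1, -1)}
B1 = {"d": (1, 0), "p": (0, 1), "s": (1, 0)}
b1_table = edge_mapping(FIR_EDGES, B1["p"], B1["s"])
```

Reading the result: p^T e = 0 means the variable stays in its PE (the weights), s^T e = 0 means it crosses PEs with no delay (the broadcast inputs), and the results move one PE per clock cycle.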
30
Systolic Array Design Methodology

31
DG of FIR Filter

Note: Wn is the n-th weight


32
Systolic Array Design Methodology

33
Applying Projection and Scheduling (1/2)

34
Applying Projection and Scheduling (2/2)

35
Systolic Array Design Methodology

36
Edge Mapping (1/3)

37
Edge Mapping (2/3)

38
Edge Mapping (3/3)

39
40
Systolic Array Design Methodology

41
Construct the Final Systolic Architecture
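
A cycle-by-cycle model helps to see what the final array does. The sketch below is an assumption-laden illustration of a B1-style array built from the B1 edge mapping (weights stay in their PE, inputs are broadcast, results move to the neighbouring PE through one delay); the function name and register model are not taken from the slides.

```python
def simulate_b1(x, w):
    """Cycle-by-cycle model of a B1-style array: PE j holds w[j]; x is broadcast."""
    n_pe = len(w)
    regs = [0] * n_pe  # regs[j]: result register at the output of PE j
    y = []
    for sample in x:  # one input sample is broadcast per clock cycle
        new_regs = [0] * n_pe
        for j in range(n_pe):
            # The partial result arrives from PE j+1 through one delay
            # (p^T e = -1, s^T e = 1 for the result edge).
            incoming = regs[j + 1] if j + 1 < n_pe else 0
            new_regs[j] = w[j] * sample + incoming
        regs = new_regs
        y.append(regs[0])  # PE 0 emits the finished output sample
    return y
```

Each output leaves PE 0 in the same cycle in which its newest input sample is broadcast, so the model reproduces y(n) = w0 x(n) + w1 x(n-1) + w2 x(n-2) for a 3-tap filter.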

42
Alternative Designs

43
B2 – Broadcast Inputs, Move Weights, Results Stay

44
F – Fan-in Results, Move Inputs, Weights Stay

45
R1 – Results Stay, Inputs and Weights Move in
Opposite Directions

46
R2 and Dual R2 – Results Stay, Inputs and Weights Move
in the Same Direction but at Different Speeds

47
W1 – Weights Stay, Inputs and Results Move in
Opposite Directions

48
W2 and Dual W2 – Weights Stay, Inputs and Results
Move in the Same Direction but at Different Speeds
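
The design names summarize how each variable behaves after mapping, and that behaviour follows directly from p^T e and s^T e. The sketch below classifies the three FIR edges for a given design. The B1 vectors come from the slides; the B2 and F vectors are assumed candidates (the corresponding slide bodies are not available here) that happen to reproduce the behaviour stated in the slide titles.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def classify(e, p, s):
    """Describe how the variable carried by edge e behaves in the mapped array."""
    if dot(p, e) == 0:
        return "stay"  # source and destination land on the same PE
    if dot(s, e) == 0:
        return "broadcast or fan-in"  # crosses PEs with zero delay
    return "move"  # travels between PEs through delays


def describe(p, s):
    """Classify the weight, input and result edges of the FIR DG."""
    edges = {"weights": (1, 0), "inputs": (0, 1), "results": (1, -1)}
    return {name: classify(e, p, s) for name, e in edges.items()}


# B1 is from the slides; B2 and F are assumed candidate vectors.
DESIGNS = {
    "B1": {"d": (1, 0), "p": (0, 1), "s": (1, 0)},
    "B2": {"d": (1, -1), "p": (1, 1), "s": (1, 0)},
    "F": {"d": (1, 0), "p": (0, 1), "s": (1, 1)},
}
```

Calling `describe(DESIGNS["B2"]["p"], DESIGNS["B2"]["s"])` reports moving weights, zero-delay inputs and stationary results, which matches the B2 title.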

49
Relating Systolic Designs Using Transformations

50
Cutset Retiming Transformation

51
7.4 Selection of Scheduling Vector

Scheduling Vector
• Consider the dependence X → Y
• Y can start only after X has started and completed.
• We also have to take into account the
time it takes the data to travel from X to Y.
• These requirements place constraints on the scheduling vector.
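
A minimal sketch of such a constraint, assuming a linear schedule S = s^T I + γ and the inequality S_Y ≥ S_X + T_X + T_travel, where T_X is the computation time of X and T_travel is the data travel time. The symbol names, the γ offsets and the function names are assumptions made for illustration.

```python
def start_time(s, index, gamma=0):
    """Linear schedule: S = s^T I + gamma."""
    return sum(a * b for a, b in zip(s, index)) + gamma


def dependence_ok(s, ix, iy, t_x, t_travel=0, gamma_x=0, gamma_y=0):
    """Check S_Y >= S_X + T_X + T_travel for a dependence X -> Y."""
    s_x = start_time(s, ix, gamma_x)
    s_y = start_time(s, iy, gamma_y)
    return s_y >= s_x + t_x + t_travel
```

For instance, with s^T = (1 0) a dependence from node (0, 2) to node (1, 1) leaves exactly one time unit for X to complete, while a dependence from (0, 0) to (0, 1) leaves none.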

52
Scheduling Inequalities (1/3)

53
Scheduling Inequalities (2/3)

54
Scheduling Inequalities (3/3)

55
Regular Iterative Algorithm (RIA)

56
Scheduling Vector and Systolic Array Design Using RDG

• Construct the scheduling inequalities using the RDG
(reduced dependence graph)
• Determine the scheduling vector from the scheduling
inequalities
• Perform the systolic mapping using the scheduling vector
• This formulation is general enough to accommodate different
computation times for the various operations.
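
The second step can be sketched as a brute-force search over small integer vectors. The code assumes each inequality has the form s^T e ≥ T_e for an edge e with a required delay T_e, and it also enforces the feasibility condition s^T d ≠ 0 for the chosen projection vector d; the function name, the search bound and the delay values in the usage note are illustrative assumptions.

```python
from itertools import product


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def feasible_schedules(edges, d, bound=2):
    """List integer vectors s with |s_k| <= bound, s^T e >= T_e and s^T d != 0."""
    found = []
    for s in product(range(-bound, bound + 1), repeat=len(d)):
        if dot(s, d) == 0:
            continue  # nodes along d share a PE and must not run simultaneously
        if all(dot(s, e) >= t_e for e, t_e in edges):
            found.append(s)
    return found
```

For the FIR edges, requiring at least one time unit on the result edge (1, -1) and at least zero on the weight and input edges, with d = (1, 0), the search returns three candidates, among them the B1 choice s^T = (1 0).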

57
Example 7.4.1 (1/4)

58
Example 7.4.1 (2/4)

59
Example 7.4.1 (3/4)

60
Example 7.4.1 (4/4)

61
7.5 Matrix-Matrix Multiplication and
2D Systolic Array Design

62
Regular Iterative Algorithm for Matrix Multiplication
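
As a hedged illustration, the sketch below evaluates C = AB over a 3D index space (i, j, k) with c(i, j, k) = c(i, j, k-1) + a(i, k) b(k, j), and records where and when each node would run under one assumed mapping: projection along k, PE = (i, j), and scheduling vector s^T = (1 1 1). These vectors and the function name are assumptions for illustration, not values recovered from the slides.

```python
from itertools import product


def systolic_matmul(a, b):
    """Evaluate C = A B node by node, recording the PE and time slot of each node."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    schedule = {}  # (pe, time) -> DG node (i, j, k)
    # Visit the nodes in schedule order, t = i + j + k.
    for i, j, k in sorted(product(range(n), repeat=3), key=sum):
        pe, t = (i, j), i + j + k  # assumed mapping: project along k, s = (1, 1, 1)
        if (pe, t) in schedule:
            raise ValueError("two nodes mapped to the same PE and time slot")
        schedule[(pe, t)] = (i, j, k)
        c[i][j] += a[i][k] * b[k][j]  # c(i, j, k) = c(i, j, k-1) + a(i, k) * b(k, j)
    return c, schedule
```

Under these assumptions an n x n product uses n² PEs and finishes at time 3(n-1), and no PE is asked to execute two nodes in the same time slot.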

63
Scheduling Inequalities for Matrix Multiplication

64
Solutions for Matrix Multiplication (1/2)

65
Solutions for Matrix Multiplication (2/2)

66
7.6 Systolic Design for Space
Representations Containing Delays

67
Multiprojection

68
Scheduling Inequality and Systolic Transformation

69
Example of DG with Delays (1/3)

70
Example of DG with Delays (2/3)

71
Example of DG with Delays (3/3)

72
Remark

• The performance of the resulting array is
affected by
– The choice of a particular DG for an algorithm
– The direction of the projection and the
schedule vectors

73
Conclusion
• Systolic architecture
– Massively parallel processing with limited I/O
communication with the host computer
– Suitable for many regular iterative operations
• Design methodology
– Maps an N-dimensional DG to an (N-1)-dimensional
space-time representation
– Requires determining three critical vectors
• Projection vector
• Processor space vector
• Scheduling vector
74
