Systolic Architecture

Systolic architectures replace single processors with an array of regular processing elements (PEs) to orchestrate data flow for high throughput with less memory access. Each PE may perform a different operation and have local memory. Data flows through the array in multiple directions. An example is a 3x3 systolic array for matrix multiplication where each PE accumulates one element of the product by receiving inputs from PEs according to its position in the array.

Uploaded by

Rohini Shah
Copyright
© Attribution Non-Commercial (BY-NC)

Systolic Architectures

• Replace a single processor with an array of regular processing elements (PEs)
• Orchestrate data flow for high throughput with less memory access

[Figure: memory (M) connected to a single PE, compared with memory (M) connected to an array of PEs]

• Different from pipelining: nonlinear array structure, multidirectional data flow, each PE may have a (small) local instruction and data memory
• Different from SIMD: each PE may do something different
• Initial motivation: VLSI Application-Specific Integrated Circuits (ASICs)
• Represent algorithms directly by chips connected in a regular pattern
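Systolic Array Example: 3x3 Systolic Array Matrix Multiplication

C = A × B

• Processors arranged in a 2-D grid
• Each processor accumulates one element of the product
• Alignments in time: each column of B and each row of A is delayed by one step relative to the one before it

Columns of B (enter from the top; the lowest entry goes in first):

                  b2,2
          b2,1    b1,2
  b2,0    b1,1    b0,2
  b1,0    b0,1
  b0,0

Rows of A (enter from the left; the rightmost entry goes in first):

                  a0,2    a0,1    a0,0
          a1,2    a1,1    a1,0
  a2,2    a2,1    a2,0

T=0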
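The time alignment on this slide can be sketched in a few lines of Python. This is only an illustration, not code from the slides: the function name `skewed_feed` and the convention that the first operands enter at step T=1 are assumptions chosen to match the T=1 slide, where a0,0 meets b0,0.

```python
def skewed_feed(n):
    """Index pairs entering an n x n array at each step t = 1 .. 2n-1.

    Hypothetical helper for illustration. Returns a list of (t, a_in, b_in):
    a_in[i] is the (row, col) index of the A element entering row i from the
    left, b_in[j] is the index of the B element entering column j from the
    top, and None marks an edge that receives nothing at that step.
    """
    schedule = []
    for t in range(1, 2 * n):
        a_in = []
        b_in = []
        for r in range(n):
            k = t - 1 - r  # row r (or column r) lags r steps behind row 0
            inside = 0 <= k < n
            a_in.append((r, k) if inside else None)  # a[r][k] enters row r
            b_in.append((k, r) if inside else None)  # b[k][r] enters column r
        schedule.append((t, a_in, b_in))
    return schedule


for t, a_in, b_in in skewed_feed(3):
    print(f"T={t}: A in {a_in}  B in {b_in}")
```

The one-step delay per row and per column is what makes a(i,k) and b(k,j) arrive at the same processor in the same step.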
Systolic Array Example: 3x3 Systolic Array Matrix Multiplication, T=1

a0,0 and b0,0 meet in the top-left processor. Accumulated sums, by PE (row, column):

PE(0,0): a0,0*b0,0
All other PEs: nothing yet
Systolic Array Example: 3x3 Systolic Array Matrix Multiplication, T=2

Accumulated sums, by PE (row, column):

PE(0,0): a0,0*b0,0 + a0,1*b1,0
PE(0,1): a0,0*b0,1
PE(1,0): a1,0*b0,0
All other PEs: nothing yet
Systolic Array Example: 3x3 Systolic Array Matrix Multiplication, T=3

Accumulated sums, by PE (row, column):

PE(0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
PE(0,1): a0,0*b0,1 + a0,1*b1,1
PE(0,2): a0,0*b0,2
PE(1,0): a1,0*b0,0 + a1,1*b1,0
PE(1,1): a1,0*b0,1
PE(2,0): a2,0*b0,0
PE(1,2), PE(2,1), PE(2,2): nothing yet
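The snapshots follow a simple pattern that can be written down directly: the operands a(i,k) and b(k,j) meet in PE(i,j) at step i + j + k + 1. The sketch below is derived from the slide states rather than taken from the slides, and the names `partial_sum_terms` and `snapshot` are hypothetical.

```python
def partial_sum_terms(i, j, t, n=3):
    """Terms accumulated by PE(i, j) after step t, as strings.

    Assumes a(i,k) and b(k,j) meet in PE(i, j) at step i + j + k + 1,
    which matches the T=1 .. T=7 snapshots in the slides.
    """
    last_k = min(n - 1, t - 1 - i - j)  # largest k that has arrived so far
    return [f"a{i},{k}*b{k},{j}" for k in range(last_k + 1)]


def snapshot(t, n=3):
    """Accumulator contents of every PE after step t."""
    return {(i, j): partial_sum_terms(i, j, t, n)
            for i in range(n) for j in range(n)}


for (i, j), terms in snapshot(3).items():
    print(f"PE({i},{j}):", " + ".join(terms) if terms else "nothing yet")
```

Under this assumption PE(i,j) finishes at step i + j + n, so the bottom-right processor of an n x n array finishes last, at step 3n - 2 (T=7 for n = 3).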
Systolic Array Example: 3x3 Systolic Array Matrix Multiplication, T=4

Accumulated sums, by PE (row, column):

PE(0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
PE(0,1): a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1
PE(0,2): a0,0*b0,2 + a0,1*b1,2
PE(1,0): a1,0*b0,0 + a1,1*b1,0 + a1,2*b2,0
PE(1,1): a1,0*b0,1 + a1,1*b1,1
PE(1,2): a1,0*b0,2
PE(2,0): a2,0*b0,0 + a2,1*b1,0
PE(2,1): a2,0*b0,1
PE(2,2): nothing yet
Systolic Array Example: 3x3 Systolic Array Matrix Multiplication, T=5

Accumulated sums, by PE (row, column):

PE(0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
PE(0,1): a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1
PE(0,2): a0,0*b0,2 + a0,1*b1,2 + a0,2*b2,2
PE(1,0): a1,0*b0,0 + a1,1*b1,0 + a1,2*b2,0
PE(1,1): a1,0*b0,1 + a1,1*b1,1 + a1,2*b2,1
PE(1,2): a1,0*b0,2 + a1,1*b1,2
PE(2,0): a2,0*b0,0 + a2,1*b1,0 + a2,2*b2,0
PE(2,1): a2,0*b0,1 + a2,1*b1,1
PE(2,2): a2,0*b0,2
Systolic Array Example: 3x3 Systolic Array Matrix Multiplication, T=6

Accumulated sums, by PE (row, column):

PE(0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
PE(0,1): a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1
PE(0,2): a0,0*b0,2 + a0,1*b1,2 + a0,2*b2,2
PE(1,0): a1,0*b0,0 + a1,1*b1,0 + a1,2*b2,0
PE(1,1): a1,0*b0,1 + a1,1*b1,1 + a1,2*b2,1
PE(1,2): a1,0*b0,2 + a1,1*b1,2 + a1,2*b2,2
PE(2,0): a2,0*b0,0 + a2,1*b1,0 + a2,2*b2,0
PE(2,1): a2,0*b0,1 + a2,1*b1,1 + a2,2*b2,1
PE(2,2): a2,0*b0,2 + a2,1*b1,2
Systolic Array Example: 3x3 Systolic Array Matrix Multiplication, T=7

Done: every processor now holds one complete element of the product. Accumulated sums, by PE (row, column):

PE(0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
PE(0,1): a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1
PE(0,2): a0,0*b0,2 + a0,1*b1,2 + a0,2*b2,2
PE(1,0): a1,0*b0,0 + a1,1*b1,0 + a1,2*b2,0
PE(1,1): a1,0*b0,1 + a1,1*b1,1 + a1,2*b2,1
PE(1,2): a1,0*b0,2 + a1,1*b1,2 + a1,2*b2,2
PE(2,0): a2,0*b0,0 + a2,1*b1,0 + a2,2*b2,0
PE(2,1): a2,0*b0,1 + a2,1*b1,1 + a2,2*b2,1
PE(2,2): a2,0*b0,2 + a2,1*b1,2 + a2,2*b2,2
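The whole T=0 to T=7 walk-through can be reproduced with a small step-by-step simulation. This is a minimal sketch under stated assumptions, not the slides' own implementation: the function name `systolic_matmul` is hypothetical, each PE is assumed to keep its accumulator in place while passing its A operand to the right and its B operand downward, and plain Python lists stand in for the hardware registers.

```python
def systolic_matmul(A, B):
    """Multiply two n x n matrices on a simulated systolic array.

    Returns (C, steps), where steps is the number of clock steps used.
    """
    n = len(A)
    acc = [[0] * n for _ in range(n)]       # one accumulator per PE
    a_reg = [[None] * n for _ in range(n)]  # A operand held by each PE
    b_reg = [[None] * n for _ in range(n)]  # B operand held by each PE
    steps = 3 * n - 2                       # the bottom-right PE finishes last
    for t in range(1, steps + 1):
        # Shift phase: A moves one PE to the right, B moves one PE down.
        new_a = [[None] * n for _ in range(n)]
        new_b = [[None] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:
                    k = t - 1 - i  # row i is fed i steps late
                    new_a[i][j] = A[i][k] if 0 <= k < n else None
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    k = t - 1 - j  # column j is fed j steps late
                    new_b[i][j] = B[k][j] if 0 <= k < n else None
                else:
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        # Compute phase: each PE multiplies its two operands and accumulates.
        for i in range(n):
            for j in range(n):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, steps


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
C, steps = systolic_matmul(A, B)
reference = [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
             for i in range(3)]
print(C == reference, steps)
```

Each element of A and B is read from outside the array only once and then reused by every PE it passes through, which is the "less memory access" point made on the first slide.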
