A Systolic FFT Architecture For Real Time FPGA Systems
This work was sponsored by DARPA ATO under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions
and recommendations are those of the authors and are not necessarily endorsed by the United States Government.
Outline
Introduction
  Motivation
  Evaluation metrics
Parallel architecture
Systolic architecture
Performance summary
Conclusions
Systolic Architecture-2
PAJ 9/29/2004
[Figure: real-time correlation block diagram. Labels: I/Q, ADC, 1.2 GSPS, FFT, FIFO, Conjugate, 32K correlation.]
$\mathrm{Corr}_{x,y}[m] = \sum_{n} x[n]\, y^{*}[n - m]$
8K FFT bottleneck
  Real-time
  Complex
  0.6 GSPS input (16-bits)
  1.2 GSPS output (12-bits)
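The correlation is typically evaluated in the frequency domain, which is why the FFT ends up as the bottleneck. The following is a minimal Python sketch, not the authors' implementation. It assumes a circular correlation, the conjugated lag form x[n]·conj(y[n − m]) suggested by the Conjugate block, and a textbook recursive radix-2 FFT. The names `fft`, `circular_correlation`, and `direct_correlation` are hypothetical.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    sign = 1 if inverse else -1
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def circular_correlation(x, y):
    """Corr[m] = sum_n x[n] * conj(y[(n - m) mod N]) via FFT, conjugate, multiply, inverse FFT."""
    n = len(x)
    X = fft(x)
    Y = fft(y)
    product = [xk * yk.conjugate() for xk, yk in zip(X, Y)]
    return [v / n for v in fft(product, inverse=True)]

def direct_correlation(x, y):
    """Reference O(N^2) evaluation of the same sum, for checking."""
    n = len(x)
    return [sum(x[i] * y[(i - m) % n].conjugate() for i in range(n))
            for m in range(n)]

# Assumed test signal: x is y circularly delayed by 3 samples.
y = [1 + 0j, 2 - 1j, 0 + 3j, -1 + 0j, 0.5j, 2 + 0j, -3 + 1j, 1 + 1j]
x = y[-3:] + y[:-3]
fast = circular_correlation(x, y)
slow = direct_correlation(x, y)
peak_lag = max(range(len(fast)), key=lambda m: abs(fast[m]))
```

In this sketch the correlation peak lands at the lag equal to the delay between the two signals, and the FFT route agrees with the direct sum to within rounding.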
Evaluation Scorecard
Each architecture is scored at two FFT lengths (Size: 16 and 8192) on:
  Size    Length of FFT
  Pins    IO pins
  Fly     Butterflies
  Mult    Multipliers
  Add     Adder/subtractors
  Shift   Shift registers
Outline
Introduction
Parallel architecture
  Data flow graph
  Effects of serial input
Systolic architecture
Performance summary
Conclusions
Parallel FFT
Butterfly structure removes redundant calculation
[Figure: data flow graph of the 16-point parallel FFT]
Size    16      8192
Pins    448     229K
Fly     32      53K
Mult
Add
Shift
Complex Butterfly
Butterfly contains
  1 complex addition
  1 complex subtraction
  1 complex, constant multiply
[Figure: butterfly with constant multiplier $W_N^r$]
Size    16      8192
Pins    448     229K
Fly     32      53K
Mult
Add
Shift
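The three operations listed for the butterfly can be sketched in a few lines of Python. This is only an illustration: it assumes the decimation-in-time form (multiply first, then add and subtract), which the slides do not specify, and the names `twiddle` and `butterfly` are hypothetical.

```python
import cmath

def twiddle(n, r):
    """Constant W_N^r = exp(-j*2*pi*r/N) for an N-point transform."""
    return cmath.exp(-2j * cmath.pi * r / n)

def butterfly(a, b, w):
    """Radix-2 butterfly on two complex inputs with constant multiplier w."""
    t = w * b             # 1 complex, constant multiply
    return a + t, a - t   # 1 complex addition, 1 complex subtraction

# A 2-point DFT is a single butterfly with W_2^0 = 1.
upper, lower = butterfly(1 + 0j, 1 + 0j, twiddle(2, 0))
```

Because `w` depends only on the butterfly's position in the data flow graph, hardware can treat it as a constant rather than a second variable operand.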
Complex Addition
[Figure: complex addition uses 2 adds, one for the real part and one for the imaginary part; operand labels b, c, d]
Size    16      8192
Pins    448     229K
Fly     32      53K
Mult
Add     128     213K
Shift
Complex Multiply
[Figure: complex multiply; operand labels a, c]
Size    16      8192
Pins    448     229K
Fly     32      53K
Mult    128     213K
Add     192     320K
Shift
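[Figure: complex multiply with real and imaginary outputs; operand label d]
Per butterfly, the counts go from 4 to 3 multipliers and from 6 to 9 adders.
Size    16      8192    vs. previous
Pins    448     229K
Fly     32      53K
Mult    96      159K    75%
Add     288     480K    150%
Shift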
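The move to 75% of the multipliers and 150% of the adders is consistent with the well-known three-multiplication form of a complex product, which trades one multiply for three extra adds. The sketch below assumes that this is the rearrangement used; the slides do not spell it out, and the function names are hypothetical.

```python
def complex_mul_4(a, b, c, d):
    """(a + jb)(c + jd) with 4 multiplies and 2 adds."""
    real = a * c - b * d
    imag = a * d + b * c
    return real, imag

def complex_mul_3(a, b, c, d):
    """Same product with 3 multiplies and 5 adds."""
    k1 = c * (a + b)   # add 1, multiply 1
    k2 = a * (d - c)   # add 2, multiply 2
    k3 = b * (c + d)   # add 3, multiply 3
    real = k1 - k3     # add 4
    imag = k1 + k2     # add 5
    return real, imag

product_4 = complex_mul_4(3, 4, 5, -2)
product_3 = complex_mul_3(3, 4, 5, -2)
```

Adding these 5 adds to the 4 used by the complex addition and subtraction gives 9 adders per butterfly, which matches 288 adders across 32 butterflies.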
Parallel-Pipelined Architecture
[Figure: 16-point FFT data flow graph, a pipelined version]
IO Bound
100% Efficient
Size    16      8192
Pins    448     229K
Fly     32      53K
Mult    96      159K
Add     288     480K
Shift
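The scorecard entries for the fully parallel design can be reproduced by simple counting. The sketch below is a rough model rather than the authors' method. It assumes 28 pins per point (16-bit input plus 12-bit output, inferred from the stated I/O widths), N/2 butterflies in each of log2(N) stages, and 3 multipliers and 9 adders per butterfly. The name `parallel_fft_resources` is hypothetical.

```python
PINS_PER_POINT = 16 + 12   # assumption: 16-bit input + 12-bit output per point
MULTS_PER_BUTTERFLY = 3    # three-multiply complex product
ADDS_PER_BUTTERFLY = 9     # 4 for the add/subtract pair + 5 inside the multiply

def parallel_fft_resources(n):
    """Resource counts for a fully parallel radix-2 FFT of length n (power of two)."""
    stages = n.bit_length() - 1        # log2(n) for a power of two
    butterflies = (n // 2) * stages
    return {
        "pins": n * PINS_PER_POINT,
        "fly": butterflies,
        "mult": butterflies * MULTS_PER_BUTTERFLY,
        "add": butterflies * ADDS_PER_BUTTERFLY,
    }

small = parallel_fft_resources(16)
large = parallel_fft_resources(8192)
```

For 16 points the model returns the scorecard values exactly; for 8192 points it returns exact counts that the scorecard reports in rounded thousands.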
Serial Input
[Figure: 16-point FFT data flow graph, a serial version]
IO-rate matches A/D
6.25% Efficient
Size    16      8192    vs. previous
Pins    28      28      0.01%
Fly     32      53K
Mult    96      159K
Add     288     480K
Shift
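The 6.25% figure is what one would expect if a 16-point parallel FFT, able to consume 16 samples per clock, is fed a single sample per clock. A small sketch under that assumption (one sample per clock from the A/D; the function name `utilization` is hypothetical):

```python
def utilization(samples_per_clock, fft_length):
    """Fraction of a fully parallel FFT kept busy at the given input rate."""
    return min(1.0, samples_per_clock / fft_length)

serial_16 = utilization(1, 16)       # 1/16
serial_8192 = utilization(1, 8192)   # 1/8192
```

Under the same assumption the 8192-point parallel design would be busy about 0.012% of the time, which motivates shrinking the hardware to match the serial input rate.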
Outline
Introduction
Parallel architecture
Systolic architecture
  Serial implementation
  Application specific optimizations
Performance summary
Conclusions
Serial Architecture
[Figure: serial pipeline, Stage 1 through Stage 4]
50% Efficiency
Size    16      8192    vs. previous
Pins    28      28
Fly     4       13      0.03%
Mult    12      39      0.03%
Add     36      117     0.03%
Shift   22      12K
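In the serial design the compute resources grow with the number of stages rather than with the FFT length. The sketch below assumes one butterfly per stage, with 3 multipliers and 9 adders each; this matches the multiplier and adder rows of the scorecard but is an inference, not a statement from the slides. Shift registers are not modelled, and the name `serial_fft_resources` is hypothetical.

```python
def serial_fft_resources(n):
    """Compute-resource counts for a serial pipeline with one butterfly per stage."""
    stages = n.bit_length() - 1   # log2(n) for a power of two
    return {
        "pins": 28,               # one sample in, one out, independent of n
        "fly": stages,
        "mult": 3 * stages,
        "add": 9 * stages,
    }

serial_16 = serial_fft_resources(16)
serial_8192 = serial_fft_resources(8192)
```

One plausible reading of the 50% efficiency is that a butterfly needs two operands per operation while only one new sample arrives per clock, so each butterfly does useful work on every other cycle.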
8192-Point Architecture
Requires 13 stages
Fixed point arithmetic
  Varies the dynamic range to increase accuracy
  Overflow replaced with saturated value
Size    16      8192
Pins    28      28
Fly     4       13
Mult    12      39
Add     36      117
Shift   22      12K
[Figure: fixed-point word format by stage, stages 1 through 13]
Word formats shown (integer bits, fractional bits):
int     frac
4       14
5       13
6       12
7       11
8       10
9       9
10      8
Example (4 int, 4 frac): 0110.0101 = 6 + 5/16
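The fixed-point behavior described here, a movable binary point plus saturation on overflow, can be sketched as follows. This is an illustration only: it assumes signed two's-complement words in which the integer field includes the sign bit, and round-to-nearest quantization. Neither assumption is confirmed by the slides, and `to_fixed` and `from_fixed` are hypothetical helpers.

```python
def to_fixed(value, int_bits, frac_bits):
    """Quantize to a signed word with the given split, saturating on overflow."""
    total_bits = int_bits + frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    raw = round(value * (1 << frac_bits))    # Python rounds ties to even
    return max(lowest, min(highest, raw))    # overflow replaced with saturated value

def from_fixed(raw, frac_bits):
    """Convert the raw integer word back to a real value."""
    return raw / (1 << frac_bits)

example = to_fixed(6.3125, 4, 4)          # bit pattern 0110.0101
saturated = to_fixed(100.0, 4, 4)         # too large for 4 integer bits
early_stage = to_fixed(300.0, 4, 14)      # saturates: 4 integer bits
late_stage = to_fixed(300.0, 10, 8)       # fits: 10 integer bits
```

Shifting bits from the fractional field to the integer field stage by stage keeps the word length constant while making room for the growth in magnitude through the FFT, at the cost of coarser resolution in later stages.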
MIT Lincoln Laboratory
Increase Parallelism
[Figure: four parallel pipelines, each with stages 1 through 13]
Size    16      8192    vs. previous
Pins    112     112     400%
Fly     16      52      400%
Mult    48      156     400%
Add     144     468     400%
Shift   16      12K     100%
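The 400% entries correspond to replicating the serial pipeline four times. A short sketch of that scaling, assuming four identical lanes and leaving shift registers out because the scorecard shows them unchanged (the name `scale_by_lanes` is hypothetical):

```python
def scale_by_lanes(resources, lanes):
    """Replicate per-pipeline pins and compute resources across parallel lanes."""
    return {name: count * lanes for name, count in resources.items()}

serial_8192 = {"pins": 28, "fly": 13, "mult": 39, "add": 117}
four_lane_8192 = scale_by_lanes(serial_8192, 4)
ratio_percent = {name: 100 * four_lane_8192[name] // serial_8192[name]
                 for name in serial_8192}
```

The result reproduces the 8192-point column of the scorecard: 112 pins, 52 butterflies, 156 multipliers, and 468 adders.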
Simplification
[Figure: four parallel pipelines, each with stages 1 through 13]
Size    16      8192    vs. previous
Pins    160     160     143%
Fly     16      52
Mult    36      144     92%
Add     108     432     92%
Shift           8K      67%
Outline
Introduction
Parallel architecture
Systolic architecture
Performance summary
  Power, operations per second
  FPGA resources, frequency
  Latency, throughput
Conclusions
Results
Power: 22 Watts @ 65 °C
GOPS: 86 total @ 3.9 GOPS/Watt
Outline
Introduction
Parallel architecture
Systolic architecture
Performance summary
Conclusions
  Applicability to other platforms
  Future work
Conclusions
General architecture
  Extendable to a generic FPGA core
  Retargetable to ASIC technology
Future work
  Develop a parameterizable IP core generator