The 2nd International Conference on Computer Application and System Modeling (2012)

The Design and Implementation of FFT Algorithm Based on the Xilinx FPGA IP Core

Zhu Jin1, Luo Jun2, Zhang Shuang1


1. The Engineering & Technical College of Chengdu University of Technology, Leshan, 614000, China
2. Guilin Institute of Optical Communication, Guilin, 541004, China

Abstract—This paper presents the design and implementation of an FFT algorithm based on the Xilinx IP core. Following an analysis of the FFT algorithm, the FFT IP core is instantiated on a Xilinx Spartan-3A DSP series FPGA platform to verify the feasibility and reliability of the FFT algorithm on mid-range and low-end FPGAs.

Keywords-FFT algorithm module, Xilinx IP core, Spartan-3A DSP, fixed-point scaling

I. INTRODUCTION
The FFT (Fast Fourier Transform) is an efficient algorithm for computing the DFT. The original algorithm was proposed by J. W. Cooley and J. W. Tukey in 1965, and new algorithms have emerged continually since then. Development has followed two directions. One covers transform lengths N that are an integer power of 2, such as the radix-2, radix-4 and split-radix algorithms. The other covers lengths N that are not an integer power of 2, such as the prime factor algorithm and the Winograd algorithm. The radix-2 algorithm is currently the most common FFT algorithm. Its core idea is to decompose the N-point sequence into two N/2-point sequences, and to repeat this until only 2-point DFTs remain to be calculated, thereby eliminating the large amount of repeated computation in the direct DFT.
The FFT can decompose the sequence in either the time domain or the frequency domain: (1) decimation in time (DIT) splits the sequence x(n) directly into odd-indexed and even-indexed subsequences and obtains the DFT of the whole sequence from the DFTs of the subsequences; (2) decimation in frequency (DIF) splits the frequency-domain sequence X(k) into even-indexed and odd-indexed subsequences, calculates the DFT of each subsequence, and so obtains the DFT over the whole frequency domain. The computational complexity and the amount of calculation are the same for both methods; they differ in the form of the decomposition: DIT requires the input sequence x(n) to be reordered, whereas DIF requires the output sequence X(k) to be reordered.
The FFT is now widely used in digital signal processing, image processing, oil exploration, earthquake prediction and other fields. To make the FFT easier to apply in engineering practice, the major FPGA vendors have released IP (Intellectual Property) module libraries with the relevant functions. Among them, the Fast Fourier Transform V5.0 IP core from Xilinx offers selectable calculation parameters, architectures and data input/output ordering, so that the FFT algorithm can easily be implemented according to the needs of the user.

II. XILINX FFT IP CORE FUNCTIONS
Xilinx IP cores are hardware description language (HDL) design files for complex system functions. These verified functions are optimized for all Xilinx FPGA device architectures, and they come with HDL (VHDL, Verilog) functional simulation models, so that a design can be simulated and debugged in standard EDA simulation tools.
Xilinx FFT IP core V5.0 is released by Xilinx together with the FPGA development tool ISE 10.1. Its maximum system clock frequency reaches 550 MHz, its maximum data throughput reaches 550 MSPS, it can perform FFTs of up to 65536 points, the maximum width of the input data and twiddle factors is 24 bits (the larger the bit width, the greater the dynamic range), and it supports all mainstream Xilinx FPGA devices. The core can compute the N-point forward FFT and inverse FFT (IFFT) of real or complex data, where N ranges from 8 to 65536. The real and imaginary parts of the input data are each represented as M-bit two's-complement numbers, where M ranges from 8 to 24; the twiddle factor width likewise ranges from 8 to 24 bits.
During the FFT, the data, the twiddle factors and the output data reordering buffer can be stored in either block RAM or distributed RAM. For the Burst I/O architectures, block RAM can store data and twiddle factors for any transform length, while distributed RAM can only be used for transform lengths of up to 1024 points. For the Streaming I/O architecture, a mixed storage scheme can be used: a chosen number of stages use block RAM, and the remaining stages use distributed RAM.
The Xilinx FFT IP core offers four architectures, which let the user trade off logic resource usage against transform time:
(1) Pipelined, Streaming I/O: allows continuous data processing and uses the most logic resources;
(2) Radix-4, Burst I/O: separates a data load/unload phase from a processing phase, so data transfer and data processing take place separately. This architecture is smaller, but the transform time is longer;
(3) Radix-2, Burst I/O: uses fewer logic resources than the radix-4 architecture and follows the same two-phase process;
(4) Radix-2 Lite, Burst I/O: a variant of the radix-2 architecture that uses time multiplexing to achieve the smallest logic resource usage, but has the longest transform time.
The Burst I/O architectures use the DIT method; the pipelined, Streaming I/O architecture uses the DIF method.
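
The radix-2 decimation-in-time decomposition described in the Introduction, which the Burst I/O architectures use, can be illustrated in software. The following is a minimal Python sketch, not the IP core's implementation: the function names `fft_dit` and `dft` are hypothetical, the arithmetic is floating point rather than fixed point, and the recursion stands in for the hardware's staged butterflies.

```python
import cmath


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of 2)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return [complex(x[0])]
    even = fft_dit(x[0::2])  # N/2-point DFT of the even-indexed samples
    odd = fft_dit(x[1::2])   # N/2-point DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor W_N^k = exp(-j*2*pi*k/N) applied to the odd branch
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t            # butterfly, upper output
        out[k + n // 2] = even[k] - t   # butterfly, lower output
    return out


def dft(x):
    """Direct O(N^2) DFT, used here only as a reference for comparison."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    samples = [complex(v, v) for v in range(8)]
    fast = fft_dit(samples)
    slow = dft(samples)
    print(max(abs(a - b) for a, b in zip(fast, slow)))  # difference is at rounding level
```

Splitting the input by index parity is what forces the reordering of x(n) mentioned above; an in-place hardware implementation performs the same split by reading the input in bit-reversed order.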

Published by Atlantis Press, Paris, France.


© the authors

In actual hardware, the execution speed of the module is a very important parameter, so the simulation and verification in this paper are based on the pipelined, Streaming I/O architecture, which processes data continuously. This architecture pipelines a series of radix-2 butterfly processing engines, each of which has its own independent memory for input data and intermediate data (Figure 1). With this architecture, the FFT IP core can simultaneously process the current frame of N points, load the next frame of N points, and output the previous frame of N points.
Xilinx FFT IP core V5.0 supports three arithmetic types: full-precision unscaled, block floating-point, and scaled fixed-point (with a user-defined scaling schedule).
In the full-precision unscaled mode, all significant integer bits in the data path are retained, and the fractional bits produced during the operation are truncated or rounded. In this fixed-point mode the data width grows through the successive multiplication stages, and the output width increases to (input data width + log2(transform length) + 1) bits.
In block floating-point mode, all data points in a frame share the same scaling factor, which is reported at the output as the block exponent. Scaling is performed only when the FFT IP core detects that the data would otherwise overflow.
This paper adopts the scaled fixed-point mode. Compared with the full-precision unscaled mode, it greatly reduces the use of FPGA internal resources (XtremeDSP slices and block RAM), and compared with block floating-point it allows the scaling to be adjusted flexibly. The scaling is set through the scaling schedule (Scale_SCH). Each stage can scale by 1, 2, 4 or 8, which corresponds to a right shift of 0, 1, 2 or 3 bits. If the scaling is insufficient, the butterfly output exceeds the dynamic range and the data overflows.
For the Burst I/O architectures, Scale_SCH is expressed as follows: the scaling of each stage is specified by a 2-bit number, with the 2-bit number for stage 0 in the two least significant bits, in the form [... N4, N3, N2, N1, N0], where each 2-bit number gives the scaling of the corresponding stage. For example, for the radix-4 architecture with transform length N = 1024, Scale_SCH = [01 10 00 11 01] means a right shift of 1 bit at stage 0, 3 bits at stage 1, no shift at stage 2, 2 bits at stage 3, and 1 bit at stage 4. Empirical settings that prevent data overflow: for the 1024-point radix-4, Burst I/O architecture, Scale_SCH = [10 10 10 10 11]; for the 1024-point radix-2 architecture, Scale_SCH = [01 01 01 01 01 01 01 01 01 10].
For the pipelined, Streaming I/O architecture, each adjacent pair of radix-2 stages is grouped together: stage 0 and stage 1 form group 0, stage 2 and stage 3 form group 1, and so on. Scale_SCH is expressed as follows: the scaling of each group is specified by a 2-bit number, with the 2-bit number for group 0 in the two least significant bits, in the form [... N4, N3, N2, N1, N0], where each 2-bit number gives the scaling of the corresponding group; the two radix-2 stages in a group share the same scaling. For example, with N = 1024, Scale_SCH = [10 10 00 01 11] means a right shift of 3 bits for group 0 (stage 0 and stage 1), 1 bit for group 1 (stage 2 and stage 3), no shift for group 2 (stage 4 and stage 5), 2 bits for group 3 (stage 6 and stage 7), and 2 bits for group 4 (stage 8 and stage 9). If the transform length N is not an integer power of 4, the last group contains only one radix-2 stage, and its field can only be 00 or 01. Empirical settings that prevent data overflow: N = 512, Scale_SCH = [01 10 10 10 11]; N = 1024, Scale_SCH = [10 10 10 10 11].
The bit width of Scale_SCH is 2 * ceil(0.5 * log2(N)) for the pipelined, Streaming I/O and radix-4, Burst I/O architectures, and 2 * log2(N) for the radix-2, Burst I/O and radix-2 Lite, Burst I/O architectures, where N is the transform length.

III. SIMULATION VERIFICATION OF THE FFT IP CORE
A 512-point FFT module with 16-bit input data and 16-bit twiddle factors is implemented by instantiating the Xilinx IP core. The clock frequency is 50 MHz (a higher clock frequency allows a higher degree of resource reuse and so saves more area). The module uses the pipelined, Streaming I/O architecture with scaled fixed-point arithmetic and is debugged on a mid-range or low-end FPGA to verify its reliability and feasibility. To make the function of the FFT IP core easy to check, a counter starts from zero and is incremented by one on every rising clock edge, and the resulting value is used as both the real part and the imaginary part of the input signal. With Scale_SCH = [01 10 10 01 11], a project is built in ISE 10.1, the Xilinx FFT IP core is instantiated, and the design is simulated with ModelSim SE 6.5. The simulation timing is shown in Figure 2.
Timing verification: the whole timing sequence is correct. As can be seen from the timing diagram, a high busy signal indicates that the FFT IP core is performing an FFT operation; when busy goes low, the operation has ended and the FFT result is output. The edone signal goes high one cycle before the done signal; one cycle later, done goes high to mark the completion of the FFT operation. Because the transform length is 512 points, each FFT operation takes 512 clock cycles, so edone and done pulse once every 512 clock cycles. The rfd signal stays high, which indicates that input data is being transferred to the input port of the FFT IP core; this is consistent with continuous data processing in the pipelined, Streaming I/O architecture. A high dv signal indicates that the output data is valid.
Functional verification: in the pipelined, Streaming I/O architecture, the result for each input frame is output three frames later. From this, the output shown in the simulation above can be identified as the FFT of the input [94:605] + [94:605]*j. The MATLAB result for the same input, scaled according to Scale_SCH, is consistent with the simulation output, which shows that the FFT IP core works correctly.

IV. CONCLUSION
This paper verifies the feasibility and reliability of the FFT algorithm on mid-range and low-end FPGAs through overall testing of the FFT IP core. Implementing the FFT with the pipelined architecture and scaled fixed-point arithmetic reduces the time needed for data reading and processing and better meets the needs of FFT data processing.

REFERENCES
[1] J. W. Cooley, J. W. Tukey. An algorithm for the machine computation of complex Fourier series. Mathematics of Computation, 1965: 297-301.
[2] K. M. Lakin. A Review of the Thin Film Resonator Technology. IEEE Microwave Magazine, 2003, 4(4 SPE(ISS)): 333-336.
[3] Ng Kuang Chern, Nathaniel, Poo Ann Neow, Marcelo H. Ang Jr. Practical Issues in Pixel-based Autofocusing for Machine Vision [C]. Proceedings of the 2001 IEEE International Conference on Robotics & Automation, Seoul, Korea, May 2001: 2791-2796.
[4] Santos A, Ortiz de Solorzano C, de la Pena J, Vaquero J, Malpica N, del Pozo F. Evaluation of autofocus functions in molecular cytogenetic analysis [J]. Journal of Microscopy, 1997, 188: 264-272.
[5] Gregory W. Wornell, Alan V. Oppenheim. Estimation of Signals from Noisy Measurements Using Wavelets [J]. IEEE Transactions on Signal Processing, 1992, 40(3): 611-623.
[6] Stéphane Mallat. A Wavelet Tour of Signal Processing [M]. Beijing: China Machine Press, 2002.
[7] David L. Donoho. De-Noising by Soft-Thresholding [J]. IEEE Transactions on Information Theory, 1995, 41(3): 613-627.
[8] W. Sweldens. The Lifting Scheme: A Custom-Design Construction of Biorthogonal Wavelets [J]. Applied and Computational Harmonic Analysis, 1996, 3: 186-200.
[9] U. Franke, S. Heinrich. Fast Obstacle Detection for Urban Traffic Situations. IEEE Trans. Intelligent Transportation Systems, 2002, 3.
[10] S. Zhang et al. The Research of Mixed Programming Auto-Focus Based on Image Processing. ICICA 2010, Part I, CCIS 105, pp. 217-225, 2010.
[11] S. Zhang, G. Jin, J. Xiao, S. Li, Y. P. Qin, J. H. Liu, T. An, W. F. Zhong. Generalized Constraint Neural Network Model System Parameter Identification. Advanced Materials Research, Vols. 143-144, pp. 1207-1212, 2011.

[Figure 1, block diagram: the input data passes through a chain of radix-2 butterfly processing engines (stage 0 to stage n), each with its own memory bank, followed by output reordering that produces the output data.]

Figure 1. FFT module: pipelined, Streaming I/O structure
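
For the pipelined structure in Figure 1, Section II describes how Scale_SCH packs one 2-bit field per group of stages and how wide the word is. The following Python sketch of that bookkeeping is illustrative only: the helper names `scale_sch_width`, `decode_scale_sch` and `total_shift` are hypothetical and not part of any Xilinx tool, and the total shift is computed under the assumption that each field is applied once per stage or group.

```python
def scale_sch_width(n, radix2_burst=False):
    """Bit width of Scale_SCH for transform length n (a power of 2).

    radix2_burst=True covers the radix-2 and radix-2 Lite Burst I/O cases;
    otherwise the pipelined Streaming I/O / radix-4 Burst I/O rule is used.
    """
    log2_n = n.bit_length() - 1        # exact log2 for a power of 2
    if radix2_burst:
        return 2 * log2_n              # one 2-bit field per radix-2 stage
    return 2 * ((log2_n + 1) // 2)     # 2 * ceil(0.5 * log2(N))


def decode_scale_sch(text):
    """Turn '[... N1 N0]' notation into right shifts, index 0 = stage/group 0."""
    fields = text.split()
    # The text writes the most significant field first, so reverse it.
    return [int(field, 2) for field in reversed(fields)]


def total_shift(text):
    """Sum of all right shifts (assumes each field is applied exactly once)."""
    return sum(decode_scale_sch(text))


if __name__ == "__main__":
    # Pipelined example from Section II, N = 1024
    print(decode_scale_sch("10 10 00 01 11"))   # [3, 1, 0, 2, 2]
    print(scale_sch_width(1024))                # 10
    print(total_shift("10 10 10 10 11"))        # 11
```

Under that assumption, the empirical 1024-point schedule [10 10 10 10 11] shifts by 11 bits in total, which equals log2(1024) + 1, the bit growth quoted for the full-precision unscaled mode.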

Figure 2. Partial view of the FFT simulation results
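
The functional check in Section III compares the core's output with a MATLAB result scaled according to Scale_SCH. A reference model of that kind might look like the Python sketch below. It is not the authors' MATLAB script and not a bit-accurate model of the IP core. The function name `scaled_dif_fft` is hypothetical. The sketch assumes that each group's shift is applied once after its pair of radix-2 stages and that results are rounded rather than truncated, and it ignores twiddle-factor quantization and overflow.

```python
import cmath


def scaled_dif_fft(x, shifts):
    """Radix-2 decimation-in-frequency FFT with per-group scaling.

    shifts[g] is the right shift for group g (stages 2g and 2g+1).
    Returns the result in natural order.
    """
    n = len(x)
    stages = n.bit_length() - 1
    data = [complex(v) for v in x]
    span = n // 2
    for stage in range(stages):
        for start in range(0, n, 2 * span):
            for k in range(span):
                a = data[start + k]
                b = data[start + k + span]
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))  # twiddle factor
                data[start + k] = a + b
                data[start + k + span] = (a - b) * w
        span //= 2
        # Scale at the end of each stage pair, or after a final unpaired stage.
        if stage % 2 == 1 or stage == stages - 1:
            div = 2 ** shifts[stage // 2]
            data = [complex(round(v.real / div), round(v.imag / div)) for v in data]
    # DIF leaves the result in bit-reversed order; restore natural order.
    out = [0j] * n
    for i, v in enumerate(data):
        out[int(format(i, "0{}b".format(stages))[::-1], 2)] = v
    return out


if __name__ == "__main__":
    # Input frame from Section III: counter values 94..605 on both real and imaginary parts
    frame = [complex(v, v) for v in range(94, 606)]
    # Scale_SCH = [01 10 10 01 11], read from group 0 upward
    shifts = [int(f, 2) for f in reversed("01 10 10 01 11".split())]
    result = scaled_dif_fft(frame, shifts)
    print(result[0])  # DC bin, close to sum(frame) / 2**9
```

Comparing such a model with the simulated output bin by bin, within a few least significant bits, is one way to carry out the consistency check described in Section III.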
