Design and implementation of multichannel pulse compression system
Yajie Yue
School of Software, Harbin University of Science and Technology, Harbin, China
Email: [email protected]
Chenming Sha
School of Software, Harbin University of Science and Technology, Harbin, China
Email: [email protected]
Abstract—Implementing digital signal processing systems on FPGAs is an
important approach in embedded system design. Pulse compression can be
implemented digitally, and an all-digital multichannel pulse compression
system offers high reliability, strong interference immunity, good
flexibility and ease of use. In essence, pulse compression is a
spectrum-spreading technique realized through matched filtering; it reflects
how well the filter matches the expected phase of the received signal. In
this work, a multichannel pulse compression system is designed around an FFT
IP core that is reused at different stages of digital pulse compression to
perform the FFT and the IFFT in turn, which significantly reduces hardware
consumption. This paper presents the logic design of the multichannel pulse
compression system on a Xilinx FPGA and describes the high-speed
transmission module, the digital signal processing module and the data
buffer module in detail.

Index Terms—FPGA, Pulse Compression, Xilinx, System Generator

I. INTRODUCTION

With the development of high-speed large-scale integrated circuits, both the
generation of linear frequency modulated (LFM) signals and pulse compression
can be implemented digitally. Besides matched filtering, a digital pulse
compression system can also apply sidelobe suppression [1]. It reduces the
size of the system, offers high stability and maintainability, and improves
the programmability of the system. Digital processing has therefore
attracted wide interest and has been widely applied [2].

Digital pulse compression of LFM signals usually adopts a two-channel
quadrature processing scheme to avoid the effect of the random phase of the
echo signal. This can reduce the system processing loss by about 3 dB and
also relaxes the requirements on the AD acquisition devices [3].

The FPGA has become the core component of digital signal processing systems
and no longer serves merely as glue logic [4]. Besides logic resources, the
chip contains multiplexers, memory, hard multiply-add units, embedded
processors and other resources, and it provides highly parallel computing
capability. This makes the FPGA an ideal device for high-performance digital
signal processing, particularly for digital filtering and the fast Fourier
transform.

This design uses the Xilinx Virtex-5 FPGA family and applies
frequency-domain pulse compression to the AD-sampled, down-converted signal.
The digital signal processing logic is developed with System Generator,
Xilinx's latest integrated development tool for digital signal processing
systems [5][6].

The structure of the paper is as follows. Section II introduces the chosen
processing platform. Section III introduces the system function and
structure. Section IV introduces the design of the key modules in the FPGA
program. Section V gives the simulation and implementation results.

II. THE CHOSEN PLATFORM

The FPGA device used in this system is the XC5VSX95T (Virtex-5). The
Virtex-5 family provides the newest and most powerful features in the FPGA
market. Using the second-generation ASMBL™ (Advanced Silicon Modular Block)
column-based architecture, the Virtex-5 family contains five distinct
platforms (sub-families), the most choice offered by any FPGA family. Each
platform contains a different ratio of features to address the needs of a
wide variety of advanced logic designs. In addition to the most advanced,
high-performance logic fabric, Virtex-5 FPGAs contain many hard-IP
system-level blocks, including powerful 36-Kbit
block RAM/FIFOs, second-generation 25 x 18 DSP slices, SelectIO™ technology
with built-in digitally controlled impedance, ChipSync™ source-synchronous
interface blocks, system monitor functionality, enhanced clock management
tiles with integrated DCM (Digital Clock Managers) and phase-locked-loop
(PLL) clock generators, and advanced configuration options. Additional
platform-dependent features include power-optimized high-speed serial
transceiver blocks for enhanced serial connectivity, PCI Express® compliant
integrated Endpoint blocks, tri-mode Ethernet MACs (Media Access
Controllers), and high-performance PowerPC® 440 microprocessor embedded
blocks. These features allow advanced logic designers to build the highest
levels of performance and functionality into their FPGA-based systems. Built
on a 65-nm state-of-the-art copper process technology, Virtex-5 FPGAs are a
programmable alternative to custom ASIC technology. Most advanced system
designs require the programmable strength of FPGAs. Virtex-5 FPGAs offer the
best solution for addressing the needs of high-performance logic designers,
high-performance DSP designers, and high-performance embedded systems
designers with unprecedented logic, DSP, hard/soft microprocessor, and
connectivity capabilities. The Virtex-5 LXT, SXT, TXT, and FXT platforms
include advanced high-speed serial connectivity and link/transaction layer
capability.

transmit the data after pulse compression to the DSP board, which performs
further data processing.

The Control-FPGA is used for data transmission and for controlling the
working state of the system. Data transmission includes PCI transmission,
serial Rapid IO transmission and serial RocketIO transmission.

The PCI bus interface carries control and status information between the
host computer and the Control-FPGA.

The RocketIO transmission interface provides high-speed transmission between
the Control-FPGA and the signal-processing FPGA.

The main tasks of the signal-processing FPGA are receiving the data after AD
sampling and digital down conversion, performing multichannel pulse
compression, and transmitting the compressed data to the DSP board. The
signal-processing FPGA is the core of the pulse compression system. The
following section introduces the software structure of the signal-processing
FPGA and the design of its key modules, as shown in Fig. 1.
Figure 5. The program structure of the signal-processing FPGA Rocket IO module
(Diagram labels: GTP_RXDATA, data receive port, data receive unit, RocketIO,
4x_mode, data transmit unit, data transmit port.)

Rocket IO consists of two parts: the PMA (Physical Medium Attachment
sub-layer) and the PCS (Physical Coding Sub-layer). The PMA is mainly
responsible for serialization and deserialization, while the PCS includes
line encoding and CRC encoding.

Rocket IO in the signal-processing FPGA is used for the transmission and
control of high-speed data between FPGAs. Its bit rate is 2.5 Gbit/s.

Rocket IO uses 4X mode (four serial transceivers are combined into one group
to transmit and receive) with channel bonding. The channel bonding code is
5C (hexadecimal), the channel bonding length is 8 bytes, and the comma code
is BC (hexadecimal). Channel bonding aligns the data across the four
channels, while the comma code delimits byte boundaries in the serial
stream. Channel bonding cancels the skew between GTP lanes by using the RX
elastic buffer as a variable-latency block. The transmitter sends a pattern
simultaneously on all lanes, which the channel bonding circuit uses to set
the latency for each lane so that data is presented without skew at the FPGA
RX interface [8]. The channel bonding conceptual view is shown in Fig. 4.

B. The Design of Pulse Compression Module

Linear frequency modulation pulse compression can be implemented in the time
domain or in the frequency domain [9]. In the time domain, pulse compression
can be expressed as:

$$y(n) = x(n) * h(n) = \sum_{k=0}^{N-1} x(k)\,h(n-k) \tag{1}$$

In the frequency domain, pulse compression can be expressed as:

$$y(n) = \mathrm{IFFT}[X(\omega) \cdot H(\omega)] = \mathrm{IFFT}\{\mathrm{FFT}[x(n)] \cdot \mathrm{FFT}[h(n)]\} \tag{2}$$

A time-domain digital pulse compression system usually uses an FIR filter,
which is implemented by convolving two finite-length sequences. Both the
real part and the imaginary part of the output data require a correlation
operation. The computational load of the FIR complex correlation gradually
increases
as the signal time width increases, and the size of the processing hardware
grows with it. Time-domain pulse compression is therefore advantageous when
the signal time width is small [10], but the amount of processing hardware
increases as the time width grows.

For frequency-domain pulse compression, the size of the processing hardware
does not increase much when the signal time width is large, because the
processing is based on a fast and efficient FFT unit [11]. The operation
count of time-domain pulse compression is N^2, where N is the length of the
input data sequence. If a radix-2 FFT algorithm is applied, the operation
count of frequency-domain pulse compression can be reduced to the order of
(N/2)log2(N) [12]. Compared with time-domain processing, frequency-domain
processing therefore offers a large benefit. Moreover, as FFT and digital
signal processing technology develops, the operation count becomes smaller
and the processing speed becomes faster. We therefore choose the
frequency-domain method for the pulse compression system.

The main parts of the frequency-domain pulse compression process are the
input data buffer unit, the FFT unit, the complex multiplication unit, the
IFFT unit, the data-format conversion unit and the output data buffer unit.
The structure of the frequency-domain pulse compression process is shown in
Fig. 6.

Figure 6. The frequency-domain pulse compression process structure

The pulse compression module is the core of the signal-processing FPGA. We
use System Generator to develop the FPGA program of the pulse compression
module. System Generator is a DSP design tool from Xilinx that enables the
use of the MathWorks model-based Simulink design environment for FPGA
design. Previous experience with Xilinx FPGAs or RTL design methodologies is
not required when using System Generator. Designs are captured in the
DSP-friendly Simulink modeling environment using a Xilinx-specific blockset.
All of the downstream FPGA implementation steps, including synthesis and
place and route, are performed automatically to generate an FPGA programming
file.

System Generator has many key features:

Build and debug high-performance DSP systems in Simulink using the Xilinx
Blockset, which contains functions for signal processing (e.g., FIR filters,
FFTs), error correction (e.g., Viterbi decoder, Reed-Solomon
encoder/decoder), arithmetic, memories (e.g., FIFO, RAM, ROM), and digital
logic.

Automatic code generation of VHDL or Verilog from Simulink. Implement
behavioral (RTL) generation and target specific Xilinx IP cores from the
Xilinx Blockset. System Generator also supports custom HDL through its HDL
import flow.

Hardware co-simulation. A code generation option that allows you to validate
working hardware and accelerate simulations in Simulink and MATLAB. System
Generator supports Ethernet (10/100/Gigabit) and JTAG communication between
a hardware platform and Simulink.

Xilinx Power Analyzer (XPA) integration. Integration with XPA enables
designers to analyze power requirements and explore design modifications to
meet their power targets.

Hardware/software co-design of embedded systems. Build and debug DSP
co-processors for the Xilinx MicroBlaze™ soft processor core. System
Generator provides a shared memory abstraction of the HW/SW interface,
automatically generating the DSP co-processor, the bus interface logic,
software drivers, and software documentation.

The FPGA program structure of the pulse compression module in System
Generator is shown in Fig. 7. It realizes the digital signal processing with
FFT, adder, multiplier, RAM and other logic IP cores.

The data from the AD acquisition board has already been processed by the DDC
(digital down converter), and its format is 16-bit signed integer. The data
after pulse compression is transmitted to the DSP board for further
processing. Because the DSP uses a floating-point data format, the last
stage of the pulse compression FPGA program is format conversion.

This system uses the Xilinx FFT core to implement the FFT and IFFT
operations. The Xilinx LogiCORE™ IP Fast Fourier Transform (FFT) implements
the Cooley-Tukey FFT algorithm, a computationally efficient method for
calculating the Discrete Fourier Transform (DFT). The FFT core can run at a
high clock frequency (395 MHz) and is reliable.

The FFT core computes an N-point forward DFT or inverse DFT (IDFT), where
N = 2^m. For fixed-point inputs, the input data is a vector of N complex
values represented as dual b_x-bit two's complement numbers, that is, b_x
bits for each of the real and imaginary components of the data sample, where
b_x is in the range 8 to 34 bits inclusive. Similarly, the phase factors can
be b_w = 8 to 34 bits wide. For single-precision floating-point inputs, the
input data is a vector of N complex values represented as dual 32-bit
floating-point numbers, with the phase factors represented as 24-bit or
25-bit fixed-point numbers.
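
The equivalence between the time-domain form in Eq. (1) and the frequency-domain form in Eq. (2) can be checked numerically. The following is a minimal Python sketch under stated assumptions, not the FPGA implementation: a small software radix-2 FFT stands in for the Xilinx FFT core, and the helper names (`fft`, `ifft`, `convolve_direct`, `pulse_compress_fft`, `lfm_chirp`, `radix2_butterfly_mults`), the 32-sample chirp and the 20-sample echo delay are all illustrative choices that do not come from the paper.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(X):
    """Inverse FFT via conjugation: IFFT(X) = conj(FFT(conj(X))) / N."""
    n = len(X)
    y = fft([v.conjugate() for v in X])
    return [v.conjugate() / n for v in y]


def convolve_direct(x, h):
    """Eq. (1): direct linear convolution of two finite sequences."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def pulse_compress_fft(x, h):
    """Eq. (2): zero-pad to a power of two, FFT both, multiply, IFFT."""
    out_len = len(x) + len(h) - 1
    n = 1
    while n < out_len:
        n *= 2
    X = fft(list(x) + [0j] * (n - len(x)))
    H = fft(list(h) + [0j] * (n - len(h)))
    y = ifft([a * b for a, b in zip(X, H)])
    return y[:out_len]


def lfm_chirp(num_samples, bandwidth_fraction=0.5):
    """Baseband LFM pulse sweeping bandwidth_fraction of the sample rate (assumed values)."""
    k = bandwidth_fraction / num_samples
    return [cmath.exp(1j * math.pi * k * (n - num_samples / 2) ** 2)
            for n in range(num_samples)]


def radix2_butterfly_mults(n):
    """(N/2) * log2(N) complex multiplications for one radix-2 FFT, n a power of two."""
    return (n // 2) * (n.bit_length() - 1)


pulse = lfm_chirp(32)
echo = [0j] * 20 + pulse + [0j] * 44              # echo delayed by 20 samples
matched = [v.conjugate() for v in reversed(pulse)]  # matched filter h(n)
compressed = pulse_compress_fft(echo, matched)
peak = max(range(len(compressed)), key=lambda i: abs(compressed[i]))
print(peak)  # 51 = delay (20) + pulse length (32) - 1
```

For N = 1024, `radix2_butterfly_mults(1024)` gives 5120, against 1024 * 1024 = 1048576 products for the direct form, which is the gap the text refers to. Zero-padding to at least len(x) + len(h) - 1 points is what makes the circular convolution computed by the FFT equal to the linear convolution of Eq. (1).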
Although the FFT core supports floating-point operation, after weighing the
computational load against the output data precision we chose fixed-point
arithmetic.

Three fixed-point arithmetic options are available for computing the FFT:

Full-precision unscaled arithmetic.

Scaled fixed-point, where you provide the scaling schedule.

Block floating-point (run-time adjusted scaling).

With block floating-point, each stage applies sufficient scaling to keep
numbers in range, and the scaling is tracked by a block exponent. The block
floating-point mode may use significantly more resources than the scaled
mode, as it must maintain extra bits of precision to allow dynamic scaling
without impacting performance. Therefore, if the input data is well
understood and is unlikely to exhibit large amplitude fluctuation, using
scaled arithmetic (with a suitable scaling schedule to avoid overflow in the
known worst case) is sufficient, and resources may be saved.

The FFT core provides four architecture options to offer a trade-off between
core size and transform time:

Pipelined Streaming I/O – Allows continuous data processing.

Radix-4 Burst I/O – Loads and processes data separately, using an iterative
approach. It is smaller in size than the pipelined solution, but has a
longer transform time.

Radix-2 Burst I/O – Uses the same iterative approach as Radix-4, but the
butterfly is smaller. This means it is smaller in size than the Radix-4
solution, but the transform time is longer.

Radix-2 Lite Burst I/O – Based on the Radix-2 architecture, this variant
uses a time-multiplexed approach to the butterfly, giving an even smaller
core at the cost of a longer transform time.

To achieve the maximum operating speed, we choose the pipelined streaming
I/O architecture.

The data format of the pulse compression FPGA program is fixed-point, while
the data format in the DSP (TS201) is single-precision floating-point. The
last operation of the pulse compression is therefore data format conversion.
We use the Xilinx Floating-Point core and additional logic to implement this
conversion. The Xilinx Floating-Point core provides designers with the means
to perform floating-point arithmetic on an FPGA device [13]. The core can be
customized for operation, word length, latency, and interface.

The block diagram of the generic floating-point binary operator core is
shown in Fig. 8.

Figure 8. Block Diagram of Generic Floating-Point Binary Operator Core

When this core is used to convert fixed-point data to floating-point data,
the two inputs A and B are the fraction input and the exponent input, and
the result is the floating-point data output. A single-precision
floating-point number uses 32 bits, with a 24-bit fraction and an 8-bit
exponent [14]. We add the two block-floating-point exponent outputs of the
FFT and the IFFT and use the sum as the exponent input of the core, and we
use the output of the IFFT core as the fraction input. This method not only
converts the fixed-point format to the single-precision floating-point
format, but also uses the block exponents to weight the IFFT output, which
is the final pulse compression output.

C. The Design of Data Buffer Module

In the pulse compression system, the data buffer module is mainly used to
receive the down-converted acquisition data sent by the ADC board. Once
enough data for one pulse compression has been received, the buffered data
is passed to the pulse compression module for processing. A buffer is also
required for the data after pulse compression: once the size of one Rocket
IO transmission packet has been reached, the processed data is passed to the
Rocket IO module to be sent out [15]. An asynchronous FIFO is used to
implement the data buffer in this design.

The Virtex-5 FPGA provides two memory structures: distributed memory and
block memory. Distributed memory (Distributed SelectRAM) is implemented with
the lookup tables (LUTs) of the CLBs. Block memory (Block RAM) is a
dedicated memory module in the FPGA. Each block is 18 Kb, the number of
blocks varies with the device size, and each block can be configured as
single-port or dual-port RAM. Compared with the distributed structure, block
memory can achieve a higher clock speed, so we use block memory to implement
the asynchronous FIFO in this design. The Block RAM schematic symbol is
shown in Fig. 9.

Figure 9. Block RAM schematic symbol
(Diagram labels, port A: ADDRA[n:0], DINA[m:0], WEA, ENA, SINITA, NDA, CLKA,
DOUTA[n:0], RFDA, RDYA; port B: ADDRB[n:0], DINB[m:0], WEB, ENB, SINITB,
NDB, CLKB, and the corresponding outputs.)

widths, independent of the other port. In addition, the read port width can
be different from the write port width for each port. The memory content can
be initialized or cleared by the configuration bitstream. During a write
operation, the memory can be set to have the data output remain unchanged,
reflect the new data being written, or reflect the previous data now being
overwritten.

In the Virtex-5 architecture, dedicated logic in the Block RAM enables users
to easily implement synchronous or multirate (asynchronous) FIFOs. This
eliminates the need for additional CLB logic for counters, comparators, or
status flag generation; each FIFO uses only one Block RAM. Both standard and
first-word fall-through (FWFT) modes are supported.

The Xilinx LogiCORE™ IP FIFO Generator is a fully verified first-in
first-out (FIFO) memory queue for applications requiring in-order storage
and retrieval. The core provides an optimized solution for all FIFO
configurations and delivers maximum performance (up to 500 MHz) while
utilizing minimum resources. Delivered through the Xilinx CORE Generator™
software, the structure can be customized by the user, including the width,
depth, status flags, memory type, and the write/read port aspect ratios. The
FIFO Generator core supports Native interface FIFOs and AXI4 interface
FIFOs. The Native interface FIFO cores include the original standard FIFO
functions delivered by previous versions of the FIFO Generator (up to v6.2).
Native interface FIFO cores are optimized for buffering, data width
conversion and clock domain decoupling applications, providing in-order
storage and retrieval. The top-level view of the FIFO in Block RAM is shown
in Fig. 10.
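
The buffering rule of the data buffer module (hold incoming samples until enough have arrived for one pulse compression, then release them in order) can be illustrated with a small behavioral model. This is a hedged Python sketch only: it is not the Xilinx FIFO Generator core, it ignores clock domains entirely, and the class name `FrameBuffer`, its methods and the frame and depth values are hypothetical names chosen for illustration.

```python
from collections import deque


class FrameBuffer:
    """Behavioral model of a FIFO that releases data one full frame at a time."""

    def __init__(self, frame_size, depth):
        if depth < frame_size:
            raise ValueError("depth must hold at least one frame")
        self.frame_size = frame_size  # samples needed for one pulse compression
        self.depth = depth            # total FIFO capacity in samples
        self._fifo = deque()

    @property
    def full(self):
        """True when no more samples can be accepted."""
        return len(self._fifo) >= self.depth

    @property
    def frame_ready(self):
        """True when at least one complete frame is buffered."""
        return len(self._fifo) >= self.frame_size

    def write(self, sample):
        """Push one sample; return False (sample not stored) when the FIFO is full."""
        if self.full:
            return False
        self._fifo.append(sample)
        return True

    def read_frame(self):
        """Pop one frame in arrival order, or return None if not enough data yet."""
        if not self.frame_ready:
            return None
        return [self._fifo.popleft() for _ in range(self.frame_size)]


buf = FrameBuffer(frame_size=4, depth=8)
for sample in range(6):
    buf.write(sample)
first = buf.read_frame()   # the four oldest samples
second = buf.read_frame()  # only two samples remain, so no frame yet
print(first, second)       # [0, 1, 2, 3] None
```

The same model applies on the output side if `frame_size` is read as the Rocket IO packet size. In hardware, the write and read sides run in different clock domains, which is what the asynchronous FIFO provides and what this single-threaded sketch leaves out.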