RocketIO Aurora Core for Xilinx Virtex-II Pro Board
The RocketIO transceiver module provides hardware support for serial communication between FPGAs, but a protocol is still needed to support and control the hardware. Xilinx developed an open protocol for this purpose, Aurora, which is typically used in applications requiring simple, low-cost, high-rate data channels. Aurora is a serial interconnect protocol designed to provide a transparent interface for point-to-point serial data transfer over high-speed links. It is a relatively simple protocol that only controls the link layer and the physical layer, which lets upper-level protocols be easily layered on top of it. Aurora can combine one or more high-speed serial lanes into a single higher-speed channel. Connections can be full-duplex or simplex. Aurora cores initialize a channel, and applications can then pass data across the channel as frames or as streams of data. The framing user interface is LocalLink compliant; after initialization, it allows framed data to be sent across the Aurora channel. The streaming user interface allows users to start a single, infinite frame; after initialization, the user writes words to the frame through a simple register-style interface that has a data valid signal. We use the streaming core. The Xilinx CORE Generator provides a tool to customize the parameters of the Aurora core, an IP core that supports the entire Aurora protocol and can be customized to meet a wide variety of requirements. Here, the Aurora core is instantiated with a single MGT lane using the CORE Generator in the Xilinx tool flow. The streaming interface:
[Figure: Aurora streaming interface, including RX_D and RX_SRC_RDY_N]
TX_D and RX_D are the data vectors; the other signals control the operation of the Aurora module. TX_SRC_RDY_N is asserted low when the data on TX_D is valid; in the streaming interface, this signal is also used to start the infinite frame. TX_DST_RDY_N is asserted low when the channel is ready to accept data. Thus, when data is presented on TX_D with TX_SRC_RDY_N low, the data is valid and ready to be written to the channel, and it is transferred once TX_DST_RDY_N is also low. On the receiving side, the core presents incoming data on RX_D and asserts RX_SRC_RDY_N low; the data must then be read from RX_D immediately, or it will be lost.
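To make the handshake concrete, here is a small cycle-level model. It is a sketch under stated assumptions, not behaviour verified against the core: it assumes a TX word moves only on a cycle where TX_SRC_RDY_N and TX_DST_RDY_N are both low, and that an RX word shown with RX_SRC_RDY_N low is lost unless the user logic captures it in that same cycle. The function names are hypothetical; only the signal names come from the core.

```python
"""Cycle-level sketch of the Aurora streaming handshake (active-low signals).

Assumptions: a TX word is transferred only on a cycle where both
TX_SRC_RDY_N and TX_DST_RDY_N are low; an RX word presented with
RX_SRC_RDY_N low must be captured in that same cycle or it is lost.
Function names are hypothetical; only the signal names come from the core.
"""


def tx_transfers(cycles):
    """cycles: list of (tx_d, tx_src_rdy_n, tx_dst_rdy_n), one per clock.

    Returns the words actually written to the channel.
    """
    sent = []
    for tx_d, src_rdy_n, dst_rdy_n in cycles:
        if src_rdy_n == 0 and dst_rdy_n == 0:  # data valid AND channel ready
            sent.append(tx_d)
    return sent


def rx_capture(cycles, user_ready):
    """cycles: list of (rx_d, rx_src_rdy_n); user_ready: one bool per clock.

    Returns (captured, lost): a word is lost if the user logic does not
    take it in the cycle where RX_SRC_RDY_N is low.
    """
    captured, lost = [], []
    for (rx_d, src_rdy_n), ready in zip(cycles, user_ready):
        if src_rdy_n == 0:
            (captured if ready else lost).append(rx_d)
    return captured, lost


if __name__ == "__main__":
    tx = [(0xA, 0, 1), (0xB, 0, 0), (0xC, 1, 0), (0xD, 0, 0)]
    print([hex(w) for w in tx_transfers(tx)])  # ['0xb', '0xd']
    rx = [(1, 0), (2, 1), (3, 0)]
    print(rx_capture(rx, [True, True, False]))  # ([1], [3])
```

In the example, 0xA is held back because the channel is not ready and 0xC is skipped because it is not marked valid; on the receive side, word 3 is lost because the user logic was not ready in its cycle.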
How to generate the Aurora core:
1- Open the ISE CORE Generator (Xilinx -> ISE -> Accessories -> Core Generator).
2- Create a new project with the specific device.
3- Go to Communications & Networking -> Serial Interfaces -> Aurora and click on Aurora to generate the Aurora core. If you do not have a license for the Aurora core, you can obtain one free of charge from the Xilinx website after a free registration.
4- Configure the Aurora core that you want to generate.
[Figure: Aurora core configuration, Features]
5- Click Finish to generate the Aurora core, which is then ready to use. Before using it, you can look inside the folder where you generated the core to see its structure and read the information provided about the Aurora core.
[Figure: Aurora core ports, including RXP, RXN, RX_D[0:31], RX_SRC_RDY_N and LOOPBACK[0:1]]
RXP: Positive differential serial data input pin.
RXN: Negative differential serial data input pin.
TXP: Positive differential serial data output pin.
TXN: Negative differential serial data output pin.
CHANNEL_UP: Asserted when Aurora channel initialization is complete and the channel is ready to send data. The Aurora core can receive data before CHANNEL_UP is asserted.
LANE_UP: The Aurora core can only receive data after all LANE_UP signals are High.
HARD_ERROR: Hard error detected (asserted until the Aurora core resets).
LOOPBACK[0:1]: Refer to the RocketIO Transceiver User Guide for details.
POWER_DOWN: Drives the power-down input of the MGT.
RESET: Resets the Aurora core.
SOFT_ERROR: Soft error detected in the incoming serial stream.
DCM_NOT_LOCKED: If a DCM is used to generate clocks for the Aurora core, DCM_NOT_LOCKED should be connected to the inverse of the DCM's LOCKED signal. The clock modules provided with the Aurora core use the DCM for clock division; the DCM_NOT_LOCKED signal from the clock module should be connected to the DCM_NOT_LOCKED input of the Aurora core.
USER_CLK_N_2X: Parallel clock required for Virtex-II Pro FPGA cores with 4-byte lanes. It drives the internal synchronization logic of the RocketIO transceiver, and it must be aligned to the negative edge of USER_CLK and run at twice its frequency.
USER_CLK: Parallel clock shared by the Aurora core and the user application. On Virtex-II Pro FPGA cores with 2-byte lanes, its rate is the same as the reference clock; on cores with 4-byte lanes, it is half the reference clock rate.
TOP_<ref_clk>: Used when the RocketIO transceivers of the Aurora core are placed on the top edge of a Virtex-II Pro FPGA.
DO_CC: While this signal is asserted, the Aurora core sends clock compensation (CC) sequences on all lanes on every clock cycle. It connects to the DO_CC output of the CC module.
On the ML310, we configure the core with one Aurora lane, so no channel bonding is needed. We use 4 bytes per lane, which means the transmit and receive data paths are 32 bits wide. We use the streaming interface described above and specify a line rate of 1.5 Gbps. Transceiver placement is illustrated by the following figure:
[Figure: Transceiver placement]
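The clock rates implied by this configuration can be checked with a few lines of arithmetic. This sketch assumes three relations: the serial line rate is the reference clock times 20 (as in the UCF comment "75MHZ*20=1.5Gbps"), USER_CLK is half the reference clock for 4-byte lanes (per the USER_CLK port description), and the link uses 8B/10B encoding, so each data byte costs 10 line bits. At 1.5 Gbps this gives a 75 MHz reference clock, a 37.5 MHz USER_CLK and 1.2 Gbps of user payload on the 32-bit interface.

```python
"""Clock and throughput arithmetic for one 4-byte Aurora lane (sketch).

Assumptions: line rate = reference clock x 20, USER_CLK = reference clock / 2
for 4-byte lanes, and 8B/10B encoding (10 line bits per data byte).
"""

LANE_BYTES = 4           # 4 bytes per lane -> 32-bit TX_D / RX_D
LINE_BITS_PER_BYTE = 10  # 8B/10B: each data byte becomes 10 line bits
REFCLK_MULTIPLIER = 20   # serial line rate = reference clock x 20


def lane_clocks(line_rate_gbps):
    """Derive the clocks and payload rate for one lane at a given line rate."""
    ref_clk_mhz = line_rate_gbps * 1000 / REFCLK_MULTIPLIER
    user_clk_mhz = ref_clk_mhz / 2            # 4-byte lanes: half the ref clock
    user_clk_n_2x_mhz = 2 * user_clk_mhz      # USER_CLK_N_2X runs at twice USER_CLK
    payload_gbps = LANE_BYTES * 8 * user_clk_mhz / 1000
    # Consistency check: 4 bytes x 10 line bits per USER_CLK cycle = line rate
    line_check_gbps = LANE_BYTES * LINE_BITS_PER_BYTE * user_clk_mhz / 1000
    return {
        "ref_clk_mhz": ref_clk_mhz,
        "user_clk_mhz": user_clk_mhz,
        "user_clk_n_2x_mhz": user_clk_n_2x_mhz,
        "payload_gbps": payload_gbps,
        "line_rate_check_gbps": line_check_gbps,
    }


if __name__ == "__main__":
    for name, value in lane_clocks(1.5).items():
        print(f"{name}: {value:g}")
```

The last entry simply reproduces the requested line rate, confirming that the three assumed relations are mutually consistent.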
On the designed board, we use two MGTs, one on the top edge and one on the bottom edge, and the same locations must be used when instantiating the Aurora core. We chose BREFCLK as the reference clock because the BREFCLK configuration uses dedicated routing resources that reduce jitter. To use it, jumpers on the ML310 board connect pins 1-2 on J20 and pins 2-3 on J21 to enable the 156.25/125 MHz clock signals, and another jumper connects pins 1-2 on J10 for LVDS 2.5 V functionality. Note that the line rate and the transceiver placement can be changed in the UCF file, so the Aurora core does not need to be regenerated each time. The following examples show how the line rate and the transceiver placement are defined in the UCF file:
# Place lane_0_mgt_i at location X0Y1
INST aurora_0/aurora_0/USER_LOGIC_I/aurora_201_aurora_module_i0/lane_0_mgt_i LOC=GT_X0Y1;
# Set a line rate of 75 MHz * 20 = 1.5 Gbps
NET aurora_0/aurora_0/USER_LOGIC_I/user_clk_i0 PERIOD = 13 ns;
Also note that when BREFCLK is used, an MGT on the top edge and an MGT on the bottom edge cannot use the same clock location. The clock locations we use are:
NET TOP_BREF_CLK_P_pin LOC = F16;
NET TOP_BREF_CLK_N_pin LOC = G16;
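Because placement and line rate live in the UCF file, the constraint lines above can be produced from a few parameters. The following is a hypothetical helper, not part of the Xilinx tools: the instance path, net name and TOP_BREF_CLK pins are copied from the examples above, the x20 multiplier comes from the UCF comment, and the `<EDGE>_BREF_CLK_P_pin` naming for any other edge is an assumption. It also refuses to reuse a clock pin location across edges, following the BREFCLK rule just stated. It prints the period with two decimals (13.33 ns at 1.5 Gbps); the 13 ns used above is that value rounded down, which is a slightly tighter constraint.

```python
"""Hypothetical UCF helper: lane placement, line-rate PERIOD and BREFCLK pins.

Assumptions: instance path, net name and TOP_BREF_CLK pins are taken from the
document's examples; line rate = reference clock x 20; pin names for other
edges follow the same <EDGE>_BREF_CLK_<P|N>_pin pattern (not confirmed).
"""

MGT_INST = "aurora_0/aurora_0/USER_LOGIC_I/aurora_201_aurora_module_i0/lane_0_mgt_i"
CLK_NET = "aurora_0/aurora_0/USER_LOGIC_I/user_clk_i0"


def ucf_lines(gt_site, line_rate_gbps, clock_pins):
    """Return UCF constraint lines.

    gt_site        -- MGT site such as "GT_X0Y1"
    line_rate_gbps -- desired serial line rate
    clock_pins     -- {"TOP": (p_loc, n_loc), ...}, one pin pair per edge
    """
    # Top and bottom MGTs may not share a BREFCLK location.
    used = {}
    for edge, pair in clock_pins.items():
        for loc in pair:
            if loc in used:
                raise ValueError(f"{loc} used by both {used[loc]} and {edge}")
            used[loc] = edge

    ref_clk_mhz = line_rate_gbps * 1000.0 / 20  # line rate = ref clock x 20
    period_ns = 1000.0 / ref_clk_mhz
    lines = [
        f"INST {MGT_INST} LOC={gt_site};",
        f"NET {CLK_NET} PERIOD = {period_ns:.2f} ns;",
    ]
    for edge, (p_loc, n_loc) in clock_pins.items():
        lines.append(f"NET {edge}_BREF_CLK_P_pin LOC = {p_loc};")
        lines.append(f"NET {edge}_BREF_CLK_N_pin LOC = {n_loc};")
    return lines


if __name__ == "__main__":
    for line in ucf_lines("GT_X0Y1", 1.5, {"TOP": ("F16", "G16")}):
        print(line)
```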
We also need to modify the source file aurora_201.v[hd] before using the Aurora core with the designed board: the generic attribute TX_PREEMPHASIS must be changed from 1 to 3. The RocketIO Transceiver User Guide defines this attribute as "an integer value (0-3) that sets the output driver pre-emphasis to improve output waveform shaping for various load conditions. Larger value denotes stronger pre-emphasis." This value cannot be adjusted during IP core generation. As explained in the section "High-Speed Serial Trace Design" of the RocketIO user guide, the characteristic impedance of a pair of differential traces depends not only on the individual trace dimensions but also on the spacing between them.
II- FIFO core
The FIFOs are generated with the ISE CORE Generator using the following specifications: target device Virtex-II Pro FF896, speed grade -6, independent clocks using Block RAM, First-Word Fall-Through. Independent-clock FIFOs let the user implement separate clock domains on the write and read ports: the FIFO handles the synchronization between the clock domains, placing no requirements on the phase and frequency relationship between the clocks. We chose independent-clock FIFOs for this reason, to synchronize the different clocks in the design.
A FIFO of this type has many interface signals, some of which are not used in the design. RST resets the entire core logic (both the write and read clock domains). It is an asynchronous reset that initializes all internal pointers and output registers; when asserted, it must be held High for at least three read-clock and write-clock cycles to ensure that all internal states are reset to the correct values. DIN[N-1:0] and DOUT[N-1:0] are the input and output data buses used when writing to and reading from the FIFO; N is the data width, which is 32 in our design because we work with 32-bit data. WR_CLK and RD_CLK are the write and read clocks, which can be connected to the same clock or to different clocks. All signals in the write domain are synchronous to WR_CLK, and all signals in the read domain are synchronous to RD_CLK. WR_EN and RD_EN are driven by external logic: asserting WR_EN writes the data on DIN into the FIFO, and asserting RD_EN reads data from the FIFO on DOUT. FULL, ALMOST_FULL, EMPTY and ALMOST_EMPTY are the FIFO status flags. When ALMOST_FULL is asserted, only one more write can be performed before the FIFO is full. When ALMOST_EMPTY is asserted, the FIFO is almost empty and only one word remains in it. Read requests are ignored when the FIFO is empty; initiating a read while empty is non-destructive to the FIFO. The following waveforms show how this FIFO operates:
[Figure: FIFO waveforms]
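The flag rules above can be captured in a small behavioural model. This is a single-clock sketch, not a model of the generated core: the real FIFO has independent WR_CLK and RD_CLK domains, so its flags update with synchronization latency that is not represented here. It also assumes that ALMOST_FULL and ALMOST_EMPTY stay asserted while FULL or EMPTY, and that writes while FULL are ignored, mirroring the documented non-destructive read while EMPTY. The class name is hypothetical.

```python
from collections import deque


class FwftFifoModel:
    """Single-clock behavioural sketch of the FIFO flags described above.

    Assumptions: one clock domain (the real core has independent WR_CLK and
    RD_CLK, so its flags update with synchronization latency); ALMOST_FULL
    and ALMOST_EMPTY stay asserted while FULL/EMPTY; writes while FULL are
    ignored, mirroring the documented non-destructive read while EMPTY.
    """

    def __init__(self, depth):
        self.depth = depth
        self.words = deque()

    @property
    def full(self):
        return len(self.words) == self.depth

    @property
    def almost_full(self):  # at most one more write fits
        return len(self.words) >= self.depth - 1

    @property
    def empty(self):
        return not self.words

    @property
    def almost_empty(self):  # at most one word remains
        return len(self.words) <= 1

    @property
    def dout(self):
        """First-Word Fall-Through: the oldest word is on DOUT before RD_EN."""
        return self.words[0] if self.words else None

    def write(self, word):
        """One cycle with WR_EN high; returns False if ignored because FULL."""
        if self.full:
            return False
        self.words.append(word & 0xFFFFFFFF)  # 32-bit DIN
        return True

    def read(self):
        """One cycle with RD_EN high; returns None (no change) when EMPTY."""
        if self.empty:
            return None
        return self.words.popleft()
```

With a depth of 4, ALMOST_FULL rises after the third write, the fifth write is ignored, and because of First-Word Fall-Through the first word is already visible on `dout` before any read.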
More information and waveforms are provided in the datasheet located in the folder where you generated the FIFO core.
How to generate the FIFO core:
1- Open the ISE CORE Generator (Xilinx -> ISE -> Accessories -> Core Generator).
2- Create a new project with the specific device.
3- Go to Memories & Storage Elements -> FIFOs -> FIFO Generator and click on FIFO Generator to generate the FIFO core.
4- Configure the FIFO core that you want to generate and click Finish to generate it. Before using it, you can look inside the folder where you generated the core to see its structure and read the information provided about the FIFO core.
Our FIFO interface:
[Figure: FIFO interface with n-bit DATAIN and DATAOUT and the signals WR_CLK, WR_EN, FULL, ALMOST_FULL, RD_CLK, RD_EN, EMPTY, VALID, RST and wr_data_count]
III- Designed module
The designed module includes two FIFOs (the TX FIFO and the RX FIFO) and the Aurora core.
[Figure: Designed module: Input -> FRAME ENCODER -> AURORA -> FRAME DECODER -> Output]
IV- EDK
The Aurora peripheral is created in EDK with the peripheral wizard, and it also needs some registers. We use read and write FIFOs to connect the Aurora peripheral to the Processor Local Bus (PLB). The Aurora peripheral template is created at the end of the peripheral configuration in EDK; we then modify the template to include the VHDL files of the Aurora core and the TX/RX FIFOs. The CORE Generator produces an .ngc file when the FIFO is generated, and we use that file to implement the TX and RX FIFOs. The CORE Generator also creates the Aurora core source files when the core is generated, and we copy these files into the Aurora peripheral source folder within the XPS project. After copying those files into the peripheral, the .pao file of that peripheral must be modified: the source files included in the .pao file must be listed in strict hierarchical order, with the components at the top of the hierarchy listed at the bottom of the file. The PowerPC sends the data stream to the TX FIFO of the Aurora peripheral through the WRITE FIFO, and the TX FIFO passes the data to the Aurora core. The Aurora core serializes the data stream and transmits it through the SATA ports. On the other board, the second Aurora core receives and deserializes the data, then passes the deserialized stream through the RX FIFO and the READ FIFO to the UART. The following figure shows the design of the Aurora communication system:
[Figure: Aurora communication system on each board: WRITE FIFO (HW/SW IF) and TX FIFO, READ FIFO (HW/SW IF) and RX FIFO (TX and RX FIFOs from the FIFO core), Bus2IP clock, UART and frame detector]
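The .pao ordering rule described above (sub-components first, top level last) is a topological sort of the instantiation hierarchy. The sketch below uses Python's `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical hierarchy: aurora_201.vhd echoes the aurora_201.v[hd] file named in this document and user_logic.vhd the USER_LOGIC_I instance in the UCF paths, while every other file name is a placeholder. It prints only the order; writing each name into the actual .pao line format is left to the tool's own conventions.

```python
from graphlib import TopologicalSorter

# Hypothetical hierarchy: each source file maps to the files it instantiates.
# Only aurora_201.vhd and user_logic.vhd echo names from this document; the
# other file names are placeholders for illustration.
HIERARCHY = {
    "aurora_peripheral.vhd": {"user_logic.vhd"},
    "user_logic.vhd": {"aurora_201.vhd", "frame_encoder.vhd", "frame_decoder.vhd"},
    "aurora_201.vhd": {"lane_logic.vhd", "channel_init.vhd"},
}


def pao_order(hierarchy):
    """Return source files with sub-components first and the top level last.

    TopologicalSorter treats the mapped sets as predecessors, so every file
    is emitted after all of the files it instantiates.
    """
    return list(TopologicalSorter(hierarchy).static_order())


if __name__ == "__main__":
    for name in pao_order(HIERARCHY):
        print(name)  # one .pao entry per file, in this order
```

Files at the same depth may come out in any relative order; only the parent-after-children constraint is guaranteed, which is exactly what the .pao rule requires.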
Besides changing the generic attribute TX_PREEMPHASIS in aurora_201.v[hd] from 1 to 3, we must also modify the UCF file to set the locations of the MGTs we use, the clock locations and the clock values. Each FIFO in this architecture needs a finite state machine (FSM); the following figures show these FSMs:
[Figure: FSM of TXFIFO and FSM of RXFIFO. States: IDLE, READ, WRITE. Transition conditions legible in the figures: WFIFO2IP_RdAck = '1'; RFIFO2IP_WrAck = '1'; not rxfifo_empty0 and not RFIFO2IP_full and not valid; (not RX_SRC_RDY_N_i0) and (not rxfifo_almostfull0) and valid]
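Two of the RXFIFO conditions in the figure can be read as enable equations. The sketch below is our interpretation of those labels, not a transcription of the authors' VHDL: it assumes the first condition gates writes from the Aurora core into the RX FIFO, and the second gates moving words from the RX FIFO toward the READ FIFO, which would mean the RX FIFO is drained only between frames (valid low). Function names are hypothetical; the signal names come from the figure.

```python
def rx_fifo_wr_en(rx_src_rdy_n, rxfifo_almostfull, valid):
    """Write enable of the RX FIFO (our reading of the RXFIFO figure).

    A word from the Aurora core is stored only when the core presents it
    (RX_SRC_RDY_N low), the RX FIFO is not almost full, and the frame
    decoder marks it as valid frame data.
    """
    return rx_src_rdy_n == 0 and not rxfifo_almostfull and bool(valid)


def rx_fifo_drain_en(rxfifo_empty, rfifo2ip_full, valid):
    """Read enable moving words from the RX FIFO toward the READ FIFO.

    Our reading: drain only when the RX FIFO has data, the READ FIFO
    (RFIFO2IP_full) has room, and no frame is currently being received.
    """
    return not rxfifo_empty and not rfifo2ip_full and not valid
```

Using ALMOST_FULL rather than FULL in the write condition leaves one word of margin, which matters because the Aurora receive side cannot be paused.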
For the TXFIFO and RXFIFO, we implemented these FSMs ourselves; they show how we manage the WR_EN and RD_EN signals that respectively write to and read from the generated FIFO. Because we use the streaming core, there are two possibilities: the first is to send valid data continuously, as soon as the Aurora core has started; the second is to send valid data in frames, with all other data considered invalid and rejected. We used the second method. The structure of a frame is:
SOF | SIZE | VALID DATA
where SOF = 0xFFFFFFFF and SIZE is the size of the VALID DATA. When the SOF is received, the signal valid goes to 1, and when the amount of data received reaches SIZE, it goes back to 0. To implement this data transfer protocol, we synthesized the FSMs of the FRAME ENCODER and FRAME DECODER units, shown below:
[Figure: FSMs of the FRAME ENCODER and FRAME DECODER. States visible in the figures include IDLE, SEND_SOF, SEND_SIZE, SEND_DATA, SEND_ESCSOF, SEND_ESCESC, SEND_ESCEOF, GET_SIZE, TRANSFER and ESC. Transition conditions include tx_fifo2frame_enc_en = '1'; tx_fifo2frame_enc_data = SOF, ESC or EOF with sizereg /= 0; sizereg /= 0; and sizereg = 0]
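The framing rule described above (SOF, then SIZE, then the valid data, with everything outside a frame rejected) can be sketched in software. This is an illustration under assumptions, not the synthesized FSMs: SIZE is assumed to count 32-bit words, the decoder state names are borrowed from the figure but their behaviour here is our own, and the ESC/EOF escaping shown in the encoder FSM is not modeled because the ESC and EOF code values are not given.

```python
SOF = 0xFFFFFFFF  # start-of-frame word given in the text


def encode_frame(payload):
    """Frame encoder sketch: SOF, then SIZE, then the valid data words.

    Assumption: SIZE counts 32-bit words. The ESC/EOF escaping visible in
    the encoder FSM is not modeled, because its code values are not given.
    """
    return [SOF, len(payload), *payload]


class FrameDecoder:
    """Frame decoder sketch; state names borrowed from the decoder figure."""

    def __init__(self):
        self.state = "IDLE"
        self.remaining = 0

    def push(self, word):
        """Consume one received word and return True if it is valid data."""
        if self.state == "IDLE":          # outside a frame: reject, wait for SOF
            if word == SOF:
                self.state = "GET_SIZE"
            return False
        if self.state == "GET_SIZE":      # word after SOF is the frame size
            self.remaining = word
            self.state = "TRANSFER" if word > 0 else "IDLE"
            return False
        self.remaining -= 1               # TRANSFER: valid is high
        if self.remaining == 0:
            self.state = "IDLE"           # size reached: valid goes low again
        return True


def decode_stream(words):
    """Return only the words that fall inside SOF/SIZE frames."""
    decoder = FrameDecoder()
    valid_words = []
    for word in words:
        if decoder.push(word):
            valid_words.append(word)
    return valid_words
```

Because the decoder counts SIZE words once inside a frame, a payload word that happens to equal 0xFFFFFFFF is still treated as data; only words seen in IDLE are compared against SOF.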
We use three software registers: two control the read and write state of the FIFOs, and the third holds the number of errors found when the hardware compares the received data with the sent data.