Xilinx Answer 56616 7 Series PCIe Link Training Debug Guide
Debugging Guide for 7-Series Integrated PCI Express Block Link Training Issues
Important Note: This downloadable PDF of an Answer Record is provided to enhance its usability and
readability. It is important to note that Answer Records are Web-based content that are frequently updated as new
information becomes available. You are reminded to visit the Xilinx Technical Support Website and review (Xilinx Answer
56616) for the latest version of this Answer.
This answer record has screen shots of tables and figures from other documents. The guidelines provided may have
changed in the latest release of those documents. The readers are advised to refer to the latest release of the
corresponding documents.
Some GTX/GTP (Gigabit Transceiver) settings can be tuned to correct link training issues, and this document provides
guidance on which parameters to tune. In general, the default settings should work across all boards
and systems. If a non-default value is needed for the link to train, please contact Xilinx Technical Support before permanently
using that parameter value in your design.
Introduction
This document describes techniques to debug link training issues with the 7-Series Integrated PCI Express Block. It provides
a complete list of signals to capture in ChipScope Pro/Vivado ILA when debugging link training issues. Screen
captures of the signal waveforms illustrate how to analyze those signals and form theories about the likely cause
of the problem. One of the main causes of link training issues is poor Signal Integrity (SI)
on the board. General guidelines on what to check when debugging probable SI issues are provided.
Link training issues do not depend entirely on the PCIe Core. They are equally a function of the board and how the
system is connected. Therefore, it is important to thoroughly check all the factors affecting signal integrity on the
board (e.g. reference clock quality, voltage signal level, etc.). There are a few transceiver
parameters that a user can tune to suit their system. These parameters are discussed in this document.
In the DETECT state, each lane performs receiver detect to determine if a link partner is present on that lane. Lanes that
do not detect a link partner are not used and the FPGA drives an electrical idle on these lanes. The second state entered
during link training is the POLLING state. This is the first state where the link partners exchange TS1 and TS2 ordered
sets. During this state, bit lock, symbol lock, and lane polarity are established.
The CONFIGURATION state follows POLLING. During CONFIGURATION, link and lane numbers are exchanged through
TS1 and TS2 ordered sets and the link width is established. Once CONFIGURATION completes, the next state is L0.
The L0 state is the normal working state where data is transferred on the link. The core output signal user_lnk_up is
asserted during this state. Note that user_lnk_up does not assert immediately upon entering L0, but asserts after the
data link layer achieves the DL.ACTIVE state, meaning the initial flow control credits have been exchanged.
During the link training process, the following are discovered and determined:
Ordered Sets
During the link training process, the physical layer communicates by exchanging TS1 and TS2 ordered sets. Ordered sets
are packets that originate and terminate in the physical layer.
There are four different types of ordered sets. Ordered sets are not scrambled, so they are easily viewed using
ChipScope Pro/Vivado ILA or in simulation at the GT TX/RX interface. The four different types of ordered sets are
Training Sequence ordered sets (TS1s and TS2s), Electrical Idle ordered sets (EIOS), Skip ordered sets (SKP), and
Fast Training Sequence ordered sets (FTS).
Table 1 shows the description for each symbol in TS1 ordered set. TS1s and TS2s are mostly the same except for the
following:
• Symbols 6-15, which carry the TS2 identifier in a TS2 ordered set.
• TS2 symbol 4, bit 6, can be used to determine Link Upconfigure Capability/Selectable De-emphasis on top of the
"Autonomous Change" as in TS1.
• TS1 symbol 5, bit 4, must be implemented for Gen2 speed, whereas the same bit is reserved in TS2.
For more details on TS1 and TS2, check section 4.2.4.1 of the PCI Express Base Specification Rev2.1.
• The Electrical Idle ordered set (EIOS) consists of four symbols: COM, IDL, IDL, IDL = BC, 7C, 7C, 7C.
• The transmitter sends out the electrical idle ordered set before driving electrical idle.
• After receiving the electrical idle ordered set, the link partner prepares the link for transition to electrical idle.
• The Skip ordered set (SKP) consists of four symbols: COM, SKP, SKP, SKP = BC, 1C, 1C, 1C.
• The SKP ordered set is transmitted at regular intervals from the transmitter to the receiver.
• It is used for clock tolerance compensation.
• The Fast Training Sequence ordered set (FTS) also consists of four symbols: COM, FTS, FTS, FTS = BC, 3C, 3C, 3C.
• A transmitter sends FTS ordered sets when exiting from L0s to L0 so that the receiver can regain bit and symbol lock.
• The number of required ordered sets is agreed during link training and initialization.
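When reading raw GT data in a capture, it can help to classify the four-symbol ordered sets automatically. The following is a minimal sketch in Python, not part of any Xilinx tool. It assumes the captured bytes and their K-character flags (for example RXDATA and RXCHARISK) have already been exported into lists; the function name classify_ordered_set is hypothetical. The symbol values are the ones listed above.

```python
# Symbol values taken from the ordered set descriptions above.
COM = 0xBC  # first symbol of every ordered set
ORDERED_SETS = {0x7C: "EIOS", 0x1C: "SKP", 0x3C: "FTS"}


def classify_ordered_set(symbols, is_k):
    """Classify one 4-symbol group captured at the GT interface.

    symbols: four byte values.
    is_k:    four booleans, True where the symbol is a K character.
    Returns "EIOS", "SKP", "FTS", or None if the group is not recognized.
    """
    if len(symbols) != 4 or len(is_k) != 4:
        return None
    # All four symbols of these ordered sets are K characters.
    if not all(is_k) or symbols[0] != COM:
        return None
    # The three symbols after COM must be identical.
    if symbols[1] == symbols[2] == symbols[3]:
        return ORDERED_SETS.get(symbols[1])
    return None
```

For example, classify_ordered_set([0xBC, 0x1C, 0x1C, 0x1C], [True] * 4) returns "SKP". Note that a received SKP ordered set may contain more or fewer SKP symbols after clock correction, which this simple sketch reports as unrecognized.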
Figure 2 categorizes link training failures into five types. This covers only the most common issues. There could be other
issues that are seen in a system during link training. Debugging guidelines and approaches described in this document
should still be applicable to debug such issues.
1. Reset failure
2. Receiver detect failure
3. Receive errors
4. Link width train down
5. Link speed train down
The first thing to check is whether user_lnk_up is asserted or not. If user_lnk_up is asserted, but the core is not detected
by the host, go to the “FPGA Configuration Time Debug” section to proceed with further investigation. If user_lnk_up is
not asserted, then proceed investigating by identifying the failure type.
Most link training problems are due to board signal integrity problems or incorrect GT usage. The board must meet
both the electrical requirements set forth by the GT user guide and those of the PCI Express Base Specification. This is
discussed in more detail later in this document.
If “pl_ltssm_state[5:0]” is stuck in 0,1,2 or 3, the link has probably run into a Receiver Detect Failure.
Receiver detect is the first state of the Link Training Status State Machine (LTSSM). The PCI Express specification
includes a feature that allows the transmitter on a given link to detect if a receiver is present. The decision if a receiver is
Figure 4 shows the general debug flow for issues related to Receiver Detect. After capturing “Signal Set-2, Detect State”,
“Signal Set-3, Reset” and “Signal Set-4, Clocking”, make sure the signals are toggling correctly as described in “Signal
Set-1, LTSSM” section.
Receive Errors
When the incoming data is corrupted due to crosstalk or other forms of interference on the link, the RXSTATUS signal
would normally indicate 8B/10B errors or disparity errors. In such a scenario, follow the debug flow as shown in Figure 5.
Reset Failures
During debug, make sure the reset signals are asserted and de-asserted correctly. Capture “Signal Set-3, Reset” signals
for further analysis. Users should make sure a device at either end of the link is not stuck in reset.
Whether the link is training to the correct link width can be checked by probing the ‘pl_initial_link_width’ and
‘pl_sel_lnk_width’ signals in ChipScope/Vivado ILA.
Multi-lane designs can introduce crosstalk and noise on the serial lanes. When having link training issues where the link is
training down to a lower link width, first try isolating the upper lanes and then force the link to attempt to train as an x1. For
add-in cards, this can be done by using any interposer or by placing scotch tape on the upper lane pins on the connector,
as shown in Figure 7 and Figure 8.
Figure 9 shows the debug flow chart for debugging the ‘Link Width Train Down’ issue. TS1 and TS2 analysis for checking
whether the link width train down is initiated by the endpoint or the host is discussed in the “CONFIGURATION State”
section.
Whether the link is training down to a lower speed (e.g. Gen2 to Gen1) can be checked by reading the configuration
registers or by probing pl_sel_lnk_rate. Also probe the pl_link_partner_gen2_supported signal (Figure 12). If this signal indicates
that the link partner does not support Gen2, investigate the link partner configuration and make sure the host is capable of
Gen2.
If pl_link_partner_gen2_supported is asserted but still the link is training down to a lower speed, capture “Signal Set-5,
GT RX” and “Signal Set-6, GT TX” signals and analyze TS1s and TS2s to investigate whether it is the endpoint or the link
partner that is not correctly following the required protocol to link train to Gen2 speed.
The main technique in debugging a PCIe link training issue is to identify the LTSSM state in which the core is stuck. Then
analyze the ordered sets being exchanged in that LTSSM state and compare them with the specification. This
will help narrow down whether it is the host or the endpoint that is not following the correct link training protocol.
This section presents a step-by-step method to narrow down the probable causes of a ‘Link Speed Train Down’ issue.
Each step first states the relevant protocol requirement and then describes which
signals to capture in ChipScope/Vivado ILA and what to check for.
Protocol: Suppose a Link connects the two 5.0GT/s capable components, A and B. The Link comes up to L0 state in 2.5
GT/s speed. Component A decides to change the speed to 5.0 GT/s, sets the directed_speed_change variable to 1b and
enters Recovery.RcvrLock from L0. Component A sends TS1 Ordered Sets with speed_change bit set to 1b and
Debug Action: Trigger on Recovery.RcvrLock, with the trigger positioned at the middle of the buffer. In gt_rx_data, there should be
eight TS1s coming in with the speed_change bit set to '1'. Also check bit 2 to see whether 5.0 GT/s is supported. Look at
gt_tx_data to see whether the core is sending eight TS1s with the speed_change bit set to '1'. This might not be the case immediately
after the trigger point: the transmitted TS1s should have the speed_change bit set to '1' once eight consecutive TS1s with the
speed_change bit set to '1' have been received on gt_rx_data. Also check the corresponding bit 2 to find out the data rate being advertised.
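The check described in this debug action can be scripted once the received ordered sets have been extracted from the capture. The sketch below is illustrative only. It assumes each TS1 is available as a list of 16 byte values, and it assumes the symbol 4 layout from section 4.2.4.1 of the PCI Express Base Specification Rev 2.1 (bit 7 is speed_change, bit 2 advertises 5.0 GT/s) and the TS1 identifier 4Ah in symbols 6-15. The function names are hypothetical.

```python
COM = 0xBC
TS1_ID = 0x4A         # assumed TS1 identifier carried in symbols 6-15
SPEED_CHANGE_BIT = 7  # assumed position of speed_change in symbol 4
GEN2_BIT = 2          # bit 2 of symbol 4: 5.0 GT/s supported


def is_ts1(ts):
    """True if the 16-symbol group looks like a TS1 ordered set."""
    return (len(ts) == 16 and ts[0] == COM
            and all(s == TS1_ID for s in ts[6:16]))


def supports_gen2(ts):
    """True if the data rate identifier advertises 5.0 GT/s."""
    return bool((ts[4] >> GEN2_BIT) & 1)


def speed_change_requested(ordered_sets, needed=8):
    """True once `needed` consecutive TS1s carry speed_change = 1."""
    run = 0
    for ts in ordered_sets:
        if is_ts1(ts) and (ts[4] >> SPEED_CHANGE_BIT) & 1:
            run += 1
            if run >= needed:
                return True
        else:
            run = 0  # any other ordered set breaks the consecutive run
    return False
```

Running speed_change_requested on the gt_rx_data ordered sets and then on the gt_tx_data ordered sets shows which side requested the speed change and whether the other side responded.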
Protocol: Component B will enter Recovery.RcvrCfg from where it will enter Recovery.Speed.
Debug Action: Trigger on Recovery.RcvrCfg to check whether the speed change process is in progress. Also trigger
on Recovery.Speed.
Protocol: Component A will wait for eight consecutive TS1/TS2 with speed_change bit set from component B before
moving to Recovery.RcvrCfg and on to Recovery.Speed. Both component A and component B enter Recovery.Speed
and record 5.0 GT/s as the maximum speed they can operate with. The directed_speed_change variable will be reset to
0b when in Recovery.Speed. When they enter Recovery.RcvrLock from Recovery.Speed, they will operate at 5.0
GT/s speed and send TS1s with speed_change set to 0b.
Protocol: If both sides work well at 5.0 GT/s, they will continue on to Recovery.RcvrCfg and enter L0 through
Recovery.Idle at 5.0 GT/s speed. However, if component B fails to achieve Symbol lock, it will time out in
Recovery.RcvrLock and enter Recovery.Speed.
Debug Action: Trigger in Recovery.Speed. Check if it is going to this state directly from Recovery.RcvrLock state. If this
is the case, then Component B has failed to achieve Symbol lock.
For a multilane core, there is one rxelecidle signal per lane. Trigger on the assertion of
each rxelecidle signal. If the capture triggers on one lane while the rxelecidle signals for the other lanes are still de-asserted, this
indicates a potential issue with the electrical idle detect threshold. Try different values of the RXOOB_CFG attribute as
described in the “GTX/GTP Wrapper Settings” section. If the issue is still not resolved, follow the debug flow chart shown in
Figure 5.
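The per-lane comparison above is easy to automate when several captures need to be reviewed. The following is a small sketch, assuming the rxelecidle value of every lane at the trigger point has been exported as a list of 0/1 values (lane 0 first). The function name lanes_out_of_step is hypothetical.

```python
def lanes_out_of_step(rxelecidle):
    """Compare rxelecidle across lanes at one sample point.

    rxelecidle: list of 0/1 values, one per lane, lane 0 first.
    Returns (asserted_lanes, deasserted_lanes, suspicious), where
    suspicious is True when the lanes disagree with each other.
    """
    asserted = [lane for lane, v in enumerate(rxelecidle) if v]
    deasserted = [lane for lane, v in enumerate(rxelecidle) if not v]
    # A mix of asserted and de-asserted lanes is the case worth investigating.
    suspicious = bool(asserted) and bool(deasserted)
    return asserted, deasserted, suspicious
```

A result with suspicious set to True on an x4 link, for example ([0], [1, 2, 3], True), corresponds to the case described above where only one lane reports electrical idle.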
• A component must enter the LTSSM Detect state within 20 ms of the end of the Fundamental reset.
• A system must guarantee that all components intended to be software visible at boot time are ready to receive
Configuration Requests within 100 ms of the end of Conventional Reset at the Root Complex.
These statements mean the FPGA must be configured within a certain finite time, and not meeting these requirements
could cause problems with link training and device recognition. When using JTAG to configure the device, configuration
typically occurs after the Chipset has enumerated each peripheral. After configuring the FPGA, a soft reset is required to
restart enumeration and configuration of the device. A soft reset on a Windows based PC is performed by going to
Start -> Shut Down and then selecting Restart.
To eliminate FPGA configuration as a root cause, the designer should perform a soft restart of the system. Performing a
soft reset on the system keeps power applied and forces re-enumeration of the device. If the device links up and is
recognized after a soft reset is performed, then FPGA configuration is most likely the issue. Most typical systems use ATX
power supplies which provide some margin on this 100 ms window, as the power supply is normally valid before the 100
ms window starts.
• Provide AC coupling between the oscillator output pins and the dedicated GTX/GTH transceiver Quad clock input
pins.
• Ensure that the differential voltage swing of the reference clock is within the range specified in DS182 (Kintex-7
FPGAs Data Sheet: DC and Switching Characteristics) and DS183 (Virtex-7 FPGAs Data Sheet: DC and
Switching Characteristics). The nominal range is 250 mV – 2000 mV, and the nominal value is 1200 mV.
• Meet or exceed the reference clock characteristics as specified in DS182 (Kintex-7 FPGAs Data Sheet: DC and
Switching Characteristics) and DS183 (Virtex-7 FPGAs Data Sheet: DC and Switching Characteristics).
• Meet or exceed the reference clock characteristics as specified in the standard for which the GTX/GTH
transceiver provides physical layer support.
• Fulfill the oscillator vendor's requirement regarding power supply, board layout, and noise specification.
• Provide a dedicated point-to-point connection between the oscillator and GTX/GTH transceiver Quad clock input
pins.
• Keep impedance discontinuities on the differential transmission lines to a minimum (impedance discontinuities
generate jitter).
The bit rate clock source for transmitter and receiver must be +/- 300 ppm or better. If Spread Spectrum Clocking is used,
both ports must use the same bit rate clock source.
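A frequency measured on a scope or counter can be converted to a ppm offset and compared against the +/- 300 ppm limit. The sketch below is illustrative. The 100 MHz default is an assumption (a common PCIe reference clock frequency), so pass the nominal frequency your design actually uses. The function names are hypothetical.

```python
def ppm_offset(measured_hz, nominal_hz):
    """Frequency error of the measured clock in parts per million."""
    return (measured_hz - nominal_hz) / nominal_hz * 1e6


def within_tolerance(measured_hz, nominal_hz=100e6, limit_ppm=300.0):
    """True if the measured clock is within +/- limit_ppm of nominal.

    nominal_hz defaults to 100 MHz, which is an assumption for this sketch.
    """
    return abs(ppm_offset(measured_hz, nominal_hz)) <= limit_ppm
```

For instance, a clock measured at 100.020 MHz against a 100 MHz nominal is about 200 ppm high and passes, while 100.040 MHz (about 400 ppm) does not.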
There should be AC coupling between the clock source and the dedicated GTX transceiver Quad clock input pins.
Make sure the PCB Design Checklist provided in the 7 Series FPGAs GTX/GTH Transceivers (UG476) has been
followed. Table 2 is from (UG476 v1.9.1, April 23, 2013). Please visit the Xilinx website (7-Series documentation) for the
latest version of the UG476 where there may be more current guidelines on the PCB Design Checklist provided in Table
2.
The pl_ltssm_state[5:0] signal as shown in Figure 14 is one of the main signals when debugging link training issues.
Whenever there is a problem with the link, it is always advised to check this signal in ChipScope tool and see what
LTSSM state the core is in. In the normal working condition, this signal will show the value ‘16’ indicating L0 state.
The core goes into Recovery state to achieve bit lock and symbol lock. If the ChipScope capture shows frequent transition
into the recovery state, it normally indicates a noisy link.
Different states of the link training state machine (indicated by pl_ltssm_state) are shown in Figure 13.
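A captured pl_ltssm_state trace can be summarized with a few lines of script before looking at individual waveforms. This is a rough sketch that uses only the values mentioned in this document: 0 to 3 for the Detect states and '16' for L0. It assumes the '16' is hexadecimal (0x16); confirm this against Figure 13 before relying on it. The function name summarize_ltssm is hypothetical.

```python
DETECT_STATES = {0x00, 0x01, 0x02, 0x03}
L0_STATE = 0x16  # assumption: the value '16' quoted in the text is hexadecimal


def summarize_ltssm(samples):
    """Summarize a list of pl_ltssm_state values in capture order."""
    samples = list(samples)
    if not samples:
        return "no samples"
    last = samples[-1]
    if all(s in DETECT_STATES for s in samples):
        return "stuck in Detect: suspect receiver detect failure"
    # Count how often the link leaves L0, e.g. to enter Recovery.
    l0_exits = sum(1 for prev, cur in zip(samples, samples[1:])
                   if prev == L0_STATE and cur != L0_STATE)
    if last == L0_STATE and l0_exits == 0:
        return "link in L0"
    if l0_exits:
        return f"left L0 {l0_exits} time(s): check for Recovery entries"
    return f"last state 0x{last:02X}: look it up in Figure 13"
```

A trace that repeatedly leaves L0 matches the noisy-link symptom described above and is a reason to trigger on entry into Recovery.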
Signals shown in Figure 15 should be captured to check whether the receiver was successfully detected or not.
PHYSTATUS: In PCI Express mode, this signal is used to communicate completion of several GTX transceiver functions,
including power management state transitions, rate change, and receiver detection. During receiver detection, this signal
is asserted High to indicate receiver detection completion.
RXSTATUS indicates the receiver status and error codes as shown in Figure 16.
GTRXRESET: This port is asserted high and then deasserted to start the full channel RX reset sequence.
RXRESETDONE: This port goes high when GT transceiver RX has finished reset and is ready for use.
TXRESETDONE: This port goes high when the GT transceiver TX has finished reset and is ready for use.
Signal Set-6, GT TX
Figure 22 - Reset and Link up signal from the core to the user application
The Physical Layer (PL) interface enables the user design to inspect the status of the Link and Link Partner and control
the Link State.
This section provides an analysis of signals described in previous section at different LTSSM states for debugging link
training issues. A number of waveform screenshots have been provided for each LTSSM state to illustrate the toggling of
corresponding signals. If you are capturing signals in ChipScope tool, compare your captures with the screenshots
provided below to make sure the signals in your design are toggling as expected.
DETECT State
Figure 25 shows a capture of signals related to Detect state during Detect.Active and Polling.Active states. On successful
receiver detection, the pipe wrapper should present ‘011’ on RXSTATUS when PHYSTATUS is asserted as shown in
Figure 26.
If the receiver detect is failing, then make sure the signals in “Signal Set-3, Reset” and “Signal Set-4, Clocking” as
shown in Figure 27 are toggling correctly.
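The receiver detect check can be applied to every lane of an exported capture. The sketch below is a minimal illustration, assuming the PHYSTATUS and RXSTATUS samples of each lane are available as equal-length lists. The ‘011’ value is the one stated above; the function names and the dictionary layout are hypothetical.

```python
RX_DETECTED = 0b011  # RXSTATUS value on successful receiver detection


def receiver_detected(phystatus, rxstatus):
    """True if any sample shows PHYSTATUS high while RXSTATUS is 011."""
    return any(bool(p) and (r & 0b111) == RX_DETECTED
               for p, r in zip(phystatus, rxstatus))


def lanes_without_receiver(capture):
    """Return the lanes on which no receiver was detected.

    capture: dict mapping lane number to a (phystatus, rxstatus) pair
             of sample lists taken during Detect.Active.
    """
    return sorted(lane for lane, (p, r) in capture.items()
                  if not receiver_detected(p, r))
```

An empty list from lanes_without_receiver means every captured lane reported a link partner; any lane listed is a candidate for the receiver detect debug flow.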
POLLING State
When each link partner enters into POLLING, it begins transmitting TS1 ordered sets. However, each link partner might
not enter polling at the same time, so it is possible that the Xilinx endpoint might be transmitting TS1s on pipe_tx_data
while still receiving 00h on the pipe_rx_data pins. Hence, in ChipScope Pro/Vivado ILA tools, when TS1 appears
at pipe_tx_data, pipe_rx_data might still be 00.
To check whether TS1 transmission has started or not, trigger when ltssm_state enters POLLING. Figure 29 and Figure
30 show GT RX and TX interface signals when the endpoint device enters POLLING. As soon as the device comes out of
the electrical idle, the device starts to send TS1s. Note that the link and lane number are set to PAD value which
CONFIGURATION State
In CONFIGURATION, link numbers and lane numbers are negotiated. A downstream port proposes a link number to the
link partner. The upstream port accepts the link number and returns TS1 ordered sets with the link number value. Next,
the downstream port sends the lane numbers. If the upstream port agrees with the proposed lane numbers, it replies with
In CONFIGURATION, the N_FTS value is also agreed. In the captures shown, the endpoint is sending FF in the N_FTS field in
TS1, indicating that the endpoint requires 255 FTS when exiting from L0s to L0 to achieve bit and symbol lock. Likewise,
the RP sends FF in its N_FTS field in TS1, indicating that it also requires 255 FTS to be transmitted by
the endpoint when exiting from L0s to L0.
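The link number, lane number, and N_FTS fields can be pulled out of a captured training sequence with a short helper. This sketch assumes the usual TS1/TS2 layout from the PCI Express Base Specification (symbol 1 is the link number, symbol 2 the lane number, symbol 3 N_FTS) and uses the PAD value F7 shown in the figure captions. PAD is a K character, so the K flags are needed to tell it apart from a data value of F7. The function name decode_ts_header is hypothetical.

```python
PAD = 0xF7  # PAD symbol as shown in the captures (a K character)


def decode_ts_header(symbols, is_k):
    """Decode symbols 1-3 of a TS1/TS2 ordered set.

    symbols: byte values of the ordered set (at least 4 symbols).
    is_k:    matching booleans, True where the symbol is a K character.
    PAD fields are returned as None.
    """
    def field(i):
        return None if (is_k[i] and symbols[i] == PAD) else symbols[i]

    return {
        "link_number": field(1),
        "lane_number": field(2),
        "n_fts": symbols[3],
    }
```

With both fields still PAD and N_FTS set to FF, the helper returns link_number None, lane_number None, and n_fts 255, which matches the start of CONFIGURATION in the captures.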
Figure 32 – Both RP and EP sending PAD (F7) in link number and lane number fields
Figure 34 – RP sending link number and corresponding lane numbers on all four lanes, EP accepts link number
but still sending PAD (F7) in lane number field
Figure 36 – EP sending ‘00’ in link number field and corresponding lane numbers in lane number field.
After CONFIGURATION state, the next state is the normal working state, which is L0. The initial phase of link training
completes after user_lnk_up is asserted.
The wrappers that come with the generation of the core should be used as is, without any modification. If you have
changed some wrapper parameters during the debug or due to some other reasons, please verify you have the default
value for the parameters listed in Table 4.
Most of the signals listed in Table 5 have been discussed in previous sections. When capturing Idle Indicator and FSM
signals, capture other signals listed in the table as well to make it easier for analysis.
The PIPE Wrapper is in idle state when PIPE_RST_IDLE, PIPE_QRST_IDLE, and PIPE_RATE_IDLE are all HIGH. If
any idle status is LOW, add the following Wrapper FSM ports to ChipScope/Vivado ILA.
Table 6 - Wrapper FSM Ports
To reduce the effect of inter-symbol interference, PCI Express employs de-emphasis. Pre-emphasis and
de-emphasis are two descriptions of the same effect. If five consecutive bits are transmitted with the same polarity, the bits after the first
bit are de-emphasized compared to the first bit. In other words, the first bit is pre-emphasized compared to the
four following bits.
In 7 series FPGA transceivers, the tap weights are all programmable to meet different channel conditions. GTX/GTH/GTP
transceivers have 32 settings for post-tap de-emphasis (TXPOSTCURSOR), up to 12.96 dB, and 21 settings for pre-tap
de-emphasis (TXPRECURSOR), up to 6.02 dB. Both TXPOSTCURSOR and TXPRECURSOR attributes work on the
data transitions. To increase the signal strength (amplitude), change TXDIFFCTRL setting.
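To get a feel for what a given de-emphasis setting does to the signal, the dB value can be converted to an amplitude ratio. The sketch below uses the standard voltage relation dB = 20 log10(ratio); it is a general formula, not a description of how the transceiver implements its taps. The function name deemphasis_ratio is hypothetical.

```python
def deemphasis_ratio(db):
    """Amplitude of a de-emphasized bit relative to a transition bit.

    db: de-emphasis in dB (positive number), using dB = 20 * log10(ratio).
    """
    return 10 ** (-db / 20.0)
```

For example, 6.02 dB corresponds to a ratio of about 0.5, meaning the de-emphasized bits have roughly half the amplitude of the transition bit, and 12.96 dB corresponds to a ratio of about 0.22.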
Figure 37 from WP419[4] shows the data stream without any de-emphasis. The symbols following the transition have a
peak-to-peak amplitude of ~0.28V.
Figure 40 and Figure 41 show the impact of applying 2 dB post-tap De-emphasis in the GTX Transceiver Eye Diagram.
Table 7, Table 8 and Table 9 are from UG476[2] and provide different values for TXDIFFCTRL, TXPOSTCURSOR, and
TXPRECURSOR, respectively.
RXOOB_CFG
During link training, if rxelecidle shows unexpected behavior, tune the RXOOB_CFG parameter in the GT wrapper. In the
generated wrapper, it is commented out. Uncomment this parameter setting and tune the parameter to suit your system.
In some boards, it has been observed that changing RXOOB_CFG from 7'b0000110 to 7'b0000010 fixed an incorrect
TX_RXDETECT_REF
The default value of the TX_RXDETECT_REF parameter is 011. This value should work without any issue. In cases where
link training is running into receiver detect issues, test with different values (e.g. 010, 100) for debug purposes only. It is not recommended to
permanently set this parameter to a value other than the default. If other values work and the default value does not,
please contact Xilinx Technical Support before using the non-default value in your design.
LPM/DFE
The PCIe wrapper uses LPM mode by default. DFE mode is recommended for medium- to long-reach applications, with
channel losses of 8 dB and above at the Nyquist frequency. A DFE has the advantage of equalizing a channel without
amplifying noise and crosstalk. In case of severe link training issues, try with DFE mode instead of LPM.
RXBUFSTATUS
Check RXBUFSTATUS[2:0] port from GTs to see if the buffer underflows (3’b101) or overflows (3’b110). During “normal”
operation, there should not be any underflows/overflows. If this is seen on RX GT’s, check if the link partner device is
sending the clock compensation sequences as it should be and if GTs are actually adjusting the RX Elastic Buffer pointers
to correct for bit rate differences. Check the RXCLKCORCNT[2:0] bus from the GTs to see if the GT has performed clock
correction. Also, check the RXDATA and RXCHARISK signals from the GT to see if there is clock compensation
sequence (SKP ordered set). If RXCLKCORCNT indicates the GT has performed clock correction, it is likely that SKP
ordered set will not be received on the RXDATA interface since the GT will have had to add or remove characters as part
of the correction.
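Counting underflow and overflow events in an exported capture gives a quick indication of whether the elastic buffer is misbehaving. The following sketch uses the two RXBUFSTATUS encodings given above and assumes the samples are available as a list of integers. The function name count_buffer_errors is hypothetical.

```python
UNDERFLOW = 0b101  # RXBUFSTATUS[2:0] value for buffer underflow
OVERFLOW = 0b110   # RXBUFSTATUS[2:0] value for buffer overflow


def count_buffer_errors(rxbufstatus_samples):
    """Count underflow and overflow samples in an RXBUFSTATUS capture."""
    under = sum(1 for s in rxbufstatus_samples if (s & 0b111) == UNDERFLOW)
    over = sum(1 for s in rxbufstatus_samples if (s & 0b111) == OVERFLOW)
    return {"underflow": under, "overflow": over}
```

Both counts should be zero during normal operation. A non-zero count is the cue to inspect RXCLKCORCNT and look for SKP ordered sets as described above.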
Channel bonding is used by protocols that transmit data over multiple lanes, including PCIe.
Because of variation in PCB trace lengths or other factors, the data received on different lanes may no
longer be perfectly aligned. Channel bonding realigns the data by adjusting the RX BUFFER FIFO read pointers. The
left side of Figure 43 shows the aligned data RRRR coming out of the transmitter. Various system level electrical
effects (such as trace length) can introduce skew by the time the data is captured in the receiver, so the original
RRRR data can be received as RSQR, as shown in the middle section of Figure 43. The PCS section of the GTs adjusts
the read pointers in the RX Buffer FIFOs and realigns the data as RRRR.
The CHAN_BOND_MAX_SKEW attribute is used to give the slave enough time to collect the bytes of the bonding sequence.
This attribute controls the number of USRCLK cycles that the master waits before ordering the slaves to execute channel
bonding, and so determines the maximum skew that can be handled by channel bonding. It must always be less
than one-half the minimum distance (in bytes or 10-bit codes) between channel bonding sequences. Valid values range
from 1 to 14. More information is available in UG476. Ideally, this parameter should not be changed from the
default value. However, depending on the board and the interacting system, some tuning might be required.
The maximum skew that channel bonding can tolerate is set by the
CHAN_BOND_x_MAX_SKEW attribute.
One channel bonding character is 10 bits (8B/10B) and one bit equals one UI (unit interval = 1/line rate). If you run at a
PCIe Gen1 line rate of 2.5 Gb/s (UI = 0.4 ns) and set the skew to 7 (default for CHAN_BOND_x_MAX_SKEW attribute),
the calculated skew will be 7 x 10 x 0.4 ns = 28 ns.
So, if you are using PCIe over back plane or extender cards or any other system where there is a possibility of large skew,
you may need to adjust CHAN_BOND_x_MAX_SKEW accordingly.
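The skew arithmetic above generalizes to other line rates and attribute values. The sketch below follows the same reasoning (skew setting x 10 bits per character x one UI per bit) and is only a calculation aid. The function name chan_bond_skew_ns is hypothetical.

```python
BITS_PER_CHARACTER = 10  # one 8B/10B channel bonding character


def chan_bond_skew_ns(max_skew, line_rate_gbps):
    """Skew in nanoseconds covered by a CHAN_BOND_x_MAX_SKEW setting.

    max_skew:       attribute value, in characters.
    line_rate_gbps: line rate in Gb/s (UI in ns = 1 / line rate).
    """
    return max_skew * BITS_PER_CHARACTER / line_rate_gbps
```

With the default setting of 7, the result is 28 ns at the Gen1 line rate of 2.5 Gb/s and 14 ns at the Gen2 line rate of 5.0 Gb/s, so the same attribute value covers half as much skew in absolute time at Gen2.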
CLK_COR_MIN_LAT is another parameter you could tune if you run into channel bonding issue indicated by the de-
assertion of RXCHANISALIGNED signal. When using channel bonding, you add the additional requirement that the buffer
needs to have head room to see the skewed channel bonding sequences. By increasing CLK_COR_MIN_LAT, you buy a
little more room to allow for greater skew between lanes. If the link is heavily skewed, increasing the value of
CLK_COR_MIN_LAT might help.
o Check GT PLL lock signal (“Signal Set-4, Clocking”). It indicates if the GT is locked to the reference
clock.
o Use a scope to measure the reference clock frequency and jitter.
o Make sure the reference clock is within the phase noise limits as discussed in (Xilinx Answer 44549).
PCI Express Card Electromechanical Specification, Rev2.0 Section 4.7, describes requirements
for Eye Diagrams at the add-in Card Interface that must be met for both the add-in card and a
system board interfacing with such an add-in card.
For Eye capture, solder down diff. probes at the receive VIA. A high sampling scope must be
used. Capture the eye and see the quality. Apply PCIe mask to see if it meets PCIe specification
requirements. Users should make sure jitter characteristics are met as provided in Table 10,
taken from DS182.
Table 10 - GTX Transceiver PCI Express Jitter Characteristics (DS182)
• Check RXSTATUS (see Section – “Signal Set-5, GT RX”) to see if it reports any error.
• It is important to check the power supply. Check that the correct voltage has been applied, as shown in
Table 11. Measure the supply voltage to make sure there are no periodic noise spikes that cause intermittent
bit errors.
o GTs need dedicated power supplies and should not be shared with other digital supplies.
Below is a list of scenarios where in-system eye scans provide valuable debug information:
• A link analyzer detects replay packets to the FPGA. This typically means the FPGA NAK’d a packet which can
mean there was an LCRC error due to a bit flip.
• A marginal link going in and out of recovery under different environmental conditions
• A production system where only a few boards exhibit link failures.
• A system down-trains in speed or lane width occasionally
Below is a list where eye scan data will not provide helpful debug information:
Implementing an eye scan on a PCI Express link is straightforward with the example design provided in Xilinx Answer Record
56648. This example uses a MicroBlaze processor to control accesses to the DRP interface of the transceiver.
The MicroBlaze processor also manages the eye scan data by storing it in Block RAM. Once the Block RAM fills up,
XMD reads the data from Block RAM and stores it locally on the PC.
Download the example design from Xilinx Answer Record 56648 for the transceiver family of interest. For
example, the KC705 example is suitable for any GTX transceiver. Likewise, the VC709 example is suitable for the GTH
transceiver. After downloading the example, build a bitstream by sourcing the Tcl script in the
‘pcie_eyescan/proj’ directory.
Sourcing the Tcl script will generate a bit file for the evaluation board and it is ready to be programmed to the board. After
the board is programmed, an XMD connection is required to extract the data from the FPGA. Connecting via XMD will
require an XMD console. To get an XMD window in Linux, source the Vivado or ISE tools and type ‘xmd’ into the console.
In Windows, click on XMD as shown in Figure 44.
• connect mb mdm
• source get_eyescan_data.tcl
• run_test
This will begin the eye scan and you will see the XMD console actively scrolling by as it is extracting the data.
After the scans are completed, the eye scan data will be stored in the ‘tcl’ directory. The data is stored in the CSV files
and they are labeled:
To view the scan data open a new Vivado session. In the Tcl console of Vivado, change directories to the ‘tcl’ directory.
Then source the load_vivado_scans.tcl file. This will show the eye scan data as shown in Figure 45.
This 2D eye scan feature can also be used through the IBERT tool. As discussed in the previous section, Xilinx 7 series
transceivers have built-in hardware for RX margin analysis. This hardware can
be used to see the post-equalization statistical eye (an external oscilloscope shows the eye before equalization, at the
transceiver pins). It can operate with any type of traffic, without a pre-known pattern, because it compares
the offset sample with the center sample and counts each disagreement as an error. More information on the
hardware architecture and the process can be found in the RX margin analysis section of UG476.
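The error-counting principle described above can be illustrated in a few lines. This is a conceptual sketch only: the real hardware uses dedicated sample and error counters accessed through the DRP, not software lists. The function name ber_at_offset is hypothetical.

```python
def ber_at_offset(data_samples, offset_samples):
    """Estimate the bit error ratio at one eye scan offset.

    data_samples:   bits sampled at the center of the eye.
    offset_samples: bits sampled at the offset point, same length.
    Each disagreement between the two is counted as one error.
    """
    if not data_samples or len(data_samples) != len(offset_samples):
        raise ValueError("need two non-empty sample lists of equal length")
    errors = sum(1 for d, o in zip(data_samples, offset_samples) if d != o)
    return errors / len(data_samples)
```

Repeating this measurement over a grid of horizontal and vertical offsets yields the statistical eye. A result of zero only bounds the error ratio by the number of samples taken; it does not prove the offset point is error free.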
Figure 46 - Offset Sample and Data Sample to Calculate BER as a Function of Offset - Statistical Eye
Xilinx transceivers support the Far End PMA loopback mode, which works by accepting the data from the link partner
on the transceiver's RX port and then sending it back out on the TX port of the Xilinx transceiver.
Figure 47 - Far End PMA Loopback of the data from the PCIe Link Partner
This loopback control can be done with the 3 bit loopback control port available on the transceivers. Xilinx IBERT
(Integrated Bit Error Rate Tester) is a standalone design available from Xilinx core generator which can be used to control
all the parameters of the Xilinx transceiver and can do the 2D eye scan for link debugging.
Figure 48 shows IBERT in the IP catalogue under ‘Debug and Verification’ tab.
The first screen of the IBERT IP configurator which allows selection of the naming style and the external clock source is
shown in Figure 49. External clock source is optional and instead you can use the transceiver reference clock as the
system clock used for running the logic in the standalone design.
Vendors of Xilinx Endpoint link partners may build PCIe debug features into their devices, which can be useful in debugging PCIe link
training issues. One example of such a link partner is a PLX PCIe chip.
PLX chips have loopback and PRBS counter features built into their transceivers. They also have PLX visionpak debug
software, similar to IBERT, which is used for transceiver eye capture without use of an external scope.
The loopback feature can be used by enabling the PRBS counters in the Xilinx transceiver and doing an external TX
loopback in the PLX chip. The pattern will be sent back to the Xilinx endpoint transceiver RX pattern checker. This could
be used to tune the link parameters as described in section – “GTX/GTP Wrapper Settings”.
Similarly, the Xilinx transceiver Far End PMA loopback can be used for testing with the PRBS counters and checkers built
into the SerDes of the PLX transceivers, which are accessed through register reads and writes on the PLX chip.
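The idea behind a PRBS loopback test can be sketched in software: the transmitter sends a known pseudo-random
sequence, the far side loops it back, and the checker counts the bits that differ from the expected sequence.
The sketch below assumes PRBS-7 (polynomial x^7 + x^6 + 1) and a received stream that is already aligned to the
generator; the pattern choice, the seed and the function names are assumptions for illustration and say nothing
about how the Xilinx or PLX hardware checkers are implemented.

```python
from itertools import islice


def prbs7(seed=0x7F):
    """Yield PRBS-7 bits from a 7-bit LFSR with feedback taps at bits 7 and 6."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the LFSR locks up")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def count_bit_errors(received, seed=0x7F):
    """Compare received bits with the expected PRBS-7 stream, return the mismatch count."""
    return sum(r != e for r, e in zip(received, prbs7(seed)))


# Two periods of the pattern (127 bits each) as the transmitted data.
tx = list(islice(prbs7(), 254))

# Hypothetical looped-back data with two corrupted bits.
rx = list(tx)
rx[10] ^= 1
rx[200] ^= 1

errors = count_bit_errors(rx)
ber = errors / len(rx)
```

Repeating such a measurement for each candidate transceiver setting and keeping the setting with the lowest error
count is the tuning loop that the loopback feature enables.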
While debugging link training issues, users should explore debug capabilities available in the link partner device and how
they can be used in conjunction with debug capabilities available in Xilinx transceivers for quicker debug of system level
and link training issues.
2. A link that frequently enters the Recovery state is another indication of a poor link. This can be checked by
looking at the LTSSM graph in a link analyzer or by triggering multiple times in the ChipScope tool on entry into
the Recovery state.
3. If the link analyzer shows numerous NAKs on the link, this is also an indication of a bad link and could affect the
bandwidth of the system. NAKs are generated for reasons such as a bad CRC or a bad sequence number.
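For item 2, the number of Recovery entries can be counted from an exported LTSSM trace. The sketch below assumes
the trace has already been decoded into state names, one entry per captured sample; the naming convention (every
Recovery substate name starts with "Recovery") and the example trace are hypothetical.

```python
def count_recovery_entries(trace):
    """Count how many times the link enters Recovery from a non-Recovery state."""
    entries = 0
    was_in_recovery = False
    for state in trace:
        in_recovery = state.startswith("Recovery")
        if in_recovery and not was_in_recovery:
            entries += 1
        was_in_recovery = in_recovery
    return entries


# Hypothetical decoded capture: the link drops into Recovery twice.
trace = [
    "L0", "Recovery.RcvrLock", "Recovery.RcvrCfg", "Recovery.Idle", "L0",
    "L0", "Recovery.RcvrLock", "Recovery.RcvrCfg", "Recovery.Idle", "L0",
]
recovery_entries = count_recovery_entries(trace)
```

Dividing the count by the capture duration gives a rate that can be compared before and after a board or
transceiver change.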
Figure 55 - PCI Express Base Specification, v2.1 - Correctable Error Status Register
Figure 59 - RP in Polling.Active
Figure 61 - EP in Polling.Active
5. EP achieves bit/symbol lock, moves to Polling.Config, and starts transmitting TS2s, as shown in Figure 62.
Figure 62 - EP in Polling.Config
7. EP in Polling.Config does not receive TS2s and times out to Detect after 48 ms, as shown in Figure 64.
The cause of this issue is explained in the Intel Errata shown in Figure 67. The issue seen on the Virtex-6 board occurs
because the EP enters Polling.Compliance, as mentioned in the errata. This misaligns the Polling.Active states of the
EP and RP and causes bit/symbol lock issues that end in timeouts, which result in the ~60 ms link training.
Case Study - 2 - Multiple resets result in link training down to Gen1 from Gen2
This was an issue in a particular system where, after multiple resets, the link trained down from Gen2 to Gen1. This was
seen in a ChipScope capture by triggering on the falling edge of pl_sel_lnk_rate while pl_link_gen2_cap was
asserted, as shown in Figure 68.
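The trigger condition used here can also be applied offline to exported capture data. The sketch below assumes the
two signals have been exported as one (pl_sel_lnk_rate, pl_link_gen2_cap) pair per sample and that
pl_sel_lnk_rate is 1 at 5.0 GT/s and 0 at 2.5 GT/s; the export format and the function name are hypothetical.

```python
def find_rate_drops(samples):
    """Return sample indices where pl_sel_lnk_rate falls while pl_link_gen2_cap is 1.

    samples is a list of (pl_sel_lnk_rate, pl_link_gen2_cap) tuples, one per clock.
    """
    hits = []
    for index in range(1, len(samples)):
        previous_rate, _ = samples[index - 1]
        rate, gen2_cap = samples[index]
        falling_edge = previous_rate == 1 and rate == 0
        if falling_edge and gen2_cap == 1:
            hits.append(index)
    return hits


# Hypothetical capture: the link reaches Gen2 and then drops back to Gen1.
capture = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 1), (0, 1)]
drops = find_rate_drops(capture)
```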
Points to note
Most of the things that need to be taken care of in a design and on the board have been discussed in the previous
sections. Below are a few more points that a designer must check to ensure that the link works properly.
• An AC coupling capacitor given by CTX = 75 nF to 200 nF (per Differential Transmitter (TX) Output
Specifications) must be used on the Transmitter side of each lane of a link.
• REFCLK must meet the electrical specifications listed in REFCLK DC Specifications and AC Timing
Requirements mentioned in the PCI Express Card Electromechanical Specification.
• REFCLK must meet the jitter specifications listed in Maximum Allowed Phase Jitter When Applied to Fixed Filter
Characteristic mentioned in the PCI Express Card Electromechanical Specification.
• A PCI Express add-in card must incorporate AC coupling capacitors on the Transmitter differential pair. The
value must comply with the value in the PCI Express Base Specification.
• Add-in cards must meet the Add-in Card Transmitter Path Compliance Eye Requirements specified in Add-in
Card Transmitter Path Compliance Eye Requirements of the PCI Express Card Electromechanical Specification,
measured when all lanes are active.
• The PCB differential trace impedance for 5.0 GT/s capable add-in cards and motherboards must be between 68
and 105 ohms.
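The two numeric limits in this list (AC coupling capacitance of 75 nF to 200 nF per lane, and differential trace
impedance of 68 to 105 ohms for 5.0 GT/s capable boards) lend themselves to a simple design-review check. The
sketch below is one possible way to do that; the function name and the example board values are hypothetical.

```python
AC_CAP_RANGE_NF = (75.0, 200.0)           # CTX range quoted above
DIFF_IMPEDANCE_RANGE_OHM = (68.0, 105.0)  # 5.0 GT/s range quoted above


def check_board(ac_caps_nf, diff_impedance_ohm):
    """Return a list of human-readable violations; an empty list means no violation."""
    problems = []
    low, high = AC_CAP_RANGE_NF
    for lane, cap in enumerate(ac_caps_nf):
        if not low <= cap <= high:
            problems.append(
                f"lane {lane}: AC coupling capacitor {cap} nF is outside {low}-{high} nF"
            )
    low, high = DIFF_IMPEDANCE_RANGE_OHM
    if not low <= diff_impedance_ohm <= high:
        problems.append(
            f"differential impedance {diff_impedance_ohm} ohms is outside {low}-{high} ohms"
        )
    return problems


# Hypothetical x4 board with one wrong capacitor value on lane 2.
report = check_board(ac_caps_nf=[100.0, 100.0, 220.0, 100.0], diff_impedance_ohm=85.0)
```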
This section does not cover detailed steps for debugging link training issues using a LeCroy Protocol Analyzer. However,
a few major features that help in debugging link training with a LeCroy analyzer are listed below to illustrate the
advantage of having a protocol analyzer for debugging link training issues. Comprehensive details on how to use the link
capture software, how to set up triggers, and so on can be found in the LeCroy documentation.
2. Traffic summary, shown in Figure 72, provides a summary of different packet types (e.g. TLPs and Physical Ordered
Sets such as TS1s/TS2s) on the link. If no TS1/TS2 is reported in the summary, it indicates a major issue with the
link (e.g., a signal integrity issue). If these ordered sets are not properly captured by the analyzer, the core
cannot be expected to recognize them either.
Figure 74 shows the Recovery sub-state diagram. From L0 the LTSSM goes to Rcvry.RcvrLock -> Rcvry.RcvrCfg ->
Rcvry.Speed -> Rcvry.RcvrLock -> Rcvry.RcvrCfg -> Rcvry.Idle -> L0. This is exactly the flow that should be followed, as
defined in the PCI Express Base Specification, for link training to Gen2 speed. This is also illustrated in Figure 11.
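A captured state flow can be compared against this expected sequence automatically. The sketch below assumes the
capture has been decoded into the same substate names used in the paragraph above, possibly with each state
repeated for many samples; the helper name and the example traces are hypothetical.

```python
from itertools import groupby

# Expected flow for a speed change to Gen2, as listed above.
EXPECTED_FLOW = [
    "L0", "Rcvry.RcvrLock", "Rcvry.RcvrCfg", "Rcvry.Speed",
    "Rcvry.RcvrLock", "Rcvry.RcvrCfg", "Rcvry.Idle", "L0",
]


def first_deviation(trace):
    """Return None if the trace follows EXPECTED_FLOW, else (step, expected, actual)."""
    observed = [state for state, _ in groupby(trace)]  # drop consecutive repeats
    for step, expected in enumerate(EXPECTED_FLOW):
        actual = observed[step] if step < len(observed) else None
        if actual != expected:
            return step, expected, actual
    return None


good = ["L0", "L0", "Rcvry.RcvrLock", "Rcvry.RcvrCfg", "Rcvry.Speed",
        "Rcvry.RcvrLock", "Rcvry.RcvrLock", "Rcvry.RcvrCfg", "Rcvry.Idle", "L0"]
bad = ["L0", "Rcvry.RcvrLock", "Rcvry.RcvrCfg", "Rcvry.Idle", "L0"]

good_result = first_deviation(good)
bad_result = first_deviation(bad)
```

In the second trace the link returns to L0 without passing through Rcvry.Speed, so no speed change was
attempted; the reported step shows where the capture departs from the expected flow.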
Figure 75 shows a packet on the link when the LTSSM goes into the Rcvry.RcvrLock substate. The tool jumps to the
corresponding packet on the link when you click a substate in the LTSSM state diagram shown in Figure 75 or click the
rectangular sub-state box in the vertical state flow diagram shown in the same figure.
Figure 77 shows the Root Complex advertising Link Number-0 on all four lanes, whereas the lane number on all lanes is set
to PAD. This is the Cfg.LW.Start configuration sub-state. After some time, the endpoint also goes into the Cfg.LW.Start
configuration sub-state and starts to advertise the same link number on all four of its lanes.
The next states are Cfg.LW.Accept and Cfg.LN.Wait/Accept states. In these states, both Root Complex and the Endpoint
start to send link numbers and lane numbers on respective lanes, as shown in Figure 79. In the case of successful link
negotiation (without down training to lower lane width), both sides should be sending the same numbers on both link
number and lane number fields. When the link down trains to lower lane width, either one or both the partners would be
advertising link and lane number on lane-0 only if it down trains to x1 lane; other lanes would have PAD in the link and
lane number fields in TS1. If the Root Complex is advertising the link and lane numbers on all four lanes but the endpoint
replies with link and lane numbers on only lane-0, this could indicate an issue at the receive side that prevents
the endpoint from understanding the incoming TS1s on the upper lanes. In the reverse case, it could indicate an issue
at the transmit side that garbles the data on the upper lanes of the link, so that the Root Complex cannot
understand the incoming TS1s on the upper lanes.
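The reasoning in this paragraph can be written down as a small decision helper. The sketch below assumes the lane
number field of the TS1s has been read from the analyzer for each physical lane, with PAD represented as None;
the representation, the function names and the returned hint strings are hypothetical.

```python
PAD = None  # stands for the PAD symbol in the TS1 lane number field


def active_lanes(lane_numbers):
    """Physical lanes whose TS1 lane number field is not PAD."""
    return [lane for lane, number in enumerate(lane_numbers) if number is not PAD]


def diagnose(rc_lane_numbers, ep_lane_numbers):
    """Give a first hint on where to look, following the reasoning above."""
    rc = active_lanes(rc_lane_numbers)
    ep = active_lanes(ep_lane_numbers)
    if rc == ep:
        return "both sides advertise the same lanes"
    if len(rc) > len(ep):
        # Root Complex advertises more lanes than the endpoint answers on.
        return "check the endpoint receive side"
    if len(ep) > len(rc):
        # Endpoint advertises more lanes than the Root Complex answers on.
        return "check the endpoint transmit side"
    return "lane sets differ; inspect the TS1s lane by lane"


# Root Complex advertises x4, endpoint answers on lane 0 only.
hint = diagnose([0, 1, 2, 3], [0, PAD, PAD, PAD])
```

The hint is only a starting point; as the text says, it is an indication and not a proof of where the fault lies.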
• http://www.xilinx.com/support/answers/40469.html
o Release notes for versions of the 7-Series Integrated Block for PCI Express which were released in ISE
Design Suite and Vivado Design tool (prior to 2013.1)
• http://www.xilinx.com/support/answers/54643.html
Appendix
Capturing Signals in ChipScope Pro
To capture signals in ChipScope Pro, a user may use either the ChipScope Pro Inserter flow or the ChipScope Pro CORE
Generator flow. In the Inserter flow, the user enters the .ngc file into the tool and the tool then automatically lists
the signals for the user to select and capture in ChipScope Pro. In the CORE Generator flow, the user must generate the
ChipScope Pro cores in CORE Generator and instantiate them manually in the source file. The ChipScope Pro Inserter flow
is easier, but the required signals might not be visible. In the CORE Generator flow, however, a user can select to
capture any signal in the source file. In this section, the ChipScope Pro Inserter flow is discussed.
In some cases, signals are optimized away during synthesis and hence cannot be found in the ChipScope
Pro Inserter. In such cases, use the KEEP attribute to stop XST from optimizing a particular signal.
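In VHDL, declare the KEEP attribute in the file architecture, before the “begin” keyword:
attribute keep : string;
After KEEP and the signal have been declared, specify the VHDL constraint as follows:
attribute keep of signal_name : signal is "true";
In Verilog, place the attribute immediately before the signal declaration:
(* KEEP = "TRUE" *)
wire signal_name;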
Below are the steps to capture signals with ChipScope Pro inserter flow.
1. After generating the core in CORE Generator, modify the xilinx_pcie_2_1_ep_7x.xst script in the ‘implement’
directory to set ‘KEEP_HIERARCHY’ to yes, if this has not already been done.
run
-p xc7k325t-ffg676-2
-ifn xilinx_pcie_2_1_ep_7x.prj
3. Once the synthesis is complete, the .ngc file called ‘xilinx_pcie_2_1_ep_7x.ngc’ is generated in the ‘results’
directory inside the ‘implement’ directory.
Figure 83: Chipscope Pro Inserter - Data Width and Data Depth Selection
• Double click on any of the ports shown in red below:
• Click on the appropriate section of the structure hierarchy to select the signals (Figure 85).
• Re-implement the design by running implement.bat or implement.sh. Make sure the section of the script
with commands to synthesize has been removed. If not, the synthesis will run again and replace the .ngc file that
contains the ChipScope Pro core. The implementation script should only contain the following:
cd results
echo 'Running ngdbuild'
ngdbuild -verbose -uc ../../example_design/xilinx_pcie_2_1_ep_7x_01_lane_gen1_xc7k325t-ffg676-2-PCIE_X0Y0.ucf xilinx_pcie_2_1_ep_7x.ngc -sd .
# Uncomment to enable Bitgen. To generate a bitfile, all I/O must be LOC'd to pin.
# Refer to AR 41615 for more information
#echo 'Running design through bitgen'
#bitgen -w routed.ncd
This section describes how to use the storage qualification and sequencer features of ChipScope for PCIe debug. These
features are useful for capturing only the LTSSM transitions in ChipScope Pro. During link training, a Gen2 x8 link
might sometimes come up at Gen1 speed. When debugging, it helps to find out whether the link initially trained at Gen1,
or trained to Gen2 first and then dropped to Gen1.
1. While generating the chipscope_ila core, on Page 1 - select Enable Storage Qualification under storage settings.
Note: pipe_tx_rate_gt[1:0] is not in user_clk domain but is being used for example purposes.
4. After all match cases are set up, the window looks like the one in the figure below.
6. In the capture section, click on All Data under storage qualification, select AND Equation and enable M0. This
will ensure that data will be stored only when LTSSM changes states.
Note: Select a small depth (e.g., 64 in this case), because the buffer will not fill up with large sizes. Adjust the
position to ensure that the buffer fills up.
8. The captured waveform shows all cfg_ltssm_state transitions before training to Gen3 on power up (using a storage
depth of 64).
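The effect of storage qualification can be modeled in software: of all the samples presented to the ILA, only
those where cfg_ltssm_state differs from the previous sample are written to the buffer. The sketch below is a
behavioral model for illustration only; it is not how ChipScope is implemented, it also stores the very first
sample so that the starting state is known, and the state codes in the example are arbitrary placeholders.

```python
def qualified_capture(samples, depth=64):
    """Store (sample_index, state) only when the state changes, up to depth entries."""
    stored = []
    previous = object()  # sentinel that never equals a real state value
    for index, state in enumerate(samples):
        if state != previous:
            stored.append((index, state))
            if len(stored) == depth:
                break
        previous = state
    return stored


# 6300 clock cycles collapse into three stored entries.
stream = [0] * 1000 + [1] * 5000 + [2] * 300
buffer = qualified_capture(stream)
```

This also shows why a small depth is recommended in the note above: a full link training sequence produces only
a few dozen transitions, so a deep buffer would never fill.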
Figure 88 – PCIe Example Design Vivado Project GUI after opening the Synthesized Design
The details on how to debug a design using ChipScope in Vivado Design Suite are provided in UG936. This section
illustrates how to grab signals for debugging in the PCIe example design. For more information, please refer to UG936.
Vivado tools allow you to select signals for debugging, as in the ChipScope Inserter. An additional feature lets
you search the whole design for specific nets using wildcards. This is shown in Figure 89. To start grabbing
signals for ChipScope, you should first open the synthesized design as shown in Figure 90.
The ILA core(s) that you add to your design appear in the Hardware window under the target device as shown in Figure
102. If you do not see the ILA core(s) appear, right click on the device and select Refresh Hardware. This re-scans the
FPGA device and refreshes the Hardware window.
Use the Trigger Cond control in the Hardware window (or the Trigger Condition property in the ILA Core Properties
window) to select between “AND” and “OR” settings. The “AND” setting causes a trigger event when all of the ILA probe
comparisons are satisfied. The “OR” setting causes a trigger event when any of the ILA probe comparisons are satisfied.
You can also use the set_property Tcl command to change the ILA core trigger condition:
set_property CONTROL.TRIGGER_CONDITION AND [get_hw_ilas hw_ila_1]
The ILA probe file is automatically associated with the FPGA hardware device if the probes file is called
debug_nets.ltx and is found in the same directory as the bitstream programming (.bit) file that is associated with
the device.
You can run or arm the ILA core trigger in two different modes:
• Run Trigger Immediate: Selecting the ILA core to be armed, followed by clicking the Run Trigger Immediate button on
the Hardware window toolbar, arms the ILA core to trigger immediately regardless of the settings of the ILA core trigger
condition and probe compare values. This command is useful for capturing whatever values are present at the probe
inputs of the ILA core.
You can also arm the trigger by selecting and right clicking the ILA core and selecting Run Trigger or Run Trigger
Immediate from the popup menu as shown in Figure 102.
You can stop the ILA core trigger by selecting the appropriate ILA core, followed by clicking the Stop Trigger button on
the Hardware window toolbar. You can also stop the trigger by selecting and right clicking the appropriate ILA core and
selecting Stop Trigger from the popup menu.
Viewing Captured Data from the ILA Core in the Waveform Viewer
Once the ILA core captured data has been uploaded to the Vivado Integrated Design Environment, it is displayed in the
Waveform Viewer. See Viewing ILA Probe Data Using Waveform Viewer (UG908) for details on using the Waveform
Viewer to view captured data from the ILA core.
In addition to displaying the captured data that is directly uploaded from the ILA core, you can also write the captured data
to a file then read the data from a file and display it in the waveform viewer.
Currently, the only way to upload captured data from an ILA core and save it to a file is to
use the following Tcl command:
Currently, the only way to restore captured data from a file and display it in the waveform viewer is to use the following Tcl
command:
Special Symbols are distinct from Data Symbols. They are part of the 8b/10b encoding scheme and are used to represent
control characters. These symbols are not scrambled and are therefore readable on the RX/TX GT interface. Special
Symbols are used for various Link Management purposes.
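Because Special Symbols are not scrambled, they can be decoded directly from captured GT interface data. The
sketch below lists the Special Symbols used at 2.5 GT/s and 5.0 GT/s by their 8b/10b K-code names and derives the
8-bit value of each from the Kx.y notation (value = y * 32 + x). The dictionary and function names are
hypothetical, and the K-character flag is assumed to be captured alongside each data byte.

```python
def k_code(x, y):
    """8-bit value of the 8b/10b control character Kx.y."""
    return (y << 5) | x


# Symbol name -> (x, y) of its K-code.
SPECIAL_SYMBOLS = {
    "COM": (28, 5),  # Comma, start of ordered sets
    "STP": (27, 7),  # Start of TLP
    "SDP": (28, 2),  # Start of DLLP
    "END": (29, 7),  # End of good packet
    "EDB": (30, 7),  # End of nullified (bad) TLP
    "PAD": (23, 7),  # Padding
    "SKP": (28, 0),  # Skip
    "FTS": (28, 1),  # Fast Training Sequence
    "IDL": (28, 3),  # Electrical Idle
    "EIE": (28, 7),  # Electrical Idle Exit
}

BYTE_TO_NAME = {k_code(x, y): name for name, (x, y) in SPECIAL_SYMBOLS.items()}


def decode(byte, is_k):
    """Return the Special Symbol name, or None for data bytes and unknown K-codes."""
    if not is_k:
        return None
    return BYTE_TO_NAME.get(byte)


# A TS1/TS2 ordered set starts with COM (K28.5).
first_symbol = decode(0xBC, is_k=True)
```

The same byte value with the K-character flag deasserted is ordinary data, which is why the flag must be captured
together with the data bus.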
References
1. DS182, Kintex-7 FPGAs Data Sheet: DC and AC Switching Characteristics
2. UG476, 7 Series FPGAs GTX/GTH Transceivers User Guide
3. PG054, 7-Series Integrated PCI Express Block Product Guide
4. WP419, Equalization for High-Speed Serial Interfaces in Xilinx 7-Series FPGA Transceivers
5. UG936, Vivado Design Suite Tutorial, Programming and Debugging
Revision History
07/29/2013 - Initial release