Xilinx Answer 42368 Debugging Guide
Important Note: This downloadable PDF of an Answer Record is provided to enhance its usability and
readability. It is important to note that Answer Records are Web-based content that are frequently updated as new
information becomes available. You are reminded to visit the Xilinx Technical Support Website and review (Xilinx Answer
42368) for the latest version of this Answer.
Introduction
This document describes techniques to debug link training issues related to designs using the Virtex-5 FPGA Endpoint
Block Plus core for PCI Express. A complete list of signals to capture in ChipScope Pro when debugging link training
issues has been provided. ChipScope Pro screen captures illustrate how to analyze those signals and establish theories
on potential reasons causing the problem.
There are two major sections in this guide. The first section provides an overview of link training including the Link
Training and Status State Machine (LTSSM) states and TS1 and TS2 ordered sets. The second section focuses on using
ChipScope Pro to capture the relevant signals on the GTP/GTX interface to identify potential problems during link training.
This guide helps the user understand how the LTSSM progresses and what states the signals should be in during this
progression. This debugging guide concludes with a checklist of common problems to address when having link training
issues.
Link training failures usually fall into three categories. The first is a complete failure to establish a link of any width,
indicated by the core output trn_lnk_up_n not asserting. The second is a link that trains to a lower width than intended,
such as an x8 link training as x4. The third is a link that constantly enters the RECOVERY state. Link training problems are
normally due to board signal integrity problems or improper GTP/GTX usage. The board must meet both the electrical
requirements set forth by the GTP/GTX user guides and also the PCI Express Base Specification.
After FPGA configuration, the two connected devices go through the link training process. The Link Training and Status
State Machine (LTSSM) defines this process. Figure 1 shows the different states of the LTSSM. The main states to
consider while debugging link training issues are DETECT, POLLING, CONFIGURATION, and L0. Detailed descriptions
of the LTSSM states are found in Chapter 4 of the PCI Express Base Specification.
In the DETECT state, each lane performs receiver detect to determine if a link partner is present on that lane. Lanes that
do not detect a link partner are not used, and the FPGA drives electrical idle on these lanes.
The second state entered during link training is the POLLING state. This is the first state in which the link partners
exchange TS1 and TS2 ordered sets. During this state, bit lock, symbol lock, and lane polarity are established.
The CONFIGURATION state follows POLLING. During CONFIGURATION, link and lane numbers are exchanged through
the training ordered sets and the link width is established.
Once CONFIGURATION completes, the next state is L0. The L0 state is the normal working state where data is
transferred on the link. The core output signal trn_lnk_up_n is asserted during this state. Note that trn_lnk_up_n does not
assert immediately upon entering L0; it asserts after the data link layer reaches the DL.ACTIVE state, meaning that the
initial flow control credits have been exchanged.
Xilinx Answer 42368 - Debugging Link Training Issues in PCI Express Block Plus Core 1
Figure 1: Link Training and Status State Machine (LTSSM)
During the link training process, the following are discovered and determined:
Lane Polarity
Link Data Rate
Link and Lane Numbers
Link Width
Lane Reversal
Ordered Sets
During the link training process the physical layer communicates by exchanging TS1 and TS2 ordered sets. Ordered sets
are packets that originate and terminate in the physical layer.
There are four types of ordered sets: training sequence ordered sets (TS1s and TS2s), Electrical Idle ordered sets,
SKP ordered sets, and FTS ordered sets. Ordered sets are not scrambled, so they are easily viewed using ChipScope
Pro or in simulation. Link training uses TS1 and TS2 ordered sets to exchange the information needed to establish the
link. A SKP ordered set is occasionally transmitted during link training, so it is necessary to be able to tell the two
apart.
Table 1 describes each symbol in the TS1 ordered set. TS1s and TS2s are identical except for symbols 6-15, which carry
the TS1 identifier in a TS1 ordered set and the TS2 identifier in a TS2 ordered set.
Table 1: TS1 ordered set
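As an illustration of how the symbol positions can be used when reviewing captured pipe_tx_data or pipe_rx_data bytes
offline, the following Python sketch classifies a captured sequence as a TS1, TS2, or SKP ordered set. It is not part of
the core or of the ChipScope Pro tool. The function names are hypothetical. The symbol values (BC for COM, 1C for SKP,
F7 for PAD, 4A and 45 for the TS1 and TS2 identifiers) are the ones seen in the captures in this guide, and the field
positions follow the TS1 layout in the PCI Express Base Specification. The sketch assumes the capture has been exported
as a list of byte values that starts at a COM symbol, and it ignores the K-character flags (pipe_tx_data_k).

```python
COM = 0xBC     # first symbol of the ordered sets shown in this guide
SKP = 0x1C     # repeated after COM in a SKP ordered set (BC-1C-1C-1C)
PAD = 0xF7     # link/lane number value before negotiation
TS1_ID = 0x4A  # symbols 6-15 of a TS1
TS2_ID = 0x45  # symbols 6-15 of a TS2


def classify_ordered_set(symbols):
    """Return 'TS1', 'TS2', 'SKP', or 'UNKNOWN' for byte values starting at COM."""
    if not symbols or symbols[0] != COM:
        return "UNKNOWN"
    # Check the 16-symbol training sets first, using the identifier symbols 6-15.
    if len(symbols) >= 16:
        identifier = symbols[6:16]
        if all(s == TS1_ID for s in identifier):
            return "TS1"
        if all(s == TS2_ID for s in identifier):
            return "TS2"
    # A SKP ordered set is COM followed by three SKP symbols.
    if len(symbols) >= 4 and all(s == SKP for s in symbols[1:4]):
        return "SKP"
    return "UNKNOWN"


def ts_fields(symbols):
    """Name symbols 1-5 of a 16-symbol training set."""
    return {
        "link_number": symbols[1],
        "lane_number": symbols[2],
        "n_fts": symbols[3],
        "data_rate": symbols[4],
        "training_control": symbols[5],
    }


# Example: a TS1 as sent on entry to POLLING, with link and lane number set to PAD.
example_ts1 = [COM, PAD, PAD, 0xFF, 0x02, 0x00] + [TS1_ID] * 10
print(classify_ordered_set(example_ts1), ts_fields(example_ts1))
```

Classifying each captured set this way makes it easier to skip over SKP ordered sets and to follow only the link and
lane number fields of the training sets.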
Capturing Signals in the ChipScope Pro tool
To capture signals in the ChipScope tool, a user can use either the ChipScope Pro Inserter flow or the ChipScope Pro
CORE Generator flow. In the Inserter flow, the user provides the NGC file to the tool, and the tool automatically lists the
signals available to select and capture in the ChipScope Pro debugging tool. In the CORE Generator tool flow, the user
must generate the ChipScope Pro cores in the CORE Generator software and instantiate them manually in the source file.
The ChipScope Pro Inserter flow is easier, but the required signals might not be visible. In the CORE Generator
tool flow, by contrast, a user can capture any signal in the source file. This section discusses the ChipScope Pro
Inserter flow.
Because the source files of the Block Plus wrapper for PCI Express are provided after v1.12 of the core, users might find it more
flexible to capture the signals with the ChipScope Pro CORE Generator flow. For more details on the ChipScope Pro
Inserter flow and the ChipScope Pro CORE Generator flow, see the ChipScope Pro User Guide (UG029).
In some cases, signals are optimized away during synthesis and therefore cannot be found in the ChipScope Pro
Inserter. In this case, use the KEEP attribute to stop XST from optimizing away a particular signal.
In VHDL, declare the KEEP attribute in the architecture, before the begin keyword:
attribute keep : string;
After KEEP and the signal have been declared, specify the VHDL constraint as follows:
attribute keep of signal_name : signal is "TRUE";
In Verilog, place the attribute immediately before the signal declaration:
(* KEEP = "TRUE" *)
wire signal_name;
Below are the steps to capture signals with the ChipScope Pro Inserter flow.
1. After generating the core in the CORE Generator tool, modify the xst.scr script in the /implement directory to set
KEEP_HIERARCHY to true.
run
-ifn xilinx_pci_exp_1_lane_ep_inc.xst
-ifmt Mixed
-p xc5vlx50t-ff1136-2
-bufg 0
-top xilinx_pci_exp_ep
-ofn xilinx_pci_exp_ep.ngc
-opt_mode SPEED
-opt_level 2
-ofmt NGC
-uc endpoint_blk_plus_v1_13.xcf
-keep_hierarchy YES
2. Run implement.bat [implement.sh] depending on the operating system you are using.
3. Once the synthesis is complete, the NGC file called xilinx_pci_exp_ep.ngc is generated in the /results directory inside
of the /implement directory.
4. Open the ChipScope Pro Core Inserter tool. Specify the location of the input design netlist. If you are using the same
name for the output design netlist and the output directory you specify is where the original input design netlist is located,
the ChipScope Pro Core Inserter will replace the input design netlist with the output design netlist. If you either rename the
output design netlist or specify a different output directory, make sure you replace the input design netlist with the
generated output design netlist.
Figure 2: ChipScope Pro Core Inserter - Device and Design Netlist Entry
5. Select USER1 in the Boundary Scan Chain.
7. Select the Data Width and the Data Depth as required:
Figure 5: ChipScope Pro Core Inserter - Data Width and Data Depth Selection
9. Click on the appropriate section of the structure hierarchy to select the signals.
11. After the trigger, data, and the clock signals have been selected click OK and then click Insert.
13. Re-implement the design by running implement.bat or implement.sh. Make sure the section of the script that runs
synthesis has been removed. Otherwise, synthesis runs again and replaces the NGC file that
contains the ChipScope Pro core. The implementation script should only contain the following:
cd results
echo 'Running ngdbuild'
ngdbuild -verbose -uc ..\..\example_design\xilinx_pci_exp_blk_plus_1_lane_ep_xc5vlx50t-ff1136-1.ucf xilinx_pci_exp_ep.ngc -sd ..\..\..\
echo 'Running map'
map -timing -ol high -xe c -pr b -o mapped.ncd xilinx_pci_exp_ep.ngd mapped.pcf
echo 'Running par'
par -ol high -xe c -w mapped.ncd routed.ncd mapped.pcf
echo 'Running trce'
trce -u -v 100 routed.ncd mapped.pcf
echo 'Running design through netgen'
netgen -sim -ofmt vhdl -w -tm xilinx_pci_exp_ep routed.ncd
echo 'Running design through bitgen'
bitgen -w routed.ncd
Debug Signals for Link Training Issues
This section provides a list of debug signals that can be captured in the ChipScope Pro tool to diagnose where the
problem is coming from. All the signals are related to the interface between the integrated block for PCI Express and the
GTP/GTX interface.
Table 2 shows the transceiver interface transmit side debug signals. All these signals exist for each lane. pipe_tx_data
and pipe_tx_data_k are required to analyze the TS1 and TS2 ordered sets transmitted by the core during POLLING and
CONFIGURATION.
Transceiver Interface - Receive Side
Table 3 shows the transceiver interface receive side debug signals. These signals are required to analyze the TS1 and
TS2 ordered sets coming downstream from the link partner to the endpoint during POLLING and CONFIGURATION. For
further details on signals listed in Table 3, refer to the Virtex-5 FPGA Integrated Endpoint Block for PCI Express Designs
User Guide (UG197).
LTSSM States
The signal below indicates which LTSSM state the core is currently in. The core goes to the RECOVERY state to achieve
bit lock and symbol lock. If the ChipScope Pro tool capture shows frequent transitions to the RECOVERY state, it normally
indicates a noisy link.
l0_ltssm_state<3:0>
The states of the link training state machine (l0_ltssm_state) are encoded as shown in Table 4:
The signals below verify whether the different parts of the physical layer of the core are functioning correctly. For
example, sync_done is an output of the synchronization module in the GTP wrapper. In a few cases, the core has been
observed not to link up because this signal was not asserted. Investigation later found that the design had the
fast_train_simulation_only signal set to 1. The fast_train_simulation_only signal should be
set only in simulation. Make sure the states of the signals below agree with Figure 10. If you see any discrepancies, create
a WebCase with Xilinx Technical Support and attach the VCD waveform from ChipScope Pro.
LTSSM State Analysis
The first step in debugging a link training problem is to determine in which of the LTSSM states the problem is occurring.
Remember, the LTSSM states to look at during link training are DETECT, POLLING, and CONFIGURATION. Once
CONFIGURATION completes the LTSSM moves into the normal operation state of L0 and trn_lnk_up_n is asserted
once the data link layer reaches DL.ACTIVE. Use the signal pcie_blk l0_ltssm_state<3:0> to determine which
state the LTSSM is in and possibly as a trigger to pinpoint potential problems.
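When a long capture of l0_ltssm_state has been exported, counting how often the state machine enters a given state can
help separate a link that never leaves DETECT from one that keeps falling back to RECOVERY. The Python sketch below is
only an illustration of that idea. The function name is hypothetical, and the numeric state codes used in the example are
placeholders, not the real encoding: substitute the values from Table 4 before using it on real data.

```python
def count_state_entries(samples, state_code):
    """Count observed transitions into state_code from any other state."""
    entries = 0
    for previous, current in zip(samples, samples[1:]):
        if current == state_code and previous != state_code:
            entries += 1
    return entries


# Placeholder codes for illustration only; take the real values from Table 4.
PLACEHOLDER_L0 = 0xA
PLACEHOLDER_RECOVERY = 0xB

trace = [PLACEHOLDER_L0, PLACEHOLDER_L0, PLACEHOLDER_RECOVERY, PLACEHOLDER_RECOVERY,
         PLACEHOLDER_L0, PLACEHOLDER_RECOVERY, PLACEHOLDER_L0]
print(count_state_entries(trace, PLACEHOLDER_RECOVERY))
```

A sample that is already in the state at the start of the trace is not counted, because no transition into it was
observed.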
Detect State Signal Analysis
Trigger the ChipScope Pro tool when l0_ltssm_state goes to DETECT. The relevant signals and how they
toggle while entering the DETECT state are shown in Figure 10.
During the DETECT state, receiver detection takes place on each lane. If the detection process completes correctly, the
following sequence should be observed in the ChipScope Pro tool. Trigger on pipe_tx_detect_rx_loopback_l0.
After the receiver is detected, the GTP asserts pipe_rx_phy_status and drives 011 on pipe_rx_status to
indicate that the receiver is present.
The ChipScope Pro tool capture of the above sequence is shown in Figure 11. If you see any discrepancies, create a
WebCase with Xilinx Technical Support and attach the VCD waveform from the ChipScope Pro tool.
Figure 12 shows a zoomed in view of Figure 11.
Polling State Signal Analysis
When each link partner enters into POLLING, it begins transmitting TS1 ordered sets. However, each link partner might
not enter polling at the same time, so it is possible that the Xilinx endpoint might be transmitting TS1s on pipe_tx_data
while still receiving 00h on the pipe_rx_data pins. Hence, in the ChipScope Pro tool, when TS1 appears at
pipe_tx_data, pipe_rx_data might still be 00.
To check whether TS1 transmission has started or not, trigger when ltssm_state enters POLLING. The screen shots
below show the ChipScope Pro tool capture of the signals when the endpoint device enters POLLING. As seen in the
image, as soon as the device comes out of the electrical idle (indicated by de-assertion of pipe_tx_elec_idle), the
device starts to send TS1s. Note that the link and lane number are set to PAD value which is F7. TS1 ends with 4A
whereas TS2 ends with 45. According to the PCI Express Base Specification v1.1, both devices should send a minimum
of 1024 TS1, which amounts to 64 µs, to achieve bit and symbol lock. If there is sufficient buffer space, it would be
possible to capture and verify whether 1024 TS1s are transmitted and received or not.
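The time and buffer depth needed to observe 1024 TS1s can be estimated with simple arithmetic. The Python sketch below
is an illustration under stated assumptions: a TS1 is 16 symbols, each symbol is 10 bits on the wire (8B/10B), the line
rate is 2.5 Gb/s, and the capture stores one byte per lane per sample. The last assumption depends on how the ILA core
is clocked and wired, so adjust bytes_per_sample to match the actual setup. The function names are hypothetical.

```python
SYMBOLS_PER_TS = 16      # symbols in one TS1 or TS2 ordered set
BITS_PER_SYMBOL = 10     # 8B/10B encoded symbol on the wire
LINE_RATE_BPS = 2.5e9    # assumed 2.5 Gb/s line rate


def ts_burst_duration_us(count):
    """Time on the wire, in microseconds, for `count` back-to-back training sets."""
    bits = count * SYMBOLS_PER_TS * BITS_PER_SYMBOL
    return bits / LINE_RATE_BPS * 1e6


def samples_needed(count, bytes_per_sample=1):
    """Capture samples per lane needed to hold `count` training sets."""
    return count * SYMBOLS_PER_TS // bytes_per_sample


print(ts_burst_duration_us(1024))  # roughly 65.5, in line with the 64 us figure
print(samples_needed(1024))        # 16384 samples at one byte per sample
```

Under these assumptions, a data depth of at least 16384 samples is needed to hold all 1024 TS1s of one lane, before
allowing for any SKP ordered sets interleaved in the stream.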
Figure 13 shows the zoomed out view when ltssm_state enters POLLING.
Figure 13: Debug Signals state during entry to POLLING
Figure 14 shows the zoomed in view when the l0_ltssm_state transitions from DETECT to POLLING.
Figure 14: Debug Signals state during transition from DETECT to POLLING
After receiving eight consecutive TS2 ordered sets and transmitting 16 TS2 ordered sets (after receiving one TS2 ordered
set), the device exits to the CONFIGURATION state. The devices at the two ends of the link do not exit to CONFIGURATION
at the same time. The ChipScope Pro tool capture below shows the endpoint entering the CONFIGURATION state after
exchanging the required number of TS2s.
During POLLING, a device exits to DETECT if it receives TS1/TS2s with the link and lane number fields set to a
value other than PAD. This indicates a bad link, and signal integrity issues should be investigated.
Polarity inversion occurs in the POLLING state. If the core sees the complement of the TS1/TS2 identifiers (B5/BA), it
has to invert the polarity of its differential input pair terminals. The PCIe® Integrated Block Core asserts the
pipe_rx_polarity signal for the lane where the polarity is reversed to tell the GTP/GTX to invert the
receive polarity for that lane. If a link training issue, such as an x8 core training down to x4, is seen on a board with
polarity inverted on some of the lanes, trigger on the pipe_rx_polarity signal for that lane and check whether it asserts.
If it does not, create a WebCase with Xilinx Technical Support.
Polarity inversion is a mandatory feature described by the PCI Express Base Specification v1.1. All PCI Express compliant
cores must support polarity inversion on all lanes independently.
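The relationship between the normal and inverted identifiers can be checked offline on exported pipe_rx_data bytes. The
Python sketch below is an illustration only, with hypothetical names. It relies on the values given in this guide: the
TS1 and TS2 identifiers are 4A and 45, and their bitwise complements are B5 and BA. It assumes that the ten identifier
symbols (symbols 6-15) of one received ordered set have already been extracted for the lane in question.

```python
TS1_ID, TS2_ID = 0x4A, 0x45
INVERTED_TS1_ID = TS1_ID ^ 0xFF  # 0xB5
INVERTED_TS2_ID = TS2_ID ^ 0xFF  # 0xBA


def lane_polarity(identifier_symbols):
    """Classify a lane as 'normal', 'inverted', or 'unknown' from symbols 6-15."""
    seen = set(identifier_symbols)
    if seen == {TS1_ID} or seen == {TS2_ID}:
        return "normal"
    if seen == {INVERTED_TS1_ID} or seen == {INVERTED_TS2_ID}:
        return "inverted"
    return "unknown"


print(lane_polarity([0xB5] * 10))  # inverted TS1 identifier on this lane
```

A lane reported as inverted here is one where pipe_rx_polarity would be expected to assert.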
Figure 15: Entry to CONFIGURATION after exchanging required number of TS2s
Figure 16 is a zoomed in view of Figure 15. In the capture, BC-1C-1C-1C is also seen at pipe_tx_data. This is the SKP
ordered set transmitted by the GTP as part of the clock correction sequence. SKP ordered sets compensate
for differences between the bit rates at the two ends of a link and are transmitted periodically. For more
information on SKP ordered sets, see section 4.2.7 of the PCI Express Base Specification (v1.1).
If the link trains down (for example, from x8 to x4), capture rx_data and tx_data at the GTP interface in
ChipScope Pro to see how the link and lane number fields in the ordered sets change. If the
endpoint device is sending lane numbers on all 8 lanes but the link partner replies with lane numbers on only the first
four lanes and leaves the rest at the PAD value, this potentially indicates a signal integrity issue on the link. The
values that the endpoint sent on those lanes were probably not understood by the link partner because of the signal
integrity issue.
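A quick way to spot this pattern in exported data is to list the lanes whose received lane number field is still PAD.
The Python sketch below illustrates the check. The function name is hypothetical, and it assumes that the lane number
field of the most recent TS1 received on each physical lane has already been extracted into a list indexed by lane.

```python
PAD = 0xF7  # value of the link/lane number fields before they are assigned


def lanes_still_pad(rx_lane_numbers):
    """Return the physical lanes whose received lane number field is still PAD."""
    return [lane for lane, value in enumerate(rx_lane_numbers) if value == PAD]


# Example: an x8 endpoint where the link partner only numbers the first four lanes.
received = [0x00, 0x01, 0x02, 0x03, PAD, PAD, PAD, PAD]
print(lanes_still_pad(received))  # [4, 5, 6, 7]
```

The lanes returned by the check are the ones whose board routing, AC coupling, and transmitter settings deserve the
first look.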
Figure 17 shows the root complex sending TS1s with link number assigned to 00. The endpoint agrees with this and
starts transmitting TS1s on tx_data with link number 00 in the link number field.
Figure 17: Root Complex sending TS1s with Link Number assigned to ‘00’
After the link number has been negotiated, the root complex starts to send TS1s with lane numbers in the lane number
field. This is shown in Figure 18.
In response to transmission of TS1 with lane numbers from the link partner, the endpoint starts sending the same
corresponding lane numbers on each lane, thus agreeing with the lane numbers to communicate with. This is shown in
Figure 19.
Figure 19: Lane Number negotiation between the endpoint and the root complex
In CONFIGURATION, the N_FTS values are exchanged. In the ChipScope Pro capture above, the endpoint is sending FF in the
N_FTS field of its TS1s, indicating that the endpoint requires 255 FTS to achieve bit and symbol
lock when exiting from L0s to L0. The root complex, on the other hand, sends 32 in its N_FTS field, indicating that it
requires only 32 FTS to be transmitted by the endpoint when exiting from L0s to L0.
In the CONFIGURATION state, lane-to-lane deskew must occur. This is indicated by the assertion of gt_deskew_lanes, as
shown in Figure 20. A user can trigger on this signal to check whether lane-to-lane deskew has been accomplished.
Figure 20: Lane-to-Lane Deskew Completion
PCIe Clocking
According to the PCIe Base Specification v1.1, the reference clock must stay within a tolerance of +/- 300 ppm. In an
asynchronous clocking system, the following must be observed:
The ports at the two ends of the link transmit data at a rate that is within 600 ppm of each other at all times.
Spread Spectrum Clocking (SSC) is turned off.
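The tolerance limits above translate into a simple calculation on measured clock frequencies. The Python sketch below is
an illustration only. The function names are hypothetical, and the 100 MHz nominal reference frequency is an assumption
made for the example; use the nominal frequency of the actual reference clock on the board.

```python
NOMINAL_HZ = 100e6  # assumed nominal reference clock frequency


def ppm_offset(measured_hz, nominal_hz=NOMINAL_HZ):
    """Offset of a measured frequency from nominal, in parts per million."""
    return (measured_hz - nominal_hz) / nominal_hz * 1e6


def within_refclk_tolerance(measured_hz, limit_ppm=300.0):
    """True if one reference clock is within +/- limit_ppm of nominal."""
    return abs(ppm_offset(measured_hz)) <= limit_ppm


def link_rate_difference_ok(clock_a_hz, clock_b_hz, limit_ppm=600.0):
    """True if the two ends of the link are within limit_ppm of each other."""
    return abs(ppm_offset(clock_a_hz) - ppm_offset(clock_b_hz)) <= limit_ppm


print(ppm_offset(100_020_000))               # about +200 ppm
print(within_refclk_tolerance(100_040_000))  # False, about +400 ppm
```

Because the data rate at each end is derived from its own reference clock, two clocks that are each within 300 ppm of
nominal are also within 600 ppm of each other.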
If the link is not training (i.e., trn_lnk_up_n is never asserted), verify that the clock source meets the PCIe Base
Specification requirement. If the clock meets the specification requirement, probe the PLLLKDET signal. This port
indicates that the VCO rate is within acceptable tolerances of the desired rate. In other words, the assertion of this signal
indicates that the internal PLL is successfully locking on to the incoming reference clock. If PLLLKDET is not asserted,
neither of the two GTP transceivers in the GTP_DUAL tile operates reliably. If this signal is de-asserted, check (Xilinx
Answer 18329) to make sure the correct clocking infrastructure has been adopted.
If the clocking infrastructure is correct and PLLLKDET remains continuously de-asserted, see the GTP-to-Board Interface
chapter of the GTP Transceiver User Guide (UG196). The following points are excerpts from
the detailed description provided in the user guide:
1. Verify that the power supply is noise free. The GTP has the following power supply pins: MGTAVCC, MGTAVCCPLL,
MGTAVTTRX, MGTAVTTRXC and MGTAVTTTX. Of these, MGTAVCCPLL, MGTAVTTTX, MGTAVTTRX
and MGTAVCC require a filter circuit to suppress high frequency noise. The Virtex-5 FPGA Data Sheet
provides the exact required voltage levels and tolerance ranges of these analog supplies.
2. Each Virtex-5 LXT or SXT device requires one 50 ohm external precision (1%) resistor on the PCB (connected
directly to the MGTRREF pin and to the closest MGTAVTTTX pin). Make sure that this requirement is correctly
met.
3. The GTP Transceivers in the Virtex-5 LXT and SXT FPGAs use a calibration circuit to accurately determine the
termination resistance for all transceivers in a column. This circuit is located in bank 112 for each device and
utilizes a single reference resistor connected to MGTRREF_112. To correctly power this circuit and allow
propagation of the calibration information to instantiated GTP_DUAL tiles, certain power guidelines must be
followed. Check (Xilinx Answer 30915) for more details.
4. The reference clock to the GTP must meet certain requirements. Some of them are listed
below. For more detailed information, refer to the GTP Transceiver User Guide (UG196):
a. There should be AC coupling between the clock source and the dedicated GTP_DUAL clock input pins.
b. A dedicated point-to-point connection is required between the oscillator and the GTP_DUAL clock
input pins.
c. A GTP_DUAL tile that sources a reference clock must be instantiated and REFCLKPWRDNB must be
asserted high.
d. If a GTP_DUAL tile is used only for forwarding a reference clock, the user should meet the requirements in
Table 5.
Table 5: GTP_DUAL_TILE Requirements when Forwarding only Reference Clock
The link between the transmitter and the receiver is subject to various disruptive effects, such as jitter induced by link
transmission, jitter due to dynamic data patterns on the link, noise induced into the signal pair, and signal attenuation
due to the impedance of the transmission line. Users should take all of these possibilities into account and make
sure that the input jitter requirement in section 4.3 of the PCI Express Base Specification v1.1 is
met. Using a high speed scope, capture the link eye diagram and verify that it meets the requirements
defined in section 4.3.3.1 (Transmitter Compliance Eye Diagrams) and in section 4.3.4, which covers the minimum receiver
eye timing and voltage compliance specification.
The PCI Express Base Specification v1.1 provides two tables in sections 4.3.3 and 4.3.4:
Users should verify that their design complies with the parameter values provided in these tables.
For proper high-speed operation, the GTP transceiver requires a high-quality, low-jitter reference clock. Using a high
speed scope, measure the input jitter on the provided reference clock. Verify that the measured jitter is within the jitter
margins provided in the Virtex-5 FPGA Data Sheet.
AC Coupling
As defined in the specification, AC coupling capacitors are required on the differential signal pair of each transmitter
lane. The value of each AC coupling capacitor must be between 75 nF and 200 nF.
The user should make sure that the PCI Express card has the AC coupling capacitors placed close to the
transmitter. Check that the correct capacitor value has been used, and rule out the possibility of a
cracked capacitor.
To reduce the effect of inter-symbol interference, PCI Express employs de-emphasis. Pre-emphasis and
de-emphasis are basically the same. If five consecutive bits are transmitted with the same polarity, the bits after the
first bit are de-emphasized compared to the first bit. In other words, the first bit is pre-emphasized compared to the
four following bits.
Each GTP transceiver has a TXPREEMPHASIS port for controlling pre-emphasis. Table 7, from the GTP Transceiver User
Guide (UG196), shows the percentage decrease of signal amplitude for de-emphasized bits at each TXPREEMPHASIS
level. The higher the percentage, the more de-emphasis is applied. The user should be careful when using the pre-emphasis
feature, because too much pre-emphasis can result in signal distortion. When the link is training down, it is suggested
to try different pre-emphasis values. The value can be configured in the CORE Generator interface during core
generation, or can be changed directly in the source for the GTP/GTX wrapper.
The amplitude of the TX driver’s differential swing can be controlled using the TXDIFFCTRL ports. TXDIFFCTRL controls
the drive strength of the main pad driver and the pre-emphasis pad driver.
Table 6 shows the differential output voltage for different settings at the port. Along with the pre-emphasis (Table 7), it is
suggested to try different values for this port when having link training problems.
Ports: TXDIFFCTRL0[2:0] = TXBUFDIFFCTRL0[2:0], TXDIFFCTRL1[2:0] = TXBUFDIFFCTRL1[2:0]
Value   Transmitter Differential Swing (mV)
000     1100
001     1050
010     1000
011     900
100     800
101     600
110     400
111     0
Ports: TXPREEMPHASIS0[2:0], TXPREEMPHASIS1[2:0]
Value   Pre-emphasis Boost Off                     Pre-emphasis Boost On
        TX_DIFF_BOOST = FALSE (Default Setting)    TX_DIFF_BOOST = TRUE
000     2                                          3
001     2                                          3
010     2.5                                        4
011     4.5                                        10.5
100     9.5                                        18.5
101     16                                         28
110     23                                         39
111     31                                         52
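When sweeping these settings during debug, it can help to keep the two tables in a small lookup so that each trial can
be logged with its expected swing and de-emphasis percentage. The Python sketch below simply transcribes the values from
the tables above; the dictionary and function names are hypothetical.

```python
# TXDIFFCTRL[2:0] value -> transmitter differential swing in mV
TXDIFFCTRL_SWING_MV = {
    0b000: 1100, 0b001: 1050, 0b010: 1000, 0b011: 900,
    0b100: 800, 0b101: 600, 0b110: 400, 0b111: 0,
}

# TXPREEMPHASIS[2:0] value -> (percent with boost off, percent with boost on)
TXPREEMPHASIS_PERCENT = {
    0b000: (2.0, 3.0), 0b001: (2.0, 3.0), 0b010: (2.5, 4.0), 0b011: (4.5, 10.5),
    0b100: (9.5, 18.5), 0b101: (16.0, 28.0), 0b110: (23.0, 39.0), 0b111: (31.0, 52.0),
}


def describe_tx_settings(txdiffctrl, txpreemphasis, boost=False):
    """Return (swing in mV, pre-emphasis in percent) for one pair of port settings."""
    swing_mv = TXDIFFCTRL_SWING_MV[txdiffctrl]
    percent = TXPREEMPHASIS_PERCENT[txpreemphasis][1 if boost else 0]
    return swing_mv, percent


print(describe_tx_settings(0b100, 0b111))  # (800, 31.0)
```

The boost argument corresponds to the TX_DIFF_BOOST attribute, which is FALSE by default.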
The Virtex-5 PCI Express Protocol Standard Characterization Test Report provides an example of an eye diagram that
illustrates the effect of applying pre-emphasis.
Figure 21: Before applying pre-emphasis Figure 22: After pre-emphasis is applied
Signal Integrity
Multi-lane designs can introduce crosstalk and noise onto the serial lanes. When having link training issues with multi-
lane links, first try isolating the upper lanes and force the link to attempt to train as an x1. For add-in cards, this can be
done by using any interposer or by placing tape on the upper lane pins on the connector. Use a tape similar to Scotch
tape.
The signal to monitor to detect probable link issues is pipe_rx_status. Table 8 shows the possible values for
pipe_rx_status and the interpretation for each value.
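For captures that are exported and reviewed offline, the status codes can be tallied to see how often errors occur. The
Python sketch below is an illustration only, with hypothetical names. Only the 011 (receiver detected) code is stated
elsewhere in this guide; the other encodings are an assumption based on the usual PIPE RxStatus definition and must be
checked against Table 8 before relying on them.

```python
from collections import Counter

# Assumed encodings (verify against Table 8); 0b011 is the value named in this guide.
PIPE_RX_STATUS = {
    0b000: "Received data OK",
    0b001: "1 SKP added",
    0b010: "1 SKP removed",
    0b011: "Receiver detected",
    0b100: "8B/10B decode error",
    0b101: "Elastic buffer overflow",
    0b110: "Elastic buffer underflow",
    0b111: "Receive disparity error",
}
ERROR_CODES = (0b100, 0b111)  # 8B/10B decode error and disparity error


def summarize_rx_status(samples):
    """Count how many samples carry each status code, keyed by description."""
    counts = Counter(value & 0b111 for value in samples)
    return {PIPE_RX_STATUS[code]: count for code, count in counts.items()}


def error_fraction(samples):
    """Fraction of samples that report an 8B/10B or disparity error."""
    if not samples:
        return 0.0
    errors = sum(1 for value in samples if (value & 0b111) in ERROR_CODES)
    return errors / len(samples)


print(summarize_rx_status([0b000, 0b000, 0b100, 0b111]))
print(error_fraction([0b000, 0b000, 0b100, 0b111]))  # 0.5
```

Comparing the error fraction lane by lane, or before and after a change to the transmitter settings, gives a rough
measure of whether a change helped.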
When the incoming data is corrupted due to crosstalk or other forms of interference on the link, the pipe_rx_status
signals would normally indicate 8B/10B errors or disparity errors. When 8B/10B or disparity errors are seen, the following
tests should be performed.
Measure the eye diagram of the receive direction using a high speed scope. Verify that the measurement meets
the requirements of PCI Express Base specification v1.1. While using the high speed scope, probes should be
placed as close as possible to the FPGA receive pads and before the AC capacitors.
Ensure that AC capacitors have been placed on the receive and transmit lanes as discussed in the AC Coupling
section.
If the pipe_rx_status signal reports errors followed by frequent LTSSM transitions to RECOVERY, it is an indication of
possible signal integrity issues on the board. It is advised to consult a signal integrity expert to debug such issues.
When the Endpoint Block Plus Wrapper for PCI Express is generated for a Virtex-5 FPGA, the GTP/GTX wrapper is generated
with all recommended settings. Although changing the default parameters is not recommended, it might be necessary
to do so during debugging. If there is a problem with the link, such as trn_lnk_up_n not asserting or the link
training down, change the TXDIFFCTRL and TXPREEMPHASIS values as discussed in the Pre-emphasis (or De-emphasis)
section. This applies to the transmit side.
For the receiver side, the user can experiment with the receiver equalization values. For more information, refer to the
Virtex-5 RocketIO GTP Transceiver User Guide (UG196). Table 9 (in the user guide) shows the ports related to receiver
equalization.
The following are some of the signals that can be checked to verify that the GTP/GTX side is working correctly. The
signals can be probed with the ChipScope Pro tool and are shown in the ChipScope Pro tool capture mentioned in
the LTSSM State Analysis section.
The items to check if PLLLKDET is not asserted have already been discussed in the PCIe Clocking section. In addition,
verify that an incoming clock is present. This can be checked by probing REFCLKOUT and TXOUTCLK.
REFCLKOUT is the same as CLKIN. It is a free-running clock (that is, it operates before PLLLKDET is asserted).
TXOUTCLK is not a free-running clock; it is only valid after PLLLKDET is asserted.
Before generating the cores, determine the number of trigger signals required for the ILA core and how many outputs are
required from the VIO core. In the example below, ltssm_state and trn_lnk_up_n were selected as the trigger
signals, so a Trigger Port Width of 5 is selected in the ILA core generation (shown in Figure 24).
For the VIO core, select the signals that are to be changed dynamically. In the example below, all four
signals (TXDIFFCTRL, TXPREEMPHASIS, RXEQPOLE, and RXEQMIX) have been selected, bringing the number of
output ports from the VIO core to 24.
In Figure 25, Enable Synchronous Output Port is selected since the output from the core is provided as the input to the
GTP. Select two control ports when generating ICON core.
After all the cores have been generated, modify the pcie_gt_wrapper.v file to add following:
…………………………………………………………..
…………………………………………………………..
…………………………………………………………..
…………………………………………………………..
.RXEQMIX0(sync_out[23:22]),
.RXEQMIX1(sync_out[21:20]),
.RXEQPOLE0(sync_out[19:16]),
.RXEQPOLE1(sync_out[15:12]),
//.RXEQMIX0(2'b01),
//.RXEQMIX1(2'b01),
//.RXEQPOLE0(4'b0000),
//.RXEQPOLE1(4'b0000),
…………………………………………………………..
…………………………………………………………..
.TXDIFFCTRL0(sync_out[5:3]), //3'b100
.TXDIFFCTRL1(sync_out[2:0]), //3'b100
…………………………………………………………..
…………………………………………………………..
//.TXDIFFCTRL0(gt_txdiffctrl_0), //3'b100
//.TXDIFFCTRL1(gt_txdiffctrl_1), //3'b100
…………………………………………………………..
…………………………………………………………..
.TXPREEMPHASIS0(sync_out[11:9]), //3'b111
.TXPREEMPHASIS1(sync_out[8:6]), //3'b111
…………………………………………………………..
…………………………………………………………..
//.TXPREEMPHASIS0(gt_txpreemphesis_0), //3'b111
//.TXPREEMPHASIS1(gt_txpreemphesis_1), //3'b111
…………………………………………………………..
…………………………………………………………..
//-----------------------------------------------------------------
// ILA core instance
//-----------------------------------------------------------------
chipscope_ila i_ila // module name must match the generated ILA core
(
.CLK(gtclk_bufg),
.CONTROL(control0),
.TRIG0(trig0)
);
//-----------------------------------------------------------------
// ICON core instance
//
//-----------------------------------------------------------------
chipscope_icon i_icon // module name must match the generated ICON core
(
.CONTROL0(control0),
.CONTROL1(control1)
);
//-----------------------------------------------------------------
// VIO core instance
//-----------------------------------------------------------------
chipscope_vio i_vio // module name must match the generated VIO core
(
.CLK(gtclk_bufg),
.CONTROL(control1),
.SYNC_OUT(sync_out)
);
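The listing above uses trig0 and sync_out without showing how they are formed or consumed. The following is a minimal sketch of that mapping, written as a standalone module so it can be compiled on its own. The sync_out bit positions are taken from the listing; the module name, the port names, and the ordering of the bits within trig0 are assumptions. In the wrapper itself, control0 and control1 would also need to be declared; the ChipScope Pro control bus between the ICON core and each ILA or VIO core is 36 bits wide.

```verilog
// Sketch only: packs the ILA trigger bus and unpacks the VIO output bus
// using the same bit positions as the pcie_gt_wrapper.v modifications.
// Module and port names are hypothetical.
module gt_debug_map_sketch (
    input  wire [3:0]  ltssm_state,     // assumed 4 bits wide
    input  wire        trn_lnk_up_n,
    input  wire [23:0] sync_out,        // from the VIO core SYNC_OUT port
    output wire [4:0]  trig0,           // to the ILA core TRIG0 port
    output wire [1:0]  rxeqmix0, rxeqmix1,
    output wire [3:0]  rxeqpole0, rxeqpole1,
    output wire [2:0]  txpreemphasis0, txpreemphasis1,
    output wire [2:0]  txdiffctrl0, txdiffctrl1
);
    // Bit order inside trig0 is an assumption; any order works as long as
    // the signals are named to match in the ChipScope Pro analyzer.
    assign trig0 = {ltssm_state, trn_lnk_up_n};

    // sync_out[23:22] -> RXEQMIX0        sync_out[21:20] -> RXEQMIX1
    // sync_out[19:16] -> RXEQPOLE0       sync_out[15:12] -> RXEQPOLE1
    // sync_out[11:9]  -> TXPREEMPHASIS0  sync_out[8:6]   -> TXPREEMPHASIS1
    // sync_out[5:3]   -> TXDIFFCTRL0     sync_out[2:0]   -> TXDIFFCTRL1
    assign {rxeqmix0, rxeqmix1, rxeqpole0, rxeqpole1,
            txpreemphasis0, txpreemphasis1,
            txdiffctrl0, txdiffctrl1} = sync_out;
endmodule
```

Keeping the mapping in one place makes it easier to rename the signals consistently in the analyzer later.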
Modify the pcie_gt_wrapper_top.v file to add the highlighted lines shown below:
Implement the design after making the above modifications. If the design is implemented in the ISE tools, you should see
the following hierarchy:
After implementing the design, download the new bit file to the board and open the ChipScope Pro analyzer. Initially, the
signals appear without meaningful names, as shown in Figure 27:
Figure 27: ChipScope Pro Analyzer: Waveform, Trigger Setup and VIO Console
Rename the signals manually to make them more readable. Use the connections in the pcie_gt_wrapper.v file to determine
which name belongs to each signal.
Figure 28: ChipScope Pro Analyzer with Signals Renamed
Modify the transceiver parameters by selecting values in the VIO console. The trigger can be set up to fire on different
LTSSM states.
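Because the VIO outputs replace the original settings, it can help to enter a known baseline in the VIO console before experimenting. The sketch below assembles a 24-bit word from the values noted in the comments of the pcie_gt_wrapper.v listing (RXEQMIX 2'b01, RXEQPOLE 4'b0000, TXPREEMPHASIS 3'b111, TXDIFFCTRL 3'b100), using the same bit order as sync_out. Treating those commented values as the baseline is an assumption, and the module and parameter names are hypothetical.

```verilog
// Simulation-only sketch: compute the sync_out word that reproduces the
// values shown in the comments of the pcie_gt_wrapper.v listing.
module vio_baseline_sketch;
    localparam [23:0] SYNC_OUT_BASELINE = {
        2'b01,    // RXEQMIX0        sync_out[23:22]
        2'b01,    // RXEQMIX1        sync_out[21:20]
        4'b0000,  // RXEQPOLE0       sync_out[19:16]
        4'b0000,  // RXEQPOLE1       sync_out[15:12]
        3'b111,   // TXPREEMPHASIS0  sync_out[11:9]
        3'b111,   // TXPREEMPHASIS1  sync_out[8:6]
        3'b100,   // TXDIFFCTRL0     sync_out[5:3]
        3'b100    // TXDIFFCTRL1     sync_out[2:0]
    };

    initial begin
        // Prints: sync_out baseline = 24'h500fe4
        $display("sync_out baseline = 24'h%h", SYNC_OUT_BASELINE);
        $finish;
    end
endmodule
```

Changing one field at a time from this baseline makes it easier to attribute a change in link training behavior to a specific setting.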
Debugging Checklist
The previous sections discussed different aspects of debugging link training issues. This section provides a few debugging
tips based on experience working with customers who ran into link training issues.
1. If some lanes on the board are polarity reversed and the link is training down, capture pipe_rx_polarity in
the ChipScope Pro tool. The pipe_rx_polarity signal should be asserted for the lanes that are polarity reversed.
The assertion of this signal indicates that the core has detected the polarity inversion. By asserting
pipe_rx_polarity, the core tells the transceiver to reverse the polarity of the incoming signal.
2. Make sure the endpoint device is not in reset. This can be checked by capturing the sys_reset_n signal in the
ChipScope Pro tool.
3. Probe the signal lines and make sure that the signal levels are within the limits given in the specification.
4. If the link is not training consistently, check the value of the fast_train_simulation_only core input. This
should be set to 1 for simulation only. For hardware implementation, it should be set to 0.
5. Link-up failures can occur if the MGTAVTTRCAL pin is not connected to its power supply. Make sure the pins
are connected correctly. For more information, refer to the Virtex-5 RocketIO GTP Transceiver User Guide
(UG196).
6. The signal trn_reset_n must deassert (go to logic 1) before the link can train (i.e., before trn_lnk_up_n is
asserted). If trn_reset_n is not deasserted, the probable reasons are as follows:
Therefore, to debug the issue of trn_reset_n not being deasserted, follow these steps:
7. Make sure that scrambling is not turned off. When you generate the core, there is an option in the CORE
Generator interface to force no scrambling. For normal operation of the hardware, this option should
not be checked.
9. On a custom board, the problem could be with the board itself. To rule out a board issue, try a demo board (e.g.,
ML555).
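Several checklist items call for capturing additional signals in the ChipScope Pro tool. One way to do this is to widen the ILA trigger bus so that the reset and polarity signals are captured together with the LTSSM state. The following is a minimal sketch under stated assumptions: the module name, the LANES parameter, the bit ordering, and the one-bit-per-lane width of pipe_rx_polarity are hypothetical and should be checked against the generated core.

```verilog
// Sketch only: packs the signals named in the checklist into one ILA
// trigger bus. Names, widths and bit order are assumptions.
module checklist_trigger_sketch #(
    parameter LANES = 8                      // assumed link width
) (
    input  wire [3:0]       ltssm_state,     // assumed 4 bits wide
    input  wire             trn_lnk_up_n,    // active-Low link up
    input  wire             trn_reset_n,     // checklist item 6
    input  wire             sys_reset_n,     // checklist item 2
    input  wire [LANES-1:0] pipe_rx_polarity, // checklist item 1
    output wire [LANES+6:0] trig0            // LANES + 7 bits in total
);
    // MSB to LSB: polarity bits, sys_reset_n, trn_reset_n,
    // trn_lnk_up_n, ltssm_state
    assign trig0 = {pipe_rx_polarity, sys_reset_n, trn_reset_n,
                    trn_lnk_up_n, ltssm_state};
endmodule
```

With LANES set to 8, the ILA core would need to be regenerated with a Trigger Port Width of 15 to match this bus.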
Conclusion
This document presented different aspects of debugging link training issues in the Virtex-5 FPGA Endpoint Block Plus
Core. Users experiencing link training issues with the core are encouraged to work through this document and
check the provided suggestions. With this document, the user should be able to capture the signals related
to link training in the ChipScope Pro tool and analyze the captured waveforms to determine where the problem
might be.
If this document does not help to resolve the problem, please create a WebCase with Xilinx Technical Support. Attach all
of the captured ChipScope Pro waveforms and the details of your investigation and analysis.
Revision History