Virtex-6 FPGA Integrated Block for PCI Express
User Guide
UG517 (v5.1) September 21, 2010
Revision History
The following table shows the revision history for this document.
Date Version Revision
09/21/10 5.1 Updated ISE software to v12.3. Added cfg_pm_send_pme_to_n to Table 2-19.
Added Cadence INCISIV to Example Design Elements, page 55. Removed discussion about
example design from Example Design Elements, page 58. Updated step 4, page 59 in
Generating the Core. Added ISim to Simulating the Example Design, page 61. Added
isim_cmd.tcl, simulate_isim.bat/simulate_isim.sh, and wave.wcfg to
Table 4-13.
Updated first bullet under Design Considerations for a Directed Link Change, page 151.
Updated Figure 6-52, Figure 6-53, and Figure 6-54. Updated third bullet in Reset, page 183.
Added SX315T to FF1156 package in Table 7-1. Added note 2 to Table 7-2.
Added Chapter 10, Hardware Verification.
Added ISim to Simulating the Design, page 237, Verilog Test Selection, page 238, Table A-11,
and VHDL Flow, page 239. Replaced IUS with INCISIV in VHDL Flow, page 239.
Added ISim to Simulating the Design, page 257 and Table B-2.
Removed 5.0 Gb/s rate from description of PIPETXDEEMPH in Table G-4. Added 100b and
101b to description of CFGDEVCONTROLMAXREADREQ[2:0] in Table G-13.
Table of Contents
Revision History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Chapter 1: Introduction
About the Core . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
System Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Recommended Design Experience . . . . . . . . . . . . . . . . . . . . . . . . . 20
Additional Core Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Appendix B: Example Design and Model Test Bench for Root Port Configuration
Configurator Example Design . . . . . . . . . . . . . . . . . . . . . . . . . 251
Guide Contents
This manual contains these chapters and appendices:
• Chapter 1, Introduction, describes the core and related information, including
recommended design experience and additional resources.
• Chapter 2, Core Overview, describes the main components of the Integrated Block
architecture.
• Chapter 3, Licensing the Core, provides information about obtaining a license for the
core.
• Chapter 4, Getting Started Example Design, provides instructions for quickly
generating, simulating, and implementing the example design using the
demonstration test bench.
• Chapter 5, Generating and Customizing the Core, describes how to use the graphical
user interface (GUI) to configure the integrated block using the CORE Generator™
software.
• Chapter 6, Designing with the Core, provides instructions on how to design a device
using the Integrated Block core.
• Chapter 7, Core Constraints, discusses the required and optional constraints for the
integrated block.
• Chapter 8, FPGA Configuration, discusses considerations for FPGA configuration
and PCI Express.
• Chapter 9, Known Restrictions, describes restrictions or issues where the integrated
block deviates from the PCI Express Base Specification, or where the specification is
ambiguous.
• Appendix A, Programmed Input/Output: Endpoint Example Design, describes the
Programmed Input/Output (PIO) example design for use with the core and the Root
Port model test bench environment, which provides a test program interface for use
with the PIO example design.
• Appendix B, Example Design and Model Test Bench for Root Port Configuration,
describes the Configurator example design for use with the core, and the Endpoint
Model test bench environment for use with the Configurator example design.
• Appendix C, Migration Considerations, defines the differences in behavior and
options between the Virtex-6 FPGA Integrated Block for PCI Express and the
Endpoint Block Plus for PCI Express.
Additional Documentation
The following documents are also available for download at:
http://www.xilinx.com/support/documentation/virtex-6.htm.
• Virtex-6 Family Overview
The features and product selection of the Virtex-6 family are outlined in this overview.
• Virtex-6 FPGA Data Sheet: DC and Switching Characteristics
This data sheet contains the DC and Switching Characteristic specifications for the
Virtex-6 family.
• Virtex-6 FPGA Packaging and Pinout Specifications
This specification includes the tables for device/package combinations and maximum
I/Os, pin definitions, pinout tables, pinout diagrams, mechanical drawings, and
thermal specifications.
• Virtex-6 FPGA SelectIO Resources User Guide
This guide describes the SelectIO™ resources available in all Virtex-6 devices.
• Virtex-6 FPGA Clocking Resources User Guide
This guide describes the clocking resources available in all Virtex-6 devices, including
the MMCM and PLLs.
• Virtex-6 FPGA Block RAM Resources User Guide
The functionality of the block RAM and FIFO is described in this user guide.
• Virtex-6 FPGA Configurable Logic Block User Guide
This guide describes the capabilities of the configurable logic blocks (CLBs) available
in all Virtex-6 devices.
• Virtex-6 FPGA GTH Transceivers User Guide
This guide describes the GTH transceivers available in all Virtex-6 HXT FPGAs except
the XC6VHX250T and the XC6VHX380T in the FF1154 package.
• Virtex-6 FPGA GTX Transceivers User Guide
This guide describes the GTX transceivers available in all Virtex-6 FPGAs except the
XC6VLX760.
• Virtex-6 FPGA DSP48E1 Slice User Guide
This guide describes the architecture of the DSP48E1 slice in Virtex-6 FPGAs and
provides configuration examples.
• Virtex-6 FPGA Embedded Tri-Mode Ethernet MAC User Guide
This guide describes the dedicated Tri-Mode Ethernet Media Access Controller
available in all Virtex-6 FPGAs except the XC6VLX760.
• Virtex-6 FPGA System Monitor User Guide
The System Monitor functionality available in all Virtex-6 devices is outlined in this
guide.
• Virtex-6 FPGA PCB Design Guide
This guide provides information on PCB design for Virtex-6 devices, with a focus on
strategies for making design decisions at the PCB and interface level.
Additional Resources
To find additional documentation, see the Xilinx website at:
www.xilinx.com/literature.
To search the Answer Database of silicon, software, and IP questions and answers, or to
create a technical support WebCase, see the Xilinx website at:
www.xilinx.com/support.
Introduction
This chapter introduces the Virtex®-6 FPGA Integrated Block for PCI Express® core and
provides related information including system requirements and recommended design
experience.
System Requirements
Windows
• Windows XP Professional 32-bit/64-bit
• Windows Vista Business 32-bit/64-bit
Linux
• Red Hat Enterprise Linux WS v4.0 32-bit/64-bit
• Red Hat Enterprise Desktop v5.0 32-bit/64-bit (with Workstation Option)
• SUSE Linux Enterprise (SLE) v10.1 32-bit/64-bit
Software
• ISE® v12.3 software
Check the release notes for the required Service Pack; ISE software Service Packs can be
downloaded from www.xilinx.com/support/download/index.htm.
Core Overview
This chapter describes the main components of the Virtex®-6 FPGA Integrated Block for
PCI Express® architecture.
Overview
The Virtex-6 FPGA Integrated Block for PCI Express contains full support for 2.5 Gb/s and
5.0 Gb/s PCI Express Endpoint and Root Port configurations. Table 2-1 defines the
Integrated Block for PCIe® solutions.
Notes:
1. See Link Training: 2-Lane, 4-Lane, and 8-Lane Components for additional information.
2. Endpoint configuration only.
The LogiCORE IP Virtex-6 FPGA Integrated Block for PCI Express core internally
instantiates the Virtex-6 FPGA Integrated Block for PCI Express (PCIE_2_0). The
integrated block follows the PCI Express Base Specification layering model, which consists of
the Physical, Data Link, and Transaction layers. The integrated block is compliant with the
PCI Express Base Specification, rev. 2.0.
Figure 2-1 illustrates these interfaces to the Virtex-6 FPGA Integrated Block for PCI
Express:
• System (SYS) interface
• PCI Express (PCI_EXP) interface
• Configuration (CFG) interface
• Transaction (TRN) interface
• Physical Layer Control and Status (PL) interface
The core uses packets to exchange information between the various modules. Packets are
formed in the Transaction and Data Link Layers to carry information from the transmitting
component to the receiving component. Each layer adds to the outgoing packet the
information required to handle that packet at the corresponding layer of the receiver. At
the receiving end, each layer processes the incoming packet, strips the relevant
information, and forwards the packet to the next layer.
As a result, received packets are transformed from their Physical Layer representation to
their Data Link Layer representation and finally to their Transaction Layer representation.
Figure 2-1: Interfaces of the Virtex-6 FPGA Integrated Block for PCI Express (PCIE_2_0): the
Transaction (TRN), Physical Layer Control and Status (PL), Configuration (CFG), optional Debug
(DRP), and System (SYS) clock and reset interfaces connect to user logic, and the PCI Express
(PCI_EXP) interface connects through the Virtex-6 FPGA transceivers. [Block diagram not
reproduced.]
Protocol Layers
The functions of the protocol layers, as defined by the PCI Express Base Specification, include
generation and processing of Transaction Layer Packets (TLPs), flow control management,
initialization, power management, data protection, error checking and retry, physical link
interface initialization, maintenance and status tracking, serialization, deserialization, and
other circuitry for interface operation. Each layer is defined in the next subsections.
Transaction Layer
The Transaction Layer is the upper layer of the PCI Express architecture, and its primary
function is to accept, buffer, and disseminate Transaction Layer packets or TLPs. TLPs
communicate information through the use of memory, I/O, configuration, and message
transactions. To maximize the efficiency of communication between devices, the
Transaction Layer enforces PCI-compliant Transaction ordering rules and manages TLP
buffer space via credit-based flow control.
Data Link Layer
Services provided by the Data Link Layer include data exchange (TLPs), error detection
and recovery, initialization services and the generation and consumption of Data Link
Layer Packets (DLLPs). DLLPs are used to transfer information between Data Link Layers
of two directly connected components on the link. DLLPs convey information such as
Power Management, Flow Control, and TLP acknowledgments.
Physical Layer
The Physical Layer interfaces the Data Link Layer with signalling technology for link data
interchange, and is subdivided into the Logical sub-block and the Electrical sub-block.
• The Logical sub-block frames and deframes TLPs and DLLPs. It also implements the
Link Training and Status State machine (LTSSM), which handles link initialization,
training, and maintenance. Scrambling, descrambling, and 8B/10B encoding and
decoding of data is also performed in this sub-block.
• The Electrical sub-block defines the input and output buffer characteristics that
interface the device to the PCIe link.
The Physical Layer also supports Lane Reversal (for multi-lane designs) and Lane Polarity
Inversion, as required by the PCI Express Base Specification, rev. 2.0.
Configuration Management
The Configuration Management layer maintains the PCI™ Type 0 Endpoint configuration
space and supports these features:
• Implements the PCI Configuration Space
• Supports Configuration Space accesses
• Power Management functions
• Implements error reporting and status functionality
• Implements packet processing functions
• Receive
- Configuration Reads and Writes
• Transmit
- Completions with or without data
- TLM Error Messaging
- User Error Messaging
- Power Management Messaging/Handshake
• Implements MSI and INTx interrupt emulation
• Optionally implements MSIx Capability Structure in the PCI Configuration Space
• Optionally implements the Device Serial Number Capability in the PCI Express
Extended Capability Space
• Optionally implements Virtual Channel Capability (support only for VC0) in the
PCI Express Extended Capability Space
• Optionally implements Xilinx defined Vendor Specific Capability Structure in the
PCI Express Extended Capability space to provide Loopback Control and Status
[PCI Configuration Space header table not fully reproduced: byte offsets 010h through 038h
form the header type specific region (see Table 2-3 and Table 2-4); the Capabilities Pointer
(CapPtr) is at offset 034h, and offset 038h is Reserved.]
Notes:
1. The MSI Capability Structure varies depending on the selections in the
CORE Generator tool GUI.
2. Reserved for Endpoint configurations (returns 0x00000000).
Core Interfaces
The Virtex-6 FPGA Integrated Block for PCI Express core includes top-level signal
interfaces that have sub-groups for the receive direction, transmit direction, and signals
common to both directions.
System Interface
The System (SYS) interface consists of the system reset signal (sys_reset_n) and the system
clock signal (sys_clk), as described in Table 2-5.
The system reset signal is an asynchronous active-Low input. The assertion of sys_reset_n
causes a hard reset of the entire core. The system input clock must be 100 MHz, 125 MHz,
or 250 MHz, as selected in the CORE Generator™ software GUI.
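As a minimal sketch of the SYS interface hookup, the fragment below buffers the differential
reference clock and connects it, along with the active-Low reset, to the core. Only sys_clk
and sys_reset_n come from this guide; the wrapper and pin names are hypothetical, and the
generated example design is the authoritative reference for the exact clock buffer primitive
and connections.

  wire sys_clk;

  // The example design buffers the 100/125/250 MHz PCIe reference clock with a
  // dedicated Virtex-6 GTX clock input buffer before it drives the core.
  IBUFDS_GTXE1 refclk_ibuf (
    .O     (sys_clk),     // single-ended reference clock to the core
    .ODIV2 (),
    .CEB   (1'b0),
    .I     (sys_clk_p),   // differential pair from the connector/clock source
    .IB    (sys_clk_n)
  );

  pcie_core_wrapper core_i (       // hypothetical wrapper and instance names
    .sys_clk     (sys_clk),
    .sys_reset_n (sys_reset_n)     // asynchronous, active-Low hard reset
    // ... TRN, CFG, PL, and PCI_EXP ports not shown ...
  );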
Table 2-6: PCI Express Interface Signals for the 1-Lane Core
Lane Number | Name | Direction | Description
0 | pci_exp_txp0 | Output | PCI Express Transmit Positive: Serial Differential Output 0 (+)
0 | pci_exp_txn0 | Output | PCI Express Transmit Negative: Serial Differential Output 0 (–)
0 | pci_exp_rxp0 | Input | PCI Express Receive Positive: Serial Differential Input 0 (+)
0 | pci_exp_rxn0 | Input | PCI Express Receive Negative: Serial Differential Input 0 (–)
Table 2-7: PCI Express Interface Signals for the 2-Lane Core
Lane Number | Name | Direction | Description
0 | pci_exp_txp0 | Output | PCI Express Transmit Positive: Serial Differential Output 0 (+)
0 | pci_exp_txn0 | Output | PCI Express Transmit Negative: Serial Differential Output 0 (–)
0 | pci_exp_rxp0 | Input | PCI Express Receive Positive: Serial Differential Input 0 (+)
0 | pci_exp_rxn0 | Input | PCI Express Receive Negative: Serial Differential Input 0 (–)
1 | pci_exp_txp1 | Output | PCI Express Transmit Positive: Serial Differential Output 1 (+)
1 | pci_exp_txn1 | Output | PCI Express Transmit Negative: Serial Differential Output 1 (–)
1 | pci_exp_rxp1 | Input | PCI Express Receive Positive: Serial Differential Input 1 (+)
1 | pci_exp_rxn1 | Input | PCI Express Receive Negative: Serial Differential Input 1 (–)
Table 2-8: PCI Express Interface Signals for the 4-Lane Core
Lane Number | Name | Direction | Description
0 | pci_exp_txp0 | Output | PCI Express Transmit Positive: Serial Differential Output 0 (+)
0 | pci_exp_txn0 | Output | PCI Express Transmit Negative: Serial Differential Output 0 (–)
0 | pci_exp_rxp0 | Input | PCI Express Receive Positive: Serial Differential Input 0 (+)
0 | pci_exp_rxn0 | Input | PCI Express Receive Negative: Serial Differential Input 0 (–)
1 | pci_exp_txp1 | Output | PCI Express Transmit Positive: Serial Differential Output 1 (+)
1 | pci_exp_txn1 | Output | PCI Express Transmit Negative: Serial Differential Output 1 (–)
1 | pci_exp_rxp1 | Input | PCI Express Receive Positive: Serial Differential Input 1 (+)
1 | pci_exp_rxn1 | Input | PCI Express Receive Negative: Serial Differential Input 1 (–)
2 | pci_exp_txp2 | Output | PCI Express Transmit Positive: Serial Differential Output 2 (+)
2 | pci_exp_txn2 | Output | PCI Express Transmit Negative: Serial Differential Output 2 (–)
2 | pci_exp_rxp2 | Input | PCI Express Receive Positive: Serial Differential Input 2 (+)
2 | pci_exp_rxn2 | Input | PCI Express Receive Negative: Serial Differential Input 2 (–)
3 | pci_exp_txp3 | Output | PCI Express Transmit Positive: Serial Differential Output 3 (+)
3 | pci_exp_txn3 | Output | PCI Express Transmit Negative: Serial Differential Output 3 (–)
3 | pci_exp_rxp3 | Input | PCI Express Receive Positive: Serial Differential Input 3 (+)
3 | pci_exp_rxn3 | Input | PCI Express Receive Negative: Serial Differential Input 3 (–)
Table 2-9: PCI Express Interface Signals for the 8-Lane Core
Lane Number | Name | Direction | Description
0 | pci_exp_txp0 | Output | PCI Express Transmit Positive: Serial Differential Output 0 (+)
0 | pci_exp_txn0 | Output | PCI Express Transmit Negative: Serial Differential Output 0 (–)
0 | pci_exp_rxp0 | Input | PCI Express Receive Positive: Serial Differential Input 0 (+)
0 | pci_exp_rxn0 | Input | PCI Express Receive Negative: Serial Differential Input 0 (–)
1 | pci_exp_txp1 | Output | PCI Express Transmit Positive: Serial Differential Output 1 (+)
1 | pci_exp_txn1 | Output | PCI Express Transmit Negative: Serial Differential Output 1 (–)
1 | pci_exp_rxp1 | Input | PCI Express Receive Positive: Serial Differential Input 1 (+)
1 | pci_exp_rxn1 | Input | PCI Express Receive Negative: Serial Differential Input 1 (–)
2 | pci_exp_txp2 | Output | PCI Express Transmit Positive: Serial Differential Output 2 (+)
2 | pci_exp_txn2 | Output | PCI Express Transmit Negative: Serial Differential Output 2 (–)
2 | pci_exp_rxp2 | Input | PCI Express Receive Positive: Serial Differential Input 2 (+)
2 | pci_exp_rxn2 | Input | PCI Express Receive Negative: Serial Differential Input 2 (–)
3 | pci_exp_txp3 | Output | PCI Express Transmit Positive: Serial Differential Output 3 (+)
3 | pci_exp_txn3 | Output | PCI Express Transmit Negative: Serial Differential Output 3 (–)
3 | pci_exp_rxp3 | Input | PCI Express Receive Positive: Serial Differential Input 3 (+)
3 | pci_exp_rxn3 | Input | PCI Express Receive Negative: Serial Differential Input 3 (–)
4 | pci_exp_txp4 | Output | PCI Express Transmit Positive: Serial Differential Output 4 (+)
4 | pci_exp_txn4 | Output | PCI Express Transmit Negative: Serial Differential Output 4 (–)
4 | pci_exp_rxp4 | Input | PCI Express Receive Positive: Serial Differential Input 4 (+)
4 | pci_exp_rxn4 | Input | PCI Express Receive Negative: Serial Differential Input 4 (–)
5 | pci_exp_txp5 | Output | PCI Express Transmit Positive: Serial Differential Output 5 (+)
5 | pci_exp_txn5 | Output | PCI Express Transmit Negative: Serial Differential Output 5 (–)
5 | pci_exp_rxp5 | Input | PCI Express Receive Positive: Serial Differential Input 5 (+)
5 | pci_exp_rxn5 | Input | PCI Express Receive Negative: Serial Differential Input 5 (–)
6 | pci_exp_txp6 | Output | PCI Express Transmit Positive: Serial Differential Output 6 (+)
6 | pci_exp_txn6 | Output | PCI Express Transmit Negative: Serial Differential Output 6 (–)
6 | pci_exp_rxp6 | Input | PCI Express Receive Positive: Serial Differential Input 6 (+)
6 | pci_exp_rxn6 | Input | PCI Express Receive Negative: Serial Differential Input 6 (–)
7 | pci_exp_txp7 | Output | PCI Express Transmit Positive: Serial Differential Output 7 (+)
7 | pci_exp_txn7 | Output | PCI Express Transmit Negative: Serial Differential Output 7 (–)
7 | pci_exp_rxp7 | Input | PCI Express Receive Positive: Serial Differential Input 7 (+)
7 | pci_exp_rxn7 | Input | PCI Express Receive Negative: Serial Differential Input 7 (–)
Transaction Interface
The Transaction (TRN) interface provides a mechanism for the user design to generate and
consume TLPs. The signal names and signal descriptions for this interface are shown in
Table 2-10, Table 2-12, and Table 2-13.
Notes:
1. Endpoint configuration only.
Configuration Interface
The Configuration (CFG) interface enables the user design to inspect the state of the
Endpoint's PCI Express configuration space. The user provides a 10-bit configuration address,
which selects one of the 1024 configuration space doubleword (DWORD) registers. The
Endpoint returns the state of the selected register over the 32-bit data output port.
Table 2-17 defines the Configuration interface signals. See Design with Configuration
Space Registers and Configuration Interface, page 155 for usage.
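As an illustration of this read mechanism, the sketch below performs a single register read.
The interface signal names used here (cfg_dwaddr[9:0], cfg_rd_en_n, cfg_do[31:0], and
cfg_rd_wr_done_n) are assumptions based on the Configuration interface naming; Table 2-17
is the authoritative definition. The start_cfg_read, read_addr, and captured_reg names are
hypothetical user-side signals.

  reg  [9:0]  cfg_dwaddr;
  reg         cfg_rd_en_n;
  reg  [31:0] captured_reg;                 // hypothetical user-side register

  always @(posedge trn_clk) begin
    if (~trn_reset_n) begin
      cfg_rd_en_n <= 1'b1;                  // no read in progress
    end else if (start_cfg_read) begin
      cfg_dwaddr  <= read_addr;             // DWORD address, 0 to 1023
      cfg_rd_en_n <= 1'b0;                  // request the read (active-Low)
    end else if (~cfg_rd_wr_done_n) begin
      captured_reg <= cfg_do;               // selected register is valid here
      cfg_rd_en_n  <= 1'b1;                 // end the request
    end
  end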
Table 2-20: Configuration Interface Signals: Interrupt Interface - Endpoint Only (Cont’d)
Name Direction Description
cfg_interrupt_msixenable Output Configuration Interrupt MSI-X Enabled: Indicates that the Message
Signalling Interrupt-X (MSI-X) messaging is enabled.
• 0: Only Legacy (INTX) interrupts or MSI Interrupts can be sent.
• 1: Only MSI-X Interrupts should be sent.
cfg_interrupt_msixfm Output Configuration Interrupt MSI-X Function Mask: Indicates the state of the
Function Mask bit in the MSI-X Message Control field. If 0, each vector’s
Mask bit determines its masking. If 1, all vectors are masked, regardless of
their per-vector Mask bit states.
Notes:
1. The user should assert these signals only if the device power state is D0. Asserting these signals in
non-D0 device power states might result in an incorrect operation on the PCIe link. For additional
information, see the PCI Express Base Specification, rev. 2.0, Section 5.3.1.2.
pcie_drp_clk Input PCI Express DRP Clock: The rising edge of this signal
is the timing reference for all the other DRP signals.
Normally, drp_clk is driven with a global clock buffer.
The maximum frequency is defined in the Virtex-6
FPGA Data Sheet.
pcie_drp_den Input PCI Express DRP Data Enable: When asserted, this
signal enables a read or write operation. If drp_dwe is
deasserted, it is a read operation, otherwise a write
operation. For any given drp_clk cycle, all other input
signals are don’t cares if drp_den is not active.
pcie_drp_dwe Input PCI Express DRP Write Enable: When asserted, this
signal enables a write operation to the port (see
drp_den).
pcie_drp_daddr[8:0] Input PCI Express DRP Address Bus: The value on this bus
specifies the individual cell that is written or read.
The address is presented in the cycle that drp_den is
active.
pcie_drp_di[15:0] Input PCI Express DRP Data Input: The value on this bus is
the data written to the addressed cell. The data is
presented in the cycle that drp_den and drp_dwe are
active, and is captured in a register at the end of that
cycle, but the actual write occurs at an unspecified
time before drp_drdy is returned.
pcie_drp_drdy Output PCI Express DRP Ready: This signal is a response to
drp_den to indicate that the DRP cycle is complete
and another DRP cycle can be initiated. In the case of
a DRP read, the drp_do bus must be captured on the
rising edge of drp_clk in the cycle that drp_drdy is
active. The earliest that drp_den can go active to start
the next port cycle is the same clock cycle that
drp_drdy is active.
pcie_drp_do[15:0] Output PCI Express DRP Data Out: If drp_dwe was inactive
when drp_den was activated, the value on this bus
when drp_drdy goes active is the data read from the
addressed cell. At all other times, the value on
drp_do[15:0] is undefined.
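The following is a minimal sketch of a user-side state machine that follows the handshaking
described above: den (with dwe for writes) is pulsed with a stable address and write data,
and the access completes when drdy is returned. Port widths match the descriptions above;
the module name and the start, write, done, and rdata signals are hypothetical.

  module drp_access (
    input  wire        pcie_drp_clk,
    input  wire        reset,
    input  wire        start,             // pulse to begin an access
    input  wire        write,             // 1 = write, 0 = read
    input  wire [8:0]  addr,
    input  wire [15:0] wdata,
    input  wire        pcie_drp_drdy,
    input  wire [15:0] pcie_drp_do,
    output reg         pcie_drp_den,
    output reg         pcie_drp_dwe,
    output reg  [8:0]  pcie_drp_daddr,
    output reg  [15:0] pcie_drp_di,
    output reg  [15:0] rdata,
    output reg         done
  );
    reg busy;
    always @(posedge pcie_drp_clk) begin
      if (reset) begin
        pcie_drp_den <= 1'b0;
        pcie_drp_dwe <= 1'b0;
        busy         <= 1'b0;
        done         <= 1'b0;
      end else begin
        done <= 1'b0;
        if (!busy && start) begin
          pcie_drp_daddr <= addr;          // address and data are presented in
          pcie_drp_di    <= wdata;         // the cycle that den is active
          pcie_drp_den   <= 1'b1;
          pcie_drp_dwe   <= write;
          busy           <= 1'b1;
        end else begin
          pcie_drp_den <= 1'b0;            // one-cycle enable pulse
          pcie_drp_dwe <= 1'b0;
          if (busy && pcie_drp_drdy) begin
            rdata <= pcie_drp_do;          // capture read data while drdy is high
            done  <= 1'b1;
            busy  <= 1'b0;
          end
        end
      end
    end
  endmodule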
Before Beginning
This chapter assumes that the core has been installed by running Xilinx Update using
either the CORE Generator™ IP Software Update installer, or by performing a manual
installation after downloading the core from the web.
License Options
If ISE v11.3 software or later is used, this section can be skipped.
The Virtex-6 FPGA Integrated Block for PCI Express core requires installation of a full
license key and the relevant ISE software update. The full license key provides full access
to all core functionality both in simulation and in hardware, including:
• Functional simulation support
• Full implementation support including place and route and bitstream generation
• Full functionality in the programmed device with no time-outs
[Block diagram not reproduced: the Root Port Model test bench (dsport with usrapp_tx and
usrapp_rx, driven by the test program) connected to the Endpoint Core for PCI Express and
the PIO design (PIO_EP with its EP_MEM memories and PIO_TO_CTRL).]
• The PIO Endpoint Model test bench, which consists of a PCI Express Endpoint, a PIO
Slave design, and the test bench that monitors the bus traffic.
Figure 4-3: Configurator example design elements for the Root Port configuration (packet
generators, TX mux, completion decoder, data checker, 5.0 Gb/s (Gen2) enabler, and
controllers) around the Virtex-6 FPGA Integrated Block for PCI Express configured as a
Root Port. [Block diagram not reproduced.]
Figure 4-4 illustrates the simulation design provided with the Root Port of the Virtex-6
FPGA Integrated Block for PCI Express. For more information about the Configurator
example design and the Endpoint model test bench, see Appendix B, Example Design and
Model Test Bench for Root Port Configuration.
Figure 4-4: Root Port simulation design: the Configurator wrapper (PIO Master, Configurator,
and Configurator ROM) connects over the TRN interface of the integrated block to the
Endpoint model (PIO Slave Endpoint design). [Block diagram not reproduced.]
illustrated in Figure 4-3. Source code for the example is provided with the core. For more
information about the example design, see Appendix B, Example Design and Model Test
Bench for Root Port Configuration.
6. In the Component Name field, enter a name for the core. <component_name> is used
in this example.
7. From the Device/Port Type drop-down menu, select the appropriate device/port type
of the core (Endpoint or Root Port).
8. Click Finish to generate the core using the default parameters. The core and its
supporting files, including the example design and model test bench, are generated in
the project directory. For detailed information about the example design files and
directories, see Directory Structure and File Contents, page 64. In addition, see the
README file.
Endpoint Configuration
The simulation environment provided with the Virtex-6 FPGA Integrated Block for PCI
Express core in Endpoint configuration performs simple memory access tests on the PIO
example design. Transactions are generated by the Root Port Model and responded to by
the PIO example design.
• PCI Express Transaction Layer Packets (TLPs) are generated by the test bench
transmit User Application (pci_exp_usrapp_tx). As it transmits TLPs, it also
generates a log file, tx.dat.
• PCI Express TLPs are received by the test bench receive User Application
(pci_exp_usrapp_rx). As the User Application receives the TLPs, it generates a log
file, rx.dat.
For more information about the test bench, see Root Port Model Test Bench for Endpoint in
Appendix A.
Simulator Requirements
Virtex-6 device designs require a Verilog LRM-IEEE 1364-2005 encryption-compliant
simulator. This core supports these simulators:
• ModelSim: v6.4b
• Cadence INCISIV: v9.2 (Verilog only)
• Synopsys VCS and VCS MX: 2009.12 (Verilog only)
• ISE Simulator (ISim)
xilinx_pcie_2_0_ep_v6_01_lane_gen1_ML605.ucf
4. map: Maps design to the selected FPGA using the constraints provided.
5. par: Places cells onto FPGA resources and routes connectivity.
6. trce: Performs static timing analysis on design using constraints specified.
7. netgen: Generates a logical Verilog or VHDL HDL representation of the design and
an SDF file for post-layout verification.
8. bitgen: Generates a bitstream file for programming the FPGA.
These FPGA implementation related files are generated in the results directory:
• routed.bit
FPGA configuration information.
• routed.v[hd]
Verilog or VHDL functional Model.
• routed.sdf
Timing model Standard Delay File.
• mapped.mrp
Xilinx map report.
• routed.par
Xilinx place and route report.
• routed.twr
Xilinx timing analysis report.
The script file starts from an EDIF/NGC file and results in a bitstream file. It is possible to
use the Xilinx ISE software GUI to implement the example design. However, the GUI flow
is not presented in this document.
Example Design
<project directory> topdirectory
<project directory>
The project directory contains all the CORE Generator tool project files.
<component name>/doc
The doc directory contains the PDF documentation provided with the core.
<component name>/example_design
The example_design directory contains the example design files provided with the core.
Table 4-4 and Table 4-5 show the directory contents for an Endpoint configuration core and
a Root Port configuration core.
<component name>/implement
The implement directory contains the core implementation script files.
implement/results
The results directory is created by the implement script. The implement script results are
placed in the results directory.
Table 4-7: Results Directory
Name Description
<project_dir>/<component_name>/implement/results
Implement script result files.
implement/xst
The xst directory is created by the XST script. The synthesis results are placed in the xst
directory.
implement/synplify
The synplify directory is created by the Synplify script. The synthesis results are placed
in the synplify directory.
<component name>/source
The source directory contains the generated core source files.
<component name>/simulation
The simulation directory contains the simulation source files provided with the core.
simulation/dsport
The dsport directory contains the files for the Root Port model test bench.
simulation/ep
The ep directory contains the Endpoint model files.
simulation/functional
The functional directory contains functional simulation scripts provided with the core.
Table 4-13: Functional Directory
Name Description
<project_dir>/<component_name>/simulation/functional
board.f List of files for RTL simulations.
isim_cmd.tcl Simulation helper script for ISim.
simulate_isim.bat, simulate_isim.sh Simulation scripts for ISim (DOS and UNIX, respectively).
wave.wcfg Simulation wave file for ISim.
simulate_mti.do Simulation script for ModelSim.
simulate_ncsim.sh Simulation script for Cadence INCISIV (Verilog only).
simulate_vcs.sh Simulation script for VCS (Verilog only).
xilinx_lib_vcs.f Points to the required SecureIP Model.
board_common.v Contains test bench definitions (Verilog only).
(Endpoint configuration only)
board.v[hd] Top-level simulation module.
sys_clk_gen_ds.v[hd] System differential clock source.
(Endpoint configuration only)
sys_clk_gen.v[hd] System clock source.
simulation/tests
Note: This directory exists for Endpoint configuration only.
The tests directory contains test definitions for the example test bench.
Component Name
Base name of the output files generated for the core. The name must begin with a letter and
can be composed of these characters: a to z, 0 to 9, and “_.”
Number of Lanes
The Virtex-6 FPGA Integrated Block for PCI Express requires the selection of the initial lane
width. Table 5-1 defines the available widths and associated generated core. Wider lane
width cores are capable of training down to smaller lane widths if attached to a smaller
lane-width device. See Link Training: 2-Lane, 4-Lane, and 8-Lane Components, page 181
for more information.
Link Speed
The Virtex-6 FPGA Integrated Block for PCI Express allows the selection of Maximum Link
Speed supported by the device. Table 5-2 defines the lane widths and link speeds
supported by the device. Higher link speed cores are capable of training to a lower link
speed if connected to a lower link speed capable device.
Interface Frequency
It is possible to select the clock frequency of the core's user interface. Each lane width
provides multiple frequency choices: a default frequency and alternative frequencies, as
defined in Table 5-3. Where possible, Xilinx recommends using the default frequency.
Selecting the alternate frequencies does not result in a difference in throughput in the core,
but does allow the user application to run at an alternate speed.
Notes:
1. Endpoint configuration only.
The Base Address Register (BAR) screen shown in Figure 5-3 sets the base address register
space and I/O and Prefetchable Memory Base and Limit registers for the Root Port
configuration.
I/O Base and I/O Limit Registers: Root Port Configuration Only
For the Virtex-6 FPGA Integrated Block for PCI Express in the Root Port configuration, the
I/O Base and I/O Limit Registers are used to define an address range that can be used by
an implemented PCI™ to PCI Bridge to determine how to forward I/O transactions.
Prefetchable Memory Base and Prefetchable Memory Limit Registers: Root Port
Configuration Only
For the Virtex-6 FPGA Integrated Block for PCI Express in the Root Port configuration, the
Prefetchable Memory Base and Prefetchable Memory Limit Registers are used to define a
prefetchable memory address range that can be used by an implemented PCI-PCI Bridge
to determine how to forward Memory transactions.
PCI Registers
The PCI Registers Screen shown in Figure 5-4 is used to customize the IP initial values,
class code and Cardbus CIS pointer information.
ID Initial Values
• Vendor ID: Identifies the manufacturer of the device or application. Valid identifiers
are assigned by the PCI Special Interest Group to guarantee that each identifier is
unique. The default value, 10EEh, is the Vendor ID for Xilinx. Enter a vendor
identification number here. FFFFh is reserved.
• Device ID: A unique identifier for the application; the default value, which depends
on the configuration selected, is 60<link speed><link width>h. This field can be any
value; change this value for the application.
• Revision ID: Indicates the revision of the device or application; an extension of the
Device ID. The default value is 00h; enter values appropriate for the application.
Class Code
The Class Code identifies the general function of a device, and is divided into three byte-
size fields:
• Base Class: Broadly identifies the type of function performed by the device.
• Sub-Class: More specifically identifies the device function.
• Interface: Defines a specific register-level programming interface, if any, allowing
device-independent software to interface with the device.
Class code encoding can be found at www.pcisig.com.
Capabilities Register
• Capability Version: Indicates the PCI-SIG defined PCI Express capability structure
version number; this value cannot be changed.
• Device Port Type: Indicates the PCI Express logical device type.
• Slot Implemented: Indicates the PCI Express Link associated with this port is
connected to a slot. Only valid for a Root Port of a PCI Express Root Complex or a
Downstream Port of a PCI Express Switch.
• Capabilities Register: Displays the value of the Capabilities register presented by the
integrated block, and is not editable.
• Maximum Link Width: This value is set to the initial lane width specified in the first
GUI screen and is not editable.
• DLL Link Active Reporting Capability: Indicates the optional Capability of
reporting the DL_Active state of the Data Link Control and Management State
Machine.
• Link Capabilities Register: Displays the value of the Link Capabilities register sent to
the core and is not editable.
• Hot-Plug Surprise: Indicates that an adaptor in this slot might be removed from the
system without any prior notification.
• Hot-Plug Capable: Indicates that this slot is capable of supporting Hot-Plug
operations.
• MRL Sensor Present: Indicates that an MRL (Manually operated Retention Latch)
sensor is implemented for this slot on the chassis.
• Electromechanical Interlock Present: Indicates that an Electromechanical Interlock is
implemented on the chassis for this slot.
• No Command Completed Support: Indicates that the slot does not generate software
notification when an issued command is completed by the Hot-Plug Controller.
• Slot Power Limit Value: Specifies the Upper Limit on power supplied to the slot, in
combination with Slot Power Limit Scale.
• Slot Power Limit Scale: Specifies the Scale used for the Slot Power Limit value.
• Physical Slot Number: Specifies the Physical Slot Number attached to this Port. This
field must be hardware initialized to a value that assigns a slot number that is unique
within the chassis, regardless of form factor associated with this slot.
• Slot Capabilities Register: Displays the value of the Slot Capabilities Register sent to
the Core and is not editable.
Interrupt Capabilities
The Interrupt Settings screen shown in Figure 5-8 sets the Legacy Interrupt Settings, MSI
Capabilities and MSI-X Capabilities.
MSI Capabilities
• Enable MSI Capability Structure: Indicates that the MSI Capability structure exists.
• 64 bit Address Capable: Indicates that the function is capable of sending a 64-bit
Message Address.
• Multiple Message Capable: Selects the number of MSI vectors to request from the
Root Complex.
• Per Vector Masking Capable: Indicates that the function supports MSI per-vector
Masking.
MSI-X Capabilities
• Enable MSIx Capability Structure: Indicates that the MSI-X Capability structure
exists.
Note: This Capability Structure needs at least one Memory BAR to be configured.
• MSIx Table Settings: Defines the MSI-X Table Structure.
• Table Size: Specifies the MSI-X Table Size.
• Table Offset: Specifies the Offset from the Base Address Register that points to the
Base of the MSI-X Table.
• BAR Indicator: Indicates the Base Address Register in the Configuration Space that
is used to map the function’s MSI-X Table onto Memory Space. For a 64-bit Base
Address Register, this indicates the lower DWORD.
• MSIx Pending Bit Array (PBA) Settings: Defines the MSI-X Pending Bit Array (PBA)
Structure.
• PBA Offset: Specifies the Offset from the Base Address Register that points to the
Base of the MSI-X Pending Bit Array (PBA).
• PBA BAR Indicator: Indicates the Base Address Register in the Configuration
Space that is used to map the function’s MSI-X Pending Bit Array (PBA) onto
Memory Space.
• Device Specific Initialization: This bit indicates whether special initialization of this
function is required (beyond the standard PCI configuration header) before the
generic class device driver is able to use it. When selected, this option indicates that
the function requires a device specific initialization sequence following transition to
the D0 uninitialized state. See section 3.2.3 of the PCI Bus Power Management Interface
Specification Revision 1.2.
• D1 Support: When selected, this option indicates that the function supports the D1
Power Management State. See section 3.2.3 of the PCI Bus Power Management Interface
Specification Revision 1.2.
• D2 Support: When selected, this option indicates that the function supports the D2
Power Management State. See section 3.2.3 of the PCI Bus Power Management Interface
Specification Revision 1.2.
• PME Support From: When this option is selected, it indicates the power states in
which the function can assert cfg_pm_wake_n. See section 3.2.3 of the PCI Bus Power
Management Interface Specification Revision 1.2.
• No Soft Reset: Checking this box indicates that if the device transitions from D3hot to
D0 because of a Power State Command, it does not perform an internal reset and
Configuration context is preserved. This option is not supported.
Power Consumption
The Virtex-6 FPGA Integrated Block for PCI Express always reports a power budget of 0W.
For information about power consumption, see section 3.2.6 of the PCI Bus Power
Management Interface Specification Revision 1.2.
Power Dissipated
The Virtex-6 FPGA Integrated Block for PCI Express always reports a power dissipation of
0W. For information about power dissipation, see section 3.2.6 of the PCI Bus Power
Management Interface Specification Revision 1.2.
Pinout Selection
The Pinout Selection screen shown in Figure 5-11 includes options for pinouts specific to
Xilinx Development Boards and PCIe Block Location.
• Xilinx Development Boards: Selects the Xilinx Development Board to enable the
generation of Xilinx Development Board specific constraints files.
• PCIe Block Location Selection: Selects from the PCIe Blocks available to enable
generation of location specific constraint files and pinouts. When options “X0Y0 &
X0Y1” or “X0Y2 & X0Y3” are selected, constraints files for both PCIe Block locations
are generated, and the constraints file for the X0Y0 or X0Y3 location is used.
This option is not available if a Xilinx Development Board is selected.
Advanced Settings
The Advanced Settings screens shown in Figure 5-12 and Figure 5-13 include settings for
Transaction Layer, Link Layer, Physical Layer, DRP Ports, and Reference Clock Frequency
options.
Debug Ports
• PCIe DRP Ports: Checking this box enables the generation of DRP ports for the PCIe
Hard Block, giving users dynamic control over the PCIe Hard Block attributes. This
setting can be used to perform advanced debugging. Any modifications to the PCIe
default attributes must be made only if directed by Xilinx Technical Support.
Figure 6-1: [Example TLP shown in PCI Express Base Specification byte order (+0 to +3, bits 7
down to 0); figure not reproduced.]
When using the Transaction interface, packets are arranged on the entire 64-bit datapath.
Figure 6-2 shows the same example packet on the Transaction interface. Byte 0 of the
packet appears on trn_td[63:56] (outbound) or trn_rd[63:56] (inbound) of the first
QWORD, byte 1 on trn_td[55:48] or trn_rd[55:48], and so forth. Byte 8 of the packet then
appears on trn_td[63:56] or trn_rd[63:56] of the second QWORD. The Header section of the
packet consists of either three or four DWORDs, determined by the TLP format and type as
described in section 2.2 of the PCI Express Base Specification.
Figure 6-2: [Example TLP mapped onto the 64-bit Transaction interface, byte lanes [63:56]
down to [7:0]; the header fields shown include Fmt, Type, TC, TD, EP, Attr, Length,
Requester ID, Tag, Last DW BE, and 1st DW BE. Figure not reproduced.]
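To make the byte-lane ordering concrete, the fragment below packs the first 16 bytes of a
packet onto two QWORD beats exactly as described above. The tlp_byte array is a hypothetical
user-side structure holding the packet in transmission order.

  wire [7:0]  tlp_byte [0:15];             // hypothetical packet byte array
  // Byte 0 lands on trn_td[63:56] of the first QWORD, byte 7 on trn_td[7:0],
  // and byte 8 on trn_td[63:56] of the second QWORD.
  wire [63:0] first_qword  = {tlp_byte[0],  tlp_byte[1],  tlp_byte[2],  tlp_byte[3],
                              tlp_byte[4],  tlp_byte[5],  tlp_byte[6],  tlp_byte[7]};
  wire [63:0] second_qword = {tlp_byte[8],  tlp_byte[9],  tlp_byte[10], tlp_byte[11],
                              tlp_byte[12], tlp_byte[13], tlp_byte[14], tlp_byte[15]};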
Packets sent to the core for transmission must follow the formatting rules for Transaction
Layer Packets (TLPs) as specified in the “Transaction Layer Specification” chapter of the
PCI Express Base Specification. The User Application is responsible for ensuring its packets’
validity, as the core does not check packet validity or validate packets. The exact fields of a
given TLP vary depending on the type of packet being transmitted.
The core allows the User Application to add an extra level of error checking by using the
optional TLP Digest Field in the TLP header. The presence of a TLP Digest or ECRC is
indicated by the value of TD field in the TLP Header section. When TD=1, a correctly
computed CRC32 remainder is expected to be presented as the last DWORD of the packet.
The CRC32 remainder DWORD is not included in the length field of the TLP header. The
User Application must calculate and present the TLP Digest as part of the packet when
transmitting packets. Upon receiving packets with a TLP Digest present, the User
Application must check the validity of the CRC32 based on the contents of the packet. The
core does not check the TLP Digest for incoming packets. The PCI Express Base Specification
requires Advanced Error Reporting (AER) capability when implementing ECRC.
Although the integrated block does not support AER, users can still implement ECRC for
custom solutions that do not require compliance with the PCI Express Base Specification.
trn_td[63:0] and trn_trem_n is driven to 0b; otherwise, the four remaining data bytes
are presented on trn_td[63:32] and trn_trem_n is driven to 1b.
4. At the next clock cycle, the User Application deasserts trn_tsrc_rdy_n to signal the end
of valid transfers on trn_td[63:0].
Figure 6-3 illustrates a 3-DW TLP header without a data payload; an example is a 32-bit
addressable Memory Read request. When the User Application asserts trn_teof_n, it also
places a value of 1b on trn_trem_n, notifying the core that only trn_td[63:32] contains valid
data.
Figure 6-3: [Timing waveform: TLP with a 3-DW header and no data payload on the transmit
Transaction interface; not reproduced.]
Figure 6-4 illustrates a 4-DW TLP header without a data payload; an example is a 64-bit
addressable Memory Read request. When the User Application asserts trn_teof_n, it also
places a value of 0b on trn_trem_n, notifying the core that trn_td[63:0] contains valid data.
Figure 6-4: [Timing waveform: TLP with a 4-DW header and no data payload on the transmit
Transaction interface; not reproduced.]
Figure 6-5 illustrates a 3-DW TLP header with a data payload; an example is a 32-bit
addressable Memory Write request. When the User Application asserts trn_teof_n, it also
puts a value of 0b on trn_trem_n, notifying the core that trn_td[63:0] contains valid data.
The user must ensure the remainder field selected for the final data cycle creates a packet
of length equivalent to the length field in the packet header.
Figure 6-5: [Timing waveform: TLP with a 3-DW header and a data payload on the transmit
Transaction interface; not reproduced.]
Figure 6-6 illustrates a 4-DW TLP header with a data payload; an example is a 64-bit
addressable Memory Write request. When the User Application asserts trn_teof_n, it also
places a value of 1b on trn_trem_n, notifying the core that only trn_td[63:32] contains valid
data. The user must ensure that the remainder field selected for the final data cycle creates a
packet of length equivalent to the length field in the packet header.
Figure 6-6: [Timing waveform: TLP with a 4-DW header and a data payload on the transmit
Transaction interface; not reproduced.]
If the core transmit Transaction interface accepts the start of a TLP by asserting
trn_tdst_rdy_n, it is guaranteed to accept the complete TLP with a size up to the value
contained in the Max_Payload_Size field of the PCI Express Device Capability Register
(offset 04H). To remain compliant with the PCI Express Base Specification, users should not
exceed the Max_Payload_Size field of the PCI Express Device Control Register (offset 08H). The
core transmit Transaction interface deasserts trn_tdst_rdy_n only under these conditions:
• After it has accepted the TLP completely and has no buffer space available for a new
TLP.
• When the core is transmitting an internally generated TLP (configuration Completion
TLP, error Message TLP or error response as requested by the User Application on the
cfg_err interface), after it has been granted use of the transmit datapath by the User
Application, by assertion of trn_tcfg_gnt_n. The core subsequently asserts
trn_tdst_rdy_n after transmitting the internally generated TLP.
On deassertion of trn_tdst_rdy_n by the core, the User Application needs to hold all
control and data signals until the core asserts trn_tdst_rdy_n.
The core transmit Transaction interface throttles the User Application when the Power
State field in Power Management Control/Status Register (offset 0x4) of the PCI Power
Management Capability Structure is changed to a non-D0 state. When this occurs, any
ongoing TLP is accepted completely and trn_tdst_rdy_n is subsequently deasserted,
disallowing the User Application from initiating any new transactions—for the duration
that the core is in the non-D0 power state.
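The sketch below illustrates this hold requirement on the 64-bit transmit interface: the
user-side registers advance only on accepted beats, so they automatically hold their values
while trn_tdst_rdy_n is deasserted. This is a fragment of the user application, not a
complete module; the next_* and next_valid names are hypothetical and come from the user's
packet source.

  reg [63:0] trn_td;
  reg        trn_tsof_n, trn_teof_n, trn_trem_n, trn_tsrc_rdy_n;

  wire beat_accepted = ~trn_tsrc_rdy_n & ~trn_tdst_rdy_n;   // active-Low handshake

  always @(posedge trn_clk) begin
    if (~trn_reset_n) begin
      trn_tsrc_rdy_n <= 1'b1;                // nothing to send after reset
    end else if (beat_accepted || trn_tsrc_rdy_n) begin
      // Load the next beat only after the previous one is accepted (or when
      // idle); otherwise every control and data signal holds its value.
      trn_td         <= next_td;
      trn_tsof_n     <= next_tsof_n;
      trn_teof_n     <= next_teof_n;
      trn_trem_n     <= next_trem_n;
      trn_tsrc_rdy_n <= next_valid ? 1'b0 : 1'b1;
    end
  end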
Figure 6-14 shows a 3-DW TLP header without a data payload; an example is a 32-bit
addressable Memory Read request. When the core asserts trn_reof_n, it also places a value
of 1b on trn_rrem_n, notifying the user that only trn_rd[63:32] contains valid data.
Figure 6-14: [Timing waveform: reception of a TLP with a 3-DW header and no data payload;
not reproduced.]
Figure 6-15 shows a 4-DW TLP header without a data payload; an example is a 64-bit
addressable Memory Read request. When the core asserts trn_reof_n, it also places a value
of 0b on trn_rrem_n, notifying the user that trn_rd[63:0] contains valid data.
Figure 6-15: [Timing waveform: reception of a TLP with a 4-DW header and no data payload;
not reproduced.]
Figure 6-16 shows a 3-DW TLP header with a data payload; an example is a 32-bit
addressable Memory Write request. When the core asserts trn_reof_n, it also places a value
of 0b on trn_rrem_n, notifying the user that trn_rd[63:0] contains valid data.
Figure 6-16: [Timing waveform: reception of a TLP with a 3-DW header and a data payload;
not reproduced.]
Figure 6-17 shows a 4-DW TLP header with a data payload; an example is a 64-bit
addressable Memory Write request. When the core asserts trn_reof_n, it also places a value
of 1b on trn_rrem_n, notifying the user that only trn_rd[63:32] contains valid data.
Figure 6-17: [Timing waveform: reception of a TLP with a 4-DW header and a data payload;
not reproduced.]
If the User Application cannot accept back-to-back packets, it can stall the transfer of the
TLP by deasserting trn_rdst_rdy_n as discussed in the Throttling the Datapath on the
Receive Transaction Interface section. Figure 6-20 shows an example of using
trn_rdst_rdy_n to pause the acceptance of the second TLP.
Figure 6-20: [Timing waveform: the user deasserts trn_rdst_rdy_n to pause acceptance of the
second TLP; not reproduced.]
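A minimal way to exercise this throttling, assuming the user buffers incoming beats in a
FIFO with an almost-full flag (fifo_almost_full is a hypothetical name), is shown below.
Because both signals are active-Low, driving trn_rdst_rdy_n High pauses the transfer, as in
Figure 6-20.

  assign trn_rdst_rdy_n = fifo_almost_full;                  // stall the core when the buffer is nearly full
  wire   rx_beat_valid  = ~trn_rsrc_rdy_n & ~trn_rdst_rdy_n; // a beat transfers only when both are asserted (Low)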
Packet re-ordering allows the User Application to optimize the rate at which Non-Posted
TLPs are processed, while continuing to receive and process Posted and Completion TLPs
in a non-blocking fashion. The trn_rnp_ok_n signaling restrictions require that the User
Application be able to receive and buffer at least three Non-Posted TLPs. This algorithm
describes the process of managing the Non-Posted TLP buffers:
Let Non-Posted_Buffers_Available denote the amount of Non-Posted buffer space
available to the User Application; the buffer space holds three Non-Posted TLPs.
Non-Posted_Buffers_Available is decremented when a Non-Posted TLP is accepted
from the core, and is incremented when a buffered Non-Posted TLP is drained
(processed) by the User Application.
For every clock cycle do {
    if (Non-Posted_Buffers_Available <= 2) {
        if (Valid transaction Start-of-Frame accepted by user application) {
            Extract TLP Format and Type from the first TLP DW
            if (TLP type == Non-Posted) {
                Deassert trn_rnp_ok_n on the following clock cycle
                - or -
                Apply other optional user policies to stall Non-Posted transactions
            }
        }
    } else { // Non-Posted_Buffers_Available > 2
        Assert trn_rnp_ok_n on the following clock cycle
    }
}
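A simplified Verilog variant of this policy is sketched below: a counter tracks how many of
the three required Non-Posted buffers are occupied, and trn_rnp_ok_n is deasserted whenever
headroom drops to one buffer. The module name and the np_tlp_start and np_tlp_drained
signals are hypothetical; np_tlp_start must be decoded by the user from the Fmt/Type field
of the first TLP DWORD, and np_tlp_drained pulses when a buffered Non-Posted TLP is retired.

  module np_throttle (
    input  wire trn_clk,
    input  wire trn_reset_n,          // active-Low TRN reset
    input  wire trn_rsof_n,
    input  wire trn_rsrc_rdy_n,
    input  wire trn_rdst_rdy_n,
    input  wire np_tlp_start,         // hypothetical: SOF DW decodes as Non-Posted
    input  wire np_tlp_drained,       // hypothetical: one buffered NP TLP retired
    output reg  trn_rnp_ok_n
  );
    reg [1:0] np_buffers_used;        // 0 to 3 occupied Non-Posted buffers

    wire rx_sof_accepted = ~trn_rsof_n & ~trn_rsrc_rdy_n & ~trn_rdst_rdy_n;
    wire np_accepted     = rx_sof_accepted & np_tlp_start;

    always @(posedge trn_clk) begin
      if (~trn_reset_n) begin
        np_buffers_used <= 2'd0;
        trn_rnp_ok_n    <= 1'b0;      // ready to receive Non-Posted TLPs
      end else begin
        case ({np_accepted, np_tlp_drained})
          2'b10:   np_buffers_used <= np_buffers_used + 1'b1;
          2'b01:   np_buffers_used <= np_buffers_used - 1'b1;
          default: np_buffers_used <= np_buffers_used;  // idle, or one in and one out
        endcase
        // Stop requesting Non-Posted TLPs while headroom is low; resume otherwise.
        trn_rnp_ok_n <= (np_buffers_used >= 2'd2);
      end
    end
  endmodule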
If an incoming TLP is not decoded by one of the BARs (that is, a misdirected TLP), the core
drops it without presenting it to the user and automatically generates an Unsupported
Request message.
Even if the core is configured for a 64-bit BAR, the system might not always allocate a
64-bit address, in which case only one trn_rbar_hit_n[6:0] signal is asserted.
Table 6-1 illustrates mapping between trn_rbar_hit_n[6:0] and the BARs, and the
corresponding byte offsets in the core Type0 configuration header.
Table 6-1: trn_rbar_hit_n to Base Address Register Mapping
trn_rbar_hit_n Bit | BAR | Byte Offset
0 | BAR0 | 10h
1 | BAR1 | 14h
2 | BAR2 | 18h
3 | BAR3 | 1Ch
4 | BAR4 | 20h
5 | BAR5 | 24h
For a Memory or I/O TLP Transaction on the receive interface, trn_rbar_hit_n[6:0] is valid
for the entire TLP, starting with the assertion of trn_rsof_n, as shown in Figure 6-23. When
receiving non-Memory and non-I/O transactions, trn_rbar_hit_n[6:0] is undefined.
Figure 6-23: [Timing waveform: trn_rbar_hit_n[6:0] valid for the duration of a received
Memory or I/O TLP; not reproduced.]
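Because trn_rbar_hit_n[6:0] is stable for the whole TLP, the user can simply sample it on the
accepted start-of-frame beat. The sketch below records which single BAR was hit (bar_index is
a hypothetical name); note that a 64-bit BAR allocated a 64-bit address can assert two
adjacent bits, which this simplified decode does not attempt to handle.

  reg [2:0] bar_index;                      // hypothetical: which BAR was hit

  always @(posedge trn_clk) begin
    if (~trn_rsof_n && ~trn_rsrc_rdy_n && ~trn_rdst_rdy_n) begin
      case (~trn_rbar_hit_n)                // invert the active-Low one-hot bus
        7'b0000001: bar_index <= 3'd0;      // BAR0 (offset 10h)
        7'b0000010: bar_index <= 3'd1;      // BAR1 (offset 14h)
        7'b0000100: bar_index <= 3'd2;      // BAR2 (offset 18h)
        7'b0001000: bar_index <= 3'd3;      // BAR3 (offset 1Ch)
        7'b0010000: bar_index <= 3'd4;      // BAR4 (offset 20h)
        7'b0100000: bar_index <= 3'd5;      // BAR5 (offset 24h)
        default:    bar_index <= 3'd6;      // remaining bit: see Table 6-1
      endcase
    end
  end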
[Figure not reproduced: example TLP shown in PCI Express Base Specification byte order
(+0 to +3, bits 7 down to 0).]
When using the Transaction interface, packets are arranged on the entire 128-bit datapath.
Figure 6-26 shows the same example packet on the Transaction interface. Byte 0 of the
packet appears on trn_td[127:120] (outbound) or trn_rd[127:120] (inbound) of the first
DWORD, byte 1 on trn_td[119:112] or trn_rd[119:112], and so forth. The Header section of
the packet consists of either three or four DWORDs, determined by the TLP format and
type as described in section 2.2 of the PCI Express Base Specification.
Figure 6-26: [Example TLP mapped onto the 128-bit Transaction interface; figure not
reproduced.]
Packets sent to the core for transmission must follow the formatting rules for Transaction
Layer Packets (TLPs) as specified in Chapter 2 of the PCI Express Base Specification. The
User Application is responsible for ensuring its packets’ validity, as the core does not check
packet validity or validate packets. The exact fields of a given TLP vary depending on the
type of packet being transmitted.
The core allows the User Application to add an extra level of error checking by using the
optional TLP Digest Field in the TLP header. The presence of a TLP Digest or ECRC is
indicated by the value of TD field in the TLP Header section. When TD=1, a correctly
computed CRC32 remainder is expected to be presented as the last DWORD of the packet.
The CRC32 remainder DWORD is not included in the length field of the TLP header. The
User Application must calculate and present the TLP Digest as part of the packet when
transmitting packets. Upon receiving packets with a TLP Digest present, the User
Application must check the validity of the CRC32 based on the contents of the packet. The
core does not check the TLP Digest for incoming packets. The PCI Express Base Specification
requires Advanced Error Reporting (AER) capability when implementing ECRC. Although the
integrated block does not support AER, users can still implement ECRC for custom solutions
that do not require compliance with the PCI Express Base Specification.
Table 6-3 lists the possible signaling for ending a multicycle packet. If a packet ends in the
upper QW of the data bus, the next packet cannot start in the lower QW of that beat. All
packets must start in the upper QW of the data bus. trn_trem_n[1] indicates whether the
EOF occurs in the upper or lower QW of the data bus.
Figure 6-27 illustrates a 3-DW TLP header without a data payload; an example is a 32-bit
addressable Memory Read request. When the User Application asserts trn_teof_n, it also
places a value of 01b on trn_trem_n[1:0], notifying the core that only trn_td[127:32]
contains valid data.
Figure 6-27: [Timing waveform: TLP with a 3-DW header and no data payload on the 128-bit
transmit interface; not reproduced.]
Figure 6-28 illustrates a 4-DW TLP header without a data payload; an example is a 64-bit
addressable Memory Read request. When the User Application asserts trn_teof_n, it also
places a value of 00b on trn_trem_n[1:0] notifying the core that trn_td[127:0] contains valid
data and the EOF occurs in the lower QW.
Figure 6-28: [Timing waveform: TLP with a 4-DW header and no data payload on the 128-bit
transmit interface; not reproduced.]
Figure 6-29 illustrates a 3-DW TLP header with a data payload; an example is a 32-bit
addressable Memory Write request. When the User Application asserts trn_teof_n, it also
puts a value of 10b on trn_trem_n[1:0] notifying the core that trn_td[127:64] contains valid
data and the EOF occurs in the upper QW. The user must ensure the remainder field
selected for the final data cycle creates a packet of length equivalent to the length field in
the packet header.
Figure 6-30 illustrates a 4-DW TLP header with a data payload; an example is a 64-bit
addressable Memory Write request. When the User Application asserts trn_teof_n, it also
places a value of 01b on trn_trem_n[1:0], notifying the core that only trn_td[127:64]
contains valid data. The user must ensure the remainder field selected for the final data
cycle creates a packet of length equivalent to the length field in the packet header.
If the core transmit Transaction interface accepts the start of a TLP by asserting
trn_tdst_rdy_n, it is guaranteed to accept the complete TLP with a size up to the value
contained in the Max_Payload_Size field of the PCI Express Device Capability Register
(offset 04H). To remain compliant with the PCI Express Base Specification, users must not
exceed the Max_Payload_Size field of the PCI Express Device Control Register (offset 08H).
The core transmit Transaction interface deasserts trn_tdst_rdy_n only under these conditions:
• After it has accepted the TLP completely and has no buffer space available for a new
TLP.
• When the core is transmitting an internally generated TLP (configuration Completion
TLP, error Message TLP or error response as requested by the User Application on the
cfg_err interface).
On deassertion of trn_tdst_rdy_n by the core, the User Application needs to hold all
control and data signals until the core asserts trn_tdst_rdy_n.
The core transmit Transaction interface throttles the User Application when the Power
State field in Power Management Control/Status Register (offset 0x4) of the PCI Power
Management Capability Structure is changed to a non-D0 state. When this occurs, any
ongoing TLP is accepted completely and trn_tdst_rdy_n is subsequently deasserted,
disallowing the User Application from initiating any new transactions—for the duration
that the core is in the non-D0 power state.
Note: Source Driven Transaction Discontinue (assertion of trn_tsrc_dsc_n) is not supported when
in the streaming mode of operation.
Table 6-5 lists the possible signaling for ending a multicycle packet. If a packet ends in the
upper QW of the data bus, the next packet can start in the lower QW of that beat.
trn_trem_n[1] indicates whether the EOF occurs in the upper or lower QW of the data bus.
Table 6-6 lists the possible signaling for a straddled data transfer beat. A straddled data
transfer beat occurs when one packet ends in the upper QW and a new packet starts in the
lower QW of the same cycle. Straddled data transfers only occur in the receive direction. A
packet can start in the lower QW without having a packet in the upper QW.
Figure 6-38 shows a 3-DW TLP header without a data payload; an example is a 32-bit
addressable Memory Read request. When the core asserts trn_reof_n, it also places a value
of 01b on trn_rrem_n, notifying the user that only trn_rd[127:32] contains valid data.
Figure 6-39 shows a 4-DW TLP header without a data payload; an example is a 64-bit
addressable Memory Read request. When the core asserts trn_reof_n, it also places a value
of 00b on trn_rrem_n, notifying the user that trn_rd[127:0] contains valid data.
Figure 6-40 shows a 3-DW TLP header with a data payload; an example is a 32-bit
addressable Memory Write request. When the core asserts trn_reof_n, it also places a value
of 00b on trn_rrem_n, notifying the user that trn_rd[127:0] contains valid data.
Figure 6-41 shows a 4-DW TLP header with a data payload; an example is a 64-bit
addressable Memory Write request. When the core asserts trn_reof_n, it also places a value
of 11b on trn_rrem_n, notifying the user that only trn_rd[127:96] contains valid data.
If the User Application cannot accept back-to-back packets, it can stall the transfer of the
TLP by deasserting trn_rdst_rdy_n as discussed in the Throttling the Datapath on the
Receive Transaction Interface section. Figure 6-44 shows an example of using
trn_rdst_rdy_n to pause the acceptance of the second TLP.
In Figure 6-45, the first packet is a 3 DW packet with 64 bits of data and the second packet
is a 3 DW packet that begins on the lower QWORD portion of the bus. In the figure, the
assertion of trn_reof_n with trn_rrem_n[1] = 1'b1 indicates that the EOF of the previous
packet occurs in bits [127:64]. The simultaneous deassertion of trn_rrem_n[0] (1'b1)
indicates that only bits [127:96] are valid.
Packet re-ordering allows the User Application to optimize the rate at which Non-Posted
TLPs are processed, while continuing to receive and process Posted and Completion TLPs
in a non-blocking fashion. The trn_rnp_ok_n signaling restrictions require that the User
Application be able to receive and buffer at least three Non-Posted TLPs. One way of
managing the Non-Posted TLP buffers is sketched below.
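The sketch below shows one possible approach; it is an assumption, not the core's documented algorithm. It counts occupied user Non-Posted buffer slots and deasserts trn_rnp_ok_n (drives it High) while fewer than three slots remain free, so that the three-TLP buffering requirement above is always met. NP_DEPTH, np_rx_valid, and np_done are illustrative user-side items.

// Hedged sketch: tracking Non-Posted buffer occupancy and driving trn_rnp_ok_n.
// NP_DEPTH, np_rx_valid, and np_done are illustrative, user-defined items.
module np_buffer_manager #(
  parameter NP_DEPTH = 8               // user Non-Posted buffer slots (assumption)
) (
  input  wire trn_clk,
  input  wire trn_reset_n,
  input  wire np_rx_valid,             // pulses when a Non-Posted TLP is stored
  input  wire np_done,                 // pulses when a Non-Posted TLP is retired
  output reg  trn_rnp_ok_n             // 0 = OK to present Non-Posted TLPs
);
  reg [3:0] np_count;

  always @(posedge trn_clk) begin
    if (!trn_reset_n) begin
      np_count     <= 4'd0;
      trn_rnp_ok_n <= 1'b0;
    end else begin
      case ({np_rx_valid, np_done})
        2'b10:   np_count <= np_count + 4'd1;
        2'b01:   np_count <= np_count - 4'd1;
        default: np_count <= np_count;
      endcase
      // Stop further Non-Posted TLPs while fewer than three slots remain free
      // (assumption: a three-slot margin satisfies the buffering requirement).
      trn_rnp_ok_n <= (np_count >= (NP_DEPTH - 3)) ? 1'b1 : 1'b0;
    end
  end
endmodule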
If the TLP Digest bit field in the TLP header is set (TD=1), the TLP contains an End-to-End
CRC (ECRC). The core performs these operations based on how the user configured the
core during core generation:
• If the Trim TLP Digest option is on, the core removes and discards the ECRC field
from the received TLP and clears the TLP Digest bit in the TLP header.
• If the Trim TLP Digest option is off, the core does not remove the ECRC field from the
received TLP and presents the entire TLP including TLP Digest to the User
Application receiver interface.
See Chapter 5, Generating and Customizing the Core, for more information about how to
enable the Trim TLP Digest option during core generation.
trn_rbar_hit_n bit    BAR      Address
1                     BAR 1    14h
2                     BAR 2    18h
3                     BAR 3    1Ch
4                     BAR 4    20h
5                     BAR 5    24h
For a Memory or I/O TLP Transaction on the receive interface, trn_rbar_hit_n[6:0] is valid
for the entire TLP, starting with the assertion of trn_rsof_n, as shown in Figure 6-48. For
straddled data transfer beats, trn_rbar_hit_n corresponds to the new packet (packet
corresponding to the trn_rsof_n). When receiving non-Memory and non-I/O transactions,
trn_rbar_hit_n[6:0] is undefined.
Notes:
1. The TLP is indicated on the cfg_msg* interface and also appears on the trn_r* interface only if enabled in the GUI.
Transmit Buffers
The Endpoint for PCIe transmit Transaction interface provides trn_tbuf_av, an
instantaneous indication of the number of Max_Payload_Size buffers available for use in
the transmit buffer pool. Table 6-10 defines the number of transmit buffers available and
maximum supported payload size for a specific core.
Each buffer can hold one maximum sized TLP. A maximum sized TLP is a TLP with a
4-DWORD header plus a data payload equal to the MAX_PAYLOAD_SIZE of the core (as
defined in the Device Capability register) plus a TLP Digest. After the link is trained, the
root complex sets the MAX_PAYLOAD_SIZE value in the Device Control register. This
value is equal to or less than the value advertised by the core’s Device Capability register.
For more information about these registers, see section 7.8 of the PCI Express Base
Specification. A TLP is held in the transmit buffer of the core until the link partner
acknowledges receipt of the packet, at which time the buffer is released and a new TLP can
be loaded into it by the User Application.
For example, if the Capability Max Payload Size selected for the Endpoint core is 256 bytes,
and the performance level selected is high, there are 29 total transmit buffers. Each of these
buffers can hold at a maximum one 64-bit Memory Write Request (4 DWORD header) plus
256 bytes of data (64 DWORDs) plus TLP Digest (1 DWORD) for a total of 69 DWORDs.
This example assumes the root complex set the MAX_PAYLOAD_SIZE register of the
Device Control register to 256 bytes, which is the maximum capability advertised by this
core. For this reason, at any given time, this core could have 29 of these 69-DWORD TLPs
awaiting transmission. There is no sharing of buffers among multiple TLPs, so even if the
user is sending smaller TLPs, such as 32-bit Memory Read requests with no TLP Digest
(only 3 DWORDs per TLP), each transmit buffer still holds only one TLP at any time.
The internal transmit buffers are shared between the User Application and the core's
configuration management module (CMM). Due to this, the trn_tbuf_av bus can fluctuate
even if the User Application is not transmitting packets. The CMM generates completion
TLPs in response to configuration reads or writes, interrupt TLPs at the request of the User
Application, and message TLPs when needed.
The Transmit Buffers Available indication enables the User Application to completely
utilize the PCI transaction ordering feature of the core transmitter. The transaction
ordering rules allow for Posted and Completion TLPs to bypass Non-Posted TLPs. See
section 2.4 of the PCI Express Base Specification for more information about ordering rules.
The core supports the transaction ordering rules and promotes Posted and Completion
packets ahead of blocked Non-Posted TLPs. Non-Posted TLPs can become blocked if the
link partner is in a state where it momentarily has no Non-Posted receive buffers available,
which it advertises through Flow Control updates. In this case, the core promotes
Completion and Posted TLPs ahead of these blocked Non-Posted TLPs. However, this can
only occur if the Completion or Posted TLP has been loaded into the core by the User
Application. By monitoring the trn_tbuf_av bus, the User Application can ensure there is
at least one free buffer available for any Completion or Posted TLP. Promotion of
Completion and Posted TLPs only occurs when Non-Posted TLPs are blocked; otherwise
packets are sent on the link in the order they are received from the User Application.
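As an illustration of this rule, the fragment below gates the start of a new user TLP on at least one spare entry in trn_tbuf_av. The trn_tbuf_av width shown and the tx_start_* signals are assumptions.

// Sketch: allow a new user TLP only while at least one transmit buffer is free.
// Signal names other than trn_* and the trn_tbuf_av width are assumptions.
module tbuf_gate (
  input  wire       trn_clk,
  input  wire       trn_reset_n,
  input  wire [5:0] trn_tbuf_av,     // transmit buffers currently available
  input  wire       tx_start_req,    // user wants to start a new TLP
  output reg        tx_start_grant   // start only when a buffer is guaranteed
);
  always @(posedge trn_clk) begin
    if (!trn_reset_n)
      tx_start_grant <= 1'b0;
    else
      // Keeping one buffer in reserve leaves room for a Posted or Completion
      // TLP to be promoted ahead of blocked Non-Posted TLPs.
      tx_start_grant <= tx_start_req && (trn_tbuf_av > 6'd1);
  end
endmodule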
Six different types of flow control information can be read by the user application. The
trn_fc_sel[2:0] input selects the type of flow control information represented by the
trn_fc_* outputs. The Flow Control Information Types are shown in Table 6-12.
trn_fc_sel[2:0] can be changed on every clock cycle to indicate a different Flow Control
Information Type. There is a two clock-cycle delay between the value of trn_fc_sel[2:0]
changing and the corresponding Flow Control Information Type being presented on the
trn_fc_* outputs for the 64-bit interface and a four clock cycle delay for the 128-bit
interface. Figure 6-50 and Figure 6-51 illustrate the timing of the Flow Control Credits
signals for the 64-bit and 128-bit interfaces, respectively.
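The fragment below is a minimal sketch of driving trn_fc_sel[2:0] and sampling the result after the two-cycle latency of the 64-bit interface. The fc_* user-side names and the trn_fc_ph/trn_fc_pd widths shown are assumptions.

// Sketch: selecting a Flow Control Information Type and sampling the result
// after the two trn_clk cycle latency of the 64-bit interface.
module fc_sample (
  input  wire        trn_clk,
  input  wire        trn_reset_n,
  input  wire [2:0]  fc_sel_req,      // illustrative: requested information type
  input  wire [7:0]  trn_fc_ph,       // header credits for the selected type (width assumed)
  input  wire [11:0] trn_fc_pd,       // data credits for the selected type (width assumed)
  output reg  [2:0]  trn_fc_sel,
  output reg  [7:0]  fc_ph_sample,    // credits valid when fc_sample_valid = 1
  output reg  [11:0] fc_pd_sample,
  output reg         fc_sample_valid
);
  reg [2:0] sel_d1, sel_d2;           // two-cycle delayed copies of trn_fc_sel

  always @(posedge trn_clk) begin
    if (!trn_reset_n) begin
      trn_fc_sel      <= 3'b000;
      sel_d1          <= 3'b000;
      sel_d2          <= 3'b000;
      fc_sample_valid <= 1'b0;
    end else begin
      trn_fc_sel      <= fc_sel_req;
      sel_d1          <= trn_fc_sel;  // trn_fc_* reflect sel_d2 on this cycle
      sel_d2          <= sel_d1;
      fc_ph_sample    <= trn_fc_ph;
      fc_pd_sample    <= trn_fc_pd;
      // The captured sample matches the current selection once the selection
      // has been stable through the two-cycle latency.
      fc_sample_valid <= (sel_d2 == trn_fc_sel);
    end
  end
endmodule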
The output values of the trn_fc_* signals represent credit values as defined in the PCI
Express Base Specification. One Header Credit is equal to either a 3 or 4 DWORD TLP
Header and one Data Credit is equal to 16 bytes of payload data. Initial credit information
is available immediately after trn_lnk_up_n assertion, but before the reception of any TLP.
Table 6-13 defines the possible values presented on the trn_fc_* signals. Initial credit
information varies depending on the size of the receive buffers within the integrated block
and the Link Partner.
Notes:
1. Only the Transmit Credits Available Space types can indicate Negative or Infinite credits available.
[Flowchart: Directed Link Width Change. Wait until trn_lnk_up_n = 0b and pl_ltssm_state[5:0] = L0; assign target_link_width[1:0]. If target_link_width[1:0] != pl_sel_link_width[1:0] and pl_link_upcfg_capable == 1b, drive pl_directed_link_width[1:0] = target_link_width[1:0] and pl_directed_link_change[1:0] = 01b (otherwise the operation is unsupported). When pl_ltssm_state[5:0] == Configuration.Idle or trn_lnk_up_n == 1b, set pl_directed_link_change[1:0] = 00b; the change is complete.]
[Flowchart: Directed Link Speed Change. Wait until trn_lnk_up_n = 0b and pl_ltssm_state[5:0] = L0; assign target_link_speed. If target_link_speed != pl_sel_link_rate, drive pl_directed_link_speed = target_link_speed and pl_directed_link_change[1:0] = 10b. When pl_ltssm_state[5:0] == Recovery.Idle or trn_lnk_up_n == 1b, set pl_directed_link_change[1:0] = 00b; the change is complete.]
[Flowchart: Directed Link Width and Speed Change. Wait until trn_lnk_up_n = 0b and pl_ltssm_state[5:0] = L0; assign target_link_width[1:0] and target_link_speed. If both differ from the current values and pl_link_upcfg_capable == 1b, drive pl_directed_link_width[1:0], pl_directed_link_speed, and pl_directed_link_change[1:0] = 11b (otherwise the operation is unsupported). When pl_ltssm_state[5:0] == Configuration.Idle or trn_lnk_up_n == 1b, set pl_directed_link_change[1:0] = 00b; the change is complete.]
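The width-change flow above can be expressed as a small state machine. The sketch below is a simplified interpretation only; the LTSSM state encodings, the req/done handshake, and the state machine structure are assumptions, not part of the core definition.

// Hedged sketch of the directed link width change flow. The LTSSM encodings
// (L0_STATE, CFG_IDLE_STATE) and the change_req/change_done handshake are assumptions.
module directed_width_change #(
  parameter [5:0] L0_STATE       = 6'h16,  // assumed pl_ltssm_state encoding for L0
  parameter [5:0] CFG_IDLE_STATE = 6'h0D   // assumed encoding for Configuration.Idle
) (
  input  wire       trn_clk,
  input  wire       trn_reset_n,
  input  wire       trn_lnk_up_n,
  input  wire [5:0] pl_ltssm_state,
  input  wire       pl_link_upcfg_capable,
  input  wire [1:0] pl_sel_link_width,
  input  wire [1:0] target_link_width,       // illustrative user input
  input  wire       change_req,              // illustrative user strobe
  output reg  [1:0] pl_directed_link_width,
  output reg  [1:0] pl_directed_link_change,
  output reg        change_done
);
  reg directing;

  always @(posedge trn_clk) begin
    if (!trn_reset_n) begin
      directing               <= 1'b0;
      pl_directed_link_change <= 2'b00;
      pl_directed_link_width  <= 2'b00;
      change_done             <= 1'b0;
    end else if (!directing) begin
      change_done <= 1'b0;
      // Start only when the link is up in L0, the block is upconfigure capable,
      // and the requested width differs from the current width.
      if (change_req && !trn_lnk_up_n && (pl_ltssm_state == L0_STATE) &&
          pl_link_upcfg_capable && (target_link_width != pl_sel_link_width)) begin
        pl_directed_link_width  <= target_link_width;
        pl_directed_link_change <= 2'b01;      // direct a link width change
        directing               <= 1'b1;
      end
    end else begin
      // Hold the directive until the LTSSM reaches Configuration.Idle
      // (or the link goes down), then clear it.
      if ((pl_ltssm_state == CFG_IDLE_STATE) || trn_lnk_up_n) begin
        pl_directed_link_change <= 2'b00;
        change_done             <= 1'b1;
        directing               <= 1'b0;
      end
    end
  end
endmodule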
Table 6-14: Command and Status Registers Mapped to the Configuration Port
Port Name Direction Description
cfg_bus_number[7:0] Output Bus Number: Default value after reset is 00h.
Refreshed whenever a Type 0 Configuration Write
packet is received.
cfg_device_number[4:0] Output Device Number: Default value after reset is
00000b. Refreshed whenever a Type 0
Configuration Write packet is received.
cfg_function_number[2:0] Output Function Number: Function number of the core,
hardwired to 000b.
cfg_status[15:0] Output Status Register: Status register from the
Configuration Space Header. Not supported.
cfg_command[15:0] Output Command Register: Command register from the
Configuration Space Header.
cfg_dstatus[15:0] Output Device Status Register: Device status register from
the PCI Express Capability Structure.
cfg_dcommand[15:0] Output Device Command Register: Device control register
from the PCI Express Capability Structure.
cfg_dcommand2[15:0] Output Device Command 2 Register: Device control 2
register from the PCI Express Capability Structure.
cfg_lstatus[15:0] Output Link Status Register: Link status register from the
PCI Express Capability Structure.
cfg_lcommand[15:0] Output Link Command Register: Link control register
from the PCI Express Capability Structure.
cfg_status[15:0]
This output bus is not supported. If the user needs this information, it can be obtained by
reading the Configuration Space of the Virtex-6 FPGA Integrated Block for PCI Express
through the Configuration Port.
cfg_command[15:0]
This bus reflects the value stored in the Command register in the PCI Configuration Space
Header. Table 6-15 provides the definitions for each bit in this bus. See the PCI Express Base
Specification for detailed information.
The User Application must monitor the Bus Master Enable bit (cfg_command[2]) and
refrain from transmitting requests while this bit is not set. This requirement applies only to
requests; completions can be transmitted regardless of this bit.
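For example, a user transmit arbiter might gate its request path as in the minimal sketch below; the user_req_* names are illustrative.

// Sketch: block new requests (but not completions) while Bus Master Enable
// (cfg_command[2]) is clear. The user_req_* names are illustrative.
module bus_master_gate (
  input  wire        trn_clk,
  input  wire        trn_reset_n,
  input  wire [15:0] cfg_command,
  input  wire        user_req_pending,   // a request TLP is ready to send
  output reg         user_req_allowed    // arbiter may start the request TLP
);
  always @(posedge trn_clk) begin
    if (!trn_reset_n)
      user_req_allowed <= 1'b0;
    else
      user_req_allowed <= user_req_pending && cfg_command[2];
  end
endmodule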
cfg_dstatus[15:0]
This bus reflects the value stored in the Device Status register of the PCI Express
Capabilities Structure. Table 6-16 defines each bit in the cfg_dstatus bus. See the PCI
Express Base Specification for detailed information.
cfg_dcommand[15:0]
This bus reflects the value stored in the Device Control register of the PCI Express
Capabilities Structure. Table 6-17 defines each bit in the cfg_dcommand bus. See the PCI
Express Base Specification for detailed information.
Notes:
1. During L1 negotiation, the user should not trigger a link retrain by writing a 1 to cfg_lcommand[5]. L1
negotiation can be observed by monitoring the cfg_pcie_link_state_n port.
cfg_lstatus[15:0]
This bus reflects the value stored in the Link Status register in the PCI Express Capabilities
Structure. Table 6-18 defines each bit in the cfg_lstatus bus. See the PCI Express Base
Specification for details.
Table 6-18: Bit Mapping of PCI Express Link Status Register
Bit Name
cfg_lstatus[15] Link Autonomous Bandwidth Status
cfg_lstatus[14] Link Bandwidth Management Status
cfg_lstatus[13] Data Link Layer Link Active
cfg_lstatus[12] Slot Clock Configuration
cfg_lstatus[11] Link Training
cfg_lstatus[10] Reserved
cfg_lstatus[9:4] Negotiated Link Width
cfg_lstatus[3:0] Current Link Speed
cfg_lcommand[15:0]
This bus reflects the value stored in the Link Control register of the PCI Express
Capabilities Structure. Table 6-19 provides the definition of each bit in cfg_lcommand bus.
See the PCI Express Base Specification, rev. 2.0 for more details.
cfg_dcommand2[15:0]
This bus reflects the value stored in the Device Control 2 register of the PCI Express
Capabilities Structure. Table 6-20 defines each bit in the cfg_dcommand2 bus. See the PCI
Express Base Specification for detailed information.
[Waveform: Configuration register read. The User Application drives the register DWORD address onto cfg_dwaddr[9:0] and asserts cfg_rd_en_n; the core returns the read data on cfg_do[31:0] and asserts cfg_rd_wr_done_n.]
Configuration Space registers which are defined as “RW” by the PCI Local Bus
Specification and PCI Express Base Specification are writable via the Configuration
Management Interface. To write a register in this address space, the User Application
drives the register DWORD address onto cfg_dwaddr[9:0], the write data onto cfg_di[31:0],
and asserts cfg_wr_en_n. The data is further qualified by cfg_byte_en_n[3:0], which
validates the bytes of data presented on cfg_di[31:0]. These signals should be held asserted
until cfg_rd_wr_done_n is asserted. Figure 6-56 illustrates an example with two consecutive
writes to the Configuration Space, the first write with the User Application writing to all
32 bits of data, and the second write with the User Application selectively writing to only bits [23:26].
Note: Writing to the Configuration Space could have adverse system side effects. Users should
ensure these writes do not negatively impact the overall system functionality.
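A minimal sketch of one write sequence through the Configuration Management Interface is shown below; the wr_* signals are illustrative user-side names.

// Sketch: one write to a Configuration Space register via the management port.
// The wr_* names are illustrative; the cfg_* ports are those described above.
module cfg_space_write (
  input  wire        trn_clk,
  input  wire        trn_reset_n,
  input  wire        wr_start,            // user pulse to start one write
  input  wire [9:0]  wr_addr,             // register DWORD address
  input  wire [31:0] wr_data,
  input  wire [3:0]  wr_be_n,             // active-Low byte enables
  input  wire        cfg_rd_wr_done_n,    // core acknowledge (active Low)
  output reg  [9:0]  cfg_dwaddr,
  output reg  [31:0] cfg_di,
  output reg  [3:0]  cfg_byte_en_n,
  output reg         cfg_wr_en_n,
  output reg         busy
);
  always @(posedge trn_clk) begin
    if (!trn_reset_n) begin
      cfg_wr_en_n   <= 1'b1;
      cfg_byte_en_n <= 4'hF;
      busy          <= 1'b0;
    end else if (wr_start && !busy) begin
      // Drive address, data, and byte enables, assert the write enable,
      // and hold everything until the core asserts cfg_rd_wr_done_n.
      cfg_dwaddr    <= wr_addr;
      cfg_di        <= wr_data;
      cfg_byte_en_n <= wr_be_n;
      cfg_wr_en_n   <= 1'b0;
      busy          <= 1'b1;
    end else if (busy && !cfg_rd_wr_done_n) begin
      cfg_wr_en_n   <= 1'b1;
      cfg_byte_en_n <= 4'hF;
      busy          <= 1'b0;
    end
  end
endmodule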
The rest of the PCI Express Extended Configuration Space is optionally available for the
user to implement.
[Register layout, byte offset 00h: PCI Express Extended Capability ID = 000Bh (bits 15:0), Capability Version = 1h (bits 19:16), Next Capability Offset (bits 31:20).]
For example, to implement address range 0xC0 to 0xCF, there are several address ranges
defined that should be treated differently depending on the access. See Table 6-31 for more
details on this example.
Table 6-32: Min Start Addresses of the User Implemented Extended Capabilities
Capabilities selected:             No Capabilities   DSN Only   DSN and VC   DSN and VSEC   All Three
Starting byte address available:   100h              10Ch       128h         124h           140h
The Virtex-6 FPGA Integrated Block for PCI Express allows the user to select the start
address of the user implemented PCI Express Extended Configuration Space. This space
must be implemented in the User Application. The User Application is required to
return a CplD containing 0x00000000 for Configuration Reads, and a Successful Cpl for
Configuration Writes, targeting addresses in this selected range that are not implemented
in the User Application.
The user can choose to implement a Configuration Space with a start address other than
that allowed by the Virtex-6 FPGA Integrated Block for PCI Express. In such a case, the
core returns a completion with 0x00000000 for configuration accesses to the region that
the user has chosen to not implement. Table 6-33 further illustrates this scenario.
Generation of Completions
The Integrated Block core does not generate Completions for Memory Read or I/O
requests made by a remote device. The User Application is expected to service these
requests and generate the required Completions according to the rules specified in the
PCI Express Base Specification.
Error Types
The User Application triggers six types of errors using the signals defined in Table 2-21,
page 48.
• End-to-end CRC (ECRC) Error
• Unsupported Request Error
• Completion Timeout Error
• Unexpected Completion Error
• Completer Abort Error
• Correctable Error
Multiple errors can be detected in the same received packet; for example, the same packet
can be an Unsupported Request and have an ECRC error. If this happens, only one error
should be reported. Because all user-reported errors have the same severity, the User
Application design can determine which error to report. The cfg_err_posted_n signal,
combined with the appropriate error reporting signal, indicates what type of
error-reporting packets are transmitted. The user can signal only one error per clock cycle.
See Figure 6-58, Figure 6-59, and Figure 6-60, and Table 6-34 and Table 6-35.
The User Application must ensure that the device is in a D0 Power state prior to reporting
any errors via the cfg_err_ interface. The User Application can ensure this by checking that
the PMCSR PowerState (cfg_pmcsr_powerstate[1:0]) is set to 2'b00. If the
PowerState is not set to 2'b00 (the core is in a non-D0 power state) and PME_EN
(cfg_pmcsr_pme_en) is asserted (1'b1), then the user can assert (pulse) cfg_pm_wake_n
and wait for the Root to set the PMCSR PowerState bits to 2'b00. If the PowerState
(cfg_pmcsr_powerstate) is not equal to 2'b00 and PME_EN (cfg_pmcsr_pme_en) is
deasserted (1'b0), the user must wait for the Root to set the PowerState to 2'b00.
Table 6-35: Possible Error Conditions for TLPs Received by the User Application
Possible Error Condition   Error Qualifying Signal   Status
Memory Write    ✓ X N/A ✓ X    0    No
Memory Read     ✓ ✓ N/A ✓ X    1    Yes
Completion      X X N/A ✓ ✓    0    No
Notes:
1. A checkmark indicates a possible error condition for a given TLP type. For example, users can signal Unsupported Request or ECRC
Error for a Memory Write TLP, if these errors are detected. An X indicates an invalid error condition for a given TLP type. For example,
users should never signal Completer Abort in response to a Memory Write TLP.
[Figure 6-58, Figure 6-59, and Figure 6-60: error signaling waveforms. Figure 6-60: Signaling Locked Unsupported Request for Locked Non-Posted TLP]
Completion Timeouts
The Integrated Block core does not implement Completion timers; for this reason, the User
Application must track how long its pending Non-Posted Requests have each been
waiting for a Completion and trigger timeouts on them accordingly. The core has no
method of knowing when such a timeout has occurred, and for this reason does not filter
out inbound Completions for expired requests.
If a request times out, the User Application must assert cfg_err_cpl_timeout_n, which
causes an error message to be sent to the Root Complex. If a Completion is later received
after a request times out, the User Application must treat it as an Unexpected Completion.
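As a hedged illustration, the fragment below implements a single timeout counter for one outstanding Non-Posted request; TIMEOUT_CYCLES and the req_issued/cpl_received strobes are assumptions, and a real design typically keeps one timer or timestamp per outstanding tag.

// Hedged sketch: a single completion timeout counter for one outstanding
// Non-Posted request. TIMEOUT_CYCLES and the req/cpl strobes are illustrative.
module cpl_timeout #(
  parameter TIMEOUT_CYCLES = 32'd12_500_000   // example: ~50 ms at 250 MHz
) (
  input  wire trn_clk,
  input  wire trn_reset_n,
  input  wire req_issued,                     // Non-Posted request sent
  input  wire cpl_received,                   // matching Completion received
  output reg  cfg_err_cpl_timeout_n           // pulsed Low on timeout
);
  reg        pending;
  reg [31:0] timer;

  always @(posedge trn_clk) begin
    if (!trn_reset_n) begin
      pending               <= 1'b0;
      timer                 <= 32'd0;
      cfg_err_cpl_timeout_n <= 1'b1;
    end else begin
      cfg_err_cpl_timeout_n <= 1'b1;
      if (req_issued) begin
        pending <= 1'b1;
        timer   <= 32'd0;
      end else if (pending && cpl_received) begin
        pending <= 1'b0;
      end else if (pending) begin
        timer <= timer + 32'd1;
        if (timer == TIMEOUT_CYCLES) begin
          cfg_err_cpl_timeout_n <= 1'b0;      // report the timeout to the core
          pending               <= 1'b0;      // later Completions are Unexpected
        end
      end
    end
  end
endmodule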
Unexpected Completions
The Integrated Block core automatically reports Unexpected Completions in response to
inbound Completions whose Requestor ID is different than the Endpoint ID programmed
in the Configuration Space. These completions are not passed to the User Application. The
current version of the core regards an Unexpected Completion to be an Advisory
Non-Fatal Error (ANFE), and no message is sent.
Completer Abort
If the User Application is unable to transmit a normal Completion in response to a
Non-Posted Request it receives, it must signal cfg_err_cpl_abort_n. The cfg_err_posted_n
signal can also be set to 1 simultaneously to indicate Non-Posted and the appropriate
request information placed on cfg_err_tlp_cpl_header[47:0]. This sends a Completion with
non-Successful status to the original Requester, but does not send an Error Message. When
in Legacy mode if the cfg_err_locked_n signal is set to 0 (to indicate the transaction causing
the error was a locked transaction), a Completion Locked with Non-Successful status is
sent. If the cfg_err_posted_n signal is set to 0 (to indicate a Posted transaction), no
Completion is sent, but a Non-Fatal Error Message is sent (if enabled).
Unsupported Request
If the User Application receives an inbound Request it does not support or recognize, it
must assert cfg_err_ur_n to signal an Unsupported Request. The cfg_err_posted_n signal
must also be asserted or deasserted depending on whether the packet in question is a
Posted or Non-Posted Request. If the packet is Posted, a Non-Fatal Error Message is sent
out (if enabled); if the packet is Non-Posted, a Completion with a non-Successful status is
sent to the original Requester. When in Legacy mode if the cfg_err_locked_n signal is set to
0 (to indicate the transaction causing the error was a locked transaction), a Completion
Locked with Unsupported Request status is sent.
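A minimal sketch of driving the cfg_err_ur_n signaling for one decoded Unsupported Request is shown below; ur_detect, tlp_is_posted, and tlp_cpl_hdr are illustrative user-side signals, and the cfg_err_cpl_rdy_n handshake for Non-Posted errors is not shown.

// Sketch: pulse cfg_err_ur_n for one detected Unsupported Request.
// ur_detect, tlp_is_posted, and tlp_cpl_hdr are illustrative user signals.
module err_ur_signal (
  input  wire        trn_clk,
  input  wire        trn_reset_n,
  input  wire        ur_detect,               // unsupported request decoded
  input  wire        tlp_is_posted,           // 1 = Posted, 0 = Non-Posted
  input  wire [47:0] tlp_cpl_hdr,             // header info for the completion
  output reg         cfg_err_ur_n,
  output reg         cfg_err_posted_n,
  output reg  [47:0] cfg_err_tlp_cpl_header
);
  always @(posedge trn_clk) begin
    if (!trn_reset_n) begin
      cfg_err_ur_n     <= 1'b1;
      cfg_err_posted_n <= 1'b1;
    end else begin
      // Only one error may be signaled per clock cycle.
      cfg_err_ur_n           <= ur_detect ? 1'b0 : 1'b1;
      cfg_err_posted_n       <= tlp_is_posted ? 1'b0 : 1'b1;
      cfg_err_tlp_cpl_header <= tlp_cpl_hdr;
    end
  end
endmodule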
The Unsupported Request condition can occur for several reasons, including:
• An inbound Memory Write packet violates the User Application's programming
model, for example, if the User Application has been allotted a 4 KB address space but
only uses 3 KB, and the inbound packet addresses the unused portion.
Note: If this occurs on a Non-Posted Request, the User Application should use
cfg_err_cpl_abort_n to flag the error.
• An inbound packet uses a packet Type not supported by the User Application, for
example, an I/O request to a memory-only device.
ECRC Error
The Integrated Block core does not check the ECRC field for validity. If the User
Application chooses to check this field, and finds the CRC is in error, it can assert
cfg_err_ecrc_n, causing a Non-Fatal Error Message to be sent.
Power Management
The Integrated Block core supports these power management modes:
• Active State Power Management (ASPM)
• Programmed Power Management (PPM)
Implementing these power management functions as part of the PCI Express design
enables the PCI Express hierarchy to seamlessly exchange power-management messages
to save system power. All power management message identification functions are
implemented. The subsections below describe the user logic definition to support the
above modes of power management.
For additional information on ASPM and PPM implementation, see the PCI Express Base
Specification.
PPM L0 State
The L0 state represents normal operation and is transparent to the user logic. The core
reaches the L0 (active state) after a successful initialization and training of the PCI Express
Link(s) as per the protocol.
PPM L1 State
These steps outline the transition of the core to the PPM L1 state:
1. The transition to a lower power PPM L1 state is always initiated by an upstream
device, by programming the PCI Express device power state to D3-hot (or to D1 or D2
if they are supported).
2. The device power state is communicated to the user logic through the
cfg_pmcsr_powerstate[1:0] output.
3. The core then throttles/stalls the user logic from initiating any new transactions on the
user interface by deasserting trn_tdst_rdy_n. Any pending transactions on the user
interface are, however, accepted fully and can be completed later.
There are two exceptions to this rule:
• The core is configured as an Endpoint and the User Configuration Space is
enabled. In this situation, the user must refrain from sending new Request TLPs if
cfg_pmcsr_powerstate[1:0] indicates non-D0, but the user can return Completions
to Configuration transactions targeting User Configuration space.
• The core is configured as a Root Port. To be compliant in this situation, the user
should refrain from sending new Requests if cfg_pmcsr_powerstate[1:0] indicates
non-D0.
4. The core exchanges appropriate power management DLLPs with its link partner to
successfully transition the link to a lower power PPM L1 state. This action is
transparent to the user logic.
5. All user transactions are stalled for the duration of time when the device power state is
non-D0, with the exceptions indicated in step 3.
Note: The user logic, after identifying the device power state as non-D0, can initiate a request
through the cfg_pm_wake_n to the upstream link partner to configure the device back to the D0
power state. If the upstream link partner has not configured the device to allow the generation of
PM_PME messages (cfg_pmcsr_pme_en = 0), the assertion of cfg_pm_wake_n is ignored by the
core.
PPM L3 State
These steps outline the transition of the Endpoint for PCI Express to the PPM L3 state:
1. The core negotiates a transition to the L23 Ready Link State upon receiving a
PME_Turn_Off message from the upstream link partner.
2. Upon receiving a PME_Turn_Off message, the core initiates a handshake with the user
logic through cfg_to_turnoff_n (see Table 6-36) and expects a cfg_turnoff_ok_n back
from the user logic.
3. A successful handshake results in a transmission of the Power Management Turn-off
Acknowledge (PME-turnoff_ack) Message by the core to its upstream link partner.
4. The core closes all its interfaces, disables the Physical/Data-Link/Transaction layers
and is ready for removal of power to the core.
Table 6-36: Power Management Handshaking Signals
Port Name Direction Description
cfg_to_turnoff_n Output Asserted if a power-down request TLP is received from
the upstream device. After assertion, cfg_to_turnoff_n
remains asserted until the user asserts
cfg_turnoff_ok_n.
cfg_turnoff_ok_n Input Asserted by the User Application when it is safe to
power down.
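A minimal sketch of this handshake is shown below; tx_idle is an illustrative user-side condition indicating that all pending user transmissions have completed.

// Sketch: acknowledge a PME_Turn_Off request once user traffic is quiesced.
// tx_idle is an illustrative user-side condition.
module turnoff_handshake (
  input  wire trn_clk,
  input  wire trn_reset_n,
  input  wire cfg_to_turnoff_n,   // core requests power-down (active Low)
  input  wire tx_idle,            // user has no outstanding transmissions
  output reg  cfg_turnoff_ok_n    // user acknowledge (active Low)
);
  always @(posedge trn_clk) begin
    if (!trn_reset_n)
      cfg_turnoff_ok_n <= 1'b1;
    else if (!cfg_to_turnoff_n && tx_idle)
      cfg_turnoff_ok_n <= 1'b0;   // safe to power down
    else
      cfg_turnoff_ok_n <= 1'b1;
  end
endmodule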
The MSI Enable bit in the MSI control register, the MSI-X Enable bit in the MSI-X Control
Register, and the Interrupt Disable bit in the PCI Command register are programmed by
the Root Complex. The User Application has no direct control over these bits.
The Internal Interrupt Controller in the Virtex-6 FPGA Integrated Block for PCI Express
core only generates Legacy Interrupts and MSI Interrupts. MSI-X Interrupts need to be
generated by the User Application and presented on the TRN TX Interface.
The User Application requests interrupt service in one of two ways, each of which is
described below.
MSI Mode
• As shown in Figure 6-62, the User Application first asserts cfg_interrupt_n.
Additionally the User Application supplies a value on cfg_interrupt_di[7:0] if
Multi-Vector MSI is enabled (see below).
• The core asserts cfg_interrupt_rdy_n to signal that the interrupt has been accepted
and the core sends an MSI Memory Write TLP. On the following clock cycle, the User
Application deasserts cfg_interrupt_n if no further interrupts are to be sent.
The MSI request is either a 32-bit addressable Memory Write TLP or a 64-bit addressable
Memory Write TLP. The address is taken from the Message Address and Message Upper
Address fields of the MSI Capability Structure, while the payload is taken from the
Message Data field. These values are programmed by system software through
configuration writes to the MSI Capability structure. When the core is configured for
Multi-Vector MSI, system software can permit Multi-Vector MSI messages by
programming a non-zero value to the Multiple Message Enable field.
The type of MSI TLP sent (32-bit addressable or 64-bit addressable) depends on the value
of the Upper Address field in the MSI capability structure. By default, MSI messages are
sent as 32-bit addressable Memory Write TLPs. MSI messages use 64-bit addressable
Memory Write TLPs only if the system software programs a non-zero value into the Upper
Address register.
When Multi-Vector MSI messages are enabled, the User Application can override one or
more of the lower-order bits in the Message Data field of each transmitted MSI TLP to
differentiate between the various MSI messages sent upstream. The number of lower-order
bits in the Message Data field available to the User Application is determined by the lesser
of the value of the Multiple Message Capable field, as set in the CORE Generator software,
and the Multiple Message Enable field, as set by system software.
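A minimal sketch of the MSI request handshake described above is shown below; msi_req and msi_vector are illustrative user-side signals.

// Sketch: request one MSI interrupt and hold until the core accepts it.
// msi_req and msi_vector are illustrative user-side signals.
module msi_request (
  input  wire       trn_clk,
  input  wire       trn_reset_n,
  input  wire       cfg_interrupt_msienable,  // MSI enabled by the Root Complex
  input  wire       cfg_interrupt_rdy_n,      // core accepted the interrupt
  input  wire       msi_req,                  // user pulse requesting an MSI
  input  wire [7:0] msi_vector,               // low-order Message Data bits
  output reg        cfg_interrupt_n,
  output reg  [7:0] cfg_interrupt_di
);
  always @(posedge trn_clk) begin
    if (!trn_reset_n) begin
      cfg_interrupt_n  <= 1'b1;
      cfg_interrupt_di <= 8'h00;
    end else if (cfg_interrupt_n && msi_req && cfg_interrupt_msienable) begin
      cfg_interrupt_n  <= 1'b0;       // assert the request
      cfg_interrupt_di <= msi_vector; // vector used when Multi-Vector MSI is enabled
    end else if (!cfg_interrupt_n && !cfg_interrupt_rdy_n) begin
      cfg_interrupt_n  <= 1'b1;       // accepted; deassert on the next cycle
    end
  end
endmodule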
MSI-X Mode
The Virtex-6 FPGA Integrated Block for PCI Express optionally supports the MSI-X
Capability Structure. The MSI-X vector table and the MSI-X Pending Bit Array need to be
implemented as part of the user’s logic, by claiming a BAR aperture.
If the cfg_interrupt_msixenable output of the core is asserted, the User Application should
compose and present the MSI-X interrupts on the TRN TX Interface.
When the 8-lane core is connected to a device that only implements 4 lanes, it trains and
operates as a 4-lane device using lanes 0-3. Additionally, if the connected device only
implements 1 or 2 lanes, the 8-lane core trains and operates as a 1- or 2-lane device.
Figure 6-63: Scaling of 4-Lane Endpoint Core from 4-Lane to 1-Lane Operation
Lane Reversal
The integrated Endpoint block supports limited lane reversal capabilities and therefore
provides flexibility in the design of the board for the link partner. The link partner can
choose to lay out the board with reversed lane numbers and the integrated Endpoint block
continues to link train successfully and operate normally. The configurations that have
lane reversal support are x8 and x4 (excluding downshift modes). Downshift refers to the
link width negotiation process that occurs when link partners have different lane width
capabilities advertised. As a result of lane width negotiation, the link partners negotiate
down to the smaller of the two advertised lane widths. Table 6-40 describes the several
possible combinations including downshift modes and availability of lane reversal
support.
Reset
The Virtex-6 FPGA Integrated Block for PCI Express core uses sys_reset_n, an
asynchronous, active-Low reset signal asserted during the PCI Express Fundamental
Reset, to reset the system. Asserting this signal causes a hard reset of the entire core, including
the GTX transceivers. After the reset is released, the core attempts to link train and resume
normal operation. In a typical endpoint application, for example, an add-in card, a
sideband reset signal is normally present and should be connected to sys_reset_n. For
Endpoint applications that do not have a sideband system reset signal, the initial hardware
reset should be generated locally. Three reset events can occur in PCI Express:
• Cold Reset. A Fundamental Reset that occurs at the application of power. The signal
sys_reset_n is asserted to cause the cold reset of the core.
• Warm Reset. A Fundamental Reset triggered by hardware without the removal and
re-application of power. The sys_reset_n signal is asserted to cause the warm reset to
the core.
• Hot Reset: In-band propagation of a reset across the PCI Express Link through the
protocol. In this case, sys_reset_n is not used. In the case of Hot Reset, the
received_hot_reset signal is asserted to indicate the source of the reset.
The User Application interface of the core has an output signal called trn_reset_n. This
signal is deasserted synchronously with respect to trn_clk. trn_reset_n is asserted as a
result of any of these conditions:
• Fundamental Reset: Occurs (cold or warm) due to assertion of sys_reset_n.
• PLL within the core wrapper: Loses lock, indicating an issue with the stability of the
clock input.
• Loss of Transceiver PLL Lock: Any transceiver loses lock, indicating an issue with the
PCI Express Link.
The trn_reset_n signal deasserts synchronously with trn_clk after all of the above reasons
are resolved, allowing the core to attempt to train and resume normal operation.
Important Note: Systems designed to the PCI Express electro-mechanical specification
provide a sideband reset signal, which uses 3.3V signaling levels—see the FPGA device
data sheet to understand the requirements for interfacing to such signals.
Clocking
The Integrated Block input system clock signal is called sys_clk. The core requires a
100 MHz, 125 MHz, or 250 MHz clock input. The clock frequency used must match the
clock frequency selection in the CORE Generator software GUI. For more information, see
Answer Record 18329.
In a typical PCI Express solution, the PCI Express reference clock is a Spread Spectrum
Clock (SSC), provided at 100 MHz. In most commercial PCI Express systems, SSC cannot
be disabled. For more information regarding SSC and PCI Express, see section 4.3.1.1.1 of
the PCI Express Base Specification.
Figure 6-64 through Figure 6-67 illustrate high-level representations of the board
layouts. Designers must ensure that proper coupling, termination, and so forth are
used when laying out the board.
[Figure 6-64 and Figure 6-65: reference clock topologies, showing the 100 MHz PCI Express clock (with SSC) and a 100 MHz clock oscillator driving the Virtex-6 FPGA Endpoint GTX transceivers through the PCI Express connector and PCIe link.]
Figure 6-66: Open System Add-In Card Using 100 MHz Reference Clock
Figure 6-67: Open System Add-In Card Using 125/250 MHz Reference Clock (external PLL generating the 125/250 MHz reference for the GTX transceivers)
[Waveforms: DRP write and read cycles, with drp_den, drp_dwe, drp_daddr[8:0], drp_di[15:0], and drp_do[15:0] qualified by drp_drdy and clocked by drp_clk.]
Table 6-41: DRP Address Map for PCIE_2_0 Library Element Attributes
Attribute Name | Address (drp_daddr[8:0]) | Data Bits (drp_di[15:0] or drp_do[15:0])
AER_CAP_ECRC_CHECK_CAPABLE 0x000 [0]
AER_CAP_ECRC_GEN_CAPABLE 0x000 [1]
AER_CAP_ID[15:0] 0x001 [15:0]
AER_CAP_INT_MSG_NUM_MSI[4:0] 0x002 [4:0]
AER_CAP_INT_MSG_NUM_MSIX[4:0] 0x002 [9:5]
AER_CAP_PERMIT_ROOTERR_UPDATE 0x002 [10]
AER_CAP_VERSION[3:0] 0x002 [14:11]
AER_BASE_PTR[11:0] 0x003 [11:0]
AER_CAP_NEXTPTR[11:0] 0x004 [11:0]
AER_CAP_ON 0x004 [12]
BAR0[15:0] 0x005 [15:0]
BAR0[31:16] 0x006 [15:0]
BAR1[15:0] 0x007 [15:0]
BAR1[31:16] 0x008 [15:0]
BAR2[15:0] 0x009 [15:0]
BAR2[31:16] 0x00a [15:0]
BAR3[15:0] 0x00b [15:0]
BAR3[31:16] 0x00c [15:0]
BAR4[15:0] 0x00d [15:0]
BAR4[31:16] 0x00e [15:0]
BAR5[15:0] 0x00f [15:0]
BAR5[31:16] 0x010 [15:0]
EXPANSION_ROM[15:0] 0x011 [15:0]
EXPANSION_ROM[31:16] 0x012 [15:0]
CAPABILITIES_PTR[7:0] 0x013 [7:0]
CARDBUS_CIS_POINTER[15:0] 0x014 [15:0]
CARDBUS_CIS_POINTER[31:16] 0x015 [15:0]
CLASS_CODE[15:0] 0x016 [15:0]
CLASS_CODE[23:16] 0x017 [7:0]
CMD_INTX_IMPLEMENTED 0x017 [8]
CPL_TIMEOUT_DISABLE_SUPPORTED 0x017 [9]
CPL_TIMEOUT_RANGES_SUPPORTED[3:0] 0x017 [13:10]
DEV_CAP_ENABLE_SLOT_PWR_LIMIT_SCALE 0x017 [14]
DEV_CAP_ENABLE_SLOT_PWR_LIMIT_VALUE 0x017 [15]
DEV_CAP_ENDPOINT_L0S_LATENCY[2:0] 0x018 [2:0]
DEV_CAP_ENDPOINT_L1_LATENCY[2:0] 0x018 [5:3]
DEV_CAP_EXT_TAG_SUPPORTED 0x018 [6]
DEV_CAP_FUNCTION_LEVEL_RESET_CAPABLE 0x018 [7]
DEV_CAP_MAX_PAYLOAD_SUPPORTED[2:0] 0x018 [10:8]
DEV_CAP_PHANTOM_FUNCTIONS_SUPPORT[1:0] 0x018 [12:11]
DEV_CAP_ROLE_BASED_ERROR 0x018 [13]
DEV_CAP_RSVD_14_12[2:0] 0x019 [2:0]
DEV_CAP_RSVD_17_16[1:0] 0x019 [4:3]
DEV_CAP_RSVD_31_29[2:0] 0x019 [7:5]
DEV_CONTROL_AUX_POWER_SUPPORTED 0x019 [8]
DEVICE_ID[15:0] 0x01a [15:0]
DSN_BASE_PTR[11:0] 0x01b [11:0]
DSN_CAP_ID[15:0] 0x01c [15:0]
DSN_CAP_NEXTPTR[11:0] 0x01d [11:0]
DSN_CAP_ON 0x01d [12]
DSN_CAP_VERSION[3:0] 0x01e [3:0]
EXT_CFG_CAP_PTR[5:0] 0x01e [9:4]
EXT_CFG_XP_CAP_PTR[9:0] 0x01f [9:0]
HEADER_TYPE[7:0] 0x020 [7:0]
INTERRUPT_PIN[7:0] 0x020 [15:8]
IS_SWITCH 0x021 [0]
LAST_CONFIG_DWORD[9:0] 0x021 [10:1]
LINK_CAP_ASPM_SUPPORT[1:0] 0x021 [12:11]
LINK_CAP_CLOCK_POWER_MANAGEMENT 0x021 [13]
LINK_CAP_DLL_LINK_ACTIVE_REPORTING_CAP 0x021 [14]
LINK_CAP_L0S_EXIT_LATENCY_COMCLK_GEN1[2:0] 0x022 [2:0]
LINK_CAP_L0S_EXIT_LATENCY_COMCLK_GEN2[2:0] 0x022 [5:3]
LINK_CAP_L0S_EXIT_LATENCY_GEN1[2:0] 0x022 [8:6]
LINK_CAP_L0S_EXIT_LATENCY_GEN2[2:0] 0x022 [11:9]
LINK_CAP_L1_EXIT_LATENCY_COMCLK_GEN1[2:0] 0x022 [14:12]
LINK_CAP_L1_EXIT_LATENCY_COMCLK_GEN2[2:0] 0x023 [2:0]
LINK_CAP_L1_EXIT_LATENCY_GEN1[2:0] 0x023 [5:3]
LINK_CAP_L1_EXIT_LATENCY_GEN2[2:0] 0x023 [8:6]
LINK_CAP_LINK_BANDWIDTH_NOTIFICATION_CAP 0x023 [9]
LINK_CAP_MAX_LINK_SPEED[3:0] 0x023 [13:10]
LINK_CAP_RSVD_23_22[1:0] 0x023 [15:14]
LINK_CAP_SURPRISE_DOWN_ERROR_CAPABLE 0x024 [0]
LINK_CONTROL_RCB 0x024 [1]
LINK_CTRL2_DEEMPHASIS 0x024 [2]
LINK_CTRL2_HW_AUTONOMOUS_SPEED_DISABLE 0x024 [3]
LINK_CTRL2_TARGET_LINK_SPEED[3:0] 0x024 [7:4]
LINK_STATUS_SLOT_CLOCK_CONFIG 0x024 [8]
MSI_BASE_PTR[7:0] 0x025 [7:0]
MSI_CAP_64_BIT_ADDR_CAPABLE 0x025 [8]
MSI_CAP_ID[7:0] 0x026 [7:0]
MSI_CAP_MULTIMSG_EXTENSION 0x026 [8]
MSI_CAP_MULTIMSGCAP[2:0] 0x026 [11:9]
MSI_CAP_NEXTPTR[7:0] 0x027 [7:0]
MSI_CAP_ON 0x027 [8]
MSI_CAP_PER_VECTOR_MASKING_CAPABLE 0x027 [9]
MSIX_BASE_PTR[7:0] 0x028 [7:0]
MSIX_CAP_ID[7:0] 0x028 [15:8]
MSIX_CAP_NEXTPTR[7:0] 0x029 [7:0]
MSIX_CAP_ON 0x029 [8]
MSIX_CAP_PBA_BIR[2:0] 0x029 [11:9]
MSIX_CAP_PBA_OFFSET[15:0] 0x02a [15:0]
MSIX_CAP_PBA_OFFSET[28:16] 0x02b [12:0]
MSIX_CAP_TABLE_BIR[2:0] 0x02b [15:13]
MSIX_CAP_TABLE_OFFSET[15:0] 0x02c [15:0]
MSIX_CAP_TABLE_OFFSET[28:16] 0x02d [12:0]
MSIX_CAP_TABLE_SIZE[10:0] 0x02e [10:0]
PCIE_BASE_PTR[7:0] 0x02f [7:0]
PCIE_CAP_CAPABILITY_ID[7:0] 0x02f [15:8]
PCIE_CAP_CAPABILITY_VERSION[3:0] 0x030 [3:0]
PCIE_CAP_DEVICE_PORT_TYPE[3:0] 0x030 [7:4]
PCIE_CAP_INT_MSG_NUM[4:0] 0x030 [12:8]
PCIE_CAP_NEXTPTR[7:0] 0x031 [7:0]
PCIE_CAP_ON 0x031 [8]
PCIE_CAP_RSVD_15_14[1:0] 0x031 [10:9]
PCIE_CAP_SLOT_IMPLEMENTED 0x031 [11]
PCIE_REVISION[3:0] 0x031 [15:12]
PM_BASE_PTR[7:0] 0x032 [7:0]
PM_CAP_AUXCURRENT[2:0] 0x032 [10:8]
PM_CAP_D1SUPPORT 0x032 [11]
PM_CAP_D2SUPPORT 0x032 [12]
PM_CAP_DSI 0x032 [13]
PM_CAP_ID[7:0] 0x033 [7:0]
PM_CAP_NEXTPTR[7:0] 0x033 [15:8]
PM_CAP_ON 0x034 [0]
PM_CAP_PME_CLOCK 0x034 [1]
PM_CAP_PMESUPPORT[4:0] 0x034 [6:2]
PM_CAP_RSVD_04 0x034 [7]
PM_CAP_VERSION[2:0] 0x034 [10:8]
PM_CSR_B2B3 0x034 [11]
PM_CSR_BPCCEN 0x034 [12]
PM_CSR_NOSOFTRST 0x034 [13]
PM_DATA_SCALE0[1:0] 0x034 [15:14]
PM_DATA_SCALE1[1:0] 0x035 [1:0]
PM_DATA_SCALE2[1:0] 0x035 [3:2]
PM_DATA_SCALE3[1:0] 0x035 [5:4]
PM_DATA_SCALE4[1:0] 0x035 [7:6]
PM_DATA_SCALE5[1:0] 0x035 [9:8]
PM_DATA_SCALE6[1:0] 0x035 [11:10]
PM_DATA_SCALE7[1:0] 0x035 [13:12]
PM_DATA0[7:0] 0x036 [7:0]
PM_DATA1[7:0] 0x036 [15:8]
PM_DATA2[7:0] 0x037 [7:0]
PM_DATA3[7:0] 0x037 [15:8]
PM_DATA4[7:0] 0x038 [7:0]
PM_DATA5[7:0] 0x038 [15:8]
PM_DATA6[7:0] 0x039 [7:0]
PM_DATA7[7:0] 0x039 [15:8]
REVISION_ID[7:0] 0x03a [7:0]
ROOT_CAP_CRS_SW_VISIBILITY 0x03a [8]
SELECT_DLL_IF 0x03a [9]
SLOT_CAP_ATT_BUTTON_PRESENT 0x03a [10]
SLOT_CAP_ATT_INDICATOR_PRESENT 0x03a [11]
SLOT_CAP_ELEC_INTERLOCK_PRESENT 0x03a [12]
SLOT_CAP_HOTPLUG_CAPABLE 0x03a [13]
SLOT_CAP_HOTPLUG_SURPRISE 0x03a [14]
SLOT_CAP_MRL_SENSOR_PRESENT 0x03a [15]
SLOT_CAP_NO_CMD_COMPLETED_SUPPORT 0x03b [0]
SLOT_CAP_PHYSICAL_SLOT_NUM[12:0] 0x03b [13:1]
SLOT_CAP_POWER_CONTROLLER_PRESENT 0x03b [14]
SLOT_CAP_POWER_INDICATOR_PRESENT 0x03b [15]
SLOT_CAP_SLOT_POWER_LIMIT_SCALE[1:0] 0x03c [1:0]
SLOT_CAP_SLOT_POWER_LIMIT_VALUE[7:0] 0x03c [9:2]
SUBSYSTEM_ID[15:0] 0x03d [15:0]
SUBSYSTEM_VENDOR_ID[15:0] 0x03e [15:0]
VC_BASE_PTR[11:0] 0x03f [11:0]
VC_CAP_NEXTPTR[11:0] 0x040 [11:0]
VC_CAP_ON 0x040 [12]
VC_CAP_ID[15:0] 0x041 [15:0]
VC_CAP_REJECT_SNOOP_TRANSACTIONS 0x042 [0]
VENDOR_ID[15:0] 0x043 [15:0]
VSEC_BASE_PTR[11:0] 0x044 [11:0]
VSEC_CAP_HDR_ID[15:0] 0x045 [15:0]
VSEC_CAP_HDR_LENGTH[11:0] 0x046 [11:0]
VSEC_CAP_HDR_REVISION[3:0] 0x046 [15:12]
VSEC_CAP_ID[15:0] 0x047 [15:0]
VSEC_CAP_IS_LINK_VISIBLE 0x048 [0]
VSEC_CAP_NEXTPTR[11:0] 0x048 [12:1]
VSEC_CAP_ON 0x048 [13]
VSEC_CAP_VERSION[3:0] 0x049 [3:0]
USER_CLK_FREQ[2:0] 0x049 [6:4]
CRM_MODULE_RSTS[6:0] 0x049 [13:7]
LL_ACK_TIMEOUT[14:0] 0x04a [14:0]
LL_ACK_TIMEOUT_EN 0x04a [15]
LL_ACK_TIMEOUT_FUNC[1:0] 0x04b [1:0]
LL_REPLAY_TIMEOUT[14:0] 0x04c [14:0]
LL_REPLAY_TIMEOUT_EN 0x04c [15]
LL_REPLAY_TIMEOUT_FUNC[1:0] 0x04d [1:0]
DISABLE_LANE_REVERSAL 0x04d [2]
DISABLE_SCRAMBLING 0x04d [3]
ENTER_RVRY_EI_L0 0x04d [4]
INFER_EI[4:0] 0x04d [9:5]
LINK_CAP_MAX_LINK_WIDTH[5:0] 0x04d [15:10]
LTSSM_MAX_LINK_WIDTH[5:0] 0x04e [5:0]
N_FTS_COMCLK_GEN1[7:0] 0x04e [13:6]
N_FTS_COMCLK_GEN2[7:0] 0x04f [7:0]
N_FTS_GEN1[7:0] 0x04f [15:8]
N_FTS_GEN2[7:0] 0x050 [7:0]
ALLOW_X8_GEN2 0x050 [8]
PL_AUTO_CONFIG[2:0] 0x050 [11:9]
PL_FAST_TRAIN 0x050 [12]
UPCONFIG_CAPABLE 0x050 [13]
UPSTREAM_FACING 0x050 [14]
EXIT_LOOPBACK_ON_EI 0x050 [15]
DNSTREAM_LINK_NUM[7:0] 0x051 [7:0]
DISABLE_ASPM_L1_TIMER 0x051 [8]
DISABLE_BAR_FILTERING 0x051 [9]
DISABLE_ID_CHECK 0x051 [10]
DISABLE_RX_TC_FILTER 0x051 [11]
ENABLE_MSG_ROUTE[10:0] 0x052 [10:0]
ENABLE_RX_TD_ECRC_TRIM 0x052 [11]
TL_RX_RAM_RADDR_LATENCY 0x052 [12]
TL_RX_RAM_RDATA_LATENCY[1:0] 0x052 [14:13]
TL_RX_RAM_WRITE_LATENCY 0x052 [15]
TL_TFC_DISABLE 0x053 [0]
TL_TX_CHECKS_DISABLE 0x053 [1]
TL_RBYPASS 0x053 [2]
TL_TX_RAM_RADDR_LATENCY 0x053 [3]
TL_TX_RAM_RDATA_LATENCY[1:0] 0x053 [5:4]
TL_TX_RAM_WRITE_LATENCY 0x053 [6]
VC_CAP_VERSION[3:0] 0x053 [10:7]
VC0_CPL_INFINITE 0x053 [11]
VC0_RX_RAM_LIMIT[12:0] 0x054 [12:0]
VC0_TOTAL_CREDITS_CD[10:0] 0x055 [10:0]
VC0_TOTAL_CREDITS_CH[6:0] 0x056 [6:0]
VC0_TOTAL_CREDITS_NPH[6:0] 0x056 [13:7]
VC0_TOTAL_CREDITS_PD[10:0] 0x057 [10:0]
VC0_TOTAL_CREDITS_PH[6:0] 0x058 [6:0]
VC0_TX_LASTPACKET[4:0] 0x058 [11:7]
RECRC_CHK[1:0] 0x058 [13:12]
RECRC_CHK_TRIM 0x058 [14]
UR_INV_REQ 0x058 [15]
PGL0_LANE[2:0] 0x059 [2:0]
PGL1_LANE[2:0] 0x059 [5:3]
PGL2_LANE[2:0] 0x059 [8:6]
PGL3_LANE[2:0] 0x059 [11:9]
PGL4_LANE[2:0] 0x059 [14:12]
PGL5_LANE[2:0] 0x05a [2:0]
PGL6_LANE[2:0] 0x05a [5:3]
PGL7_LANE[2:0] 0x05a [8:6]
TEST_MODE_PIN_CHAR 0x05a [9]
Core Constraints
The Virtex®-6 FPGA Integrated Block for PCI Express® solution requires the specification
of timing and other physical implementation constraints to meet specified performance
requirements for PCI Express. These constraints are provided with the Endpoint and Root
Port solutions in a User Constraints File (UCF). Pinouts and hierarchy names in the
generated UCF correspond to the provided example design.
To achieve consistent implementation results, a UCF containing these original, unmodified
constraints must be used when a design is run through the Xilinx tools. For additional
details on the definition and use of a UCF or specific constraints, see the Xilinx® Libraries
Guide and/or Development System Reference Guide.
Constraints provided with the Integrated Block solution have been tested in hardware and
provide consistent results. Constraints can be modified, but modifications should only be
made with a thorough understanding of the effect of each constraint. Additionally, support
is not provided for designs that deviate from the provided constraints.
Required Modifications
Several constraints provided in the UCF utilize hierarchical paths to elements within the
Integrated Block. These constraints assume an instance name of core for the core. If a
different instance name is used, replace core with the actual instance name in all
hierarchical constraints.
For example:
Using xilinx_pcie_ep as the instance name, the physical constraint
INST "core/pcie_2_0_i/pcie_gt_i/gtx_v6_i/GTXD[0].GTX"
LOC = GTXE1_X0Y15;
becomes
INST "xilinx_pci_ep/pcie_2_0_i/pcie_gt_i/gtx_v6_i/GTXD[0].GTX"
LOC = GTXE1_X0Y15;
The provided UCF includes blank sections for constraining user-implemented logic. While
the constraints provided adequately constrain the Integrated Block core itself, they cannot
adequately constrain user-implemented logic interfaced to the core. Additional constraints
must be implemented by the designer.
Device Selection
The device selection portion of the UCF informs the implementation tools which part,
package, and speed grade to target for the design. Because Integrated Block cores are
designed for specific part and package combinations, this section should not be modified
by the designer.
The device selection section always contains a part selection line, but can also contain part
or package-specific options. An example part selection line:
CONFIG PART = XC6VLX240T-FF1156-1
Notes:
1. 8-lane Endpoint configuration at 5.0 Gb/s with 512 byte MPS is not supported on this PCIe block
location.
2. High performance level for MPS settings 512 bytes and 1024 bytes is not supported for this PCIe Block
location on this device.
3. 8-lane configuration is not supported for this PCIe block location on this device.
FPGA Configuration
This chapter discusses how to configure the Virtex®-6 FPGA so that the device can link up
and be recognized by the system. This information is provided for the user to choose the
correct FPGA configuration method for the system and verify that it will work as expected.
This chapter discusses how specific requirements of the PCI Express Base Specification and
PCI Express Card Electromechanical Specification apply to FPGA configuration. Where
appropriate, Xilinx recommends that the user read the actual specifications for detailed
information. This chapter is divided into four sections:
• Configuration Terminology: Defines terms used in this chapter.
• Configuration Access Time. Several specification items govern when an Endpoint
device needs to be ready to receive configuration accesses from the host (Root
Complex).
• Board Power in Real-World Systems. Understanding real-world system constraints
related to board power and how they affect the specification requirements.
• Recommendations. Describes methods for FPGA configuration and includes sample
problem analysis for FPGA configuration timing issues.
Configuration Terminology
In this chapter, these terms are used to differentiate between FPGA configuration and
configuration of the PCI Express device:
• Configuration of the FPGA: the term FPGA configuration is used.
• Configuration of the PCI Express device: after the link is active, the term configuration is used.
• If a device is not ready and does not respond to configuration requests, the root
complex does not discover it and treats it as non-existent.
• The operating system will not report the device's existence and the user's application
will not be able to communicate with the device.
Choosing the appropriate FPGA configuration method is key to ensuring the device is able
to communicate with the system in time to achieve link up and respond to the
configuration accesses.
[Figure 8-1: Power Stable, 3.3 Vaux, 3.3V/12V, and PERST# timing. PERST# remains asserted for TPVPERL (100 ms minimum) after power becomes stable.]
Section 2.6.2 of the PCI Express Card Electromechanical Specification, v1.1 defines TPVPERL as
a minimum of 100 ms, indicating that from the time power is stable the system reset is
asserted for at least 100 ms (as shown in Table 8-1).
From Figure 8-1 and Table 8-1, it is possible to obtain a simple equation to define the FPGA
configuration time as follows:
FPGA Configuration Time ≤ TPWRVLD + TPVPERL Equation 8-1
Given that TPVPERL is defined as 100 ms minimum, this becomes:
FPGA Configuration Time ≤ TPWRVLD + 100 ms Equation 8-2
Note: Although TPWRVLD is included in Equation 8-2, it has yet to be defined in this discussion
because it depends on the type of system in use. The Board Power in Real-World Systems section
defines TPWRVLD for both ATX-based and non ATX-based systems.
FPGA configuration time is only relevant at cold boot; subsequent warm or hot resets do
not cause reconfiguration of the FPGA. If the design appears to be having problems due to
FPGA configuration, the user should issue a warm reset as a simple test, which resets the
system, including the PCI Express link, but keeps the board powered. If the problem does
not appear, the issue could be FPGA configuration time related.
Figure 8-2: ATX Power Supply
T1 = Power On Time (T1 < 500 ms)
T2 = Rise time (0.1 ms <= T2 <= 20 ms)
T3 = PWR_OK Delay (100 ms < T3 < 500 ms)
T4 = PWR_OK rise time (T4 <= 10 ms)
Figure 8-2 shows that power is actually valid before PWR_OK is asserted High. This is
represented by T3 and is the PWR_OK delay. The ATX 12V Power Supply Design Guide
defines PWR_OK as 100 ms < T3 < 500 ms, indicating the following: From the point at
which the power level reaches 95% of nominal, there is a minimum of at least 100 ms but
no more than 500 ms of delay before PWR_OK is asserted. Remember, according to the PCI
Express Card Electromechanical Specification, the PERST# is guaranteed to be asserted a
minimum of 100 ms from when power is stable indicated in an ATX system by the
assertion of PWR_OK.
Again, the FPGA configuration time equation is:
FPGA Configuration Time ≤ TPWRVLD + 100 ms Equation 8-3
TPWRVLD is defined as PWR_OK delay period; that is, TPWRVLD represents the amount of
time that power is valid in the system before PWR_OK is asserted. This time can be added
to the amount of time the FPGA has to configure. The minimum values of T2 and T4 are
negligible and considered zero for purposes of these calculations. For ATX-based
motherboards, which represent the majority of real-world motherboards in use, TPWRVLD
can be defined as:
100 ms ≤ TPWRVLD ≤ 500 ms Equation 8-4
This provides the following requirement for FPGA configuration time in ATX and
non-ATX-based motherboards:
• FPGA Configuration Time ≤ 200 ms (for ATX-based motherboards; that is, the 100 ms TPVPERL plus the 100 ms minimum TPWRVLD)
• FPGA Configuration Time ≤ 100 ms (for non-ATX-based motherboards)
The second requirement, for non-ATX-based motherboards, assumes a TPWRVLD value of
0 ms because it is not defined in this context. Designers with non-ATX-based motherboards
should evaluate their own power supply design to obtain a value for TPWRVLD.
This chapter assumes that the FPGA power (VCCINT) is stable before or at the same time
that PWR_OK is asserted. If this is not the case, the additional delay must be subtracted
from the time available for FPGA configuration. Xilinx recommends avoiding add-in card
designs that use staggered voltage regulators with long delays.
Recommendations
Xilinx recommends using the Platform Flash XL High-Density Storage and Configuration
Device (XCF128X) in Slave SelectMAP x16 mode with a CCLK frequency of 50 MHz, which
allows enough time for FPGA configuration on any Virtex-6 FPGA in ATX-based motherboards.
Other valid configuration options are represented by green cells in Table 8-2 and Table 8-3
depending on the type of system in use. This section discusses these recommendations and
includes sample analysis of potential problems that might arise during FPGA
configuration.
3. Wait for assertion of DONE. The actual time required for a bitstream to transfer
depends on:
• Bitstream size
• Clock frequency
• Transfer mode used in the Flash Device
- SPI = Serial Peripheral Interface
- BPI = Byte Peripheral Interface
- PFP = Platform Flash PROMs
For detailed information about the configuration process, see the Virtex-6 FPGA
Configuration User Guide.
Table 8-2 and Table 8-3 show comparative data for all Virtex-6 FPGA LXT and SXT
devices with respect to a variety of flash devices and programming modes. The default
clock rate for configuring the device is always 2 MHz. Any reference to a different clock
rate implies a change in the settings of the device being used to program the FPGA. The
configuration clock (CCLK), when driven by the FPGA, has variation and is not exact. See
the Virtex-6 FPGA Configuration User Guide for more information on CCLK tolerances.
Table 8-2: Configuration Time Matrix (ATX Motherboards): Virtex-6 FPGA Bitstream Transfer Time (ms)
Virtex-6 FPGA    Bitstream (Bits)    SPIx1(1)    BPIx16(2) (Page mode)    PFPx8(3)    XCF128X (Master-BPIx16)    XCF128X(4) (Slave-SMAPx16)
XC6VLX75T 26,239,328 525 240 100 137 33
XC6VLX130T 43,719,776 875 399 166 228 55
XC6VLX195T 61,552,736 1232 562 234 321 77
XC6VLX240T 73,859,552 1478 674 280 385 93
XC6VLX365T 96,067,808 1922 876 364 501 121
XC6VSX315T 104,465,888 2090 953 396 545 131
XC6VLX550T 144,092,384 2882 1314 546 751 (5)
GREEN: Bitstream Transfer Time + FPGA INIT Time (50 ms) ≤ 200 ms.
YELLOW: Bitstream Transfer Time + FPGA INIT Time (50 ms) > 200 ms
Notes:
1. SPI flash assumptions: 50 MHz maximum.
2. BPIx16 assumptions: P30 4-page read with 4-cycle first page read (4-1-1-1), maximum configuration time.
3. PFP assumptions: 33 MHz maximum.
4. XCF128X Slave-SMAPx16 assumptions: CCLK=50 MHz.
5. The XC6VLX550T and XC6VSX475T devices will not fit into a single XCF128X Platform Flash - a Dual Flash solution is required and
is currently under development.
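As a rough cross-check of these values (an approximation that ignores protocol and page-read
overhead), the bitstream transfer time can be estimated as the bitstream size divided by the
product of the configuration bus width and the clock frequency. For example, for the
XC6VLX240T in XCF128X Slave-SMAPx16 mode with CCLK = 50 MHz:
Bitstream Transfer Time ≈ 73,859,552 bits / (16 bits × 50 MHz) ≈ 92.3 ms
which is consistent with the 93 ms entry above and, with the 50 ms FPGA INIT time added,
gives 143 ms, inside the 200 ms budget for ATX-based motherboards.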
Table 8-3: Configuration Time Matrix (Generic Platforms: Non-ATX Motherboards): Virtex-6 FPGA
Bitstream Transfer Time (ms)
Virtex-6 FPGA    Bitstream (Bits)    SPIx1(1)    BPIx16(2) (Page mode)    PFPx8(3)    XCF128X (Master-BPIx16)    XCF128X(4) (Slave-SMAPx16)
XC6VLX75T 26,239,328 525 240 100 137 33
XC6VLX130T 43,719,776 875 399 166 228 55
XC6VLX195T 61,552,736 1232 562 234 321 77
XC6VLX240T 73,859,552 1478 674 280 385 93
XC6VLX365T 96,067,808 1922 876 364 501 121
XC6VSX315T 104,465,888 2090 953 396 545 131
XC6VLX550T 144,092,384 2882 1314 546 751 (5)
GREEN: Bitstream Transfer Time + FPGA INIT Time (50 ms) ≤ 100 ms.
YELLOW: Bitstream Transfer Time + FPGA INIT Time (50 ms) > 100 ms
Notes:
1. SPI flash assumptions: 50 MHz maximum.
2. BPIx16 assumptions: P30 4-page read with 4-cycle first page read (4-1-1-1), maximum configuration time.
3. PFP assumptions: 33 MHz maximum.
4. XCF128X Slave-SMAPx16 assumptions: CCLK=50 MHz.
5. The XC6VLX550T and XC6VSX475T devices will not fit into a single XCF128X Platform Flash. A Dual Flash solution will be
required and is currently under development.
Figure 8-4: Fast Configuration Time on LX50T Device (50 MHz Clock)
Known Restrictions
This chapter describes several restrictions or issues where the integrated block deviates
from the PCI Express Base Specification, v2.0, or where the specification is ambiguous. All
issues listed in this chapter are considered low impact and are not a concern for most
applications. The Comments sections describe where the associated problem might occur
so that designers can decide quickly if further investigation is needed.
Area of Impact
Link Training
Detailed Description
During a speed change, a bit error in symbol 4 of a TS2 ordered set (the Data Rate
Identifier) can cause the integrated block to erroneously move to LTSSM state Detect, in
turn causing a link-down condition. In addition to the bit error occurring during symbol 4,
the error must also occur:
• During any of the last eight TS2 ordered sets, and
• Before the link partner transitions to electrical idle from the LTSSM state
Recovery.Speed
Comments
The probability of this error occurring is extremely small.
Bit errors must occur on the link for this condition to appear. Bit errors on the link are
typically an indication of poor signal integrity or other severe disturbances on the link,
which might lead to other issues such as poor data throughput. Additionally, the bit error
must occur at a very precise moment in time, as discussed above, for a link-down
condition to occur.
This issue was discovered during targeted error-injection testing of the integrated block. It
has not been seen in any interoperability testing as of the publication of this document.
This issue does not affect designs that operate at 2.5 Gb/s data rates only.
There are no known workarounds for this issue. Designers should maintain good signal
integrity and a low bit error rate (BER) on the link to avoid this issue.
Area of Impact
Link Training
Detailed Description
During link training, a bit error on bit 2 of symbol 4 of a TS2 ordered set (Data Rate
Supported) can cause an unsuccessful speed negotiation. In addition to the bit error
occurring during the specific symbol and bit, this bit error must also occur during the last
received TS2 in the LTSSM state Recovery.RcvrCfg before transitioning to Recovery.Speed.
If this error occurs, the speed change from 2.5 Gb/s to 5.0 Gb/s line rate fails, and the link
remains at the 2.5 Gb/s rate.
Comments
The probability of this error occurring is extremely small.
Bit errors must occur on the link for this condition to appear. Bit errors on the link are
typically an indication of poor signal integrity or other severe disturbances on the link, and
might lead to other issues such as poor data throughput. Additionally, the bit error must
occur at a very specific moment in time, as discussed in the Detailed Description, for an
unsuccessful speed negotiation to occur.
This issue was discovered during targeted error-injection testing of the integrated block.
This issue has not been seen in any interoperability testing as of the publication of this
document. This issue does not affect designs that operate at 2.5 Gb/s data rates only.
Users can attempt a directed speed change to work around this issue.
Area of Impact
Root Port Configuration Space
Detailed Description
The integrated block does not always set Link Status[14] (the Link Bandwidth
Management Status bit) in the PCI Configuration space when it should. Specifically, it does
not set this bit if hardware has changed the link speed or width to attempt to correct
unreliable link operation.
Comments
This issue only affects Root Port configurations.
Currently there is no pre-engineered workaround for this issue. Contact Xilinx Technical
Support for options if Link Bandwidth Management is necessary for the design.
Area of Impact
Physical Layer; Root Port only
Detailed Description
When configured as a Root Port, the integrated block misinterprets bit 6 of symbol 4 of
received TS2s from the Downstream component when in the LTSSM Recovery state.
During Recovery, this bit is meant to indicate a directed versus autonomous speed or
width change. However, the integrated block interprets this bit as the Downstream
component’s preferred de-emphasis level.
Comments
Generally, preferred de-emphasis levels are a function of the channel and are known by the
designer of the link ahead of time. When this is the case, setting the
PLDOWNSTREAMDEEMPHSOURCE attribute to 1b is a simple workaround. This forces
the Root Port transmitter to use the de-emphasis value from Link Control 2 bit 12, and the
de-emphasis value requested by the downstream component is ignored.
Currently, there is no pre-engineered workaround for this issue when the de-emphasis
level is not known ahead of time. Contact Xilinx Technical Support for options if
programmable de-emphasis is necessary for the design.
Area of Impact
Simulation of compliance mode (LTSSM state Polling.Compliance)
Detailed Description
If the integrated block LTSSM is in the Polling.Active state prior to entering the
Polling.Compliance state, and the PIPERXnELECIDLE signal on any receiver lane is
toggled during this time (indicating a change between electrical idle and active states on
the receiver lanes), the integrated block enters the Polling.Compliance state and sends the
modified compliance pattern even if the Enter Modified Compliance bit in the Link
Control 2 register is 0.
Comments
The PCI Express Base Specification 2.0 does not specify any requirements for the electrical
state of the receiver lanes while in the Polling.Active state.
This issue does not affect PCI-SIG compliance testing. The PCI-SIG compliance test fixtures
(CLB2.0 and CBB2.0) terminate all receivers and therefore statically drive electrical idle on
the receivers for the duration of the compliance test.
This issue can occur in simulation when using commercially available testbenches that
enter into the Polling.Compliance state. Some testbenches might switch between electrical
idle and active states on the receivers while in the Polling.Active state.
Area of Impact
Configuration Space
Detailed Description
The PCI Express Base Specification 2.0 defines reserved configuration space registers and bits
as read-only. However, all 32 bits of the MSI Mask register in the MSI Capability Structure
of the integrated block are read/write, regardless of whether they are considered reserved.
For example, if the MSI Message Control register bits [6:4] are 000b, indicating that only
one MSI vector is enabled, only bit 0 in the MSI Mask register should be read/write; bits
[31:1] should be read-only.
Comments
This issue only affects Endpoint configurations that use MSI.
There are no hardware-related side effects to writing the reserved bits in the MSI Mask
register.
System software effects are system dependent; however, it is unlikely that software will
react to these bits changing value. Users have the option to disable the MSI Mask/Pending
extension.
Hardware Verification
PCI Special Interest Group
Xilinx attends the PCI Special Interest Group (PCI-SIG®) Compliance Workshops to verify
the Integrated Block compliance and interoperability with various systems available on
the market and those that are not yet released. While Xilinx cannot list the actual systems
tested at the PCI-SIG Compliance Workshops due to requirements of the PCI-SIG by-laws,
Xilinx IP designers can view the PCI-SIG integrators list.
The integrators list confirms that Xilinx satisfies the PCI-SIG Compliance Program and that
the Integrated Block wrapper successfully interoperates with other available systems at the
PCI-SIG Compliance Workshop. Virtex®-6 FPGA entries can be found in the Components
and Add-In Cards sections under the company name Xilinx.
Hardware validation testing is performed at Xilinx for each release of the Integrated Block
core with the chipsets listed below. Hardware testing performed by Xilinx that is also
available to customers includes the PCIECV tool (available from the PCI-SIG website), the
BMD design (available from XAPP1052, Bus Master DMA Performance Demonstration Reference
Design for the Xilinx Endpoint PCI Express Solutions), and the MET design (available from
XAPP1022, Using the Memory Endpoint Test Driver (MET) with the Programmed Input/Output
Example Design for PCI Express Endpoint Cores).
The hardware validation chipsets are:
• Intel x58
• Intel 5400
• Intel P55
• AMD 790
• AMD 780
• Intel Atom
• Intel x38
System Overview
The PIO design is a simple target-only application that interfaces with the Endpoint for
PCIe core’s Transaction (TRN) interface and is provided as a starting point for customers to
build their own designs. The following features are included:
• Four transaction-specific 2 KB target regions using the internal Xilinx® FPGA block
RAMs, providing a total target space of 8192 bytes
• Supports single DWORD payload Read and Write PCI Express transactions to
32-/64-bit address memory spaces and I/O space with support for completion TLPs
• Utilizes the core’s trn_rbar_hit_n[6:0] signals to differentiate between TLP destination
Base Address Registers
• Provides separate implementations optimized for 32-bit, 64-bit, and 128-bit TRN
interfaces
Figure A-1 illustrates the PCI Express system architecture components, consisting of a
Root Complex, a PCI Express switch device, and an Endpoint for PCIe. PIO operations
move data downstream from the Root Complex (CPU register) to the Endpoint, and/or
upstream from the Endpoint to the Root Complex (CPU register). In either case, the PCI
Express protocol request to move the data is initiated by the host CPU.
Figure A-1: System Overview: Root Complex (CPU, memory controller, memory device), PCIe Switch, and PCIe Endpoint, connected through PCI_BUS_0, PCI_BUS_1, and PCI_BUS_X
Data is moved downstream when the CPU issues a store-register-to-MMIO-address
command. The Root Complex typically generates a Memory Write TLP with the
appropriate MMIO location address, byte enables, and the register contents. The
transaction terminates when the Endpoint receives the Memory Write TLP and updates the
corresponding local register.
Data is moved upstream when the CPU issues a load-register-from-MMIO-address
command. The Root Complex typically generates a Memory Read TLP with the
appropriate MMIO location address and byte enables. The Endpoint generates a
Completion with Data TLP once it receives the Memory Read TLP. The Completion is
steered to the Root Complex, and the payload is loaded into the target register, completing
the transaction.
PIO Hardware
The PIO design implements an 8192-byte target space in FPGA block RAM, behind the
Endpoint for PCIe. This 32-bit target space is accessible through single-DWORD I/O Read,
I/O Write, Memory Read 64, Memory Write 64, Memory Read 32, and Memory Write 32
TLPs.
The PIO design generates a completion with 1 DWORD of payload in response to a valid
Memory Read 32 TLP, Memory Read 64 TLP, or I/O Read TLP request presented to it by
the core. In addition, the PIO design returns a completion without data with successful
status for an I/O Write TLP request.
The PIO design processes a Memory or I/O Write TLP with 1 DWORD payload by
updating the payload into the target address in the FPGA block RAM space.
Based on the specific trn_rbar_hit_n[6:0] signal asserted, the RX state machine indicates to
the internal read request controller the appropriate 2 KB block RAM to use before asserting
the read enable request. For example, if a Memory Read 32 Request TLP is received by the
core targeting the default MEM32 BAR2, the core passes the TLP to the PIO design and
asserts trn_rbar_hit_n[2]. The RX State machine extracts the lower address bits from the
Memory 32 Read TLP and instructs the internal Memory Read Request controller to start a
read operation.
In this example, the assertion of trn_rbar_hit_n[2] instructs the PIO memory read
controller to access the Mem32 space, which by default represents 2 KB of memory space.
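As an illustration of this decode, the following minimal Verilog sketch samples the one-hot,
active-low trn_rbar_hit_n bus on the first beat of a received TLP. The module and register
names are illustrative only and are not taken from the PIO source.

module bar_hit_decode (
  input        clk,
  input        trn_rsof_n,      // start of frame, active low
  input        trn_rsrc_rdy_n,  // source ready, active low
  input        trn_rdst_rdy_n,  // destination ready, active low
  input  [6:0] trn_rbar_hit_n,  // one-hot BAR hit indication, active low
  output reg   mem32_bar2_hit   // set when the TLP targets the default MEM32 BAR2
);
  always @(posedge clk)
    // Sample the BAR hit bus on the first accepted beat of a received TLP.
    if (!trn_rsof_n && !trn_rsrc_rdy_n && !trn_rdst_rdy_n)
      mem32_bar2_hit <= ~trn_rbar_hit_n[2];
endmodule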
A notable difference in the handling of memory write and read TLPs is the requirement
that the receiving device return a Completion with Data TLP in the case of a memory or
I/O read request.
While the read is being processed, the PIO design RX state machine deasserts
trn_rdst_rdy_n, causing the Receive TRN interface to stall and stop receiving any further
TLPs until the internal Memory Read controller completes the read access from the block
RAM and generates the completion. Deasserting trn_rdst_rdy_n in this way is not required
for all designs using the core. The PIO design uses this method to simplify the control logic
of the RX state machine.
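The stall-and-resume behavior can be sketched as follows. This is a simplified illustration
rather than the actual PIO RX state machine; the start_mem_read input is a stand-in for the
internal read-request indication, while compl_done_i follows the naming used elsewhere in
this appendix.

module rx_throttle (
  input  clk,
  input  rst_n,           // active-low reset, as in the PIO modules
  input  start_mem_read,  // illustrative: RX engine has issued a read request
  input  compl_done_i,    // asserted when the TX engine has sent the completion
  output trn_rdst_rdy_n   // driven high (deasserted) to stall the receive interface
);
  reg rd_pending;
  always @(posedge clk) begin
    if (!rst_n)              rd_pending <= 1'b0;
    else if (start_mem_read) rd_pending <= 1'b1;  // stall while a read is outstanding
    else if (compl_done_i)   rd_pending <= 1'b0;  // resume once the completion is sent
  end
  assign trn_rdst_rdy_n = rd_pending;             // active low: 0 = ready to receive
endmodule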
Three configurations of the PIO design are provided: PIO_32, PIO_64, and PIO_128 with
32-, 64-, and 128-bit TRN interfaces, respectively. The PIO configuration generated
depends on the selected endpoint type (that is, Virtex-6 FPGA Integrated Block, PIPE, PCI
Express, and Block Plus) as well as the number of PCI Express lanes selected by the user.
Table A-3 identifies the PIO configuration generated based on the user’s selection.
Table A-3: PIO Configuration
Core x1 x2 x4 x8
Endpoint for PIPE PIO_32 NA NA NA
Endpoint for PCI Express PIO_32 NA PIO_64 PIO_64
Endpoint for PCI Express Block Plus PIO_64 NA PIO_64 PIO_64
Virtex-6 FPGA Integrated Block PIO_64 PIO_64 PIO_64 PIO_64, PIO_128(1)
Spartan®-6 FPGA Integrated Endpoint Block PIO_32 NA NA NA
Notes:
1. The PIO_128 configuration is only provided for the 128-bit x8 5.0 Gb/s core.
Figure A-2 shows the various components of the PIO design, which is separated into four
main parts: the TX Engine, RX Engine, Memory Access Controller, and Power
Management Turn-Off Controller.
Figure A-2: PIO Design Components (PIO, PIO_EP, PIO_TO_CTRL, and EP_MEM with its ep_mem block RAM regions), attached to the Virtex-6 FPGA Integrated Block for PCI Express core configured as an Endpoint
PIO Application
Figure A-3 and Figure A-4 depict 64-bit and 32-bit PIO application top-level connectivity,
respectively. The datapath width, either 32 bits or 64 bits, depends on which Endpoint for
PCIe core is used. The PIO_EP module contains the PIO FPGA block RAM modules and the
transmit and receive engines. The PIO_TO_CTRL module is the Endpoint Turn-Off
controller unit, which responds to the power turn-off message from the host CPU with an
acknowledgment.
The PIO_EP module connects to the Endpoint Transaction (trn) and Configuration (cfg)
interfaces.
Receive Path
Figure A-5 illustrates the PIO_32_RX_ENGINE and PIO_64_RX_ENGINE modules. The
datapath of the module must match the datapath of the core being used. These modules
connect with the Endpoint for PCIe Transaction Receive (trn_r*) interface.
Figure A-5: RX Engines: PIO_32_RX_ENGINE and PIO_64_RX_ENGINE (EP_Rx) port interfaces, including trn_rdst_rdy_n, req_compl_o, req_td_o, and wr_data_o[31:0]
The RX Engine parses 1 DWORD 32- and 64-bit addressable memory and I/O write
requests. The RX state machine extracts needed information from the TLP and passes it to
the memory controller, as defined in Table A-5.
The read datapath stops accepting new transactions from the core while the application is
processing the current TLP. This is accomplished by trn_rdst_rdy_n deassertion. For an
ongoing Memory or I/O Read transaction, the module waits for compl_done_i input to be
asserted before it accepts the next TLP, while an ongoing Memory or I/O Write transaction
is deemed complete after wr_busy_i is deasserted.
Transmit Path
Figure A-6 shows the PIO_32_TX_ENGINE and PIO_64_TX_ENGINE modules. The
datapath of the module must match the datapath of the core being used. These modules
connect with the core Transaction Transmit (trn_t*) interface.
Figure A-6: TX Engines: PIO_32_TX_ENGINE and PIO_64_TX_ENGINE (EP_Tx) port interfaces, including clk, rst_n, trn_tdst_rdy_n, trn_tdst_dsc_n, rd_data_i[31:0], and completer_id_i[15:0]
After the completion is sent, the TX engine asserts the compl_done_o output, indicating to
the RX engine that it can reassert trn_rdst_rdy_n and continue receiving TLPs.
Endpoint Memory
Figure A-7 displays the PIO_EP_MEM_ACCESS module. This module contains the
Endpoint memory space.
Figure A-7: PIO_EP_MEM_ACCESS (EP_MEM) ports: clk, rst_n, wr_en_i, wr_addr_i[10:0], wr_be_i[7:0], wr_data_i[31:0], wr_busy_o, rd_addr_i[10:0], rd_be_i[3:0], and rd_data_o[31:0]
The PIO_EP_MEM_ACCESS module processes data written to the memory from incoming
Memory and I/O Write TLPs and provides data read from the memory in response to
Memory and I/O Read TLPs.
The EP_MEM module processes 1 DWORD 32- and 64-bit addressable Memory and I/O
Write requests based on the information received from the RX Engine, as defined in
Table A-7. While the memory controller is processing the write, it asserts the wr_busy_o
output indicating it is busy.
Both 32- and 64-bit Memory and I/O Read requests of one DWORD are processed based
on the inputs defined in Table A-8. After the read request is processed, the data is returned
on rd_data_o[31:0].
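A behavioral sketch of the byte-enabled write path is shown below, using the port names
from Figure A-7. The single-cycle behavior, the use of only wr_be_i[3:0], and the flat
2048 x 32-bit array (8192 bytes) are simplifications; the actual EP_MEM uses block RAM and
drives wr_busy_o while a write is in progress.

module ep_mem_sketch (
  input         clk,
  input         wr_en_i,
  input  [10:0] wr_addr_i,
  input  [7:0]  wr_be_i,     // only [3:0] are used for a 1 DWORD write in this sketch
  input  [31:0] wr_data_i,
  input  [10:0] rd_addr_i,
  output [31:0] rd_data_o
);
  reg [31:0] mem [0:2047];   // 2048 x 32 bits = 8192 bytes of target space
  always @(posedge clk)
    if (wr_en_i) begin
      if (wr_be_i[0]) mem[wr_addr_i][7:0]   <= wr_data_i[7:0];
      if (wr_be_i[1]) mem[wr_addr_i][15:8]  <= wr_data_i[15:8];
      if (wr_be_i[2]) mem[wr_addr_i][23:16] <= wr_data_i[23:16];
      if (wr_be_i[3]) mem[wr_addr_i][31:24] <= wr_data_i[31:24];
    end
  assign rd_data_o = mem[rd_addr_i];  // read path; rd_be_i handling is omitted
endmodule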
PIO Operation
PIO Read Transaction
Figure A-8 depicts a Back-to-Back Memory Read request to the PIO design. The receive
engine deasserts trn_rdst_rdy_n as soon as the first TLP is completely received. The next
Read transaction is accepted only after compl_done_o is asserted by the transmit engine,
indicating that Completion for the first request was successfully transmitted.
Figure A-8: Back-to-Back Read Requests: waveform of the trn_r* receive and trn_t* transmit interface signals, including trn_rbar_hit_n[6:0], trn_rdst_rdy_n, trn_tsrc_rdy_n, and compl_done_o, for two back-to-back TLPs (TLP1 and TLP2)
Device Utilization
Table A-9 shows the PIO design FPGA resource utilization.
Summary
The PIO design demonstrates the Endpoint for PCIe and its interface capabilities. In
addition, it enables rapid bring-up and basic validation of end user endpoint add-in card
FPGA hardware on PCI Express platforms. Users can leverage standard operating system
utilities that enable generation of read and write transactions to the target space in the
reference design.
Figure A-10: Root Port Model TPI for PCI Express (dsport, usrapp_com, output logs, and the PIO design under test)
Architecture
The Root Port Model consists of these blocks, illustrated in Figure A-10:
• dsport (Root Port)
• usrapp_tx
• usrapp_rx
• usrapp_com (Verilog only)
The usrapp_tx and usrapp_rx blocks interface with the dsport block for transmission and
reception of TLPs to/from the Endpoint Design Under Test (DUT). The Endpoint DUT
consists of the Endpoint for PCIe and the PIO design (displayed) or customer design.
The usrapp_tx block sends TLPs to the dsport block for transmission across the PCI
Express Link to the Endpoint DUT. In turn, the Endpoint DUT device transmits TLPs
across the PCI Express Link to the dsport block, which are subsequently passed to the
usrapp_rx block. The dsport and core are responsible for the data link layer and physical
link layer processing when communicating across the PCI Express fabric. Both usrapp_tx
and usrapp_rx utilize the usrapp_com block for shared functions, for example, TLP
processing and log file outputting. Transaction sequences or test programs are initiated by
the usrapp_tx block to stimulate the endpoint device's fabric interface. TLP responses from
the endpoint device are received by the usrapp_rx block. Communication between the
usrapp_tx and usrapp_rx blocks allows the usrapp_tx block to verify correct behavior and
act accordingly when the usrapp_rx block has received TLPs from the endpoint device.
Test Selection
Table A-10 describes the tests provided with the Root Port Model, followed by specific
sections for VHDL and Verilog test selection.
Speed Differences
The VHDL test bench is slower than the Verilog test bench, especially when testing the x8
core. For initial design simulation and speed enhancement, the user might want to use the
x1 core, identify basic functionality issues, and then move to x4 or x8 simulation when
testing design performance.
Waveform Dumping
Table A-11 describes the available simulator waveform dump file formats, each of which is
provided in the simulator’s native file format. The same mechanism is used for VCS and
ModelSim.
VHDL Flow
Waveform dumping in the VHDL flow does not use the +dump_all mechanism described
in the Verilog Flow section. Because the VHDL language itself does not provide a common
interface for dumping waveforms, each VHDL simulator has its own interface for
supporting waveform dumping. For the supported ModelSim, INCISIV, and ISim flows,
dumping is enabled by invoking the VHDL simulator command line with an option that
specifies the respective waveform command file: wave.do (ModelSim), wave.sv (INCISIV),
or wave.wcfg (ISim). This command line can be found in the respective simulation script
files simulate_mti.do, simulate_ncsim.sh, and simulate_isim.bat[.sh].
ModelSim
This command line initiates waveform dumping for the ModelSim flow using the VHDL
test bench:
>vsim +notimingchecks -do wave.do -L unisim -L work work.board
INCISIV
This command line initiates waveform dumping for the INCISIV flow using the VHDL test
bench:
>ncsim -gui work.board -input @"simvision -input wave.sv"
Verilog Flow
The Root Port Model provides a mechanism for outputting the simulation waveform to file
by specifying the +dump_all command line parameter to the simulator.
For example, the script file simulate_ncsim.sh (used to start the Cadence INCISIV
simulator) can indicate to the Root Port Model that the waveform should be saved to a file
using this command line:
ncsim work.boardx01 +TESTNAME=sample_smoke_test0 +dump_all
Output Logging
When a test fails on the example or customer design, the test programmer debugs the
offending test case. Typically, the test programmer inspects the wave file for the simulation
and cross-references it with the messages displayed on the standard output. Because this
approach can be very time consuming, the Root Port Model offers an output logging
mechanism that assists the tester in debugging failing test cases and speeds the process.
The Root Port Model creates three output files (tx.dat, rx.dat, and error.dat) during
each simulation run. Log files rx.dat and tx.dat each contain a detailed record of every
TLP that was received and transmitted, respectively, by the Root Port Model. With an
understanding of the expected TLP transmission during a specific test case, the test
programmer can more easily isolate the failure.
The log file error.dat is used in conjunction with the expectation tasks. Test programs
that utilize the expectation tasks will generate a general error message to standard output.
Detailed information about the specific comparison failures that have occurred due to the
expectation error is located within error.dat.
A typical parallel test uses the form of one command thread and one or more expectation
threads. These threads work together to verify a device's functionality. The role of the
command thread is to create the necessary TLP transactions that cause the device to receive
and generate TLPs. The role of the expectation threads is to verify the reception of an
expected TLP. The Root Port Model TPI has a complete set of expectation tasks to be used
in conjunction with parallel tests.
Because the example design is a target-only device, only Completion TLPs can be expected
by parallel test programs while using the PIO design. However, the full library of
expectation tasks can be used for expecting any TLP type when used in conjunction with
the customer's design (which can include bus-mastering functionality). Currently, the
VHDL version of the Root Port Model Test Bench does not support Parallel tests.
Test Description
The Root Port Model provides a Test Program Interface (TPI). The TPI provides the means
to create tests by simply invoking a series of Verilog tasks. All Root Port Model tests should
follow the same six steps:
1. Perform conditional comparison of a unique test name
2. Set up master timeout in case simulation hangs
3. Wait for Reset and link-up
4. Initialize the configuration space of the endpoint
5. Transmit and receive TLPs between the Root Port Model and the Endpoint DUT
6. Verify that the test succeeded
TSK_TX_MEMORY_WRITE_64
Inputs: tag_ [7:0], tc_ [2:0], len_ [9:0], addr_ [63:0], last_dw_be_ [3:0], first_dw_be_ [3:0], ep_ (–)
Description: Sends a PCI Express Memory Write TLP from the Root Port Model to the 64-bit memory
address addr_ of the Endpoint DUT. The CplD returned from the Endpoint DUT will use the contents
of global COMPLETE_ID_CFG as the completion ID. The global DATA_STORE byte array is used to
pass write data to the task.
System Overview
PCI Express devices require setup after power-on, before devices in the system can begin
application specific communication with each other. Minimally, two devices connected via
a PCI Express Link must have their Configuration spaces initialized and be enumerated in
order to communicate.
Root Ports facilitate PCI Express enumeration and configuration by sending Configuration
Read (CfgRd) and Write (CfgWr) TLPs to the downstream devices such as Endpoints and
Switches to set up the configuration spaces of those devices. When this process is complete,
higher-level interactions, such as Memory Reads (MemRd TLPs) and Writes (MemWr
TLPs), can occur within the PCI Express System.
The Configurator example design described herein performs the configuration
transactions required to enumerate and configure the Configuration space of a single
connected PCI Express Endpoint and allow application-specific interactions to occur.
The Configurator example design, as delivered, is designed to be used with the PIO Slave
example included with Xilinx® Endpoint cores and described in Appendix A, Example
Design and Model Test Bench for Endpoint Configuration. The PIO Master is useful for
simple bring-up and debugging, and is an example of how to interact with the
Configurator Wrapper. The Configurator example design can be easily modified to be used
with other Endpoints.
Figure B-1 shows the various components of the Configurator example design.
Figure B-1: Configurator Example Design Components: Controller, Packet Generator, Completion Decoder, TX Mux, 5.0 Gb/s (Gen2) Enabler, and Data Checker, surrounding the Virtex-6 FPGA Integrated Block for PCI Express configured as a Root Port
Figure B-2 shows how the blocks are connected in an overall system view.
Figure B-2: System View: the Configurator Wrapper (Configurator Block and Configurator ROM) and the PIO Master connect through the TRN interface to the Integrated Endpoint Model and the PIO Slave Endpoint design
Configurator Block
The Configurator Block is responsible for generating CfgRd and CfgWr TLPs and
presenting them to the TRN interface of the Integrated Block in Root Port configuration.
The TLPs that the Configurator Block generates are determined by the contents of the
Configurator ROM.
The generated configuration traffic is predetermined by the designer to address their
particular system requirements. The configuration traffic is encoded in a memory-
initialization file (the Configurator ROM) which is synthesized as part of the Configurator.
The Configurator Block and the attached Configurator ROM are intended to be usable as
part of a real-world embedded design.
The Configurator Block steps through the Configuration ROM file and sends the TLPs
specified therein. Supported TLP types are Message, Message w/Data, Configuration
Write (Type 0), and Configuration Read (Type 0). For the Configuration packets, the
Configurator Block waits for a Completion to be returned before transmitting the next TLP.
If the Completion TLP fields do not match the expected values, PCI Express configuration
fails. However, the Data field of Completion TLPs is ignored and not checked.
Note: There is no completion timeout mechanism in the Configurator Block, so if no Completion is
returned, the Configurator Block waits forever.
The Configurator Block has these parameters, which can be altered by the user (an instantiation sketch showing parameter overrides follows the list):
• TCQ: Clock-to-out delay modeled by all registers in design.
• EXTRA_PIPELINE: Controls insertion of an extra pipeline stage on the RX TRN
interface for timing.
• ROM_FILE: File name containing configuration steps to perform.
• ROM_SIZE: Number of lines in ROM_FILE containing data (equals number of TLPs
to send/2).
• REQUESTER_ID: Value for the Requester ID field in outgoing TLPs.
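For example, the sketch below overrides these parameters when instantiating the wrapper,
assuming they are exposed on the cgator_wrapper module listed later in this appendix; the
values shown are placeholders.

// Parameter overrides only; the port list is omitted for brevity.
cgator_wrapper #(
  .TCQ            (1),                     // clock-to-out delay for registers
  .EXTRA_PIPELINE (1),                     // extra RX TRN pipeline stage
  .ROM_FILE       ("cgator_cfg_rom.data"), // configuration steps to perform
  .ROM_SIZE       (32),                    // placeholder line count for ROM_FILE
  .REQUESTER_ID   (16'h01A0)               // placeholder Requester ID
) cgator_wrapper_i (
  /* TRN, configuration, and control ports connected here */
);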
When the Configurator Block design is used, all TLP traffic must pass through the
Configurator Block. The user design is responsible for asserting the start_config input (for
one clock cycle) to initiate the configuration process when trn_lnk_up_n has been asserted
by the core. Following start_config, the Configurator Block performs whatever
configuration steps have been specified in the Configuration ROM. During configuration,
the Configurator Block controls the core's TRN interface. Following configuration, all TRN
traffic is routed to/from the User Application, which in the case of this example design is
the PIO Master. The end of configuration is signaled by the assertion of finished_config. If
configuration is unsuccessful for some reason, failed_config is also asserted.
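A minimal sketch of this handshake from the user side follows; the reset and register names
are illustrative, and trn_lnk_up_n is treated as active low as elsewhere in this guide.

module start_config_pulse (
  input      clk,
  input      rst_n,          // illustrative reset; the example design has its own
  input      trn_lnk_up_n,   // asserted (low) by the core when the link is up
  output reg start_config    // one-clock pulse to the Configurator Block
);
  reg started;
  always @(posedge clk) begin
    if (!rst_n) begin
      start_config <= 1'b0;
      started      <= 1'b0;
    end else begin
      start_config <= 1'b0;                  // default: single-cycle pulse
      if (!trn_lnk_up_n && !started) begin
        start_config <= 1'b1;                // request configuration once
        started      <= 1'b1;
      end
    end
  end
  // The user design then waits for finished_config or failed_config from the
  // Configurator Block before starting its own traffic.
endmodule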
If used in a system that supports PCIe v2.0 5.0 Gb/s links, the Configurator Block begins its
process by attempting to up-train the link from 2.5 Gb/s to 5.0 Gb/s. This feature is
enabled depending on the LINK_CAP_MAX_LINK_SPEED parameter on the
Configurator Wrapper.
The Configurator does not support the user throttling received data on the trn_r* interface.
Because of this, the Root Port inputs which control throttling are not included on the
Configurator Wrapper. These signals are trn_rdst_rdy_n and trn_rnp_ok_n. This is a
limitation of the Configurator Example Design and not of the Integrated Block for PCI
Express in Root Port configuration. This means that the user design interfacing with the
Configurator Example Design must be able to accept received data at line rate.
Configurator ROM
The Configurator ROM stores the necessary configuration transactions to configure a PCI
Express Endpoint. This ROM interfaces with the Configurator Block to send these
transactions over the PCI Express link.
The example ROM file included with this design shows the operations needed to configure
a Virtex-6 FPGA Integrated Endpoint Block for PCI Express and PIO Example Design.
The Configurator ROM can be customized for other Endpoints and PCI Express system
topologies. The unique set of configuration transactions required depends on the Endpoint
that will be interacting with the Root Port. This information can be obtained from the
documentation provided with the Endpoint.
The ROM file follows the format specified in the Verilog specification (IEEE 1364-2001)
section 17.2.8, which describes using the $readmemb function to pre-load data into a RAM
or ROM. Verilog-style comments are allowed.
The file is read by the simulator or synthesis tool and each memory value encountered is
used as a single location in memory. Digits can be separated by an underscore character (_)
for clarity without constituting a new location.
Each configuration transaction specified uses two adjacent memory locations - the first
location specifies the header fields, while the second location specifies the 32-bit data
payload. (For CfgRd TLPs and Messages without data, the data location is unused but still
present). In other words, header fields are on even addresses, while data payloads are on
odd addresses.
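A hedged sketch of how such a ROM can be inferred in Verilog with $readmemb follows;
the 32-bit width, the rom_addr port, and the depth handling are assumptions for
illustration, not the exact Configurator ROM implementation.

module cfg_rom_sketch #(
  parameter ROM_FILE = "cgator_cfg_rom.data",
  parameter ROM_SIZE = 32                 // see the ROM_SIZE parameter description
) (
  input             clk,
  input      [9:0]  rom_addr,             // even = header fields, odd = data payload
  output reg [31:0] rom_data
);
  reg [31:0] rom [0:ROM_SIZE-1];
  initial $readmemb(ROM_FILE, rom);       // pre-load from the ROM data file
  always @(posedge clk)
    rom_data <= rom[rom_addr];
endmodule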
For headers, Messages and CfgRd/CfgWr TLPs use different fields. For all TLPs, two bits
specify the TLP type. For Messages, Message Routing and Message Code are specified. For
CfgRd/CfgWr TLPs, Function Number, Register Number, and 1st Dword Byte-Enable are
specified. The specific bit layout is shown in the example ROM file.
PIO Master
The PIO Master demonstrates how a user-application design might interact with the
Configurator Block. It directs the Configurator Block to bring up the link partner at the
appropriate time, and then (after successful bring-up) generates and consumes bus traffic.
The PIO Master performs writes and reads across the PCI Express Link to the PIO Slave
Example Design (from the Endpoint core) to confirm basic operation of the link and the
Endpoint.
The PIO Master waits until trn_lnk_up_n is asserted by the Root Port. It then asserts
start_config to the Configurator Block. When the Configurator Block asserts
finished_config, the PIO Master writes and reads to/from each BAR in the PIO Slave
design. If the readback data matches what was written, the PIO Master asserts its
pio_test_finished output. If there is a data mismatch or the Configurator Block fails to
configure the Endpoint, the PIO Master asserts its pio_test_failed output. The PIO Master's
operation can be restarted by asserting its pio_test_restart input for one clock cycle.
• cgator_wrapper
- pcie_2_0_rport_v6 (in the source directory)
This directory contains all the source files for the Integrated Block for PCI
Express in Root Port Configuration.
- cgator
- cgator_cpl_decoder
- cgator_pkt_generator
- cgator_tx_mux
- cgator_gen2_enabler
- cgator_controller
This directory contains cgator_cfg_rom.data (specified by ROM_FILE)
• pio_master
- pio_master_controller
- pio_master_checker
- pio_master_pkt_generator
Note: cgator_cfg_rom.data is the default name of the ROM data file. The user can override this
by changing the value of the ROM_FILE parameter.
Architecture
The Endpoint model consists of these blocks:
• PCI Express Endpoint (Virtex-6 FPGA Integrated Block for PCI Express in Endpoint
configuration) model.
• PIO slave design, consisting of:
• PIO_RX_ENGINE
• PIO_TX_ENGINE
• PIO_EP_MEM
• PIO_TO_CTRL
The PIO_RX_ENGINE and PIO_TX_ENGINE blocks interface with the ep block for
reception and transmission of TLPs from/to the Root Port Design Under Test (DUT). The
Root Port DUT consists of the Integrated Block for PCI Express configured as a Root Port
and the Configurator Example Design, which consists of a Configurator block and a PIO
Master design, or customer design.
The PIO slave design is described in detail in Appendix A, Programmed Input/Output:
Endpoint Example Design.
Note: For Cadence INCISIV users, the work construct must be manually inserted into the cds.lib
file: DEFINE WORK WORK.
Waveform Dumping
Table B-2 describes the available simulator waveform dump file formats, each of which is
provided in the simulator's native file format. The same mechanism is used for VCS and
ModelSim.
The Endpoint model test bench provides a mechanism for outputting the simulation
waveform to file by specifying the +dump_all command line parameter to the simulator.
For example, the script file simulate_ncsim.sh (used to start the Cadence INCISIV
simulator) can indicate to the Endpoint model that the waveform should be saved to a file
using this command line:
ncsim work.boardx01 +dump_all
Output Logging
The test bench will output messages, captured in the simulation log, indicating the time at
which the following occur:
• trn_reset deasserted
• trn_lnk_up_n asserted
• cfg_done asserted by the Configurator
• pio_test_finished asserted by the PIO Master
• Simulation Timeout (if pio_test_finished or pio_test_failed never asserted)
Migration Considerations
For users migrating to the Virtex®-6 FPGA Integrated Block for PCI Express® from the
Endpoint Block Plus for PCI Express, the following list describes the differences in
behaviors and options between the Virtex-6 FPGA Integrated Block for PCI Express core
and the Endpoint Block Plus core.
Transaction Interface
Table C-2 shows which transaction interface signals were changed, created, or deprecated.
Configuration Interface
Table C-2 shows which configuration interface signals were changed and created.
Configuration Space
• MSI-X Support: The MSI-X Capability Structure is optionally supported. MSI-X Vector
Table and the Pending Bit Array need to be implemented as part of the user's logic, by
claiming a BAR aperture.
• Device Serial Number Capability: Device Serial Number Capability can optionally be
disabled.
• Virtual Channel Capability: Virtual Channel Capability is optionally supported.
When enabled, the User Application is allowed to operate in a TCx-VC0 mode. If
disabled, the user application must operate in TC0-VC0 mode.
• Vendor Specific Capability - Loopback Master: The Vendor Specific Capability, which
enables the Xilinx-specific PCI Express Loopback Control, is optionally supported. This
enables electrical compliance testing based on the Endpoint Loopback Master.
• User Implemented Configuration Space: The Virtex-6 FPGA Integrated Block
optionally enables User Implemented Configuration Space, in either the Legacy PCI
Configuration Space or the PCI Express Extended Configuration Space, or in both.
Debugging Designs
This appendix provides information on using resources available on the Xilinx Support
website, available debug tools, and a step-by-step process for debugging designs that use
the Virtex®-6 FPGA Integrated Block for PCI Express. This appendix uses flow diagrams
to guide the user through the debug process.
The following information is found in this appendix:
• Finding Help on Xilinx.com
• Contacting Xilinx Technical Support
• Debug Tools
• Hardware Debug
• Simulation Debug
Documentation
The Data Sheet and User Guide are the main documents associated with the Virtex-6 FPGA
Integrated Block, as shown in Table D-1.
Table D-1: Virtex-6 FPGA Integrated Block for PCI Express Documentation
Designation  Description
DS           Data Sheet: provides a high-level description of the Integrated Block and key features. It includes information on which ISE software version is supported by the current LogiCORE IP version used to instantiate the Integrated Block.
UG           User Guide: provides information on generating an Integrated Block design, detailed descriptions of the interface, and how to use the product. The User Guide contains waveforms to show interactions with the block and other important information needed to design with the product.
These Integrated Block for PCI Express documents along with documentation related to all
products that aid in the design process can be found on the Xilinx Support webpage.
Documentation is sorted by product family at the main support page or by solution at the
Documentation Center.
Answer Records
Answer Records include information on commonly encountered problems, helpful
information on how to resolve these problems, and any known issues with a product.
Answer Records are created and maintained daily ensuring users have access to the most
up-to-date information on Xilinx products. Answer Records can be found by searching the
Answers Database.
To use the Answers Database Search:
• Navigate to www.xilinx.com/support. The Answers Database Search is located at the
top of this webpage.
• Enter keywords in the provided search field and select Search.
• Examples of searchable keywords are product names, error messages, or a generic
summary of the issue encountered.
• To see all answer records directly related to the Virtex-6 FPGA Integrated Block
for PCI Express, search for the phrase “Virtex-6 FPGA Integrated Block for PCI
Express”
Debug Tools
There are many tools available to debug PCI Express design issues. It is important to know
which tools would be useful for debugging for the various situations encountered. This
appendix references the following tools:
Example Design
Xilinx Endpoint for PCI Express products come with a synthesizable back-end application
called the PIO design that has been tested and is proven to be interoperable in available
systems. The design appropriately handles all incoming 1 DWORD read and write
transactions. It returns completions for non-posted transactions and updates the target
memory space for writes. For more information, see Appendix A, Programmed Input/Output:
Endpoint Example Design.
Link Analyzers
Third party link analyzers show link traffic in a graphical or text format. Lecroy, Agilent,
and Vmetro are companies that make common analyzers available today. These tools
greatly assist in debugging link issues and allow users to capture data which Xilinx
support representatives can view to assist in interpreting link behavior.
LSPCI (Linux)
LSPCI is available on Linux platforms and allows users to view the PCI Express device
configuration space. LSPCI is usually found in the /sbin directory. LSPCI will display a
list of devices on the PCI buses in the system. See the LSPCI manual for all command
options. Some useful commands for debugging include:
• lspci -x -d [<vendor>]:[<device>]
This will display the first 64 bytes of configuration space in hexadecimal form for the
device with vendor and device ID specified (omit the -d option to display information
for all devices). The default Vendor/Device ID for Xilinx cores is 10EE:6012. Below is a
sample of a read of the configuration space of a Xilinx device:
> lspci -x -d 10EE:6012
81:00.0 Memory controller: Xilinx Corporation: Unknown device 6012
00: ee 10 12 60 07 00 10 00 00 00 80 05 10 00 00 00
10: 00 00 80 fa 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 6f 50
30: 00 00 00 00 40 00 00 00 00 00 00 00 05 01 00 00
Included in this section of the configuration space are the Device ID, Vendor ID, Class
Code, Status and Command registers, and Base Address Registers.
• lspci -xxxx -d [<vendor>]:[<device>]
This displays the extended configuration space of the device. It can be useful to read
the extended configuration space on the root and look for the Advanced Error
Reporting (AER) registers. These registers provide more information on why the
device has flagged an error (for example, it might show that a correctable error was
issued because of a replay timer time-out).
• lspci -k
Shows kernel drivers handling each device and kernel modules capable of handling it
(works with kernel 2.6 or later).
PCItree (Windows)
PCItree can be downloaded at www.pcitree.de and allows the user to view the PCI Express
device configuration space and perform 1 DWORD memory writes and reads to the
aperture.
The configuration space is displayed by default in the lower right corner when the device
is selected, as shown in Figure D-1.
HWDIRECT (Windows)
HWDIRECT can be purchased at www.eprotek.com and allows the user to view the
PCI Express device configuration space as well as the extended configuration space
(including the AER registers on the root).
Hardware Debug
Hardware issues can range from device recognition issues to problems seen after hours of
testing. This section provides debug flow diagrams for some of the most common issues
experienced by users. Endpoints that are shaded gray indicate that more information can
be found in sections below Figure D-3.
Figure D-3 flowchart summary: if the link is not training (decision point: Link is Training, trn_lnk_up_n = 0), the ChipScope tool can be used to try to determine the point of failure, or a link analyzer can be used to monitor the training sequence. Have the analyzer trigger on the first TS1 that it recognizes and then compare the output to the LTSSM state machine sequences outlined in Chapter 4 of the PCI Express Base Specification.
• A component must enter the LTSSM Detect state within 20 ms of the end of the
Fundamental reset.
• A system must guarantee that all components intended to be software visible at boot
time are ready to receive Configuration Requests within 100 ms of the end of
Conventional Reset at the Root Complex.
These statements basically mean the FPGA must be configured within a certain finite time,
and not meeting these requirements could cause problems with link training and device
recognition.
Configuration can be accomplished using an onboard PROM or dynamically using JTAG.
When using JTAG to configure the device, configuration typically occurs after the Chipset
has enumerated each peripheral. After configuring the FPGA, a soft reset is required to
restart enumeration and configuration of the device. A soft reset on a Windows based PC
is performed by going to Start → Shut Down and then selecting Restart.
To eliminate FPGA configuration as a root cause, perform a soft restart of the system.
Performing a soft reset on the system will keep power applied and forces re-enumeration
of the device. If the device links up and is recognized after a soft reset is performed, then
FPGA configuration is most likely the problem. Most typical systems use ATX power
supplies, which provide some margin on this 100 ms window because the power supply is
normally valid before the 100 ms window starts. For more information on FPGA
configuration, see Chapter 8, FPGA Configuration.
Application Requirements
During enumeration, it is possible for the chipset to issue TLP traffic that is passed
from the core to the backend application. A common oversight when designing custom
backend applications is not having logic that handles every type of incoming request. As a
result, no response is created and problems arise. The PIO design has the necessary
backend functions to respond correctly to any incoming request. It is the responsibility of
the application to generate the correct response. The following packet types are
presented to the application:
• Requests targeting the Expansion ROM (if enabled)
• Message TLPs
• Memory or I/O requests targeting a BAR
• All completion packets
The PIO design can be used to rule out these types of concerns because it responds in some
way to all incoming transactions directed to the user application, ensuring the host receives
the proper response and allowing the system to progress. If the PIO design works but the
custom application does not, some transaction is not being handled properly.
The ChipScope tool should be implemented on the wrapper TRN Receive interface to
identify if requests targeting the backend application are drained and completed
successfully. The TRN interface signals that should be probed in the ChipScope tool are
defined in Table D-2, page 277.
Flowchart summary (data transfer debug): the link is up (trn_lnk_up_n = 0), the device is recognized by the system, but data transfers are failing. Errors are reported to the user interface on the output cfg_dstatus[3:0], a copy of the Device Status register; use the ChipScope tool to monitor this bus for errors. On a fatal error, blue screen, or other error, errors flagged by the core are due to problems on the receive datapath; use a link analyzer if possible to check incoming packets, and see the Identifying Errors section. Otherwise, determine whether the problem is with receiving or transmitting TLPs.
Identifying Errors
Hardware symptoms of system lock-up issues are indicated when the system hangs or a
blue screen appears (PC systems). The PCI Express Base Specification, rev. 2.0 requires that
error detection be implemented at the receiver. A system lock-up or hang is commonly the
result of a Fatal Error and is reported in bit 2 of the receiver's Device Status register.
Using the ChipScope tool, monitor the core's Device Status register to see if a fatal error is
being reported.
A fatal error reported at the Root Complex implies an issue on the transmit side of the
Endpoint. The Root Complex Device Status register can often be viewed using PCItree
(Windows) or LSPCI (Linux). If a fatal error is detected, refer to the Transmit section. A Root
Complex can often implement Advanced Error Reporting (AER), which further distinguishes
the type of error reported. AER provides valuable information as to why a certain error was
flagged and is provided as an extended capability within a device's configuration space.
Section 7.10 of the PCI Express Base Specification, rev. 2.0 provides more information on the
AER registers.
Transmit
Fatal Error Detected on Root or Link Partner
Check that the TLP is correctly formed and that the payload (if one is attached) matches
what is stated in the header length field. The Endpoint's Device Status register does not
report errors created by traffic on the transmit channel.
The signals shown in Table D-2 should be monitored on the transmit interface to verify that
all traffic being initiated is correct.
Receive
Xilinx solutions for PCI Express provide the Device Status register to the application on
CFG_DSTATUS[3:0].
System lock-up conditions due to issues on the receive channel of the PCI Express core are
often the result of an error message being sent upstream to the root. Error messages are only
sent when error reporting is enabled in the Device Control register.
A fatal condition is reported if any of these events occur:
• Training Error
• DLL Protocol Error
• Flow Control Protocol Error
• Malformed TLP
• Receiver Overflow
The first four bullets are not common in hardware because both Xilinx solutions for PCI
Express and connected components have been thoroughly tested in simulation and
hardware. However, a receiver overflow is a possibility. Users must ensure they follow
requirements discussed in the section Receiver Flow Control Credits Available in
Chapter 6 when issuing memory reads.
Non-Fatal Errors
Below are lists of conditions that are reported as Non-Fatal errors. See the PCI Express Base
Specification, rev. 2.0 for more details.
If the error is being reported by the root, the Advanced Error Reporting (AER) registers can
be read to determine the condition that led to the error. Use a tool such as HWDIRECT,
discussed in Third Party Software Tools, page 268, to read the root’s AER registers.
Chapter 7 of the PCI Express Base Specification defines the AER registers. If the error is
signaled by the endpoint, debug ports are available to help determine the specific cause of
the error.
Correctable Non-Fatal errors are:
• Receiver Error
• Bad TLP
• Bad DLLP
• Replay Timeout
• Replay NUM Rollover
The first three errors listed above are detected by the receiver and are not common in
hardware systems. The replay error conditions are signaled by the transmitter. If an ACK is
not received for a packet within the allowed time, it will be replayed by the transmitter.
Throughput can be reduced if many packets are being replayed, and the source can usually
be determined by examining the link analyzer or ChipScope tool captures.
Uncorrectable Non-Fatal errors are:
• Poisoned TLP
• Received ECRC Check Failed
• Unsupported Request (UR)
• Completion Timeout
• Completer Abort
• Unexpected Completion
• ACS Violation
An unsupported request usually indicates that the address in the TLP did not fall within
the address space allocated to the BAR. This often points to a problem with the address
translation performed by the driver. Ensure also that the BAR has been assigned correctly
by the root at startup. LSPCI or PCItree discussed in Third Party Software Tools, page 268
can be used to read the BAR values for each device.
A completion timeout indicates that no completion was returned for a transmitted TLP
and is reported by the requester. This can cause the system to hang (could include a blue
screen on Windows) and is usually caused when one of the devices locks up and stops
responding to incoming TLPs. If the root is reporting the completion timeout, the
ChipScope tool can be used to investigate why the User Application did not respond to a
TLP (for example, the User Application is busy, there are no transmit buffers available, or
trn_tdst_rdy_n is deasserted). If the endpoint is reporting the Completion timeout, a
link analyzer would show the traffic patterns during the time of failure and would be
useful in determining the root cause.
Next Steps
If the debug suggestions listed above do not resolve the issue, open a support case to have
the appropriate Xilinx expert assist with the issue.
To create a technical support case in Webcase, see the Xilinx website at:
www.xilinx.com/support/clearexpress/websupport.htm
Items to include when opening a case:
• Detailed description of the issue and results of the steps listed above.
• Attach ChipScope tool VCD captures taken in the steps above.
To discuss possible solutions, use the Xilinx User Community:
forums.xilinx.com/xlnx/
Simulation Debug
This section provides simulation debug flow diagrams for some of the most common
issues experienced by users. Flow chart boxes shaded gray indicate that more information
can be found in the sections following Figure D-6.
ModelSim Debug
X-Ref Target - Figure D-6
Figure D-6: ModelSim Simulation Debug Flow Diagram
If the simulation libraries are not compiled and mapped correctly, errors such as these are
reported:
# ** Error: (vopt-19) Failed to access library 'secureip' at "secureip".
# No such file or directory. (errno = ENOENT)
# ** Error: ../../example_design/xilinx_pcie_2_0_ep_v6.v(820): Library secureip not found.
If errors refer to a failure to access a library, compile and map the proper libraries (see the
Compiling Simulation Libraries section). For example:
vmap unisims_ver C:\my_unisim_lib
Next Step
If the debug suggestions listed above do not resolve the issue, a support case should be
opened to have the appropriate Xilinx expert assist with the issue.
To create a technical support case in Webcase, see the Xilinx website at:
www.xilinx.com/support/clearexpress/websupport.htm
Items to include when opening a case:
• Detailed description of the issue and results of the steps listed above.
• Attach a VCD or WLF dump of the simulation.
To discuss possible solutions, use the Xilinx User Community:
forums.xilinx.com/xlnx/
Completion Space
Table E-1 defines the completion space reserved in the receive buffer by the core. The
values differ depending on the different Capability Max Payload Size settings of the core
and the performance level selected by the designer. If the designer chooses not to have TLP
Digests (ECRC) removed from the incoming packet stream, the TLP Digest must be
accounted for as part of the data payload. Values are credits, expressed in decimal.
When calculating the number of Completion credits a Non-Posted Request requires, the
user must determine how many RCB-bounded blocks the Completion response might
require; this is the same as the number of Completion Header credits required.
LIMIT_FC Method
The LIMIT_FC method is the simplest to implement. The User Application limits the
number of outstanding Non-Posted Requests at any one time to a fixed maximum, MAX_NP.
To calculate this value, perform the following steps (a short sketch of the arithmetic follows
the steps):
1. Determine the number of CplH credits required by a Max_Request_Size packet:
Max_Header_Count = ceiling(Max_Request_Size / RCB)
2. Determine the greatest number of maximum-sized Completions supported by the
CplD credit pool:
Max_Packet_Count_CplD = floor(CplD / Max_Request_Size)
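The following minimal sketch illustrates this arithmetic in Python (illustration only, not the
user logic itself). Total_CplH and Total_CplD are the credit totals taken from Table E-1; the
values passed in at the bottom are hypothetical placeholders, and the final min(), which
limits MAX_NP to whichever credit pool is exhausted first, is inferred from the intent of the
method rather than quoted from the steps above.

import math

def limit_fc_max_np(max_request_size, rcb, total_cplh, total_cpld):
    # Step 1: CplH credits needed by one Max_Request_Size read
    max_header_count = math.ceil(max_request_size / rcb)
    # Step 2: maximum-sized Completions supported by the CplD pool
    # (one CplD credit covers 16 bytes, so CplD credits * 16 = bytes)
    max_packet_count_cpld = (total_cpld * 16) // max_request_size
    # Maximum-sized Completions supported by the CplH pool
    max_packet_count_cplh = total_cplh // max_header_count
    # MAX_NP is bounded by whichever pool runs out first (inferred)
    return min(max_packet_count_cplh, max_packet_count_cpld)

# Hypothetical example: 512-byte Max_Request_Size, 64-byte RCB,
# placeholder credit totals (substitute the real values from Table E-1)
print(limit_fc_max_np(512, 64, total_cplh=36, total_cpld=461))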
PACKET_FC Method
The PACKET_FC method allocates blocks of credit in finer granularities than LIMIT_FC,
using the receive Completion space more efficiently with a small increase in user logic.
Start with two registers, CPLH_PENDING and CPLD_PENDING (loaded with zero at
reset), and then perform these steps (a sketch of the bookkeeping follows the list):
1. When the User Application needs to send an NP request, determine the potential
number of CplH and CplD credits it might require:
NP_CplH = ceiling[((Start_Address mod RCB) + Request_Size) / RCB]
NP_CplD = ceiling[((Start_Address mod 16 bytes) + Request_Size) / 16 bytes]
(except I/O Write, which returns zero data)
The modulo and ceiling functions ensure that any fractional RCB or credit blocks are
rounded up. For example, if a Memory Read requests 8 bytes of data from address
7Ch, the returned data can potentially be returned over two Completion packets
(7Ch-7Fh, followed by 80h-83h). This would require two RCB blocks and two data
credits.
2. Check the following:
CPLH_PENDING + NP_CplH < Total_CplH (from Table E-1)
CPLD_PENDING + NP_CplD < Total_CplD (from Table E-1)
3. If both inequalities are true, transmit the Non-Posted Request, increase
CPLH_PENDING by NP_CplH and CPLD_PENDING by NP_CplD. For each NP
Request transmitted, keep NP_CplH and NP_CplD for later use.
4. When all Completion data is returned for an NP Request, decrement
CPLH_PENDING and CPLD_PENDING accordingly.
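A minimal sketch of this bookkeeping follows, in Python for illustration of the arithmetic
only (the actual implementation would live in the user logic). The RCB value, credit totals,
and request parameters are assumptions supplied by the designer; the strict inequalities
mirror step 2, and the final line reproduces the 8-byte read from address 7Ch worked
through above.

import math

RCB = 64          # Read Completion Boundary in bytes (64 or 128)
CPLD_BYTES = 16   # one Completion Data credit covers 16 bytes

def np_credits(start_address, request_size, is_io_write=False):
    # Step 1: CplH/CplD credits a Non-Posted Request might consume
    np_cplh = math.ceil(((start_address % RCB) + request_size) / RCB)
    np_cpld = 0 if is_io_write else math.ceil(
        ((start_address % CPLD_BYTES) + request_size) / CPLD_BYTES)
    return np_cplh, np_cpld

class PacketFC:
    def __init__(self, total_cplh, total_cpld):   # totals from Table E-1
        self.total_cplh, self.total_cpld = total_cplh, total_cpld
        self.cplh_pending = 0
        self.cpld_pending = 0

    def try_send(self, np_cplh, np_cpld):
        # Step 2: both inequalities must hold before transmitting
        if (self.cplh_pending + np_cplh < self.total_cplh and
                self.cpld_pending + np_cpld < self.total_cpld):
            # Step 3: transmit and record the allocation
            self.cplh_pending += np_cplh
            self.cpld_pending += np_cpld
            return True
        return False   # otherwise hold the request

    def request_complete(self, np_cplh, np_cpld):
        # Step 4: all Completion data returned, release the allocation
        self.cplh_pending -= np_cplh
        self.cpld_pending -= np_cpld

print(np_credits(0x7C, 8))   # -> (2, 2): two CplH and two CplD credits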
This method is less wasteful than LIMIT_FC but still ties up all of an NP Request’s
Completion space until the entire request is satisfied. RCB_FC and DATA_FC provide finer
deallocation granularity at the expense of more logic.
RCB_FC Method
The RCB_FC method allocates and de-allocates blocks of credit in RCB granularity. Credit
is freed on a per-RCB basis.
As with PACKET_FC, start with two registers, CPLH_PENDING and CPLD_PENDING
(loaded with zero at reset).
1. Calculate the number of data credits per RCB:
CplD_PER_RCB = RCB / 16 bytes
2. When the User Application needs to send an NP request, determine the potential
number of CplH credits it might require. Use this to allocate CplD credits with RCB
granularity:
NP_CplH = ceiling[((Start_Address mod RCB) + Request_Size) / RCB]
NP_CplD = NP_CplH × CplD_PER_RCB
3. Check the following:
CPLH_PENDING + NP_CplH < Total_CplH
CPLD_PENDING + NP_CplD < Total_CplD
4. If both inequalities are true, transmit the Non-Posted Request, increase
CPLH_PENDING by NP_CplH and CPLD_PENDING by NP_CplD.
5. At the start of each incoming Completion, or when that Completion begins at or
crosses an RCB without ending at that RCB, decrement CPLH_PENDING by 1 and
CPLD_PENDING by CplD_PER_RCB. Any Completion could cross more than one
RCB. The number of RCB crossings can be calculated by:
RCB_CROSSED = ceiling[((Lower_Address mod RCB) + Length) / RCB]
Lower_Address and Length are fields that can be parsed from the Completion header.
Alternatively, a designer can load a register CUR_ADDR with Lower_Address at the
start of each incoming Completion, increment per DW or QW as appropriate, then
count an RCB whenever CUR_ADDR rolls over.
This method is less wasteful than PACKET_FC, but deallocation still occurs only at RCB
granularity. If a User Application transmits I/O requests, it could adopt a policy of
allocating only one CplD credit for each I/O Read and zero CplD credits for each I/O Write.
The User Application would then have to match each incoming Completion's Tag with the
Type (Memory Read, I/O Read, I/O Write) of the original NP Request.
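The allocation and per-RCB release arithmetic can be sketched as follows (Python,
illustration only; the RCB value and example addresses are assumptions). The release
function implements the RCB_CROSSED calculation from step 5 using the Lower_Address
and Length fields of an incoming Completion.

import math

RCB = 64                    # Read Completion Boundary in bytes
CPLD_PER_RCB = RCB // 16    # step 1: data credits per RCB block

def rcb_fc_allocate(start_address, request_size):
    # Step 2: credits to reserve for a Non-Posted Request
    np_cplh = math.ceil(((start_address % RCB) + request_size) / RCB)
    np_cpld = np_cplh * CPLD_PER_RCB
    return np_cplh, np_cpld

def rcb_fc_release(lower_address, length_bytes):
    # Step 5: credits to release as an incoming Completion arrives,
    # based on the number of RCB-bounded blocks it touches
    rcb_crossed = math.ceil(((lower_address % RCB) + length_bytes) / RCB)
    return rcb_crossed, rcb_crossed * CPLD_PER_RCB

# A 256-byte read from an RCB-aligned address reserves 4 CplH and
# 16 CplD credits; a 64-byte Completion at that address then releases
# 1 CplH and 4 CplD credits.
print(rcb_fc_allocate(0x1000, 256))   # -> (4, 16)
print(rcb_fc_release(0x00, 64))       # -> (1, 4)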
DATA_FC Method
The DATA_FC method provides the finest allocation granularity at the expense of logic.
As with PACKET_FC and RCB_FC, start with two registers, CPLH_PENDING and
CPLD_PENDING (loaded with zero at reset).
1. When the User Application needs to send an NP request, determine the potential
number of CplH and CplD credits it might require:
NP_CplH = ceiling[((Start_Address mod RCB) + Request_Size) / RCB]
NP_CplD = ceiling[((Start_Address mod 16 bytes) + Request_Size) / 16 bytes]
(except I/O Write, which returns zero data)
2. Check the following:
CPLH_PENDING + NP_CplH < Total_CplH (from Table E-1)
CPLD_PENDING + NP_CplD < Total_CplD (from Table E-1)
STREAM_FC Method
When configured as an Endpoint, user applications can maximize Downstream (away
from Root Complex) data throughput by streaming Memory Read Transactions Upstream
(towards the Root Complex) at the highest rate allowed on the Integrated Block
Transaction transmit interface. Streaming Memory Reads are allowed only if
trn_rdst_rdy_n can be held asserted, so that Downstream Completion Transactions, along
with Posted Transactions, can be presented on the integrated block’s receive Transaction
interface and processed at line rate. Asserting trn_rdst_rdy_n in this manner guarantees
that the Completion space within the receive buffer is not oversubscribed (that is, Receiver
Overflow will not occur).
Board Stackup
Board stackup design is dependent on many variables, including design, manufacturing,
and cost constraints. See the information on board stackup design in UG373 and UG366.
Generally speaking, signal layers for high-speed signals such as PCI Express data signals
should be sandwiched between ground planes. It is also preferable to use the signal layers
closest to the top or bottom of the board so that via stubs are minimized.
ML605 Example
Figure F-1 shows the stackup that the ML605 Add-in Card reference board employs. All
internal signal layers are sandwiched between (uninterrupted) ground planes. Power
planes are located in the center of the stackup and are not adjacent to signal layers.
X-Ref Target - Figure F-1
Figure F-1: ML605 16-layer board stackup. The Virtex-6 FPGA sits on the top side; the PCI
Express edge connector side B (RX) pads are located on the top of the card and the side A
pads on the bottom. Pre-preg and core dielectric layers alternate between the copper layers:
Layer 1: TOP, Layer 2: GND1, Layer 3: SIG1, Layer 4: GND2, Layer 5: SIG2, Layer 6: GND3,
Layer 7: SIG3, Layer 8: GND4, Layer 9: PWR1, Layer 10: PWR2, Layer 11: GND5,
Layer 12: SIG4, Layer 13: GND6, Layer 14: SIG5, Layer 15: GND7, Layer 16: BOT
Transmit (TX) data lines initiate from the FPGA on the top layer, immediately drop to SIG1
(Layer 3) for routing across the PCB, and then terminate at the PCI Express edge connector
side A on the bottom layer.
Receive (RX) data lines initiate from the FPGA on the top layer, immediately drop to SIG5
(Layer 14) for routing across the PCB, and then terminate at the PCI Express edge
connector side B on the top layer.
Figure: cross-section of the ML605 TX and RX routing between the FPGA, the signal layers,
the TX coupling capacitors, and edge connector sides A and B.
Bends
Follow the recommendations in UG373 regarding microstrip and stripline bends. Tight
bends (such as 90 degrees) should be avoided; only mitered bends of 45 degrees or less are
recommended.
Propagation Delay
PCI Express generally does not specify a maximum propagation delay for data signals,
with the exception of add-in cards. Add-in card designs should meet the propagation
delay specification in the CEM specification for data traces. The delay from the edge finger
to the GTX transceiver must not exceed 750 ps.
Lane-to-Lane Skew
Lane-to-lane skew is generally not an issue for a PCI Express link because the specification
allows large amounts of skew and the GTX transceivers can tolerate large amounts of
lane-to-lane skew. Designers should nevertheless not violate the limits dictated by the PCI
Express specifications.
The lane-to-lane skew between any two lanes in a multi-lane link should not exceed the
summarized specifications in Table F-2. These specifications include PCB skew as well as
any skew introduced by repeater or re-timing devices.
Intrapair Skew
Intrapair skew refers to the skew between the P and N legs of a differential pair. Skew can
introduce common-mode effects, which lead to increased EMI, crosstalk, and other DC
effects. It is important to match the skew within each differential pair as closely as possible.
Xilinx recommends intrapair trace length-matching to within 5 mils to minimize these
effects.
Symmetrical Routing
Always use symmetrical routing to prevent common-mode effects, such as EMI, from
being introduced into the system.
Figure F-5 illustrates two examples of non-symmetrical routing, which should be avoided.
X-Ref Target - Figure F-5
Vias
Users should follow the recommendations in UG373 for differential vias. Specifically,
wherever high-speed signals must transition signal layers, a
Ground-Signal-Signal-Ground (GSSG) type via should be used if possible. This will
provide a low inductance return current path.
All vias for a differential pair should employ symmetrical routing rules.
Trace Impedance
Differential data-line trace impedance was not specified in the Rev 1.0, 1.0a, or 1.1 (1.x) of
the PCI Express Base and PCI Express CEM Specifications. The transmitters and receivers
were specified to have 100Ω nominal differential impedance; therefore, most 1.x designs
opt for a default 100Ω differential trace impedance for all PCI Express differential
connections.
The PCI Express CEM Specification Rev 2.0 now specifies a differential trace impedance for
data lines that are 5.0 Gb/s capable in the range of 68Ω to 105Ω (85Ω nominal). Designers
targeting PCI Express compliant add-in cards or system boards (motherboards) should
adhere to this specification. Although 100Ω falls within the limits of this specification, PCB
design-for-manufacturability tolerances for trace impedance are generally greater than 5%.
Therefore 5.0 Gb/s add-in card designs that use 100Ω might fall above the 105Ω upper
limit.
PCI Express add-in card connector vendors are now targeting 85Ω for 5.0 Gb/s capable
connections; therefore, Xilinx recommends that 5.0 Gb/s capable designs for open systems
target 85Ω differential impedance for data lines.
Xilinx recommends using simulation techniques to determine the optimum trace
impedance. Simulation using HSPICE or Hyperlynx can help determine the optimum
trace impedance to reduce signal loss.
PCB dielectric material, board stack up, microstrip, and stripline traces affect signal
impedance. It is important that all of these factors are taken into consideration together.
If a simulator is not available, Xilinx recommends these basic guidelines for differential
data-line trace impedance targets:
• 100Ω ± 10% for 2.5 Gb/s only links
• 85Ω ± 10% for 5.0 Gb/s capable links
Trace Separation
Generally, simulation or post-layout analysis tools should be used to determine the
optimum spacing required to reduce crosstalk from nearby aggressor signals. In the
absence of these tools, Xilinx suggests that spacing between differential pairs and other
non-PCI Express signals should be at least three times the dielectric height above the
reference planes to minimize crosstalk. Exceptions to this are allowed in the break-out area
of the FPGA; however, these sections should be kept as short as possible.
Lane Reversal
Lane reversal is an optional feature of the PCI Express Base Specification and provides
flexibility in the design of the PCB. The Virtex-6 FPGA Integrated Block for PCI Express
supports lane reversal capabilities with some restrictions. See the section titled Lane
Reversal in Chapter 6 for a description of these restrictions.
AC Coupling
System and Add-in Cards
AC coupling capacitors should be placed on the TX pairs. Place the capacitors near either
the edge connector or the FPGA, not in the middle of the interconnect.
Chip-to-Chip
AC coupling capacitors can be placed anywhere on the interconnect, except in the very
middle.
General Guidelines
The capacitor on each leg of a coupled pair should always be located at the same relative
position as the capacitor on its partner leg; that is, symmetrical routing guidelines apply for
differential pairs.
Use 0.1 µF ceramic chip capacitors in the smallest package possible.
Figure: PCI Express link between the system board (motherboard) and an add-in card
across the edge connector, showing the PETp/n and PERp/n differential pairs for each lane
and the TX/RX direction on each side.
Jitter
Reference clock jitter has the potential to close both the TX and RX eyes, depending on the
frequency content of the phase jitter. Therefore, it is very important to maintain as clean a
reference clock as possible.
Reduce crosstalk on the REFCLK signal by isolating the clock signal from nearby
high-speed traces. Maintain a separation of at least 25 mils from the nearest aggressor
signals.
Ensure a clean supply on the MGTAVCC power rail. See UG366 for more details on
GTX transceiver power supply layout and design.
In some cases where the designer has no control over the clock source, it might be desirable
to add a jitter attenuator chip.
If an external PLL or jitter attenuator chip is used, ensure that it meets the specifications for
PLL bandwidth as defined in the PCI Express Base Specification. The PLL bandwidth
specification is different for 1.x and 2.0 versions of the specification.
Trace Impedance
The reference clock should use a 100Ω differential trace impedance.
Termination
The REFCLK signal should be routed to the dedicated reference clock input pins on the
MGT, and the user design should instantiate an IBUFDS_GTXE1 primitive on these pins.
An internal 100Ω differential termination biased to 4/5 MGTAVCC is
automatically included on these input pins when the IBUFDS_GTXE1 is used, and no
external termination is required for Virtex-6 devices. This is true for both HCSL
and LVDS clocks.
See UG366 for more information on GTX transceiver reference clock termination.
AC Coupling
The REFCLK signal should be AC coupled at the input to the FPGA. Xilinx recommends
0.1 µF ceramic-chip capacitors for this purpose. See UG366 for more information.
Fanout
If the reference clock needs to be routed to more than one location, then a dedicated clock
fanout chip should be used. Make sure to follow the specifications for the fanout chip. For
instance, 100Ω termination might be required on the input to the fanout chip.
Figure F-7 shows an example of a clock fanout chip used to route the reference clock to
multiple locations. The Virtex-6 FPGA requires no external resistive termination (just AC
coupling capacitors). The fanout chip is shown with a single resistor terminator at its clock
input pins.
X-Ref Target - Figure F-7
Figure F-7: Reference clock routed from the PCIe edge connector through a clock fanout
chip to the Virtex-6 FPGA (REFCLK+/REFCLK-) and to other locations; the fanout chip is
shown with a single resistor terminator at its clock input pins.
PRSNT#
The PRSNT# pins should be connected as recommended in the CEM specification. Also see
the ML605 board for an example.
Summary Checklist
Table F-3 provides a checklist which summarizes the items discussed in this appendix.
Table G-12: Received Configuration TLP Status Port Descriptions (Configuration Management Interface)

CFGTRANSACTION (Output, USERCLK): Configuration Transaction Received. This output
pulses when a valid Configuration read or write is received in the range 0h to 7Fh
(DWORD 0 to 127).

CFGTRANSACTIONADDR[6:0] (Output, USERCLK): Configuration Transaction Address.
This 7-bit output contains the DWORD offset that was addressed (0h to 7Fh). This output is
valid only when CFGTRANSACTION pulses.

CFGTRANSACTIONTYPE (Output, USERCLK): Configuration Transaction Type. This
output indicates the type of Configuration transaction when CFGTRANSACTION pulses:
0: Read
1: Write