PCI Express Compiler User Guide
UG-PCI10605-2.8
© 2010 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX are Reg. U.S. Pat.
& Tm. Off. and/or trademarks of Altera Corporation in the U.S. and other countries. All other trademarks and service marks are the property of their respective
holders as described at www.altera.com/common/legal.html. Altera warrants performance of its semiconductor products to current specifications in accordance
with Altera’s standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or
liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera
customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or
services.
Chapter 1. Datasheet
    Features
    Release Information
    Device Family Support
    General Description
        Device Programming Modes with PCI Express Initialization
        Device Families with PCI Express Hard IP
        External PHY Support
        Debug Features
    IP Core Verification
        Simulation Environment
        Compatibility Testing Environment
    Performance and Resource Utilization
    Recommended Speed Grades
    OpenCore Plus Evaluation (Not Required for Hard IP)
Additional Information
    Revision History
    How to Contact Altera
    Typographic Conventions
This document describes Altera’s IP core for PCI Express. PCI Express is a
high-performance interconnect protocol for use in a variety of applications including
network adapters, storage area networks, embedded controllers, graphic accelerator
boards, and audio-video products. The PCI Express protocol is software
backwards-compatible with the earlier PCI and PCI-X protocols, but is significantly
different from its predecessors. It is a packet-based, serial, point-to-point interconnect
between two devices. The performance is scalable based on the number of lanes and
the generation that is implemented. Altera offers both endpoints and root ports that
are compliant with PCI Express Base Specification 1.0a or 1.1 for Gen1 and PCI Express
Base Specification 2.0 for Gen2. Both endpoints and root ports can be implemented as a
configurable hard IP block rather than programmable logic, saving significant FPGA
resources. The PCI Express IP core is available in ×1, ×2, ×4, and ×8 configurations.
Table 1–1 shows the aggregate bandwidth of a PCI Express link for Gen1 and Gen2
PCI Express IP cores with 1, 2, 4, and 8 lanes. The protocol specifies 2.5 giga-transfers
per second for Gen1 and 5 giga-transfers per second for Gen2. Because the PCI
Express protocol uses 8B10B encoding, there is a 20% overhead, which is already
subtracted from the figures in Table 1–1; for example, a Gen1 ×4 link delivers
2.5 GT/s × 4 lanes × 0.8 = 8 Gbps. Table 1–1 lists the bandwidth of a single TX or RX
channel, so the numbers double for duplex operation.
Table 1–1. PCI Express Aggregate Bandwidth (Gbps)
                                       ×1    ×2    ×4    ×8
PCI Express Gen1 (1.x compliant)        2     4     8    16
PCI Express Gen2 (2.0 compliant)        4     8    16    32
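As a cross-check of the arithmetic behind Table 1–1, the following simulation-only Verilog sketch reproduces the table entries from the transfer rate, the lane count, and the 8B10B coding efficiency. The module and function names are illustrative only and are not part of the IP core or its deliverables.

// Simulation-only sketch: per-direction PCI Express bandwidth after 8B10B overhead.
// Rates are passed as GT/s x 10 (25 = 2.5 GT/s, 50 = 5.0 GT/s) to stay in integers.
module pcie_bw_check;
  function automatic integer effective_gbps(input integer rate_x10, input integer lanes);
    effective_gbps = (rate_x10 * lanes * 8) / (10 * 10); // rate * lanes * 0.8
  endfunction
  initial begin
    $display("Gen1 x1 = %0d Gbps", effective_gbps(25, 1)); // 2
    $display("Gen1 x8 = %0d Gbps", effective_gbps(25, 8)); // 16
    $display("Gen2 x4 = %0d Gbps", effective_gbps(50, 4)); // 16
    $display("Gen2 x8 = %0d Gbps", effective_gbps(50, 8)); // 32
  end
endmodule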
f Refer to the PCI Express High Performance Reference Design for bandwidth numbers for
the hard IP implementation in Stratix® IV GX and Arria® II GX devices.
Features
Altera’s PCI Express IP core offers extensive support across multiple device families.
It supports the following key features:
■ Hard IP implementation—PCI Express Base Specification 1.1 or 2.0. The PCI Express
protocol stack including the transaction, data link, and physical layers is hardened
in the device.
■ Soft IP implementation:
■ PCI Express Base Specification 1.0a or 1.1.
■ Many other device families supported. Refer to Table 1–4.
■ The PCI Express protocol stack, including the transaction, data link, and physical
layers, is implemented using FPGA fabric logic elements.
■ Feature rich:
■ Support for ×1, ×2, ×4, and ×8 configurations. You can select the ×2 lane
configuration for Cyclone IV GX devices without down-configuring a ×4
configuration.
■ Optional end-to-end cyclic redundancy code (ECRC) generation and checking
and advanced error reporting (AER) for high reliability applications.
■ Extensive maximum payload size support:
Stratix IV GX and Stratix V GX hard IP—Up to 2 KBytes (128, 256, 512, 1,024,
or 2,048 bytes).
Arria II GX and Cyclone IV GX hard IP—Up to 256 bytes (128 or 256).
Soft IP Implementations—Up to 2 KBytes (128, 256, 512, 1,024, or 2,048 bytes).
■ Easy to use:
■ Easy parameterization.
■ Substantial on-chip resource savings and guaranteed timing closure using the
PCI Express hard IP implementation.
■ Easy adoption with no license requirement for the hard IP implementation.
■ Example designs to get started.
■ SOPC Builder support.
■ New features in the 10.1 release:
■ Support for Stratix V devices, with the following new features:
■ 256-bit interface for the Stratix V hard IP implementation.
■ Target design example demonstrating the 256-bit interface that connects the
PCI Express IP core to a root complex and a downstream application with
the 256-bit interface.
■ Verilog HDL and VHDL simulation support.
■ Support for the Gen1 ×1 soft IP implementation in Cyclone IV GX device with
the Avalon-ST interface.
■ Support for the hard IP implementation in the Arria II GZ device with the
Avalon-ST interface and the following capabilities:
■ Gen1 ×1, ×4 64-bit interface, Gen1 ×8 128-bit interface.
■ Gen2 ×1, 64-bit interface, Gen2 ×4, 128-bit interface.
■ Single virtual channel.
Different features are available for the soft and hard IP implementations and for the
three possible design flows. Table 1–2 outlines these different features.
Release Information
Table 1–3 provides information about this release of the PCI Express Compiler.
Altera verifies that the current version of the Quartus® II software compiles the
previous version of each IP core. Any exceptions to this verification are reported in the
MegaCore IP Library Release Notes and Errata. Altera does not verify compilation with
IP core versions older than one release.
General Description
The PCI Express Compiler generates customized PCI Express IP cores you use to
design PCI Express root ports or endpoints, including non-transparent bridges, or
truly unique designs combining multiple PCI Express components in a single Altera
device. The PCI Express IP cores implement all required and most optional features of
the PCI Express specification for the transaction, data link, and physical layers.
The hard IP implementation includes all of the required and most of the optional
features of the specification for the transaction, data link, and physical layers.
Depending upon the device you choose, one to four instances of the hard PCI Express
IP core are available. These instances can be configured to include any combination of
root port and endpoint designs to meet your system requirements. A single device can
also use instances of both the soft and hard IP PCI Express IP core. Figure 1–1
provides a high-level block diagram of the hard IP implementation.
Figure 1–1. PCI Express Hard IP High-Level Block Diagram
(Figure content labels: PCI Express protocol stack with its transaction layer (TL) interface, PCS and PMA, retry buffer, RX buffer, virtual channel, configuration block, LMI, PCIe reconfiguration, and test, debug, and configuration logic.)
This user guide includes a design example and testbench that you can configure as a
root port (RP) or endpoint (EP). You can use these design examples as a starting point
to create and test your own root port and endpoint designs.
f The purpose of the PCI Express Compiler User Guide is to explain how to use the PCI
Express IP core, not to explain the PCI Express protocol. Although there is
inevitable overlap between this user guide and the specifications, this document
should be used in conjunction with an understanding of the following PCI Express
specifications: PHY Interface for the PCI Express Architecture PCI Express 3.0 and
PCI Express Base Specification 1.0a, 1.1, or 2.0.
(Figure content labels: a host CPU, the Active Serial or Active Quad device configuration path, the configuration control block, the PCIe port used for Configuration via PCI Express (CvPCIe), and the PCIe IP core inside a Stratix V device.)
f For more information about configuration via PCI Express (CvPCIe) refer to
“Configuration via PCIe and Autonomous PCIe Cores” in Introducing Innovations at 28
nm to Move Beyond Moore’s Law.
Table 1–5. PCIe Hard IP Configurations for the PCIe Compiler in the Quartus II Software, Version 10.1

Avalon Streaming (Avalon-ST) Interface using MegaWizard Plug-In Manager Design Flow
Device           Link Rate (Gbps)   ×1    ×2 (1)   ×4        ×8
Stratix V GX     2.5                yes   no       yes       yes
                 5.0                yes   no       yes       yes
Stratix IV GX    2.5                yes   no       yes       yes
                 5.0                yes   no       yes       yes
Arria II GX      2.5                yes   no       yes       yes (2)
                 5.0                no    no       no        no
Arria II GZ      2.5                yes   no       yes       yes (2)
                 5.0                yes   no       yes (2)   no
Cyclone IV GX    2.5                yes   yes      yes       no
                 5.0                no    no       no        no
HardCopy IV GX   2.5                yes   no       yes       yes
                 5.0                yes   no       yes       yes

Avalon-MM Interface using SOPC Builder Design Flow
HardCopy IV GX   2.5                yes   no       yes       no
                 5.0                yes   no       no        no
Arria II GX      2.5                yes   no       yes       no
                 5.0                no    no       no        no
Cyclone IV GX    2.5                yes   yes      yes       no
                 5.0                no    no       no        no
Stratix IV GX    2.5                yes   no       yes       no
                 5.0                yes   no       no        no

Notes to Table 1–5:
(1) For devices that do not offer a ×2 initial configuration, you can use a ×4 configuration with the upper two lanes left unconnected at the device pins. The link will negotiate to ×2 if the attached device is ×2 native or capable of negotiating to ×2.
(2) The ×8 support uses a 128-bit bus at 125 MHz.
Table 1–6 lists the Total RX buffer space, Retry buffer size, and Maximum Payload
size for device families that include the hard IP implementation. You can find these
parameters on the Buffer Setup page of the parameter editor.
The PCI Express Compiler allows you to select IP cores that support ×1, ×2, ×4, or ×8
operation (Table 1–7 on page 1–10) that are suitable for either root port or endpoint
applications. You can use the MegaWizard Plug-In Manager or SOPC Builder to
customize the IP core. Figure 1–3 shows a relatively simple application that includes
two PCI Express IP cores, one configured as a root port and the other as an endpoint.
Figure 1–3. PCI Express Application with a Single Root Port and Endpoint
(Figure content: two Altera FPGAs, each with an embedded PCIe hard IP block and user application logic; one hard IP block is configured as a root port (RP) and the other as an endpoint (EP), connected by a PCI Express link.)
Figure 1–4 illustrates a heterogeneous topology, including an Altera device with two
PCIe hard IP root ports. One root port connects directly to a second FPGA that
includes an endpoint implemented in hard IP. The second root port
connects to a switch that multiplexes among three PCI Express endpoints.
Figure 1–4. PCI Express Application Including Stratix IV GX with Two Root Ports
(Figure content labels: PCIe links from the two root ports to a PCIe soft IP endpoint with user application logic and to a PCIe hard IP endpoint.)
If you target a device that includes an internal transceiver, you can parameterize the
PCI Express IP core to include a complete PHY layer, including the MAC, PCS, and
PMA layers. If you target other device architectures, the PCI Express Compiler
generates the IP core with the Intel-designed PIPE interface, making the IP core usable
with other PIPE-compliant external PHY devices.
Table 1–7 lists the protocol support for devices that include HSSI transceivers.
1 The device names and part numbers for Altera FPGAs that include internal
transceivers always include the letters GX or GT. If you select a device that does not
include an internal transceiver, you can use the PIPE interface to connect to an
external PHY. Table 3–1 on page 3–1 lists the available external PHY types.
You can customize the payload size, buffer sizes, and configuration space (base
address registers support and other registers). Additionally, the PCI Express Compiler
supports end-to-end cyclic redundancy code (ECRC) and advanced error reporting
for ×1, ×2, ×4, and ×8 configurations.
Debug Features
The PCI Express IP cores also include debug features that allow observation and
control of the IP cores for faster debugging of system-level problems.
IP Core Verification
To ensure compliance with the PCI Express specification, Altera performs extensive
validation of the PCI Express IP cores. Validation includes both simulation and
hardware testing.
Simulation Environment
Altera’s verification simulation environment for the PCI Express IP cores uses
multiple testbenches that consist of industry-standard BFMs driving the PCI Express
link interface. A custom BFM connects to the application-side interface.
Altera performs the following tests in the simulation environment:
■ Directed tests that test all types and sizes of transaction layer packets and all bits of
the configuration space
■ Error injection tests that inject errors in the link, transaction layer packets, and data
link layer packets, and check for the proper response from the IP cores
■ PCI-SIG Compliance Checklist tests that specifically test the items in the checklist
■ Random tests that test a wide range of traffic patterns across one or more virtual
channels
Table 1–8 shows the resource utilization for the hard IP implementation using either
the Avalon-ST or Avalon-MM interface with a maximum payload of 256 bytes and 32
tags for the Avalon-ST interface and 16 tags for the Avalon-MM interface.
Table 1–8. Performance and Resource Utilization in Arria II GX, Arria II GZ, Cyclone IV GX, Stratix IV GX, and Stratix V GX Devices
f Refer to “Setting Up and Running Analysis and Synthesis” in Quartus II Help and
Area and Timing Optimization in volume 2 of the Quartus II Handbook for more
information about how to effect this setting.
After you purchase a license for the PCI Express IP core, you can request a license file
from the Altera licensing website (www.altera.com/licensing) and install it on your
computer. When you request a license file, Altera emails you a license.dat file. If you
do not have internet access, contact your local Altera representative.
With Altera's free OpenCore Plus evaluation feature, you can perform the following
actions:
■ Simulate the behavior of an IP core (Altera IP core or AMPP℠ megafunction) in
your system
■ Verify the functionality of your design, as well as evaluate its size and speed
quickly and easily
■ Generate time-limited device programming files for designs that include IP cores
■ Program a device and verify your design in hardware
OpenCore Plus hardware evaluation is not applicable to the hard IP implementation
of the PCI Express Compiler. You can use the hard IP implementation of this IP core
without a separate license.
f For information about IP core verification, installation and licensing, and evaluation
using the OpenCore Plus feature, refer to the OpenCore Plus Evaluation of
Megafunctions.
f For details on installation and licensing, refer to the Altera Software Installation and
Licensing Manual.
OpenCore Plus hardware evaluation supports the following two operation modes:
■ Untethered—the design runs for a limited time.
■ Tethered—requires a connection between your board and the host computer. If
tethered mode is supported by all megafunctions in a design, the device can
operate for a longer time or indefinitely.
All IP cores in a device time out simultaneously when the most restrictive evaluation
time is reached. If your design includes more than one megafunction, a specific IP
core's time-out behavior may be masked by the time-out behavior of the other IP
cores.
For IP cores, the untethered timeout is one hour; the tethered timeout value is
indefinite. Your design stops working after the hardware evaluation time expires.
During time-out the Link Training and Status State Machine (LTSSM) is held in the
reset state.
This section provides step-by-step instructions to help you quickly set up and
simulate the PCI Express IP core testbench. The PCI Express IP core provides
numerous configuration options. The parameters chosen in this chapter are the same
as those chosen in the PCI Express High-Performance Reference Design available on
the Altera website. If you choose the parameters specified in this chapter, you can run
all of the tests included in Chapter 15, Testbench and Design Example. The
following sections show you how to instantiate the PCI Express IP core by completing
the following steps:
1. Parameterize the PCI Express IP Core
2. View Generated Files
3. Simulate the Design
4. Constrain the Design
5. Compile the Design
1 You can change the page that the MegaWizard Plug-In Manager displays by
clicking Next or Back at the bottom of the dialog box. You can move
directly to a named page by clicking the Parameter Settings, EDA, or
Summary tab.
8. Click the Parameter Settings tab. The System Settings page appears. Note that
there are three tabs labeled Parameter Settings, EDA, and Summary.
9. Figure 2–1 specifies the parameters to run the testbench.
10. Click Next to display the PCI Registers page. To enable all of the tests in the
provided testbench and chaining DMA example design, make the base address
register (BAR) assignments shown in Figure 2–2. BAR2 or BAR3 is required.
11. Click Next to display the Capabilities page. Table 2–3 provides the correct settings
for the Capabilities parameters.
Table 2–3. Capabilities Parameters
Device Capabilities
    Tags supported: 32
    Implement completion timeout disable: Turn this option On
    Completion timeout range: ABCD
Error Reporting
    Implement advanced error reporting: Off
    Implement ECRC check: Off
    Implement ECRC generation: Off
    Implement ECRC forwarding: Off
MSI Capabilities
    MSI messages requested: 4
    MSI message 64-bit address capable: On
Link Capabilities
    Link common clock: On
    Data link layer active reporting: Off
    Surprise down reporting: Off
    Link port number: 0x01
Slot Capabilities
    Enable slot capability: Off
    Slot capability register: 0x0000000
MSI-X Capabilities
    Implement MSI-X: Off
    Table size: 0x000
    Offset: 0x00000000
    BAR indicator (BIR): 0
    Pending Bit Array (PBA)
        Offset: 0x00000000
        BAR Indicator: 0
12. Click the Buffer Setup tab to open the Buffer Setup page. Table 2–4 provides the
correct settings for this page.
1 For the PCI Express hard IP implementation, the RX Buffer Space Allocation is fixed
at Maximum performance. This setting determines the values for a read-only table
that lists the number of posted header credits, posted data credits, non-posted header
credits, completion header credits, completion data credits, total header credits, and
total RX buffer space. Figure 2–3 shows the Credit Allocation Table.
13. Click Next to display the Power Management page. Table 2–5 describes the
correct settings for this page.
14. Click Next (or click the EDA tab) to display the simulation setup page.
15. On the EDA tab, turn on Generate simulation model to generate an IP functional
simulation model for the IP core. An IP functional simulation model is a
cycle-accurate VHDL or Verilog HDL model produced by the Quartus II software.
c Use the simulation models only for simulation and not for synthesis or any
other purposes. Using these models for synthesis creates a non-functional
design.
16. On the Summary tab, select the files you want to generate. A gray checkmark
indicates a file that is automatically generated. All other files are optional.
17. Click Finish to generate the IP core, testbench, and supporting files.
1 A report file, <variation name>.html, in your project directory lists each file
generated and provides a description of its contents.
18. Click Yes when you are prompted to add the Quartus II IP File (.qip) to the project.
The .qip is a file generated by the parameter editor or SOPC Builder that contains
all of the necessary assignments and information required to process the core or
system in the Quartus II compiler. Generally, a single .qip file is generated for each
IP core.
■ The simulation files for the chaining DMA design example, stored in the
<working_dir>\top_examples\chaining_dma\testbench sub-directory. The
Quartus II software generates the testbench files if you turn on Generate
simulation model on the EDA tab while generating the PCIe IP core.
Figure 2–4. Directory Structure for PCI Express IP Core and Testbench
<working_dir>
    PCI Express IP core files:
        <variation>.v = top.v, the parameterized PCI Express IP core
        <variation>.sdc = top.sdc, the timing constraints file
        <variation>.tcl = top.tcl, general Quartus II settings
        pci_express_compiler-library: contains a local copy of the PCI Express library files needed for simulation, or compilation, or both
    Simulation and Quartus II compilation files:
        <variation>_examples = top_examples
            common: includes the testbench and incremental compile directories
Figure 2–5 illustrates the top-level modules of this design. As this figure illustrates,
the PCI Express IP core connects to a basic root port bus functional model (BFM) and
an application layer high-performance DMA engine. These two modules, when
combined with the PCI Express IP core, comprise the complete example design. The
test stimulus is contained in altpcietb_bfm_driver_chaining.v. The script to run the
tests is runtb.do. For a detailed explanation of this example design, refer to
Chapter 15, Testbench and Design Example.
(Figure 2–5 content: the endpoint example comprises the PCI Express IP core and an endpoint application layer example containing an optional RC slave, a DMA write engine, a DMA read engine, and 32 KBytes of endpoint memory.)
f The design files used in this design example are the same files that are used for the
PCI Express High-Performance Reference Design. You can download the required
files on the PCI Express High-Performance Reference Design product page. This
product page includes design files for various devices. The example in this document
uses the Stratix IV GX files. You must also download altpcie_demo.zip, which
includes a software driver that the example design uses.
The Stratix IV .zip file includes files for Gen1 and Gen2 ×1, ×4, and ×8 variants. The
example in this document demonstrates the Gen2 ×8 variant. After you download
and unzip this .zip file, you can copy the files for this variant to your project directory,
<working_dir>. The files for the example in this document are included in the
hip_s4gx_gen2x8_128 directory. The Quartus II project file, top.qsf, is contained in
<working_dir>. You can use this project file as a reference.
1 The endpoint chaining DMA design example DMA controller requires the
use of BAR2 or BAR3.
If you want to do an initial compilation to check any potential issues without creating
pin assignments for a specific board, you can do so after running the following two
steps that constrain the chaining DMA design example:
1. To apply the Quartus II constraint files, type the following command at the Tcl
console command prompt:
source ../../top.tcl
1 To display the Quartus II Tcl Console, on the View menu, point to Utility
Windows and click Tcl Console.
2. To add the Synopsys timing constraints to your design, complete the following
steps:
a. On the Assignments menu, click Settings.
b. Under Timing Analysis Settings, click TimeQuest Timing Analyzer.
c. Under SDC files to include in the project, click add. Browse to your
<working_dir> to add top.sdc.
Example 2–3. Pin Assignments for the Stratix IV (EP4SGX230KF40C2) Development Board
set_location_assignment PIN_AK35 -to local_rstn_ext
set_location_assignment PIN_R32 -to pcie_rstn
set_location_assignment PIN_AN38 -to refclk
set_location_assignment PIN_AU38 -to rx_in0
set_location_assignment PIN_AR38 -to rx_in1
set_location_assignment PIN_AJ38 -to rx_in2
set_location_assignment PIN_AG38 -to rx_in3
set_location_assignment PIN_AE38 -to rx_in4
set_location_assignment PIN_AC38 -to rx_in5
set_location_assignment PIN_U38 -to rx_in6
set_location_assignment PIN_R38 -to rx_in7
set_instance_assignment -name INPUT_TERMINATION DIFFERENTIAL -to free_100MHz -disable
This chapter describes the PCI Express Compiler IP core parameters, which you can
set on the Parameter Settings tab.
System Settings
The first page of the Parameter Settings tab contains the parameters for the overall
system settings. Table 3–1 describes these settings.
PCI Registers
The ×1 and ×4 IP cores support memory space BARs ranging in size from 128 bytes to
the maximum allowed by a 32-bit or 64-bit BAR. The ×8 IP cores support memory
space BARs from 4 KBytes to the maximum allowed by a 32-bit or 64-bit BAR.
The ×1 and ×4 IP cores in legacy endpoint mode support I/O space BARs sized from
16 Bytes to 4 KBytes. The ×8 IP core only supports I/O space BARs of 4 KBytes.
The SOPC Builder flow supports the following functionality:
■ ×1 and ×4 lane width
■ Native endpoint, with no support for:
  ■ I/O space BAR
  ■ 32-bit prefetchable memory
■ 16 Tags
■ 1 Message Signaled Interrupts (MSI)
■ 1 virtual channel
■ Up to 256 bytes maximum payload
In the SOPC Builder design flow, you can choose to allow SOPC Builder to
automatically compute the BAR sizes and Avalon-MM base addresses or to enter the
values manually. The Avalon-MM address is the translated base address
corresponding to a BAR hit of a received request from PCI Express link. Altera
recommends using the Auto setting. However, if you decide to enter the address
translation entries, then you must avoid a conflict in address assignment when
adding other components, making interconnections, and assigning base addresses in
SOPC Builder. This process may take a few iterations between SOPC Builder address
assignment and MegaWizard address assignment to resolve address conflicts.
Table 3–2. PCI Registers
PCI Base Address Registers (0x10, 0x14, 0x18, 0x1C, 0x20, 0x24)
BAR Table (BAR0): BAR type and size. BAR0 size and type mapping (I/O space (1), memory space). BAR0 and BAR1 can be combined to form a 64-bit prefetchable BAR, or configured separately as 32-bit non-prefetchable memories. (2)
BAR Table (BAR1): BAR type and size. BAR1 size and type mapping (I/O space (1), memory space). BAR0 and BAR1 can be combined to form a 64-bit prefetchable BAR, or configured separately as 32-bit non-prefetchable memories.
BAR Table (BAR2) (3): BAR type and size. BAR2 size and type mapping (I/O space (1), memory space). BAR2 and BAR3 can be combined to form a 64-bit prefetchable BAR, or configured separately as 32-bit non-prefetchable memories. (2)
BAR Table (BAR3) (3): BAR type and size. BAR3 size and type mapping (I/O space (1), memory space). BAR2 and BAR3 can be combined to form a 64-bit prefetchable BAR, or configured separately as 32-bit non-prefetchable memories.
BAR Table (BAR4) (3): BAR type and size. BAR4 size and type mapping (I/O space (1), memory space). BAR4 and BAR5 can be combined to form a 64-bit BAR, or configured separately as 32-bit non-prefetchable memories. (2)
BAR Table (BAR5) (3): BAR type and size. BAR5 size and type mapping (I/O space (1), memory space). BAR4 and BAR5 can be combined to form a 64-bit BAR, or configured separately as 32-bit non-prefetchable memories.
BAR Table (EXP-ROM) (4): Disable/Enable. Expansion ROM BAR size and type mapping (I/O space, memory space, non-prefetchable).
PCIe Read-Only Registers
Device ID (byte offset 0x000): 0x0004. Sets the read-only value of the device ID register.
Subsystem ID (byte offset 0x02C) (3): 0x0004. Sets the read-only value of the subsystem device ID register.
Revision ID (byte offset 0x008): 0x01. Sets the read-only value of the revision ID register.
Vendor ID (byte offset 0x000): 0x1172. Sets the read-only value of the vendor ID register. This parameter cannot be set to 0xFFFF, per the PCI Express Specification.
Subsystem vendor ID (byte offset 0x02C) (3): 0x1172. Sets the read-only value of the subsystem vendor ID register. This parameter cannot be set to 0xFFFF, per the PCI Express Base Specification 1.1 or 2.0.
Class code (byte offset 0x008): 0xFF0000. Sets the read-only value of the class code register.
Base and Limit Registers
Input/Output (5): Disable, 16-bit I/O addressing, or 32-bit I/O addressing. Specifies what address widths are supported for the IO base and IO limit registers.
Prefetchable memory (5): Disable, 32-bit addressing, or 64-bit addressing. Specifies what address widths are supported for the prefetchable memory base register and prefetchable memory limit register.
Notes to Table 3–2:
(1) A prefetchable 64-bit BAR is supported. A non-prefetchable 64-bit BAR is not supported because in a typical system, the root port configuration
register of type 1 sets the maximum non-prefetchable memory window to 32-bits.
(2) The SOPC Builder flow does not support I/O space for BAR type mapping. I/O space is only supported for legacy endpoint port types.
(3) Only available for EP designs which require the use of the Header type 0 PCI configuration register.
(4) The SOPC Builder flow does not support the expansion ROM.
(5) Only available for RP designs which require the use of the Header type 1 PCI configuration register.
Capabilities Parameters
The Capabilities page contains the parameters setting various capability properties of
the IP core. These parameters are described in Table 3–3. Some of these parameters are
stored in the Common Configuration Space Header. The byte offset within the
Common Configuration Space Header indicates the parameter address.
1 The Capabilities page that appears in SOPC Builder does not include the Simulation
Mode and Summary tabs.
Table 3–3. Capabilities Parameters
Device Capabilities (byte offset 0x084)
Tags supported: 4–256 (Hard IP: 32 or 64 tags for ×1, ×4, and ×8; Soft IP: 4–256 tags for ×1 and ×4, and 4–32 tags for ×8; SOPC Builder: 16 tags for ×1 and ×4). Indicates the number of tags supported for non-posted requests transmitted by the application layer. This parameter sets the values in the Device Control register (0x088) of the PCI Express capability structure described in Table 6–7 on page 6–4. The transaction layer tracks all outstanding completions for non-posted requests made by the application. This parameter configures the transaction layer for the maximum number to track. The application layer must set the tag values in all non-posted PCI Express headers to be less than this value. Values greater than 32 also set the extended tag field supported bit in the configuration space device capabilities register. The application can only use tag numbers greater than 31 if configuration software sets the extended tag field enable bit of the device control register. This bit is available to the application as cfg_devcsr[8].
Implement completion timeout disable (byte offset 0x0A8): On/Off. This option is only selectable for PCI Express version 2.0 and higher root ports. For PCI Express version 2.0 and higher endpoints this option is forced to On. For PCI Express version 1.0a and 1.1 variations, this option is forced to Off. The timeout range is selectable. When On, the core supports the completion timeout disable mechanism via the PCI Express Device Control Register 2. The application layer logic must implement the actual completion timeout mechanism for the required ranges.
Completion timeout range: Ranges A–D. This option is only available for PCI Express version 2.0 and higher. It indicates device function support for the optional completion timeout programmability mechanism. This mechanism allows system software to modify the completion timeout value. This field is applicable only to root ports and endpoints that issue requests on their own behalf. Completion timeouts are specified and enabled via the Device Control 2 register (0x0A8) of the PCI Express Capability Structure Version 2.0 described in Table 6–8 on page 6–5. For all other functions this field is reserved and must be hardwired to 0x0000b. Four time value ranges are defined: Range A: 50 µs to 10 ms; Range B: 10 ms to 250 ms; Range C: 250 ms to 4 s; Range D: 4 s to 64 s. Bits are set according to these ranges to indicate the timeout value ranges supported; a value of 0x0000b indicates that completion timeout programming is not supported and that the function must implement a timeout value in the range of 50 µs to 50 ms.
Link Capabilities (byte offset 0x090)
Link common clock: On/Off. Indicates if the common reference clock supplied by the system is used as the reference clock for the PHY. This parameter sets the read-only value of the slot clock configuration bit in the link status register.
Data link layer active reporting (byte offset 0x094): On/Off. Turn this option on for a downstream port if the component supports the optional capability of reporting the DL_Active state of the Data Link Control and Management State Machine. For a hot-plug capable downstream port (as indicated by the Hot-Plug Capable field of the Slot Capabilities register), this option must be turned on. For upstream ports and components that do not support this optional capability, turn this option off.
Surprise down reporting: On/Off. When this option is on, a downstream port supports the optional capability of detecting and reporting the surprise down error condition.
Link port number: 0x01. Sets the read-only values of the port number field in the link capabilities register.
Slot Capabilities (byte offset 0x094)
Enable slot capability: On/Off. The slot capability is required for root ports if a slot is implemented on the port. Slot status is recorded in the PCI Express Capabilities register. Only valid for root port variants.
Slot capability register: Defines the characteristics of the slot. You turn this option on by selecting Enable slot capability. The register fields include the Physical Slot Number in the most significant bits ([31:19]).
Buffer Setup
The Buffer Setup page contains the parameters for the receive and retry buffers.
Table 3–4 describes the parameters you can set on this page.
RX Buffer Space Allocation (per VC): Read-only table. Shows the credits and space allocated for each flow-controllable type, based on the RX buffer size setting. All virtual channels use the same RX buffer space allocation. The table does not show non-posted data credits because the IP core always advertises infinite non-posted data credits and automatically has room for the maximum number of dwords of data that can be associated with each non-posted header. The numbers shown for completion headers and completion data indicate how much space is reserved in the RX buffer for completions. However, infinite completion credits are advertised on the PCI Express link as is required for endpoints. It is up to the application layer to manage the rate of non-posted requests to ensure that the RX buffer completion space does not overflow. The hard IP RX buffer is fixed at 16 KBytes for Stratix IV GX devices and 4 KBytes for Arria II GX devices.
Power Management
The Power Management page contains the parameters for setting various power
management properties of the IP core.
1 The Power Management page in the SOPC Builder flow does not include Simulation
Mode and Summary tabs.
Table 3–5 describes the parameters you can set on this page.
Avalon-MM Configuration
The Avalon Configuration page contains parameter settings for the PCI Express
Avalon-MM bridge, available only in the SOPC Builder design flow. Table 3–6
describes the parameters on the Avalon Configuration page.
This chapter describes the architecture of the PCI Express Compiler. For the hard IP
implementation, you can design an endpoint using the Avalon-ST interface or
Avalon-MM interface, or a root port using the Avalon-ST interface. For the soft IP
implementation, you can design an endpoint using the Avalon-ST, Avalon-MM or
Descriptor/Data interface. All configurations contain a transaction layer, a data link
layer, and a PHY layer with the following functions:
■ Transaction Layer—The transaction layer contains the configuration space, which
manages communication with the application layer: the receive and transmit
channels, the receive buffer, and flow control credits. You can choose one of the
following two options for the application layer interface from the MegaWizard
Plug-In Manager design flow:
■ Avalon-ST Interface
■ Descriptor/Data Interface (not recommended for new designs)
You can choose the Avalon-MM interface from the SOPC Builder flow.
■ Data Link Layer—The data link layer, located between the physical layer and the
transaction layer, manages packet transmission and maintains data integrity at the
link level. Specifically, the data link layer performs the following tasks:
■ Manages transmission and reception of data link layer packets
■ Generates all transmission cyclical redundancy code (CRC) values and checks
all CRCs during reception
■ Manages the retry buffer and retry mechanism according to received
ACK/NAK data link layer packets
■ Initializes the flow control mechanism for data link layer packets and routes
flow control credits to and from the transaction layer
■ Physical Layer—The physical layer initializes the speed, lane numbering, and lane
width of the PCI Express link according to packets received from the link and
directives received from higher layers.
1 PCI Express soft IP endpoints comply with the PCI Express Base Specification 1.0a, or
1.1. The PCI Express hard IP endpoint and root port comply with the PCI Express Base
Specification 1.1, 2.0, or 2.1.
Figure 4–1 broadly describes the roles of each layer of the PCI Express IP core.
(Figure 4–1 content: on the transmit path, with information sent by the application layer through the Avalon-ST, Avalon-MM, or Descriptor/Data interface, the transaction layer generates a TLP, which includes a header and, optionally, a data payload; the data link layer ensures packet integrity and adds a sequence number and link cyclic redundancy code (LCRC) check to the packet; and the physical layer encodes the packet and transmits it to the receiving device on the other side of the link. On the receive path, the physical layer decodes the packet and transfers it to the data link layer; the data link layer verifies the packet's sequence number and checks for errors; and the transaction layer disassembles the transaction and transfers data to the application layer in a form that it recognizes.)
This chapter provides an overview of the architecture of the Altera PCI Express IP
core. It includes the following sections:
■ Application Interfaces
■ Transaction Layer
■ Data Link Layer
■ Physical Layer
■ PCI Express Avalon-MM Bridge
■ Completer Only PCI Express Endpoint Single DWord
Application Interfaces
You can generate the PCI Express IP core with the following application interfaces:
■ Avalon-ST Application Interface
■ Avalon-MM Interface
The PCI Express Avalon-ST adapter maps PCI Express transaction layer packets
(TLPs) to the user application RX and TX busses. Figure 4–2 illustrates this interface.
(Figure 4–2 content: the Avalon-ST adapter (Note 1) sits between the Avalon-ST TX and RX ports of the user application and the transaction layer; the transaction, data link, and physical layers perform the same TX and RX roles described for Figure 4–1.)
Figure 4–3 and Figure 4–4 illustrate the hard and soft IP implementations of the PCI
Express IP core. In both cases the adapter maps the user application Avalon-ST
interface to PCI Express TLPs. The hard IP and soft IP implementations differ in the
following respects:
■ The hard IP implementation includes dedicated clock domain crossing logic
between the PHYMAC and data link layers. In the soft IP implementation you can
specify one or two clock domains for the IP core.
Figure 4–3. PCI Express Hard IP Implementation with Avalon-ST Interface to User Application
(Figure content labels: transceiver, PHYMAC, clock domain crossing (CDC), data link layer (DLL), transaction layer (TL) with configuration space, LMI, and the adapter presenting the Avalon-ST TX and RX ports and side band signals to the application layer; the PIPE interface sits between the transceiver and the PHYMAC.)
Figure 4–4. PCI Express Soft IP Implementation with Avalon-ST Interface to User Application
(Figure content labels: the same protocol stack without the dedicated clock domain crossing logic, plus Test_in/Test_out test signals.)
Table 4–1 provides the application clock frequencies for the hard IP and soft IP
implementations. As this table indicates, the Avalon-ST interface can be either 64 or
128 bits for the hard IP implementation. For the soft IP implementation, the Avalon-ST
interface is 64 bits.
RX Datapath
The RX datapath transports data from the transaction layer to the Avalon-ST interface.
A FIFO buffers the RX data from the transaction layer until the streaming interface
accepts it. The adapter autonomously acknowledges all packets it receives from the
PCI Express IP core. The rx_abort and rx_retry signals of the transaction layer
interface are not used. Masking of non-posted requests is partially supported. Refer to
the description of the rx_st_mask<n> signal for further information about masking.
The Avalon-ST RX datapath has a latency range of 3 to 6 pld_clk cycles.
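The sketch below shows a minimal application-layer consumer on the 64-bit Avalon-ST RX port. It is a simplified illustration, not part of the generated IP core: the always-ready policy and the omission of the interface's ready-latency rules are assumptions, and the <signal>0 names follow the naming used elsewhere in this user guide.

// Sketch: always-ready Avalon-ST RX consumer that counts received TLPs.
// A real design decodes the TLP header and may use rx_st_ready0 for backpressure.
module rx_st_consumer (
  input             pld_clk,
  input             rstn,
  input             rx_st_valid0,
  input             rx_st_sop0,    // start of a TLP (unused in this sketch)
  input             rx_st_eop0,    // end of a TLP
  input      [63:0] rx_st_data0,   // TLP header and payload words (unused here)
  output            rx_st_ready0
);
  reg [15:0] tlp_count;

  assign rx_st_ready0 = 1'b1;      // accept every cycle in this simplified sketch

  always @(posedge pld_clk or negedge rstn) begin
    if (!rstn)
      tlp_count <= 16'd0;
    else if (rx_st_valid0 && rx_st_eop0)
      tlp_count <= tlp_count + 16'd1;  // one more complete TLP received
  end
endmodule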
5. The application waits at least 3 more clock cycles for tx_cred to reflect the
consumed credits. tx_cred does not update with more credits until the current
tx_cred allocation is exhausted.
6. Repeat from Step 2.
1 For Arria II GX, Arria II GZ, HardCopy IV GX, and Stratix IV GX devices, the
non-posted tx_cred value indicates that at least that many credits are available; the
number of non-posted credits displayed may be less than the number actually
available to the core.
TX Datapath—Stratix V GX/GS
For Stratix V GX devices, the IP core provides the credit limit information as output
signals. The application layer may track credits consumed and use the credit limit
information to calculate the number of credits available. However, to enforce the PCI
Express flow control protocol the IP core also checks the available credits before
sending a request to the link, and if the application layer violates the available credits
for a TLP it transmits, the IP core blocks that TLP and all future TLPs until credits
become available. By tracking the credit consumed information and calculating the
credits available, the application layer can optimize performance by selecting for
transmission only TLPs that have credits available. Refer to “Component Specific
Signals for Stratix V” on page 5–16 for more information about the signals in this
interface.
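A minimal sketch of this bookkeeping for one credit type is shown below. The input name tx_cred_limit_ph and its 8-bit width are assumptions made for illustration; refer to "Component Specific Signals for Stratix V" on page 5–16 for the actual credit interface. The point is the modulo arithmetic: credits available equals the advertised credit limit minus the credits consumed.

// Sketch: application-side bookkeeping of posted-header credits (one credit type).
// tx_cred_limit_ph is an assumed name for illustration only. Header credit counters
// are modulo-256, so the subtraction below wraps correctly as long as fewer than
// 128 header credits are outstanding at any time.
module ph_credit_tracker (
  input        pld_clk,
  input        rstn,
  input  [7:0] tx_cred_limit_ph,  // credit limit advertised by the link partner (assumed)
  input        tlp_sent,          // one-cycle pulse per posted TLP handed to the core
  output [7:0] ph_credits_avail
);
  reg [7:0] credits_consumed;

  always @(posedge pld_clk or negedge rstn) begin
    if (!rstn)
      credits_consumed <= 8'd0;
    else if (tlp_sent)
      credits_consumed <= credits_consumed + 8'd1;  // one header credit per posted TLP
  end

  assign ph_credits_avail = tx_cred_limit_ph - credits_consumed;  // modulo-256 difference
endmodule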
For example, you may want to send an MSI request only after all TX packets are
issued to the transaction layer. Alternatively, if you cannot interrupt traffic flow to
synchronize the MSI, you can use a counter to count 16 writes (the depth of the FIFO)
after a TX packet has been written to the FIFO (or until the FIFO goes empty) to
ensure that the transaction layer interface receives the packet before issuing the MSI
request. Figure 4–5 illustrates the Avalon-ST TX and MSI datapaths.
1 Because the Stratix V devices do not include the adapter module, MSI
synchronization is not necessary for Stratix V devices.
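For the devices that do include the adapter, the following sketch implements the counting scheme described above: after the TX packet is written, wait for 16 further FIFO writes (the FIFO depth) or for the FIFO to drain before asserting the MSI request. The port list, including the app_msi_ack handshake and the pkt_written and tx_fifo_wr pulses, is an assumption for illustration only.

// Sketch: hold off an MSI request until the preceding TX packet has passed through
// the 16-word adapter FIFO, using the write-count / FIFO-empty rule described above.
module msi_sync (
  input      pld_clk,
  input      rstn,
  input      pkt_written,     // one-cycle pulse: last word of the TX packet written
  input      tx_fifo_wr,      // a word was written into the adapter TX FIFO this cycle
  input      tx_fifo_empty0,  // adapter TX FIFO is empty
  input      app_msi_ack,     // core acknowledges the MSI request (assumed handshake)
  output reg app_msi_req
);
  reg       waiting;
  reg [4:0] wr_count;

  always @(posedge pld_clk or negedge rstn) begin
    if (!rstn) begin
      waiting     <= 1'b0;
      wr_count    <= 5'd0;
      app_msi_req <= 1'b0;
    end else begin
      if (app_msi_req && app_msi_ack)
        app_msi_req <= 1'b0;              // drop the request once acknowledged

      if (pkt_written) begin              // restart the wait for the packet just written
        waiting  <= 1'b1;
        wr_count <= 5'd0;
      end else if (waiting) begin
        if (wr_count >= 5'd16 || tx_fifo_empty0) begin
          app_msi_req <= 1'b1;            // packet has reached the transaction layer
          waiting     <= 1'b0;
        end else if (tx_fifo_wr)
          wr_count <= wr_count + 5'd1;
      end
    end
  end
endmodule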
Figure 4–5. Avalon-ST TX and MSI Datapaths, Arria II GX, Cyclone IV GX, HardCopy IV GX, and Stratix IV GX Devices
(Figure content labels: tx_st_data0 from the application layer passes through registers and the adapter FIFO to the transaction layer; tx_cred0 reports non-posted credits; tx_fifo_empty0, tx_fifo_wrptr0, and tx_fifo_rdptr0 expose the FIFO state; app_msi_req carries the MSI request.)
Incremental Compilation
The IP core with Avalon-ST interface includes a fully registered interface between the
user application and the PCI Express transaction layer. For the soft IP implementation,
you can use incremental compilation to lock down the placement and routing of the
PCI Express IP core with the Avalon-ST interface to preserve placement and timing
while changes are made to your application.
Avalon-MM Interface
The PCI Express endpoint which results from the SOPC Builder flow comprises a PCI
Express Avalon-MM bridge that interfaces to hard IP implementation with a soft IP
implementation of the transaction layer optimized for the Avalon-MM protocol.
(Figure content: an Avalon-MM master port through which an SOPC Builder component controls the upstream PCI Express devices; an Avalon-MM slave port (Control Register Access) through which an SOPC Builder component controls access to internal control and status registers; and an Avalon-MM slave port through which the root port controls the downstream SOPC Builder component. The transaction, data link, and physical layers perform the same TX and RX roles described for Figure 4–1.)
The PCI Express Avalon-MM bridge provides an interface between the PCI Express
transaction layer and other SOPC Builder components across the system interconnect
fabric.
Transaction Layer
The transaction layer sits between the application layer and the data link layer. It
generates and receives transaction layer packets. Figure 4–7 illustrates the transaction
layer of a component with two initialized virtual channels (VCs). The transaction
layer contains three general subblocks: the transmit datapath, the configuration space,
and the receive datapath, which are shown with vertical braces in Figure 4–7 on
page 4–10.
1 You can parameterize the Stratix IV GX IP core to include one or two virtual channels.
The Arria II GX, Cyclone IV GX, and Stratix V GX implementations include a single
virtual channel.
Tracing a transaction through the receive datapath includes the following steps:
1. The transaction layer receives a TLP from the data link layer.
2. The configuration space determines whether the transaction layer packet is well
formed and directs the packet to the appropriate virtual channel based on traffic
class (TC)/virtual channel (VC) mapping.
3. Within each virtual channel, transaction layer packets are stored in a specific part
of the receive buffer depending on the type of transaction (posted, non-posted,
and completion).
4. The transaction layer packet FIFO block stores the address of the buffered
transaction layer packet.
5. The receive sequencing and reordering block shuffles the order of waiting
transaction layer packets as needed, fetches the address of the priority transaction
layer packet from the transaction layer packet FIFO block, and initiates the transfer
of the transaction layer packet to the application layer.
Figure 4–7. Architecture of the Transaction Layer: Dedicated Receive Buffer per Virtual Channel
(Figure content labels, toward the data link layer: for each virtual channel, TX data and TX descriptor inputs, TX request sequencing, flow control check and reordering, and TX transaction layer packet description, feeding virtual channel arbitration and TX sequencing on the transmit datapath, with RX flow control credits returned toward the transmitter; on the receive datapath, an RX descriptor interface and a receive buffer segmented into posted and completion versus non-posted storage, toward the application layer.)
Tracing a transaction through the transmit datapath involves the following steps:
1. The IP core informs the application layer that sufficient flow control credits exist
for a particular type of transaction. The IP core uses tx_cred[21:0] for the soft IP
implementation and tx_cred[35:0] for the hard IP implementation. The
application layer may choose to ignore this information.
2. The application layer requests a transaction layer packet transmission. The
application layer must provide the PCI Express transaction and must be prepared
to provide the entire data payload in consecutive cycles.
3. The IP core verifies that sufficient flow control credits exist, and acknowledges or
postpones the request.
4. The transaction layer packet is forwarded by the application layer. The transaction
layer arbitrates among virtual channels, and then forwards the priority transaction
layer packet to the data link layer.
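The following sketch shows the application side of steps 2 through 4 on a 64-bit Avalon-ST TX port: present a complete TLP in consecutive cycles while the core accepts data. The <signal>0 names mirror the naming used elsewhere in this user guide, but the ready-latency rules of the interface are deliberately not modeled; treat this as an illustration, not a drop-in transmitter.

// Sketch: present one pre-built TLP (LEN 64-bit words, word 0 in the low bits) on the
// Avalon-ST TX port, advancing only while the core accepts data. Ready latency is
// intentionally not modeled.
module tx_tlp_driver #(
  parameter LEN = 4                   // TLP length in 64-bit words (illustrative)
) (
  input               pld_clk,
  input               rstn,
  input               start,          // request to send the buffered TLP
  input  [64*LEN-1:0] tlp_buf,        // header plus payload, already formatted
  input               tx_st_ready0,
  output              tx_st_valid0,
  output              tx_st_sop0,
  output              tx_st_eop0,
  output [63:0]       tx_st_data0
);
  reg       busy;
  reg [7:0] idx;

  assign tx_st_valid0 = busy;
  assign tx_st_sop0   = busy && (idx == 8'd0);
  assign tx_st_eop0   = busy && (idx == LEN-1);
  assign tx_st_data0  = tlp_buf[64*idx +: 64];

  always @(posedge pld_clk or negedge rstn) begin
    if (!rstn) begin
      busy <= 1'b0;
      idx  <= 8'd0;
    end else if (start && !busy) begin
      busy <= 1'b1;                   // step 2: request a TLP transmission
      idx  <= 8'd0;
    end else if (busy && tx_st_ready0) begin
      if (idx == LEN-1)
        busy <= 1'b0;                 // last word accepted; packet complete
      else
        idx <= idx + 8'd1;            // step 4: advance to the next payload word
    end
  end
endmodule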
Configuration Space
The configuration space implements the following configuration registers and
associated functions:
■ Header Type 0 Configuration Space for Endpoints
■ Header Type 1 Configuration Space for Root Ports
■ PCI Power Management Capability Structure
■ Message Signaled Interrupt (MSI) Capability Structure
■ Message Signaled Interrupt–X (MSI–X) Capability Structure
■ PCI Express Capability Structure
■ Virtual Channel Capabilities
The configuration space also generates all messages (PME#, INT, error, slot power
limit), MSI requests, and completion packets from configuration requests that flow in
the direction of the root complex, except slot power limit messages, which are
generated by a downstream port in the direction of the PCI Express link. All such
transactions are dependent upon the content of the PCI Express configuration space
as described in the PCI Express Base Specification Revision 1.0a, 1.1, 2.0, or 2.1.
f Refer To “Configuration Space Register Content” on page 6–1 or Chapter 7 in the PCI
Express Base Specification 1.0a, 1.1 or 2.0 for the complete content of these registers.
(Figure content labels: transaction layer packet generator and transaction layer packet checker with their TX and RX packet descriptions and data, TX and RX packets, ACK/NAK packets, the data link control and management state machine, the power management function, TX flow control credits, and the configuration space.)
Physical Layer
The physical layer is the lowest level of the IP core. It is the layer closest to the link. It
encodes and transmits packets across a link and accepts and decodes received
packets. The physical layer connects to the link through a high-speed SERDES
interface running at 2.5 Gbps for Gen1 implementations and at 2.5 or 5.0 Gbps for
Gen2 implementations. Only the hard IP implementation supports the Gen2 rate.
The physical layer is responsible for the actions shown in Figure 4–9.
(Figure 4–9 content labels, per lane: a scrambler and 8B10B encoder with the link serializer for a ×8 link and SKIP generation on the transmit datapath; a descrambler, 8B10B decoder, elastic buffer, and RX MAC lane with the link deserializer on the receive datapath; the LTSSM with its control and status interface and PIPE emulation logic; and the device transceiver per lane with 2.5 or 5.0 Gbps SERDES and PLL. The MAC layer and PHY layer communicate across the PIPE interface.)
The physical layer is subdivided by the PIPE Interface Specification into two layers
(bracketed horizontally in Figure 4–9):
■ Media Access Controller (MAC) Layer—The MAC layer includes the Link
Training and Status state machine (LTSSM) and the scrambling/descrambling and
multilane deskew functions.
■ PHY Layer—The PHY layer includes the 8B10B encode/decode functions, elastic
buffering, and serialization/deserialization functions.
The physical layer integrates both digital and analog elements. Intel designed the
PIPE interface to separate the MAC from the PHY. The IP core is compliant with the
PIPE interface, allowing integration with other PIPE-compliant external PHY devices.
Depending on the parameters you set in the parameter editor, the IP core can
automatically instantiate a complete PHY layer when targeting the Arria II GX,
Cyclone IV GX, HardCopy IV GX, Stratix II GX, Stratix IV GX or Stratix V GX devices.
The PHYMAC block is divided into four main sub-blocks:
■ MAC Lane—Both the receive and the transmit path use this block.
■ On the receive side, the block decodes the physical layer packet (PLP) and
reports to the LTSSM the type of TS1/TS2 received and the number of TS1s
received since the LTSSM entered the current state. The LTSSM also reports the
reception of FTS, SKIP and IDL ordered sets and the reception of eight
consecutive D0.0 symbols.
■ On the transmit side, the block multiplexes data from the data link layer and
the LTSTX sub-block. It also adds lane specific information, including the lane
number and the force PAD value when the LTSSM disables the lane during
initialization.
■ LTSSM—This block implements the LTSSM and logic that tracks what is received
and transmitted on each lane.
■ For transmission, it interacts with each MAC lane sub-block and with the
LTSTX sub-block by asserting both global and per-lane control bits to generate
specific physical layer packets.
■ On the receive path, it receives the PLPs reported by each MAC lane sub-block.
It also enables the multilane deskew block and the delay required before the TX
alignment sub-block can move to the recovery or low power state. A higher
layer can direct this block to move to the recovery, disable, hot reset or low
power states through a simple request/acknowledge protocol. This block
reports the physical layer status to higher layers.
■ LTSTX (Ordered Set and SKP Generation)—This sub-block generates the physical
layer packet (PLP). It receives control signals from the LTSSM block and generates
PLP for each lane of the core. It generates the same PLP for all lanes and PAD
symbols for the link or lane number in the corresponding TS1/TS2 fields.
The block also handles the receiver detection operation to the PCS sub-layer by
asserting predefined PIPE signals and waiting for the result. It also generates a
SKIP ordered set at every predefined timeslot and interacts with the TX alignment
block to prevent the insertion of a SKIP ordered set in the middle of a packet.
■ Deskew—This sub-block performs the multilane deskew function and the RX
alignment between the number of initialized lanes and the 64-bit data path.
The multilane deskew implements an eight-word FIFO for each lane to store
symbols. Each symbol includes eight data bits and one control bit. The FTS, COM,
and SKP symbols are discarded by the FIFO; the PAD and IDL are replaced by
D0.0 data. When all eight FIFOs contain data, a read can occur.
When the multilane deskew block is first enabled, each FIFO begins writing
after the first COM is detected. If all lanes have not detected a COM symbol after 7
clock cycles, the FIFOs are reset and the resynchronization process restarts;
otherwise, the RX alignment function recreates a 64-bit data word, which is sent to
the data link layer.
(Figure content labels: the PCI Express Avalon-MM bridge comprises a control register access (CRA) slave with control and status registers (CSR), an MSI or legacy interrupt generator, a TX slave module with an address translator and TX read response logic, and an RX master module with an address translator, a PCI Express RX controller, and RX read response logic; synchronization and clock domain crossing logic spans the clock domain boundary between the Avalon-MM side and the PCI Express transaction, data link, and physical layers on the PCI link side.)
1 The PCI Express Avalon-MM bridge supports native PCI Express endpoints, but not
legacy PCI Express endpoints. Therefore, the bridge does not support I/O space BARs
and I/O space requests cannot be generated.
■ The Avalon-MM byte enable may deassert, but only in the last qword of the burst.
1 PCIe IP cores using the Avalon-ST interface can handle burst reads up to the specified
Maximum Payload Size.
As an example, Table 4–2 gives the byte enables for 32-bit data.
The translation entries in the address translation table are configurable by the user or by
SOPC Builder. Each entry corresponds to a PCI Express BAR. The BAR hit
information from the request header determines the entry that is used for address
translation. Figure 4–11 depicts the PCI Express Avalon-MM bridge address
translation process.
The Avalon-MM RX master module port has an 8-byte datapath. This 8-byte wide
datapath means that native address alignment Avalon-MM slaves that are connected
to the RX master module port will have their internal registers at 8-byte intervals in
the PCI Express address space. When reading or writing a native address alignment
Avalon-MM Slave (such as the SOPC Builder DMA controller core) the PCI Express
address should increment by eight bytes to access each successive register in the
native address slave.
f For more information, refer to the “Native Address Alignment and Dynamic Bus
Sizing” section in the System Interconnect Fabric for Memory-Mapped Interfaces chapter
in volume 4 of the Quartus II Handbook.
The address translation table contains up to 512 possible address translation entries
that you can configure. Each entry corresponds to a base address of the PCI Express
memory segment of a specific size. The segment size of each entry must be identical.
The total size of all the memory segments is used to determine the number of address
MSB bits to be replaced. In addition, each entry has a 2-bit field, Sp[1:0], that
specifies 32-bit or 64-bit PCI Express addressing for the translated address. Refer to
Figure 4–12 on page 4–23. The most significant bits of the Avalon-MM address are
used by the system interconnect fabric to select the slave port and are not available to
the slave. The next most significant bits of the Avalon-MM address index the address
translation entry to be used for the translation process of MSB replacement.
For example, if the core is configured with an address translation table with the
following attributes:
■ Number of Address Pages—16
■ Size of Address Pages—1 MByte
■ PCI Express Address Size—64 bits
then the values in Figure 4–12 are:
■ N = 20 (due to the 1 MByte page size)
■ Q = 16 (number of pages)
■ M = 24 (20 + 4 bit page selection)
■ P = 64
In this case, the Avalon address is interpreted as follows:
■ Bits [31:24] select the TX slave module port from among other slaves connected to
the same master by the system interconnect fabric. The decode is based on the base
addresses assigned in SOPC Builder.
■ Bits [23:20] select the address translation table entry.
■ Bits [63:20] of the address translation table entry become PCI Express address bits
[63:20].
■ Bits [19:0] are passed through and become PCI Express address bits [19:0].
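The MSB replacement for this example (16 pages of 1 MByte each, 64-bit PCI Express addressing) can be sketched in Verilog as follows. The module and signal names are hypothetical and are not ports of the IP core; placing the Sp field in bits [1:0] of the entry follows Table 6–18, and loading of the table (through the CRA slave in a dynamic configuration) is omitted.

module avmm_to_pcie_xlate (
  input  wire [31:0] avmm_addr,   // Avalon-MM address presented to the TX slave
  output wire [63:0] pcie_addr,   // translated PCI Express address
  output wire [1:0]  sp           // Sp[1:0]: 32-bit or 64-bit PCI Express addressing
);
  // One 64-bit entry per page; bits [63:20] hold the PCI Express page base address.
  reg  [63:0] xlate_table [0:15];
  wire [3:0]  page  = avmm_addr[23:20];   // bits [23:20] select the table entry
  wire [63:0] entry = xlate_table[page];

  assign pcie_addr = {entry[63:20], avmm_addr[19:0]}; // replace the MSBs, pass bits [19:0] through
  assign sp        = entry[1:0];                      // address space encoding (see Table 6–18)
endmodule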
The address translation table can be hardwired or dynamically configured at run
time. When the IP core is parameterized for dynamic address translation, the address
translation table is implemented in memory and can be accessed through the CRA
slave module. This access mode is useful in a typical PCI Express system where
address allocation occurs after BIOS initialization.
For more information about how to access the dynamic address translation table
through the control register access slave, refer to the “Avalon-MM-to-PCI Express
Address Translation Table” on page 6–9.
Figure 4–12. Avalon-MM-to-PCI Express Address Translation (Note 1) (2) (3) (4) (5)
Figure 4–13 shows the logic for the entire PCI Express interrupt generation process.
(Figure 4–13: PCI Express interrupt generation logic. The Avalon-MM-to-PCI-Express interrupt status and interrupt enable register bits (A2P_MAILBOX_INT7–A2P_MAILBOX_INT0 gated by A2P_MB_IRQ7–A2P_MB_IRQ0) and the Avalon-MM interrupt inputs (AVL_IRQ, AV_IRQ_ASSERTED) are combined; when the MSI Enable bit (Configuration Space Message Control Register[0]) is set, a rising edge of the combined signal generates an MSI request, otherwise a rising edge sends an ASSERT_INTA message and a falling edge sends a DEASSERT_INTA message for PCI Express virtual INTA signalling.)
The PCI Express Avalon-MM bridge selects either MSI or legacy interrupts
automatically based on the standard interrupt controls in the PCI Express
configuration space registers. The Interrupt Disable bit, which is bit 10 of the
Command register (Table 11–1), can be used to disable legacy interrupts. The MSI enable
bit, which is bit 0 of the MSI Control Status register in the MSI capability shown in
Table 11–3 on page 11–5, can be used to enable MSI interrupts. Only one type of
interrupt can be enabled at a time.
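The selection can be pictured with the hedged Verilog sketch below, which is not the bridge RTL. Here irq_event stands for the OR of the enabled interrupt status bits, msi_enable for bit 0 of the MSI Message Control register, and intx_disable for bit 10 of the Command register; a rising edge of irq_event produces either an MSI request or an ASSERT_INTA message, and a falling edge produces a DEASSERT_INTA message when legacy interrupts are in use.

module irq_select (
  input  wire clk,
  input  wire rst_n,
  input  wire irq_event,        // OR of the enabled interrupt status bits
  input  wire msi_enable,       // MSI Control Status register bit 0
  input  wire intx_disable,     // Command register bit 10
  output reg  msi_req,          // pulse: request an MSI
  output reg  assert_inta,      // pulse: send an ASSERT_INTA message
  output reg  deassert_inta     // pulse: send a DEASSERT_INTA message
);
  reg irq_event_q;
  always @(posedge clk or negedge rst_n)
    if (!rst_n) begin
      irq_event_q   <= 1'b0;
      msi_req       <= 1'b0;
      assert_inta   <= 1'b0;
      deassert_inta <= 1'b0;
    end else begin
      irq_event_q   <= irq_event;
      msi_req       <= msi_enable  & irq_event  & ~irq_event_q;                  // rising edge, MSI mode
      assert_inta   <= ~msi_enable & ~intx_disable & irq_event  & ~irq_event_q;  // rising edge, legacy mode
      deassert_inta <= ~msi_enable & ~intx_disable & ~irq_event &  irq_event_q;  // falling edge, legacy mode
    end
endmodule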
Figure 4–14. Design Including PCI Express Endpoint Completer Only Single DWord SOPC Builder Component
As this figure illustrates, the PCI Express IP core links to a PCI Express root complex.
A bridge component includes PCIe TX and RX blocks, a PCIe RX master, and an
interrupt handler. It connects to the FPGA fabric using an Avalon-MM interface. The
following sections provide an overview of each block in the bridge.
The RX block passes header information to the Avalon-MM master, which generates the
corresponding transaction to the Avalon-MM interface. Additional requests from the
PCI Express IP core are not accepted while a request is being processed. For reads, the
RX block deasserts the ready signal until the corresponding completion packet is sent
to the PCI Express IP core via the PCIe TX block. For writes, requests must be sent to
the Avalon-MM system interconnect fabric before the next request is accepted.
f For more information about legal combinations of byte enables, refer to Chapter 3,
Avalon Memory-Mapped Interfaces in the Avalon Interface Specifications.
1 When the MSI registers in the configuration space of the completer only single dword
PCI Express IP core are updated, there is a delay before this information is propagated
to the Bridge module shown in Figure 4–14. You must allow time for the Bridge
module to update the MSI register information. Under normal operation,
initialization of the MSI registers should occur substantially before any interrupt is
generated. However, failure to wait until the update completes may result in any of
the following behaviors:
This chapter describes the signals that are part of the PCI Express IP core for each of
the following primary configurations:
■ Signals in the Hard IP Implementation Root Port with Avalon-ST Interface Signals
■ Signals in the Hard IP Implementation Endpoint with Avalon-ST Interface
■ Signals in the Soft IP Implementation with Avalon-ST Interface
■ Signals in the Hard IP Implementation with Avalon-ST Interface for
Stratix V Devices
■ Signals in the SOPC Builder Soft or Hard Full-Featured IP Core with Avalon-MM
Interface
■ Signals in the Completer-Only, Single Dword, IP Core with Avalon-MM Interface
1 Altera does not recommend the Descriptor/Data interface for new designs.
Avalon-ST Interface
The main functional differences between the hard IP and soft IP implementations
using an Avalon-ST interface are the configuration and clocking schemes. In addition,
the hard IP implementation offers a 128-bit Avalon-ST bus for some configurations. In
128-bit mode, the streaming interface clock, pld_clk, is one-half the frequency of the
core clock, core_clk, and the streaming data width is 128 bits. In 64-bit mode, the
streaming interface clock, pld_clk, is the same frequency as the core clock, core_clk,
and the streaming data width is 64 bits.
Figure 5–1, Figure 5–2, Figure 5–3, and Figure 5–4 illustrate the top-level signals for IP
cores that use the Avalon-ST interface.
Figure 5–1. Signals in the Hard IP Implementation Root Port with Avalon-ST Interface Signals
Figure 5–2. Signals in the Hard IP Implementation Endpoint with Avalon-ST Interface
(Figure content: top-level signals grouped as Clock (refclk, clk250_in/clk250_out for ×8, clk125_in/clk125_out for ×1 and ×4), Reset (npor, srst, crst, rstn, l2_exit, hotrst_exit, dlup_exit, dl_ltssm[4:0]), Interrupt (app_msi_req, app_msi_ack, app_msi_tc[2:0], app_msi_num[4:0], pex_msi_num[4:0], app_int_sts, app_int_ack), Power Mnmt (pme_to_cr, pme_to_sr, cfg_pmcsr[31:0]), Config (cfg_tcvcmap[23:0], cfg_busdev[12:0], cfg_prmcsr[31:0], cfg_devcsr[31:0], cfg_linkcsr[31:0], cfg_msicsr[15:0]), Completion Interface (cpl_err[6:0], cpl_pending, err_desc_func0[127:0]), Test Interface (test_in[31:0] and user-specified test_out up to 512 bits, tx_st_fifo_empty0, tx_st_fifo_full0), and the 16-bit PIPE interface for ×1 and ×4 or the 8-bit PIPE interface for ×8 (txdata/txdatak, txdetectrx, txelecidle, txcompl, rxpolarity, powerdown, rxdata/rxdatak, rxvalid, rxelecidle, rxstatus, phystatus, plus pipe_mode, pipe_rstn, pipe_txclk, rate_ext, xphy_pll_areset, and xphy_pll_locked), repeated per lane for use with an external PHY.)
Figure 5–4. Signals in the Hard IP Implementation with Avalon-ST Interface for Stratix V Devices
(Figure content: the Stratix V hard IP top-level signals, grouped as the optional PCI Express reconfiguration block (avs_pcie_reconfig_address, byteenable, chipselect, write, writedata, waitrequest, read, readdata, readdatavalid, clk, rstn), Config (tl_cfg_add[3:0], tl_cfg_ctl[31:0], tl_cfg_ctl_wr, tl_cfg_sts[52:0], tl_cfg_sts_wr, hpg_ctrler[4:0]), LMI (lmi_dout[31:0], lmi_rden, lmi_wren, lmi_ack, lmi_addr[11:0], lmi_din[31:0]), ECC Error (derr_cor_ext_rcv[1:0], derr_rpl, derr_cor_ext_rpl, r2c_err0, r2c_err1), Test Interface (test_out[63:0], test_in[39:0], lane_act[3:0], rx_st_fifo_full, rx_st_fifo_empty), Interrupt signals for endpoints (app_msi_req, app_msi_ack, app_msi_tc[2:0], app_msi_num[4:0], pex_msi_num[4:0]), and Interrupt signals for root ports (aer_msi_num[4:0], int_status[4:0], serr_out).)
Table 5–1 lists the interfaces of both the hard IP and soft IP implementations with
links to the subsequent sections that describe each interface.
Table 5–1. Signal Groups in the PCI Express IP core with Avalon-ST Interface
(Columns: Signal Group | Hard IP Endpoint | Hard IP Root Port | Soft IP | Description)
Logical
Avalon-ST RX | v | v | v | “64-, 128-, or 256-Bit Avalon-ST RX Port” on page 5–7
Avalon-ST TX | v | v | v | “64-, 128-, or 256-Bit Avalon-ST TX Port” on page 5–13
Clock | v | v | — | “Clock Signals—Hard IP Implementation” on page 5–23
Clock | — | — | v | “Clock Signals—Soft IP Implementation” on page 5–23
Reset and link training | v | v | v | “Reset and Link Training Signals” on page 5–24
ECC error | v | v | — | “ECC Error Signals” on page 5–29
Interrupt | v | — | v | “PCI Express Interrupts for Endpoints” on page 5–29
Interrupt and global error | — | v | — | “PCI Express Interrupts for Root Ports” on page 5–31
Configuration space | v | v | — | “Configuration Space Signals—Hard IP Implementation” on page 5–31
Configuration space | — | — | v | “Configuration Space Signals—Soft IP Implementation” on page 5–39
LMI | v | v | — | “LMI Signals—Hard IP Implementation” on page 5–40
PCI Express reconfiguration block | v | v | — | “PCI Express Reconfiguration Block Signals—Hard IP Implementation” on page 5–41
Power management | v | v | v | “Power Management Signals” on page 5–42
Completion | v | v | v | “Completion Side Band Signals” on page 5–44
Physical
Transceiver control | v | v | v | “Transceiver Control” on page 5–53
Serial | v | v | v | “Serial Interface Signals” on page 5–55
PIPE | (1) | (1) | v | “PIPE Interface Signals” on page 5–56
Test
Test | v | v | — | “Test Interface Signals—Hard IP Implementation” on page 5–59
Test | — | — | v | “Test Interface Signals—Soft IP Implementation” on page 5–60
Note to Table 5–1:
(1) Provided for simulation only
To facilitate the interface to 64-bit memories, the IP core always aligns data to the
qword or 64 bits; consequently, if the header presents an address that is not qword
aligned, the IP core shifts the data within the qword to achieve the correct alignment.
Figure 5–5 shows how an address that is not qword aligned, 0x4, is stored in memory.
The byte enables only qualify data that is being written. This means that the byte
enables are undefined for 0x0–0x3. This example corresponds to Figure 5–6 on
page 5–10. Qword alignment is a feature of the IP core that cannot be turned off.
Qword alignment applies to all types of request TLPs with data, including memory
writes, configuration writes, and I/O writes. The alignment of the request TLP
depends on bit 2 of the request address. For completion TLPs with data, alignment
depends on bit 2 of the lower address field. This bit is always 0 (aligned to qword
boundary) for completion with data TLPs that are for configuration read or I/O read
requests.
Figure 5–5. Qword Alignment
f Refer to Appendix A, Transaction Layer Packet (TLP) Header Formats for the formats
of all TLPs.
Table 5–3 shows the byte ordering for header and data packets for
Figure 5–6 through Figure 5–13.
Figure 5–6 illustrates the mapping of Avalon-ST RX packets to PCI Express TLPs for a
three dword header with non-qword aligned addresses with a 64-bit bus. In this
example, the byte address is unaligned and ends with 0x4, causing the first data to
correspond to rx_st_data[63:32].
f For more information about the Avalon-ST protocol, refer to the Avalon Interface
Specifications.
1 Note that the Avalon-ST protocol, as defined in Avalon Interface Specifications, is big
endian, while the PCI Express IP core packs symbols into words in little endian
format. Consequently, you cannot use the standard data format adapters available in
SOPC Builder with PCI Express IP cores that use the Avalon-ST interface.
Figure 5–6. 64-Bit Avalon-ST rx_st_data<n> Cycle Definition for 3-DWord Header TLPs with Non-QWord Aligned Address
Figure 5–7 illustrates the mapping of Avalon-ST RX packets to PCI Express TLPs for a
three dword header with qword aligned addresses. Note that the byte enables
indicate the first byte of data is not valid and the last dword of data has a single valid
byte.
Figure 5–7. 64-Bit Avalon-ST rx_st_data<n> Cycle Definition for 3-DWord Header TLPs with QWord Aligned Address
(Note 1)
Figure 5–8 shows the mapping of Avalon-ST RX packets to PCI Express TLPs for a
four dword header with qword aligned addresses with a 64-bit bus.
Figure 5–8. 64-Bit Avalon-ST rx_st_data<n> Cycle Definitions for 4-DWord Header TLPs with QWord Aligned Addresses
Figure 5–9 shows the mapping of Avalon-ST RX packets to PCI Express TLPs for a
four dword header with non-qword aligned addresses with a 64-bit bus. Note that the
address of the first dword is 0x4. The address of the first enabled byte is 0x6. This
example shows one valid word in the first dword, as indicated by the rx_st_be signal.
Figure 5–9. 64-Bit Avalon-ST rx_st_data<n> Cycle Definitions for 4-DWord Header TLPs with Non-QWord Addresses
(Note 1)
Figure 5–10 shows the mapping of 128-bit Avalon-ST RX packets to PCI Express TLPs
for TLPs with a three dword header and qword aligned addresses.
Figure 5–10. 128-Bit Avalon-ST rx_st_data<n> Cycle Definition for 3-DWord Header TLPs with QWord Aligned Addresses
Figure 5–11 shows the mapping of 128-bit Avalon-ST RX packets to PCI Express TLPs
for TLPs with a 3 dword header and non-qword aligned addresses.
Figure 5–11. 128-Bit Avalon-ST rx_st_data<n> Cycle Definition for 3-DWord Header TLPs with non-QWord Aligned
Addresses
Figure 5–12 shows the mapping of 128-bit Avalon-ST RX packets to PCI Express TLPs
for a four dword header with non-qword aligned addresses. In this example,
rx_st_empty is low because the data ends in the upper 64 bits of rx_st_data.
Figure 5–12. 128-Bit Avalon-ST rx_st_data Cycle Definition for 4-DWord Header TLPs with non-QWord Aligned Addresses
Figure 5–13 shows the mapping of 128-bit Avalon-ST RX packets to PCI Express TLPs
for a four dword header with qword aligned addresses.
Figure 5–13. 128-Bit Avalon-ST rx_st_data Cycle Definition for 4-DWord Header TLPs with QWord Aligned Addresses
f For a complete description of the TLP packet header formats, refer to Appendix A,
Transaction Layer Packet (TLP) Header Formats.
Figure 5–14 illustrates the timing of the Avalon-ST RX interface. On this interface, the
core deasserts rx_st_valid in response to the deassertion of rx_st_ready from the
application.
(Figure 5–14 waveform: rx_st_ready, rx_st_valid, rx_st_sop, and rx_st_eop; rx_st_valid deasserts within a maximum latency of three clock cycles after rx_st_ready deasserts.)
Figure 5–15 illustrates the TLP fields of the tx_cred bus. For completion header,
non-posted header, non-posted data and posted header fields, a saturation value of
seven indicates seven or more available transmit credits.
For the hard IP implementation in Arria II GX, HardCopy IV GX, and Stratix IV GX
devices, a saturation value of six or greater should be used for non-posted header and
non-posted data. If your system allocates a single non-posted credit, you can use the
receipt of completions to detect the release of credit for non-posted writes.
Figure 5–16 illustrates the mapping between Avalon-ST TX packets and PCI Express
TLPs for 3 dword header TLPs with non-qword aligned addresses with a 64-bit bus.
(Figure 5–5 on page 5–9 illustrates the storage of non-qword aligned data.)
Figure 5–16. 64-Bit Avalon-ST tx_st_data Cycle Definition for 3-DWord Header TLP with Non-QWord Aligned Address
Figure 5–17 illustrates the mapping between Avalon-ST TX packets and PCI Express
TLPs for a four dword header with qword aligned addresses with a 64-bit bus.
Figure 5–17. 64-Bit Avalon-ST tx_st_data Cycle Definition for 4–DWord TLP with QWord Aligned Address
Figure 5–18 illustrates the mapping between Avalon-ST TX packets and PCI Express
TLPs for four dword header with non-qword aligned addresses with a 64-bit bus.
Figure 5–18. 64-Bit Avalon-ST tx_st_data Cycle Definition for TLP 4-DWord Header with Non-QWord Aligned Address
Figure 5–19 shows the mapping of 128-bit Avalon-ST TX packets to PCI Express TLPs
for a three dword header with qword aligned addresses.
Figure 5–19. 128-Bit Avalon-ST tx_st_data Cycle Definition for 3-DWord Header TLP with QWord Aligned Address
Figure 5–20 shows the mapping of 128-bit Avalon-ST TX packets to PCI Express TLPs
for a 3 dword header with non-qword aligned addresses.
Figure 5–20. 128-Bit Avalon-ST tx_st_data Cycle Definition for 3-DWord Header TLP with non-QWord Aligned Address
Figure 5–21 shows the mapping of 128-bit Avalon-ST TX packets to PCI Express TLPs
for a four dword header TLP with qword aligned data.
Figure 5–21. 128-Bit Avalon-ST tx_st_data Cycle Definition for 4-DWord Header TLP with QWord Aligned Address
Figure 5–22 shows the mapping of 128-bit Avalon-ST TX packets to PCI Express TLPs
for a four dword header TLP with non-qword aligned addresses. In this example,
tx_st_empty is low because the data ends in the upper 64 bits of tx_st_data.
Figure 5–22. 128-Bit Avalon-ST tx_st_data Cycle Definition for 4-DWord Header TLP with non-QWord Aligned Address
Figure 5–23 illustrates the layout of header and data for a 3-DWord header on the
256-bit interface with aligned and unaligned data.
Figure 5–23. 256-Bit Avalon-ST tx_sd_data Cycle Definition for 3-DWord Header TLP with QWord Aligned Address
Figure 5–24 shows the location of headers and data for the 256-bit Avalon-ST packets.
This layout of data applies to both the TX and RX buses.
Figure 5–24. Location of Headers and Data for Avalon-ST 256-Bit Interface
Figure 5–25 illustrates the timing of the Avalon-ST TX interface. The core can deassert
tx_st_ready<n> to throttle the application, which is the source of the data.
(Figure 5–25 waveform: clk, tx_st_ready, tx_st_valid, tx_st_sop, and tx_st_eop, with the response_time interval following a tx_st_ready transition marked.)
ECRC Forwarding
On the Avalon-ST interface, the ECRC field follows the same alignment rules as
payload data. For packets with payload, the ECRC is appended to the data as an extra
dword of payload. For packets without payload, the ECRC field follows the address
alignment as if it were a one dword payload. Depending on the address alignment,
Figure 5–8 on page 5–10 through Figure 5–13 on page 5–12 illustrate the position of
the ECRC data for RX data. Figure 5–16 on page 5–18 through Figure 5–22 on
page 5–20 illustrate the position of ECRC data for TX data. For packets with no
payload data, the ECRC would correspond to Data0 in these figures.
Refer to Chapter 7, Reset and Clocks for a complete description of the clock interface
for each PCI Express IP core.
<variant>_plus.v or .vhd
pcie_rstn I pcie_rstn directly resets all sticky PCI Express IP core configuration registers. Sticky registers are those registers that fail to reset in L2 low power mode or upon a fundamental reset. This is an asynchronous reset. This signal is not used in Stratix V devices.
local_rstn I reset_n is the system-wide reset which resets all PCI Express IP core circuitry not affected by pcie_rstn. This is an asynchronous reset. This signal is not used in Stratix V devices.
Both <variant>_plus.v or .vhd and <variant>.v or .vhd
suc_spd_neg O Indicates successful speed negotiation to Gen2 when asserted. This signal is not used in Stratix V devices.
dl_ltssm[4:0] O LTSSM state: The LTSSM state machine encoding defines the following states:
■ 00000: detect.quiet
■ 00001: detect.active
■ 00010: polling.active
■ 00011: polling.compliance
■ 00100: polling.configuration
■ 00101: polling.speed
■ 00110: config.linkwidthstart
■ 00111: config.linkaccept
■ 01000: config.lanenumaccept
■ 01001: config.lanenumwait
■ 01010: config.complete
■ 01011: config.idle
■ 01100: recovery.rcvlock
■ 01101: recovery.rcvconfig
■ 01110: recovery.idle
■ 01111: L0
■ 10000: disable
■ 10001: loopback.entry
■ 10010: loopback.active
■ 10011: loopback.exit
■ 10100: hot.reset
■ 10110: L1.entry
■ 10111: L1.idle
■ 11000: L2.idle
■ 11001: L2.transmit.wake
Reset Details
The following description applies to all devices except Stratix V. Refer to “Reset
Details for Stratix V Devices” on page 5–27 for Stratix V devices.
Both the hard IP implementation (×1, ×4, and ×8) and the soft IP implementation (×1 and
×4) have three reset inputs: npor, srst, and crst. npor is used internally for all sticky
registers that may not be reset in L2 low power mode or by the fundamental reset.
npor is typically generated by a logical OR of the power-on-reset generator and the
perst# signal as specified in the PCI Express Card Electromechanical Specification. The
srst signal is a synchronous reset of the datapath state machines. The crst signal is a
synchronous reset of the nonsticky configuration space registers. For endpoints,
whenever the l2_exit, hotrst_exit, dlup_exit, or other power-on-reset signals are
asserted, srst and crst should be asserted for one or more cycles for the soft IP
implementation and for at least 2 clock cycles for the hard IP implementation.
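As an illustration only (signal polarities and the exact hold time should be checked against your variation), the following Verilog sketch stretches an exit event into srst and crst pulses that last more than two clock cycles, which satisfies the hard IP requirement above. Here exit_event stands for the OR of the l2_exit, hotrst_exit, and dlup_exit events, treated as active high.

module ep_reset_ctrl (
  input  wire clk,
  input  wire npor,          // power-on / fundamental reset, active low
  input  wire exit_event,    // OR of l2_exit, hotrst_exit, and dlup_exit events
  output reg  srst,
  output reg  crst
);
  reg [1:0] cnt;             // extends the reset for additional cycles
  always @(posedge clk or negedge npor)
    if (!npor) begin
      cnt  <= 2'b11;
      srst <= 1'b1;
      crst <= 1'b1;
    end else if (exit_event) begin
      cnt  <= 2'b11;         // reload on every exit event
      srst <= 1'b1;
      crst <= 1'b1;
    end else if (cnt != 2'b00) begin
      cnt  <= cnt - 2'b01;   // hold srst and crst while the counter runs down
      srst <= 1'b1;
      crst <= 1'b1;
    end else begin
      srst <= 1'b0;
      crst <= 1'b0;
    end
endmodule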
Figure 5–26 provides a simplified view of the logic controlled by the reset signals.
(Figure 5–26: within <variant>.v or .vhd, <variant>_core.v or .vhd, and altpcie_hip_pipen1b.v or .vhd, npor resets the SERDES reset state machine and the configuration space sticky registers, crst resets the configuration space non-sticky registers, and srst resets the datapath state machines of the MegaCore function.)
For root ports, srst should be asserted whenever l2_exit, hotrst_exit, dlup_exit,
and power-on-reset signals are asserted. The root port crst signal should be asserted
whenever l2_exit, hotrst_exit and other power-on-reset signals are asserted. When
the perst# signal is asserted, srst and crst should be asserted for a longer period of
time to ensure that the root complex is stable and ready for link training.
The PCI Express IP core soft IP implementation (×8) has two reset inputs, npor and
rstn. The npor reset is used internally for all sticky registers that may not be reset in
L2 low power mode or by the fundamental reset. npor is typically generated by a
logical OR of the power-on-reset generator and the perst# signal as specified in the
PCI Express Card Electromechanical Specification.
The rstn signal is an asynchronous reset of the datapath state machines and the
nonsticky configuration space registers. Whenever the l2_exit, hotrst_exit,
dlup_exit, or other power-on-reset signals are asserted, rstn should be asserted for
one or more cycles. When the perst# signal is asserted, rstn should be asserted for a
longer period of time to ensure that the root complex is stable and ready for link
training.
(Figure: Stratix V reset logic in <variant>.v or .vhd, <variant>_core.v or .vhd, and altpcie_hip_256_pipen1b.v, showing perst_n and pld_clrpmapcship controlling the SERDES reset state machine and the configuration space sticky registers, and pld_clrhip_n controlling the configuration space non-sticky registers.)
Figure 5–28 illustrates the sequencing for the processes that configure the FPGA and
bring up the PCI Express link.
Figure 5–28. Sequencing of FPGA Configuration and PCIe Link Initialization in Stratix V Devices
(Figure 5–28 waveform: perst_n, pld_clk_ready, and pld_clk_in_use relative to the I/O POF load, PLD fabric programming, and PCIe link training and enumeration phases; a 100 ms interval is marked.)
For additional information about reset in Stratix V devices refer to “Reset in Stratix V
Devices” on page 7–4.
Table 5–8. ECC Error Signals for Hard IP Implementation (Note 1) (Note 2)
Signal I/O Description
derr_cor_ext_rcv[1:0] (3) O Indicates a correctable error in the RX buffer for the corresponding virtual channel.
derr_rpl (3) O Indicates an uncorrectable error in the retry buffer.
derr_cor_ext_rpl (3) O Indicates a correctable error in the retry buffer.
r2c_err0 O Indicates an uncorrectable ECC error on VC0.
r2c_err1 O Indicates an uncorrectable ECC error on VC1.
Notes to Table 5–8:
(1) These signals are not available for the hard IP implementation in Arria II GX devices.
(2) The Avalon-ST rx_st_err<n> described in Table 5–2 on page 5–7 indicates an uncorrectable error in the RX buffer.
(3) This signal applies only when ECC is enabled in some hard IP configurations. Refer to Table 1–9 on page 1–14 for more information.
Table 5–10 shows the layout of the Configuration MSI Control Status Register.
Table 5–10. Configuration MSI Control Status Register
Field and Bit Map
[15:9] reserved | [8] mask capability | [7] 64-bit address capability | [6:4] multiple message enable | [3:1] multiple message capable | [0] MSI enable
Table 5–11 outlines the use of the various fields of the Configuration MSI Control
Status Register.
Table 5–11. Configuration MSI Control Status Register Field Descriptions
Bit(s) Field Description
[15:9] reserved —
[8] mask capability Per vector masking capable. This bit is hardwired to 0 because the function does not support the optional MSI per vector masking using the Mask_Bits and Pending_Bits registers defined in the PCI Local Bus Specification, Rev. 3.0. Per vector masking can be implemented using application layer registers.
(Waveform: tl_cfg_ctl timing in 64-bit mode, showing core_clk, pld_clk, tl_cfg_ctl[31:0] changing from data0 to data1, and tl_cfg_ctl_wr.)
Figure 5–31 illustrates the timing of the tl_cfg_ctl interface for the Arria II GX,
Cyclone IV GX, HardCopy IV, and Stratix IV GX devices when using a 128-bit
interface.
(Figure 5–31 waveform: tl_cfg_ctl timing in 128-bit mode, showing core_clk, pld_clk, tl_cfg_ctl[31:0] changing from data0 to data1, and tl_cfg_ctl_wr.)
Figure 5–32 illustrates the timing of the tl_cfg_sts interface for the Arria II GX,
Cyclone IV GX, HardCopy IV, and Stratix IV GX devices when using a 64-bit
interface.
(Figure 5–32 waveform: tl_cfg_sts timing in 64-bit mode, clocked by core_clk and pld_clk.)
Figure 5–33 illustrates the timing of the tl_cfg_sts interface for the Arria II GX,
Cyclone IV GX, HardCopy IV, and Stratix IV GX devices when using a 128-bit
interface.
Figure 5–33. tl_cfg_sts Timing (Hard IP Implementation)
In the example design created with the PCI Express IP core, there is a Verilog HDL
module or VHDL entity included in the altpcierd_tl_cfg_sample.v and
altpcierd_tl_cfg_sample.vhd files respectively that you can use to sample the
configuration space signals. In this module or entity the tl_cfg_ctl_wr and
tl_cfg_sts_wr signals are registered twice and then the edges of the delayed signals
are used to enable sampling of the tl_cfg_ctl and tl_cfg_sts busses.
Because the hard IP core_clk is much earlier than the pld_clk, the Quartus II
software tries to add delay to the signals to avoid hold time violations. This delay is
only necessary for the tl_cfg_ctl_wr and tl_cfg_sts_wr signals. You can place
multicycle setup and hold constraints of three cycles on them to avoid timing issues if
the logic shown in Figure 5–30 and Figure 5–32 is used. The multicycle setup and hold
constraints are automatically included in the <variation_name>.sdc file that is created
with the hard IP variation. In some cases, depending on the exact device, speed grade
and global routing resources used for the pld_clk, the Quartus II software may have
difficulty avoiding hold time violations on the tl_cfg_ctl_wr and tl_cfg_sts_wr
signals. If hold time violations occur in your design, you can reduce the multicycle
setup time for these signals to 0. The exact time the signals are clocked is not critical to
the design, just that the signals are reliably sampled. There are instruction comments
in the <variation_name>.sdc file on making these modifications.
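The sampling scheme described above can be sketched as follows. This mirrors the behavior attributed to altpcierd_tl_cfg_sample but is only an illustrative module with assumed port names, not the shipped file; the same structure applies to tl_cfg_sts and tl_cfg_sts_wr.

module tl_cfg_ctl_sample (
  input  wire        pld_clk,
  input  wire        rstn,
  input  wire        tl_cfg_ctl_wr,
  input  wire [31:0] tl_cfg_ctl,
  output reg  [31:0] tl_cfg_ctl_sampled
);
  reg wr_q, wr_qq;
  always @(posedge pld_clk or negedge rstn)
    if (!rstn) begin
      wr_q  <= 1'b0;
      wr_qq <= 1'b0;
      tl_cfg_ctl_sampled <= 32'h0;
    end else begin
      wr_q  <= tl_cfg_ctl_wr;          // first register stage
      wr_qq <= wr_q;                   // second register stage
      if (wr_q ^ wr_qq)                // an edge of the delayed strobe enables sampling
        tl_cfg_ctl_sampled <= tl_cfg_ctl;
    end
endmodule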
Figure 5–35 shows the timing for updates to the tl_cfg_sts bus in Stratix V devices.
Table 5–15. Multiplexed Configuration Register Information Available on tl_cfg_ctl (Part 1 of 2) (Note 1)
Address 31:24 23:16 15:8 7:0
0 cfg_dev2csr[15:0] cfg_devcsr[15:0] (cfg_devcsr[14:12] = Max Read Req Size (2), cfg_devcsr[7:5] = Max Payload (2))
1 cfg_slotcsr[31:16] cfg_slotcsr[15:0]
2 cfg_linkcsr[15:0] cfg_link2csr[15:0]
3 8’h00 cfg_prmcsr[15:0] cfg_rootcsr[7:0]
4 cfg_seccsr[15:0] cfg_secbus[7:0] cfg_subbus[7:0]
5 12’h000 cfg_io_bas[19:0]
6 12’h000 cfg_io_lim[19:0]
7 8’h00 cfg_np_bas[11:0] cfg_np_lim[11:0]
8 cfg_pr_bas[31:0]
9 20’h00000 cfg_pr_bas[43:32]
A cfg_pr_lim[31:0]
B 20’h00000 cfg_pr_lim[43:32]
C cfg_pmcsr[31:0]
D cfg_msixcsr[15:0] cfg_msicsr[15:0]
E 8’h00 cfg_tcvcmap[23:0]
Table 5–15. Multiplexed Configuration Register Information Available on tl_cfg_ctl (Part 2 of 2) (Note 1)
Address 31:24 23:16 15:8 7:0
F 16’h0000 3’b000 cfg_busdev[12:0]
Notes to Table 5–15:
(1) Items in blue are only available for root ports.
(2) This field is encoded as specified in Section 7.8.4 of the PCI Express Base Specification (3’b000–3’b101 correspond to 128–4096 bytes).
Table 5–16 describes the configuration space registers referred to in Table 5–13 and
Table 5–15.
(Figure: the LMI interface between the application and the PCI Express MegaCore function. The LMI signals lmi_dout[31:0], lmi_ack, lmi_rden, lmi_wren, lmi_addr[11:0], and lmi_din[31:0] access the configuration space registers (4 KBytes of 32-bit registers), synchronous to pld_clk.)
The LMI interface is synchronized to pld_clk and runs at frequencies up to 250 MHz.
The LMI address is the same as the PCIe configuration space address. The read and
write data are always 32 bits. The LMI interface provides the same access to
configuration space registers as configuration TLP requests. Register bits have the
same attributes (read only, read/write, and so on) for accesses from the LMI interface
and from configuration TLP requests.
Table 5–18 describes the signals that comprise the LMI interface.
(Waveforms: an LMI read, in which lmi_rden and lmi_addr[11:0] are presented and the core returns lmi_dout[31:0] with lmi_ack, and an LMI write, in which lmi_wren, lmi_addr[11:0], and lmi_din[31:0] are presented and the core responds with lmi_ack; both are synchronous to pld_clk.)
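A hedged Verilog sketch of a single LMI access is shown below. It follows the handshake just described (present lmi_addr with lmi_wren or lmi_rden, then wait for lmi_ack), but the exact strobe timing of the IP core may differ, and everything other than the LMI signals themselves is an assumed name.

module lmi_access (
  input  wire        pld_clk,
  input  wire        rstn,
  input  wire        start,        // one-cycle pulse that launches a single access
  input  wire        write,        // 1 = write, 0 = read
  input  wire [11:0] addr,
  input  wire [31:0] wdata,
  input  wire        lmi_ack,      // acknowledge from the IP core
  input  wire [31:0] lmi_dout,     // read data from the IP core
  output reg         lmi_rden,
  output reg         lmi_wren,
  output reg  [11:0] lmi_addr,
  output reg  [31:0] lmi_din,
  output reg  [31:0] rdata,
  output reg         done
);
  always @(posedge pld_clk or negedge rstn)
    if (!rstn) begin
      lmi_rden <= 1'b0; lmi_wren <= 1'b0; done <= 1'b0;
      lmi_addr <= 12'h0; lmi_din <= 32'h0; rdata <= 32'h0;
    end else begin
      lmi_rden <= 1'b0;              // strobes are single-cycle in this sketch
      lmi_wren <= 1'b0;
      done     <= 1'b0;
      if (start) begin
        lmi_addr <= addr;
        lmi_din  <= wdata;
        lmi_wren <= write;
        lmi_rden <= ~write;
      end
      if (lmi_ack) begin             // the core acknowledges the access
        rdata <= lmi_dout;
        done  <= 1'b1;
      end
    end
endmodule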
f For a detailed description of the Avalon-MM protocol, refer to the Avalon Memory-
Mapped Interfaces chapter in the Avalon Interface Specifications.
Table 5–21 shows the layout of the Power Management Capabilities register.
Table 5–21. Power Management Capabilities Register
[31:24] data register | [22:16] rsvd | [15] PME_status | [14:13] data_scale | [12:9] data_select | [8] PME_EN | [7:2] rsvd | [1:0] PM_state
Table 5–22 outlines the use of the various fields of the Power Management
Capabilities register.
Figure 5–39 illustrates the behavior of pme_to_sr and pme_to_cr in an endpoint. First,
the IP core receives the PME_turn_off message which causes pme_to_sr to assert.
Then, the application sends the PME_to_ack message to the root port by asserting
pme_to_cr.
Figure 5–39. pme_to_sr and pme_to_cr in an Endpoint IP core
(Waveform shows pme_to_sr and pme_to_cr for both the soft IP and hard IP implementations.)
f For a description of the completion rules, the completion header format, and
completion status field values, refer to Section 2.2.9 of the PCI Express Base
Specification, Rev. 2.0.
cpl_err[6:0] (continued) When asserted, the TLP header is logged in the AER header log register if it is the first error detected. When used, this signal should be asserted at the same time as the corresponding cpl_err error bit (2, 3, 4, or 5). In the soft IP implementation, the application presents the TLP header to the IP core on the err_desc_func0 bus. In the hard IP implementation, the application presents the header to the IP core by writing the following values to 4 registers via LMI before asserting cpl_err[6]:
■ lmi_addr: 12'h81C, lmi_din: err_desc_func0[127:96]
■ lmi_addr: 12'h820, lmi_din: err_desc_func0[95:64]
■ lmi_addr: 12'h824, lmi_din: err_desc_func0[63:32]
■ lmi_addr: 12'h828, lmi_din: err_desc_func0[31:0]
Refer to the “LMI Signals—Hard IP Implementation” on page 5–40 for more information about LMI signalling.
For the ×8 soft IP, only bits [3:1] of cpl_err are available. For the ×1, ×4 soft IP implementation and all widths of the hard IP implementation, all bits are available.
err_desc_func0[127:0] I TLP Header corresponding to a cpl_err. Logged by the IP core when cpl_err[6] is asserted. This signal is only available for the ×1 and ×4 soft IP implementation. In the hard IP implementation, this information can be written to the AER header log register through the LMI interface. If AER is not implemented in your variation this bus should be tied to all 0’s.
cpl_pending I Completion pending. The application layer must assert this signal when a master block is waiting for completion, for example, when a transaction is pending. If this signal is asserted and low power mode is requested, the IP core waits for the deassertion of this signal before transitioning into low-power state.
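The hard IP header-logging sequence for cpl_err[6] can be illustrated with the simulation-style Verilog sketch below: four LMI writes load the AER header log registers at 12'h81C, 12'h820, 12'h824, and 12'h828, and cpl_err[6] is then asserted together with one of the error bits. The module, the report_error trigger, and the task timing are hypothetical; only the register addresses and the use of err_desc_func0 follow the description above.

module cpl_err_hdr_log_example (
  input  wire         pld_clk,
  input  wire         lmi_ack,
  input  wire         report_error,       // hypothetical trigger from the application
  input  wire [127:0] err_desc_func0,     // TLP header to be logged
  output reg  [11:0]  lmi_addr,
  output reg  [31:0]  lmi_din,
  output reg          lmi_wren,
  output reg  [6:0]   cpl_err
);
  task lmi_write(input [11:0] addr, input [31:0] data);
    begin
      @(posedge pld_clk);
      lmi_addr <= addr;
      lmi_din  <= data;
      lmi_wren <= 1'b1;
      @(posedge pld_clk);
      lmi_wren <= 1'b0;
      wait (lmi_ack);                      // the core acknowledges the write
    end
  endtask

  initial begin
    lmi_wren = 1'b0;
    cpl_err  = 7'b0;
    wait (report_error);
    lmi_write(12'h81C, err_desc_func0[127:96]);
    lmi_write(12'h820, err_desc_func0[95:64]);
    lmi_write(12'h824, err_desc_func0[63:32]);
    lmi_write(12'h828, err_desc_func0[31:0]);
    @(posedge pld_clk);
    cpl_err <= 7'b1001000;                 // cpl_err[6] together with, for example, cpl_err[3]
    @(posedge pld_clk);
    cpl_err <= 7'b0;
  end
endmodule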
Figure 5–40 shows all the signals of a full-featured PCI Express IP core available in the
SOPC Builder design flow. Your parameterization may not include some of the ports.
The Avalon-MM signals are shown on the left side of this figure.
Figure 5–40. Signals in the SOPC Builder Soft or Hard Full-Featured IP Core with Avalon-MM Interface
(Figure content: the 32-bit Avalon-MM CRA slave port (CraChipSelect_i, CraRead_i, CraWrite_i, CraAddress_i[11:0], CraByteEnable_i[3:0], CraWriteData_i[31:0], CraReadData_o[31:0], CraWaitRequest_o, CraIrq_o); the 64-bit Avalon-MM RX master port (RxmWrite_o, RxmRead_o, RxmAddress_o[31:0], RxmWriteData_o[63:0], RxmByteEnable_o[7:0], RxmBurstCount_o[9:0], RxmWaitRequest_i, RxmReadDataValid_i, RxmReadData_i[63:0], RxmIrq_i, RxmIrqNum_i[5:0], RxmResetRequest_o); the 64-bit Avalon-MM TX slave port (Txs* signals); clock signals (refclk, clk125_out, AvlClk_i); reset and status signals (reset_n, pcie_rstn, suc_spd_neg); transceiver control signals (reconfig_fromgxb, reconfig_togxb, reconfig_clk, cal_blk_clk, gxb_powerdown); the 1-bit serial interface (tx[3:0], rx[3:0]); the 16-bit PIPE interface for ×1 and ×4 in the soft IP implementation and the 8-bit PIPE interface in the hard IP implementation (simulation only); and the test interface (test_in[31:0], optional test_out[511:0] or [9:0]).)
Figure 5–41 shows the signals of a completer-only, single dword, PCI Express IP core.
Figure 5–41. Signals in the Completer-Only, Single Dword, IP Core with Avalon-MM Interface
(Figure content: the 32-bit Avalon-MM RX master port (RxmWrite_o, RxmRead_o, RxmAddress_o[31:0], RxmWriteData_o[31:0], RxmByteEnable_o[3:0], RxmWaitRequest_i, RxmReadDataValid_i, RxmReadData_i[31:0], RxmIrq_i, RxmResetRequest_o); clock signals (refclk, clk125_out, AvlClk_i); reset and status signals (reset_n, pcie_rstn, suc_spd_neg); transceiver control signals (reconfig_fromgxb, reconfig_togxb, reconfig_clk, cal_blk_clk, gxb_powerdown); the 1-bit serial interface (tx[3:0], rx[3:0]); the 16-bit PIPE interface in the soft IP implementation and the 8-bit PIPE interface in the hard IP implementation (simulation only); and the test interface (test_in[31:0], optional test_out[511:0], [63:0], or [9:0]).)
Table 5–24 lists the interfaces for these IP cores with links to the sections that describe
each.
Logical
Avalon-MM CRA Slave v — “32-Bit Non-bursting Avalon-MM CRA Slave Signals” on page 5–49
Avalon-MM RX Master v v “RX Avalon-MM Master Signals” on page 5–50
Avalon-MM TX Slave v — “64-Bit Bursting TX Avalon-MM Slave Signals” on page 5–50
Clock v v “Clock Signals” on page 5–51
Reset and Status v v “Reset and Status Signals” on page 5–52
Physical and Test
Transceiver Control v v “Transceiver Control” on page 5–53
Serial v v “Serial Interface Signals” on page 5–55
Pipe v v “PIPE Interface Signals” on page 5–56
Test v v “Test Signals” on page 5–58
f The PCI Express IP cores with Avalon-MM interface implement the Avalon-MM protocol,
which is described in the Avalon Interface Specifications. Refer to this specification for
information about the Avalon-MM protocol, including timing diagrams.
Clock Signals
Table 5–28 describes the clock signals for the PCI Express IP cores generated in SOPC
Builder.
Figure 5–42 shows the PCI Express reset logic for SOPC Builder.
(Figure 5–42 content: the Reset Synchronizer and Reset Request modules in the system interconnect fabric, the Reset_n and PCIe_rstn inputs, the Reset_n_pcie/Rstn_i reset into the PCI Express Avalon-MM bridge and its transaction, data link, and physical layers, and the npor, l2_exit, hotrst_exit, dlup_exit, dl_ltssm[4:0], and RxmResetRequest_o signals.)
Notes to Figure 5–42:
(1) The system-wide reset, reset_n, indirectly resets all PCI Express IP core circuitry not affected by PCIe_rstn using the Reset_n_pcie signal and the Reset Synchronizer module.
(2) For a description of the dl_ltssm[4:0] bus, refer to Table 5–7.
Pcie_rstn also resets the rest of the PCI Express IP core, but only after the following
synchronization process:
1. When Pcie_rstn asserts, the reset request module asserts reset_request,
synchronized to the Avalon-MM clock, to the Reset Synchronizer block.
2. The Reset Synchronizer block sends a reset pulse, Reset_n_pcie, synchronized to
the Avalon-MM clock, to the PCI Express Compiler IP core.
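A generic version of this synchronizer, not the SOPC Builder generated logic, is sketched below: the reset asserts asynchronously when Pcie_rstn goes low and deasserts synchronously to the Avalon-MM clock, producing Reset_n_pcie.

module pcie_reset_sync (
  input  wire AvlClk_i,
  input  wire Pcie_rstn,       // asynchronous input, active low
  output wire Reset_n_pcie     // reset synchronized to the Avalon-MM clock, active low
);
  reg [1:0] sync;
  always @(posedge AvlClk_i or negedge Pcie_rstn)
    if (!Pcie_rstn) sync <= 2'b00;            // assert immediately when Pcie_rstn falls
    else            sync <= {sync[0], 1'b1};  // release synchronously to AvlClk_i
  assign Reset_n_pcie = sync[1];
endmodule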
Transceiver Control
Table 5–30 describes the transceiver support signals.
reconfig_fromgxb[16:0] (Stratix IV GX ×1 and ×4), reconfig_fromgxb[33:0] (Stratix IV GX ×8), and reconfig_fromgxb (Stratix II GX, Arria GX) (O); reconfig_togxb[3:0] (Stratix IV GX) and reconfig_togxb[2:0] (Stratix II GX, Arria GX) (I); reconfig_clk (I) (Arria II GX, Arria II GZ, Cyclone IV GX): These are the transceiver dynamic reconfiguration signals. Transceiver dynamic reconfiguration is not typically required for PCI Express designs in Stratix II GX or Arria GX devices. These signals may be used for cases in which the PCI Express instance shares a transceiver quad with another protocol that supports dynamic reconfiguration. They may also be used in cases where the transceiver analog controls (VOD, Pre-emphasis, and Manual Equalization) need to be modified to compensate for extended PCI Express interconnects such as cables. In these cases, these signals must be connected as described in the Stratix II GX Device Handbook; otherwise, when unused, the reconfig_clk signal should be tied low, reconfig_togxb tied to b'010, and reconfig_fromgxb left open. For Arria II GX and Stratix IV GX devices, dynamic reconfiguration is required for PCI Express designs to compensate for variations due to process, voltage, and temperature. You must connect the ALTGX_RECONFIG instance to the ALTGX instances with receiver channels in your design using these signals. The maximum frequency of reconfig_clk is 50 MHz. For more information about instantiating the ALTGX_RECONFIG megafunction in your design, refer to “Transceiver Offset Cancellation” on page 13–9.
fixedclk (I): A 125 MHz free running clock that you must provide that serves as input to the fixed clock of the transceiver. fixedclk and the 50 MHz reconfig_clk must be free running and not derived from refclk. This signal is used in the hard IP implementation for Arria II GX, Arria II GZ, Cyclone IV GX, HardCopy IV GX, and Stratix IV GX devices.
busy_reconfig_altgxb_reconfig (I): When asserted, indicates that offset calibration is calibrating the transceiver. This signal is used in the hard IP implementation for Arria II GX, Arria II GZ, Cyclone IV GX, HardCopy IV GX, and Stratix IV GX devices.
reset_reconfig_altgxb_reconfig (I): This signal keeps the altgxb_reconfig block in reset until the reconfig_clk and fixedclk are stable.
The input signals listed in Table 5–31 connect from the user application directly to the
transceiver instance.
The following sections describe signals for the three possible types of physical
interfaces (1-bit, 20-bit, or PIPE). Refer to Figure 5–1 on page 5–2, Figure 5–2 on
page 5–3, Figure 5–3 on page 5–4, and Figure 5–40 on page 5–47 for pinout diagrams
of all of the PCI Express IP core variants.
For the soft IP implementation of the ×1 IP core any channel of any transceiver block
can be assigned for the serial input and output signals. For the hard IP
implementation of the ×1 IP core the serial input and output signals must use channel
0 of the Master Transceiver Block associated with that hard IP block.
For the ×4 IP core the serial inputs (rx_in[0-3]) and serial outputs (tx_out[0-3])
must be assigned to the pins associated with the like-number channels of the
transceiver block. The signals rx_in[0]/tx_out[0] must be assigned to the pins
associated with channel 0 of the transceiver block, rx_in[1]/tx_out[1] must be
assigned to the pins associated with channel 1 of the transceiver block, and so on.
Additionally, the ×4 hard IP implementation must use the four channels of the Master
Transceiver Block associated with that hard IP block.
For the ×8 IP core the serial inputs (rx_in[0-3]) and serial outputs (tx_out[0-3])
must be assigned to the pins associated with the like-number channels of the Master
Transceiver Block. The signals rx_in[0]/tx_out[0] must be assigned to the pins
associated with channel 0 of the Master Transceiver Block, rx_in[1]/tx_out[1] must
be assigned to the pins associated with channel 1 of the Master Transceiver Block, and
so on. The serial inputs (rx_in[4-7]) and serial outputs (tx_out[4-7]) must be
assigned in order to the pins associated with channels 0-3 of the Slave Transceiver
Block. The signals rx_in[4]/tx_out[4] must be assigned to the pins associated with
channel 0 of the Slave Transceiver Block, rx_in[5]/tx_out[5] must be assigned to
the pins associated with channel 1 of the Slave Transceiver Block, and so on.
Figure 5–43 illustrates this connectivity.
Figure 5–43. Two PCI Express ×8 Links in a Four Transceiver Block Device
Stratix IV GX Device
Transceiver Block GXBL1 Transceiver Block GXBR1
(Slave) (Slave)
PCI Express Lane 4 Channel0 Second PCI First PCI Channel0 PCI Express Lane 4
Express Express
Transceiver Block GXBL0 (PIPE) (PIPE) Transceiver Block GXBR0
(Master) x8 Link x8 Link (Master)
1 You must verify the location of the master transceiver block before making pin
assignments for the hard IP implementation of the PCI Express IP core.
f Refer to Pin-out Files for Altera Devices for pin-out tables for all Altera devices in
.pdf, .txt, and .xls formats.
The signals in Table 5–33 are provided for simulation of the PIPE interface for variations
using an internal transceiver. In
Table 5–33, signals that include lane number 0 also exist for lanes 1-7, as marked in the
table. Refer to Chapter 14, External PHYs for descriptions of the slightly modified
PIPE interface signalling for use with specific external PHYs. The modifications
include DDR signalling and source synchronous clocking in the TX direction.
tx_pipemargin O Transmit VOD margin selection. The PCI Express IP core hard IP sets the value for this signal based on the value from the Link Control 2 Register. Available for simulation only.
tx_pipedeemph O Transmit de-emphasis selection. In PCI Express Gen2 (5 Gbps) mode it selects the transmitter de-emphasis:
■ 1'b0: -6 dB
■ 1'b1: -3.5 dB
The PCI Express IP core hard IP sets the value for this signal based on the indication received from the other end of the link during the Training Sequences (TS). You do not need to change this value.
rxdata<n>_ext[15:0] (1) (2) I Receive data <n> (2 symbols on lane <n>). This bus receives data on lane <n>. The first received symbol is rxdata<n>_ext[7:0] and the second is rxdata<n>_ext[15:8]. For the 8 Bit PIPE mode only rxdata<n>_ext[7:0] is available.
rxdatak<n>_ext[1:0] (1) (2) I Receive data control <n> (2 symbols on lane <n>). This signal separates control and data symbols. The first symbol received is aligned with rxdatak<n>_ext[0] and the second symbol received is aligned with rxdatak<n>_ext[1]. For the 8 Bit PIPE mode only the single bit signal rxdatak<n>_ext is available.
rxvalid<n>_ext (1) (2) I Receive valid <n>. This signal indicates symbol lock and valid data on rxdata<n>_ext and rxdatak<n>_ext.
Test Signals
The test_in and test_out busses provide run-time control and monitoring of the
internal state of the IP cores. Table 5–35 describes the test signals for the hard IP
implementation.
c Altera recommends that you use the test_out and test_in signals for debug or non-
critical status monitoring purposes such as LED displays of PCIe link status. They
should not be used for design function purposes. Use of these signals will make it
more difficult to close timing on the design. The signals have not been rigorously
verified and will not function as documented in some corner cases.
This section describes the registers that you can use to access the PCI Express configuration
space and the Avalon-MM bridge control registers. It includes the following sections:
■ Configuration Space Register Content
■ PCI Express Avalon-MM Bridge Control Register Content
■ Comprehensive Correspondence between Config Space Registers and PCIe Spec
Rev 2.0
f For comprehensive information about these registers, refer to Chapter 7 of the PCI
Express Base Specification Revision 1.0a, 1.1 or 2.0 depending on the version you specify
on the System Setting page of the MegaWizard interface.
1 To facilitate finding additional information about these PCI Express registers, the
following tables provide the name of the corresponding section in the PCI Express Base
Specification Revision 2.0.
1 In the following tables, the names of fields that are defined by parameters in the
parameter editor are links to the description of that parameter. These links appear as
green text.
Table 6–2. PCI Type 0 Configuration Space Header (Endpoints), Rev2 Spec: Type 0 Configuration Space Header
Byte Offset 31:24 23:16 15:8 7:0
0x000 Device ID Vendor ID
0x004 Status Command
0x008 Class code Revision ID
0x00C 0x00 Header Type (Port type) 0x00 Cache Line Size
0x010 BAR Table (BAR0)
0x014 BAR Table (BAR1)
0x018 BAR Table (BAR2)
0x01C BAR Table (BAR3)
0x020 BAR Table (BAR4)
0x024 BAR Table (BAR5)
0x028 Reserved
0x02C Subsystem ID Subsystem vendor ID
0x030 Expansion ROM base address
0x034 Reserved Capabilities Pointer
0x038 Reserved
0x03C 0x00 0x00 Interrupt Pin Interrupt Line
Note to Table 6–2:
(1) Refer to Table 6–23 on page 6–12 for a comprehensive list of correspondences between the configuration space registers and the PCI Express
Base Specification 2.0.
Table 6–4. MSI Capability Structure, Rev2 Spec: MSI and MSI-X Capability Structures
Byte Offset 31:24 23:16 15:8 7:0
0x050 Message Control (Configuration MSI Control Status Register Field Descriptions) Next Cap Ptr Capability ID
0x054 Message Address
0x058 Message Upper Address
0x05C Reserved Message Data
Note to Table 6–4:
(1) Refer to Table 6–23 on page 6–12 for a comprehensive list of correspondences between the configuration space registers and the PCI Express
Base Specification 2.0.
Table 6–5. MSI-X Capability Structure, Rev2 Spec: MSI and MSI-X Capability Structures
Byte Offset 31:24 23:16 15:8 7:3 2:0
0x068 Message Control (MSI-X Table size[26:16]) Next Cap Ptr Capability ID
0x06C MSI-X Table Offset BIR
Note to Table 6–5:
(1) Refer to Table 6–23 on page 6–12 for a comprehensive list of correspondences between the configuration space registers and the PCI Express
Base Specification 2.0.
Table 6–6. Power Management Capability Structure, Rev2 Spec: Power Management Capability Structure
Byte Offset 31:24 23:16 15:8 7:0
0x078 Capabilities Register Next Cap PTR Cap ID
0x07C Data PM Control/Status Bridge Extensions Power Management Status & Control
Note to Table 6–6:
(1) Refer to Table 6–23 on page 6–12 for a comprehensive list of correspondences between the configuration space registers and the PCI Express
Base Specification 2.0.
Table 6–7 describes the PCI Express capability structure for specification versions 1.0a
and 1.1.
Table 6–7. PCI Express Capability Structure Version 1.0a and 1.1 (Note 1), Rev2 Spec: PCI Express Capabilities
Register and PCI Express Capability List Register
Byte Offset 31:24 23:16 15:8 7:0
0x080 PCI Express Capabilities Register Next Cap Pointer PCI Express Cap ID
0x084 Device Capabilities
0x088 Device Status Device Control
0x08C Link Capabilities
0x090 Link Status Link Control
0x094 Slot Capabilities
0x098 Slot Status Slot Control
0x09C Reserved Root Control
0x0A0 Root Status
Note to Table 6–7:
(1) Reserved and preserved. As per the PCI Express Base Specification 1.1, this register is reserved for future RW implementations. Registers are
read-only and must return 0 when read. Software must preserve the value read for writes to bits.
(2) Refer to Table 6–23 on page 6–12 for a comprehensive list of correspondences between the configuration space registers and the PCI Express
Base Specification 2.0.
Table 6–8 describes the PCI Express capability structure for specification version 2.0.
Table 6–8. PCI Express Capability Structure Version 2.0, Rev2 Spec: PCI Express Capabilities Register and PCI Express
Capability List Register
Byte Offset 31:16 15:8 7:0
0x080 PCI Express Capabilities Register Next Cap Pointer PCI Express Cap ID
0x084 Device Capabilities
0x088 Device Status Device Control 2
0x08C Link Capabilities
0x090 Link Status Link Control
0x094 Slot Capabilities
0x098 Slot Status Slot Control
0x09C Root Capabilities Root Control
0x0A0 Root Status
0x0A4 Device Capabilities 2
0x0A8 Device Status 2 Device Control 2 (Implement completion timeout disable)
0x0AC Link Capabilities 2
0x0B0 Link Status 2 Link Control 2
0x0B4 Slot Capabilities 2
0x0B8 Slot Status 2 Slot Control 2
Note to Table 6–8:
(1) Registers not applicable to a device are reserved.
(2) Refer to Table 6–23 on page 6–12 for a comprehensive list of correspondences between the configuration space registers and the PCI Express
Base Specification 2.0.
Table 6–9. Virtual Channel Capability Structure, Rev2 Spec: Virtual Channel Capability (Part 1 of 2)
Byte Offset 31:24 23:16 15:8 7:0
0x100 Next Cap PTR Vers. Extended Cap ID
0x104 Port VC Cap 1 (ReservedP, Number of low-priority VCs)
0x108 VAT offset ReservedP VC arbit. cap
0x10C Port VC Status Port VC control
0x110 PAT offset 0 (31:24) VC Resource Capability Register (0)
0x114 VC Resource Control Register (0)
0x118 VC Resource Status Register (0) ReservedP
0x11C PAT offset 1 (31:24) VC Resource Capability Register (1)
0x120 VC Resource Control Register (1)
0x124 VC Resource Status Register (1) ReservedP
...
0x164 PAT offset 7 (31:24) VC Resource Capability Register (7)
Table 6–9. Virtual Channel Capability Structure, Rev2 Spec: Virtual Channel Capability (Part 2 of 2)
Byte Offset 31:24 23:16 15:8 7:0
0x168 VC Resource Control Register (7)
Note to Table 6–9:
(1) Refer to Table 6–23 on page 6–12 for a comprehensive list of correspondences between the configuration space
registers and the PCI Express Base Specification 2.0.
Table 6–10 describes the PCI Express advanced error reporting extended capability
structure.
Table 6–10. PCI Express Advanced Error Reporting Extended Capability Structure, Rev2 Spec: Advanced Error Reporting
Capability
Byte Offset 31:24 23:16 15:8 7:0
0x800 PCI Express Enhanced Capability Header
0x804 Uncorrectable Error Status Register
0x808 Uncorrectable Error Mask Register
0x80C Uncorrectable Error Severity Register
0x810 Correctable Error Status Register
0x814 Correctable Error Mask Register
0x818 Advanced Error Capabilities and Control Register
0x81C Header Log Register
0x82C Root Error Command
0x830 Root Error Status
0x834 Error Source Identification Register Correctable Error Source ID Register
Note to Table 6–10:
(1) Refer to Table 6–23 on page 6–12 for a comprehensive list of correspondences between the configuration space registers and the PCI Express
Base Specification 2.0.
1 The data returned for a read issued to any undefined address in this range is
unpredictable.
The complete map of PCI Express Avalon-MM bridge registers is shown in Table 6–12:
Table 6–13. Avalon-MM to PCI Express Interrupt Status Register (Part 1 of 2) Address: 0x0040
Bit Name Access Description
31:24 Reserved — —
23 A2P_MAILBOX_INT7 RW1C 1 when the A2P_MAILBOX7 is written to
22 A2P_MAILBOX_INT6 RW1C 1 when the A2P_MAILBOX6 is written to
Table 6–13. Avalon-MM to PCI Express Interrupt Status Register (Part 2 of 2) Address: 0x0040
Bit Name Access Description
21 A2P_MAILBOX_INT5 RW1C 1 when the A2P_MAILBOX5 is written to
20 A2P_MAILBOX_INT4 RW1C 1 when the A2P_MAILBOX4 is written to
19 A2P_MAILBOX_INT3 RW1C 1 when the A2P_MAILBOX3 is written to
18 A2P_MAILBOX_INT2 RW1C 1 when the A2P_MAILBOX2 is written to
17 A2P_MAILBOX_INT1 RW1C 1 when the A2P_MAILBOX1 is written to
16 A2P_MAILBOX_INT0 RW1C 1 when the A2P_MAILBOX0 is written to
15:14 Reserved — —
13:8 AVL_IRQ_INPUT_VECTOR RO Avalon-MM interrupt input vector. When an Avalon-MM IRQ is being signaled (AVL_IRQ_ASSERTED = 1), this register indicates the current highest priority Avalon-MM IRQ being asserted. This value changes as higher priority interrupts are asserted and deasserted. This register stores the value of the RxmIrqNum_i input signal.
7 AVL_IRQ_ASSERTED RO Current value of the Avalon-MM interrupt (IRQ) input ports to the Avalon-MM RX master port: 0 – Avalon-MM IRQ is not being signaled; 1 – Avalon-MM IRQ is being signaled.
6:0 Reserved — —
A PCI Express interrupt can be asserted for any of the conditions registered in the PCI
Express interrupt status register by setting the corresponding bits in the
Avalon-MM-to-PCI Express interrupt enable register (Table 6–14). Either MSI or
legacy interrupts can be generated as explained in the section “Generation of PCI
Express Interrupts” on page 4–22.
The PCI Express root complex typically requires write access to a set of PCI
Express-to-Avalon-MM mailbox registers and read-only access to a set of
Avalon-MM-to-PCI Express mailbox registers. There are eight mailbox registers
available.
The PCI Express-to-Avalon-MM mailbox registers are writable at the addresses shown
in Table 6–15. Writing to one of these registers causes the corresponding bit in the
Avalon-MM interrupt status register to be set to a one.
Table 6–15. PCI Express-to-Avalon-MM Mailbox Registers, Read/Write Address Range: 0x0800-0x081F
Address Name Access Description
0x0800 P2A_MAILBOX0 RW PCI Express-to-Avalon-MM Mailbox 0
0x0804 P2A_MAILBOX1 RW PCI Express-to-Avalon-MM Mailbox 1
0x0808 P2A_MAILBOX2 RW PCI Express-to-Avalon-MM Mailbox 2
0x080C P2A_MAILBOX3 RW PCI Express-to-Avalon-MM Mailbox 3
0x0810 P2A_MAILBOX4 RW PCI Express-to-Avalon-MM Mailbox 4
0x0814 P2A_MAILBOX5 RW PCI Express-to-Avalon-MM Mailbox 5
0x0818 P2A_MAILBOX6 RW PCI Express-to-Avalon-MM Mailbox 6
0x081C P2A_MAILBOX7 RW PCI Express-to-Avalon-MM Mailbox 7
The Avalon-MM-to-PCI Express mailbox registers are read at the addresses shown in Table 6–16. The PCI Express root complex should use these addresses to read the mailbox information after being signaled by the corresponding bits in the Avalon-MM-to-PCI Express interrupt status register.
Table 6–16. Avalon-MM-to-PCI Express Mailbox Registers, Read-Only Address Range: 0x0900-0x091F
Address Name Access Description
0x0900 A2P_MAILBOX0 RO Avalon-MM-to-PCI Express Mailbox 0
0x0904 A2P_MAILBOX1 RO Avalon-MM-to-PCI Express Mailbox 1
0x0908 A2P_MAILBOX2 RO Avalon-MM-to-PCI Express Mailbox 2
0x090C A2P_MAILBOX3 RO Avalon-MM-to-PCI Express Mailbox 3
0x0910 A2P_MAILBOX4 RO Avalon-MM-to-PCI Express Mailbox 4
0x0914 A2P_MAILBOX5 RO Avalon-MM-to-PCI Express Mailbox 5
0x0918 A2P_MAILBOX6 RO Avalon-MM-to-PCI Express Mailbox 6
0x091C A2P_MAILBOX7 RO Avalon-MM-to-PCI Express Mailbox 7
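The following C sketch illustrates how a root complex driver might service these interrupts. It assumes the bridge registers are visible through one of the endpoint BARs at a hypothetical bridge pointer, and it uses only the offsets listed in Tables 6–13, 6–15, and 6–16.

#include <stdint.h>

/* Hypothetical pointer to the PCI Express Avalon-MM bridge control registers,
 * assumed to be visible to the root complex through one of the endpoint BARs.
 * Offsets follow Tables 6-13, 6-15, and 6-16. */
static volatile uint32_t *bridge;                 /* set up by the host driver */

#define A2P_INT_STATUS   0x0040u                  /* RW1C, bits 23:16 = A2P_MAILBOX_INT7:0 */
#define P2A_MAILBOX(n)   (0x0800u + 4u * (n))     /* RW, PCI Express-to-Avalon-MM mailboxes */
#define A2P_MAILBOX(n)   (0x0900u + 4u * (n))     /* RO, Avalon-MM-to-PCI Express mailboxes */

static inline uint32_t rd(uint32_t off)             { return bridge[off / 4]; }
static inline void     wr(uint32_t off, uint32_t v) { bridge[off / 4] = v; }

/* Root complex side: post a command to the Avalon-MM domain. Writing the
 * mailbox sets the corresponding bit in the Avalon-MM interrupt status
 * register on the other side of the bridge. */
void post_command(unsigned n, uint32_t cmd)
{
    wr(P2A_MAILBOX(n), cmd);
}

/* Root complex interrupt handler: read each signaled A2P mailbox, then
 * write 1s back to the RW1C status bits to clear them. */
void bridge_isr(void)
{
    uint32_t status = rd(A2P_INT_STATUS);

    for (unsigned n = 0; n < 8; n++) {
        if (status & (1u << (16 + n))) {
            uint32_t msg = rd(A2P_MAILBOX(n));
            (void)msg;                             /* hand off to the application */
        }
    }
    wr(A2P_INT_STATUS, status & 0x00FF0000u);      /* clear serviced mailbox bits */
}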
Each entry in the PCI Express address translation table (Table 6–17) is 8 bytes wide,
regardless of the value in the current PCI Express address width parameter. Therefore,
register addresses are always the same width, regardless of PCI Express address
width.
Table 6–17. Avalon-MM-to-PCI Express Address Translation Table Address Range: 0x1000-0x1FFF
Address Bits Name Access Description
Table 6–18. PCI Express Avalon-MM Bridge Address Space Bit Encodings
Value (Bits 1:0) Indication
00 Memory space, 32-bit PCI Express address. 32-bit header is generated. Address bits 63:32 of the translation table entries are ignored.
01 Memory space, 64-bit PCI Express address. 64-bit address header is generated.
10 Reserved
11 Reserved
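The sketch below shows how software might program one translation table entry using the base address from Table 6–17 and the space encodings from Table 6–18. The cra pointer and the packing of the upper address dword are assumptions made for illustration.

#include <stdint.h>

/* Avalon-MM base of the translation table (Table 6-17) and the space
 * encodings from Table 6-18. The exact packing is an assumption: low dword
 * holds PCI Express address[31:2] plus the 2-bit space indication, high
 * dword holds PCI Express address[63:32]. */
#define ATT_BASE        0x1000u
#define ATT_ENTRY_BYTES 8u
#define ATT_32BIT_SPACE 0x0u   /* 00: 32-bit PCIe address, upper dword ignored */
#define ATT_64BIT_SPACE 0x1u   /* 01: 64-bit PCIe address                      */

static volatile uint32_t *cra;  /* hypothetical pointer to the bridge registers */

void set_translation_entry(unsigned n, uint64_t pcie_addr, int use_64bit)
{
    uint32_t off = ATT_BASE + n * ATT_ENTRY_BYTES;
    uint32_t lo  = ((uint32_t)pcie_addr & ~0x3u) |
                   (use_64bit ? ATT_64BIT_SPACE : ATT_32BIT_SPACE);

    cra[off / 4]     = lo;                          /* address[31:2] + space bits */
    cra[off / 4 + 1] = (uint32_t)(pcie_addr >> 32); /* ignored for 32-bit space   */
}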
The interrupt status register (Table 6–19) records the status of all conditions that can
cause an Avalon-MM interrupt to be asserted.
Table 6–19. PCI Express to Avalon-MM Interrupt Status Register Address: 0x3060
Bits Name Access Description
[15:0] Reserved — —
[16] P2A_MAILBOX_INT0 RW1C 1 when the P2A_MAILBOX0 is written
[17] P2A_MAILBOX_INT1 RW1C 1 when the P2A_MAILBOX1 is written
[18] P2A_MAILBOX_INT2 RW1C 1 when the P2A_MAILBOX2 is written
[19] P2A_MAILBOX_INT3 RW1C 1 when the P2A_MAILBOX3 is written
[20] P2A_MAILBOX_INT4 RW1C 1 when the P2A_MAILBOX4 is written
[21] P2A_MAILBOX_INT5 RW1C 1 when the P2A_MAILBOX5 is written
[22] P2A_MAILBOX_INT6 RW1C 1 when the P2A_MAILBOX6 is written
[23] P2A_MAILBOX_INT7 RW1C 1 when the P2A_MAILBOX7 is written
[31:24] Reserved — —
An Avalon-MM interrupt can be asserted for any of the conditions noted in the
Avalon-MM interrupt status register by setting the corresponding bits in the interrupt
enable register (Table 6–20).
PCI Express interrupts can also be enabled for all of the error conditions described.
However, it is likely that only one of the Avalon-MM or PCI Express interrupts can be
enabled for any given bit. There is typically a single process in either the PCI Express
or Avalon-MM domain that is responsible for handling the condition reported by the
interrupt.
Table 6–20. PCI Express to Avalon-MM Interrupt Enable Register Address: 0x3070
Bits Name Access Description
[15:0] Reserved — —
[23:16] P2A_MB_IRQ RW Enables assertion of the Avalon-MM interrupt CraIrq_o signal when the specified mailbox is written by the root complex.
[31:24] Reserved — —
Table 6–21. Avalon-MM-to-PCI Express Mailbox Registers, Read/Write (Part 1 of 2) Address Range: 0x3A00-0x3A1F
Address Name Access Description
0x3A00 A2P_MAILBOX0 RW Avalon-MM-to-PCI Express mailbox 0
0x3A04 A2P_MAILBOX1 RW Avalon-MM-to-PCI Express mailbox 1
Table 6–21. Avalon-MM-to-PCI Express Mailbox Registers, Read/Write (Part 2 of 2) Address Range: 0x3A00-0x3A1F
Address Name Access Description
0x3A08 A2P_MAILBOX2 RW Avalon-MM-to-PCI Express mailbox 2
0x3A0C A2P_MAILBOX3 RW Avalon-MM-to-PCI Express mailbox 3
0x3A10 A2P_MAILBOX4 RW Avalon-MM-to-PCI Express mailbox 4
0x3A14 A2P_MAILBOX5 RW Avalon-MM-to-PCI Express mailbox 5
0x3A18 A2P_MAILBOX6 RW Avalon-MM-to-PCI Express mailbox 6
0x3A1C A2P_MAILBOX7 RW Avalon-MM-to-PCI Express mailbox 7
Table 6–22. PCI Express-to-Avalon-MM Mailbox Registers, Read-Only Address Range: 0x3B00-0x3B1F
Address Name Access Mode Description
0x3B00 P2A_MAILBOX0 RO PCI Express-to-Avalon-MM mailbox 0.
0x3B04 P2A_MAILBOX1 RO PCI Express-to-Avalon-MM mailbox 1
0x3B08 P2A_MAILBOX2 RO PCI Express-to-Avalon-MM mailbox 2
0x3B0C P2A_MAILBOX3 RO PCI Express-to-Avalon-MM mailbox 3
0x3B10 P2A_MAILBOX4 RO PCI Express-to-Avalon-MM mailbox 4
0x3B14 P2A_MAILBOX5 RO PCI Express-to-Avalon-MM mailbox 5
0x3B18 P2A_MAILBOX6 RO PCI Express-to-Avalon-MM mailbox 6
0x3B1C P2A_MAILBOX7 RO PCI Express-to-Avalon-MM mailbox 7
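As an illustration of the Avalon-MM side of this scheme, the following hedged C sketch (for example, code running on a Nios II processor) enables the P2A mailbox interrupts and services them using the offsets in Tables 6–19 through 6–22. The bridge pointer is hypothetical.

#include <stdint.h>

/* Hypothetical pointer to the bridge registers as seen from the Avalon-MM
 * side. Offsets follow Tables 6-19 through 6-22. */
static volatile uint32_t *bridge;

#define P2A_INT_STATUS  0x3060u                /* RW1C, bits 23:16 = P2A_MAILBOX_INT7:0 */
#define P2A_INT_ENABLE  0x3070u                /* bits 23:16 = P2A_MB_IRQ               */
#define A2P_MAILBOX(n)  (0x3A00u + 4u * (n))   /* RW */
#define P2A_MAILBOX(n)  (0x3B00u + 4u * (n))   /* RO */

/* Enable CraIrq_o for writes by the root complex to all eight mailboxes. */
void enable_mailbox_irqs(void)
{
    bridge[P2A_INT_ENABLE / 4] = 0x00FF0000u;
}

/* Service routine for CraIrq_o: read each mailbox whose status bit is set,
 * optionally post a reply, then clear the RW1C status bits. */
void mailbox_isr(void)
{
    uint32_t status = bridge[P2A_INT_STATUS / 4];

    for (unsigned n = 0; n < 8; n++) {
        if (status & (1u << (16 + n))) {
            uint32_t cmd = bridge[P2A_MAILBOX(n) / 4];
            bridge[A2P_MAILBOX(n) / 4] = cmd;  /* echo as an acknowledgment */
        }
    }
    bridge[P2A_INT_STATUS / 4] = status & 0x00FF0000u;
}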
Table 6–23. Correspondence Configuration Space Registers and PCI Express Base Specification Rev. 2.0 Description
Byte Address Config Reg Offset 31:24 23:16 15:8 7:0 Corresponding Section in PCIe Specification
Table 6-1. Common Configuration Space Header
0x000:0x03C PCI Header Type 0 configuration registers Type 0 Configuration Space Header
0x000:0x03C PCI Header Type 1 configuration registers Type 1 Configuration Space Header
0x040:0x04C Reserved
0x050:0x05C MSI capability structure MSI and MSI-X Capability Structures
0x068:0x070 MSI-X capability structure MSI and MSI-X Capability Structures
0x070:0x074 Reserved
0x078:0x07C Power management capability structure PCI Power Management Capability Structure
0x080:0x0B8 PCI Express capability structure PCI Express Capability Structure
0x0B8:0x0FC Reserved
0x094:0x0FF Root port
0x100:0x16C Virtual channel capability structure Virtual Channel Capability
0x170:0x17C Reserved
0x180:0x1FC Virtual channel arbitration table VC Arbitration Table
0x200:0x23C Port VC0 arbitration table (Reserved) Port Arbitration Table
0x240:0x27C Port VC1 arbitration table (Reserved) Port Arbitration Table
0x280:0x2BC Port VC2 arbitration table (Reserved) Port Arbitration Table
0x2C0:0x2FC Port VC3 arbitration table (Reserved) Port Arbitration Table
0x300:0x33C Port VC4 arbitration table (Reserved) Port Arbitration Table
0x340:0x37C Port VC5 arbitration table (Reserved) Port Arbitration Table
0x380:0x3BC Port VC6 arbitration table (Reserved) Port Arbitration Table
0x3C0:0x3FC Port VC7 arbitration table (Reserved) Port Arbitration Table
0x400:0x7FC Reserved
0x800:0x834 Advanced Error Reporting AER (optional) Advanced Error Reporting Capability
0x838:0xFFF Reserved
Table 6-2. PCI Type 0 Configuration Space Header (Endpoints), Rev2 Spec: Type 0 Configuration Space Header
0x000 Device ID Vendor ID Type 0 Configuration Space Header
0x004 Status Command Type 0 Configuration Space Header
0x008 Class Code Revision ID Type 0 Configuration Space Header
0x00C 0x00 Header Type 0x00 Cache Line Size Type 0 Configuration Space Header
0x010 Base Address 0 Base Address Registers (Offset 10h - 24h)
0x014 Base Address 1 Base Address Registers (Offset 10h - 24h)
0x018 Base Address 2 Base Address Registers (Offset 10h - 24h)
0x01C Base Address 3 Base Address Registers (Offset 10h - 24h)
0x020 Base Address 4 Base Address Registers (Offset 10h - 24h)
0x024 Base Address 5 Base Address Registers (Offset 10h - 24h)
0x028 Reserved Type 0 Configuration Space Header
0x02C Subsystem Device ID Subsystem Vendor ID Type 0 Configuration Space Header
0x030 Expansion ROM base address Type 0 Configuration Space Header
0x034 Reserved Capabilities PTR Type 0 Configuration Space Header
0x038 Reserved Type 0 Configuration Space Header
0x03C 0x00 0x00 Interrupt Pin Interrupt Line Type 0 Configuration Space Header
Table 6-3. PCI Type 1 Configuration Space Header (Root Ports) , Rev2 Spec: Type 1 Configuration Space Header
0x000 Device ID Vendor ID Type 1 Configuration Space Header
0x004 Status Command Type 1 Configuration Space Header
0x008 Class Code Revision ID Type 1 Configuration Space Header
0x00C BIST Header Type Primary Latency Timer Cache Line Size Type 1 Configuration Space Header
0x010 Base Address 0 Base Address Registers (Offset 10h/14h)
0x014 Base Address 1 Base Address Registers (Offset 10h/14h)
0x018 Secondary Latency Timer Subordinate Bus Number Secondary Bus Number Primary Bus Number Secondary Latency Timer (Offset 1Bh) / Type 1 Configuration Space Header / Primary Bus Number (Offset 18h)
0x01C Secondary Status I/O Limit I/O Base Secondary Status Register (Offset 1Eh) / Type 1 Configuration Space Header
0x020 Memory Limit Memory Base Type 1 Configuration Space Header
0x024 Prefetchable Memory Limit Prefetchable Memory Base Prefetchable Memory Base/Limit (Offset 24h)
0x028 Prefetchable Base Upper 32 Bits Type 1 Configuration Space Header
0x02C Prefetchable Limit Upper 32 Bits Type 1 Configuration Space Header
0x030 I/O Limit Upper 16 Bits I/O Base Upper 16 Bits Type 1 Configuration Space Header
0x034 Reserved Capabilities PTR Type 1 Configuration Space Header
0x038 Expansion ROM Base Address Type 1 Configuration Space Header
0x03C Bridge Control Interrupt Pin Interrupt Line Bridge Control Register (Offset 3Eh)
Table 6-4. MSI Capability Structure, Rev2 Spec: MSI and MSI-X Capability Structures
0x050 Message Control Next Cap Ptr Capability ID MSI and MSI-X Capability Structures
0x054 Message Address MSI and MSI-X Capability Structures
0x058 Message Upper Address MSI and MSI-X Capability Structures
0x05C Reserved Message Data MSI and MSI-X Capability Structures
Table 6-5. MSI-X Capability Structure, Rev2 Spec: MSI and MSI-X Capability Structures
0x68 Message Control Next Cap Ptr Capability ID MSI and MSI-X Capability Structures
0x6C MSI-X Table Offset BIR MSI and MSI-X Capability Structures
0x70 Pending Bit Array (PBA) Offset BIR MSI and MSI-X Capability Structures
Table 6-6. Power Management Capability Structure, Rev2 Spec: Power Management Capability Structure
0x078 Capabilities Register Next Cap PTR Cap ID PCI Power Management Capability Structure
0x07C Data PM Control/Status Bridge Extensions Power Management Status & Control PCI Power Management Capability Structure
Table 6-7. PCI Express Capability Structure Version 1.0a and 1.1 (Note 1), Rev2 Spec: PCI Express Capabilities Register
and PCI Express Capability List Register
0x080 PCI Express Capabilities Register Next Cap PTR Capability ID PCI Express Capabilities Register / PCI Express Capability List Register
0x084 Device capabilities Device Capabilities Register
0x088 Device Status Device Control Device Status Register/Device Control Register
0x08C Link capabilities Link Capabilities Register
0x090 Link Status Link Control Link Status Register/Link Control Register
0x094 Slot capabilities Slot Capabilities Register
0x098 Slot Status Slot Control Slot Status Register/ Slot Control Register
0x09C Reserved Root Control Root Control Register
0x0A0 Root Status Root Status Register
Table 6-8. PCI Express Capability Structure Version 2.0, Rev2 Spec: PCI Express Capabilities Register and PCI Express
Capability List Register
0x080 PCI Express Capabilities Register Next Cap PTR PCI Express Cap ID PCI Express Capabilities Register / PCI Express Capability List Register
0x084 Device capabilities Device Capabilities Register
0x088 Device Status Device Control Device Status Register / Device Control Register
0x08C Link capabilities Link Capabilities Register
0x090 Link Status Link Control Link Status Register / Link Control Register
0x094 Slot Capabilities Slot Capabilities Register
0x098 Slot Status Slot Control Slot Status Register / Slot Control Register
0x09C Root Capabilities Root Control Root Capabilities Register / Root Control Register
0x0A0 Root Status Root Status Register
0x0A4 Device Capabilities 2 Device Capabilities 2 Register
0x0A8 Device Status 2 Device Control 2 Device Status 2 Register / Device Control 2 Register
0x0AC Link Capabilities 2 Link Capabilities 2 Register
0x0B0 Link Status 2 Link Control 2 Link Status 2 Register / Link Control 2 Register
0x0B4 Slot Capabilities 2 Slot Capabilities 2 Register
0x0B8 Slot Status 2 Slot Control 2 Slot Status 2 Register / Slot Control 2 Register
Table 6-9. Virtual Channel Capability Structure, Rev2 Spec: Virtual Channel Capability
0x100 Next Cap PTR Vers. Extended Cap ID Virtual Channel Enhanced Capability Header
0x104 ReservedP Port VC Cap 1 Port VC Capability Register 1
0x108 VAT offset ReservedP VC arbit. cap Port VC Capability Register 2
0x10C Port VC Status Port VC control Port VC Status Register / Port VC Control Register
0x110 PAT offset 0 (31:24) VC Resource Capability Register (0) VC Resource Capability Register
0x114 VC Resource Control Register (0) VC Resource Control Register
0x118 VC Resource Status Register (0) ReservedP VC Resource Status Register
0x11C PAT offset 1 (31:24) VC Resource Capability Register (1) VC Resource Capability Register
0x120 VC Resource Control Register (1) VC Resource Control Register
0x124 VC Resource Status Register (1) ReservedP VC Resource Status Register
… …
0x164 PAT offset 7 (31:24) VC Resource Capability Register (7) VC Resource Capability Register
0x168 VC Resource Control Register (7) VC Resource Control Register
0x16C VC Resource Status Register (7) ReservedP VC Resource Status Register
Table 6-10. PCI Express Advanced Error Reporting Extended Capability Structure, Rev2 Spec: Advanced Error Reporting
Capability
0x800 PCI Express Enhanced Capability Header Advanced Error Reporting Enhanced Capability Header
0x804 Uncorrectable Error Status Register Uncorrectable Error Status Register
0x808 Uncorrectable Error Mask Register Uncorrectable Error Mask Register
0x80C Uncorrectable Error Severity Register Uncorrectable Error Severity Register
0x810 Correctable Error Status Register Correctable Error Status Register
0x814 Correctable Error Mask Register Correctable Error Mask Register
0x818 Advanced Error Capabilities and Control Register Advanced Error Capabilities and Control Register
0x81C Header Log Register Header Log Register
0x82C Root Error Command Root Error Command Register
0x830 Root Error Status Root Error Status Register
0x834 Error Source Identification Register Correctable Error Source ID Register Error Source Identification Register
This chapter covers the functional aspects of the reset and clock circuitry for PCI
Express IP core variants created using the MegaWizard Plug-In Manager design flow.
It includes the following sections:
■ Reset Hard IP Implementation
■ Clocks
For descriptions of the available reset and clock signals refer to the following sections in Chapter 5, IP Core Interfaces: “Reset and Link Training Signals” on page 5–24, “Clock Signals—Hard IP Implementation” on page 5–23, and “Clock Signals—Soft IP Implementation” on page 5–23.
1 When you use SOPC Builder to generate the PCI Express IP core, the reset and
calibration logic is included in the IP core variant.
<variant>_plus.v or .vhd
This option partitions the reset logic between the following two plain text files:
■ <working_dir>/pci_express_compiler-library/altpcie_rs_serdes.v or .vhd—This
file includes the logic to reset the transceiver.
■ <working_dir>/<variation>_examples/chaining_dma/<variation>_rs_hip.v or
.vhd—This file includes the logic to reset the PCI Express IP core.
The _plus variant includes all of the logic necessary to initialize the PCI Express IP
core, including the following:
■ Reset circuitry
■ ALTGXB Reconfiguration IP core
■ Test_in settings
Figure 7–1 illustrates the reset logic for both the <variant>_plus.v or .vhd and
<variant>.v or .vhd options.
Within <variant>_example_chaining_pipen1b.v or .vhd, the figure shows the ALTGXB_Reconfig block (altpcie_reconfig_<device>.v or .vhd) with its busy_altgxb_reconfig output, the altpcierd_reconfig_pll_clk.v PLL, and the pcie_rstn, local_rstn, refclk (100 MHz), cal_blk_clk (50 MHz), reconfig_clk (50 MHz), free_running_clock (100 MHz), fixedclk (125 MHz), and locked signals.
f Refer to “PCI Express (PIPE) Reset Sequence” in the Reset Control and Power Down chapter in volume 2 of the Stratix IV Device Handbook for a timing diagram illustrating the reset sequence.
1 To understand the reset sequence in detail, you can also review the altpcie_rs_serdes.v file.
<variant>.v or .vhd
If you choose to implement your own reset circuitry, you must design logic to replace
the Transceiver Reset module shown in Figure 7–1.
Figure 7–2 provides a somewhat more detailed view of the reset signals in the
<variant>.v or .vhd reset logic.
In this figure, the Transceiver Reset module (altpcie_rs_serdes.v or .vhd) inside <variant>.v or .vhd uses dl_ltssm[4:0] and npor and drives the tx_digitalreset, rx_analogreset, and rx_digitalreset signals between <variant>_core.v or .vhd and <variant>_serdes.v or .vhd.
Figure 7–3. Global Reset Signals for ×1 and ×4 Endpoints in the Soft IP Implementation
The figure shows the reset synchronization circuitry from the design example driving npor, srst, and crst into altpcie_hip_pipen1b.v or .vhd (inside <variant>_core.v or .vhd), together with perst#, l2_exit, hotrst_exit, dlup_exit, dl_ltssm[4:0], and the transceiver reset and status signals (tx_digitalreset, rx_analogreset, rx_digitalreset, pll_powerdown, gxb_powerdown, rx_freqlocked, pll_locked, rx_pll_locked) exchanged with <variant>_serdes.v or .vhd.
Upon exit from any reset, all port registers and state machines must be set to their initialization values, with the exception of sticky registers as defined in Sections 7.4 and 7.6 of the PCI Express Base Specification. The PCI Express IP core has several reset sources, both external and internal, to implement these resets. These signals are described in “Reset and Link Training Signals” on page 5–24.
To meet the 100 ms PCIe configuration time requirement, a reset controller implemented as a hard macro handles the initial reset of the PMA, PCS, and PCI Express IP core. Once the PCI Express link has been established, a soft reset controller handles warm and hot resets. The <variant>_plus.v or .vhd IP cores include soft reset logic. You can use the
<variant>.v or .vhd if you want to specify your own soft reset sequence. Figure 7–4
provides a high-level block diagram for the reset logic.
In this figure, the reset synchronization circuitry from the design example for Stratix V and the reset logic in <variant>_rs_hip.v or .vhd connect to the PCIe Hard IP core and the PHY IP (<variant>_serdes.v or .vhd) inside <variant>_plus.v or .vhd and <variant>.v or .vhd through the perst_n, pld_clrhip, pld_clrpmapcship, pld_clk_ready, pld_clk_in_use, and reset_status signals.
■ npor—The npor signal is used internally for all sticky registers that may not be reset in L2 low power mode or by the fundamental reset. npor is typically generated by a logical OR of the power-on-reset generator and the perst signal as specified in the PCI Express Card Electromechanical Specification.
■ srst—The srst signal initiates a synchronous reset of the datapath state machines.
■ crst—The crst signal initiates a synchronous reset of the nonsticky configuration space registers.
For endpoints, whenever the l2_exit, hotrst_exit, dlup_exit, or other power-on-reset signals are asserted, srst and crst should be asserted for one or more cycles for the soft IP implementation and for at least two clock cycles for the hard IP implementation.
Figure 7–5 provides a simplified view of the logic controlled by the reset signals.
In this figure, within altpcie_hip_pipen1b.v or .vhd inside <variant>_core.v or .vhd, npor resets the SERDES reset state machine and the configuration space sticky registers, crst resets the configuration space non-sticky registers, and srst resets the datapath state machines of the MegaCore function.
For root ports, srst should be asserted whenever l2_exit, hotrst_exit, dlup_exit,
and power-on-reset signals are asserted. The root port crst signal should be asserted
whenever l2_exit, hotrst_exit and other power-on-reset signals are asserted. When
the perst# signal is asserted, srst and crst should be asserted for a longer period of
time to ensure that the root complex is stable and ready for link training.
■ npor—The npor reset is used internally for all sticky registers that may not be reset
in L2 low power mode or by the fundamental reset. npor is typically generated by
a logical OR of the power-on-reset generator and the perst# signal as specified in
the PCI Express Card Electromechanical Specification.
■ rstn—The rstn signal is an asynchronous reset of the datapath state machines and
the nonsticky configuration space registers. Whenever the l2_exit, hotrst_exit,
dlup_exit, or other power-on-reset signals are asserted, rstn should be asserted
for one or more cycles. When the perst# signal is asserted, rstn should be asserted
for a longer period of time to ensure that the root complex is stable and ready for
link training.
Clocks
This section describes clocking for the PCI Express IP core. It includes the following
sections:
■ Avalon-ST Interface—Hard IP Implementation
■ Avalon-ST Interface—Soft IP Implementation
■ Clocking for a Generic PIPE PHY and the Simulation Testbench
■ Avalon-MM Interface–Hard IP and Soft IP Implementations
Figure 7–6. Arria II GX, Cyclone IV GX, HardCopy IV GX, Stratix IV GX, Stratix V GX ×1, ×4, or ×8 100 MHz Reference
Clock
In this figure, a 100-MHz clock source drives refclk of <variant>_serdes.v or .vhd (ALTGX or ALT2GX megafunction), a calibration clock source drives cal_blk_clk, and reconfig and fixed clock sources drive reconfig_clk and fixedclk. Also shown are rx_cruclk and pll_inclk at 125 MHz (×1 or ×4) or 250 MHz (×8), tx_clk_out, core_clk_out from <variant>_core.v or .vhd (PCIe MegaCore function) driving the application clock, and the pld_clk input.
The IP core contains a clock domain crossing (CDC) synchronizer at the interface
between the PHY/MAC and the DLL layers which allows the data link and
transaction layers to run at frequencies independent of the PHY/MAC and provides
more flexibility for the user clock interface to the IP core. Depending on system
requirements, this additional flexibility can be used to enhance performance by
running at a higher frequency for latency optimization or at a lower frequency to save
power.
In the Stratix IV GX device, the clock domains are organized as follows: the transceiver and PHY/MAC run in the p_clk domain derived from the 100 MHz refclk; the clock domain crossing (CDC) bridges the PHY/MAC to the data link layer (DLL) and transaction layer (TL) in the core_clk domain; and the adapter decouples core_clk from the user application clock domain, which runs on pld_clk (core_clk_out).
p_clk
The transceiver derives p_clk from the 100 MHz refclk signal that you must provide
to the device. The p_clk frequency is 250 MHz for Gen1 systems and 500 MHz for
Gen2. The PCI Express specification allows a +/- 300 ppm variation on the clock
frequency.
The CDC module implements the asynchronous clock domain crossing between the
PHY/MAC p_clk domain and the data link layer core_clk domain.
core_clk, core_clk_out
The core_clk signal is derived from p_clk. The core_clk_out signal is derived from
core_clk. Table 7–1 outlines the frequency requirements for core_clk and
core_clk_out to meet PCI Express link bandwidth constraints. An asynchronous
FIFO in the adapter decouples the core_clk and pld_clk clock domains.
pld_clk
The application layer and part of the adapter use this clock. Ideally, the pld_clk drives
all user logic within the application layer, including other instances of the PCI Express
IP core and memory interfaces. The pld_clk input clock pin is typically connected to
the core_clk_out output clock pin.
■ clk250_in – This signal is the clock for all of the ×8 IP core registers. All
synchronous application layer interface signals are synchronous to this clock.
clk250_in must be 250 MHz and must be exactly the same frequency as
clk250_out.
Figure 7–8. Clocking for the Generic PIPE Interface and the Simulation Testbench, All Families
In this figure, a 100-MHz clock source drives the refclk/pll_inclk input of a PLL within <variant>.v or .vhd, and the clk125_out, clk250_out, clk500_out, core_clk_out, pclk_in, and rate_ext signals connect the PLL, <variant>_core.v or .vhd (PCIe MegaCore function), and the application clock.
When you implement a generic PIPE PHY in the IP core, you must provide a 125 MHz
clock on the clk125_in input. Typically, the generic PIPE PHY provides the 125 MHz
clock across the PIPE interface.
All of the IP core interfaces, including the user application interface and the PIPE
interface, are synchronous to the clk125_in input. You are not required to use the
refclk and clk125_out signals in this case.
Figure 7–9. Arria GX, Stratix II GX, or Stratix IV GX PHY ×1 and ×4 and Arria II GX ×1, ×4, and ×8
with 100 MHz Reference Clock
In this figure, a 100-MHz clock source drives refclk of <variant>_serdes.v or .vhd (ALTGX or ALT2GX megafunction), whose clk62.5_out or clk125_out output drives the pld_clk input of <variant>_core.v or .vhd (PCIe MegaCore function). A second diagram shows clk250_in driving pld_clk within <variant>.v or .vhd for the ×8 case.
Figure 7–11. Clocking for the Generic PIPE Interface and the Simulation Testbench, All Device
Families
In this figure, a PLL supplies the clock that drives both the application logic and the pld_clk input of <variant>_core.v or .vhd (PCIe MegaCore function).
The system interconnect fabric drives the additional input clock, clk in Figure 7–12, to
the PCI Express IP core. In general, clk is the main clock of the SOPC Builder system
and originates from an external clock source.
Figure 7–12 shows the PCI Express MegaCore function with its ref_clk input, clk125_out output, and Avalon-MM interface connected to the system interconnect fabric.
If you turn on the Use PCIe core clock option for the Avalon clock domain, you must make appropriate clock assignments for all Avalon-MM components. Figure 7–13 illustrates a system that uses a single clock domain.
Figure 7–13. Connectivity for a PCI Express IP core with a Single Clock Domain
Table 7–2 summarizes the differences between the two Avalon clock modes.
This chapter provides detailed information about PCI Express IP core TLP handling. It includes the following sections:
■ Supported Message Types
■ Transaction Layer Routing Rules
■ Receive Buffer Reordering
Vendor-defined Messages
Vendor Defined Type 0 Transmit Receive Transmit Receive Yes No No
Vendor Defined Type 1 Transmit Receive Transmit Receive Yes No No
Hot Plug Messages
Attention_Indicator On Transmit Receive No Yes No
Attention_Indicator Blink Transmit Receive No Yes No
Attention_Indicator Off Transmit Receive No Yes No
Power_Indicator On Transmit Receive No Yes No
Power_Indicator Blink Transmit Receive No Yes No
Power_Indicator Off Transmit Receive No Yes No
As per the recommendations in the PCI Express Base Specification Revision 1.1 or 2.0, the hot plug messages are not transmitted to the application layer in the hard IP implementation. For the soft IP implementation, following the PCI Express Specification 1.0a, these messages are transmitted to the application layer.
■ The transaction layer sends all memory and I/O requests, as well as completions
generated by the application layer and passed to the transmit interface, to the PCI
Express link.
■ The IP core can generate and transmit power management, interrupt, and error
signaling messages automatically under the control of dedicated signals.
Additionally, the IP core can generate MSI requests under the control of the
dedicated signals.
Memory Write or Message Request   Read Request   I/O or Cfg Write Request   Read Completion   I/O or Cfg Write Completion
Spec Core   Spec Core   Spec Core   Spec Core   Spec Core
Memory Write or
Posted
1) N 1) N 1) Y/N 1) N 1) Y/N 1) No
Message yes yes yes yes
Request 2)Y/N 2) N 2) Y 2) N 2) Y 2) No
I/O or
Configuration No No Y/N 3) Yes Y/N 4) Yes Y/N No Y/N No
Write Request
2) Y/N 2) No 2) No 2) No
I/O or
Configuration
Y/N No Yes Yes Yes Yes Y/N No Y/N No
Write
Completion
Notes to Table 8–2:
(1) CfgRd0 can pass IORd or MRd.
(2) CfgWr0 can pass IORd or MRd.
(3) CfgRd0 can pass IORd or MRd.
(4) CfgWr0 can pass IOWr.
(5) A Memory Write or Message Request with the Relaxed Ordering Attribute bit clear (b’0) must not pass any other Memory Write or Message
Request.
(6) A Memory Write or Message Request with the Relaxed Ordering Attribute bit set (b’1) is permitted to pass any other Memory Write or Message
Request.
(7) Endpoints, Switches, and Root Complex may allow Memory Write and Message Requests to pass Completions or be blocked by Completions.
(8) Memory Write and Message Requests can pass Completions traveling in the PCI Express to PCI directions to avoid deadlock.
(9) If the Relaxed Ordering attribute is not set, then a Read Completion cannot pass a previously enqueued Memory Write or Message Request.
(10) If the Relaxed Ordering attribute is set, then a Read Completion is permitted to pass a previously enqueued Memory Write or Message Request.
(11) Read Completion associated with different Read Requests are allowed to be blocked by or to pass each other.
(12) Read Completions for Request (same Transaction ID) must return in address order.
1 MSI requests are conveyed in exactly the same manner as PCI Express memory write
requests and are indistinguishable from them in terms of flow control, ordering, and
data integrity.
ECRC
ECRC ensures end-to-end data integrity for systems that require high reliability. You
can specify this option on the Capabilities page of the MegaWizard Plug-In Manager.
The ECRC function includes the ability to check and generate ECRC for all PCI
Express IP cores. The hard IP implementation can also forward the TLP with ECRC to
the receive port of the application layer. The hard IP implementation transmits a TLP
with ECRC from the transmit port of the application layer. In ECRC forwarding mode, the ECRC check and generation are performed in the application layer.
You must select Implement advanced error reporting on the Capabilities page using
the parameter editor to enable ECRC forwarding, ECRC checking and ECRC
generation. When the application detects an ECRC error, it should send the
ERR_NONFATAL message TLP to the PCI Express IP core to report the error.
f For more information about error handling, refer to Error Signaling and Logging, Section 6.2 of the PCI Express Base Specification, Rev. 2.0.
Table 9–1 summarizes the RX ECRC functionality for all possible conditions.
1 L0s ASPM can be optionally enabled when using the Arria GX,
Cyclone IV GX, HardCopy IV GX, Stratix II GX, Stratix IV GX, or
Stratix V GX internal PHY. It is supported for other device families to the
extent allowed by the attached external PHY device.
1 L1 ASPM is not supported when using the Arria GX, Cyclone IV GX,
HardCopy IV GX, Stratix II GX, Stratix IV GX, or Stratix V GX internal
PHY. It is supported for other device families to the extent allowed by the
attached external PHY device.
1 In the L2 state, only auxiliary power is available; main power is off. Because the
auxiliary power supply is insufficient to run an FPGA, Altera FPGAs provide
pseudo-support for this state. The pm_auxpwr signal, which indicates that auxiliary
power has been detected, can be hard-wired high.
An endpoint can exit the L0s or L1 state by asserting the pm_pme signal. Doing so initiates a power_management_event message which is sent to the root complex. If the IP core is in the L0s or L1 state, the link exits the low-power state to send this message. The pm_pme signal is edge-sensitive. If the link is in the L2 state, a Beacon (or Wake#) is generated to reinitialize the link before the core can generate the power_management_event message. Wake# is hardwired to 0 for root ports.
How quickly a component powers up from a low-power state, and even whether a
component has the right to transition to a low power state in the first place, depends
on L1 Exit Latency, recorded in the Link Capabilities register, and Endpoint L0s
acceptable latency, recorded in the Device Capabilities register.
Exit Latency
A component’s exit latency is defined as the time it takes for the component to wake
from a low-power state to L0, and depends on the SERDES PLL synchronization time
and the common clock configuration programmed by software. A SERDES generally
has one transmit PLL for all lanes and one receive PLL per lane.
■ Transmit PLL—When transmitting, the transmit PLL must be locked.
■ Receive PLL—Receive PLLs train on the reference clock. When a lane exits electrical
idle, each receive PLL synchronizes on the receive data (clock data recovery
operation). If receive data has been generated on the reference clock of the slot,
and if each receive PLL trains on the same reference clock, the synchronization
time of the receive PLL is lower than if the reference clock is not the same for all
slots.
Each component must report in the configuration space whether it uses the slot’s reference
clock. Software then programs the common clock register, depending on the reference
clock of each component. Software also retrains the link after changing the common
clock register value to update each exit latency. Table 9–3 describes the L0s and L1 exit
latency. Each component maintains two values for L0s and L1 exit latencies; one for
the common clock configuration and the other for the separate clock configuration.
Acceptable Latency
The acceptable latency is defined as the maximum latency permitted for a component
to transition from a low power state to L0 without compromising system
performance. Acceptable latency values depend on a component’s internal buffering
and are maintained in a configuration space register. Software compares the link exit
latency with the endpoint’s acceptable latency to determine whether the component is
permitted to use a particular power state.
■ For L0s, the connected component and the exit latency of each component
between the root port and endpoint is compared with the endpoint’s acceptable
latency. For example, for an endpoint connected to a root port, if the root port’s L0s
exit latency is 1 µs and the endpoint’s L0s acceptable latency is 512 ns, software
will probably not enable the entry to L0s for the endpoint.
■ For L1, software calculates the L1 exit latency of each link between the endpoint
and the root port, and compares the maximum value with the endpoint’s
acceptable latency. For example, for an endpoint connected to a root port, if the
root port’s L1 exit latency is 1.5 µs and the endpoint’s L1 exit latency is 4 µs, and
the endpoint acceptable latency is 2 µs, the exact L1 exit latency of the link is 4 µs
and software will probably not enable the entry to L1.
Some time adjustment may be necessary if one or more switches are located between
the endpoint and the root port.
1 To maximize performance, Altera recommends that you set L0s and L1 acceptable
latency values to their minimum values.
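The following sketch restates, in C, the comparison that configuration software performs in these two examples. The function names and the assumption that software works in nanoseconds are illustrative only.

#include <stdbool.h>
#include <stdint.h>

/* Illustration of the comparison configuration software performs, using the
 * example figures from the text (latencies in nanoseconds). */
static bool allow_l0s(uint32_t port_l0s_exit_ns, uint32_t ep_l0s_acceptable_ns)
{
    /* Root port L0s exit latency of 1 us against an endpoint acceptable
     * latency of 512 ns: 1000 > 512, so L0s entry is not enabled. */
    return port_l0s_exit_ns <= ep_l0s_acceptable_ns;
}

static bool allow_l1(uint32_t rp_l1_exit_ns, uint32_t ep_l1_exit_ns,
                     uint32_t ep_l1_acceptable_ns)
{
    /* The link exit latency is the larger of the two exit latencies:
     * max(1500, 4000) = 4000 ns against a 2000 ns acceptable latency,
     * so L1 entry is not enabled. */
    uint32_t link_exit = (rp_l1_exit_ns > ep_l1_exit_ns) ? rp_l1_exit_ns
                                                         : ep_l1_exit_ns;
    return link_exit <= ep_l1_acceptable_ns;
}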
Slot Size 8 4 2 1 8 4 2 1 8 4 2 1
Lane assignments 7:0,6:1,5:2,4:3,3:4,2:5,1:6,0:7   3:4,2:5,1:6,0:7   1:6,0:7   0:7   7:0,6:1,5:2,4:3   3:0,2:1,1:2,0:3   3:0,2:1   3:0   7:0   3:0   1:0   0:0
Figure 9–1 illustrates a PCI Express card with two ×4 IP cores, a root port and an endpoint, on the top side of the PCB. Connecting the lanes without lane reversal creates routing problems. Using lane reversal solves the problem.
1 After you compile the design once, you can run the pcie_constraints.tcl command with the -no_compile option to suppress analysis and synthesis, and decrease turnaround time during development.
1 In the MegaWizard Plug-In Manager flow, the script contains virtual pins for most
I/O ports on the PCI Express IP core to ensure that the I/O pin count for a device is
not exceeded. These virtual pin assignments must reflect the names used to connect to
each PCI Express instantiation.
f Refer to section 6.1 of PCI Express 2.0 Base Specification for a general description of PCI
Express interrupt support for endpoints.
MSI Interrupts
MSI interrupts are signaled on the PCI Express link using single dword memory write TLPs generated internally by the PCI Express IP core. The app_msi_req input port controls MSI interrupt generation. When the input port asserts app_msi_req, it causes an MSI posted write TLP to be generated based on the MSI configuration register values and the app_msi_tc and app_msi_num input ports.
Figure 10–1 illustrates the architecture of the MSI handler block.
The MSI handler block connects the app_msi_req, app_msi_ack, app_msi_tc, app_msi_num, pex_msi_num, and app_int_sts ports and the cfg_msicsr[15:0] bus.
Figure 10–2 illustrates a possible implementation of the MSI handler block with a per
vector enable bit. A global application interrupt enable can also be implemented
instead of this per vector MSI.
In this implementation, each vector's request and status signals (app_msi_req0/app_int_sts0, app_msi_req1/app_int_sts1, and so on) are gated by a per vector enable bit (app_int_en0, app_int_en1) and by msi_enable and a master enable, and an MSI arbitration block drives app_msi_req and receives app_msi_ack. Figure 10–3 shows a root complex whose interrupt block and interrupt register allocate 2 of the 8 requested MSI messages.
Figure 10–4 illustrates the interactions among MSI interrupt signals for the root port
in Figure 10–3. The minimum latency possible between app_msi_req and app_msi_ack
is one clock cycle.
In the waveform, app_msi_req is asserted while app_msi_tc[2:0] and app_msi_num[4:0] carry valid values, and the core responds with app_msi_ack.
MSI-X
You can enable MSI-X interrupts by turning on Implement MSI-X on the Capabilities
page using the parameter editor. If you turn on the Implement MSI-X option, you
should implement the MSI-X table structures at the memory space pointed to by the
BARs as part of your application.
MSI-X TLPs are generated by the application and sent through the transmit interface. They are single dword memory writes, so the Last DW Byte Enable field in the TLP header must be set to 4'b0000. MSI-X TLPs should be sent only when enabled by the MSI-X
enable and the function mask bits in the message control for MSI-X configuration
register. In the hard IP implementation, these bits are available on the tl_cfg_ctl
output bus.
f For more information about implementing the MSI-X capability structure, refer to Section 6.8.2 of the PCI Local Bus Specification, Revision 3.0.
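A minimal sketch of the MSI-X table entry layout that the application would implement in the BAR-mapped memory is shown below. The field layout follows the standard MSI-X table format in the PCI Local Bus Specification, Revision 3.0; the struct name is arbitrary.

#include <stdint.h>

/* Standard MSI-X table entry layout (four dwords per vector) that the
 * application implements in the memory pointed to by one of its BARs. */
typedef struct {
    uint32_t msg_addr_lo;    /* message address [31:2], dword aligned */
    uint32_t msg_addr_hi;    /* message address [63:32]               */
    uint32_t msg_data;       /* data sent in the single-dword MWr TLP */
    uint32_t vector_control; /* bit 0 = per-vector mask               */
} msix_entry_t;

/* To signal vector n, the application sends a single-dword memory write
 * through the transmit interface to {msg_addr_hi, msg_addr_lo} with
 * msg_data as the payload, provided the MSI-X enable bit is set and both
 * the function mask and the per-vector mask bit are clear. */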
Legacy Interrupts
Legacy interrupts are signaled on the PCI Express link using message TLPs that are
generated internally by the PCI Express IP core. The app_int_sts input port controls
interrupt generation. When the input port asserts app_int_sts, it causes an
Assert_INTA message TLP to be generated and sent upstream. Deassertion of the
app_int_sts input port causes a Deassert_INTA message TLP to be generated and
sent upstream. Refer to Figure 10–5 and Figure 10–6.
Figure 10–5 illustrates interrupt timing for the legacy interface. In this figure the
assertion of app_int_ack indicates that the Assert_INTA message TLP has been sent.
clk
app_int_sts
app_int_ack
Figure 10–6 illustrates the timing for deassertion of legacy interrupts. The assertion of
app_int_ack indicates that the Deassert_INTA message TLP has been sent.
clk
app_int_sts
app_int_ack
Table 10–1 describes three example implementations: one in which all 32 MSI messages are allocated and two in which only 4 are allocated.
MSI interrupts generated for hot plug, power management events, and system errors always use TC0. MSI interrupts generated by the application layer can use any traffic class. For example, a DMA that generates an MSI at the end of a transmission can use the same traffic class as was used to transfer data.
Throughput analysis requires that you understand the Flow Control Loop, shown in
“Flow Control Update Loop” on page 11–2. This section discusses the Flow Control
Loop and strategies to improve throughput. It covers the following topics:
■ Throughput of Posted Writes
■ Throughput of Non-Posted Reads
Each receiver also maintains a credit allocated counter which is initialized to the
total available space in the RX buffer (for the specific Flow Control class) and then
incremented as packets are pulled out of the RX buffer by the application layer. The
value of this register is sent as the FC Update DLLP value.
Figure 11–1, the Flow Control Update Loop, shows the flow control gating logic (credit check) with its credit limit and credits consumed counters on the transmit side, and the RX buffer, credit allocated counter, and FC Update DLLP generate and decode blocks on the receive side, spanning the application, transaction, data link, and physical layers on both sides of the PCI Express link.
The following numbered steps describe each step in the Flow Control Update loop.
The corresponding numbers on Figure 11–1 show the general area to which they
correspond.
1. When the application layer has a packet to transmit, the number of credits
required is calculated. If the current value of the credit limit minus credits
consumed is greater than or equal to the required credits, then the packet can be
transmitted immediately. However, if the credit limit minus credits consumed is
less than the required credits, then the packet must be held until the credit limit is
increased to a sufficient value by an FC Update DLLP. This check is performed
separately for the header and data credits; a single packet consumes only a single
header credit.
2. After the packet is selected for transmission the credits consumed register is
incremented by the number of credits consumed by this packet. This increment
happens for both the header and data credit consumed registers.
3. The packet is received at the other end of the link and placed in the RX buffer.
4. At some point the packet is read out of the RX buffer by the application layer. After
the entire packet is read out of the RX buffer, the credit allocated register can be
incremented by the number of credits the packet has used. There are separate
credit allocated registers for the header and data credits.
5. The value in the credit allocated register is used to create an FC Update DLLP.
6. After an FC Update DLLP is created, it arbitrates for access to the PCI Express link.
The FC Update DLLPs are typically scheduled with a low priority; consequently, a
continuous stream of application layer TLPs or other DLLPs (such as ACKs) can
delay the FC Update DLLP for a long time. To prevent starving the attached
transmitter, FC Update DLLPs are raised to a high priority under the following
three circumstances:
a. When the last sent credit allocated counter minus the amount of received
data is less than MAX_PAYLOAD and the current credit allocated counter is
greater than the last sent credit counter. Essentially, this means the data sink
knows the data source has less than a full MAX_PAYLOAD worth of credits,
and therefore is starving.
b. When an internal timer expires from the time the last FC Update DLLP was
sent, which is configured to 30 µs to meet the PCI Express Base Specification for
resending FC Update DLLPs.
c. When the credit allocated counter minus the last sent credit allocated
counter is greater than or equal to 25% of the total credits available in the RX
buffer, then the FC Update DLLP request is raised to high priority.
After winning arbitration, the FC Update DLLP is transmitted as the next item. In the worst case, the FC Update DLLP may need to wait for a maximum sized TLP that is currently being transmitted to complete before it can be sent.
7. The FC Update DLLP is received back at the original write requester and the
credit limit value is updated. If packets are stalled waiting for credits, they can
now be transmitted.
To allow the write requester to transmit packets continuously, the credit allocated
and the credit limit counters must be initialized with sufficient credits to allow
multiple TLPs to be transmitted while waiting for the FC Update DLLP that
corresponds to the freeing of credits from the very first TLP transmitted.
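The following C sketch models the credit check described in step 1 for a single flow control class. The counter widths and the 16-byte data credit unit come from the PCI Express Base Specification; the structure itself is illustrative rather than the core's implementation.

#include <stdbool.h>
#include <stdint.h>

/* Sketch of the credit check in step 1, for one flow control class.
 * Counters wrap, so the comparison uses modulo arithmetic on the counter
 * field width, as the data link layer does in hardware. */
typedef struct {
    uint16_t credit_limit;      /* last limit received in an FC Update DLLP */
    uint16_t credits_consumed;  /* incremented in step 2 when a TLP is sent */
    uint16_t field_mask;        /* 0xFF for header credits, 0xFFF for data  */
} fc_counter_t;

static bool credits_available(const fc_counter_t *fc, uint16_t required)
{
    uint16_t remaining = (uint16_t)(fc->credit_limit - fc->credits_consumed)
                         & fc->field_mask;
    return remaining >= required;
}

/* A TLP may be transmitted only if both checks pass; a packet always
 * consumes exactly one header credit, plus one data credit per 16 bytes
 * of payload (rounded up). */
static bool can_transmit(const fc_counter_t *hdr, const fc_counter_t *data,
                         uint32_t payload_bytes)
{
    uint16_t data_credits = (uint16_t)((payload_bytes + 15) / 16);
    return credits_available(hdr, 1) &&
           credits_available(data, data_credits);
}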
Table 11–1 shows the delay components for the FC Update Loop when the PCI
Express IP core is implemented in a Stratix II GX device. The delay components are
independent of the packet length. The total delays in the loop increase with packet
length.
Table 11–1. FC Update Loop Delay in Nanoseconds, Components For Stratix II GX (Note 1), (Note 2)
Delay Path (×8 Min/Max, ×4 Min/Max, ×1 Min/Max)
From decrement of transmit credit consumed counter to PCI Express Link: 60/68, 104/120, 272/288
From PCI Express Link until packet is available at Application Layer interface: 124/168, 200/248, 488/536
From Application Layer draining packet to generation and transmission of Flow Control (FC) Update DLLP on PCI Express Link (assuming no arbitration delay): 60/68, 120/136, 216/232
From receipt of FC Update DLLP on the PCI Express Link to updating of transmitter's Credit Limit register: 116/160, 184/232, 424/472
Notes to Table 11–1:
(1) The numbers for other Gen1 PHYs are similar.
(2) Gen2 numbers are to be determined.
Based on the above FC Update Loop delays and additional arbitration and packet
length delays, Table 11–2 shows the number of flow control credits that must be
advertised to cover the delay. The RX buffer size must support this number of credits
to maintain full bandwidth.
These numbers take into account the device delays at both ends of the PCI Express
link. Different devices at the other end of the link could have smaller or larger delays,
which affects the minimum number of credits required. In addition, if the application
layer cannot drain received packets immediately in all cases, it may be necessary to
offer additional credits to cover this delay.
Setting the Desired performance for received requests to High on the Buffer Setup
page on the Parameter Settings tab using the parameter editor configures the RX
buffer with enough space to meet the above required credits. You can adjust the
Desired performance for received requests up or down from the High setting to tailor
the RX buffer size to your delays and required performance.
Table 11–3. Completion Data Space (in Credit units) to Cover Read Round Trip Delay
Max Packet Size   ×8 Function (Typical)   ×4 Function (Typical)   ×1 Function (Typical)
128 120 96 56
256 144 112 80
512 192 160 128
1024 256 256 192
2048 384 384 384
4096 768 768 768
1 Note also that the completions can be broken up into multiple completions of smaller
packet size.
With multiple completions, the number of available credits for completion headers
must be larger than the completion data space divided by the maximum packet size.
Instead, the credit space for headers must be the completion data space (in bytes)
divided by 64, because this is the smallest possible read completion boundary. Setting
the Desired performance for received completions to High on the Buffer Setup page
when specifying parameter settings in your IP core configures the RX buffer with
enough space to meet the above requirements. You can adjust the Desired
performance for received completions up or down from the High setting to tailor the
RX buffer size to your delays and required performance.
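The arithmetic above can be summarized with a short sketch. The 16-byte data credit unit is the standard PCI Express credit granularity, and the 64-byte figure is the smallest read completion boundary mentioned above.

#include <stdint.h>

/* Completion buffer sizing arithmetic: one completion data credit covers
 * 16 bytes, and the header credit count must assume the 64-byte minimum
 * read completion boundary rather than the maximum packet size. */
typedef struct {
    uint32_t data_credits;    /* completion data space, in credit units */
    uint32_t header_credits;  /* completion header space                */
} cpl_credits_t;

static cpl_credits_t completion_credits(uint32_t completion_data_bytes)
{
    cpl_credits_t c;
    c.data_credits   = completion_data_bytes / 16;  /* 16 bytes per data credit */
    c.header_credits = completion_data_bytes / 64;  /* 64-byte RCB worst case   */
    return c;
}
/* Example: 4 KB of read-completion data corresponds to 256 data credits
 * and, in the worst case, 64 completion headers. */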
You can also control the maximum amount of outstanding read request data. This
amount is limited by the number of header tag values that can be issued by the
application and by the maximum read request size that can be issued. The number of
header tag values that can be in use is also limited by the PCI Express IP core. For the
×8 function, you can specify 32 tags. For the ×1 and ×4 functions, you can specify up
to 256 tags, though configuration software can restrict the application to use only 32
tags. In commercial PC systems, 32 tags are typically sufficient to maintain optimal
read throughput.
Each PCI Express compliant device must implement a basic level of error
management and can optionally implement advanced error management. The Altera
PCI Express IP core implements both basic and advanced error reporting. Given its
position and role within the fabric, error handling for a root port is more complex than
that of an endpoint.
The PCI Express specification defines three types of errors, outlined in Table 12–1.
The following sections describe the errors detected by the three layers of the PCI
Express protocol and describes error logging. It includes the following sections:
■ Physical Layer Errors
■ Data Link Layer Errors
■ Transaction Layer Errors
■ Error Reporting and Data Poisoning
Poisoned TLP received   Uncorrectable (non-fatal)   The received TLP is passed to the application and the application layer logic must take appropriate action in response to the poisoned TLP. In PCI Express 1.1, this error is treated as an advisory error. Refer to “2.7.2.2 Rules for Use of Data Poisoning” in the PCI Express Base Specification 2.0 for more information about poisoned TLPs.
ECRC check failed (1)   Uncorrectable (non-fatal)   This error is caused by an ECRC check failing despite the fact that the transaction layer packet is not malformed and the LCRC check is valid. The IP core handles this transaction layer packet automatically. If the TLP is a non-posted request, the IP core generates a completion with completer abort status. In all cases the TLP is deleted in the IP core and not presented to the application layer.
Unsupported request for endpoints   Uncorrectable (non-fatal)   This error occurs whenever a component receives any of the following unsupported requests:
■ Type 0 configuration requests for a non-existing function.
■ Completion transaction for which the requester ID does not match the bus/device.
■ Unsupported message.
■ A type 1 configuration request transaction layer packet for the TLP from the PCIe link.
■ A locked memory read (MEMRDLK) on native endpoint.
■ A locked completion transaction.
■ A 64-bit memory transaction in which the 32 MSBs of an address are set to 0.
■ A memory or I/O transaction for which there is no BAR match.
■ A poisoned configuration write request (CfgWr0).
If the TLP is a non-posted request, the IP core generates a completion with unsupported request status. In all cases the TLP is deleted in the IP core and not presented to the application layer.
Unsupported requests for root port   Uncorrectable (fatal)   This error occurs whenever a component receives an unsupported request including:
■ Unsupported message
■ A type 0 configuration request TLP
■ A 64-bit memory transaction in which the 32 MSBs of an address are set to 0.
■ A memory transaction that does not match an address window.
Unexpected completion   Uncorrectable (non-fatal)   This error is detected when:
■ The completion packet for a request that was to I/O or configuration space has a length greater than 1 dword.
■ The completion status is Configuration Retry Status (CRS) in response to a request that was not to configuration space.
In all of the above cases, the TLP is not presented to the application layer; the IP core deletes it. Other unexpected completion conditions can be detected by the application layer and reported through the use of the cpl_err[2] signal. For example, the application layer can report cases where the total length of the received successful completions does not match the original read request length.
Receiver overflow (1)   Uncorrectable (fatal)   This error occurs when a component receives a transaction layer packet that violates the FC credits allocated for this type of transaction layer packet. In all cases the IP core deletes the TLP and it is not presented to the application layer.
Flow control protocol error (FCPE) (1)   Uncorrectable (fatal)   A receiver must never cumulatively issue more than 2047 outstanding unused data credits or 127 header credits to the transmitter. If infinite credits are advertised for a particular TLP type (posted, non-posted, completions) during initialization, update FC DLLPs must continue to transmit infinite credits for that TLP type.
f Refer to the PCI Express Base Specification 1.0a, 1.1 or 2.0 for a description of the device
signaling and logging for an endpoint.
The IP core implements data poisoning, a mechanism for indicating that the data
associated with a transaction is corrupted. Poisoned transaction layer packets have
the error/poisoned bit of the header set to 1 and observe the following rules:
■ Received poisoned transaction layer packets are sent to the application layer and
status bits are automatically updated in the configuration space. In PCI Express
1.1, this is treated as an advisory error.
■ Received poisoned configuration write transaction layer packets are not written in
the configuration space.
■ The configuration space never generates a poisoned transaction layer packet; the
error/poisoned bit of the header is always set to 0.
Poisoned transaction layer packets can also set the parity error bits in the PCI
configuration space status register. Table 12–5 lists the conditions that cause parity
errors.
Poisoned packets received by the IP core are passed to the application layer. Poisoned
transmit transaction layer packets are similarly sent to the link.
This chapter describes features of the PCI Express IP core that you can use to
reconfigure the core after power-up. It includes the following sections:
■ Dynamic Reconfiguration
■ Transceiver Offset Cancellation
Dynamic Reconfiguration
The PCI Express IP core reconfiguration block allows you to dynamically change the
value of configuration registers that are read-only at run time. The PCI Express
reconfiguration block is only available in the hard IP implementation for the
Arria II GX, Cyclone IV GX, HardCopy IV GX and Stratix IV GX devices. Access to
the PCI Express reconfiguration block is available when you select Enable for the
PCIe Reconfig option on the System Settings page using the parameter editor. You
access this block using its Avalon-MM slave interface. For a complete description of
the signals in this interface, refer to “PCI Express Reconfiguration Block Signals—
Hard IP Implementation” on page 5–41.
The PCI Express reconfiguration block provides access to read-only configuration
registers, including configuration space, link configuration, MSI and MSI-X
capabilities, power management, and advanced error reporting.
The procedure to dynamically reprogram these registers includes the following three
steps:
1. Bring down the PCI Express link by asserting the pcie_reconfig_rstn reset signal,
if the link is already up. (Reconfiguration can occur before the link has been
established.)
2. Reprogram configuration registers using the Avalon-MM slave PCIe Reconfig
interface.
3. Release the npor reset signal.
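A hedged sketch of this sequence is shown below. The register offset, the Avalon-MM pointer, and the reset-control helpers are hypothetical, since the actual reconfigurable register offsets are listed in Table 13–1 and the reset signals are driven by your own logic.

#include <stdint.h>

/* Hypothetical handles; the real offsets of the reconfigurable registers
 * are listed in Table 13-1, and the reset signals are driven by user logic. */
static volatile uint32_t *pcie_reconfig;          /* PCIe Reconfig Avalon-MM slave */
extern void assert_pcie_reconfig_rstn(int level); /* user-supplied reset control   */
extern void release_npor(void);                   /* user-supplied reset control   */

void reprogram_read_only_register(uint32_t reg_offset, uint32_t value)
{
    /* 1. Bring down the link (if it is already up). */
    assert_pcie_reconfig_rstn(0);

    /* 2. Reprogram the register through the PCIe Reconfig slave interface. */
    pcie_reconfig[reg_offset / 4] = value;

    /* 3. Release npor so the core comes back up with the new values. */
    release_npor();
}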
1 You can use the LMI interface to change the values of configuration registers that are
read/write at run time. For more information about the LMI interface, refer to “LMI
Signals—Hard IP Implementation” on page 5–40.
Table 13–1 lists all of the registers that you can update using the PCI Express
reconfiguration block interface.
f Refer to the appropriate device handbook to determine the frequency range for your
device as follows: Transceiver Architecture in Volume II of the Arria II Device Handbook,
Transceivers in Volume 2 of the Cyclone IV Device Handbook, Transceiver Architecture in
Volume 2 of the Stratix IV Device Handbook, or Altera PHY IP User Guide for Stratix V
devices.
1 The <variant>_plus hard IP PCI Express endpoint automatically includes the circuitry for offset cancellation; you do not have to add this circuitry manually.
The chaining DMA design example instantiates the offset cancellation circuitry in the
file <variation name_example_pipen1b>.<v or .vhd>. Figure 13–1 shows the
connections between the ALTGX_RECONFIG instance and the ALTGX instance. The
names of the Verilog HDL files in this figure match the names in the chaining DMA
design example described in Chapter 15, Testbench and Design Example.
Figure 13–1 shows the ALTGX_RECONFIG megafunction instance (altpcie_reconfig_4sgx.v or .vhd) connected to the ALTGX or ALT2GX megafunction instance (<variant>_serdes.v or .vhd) inside <variant>.v or .vhd through the busy, reconfig_fromgxb[16:0], reconfig_togxb[3:0], and reconfig_clk signals, with the reconfig and fixed clock sources, the cal_blk_clk, fixedclk, and tx_clk_out signals, and the pld_clk input of <variant>_core.v or .vhd (PCIe MegaCore function).
When an external PHY is selected, additional logic required to connect directly to the
external PHY is included in the <variation name> module or entity.
The user logic must instantiate this module or entity in the design. The
implementation details for each of these modes are discussed in the following
sections.
1 The refclk is the same as pclk, the parallel clock provided by the external PHY. This
document uses the terms refclk and pclk interchangeably.
■ clk125_out is a 125 MHz output that has the same phase-offset as refclk. The clk125_out signal must drive the clk125_in input in the user logic, as shown in Figure 14–1. The clk125_in signal in the user logic is used to capture the incoming receive data and also drives the clk125_in input of the IP core.
■ clk125_early is a 125 MHz output that is phase shifted. This phase-shifted output
clocks the output registers of the transmit data. Based on your board delays, you
may need to adjust the phase-shift of this output. To alter the phase shift, copy the
PLL source file referenced in your variation file from the <path>/ip/PCI Express
Compiler/lib directory, where <path> is the directory in which you installed the
PCI Express Compiler, to your project directory. Then use the MegaWizard Plug-In Manager in the Quartus II software to edit the PLL source file to set the required phase shift. Then add the modified PLL source file to your Quartus II project.
■ tlp_clk62p5 is a 62.5 MHz output that drives the tlp_clk input of the IP core
when the MegaCore internal clock frequency is 62.5 MHz.
Figure 14–1. 16-bit SDR Mode - 125 MHz without Transmit Clock
[The figure shows the PCI Express MegaCore function connected to user logic registers that capture rxdata and launch txdata on clk125_in. A Mode 1 PLL driven by refclk (pclk) generates clk125_out, clk125_early, and tlp_clk_62p5; clk125_out connects externally in the user logic to clk125_in, and tlp_clk_62p5 drives tlp_clk.]
Figure 14–2. 16-bit SDR Mode with a 125 MHz Source Synchronous Transmit Clock
[The figure shows the 16-bit SDR datapath of Figure 14–1 with an added DDIO output register that forwards txclk (~refclk) as a 125 MHz source synchronous transmit clock; refclk (pclk) again generates clk125_out, which connects externally in the user logic to clk125_in and tlp_clk.]
To alter the phase shift, copy the PLL source file referenced in your variation file from the <path>/ip/PCI Express Compiler/lib directory, where <path> is the directory in which you installed the PCI Express Compiler, to your project directory. Then use the MegaWizard Plug-In Manager to edit the PLL source file to set the required phase shift. Then add the modified PLL source file to your Quartus II project.
■ An optional 62.5 MHz TLP Slow clock is provided for ×1 implementations.
An edge detect circuit detects the relationship between the 125 MHz clock and the
250 MHz rising edge to properly sequence the 16-bit data into the 8-bit output
register.
[Figure: an edge detect and sync circuit driven by clk125_in splits the 16-bit txdata into the txdata_h and txdata_l output registers; clk125_out connects externally in the user logic to clk125_in.]
An edge detect circuit detects the relationship between the 125 MHz clock and the
250 MHz rising edge to properly sequence the 16-bit data into the 8-bit output
register.
Figure 14–4. 8-bit DDR Mode with a Source Synchronous Transmit Clock
[The figure shows the edge detect and sync circuit splitting txdata into txdata_h and txdata_l, a register stage that forwards txclk as the source synchronous transmit clock, and the external connection in the user logic from clk125_out to clk125_in.]
[Figure: on the receive side, edge detect and sync logic clocked by refclk (pclk) at 250 MHz assembles rxdata_h and rxdata_l into the 16-bit rxdata captured on clk125_in; a Mode 4 PLL generates clk250_early and clk125_out, tlp_clk is also shown, and the transmit side splits txdata into txdata_h and txdata_l, with the external connection in the user logic from clk125_out and refclk.]
An edge detect circuit detects the relationship between the 125 MHz clock and the
250 MHz rising edge to properly sequence the 16-bit data into the 8-bit output
register.
Figure 14–6. 8-bit SDR Mode with 250 MHz Source Synchronous Transmit Clock
[The figure shows the 250 MHz receive path into the PCI Express MegaCore function (rxdata_h and rxdata_l assembled into rxdata through edge detect and sync logic clocked by refclk (pclk) at 250 MHz and captured on clk125_in), a Mode 4 PLL generating clk250_early, clk125_zero, clk125_out, and tlp_clk, and a transmit path that splits txdata into txdata_h and txdata_l and forwards txclk (~refclk) on clk250_early.]
The TI XIO1100 device has some additional control signals that must be driven by your design. These can be statically pulled high or low in the board design, unless your design needs additional flexibility and you want to drive them from the Altera device (a Verilog HDL tie-off sketch follows this list). These signals are as follows:
■ P1_SLEEP must be pulled low. The PCI Express IP core requires the refclk (RX_CLK
from the XIO1100) to remain active while in the P1 powerdown state.
■ DDR_EN must be pulled high if your variation of the PCI Express IP core uses the 8-
bit DDR (w/TXClk) mode. It must be pulled low if the 16-bit SDR (w/TXClk) mode
is used.
■ CLK_SEL must be set correctly based on the reference clock provided to the
XIO1100. Consult the XIO1100 data sheet for specific recommendations.
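The following Verilog HDL module is a minimal sketch of driving these pins from the Altera device, assuming top-level output ports named after the XIO1100 pins; the port names and the DDR_EN and CLK_SEL values are illustrative and must match your own variation and reference clock, per the XIO1100 data sheet.

module xio1100_ctrl_tieoff (
  output P1_SLEEP,  // keep low so RX_CLK (refclk) stays active in the P1 state
  output DDR_EN,    // 1 for 8-bit DDR (w/TXClk) mode, 0 for 16-bit SDR (w/TXClk) mode
  output CLK_SEL    // depends on the reference clock supplied to the XIO1100
);
  assign P1_SLEEP = 1'b0;
  assign DDR_EN   = 1'b1;  // assumption: this variation uses 8-bit DDR (w/TXClk) mode
  assign CLK_SEL  = 1'b0;  // assumption: placeholder; consult the XIO1100 data sheet
endmodule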
1 You may need to modify the timing constraints to take into account the specific
constraints of your external PHY and your board design.
1 To meet timing for the external PHY in the Cyclone III family, you must avoid using
dual-purpose VREF pins.
If you are using an external PHY with a design that does not target a Cyclone II device, you might need to modify the PLL instance that some external PHYs require so that it functions correctly in your target device.
To modify the PLL instance, follow these steps:
1. Copy the PLL source file referenced in your variation file from the <path>/ip/PCI
Express Compiler/lib directory, where <path> is the directory in which you
installed the PCI Express Compiler, to your project directory.
2. Use the MegaWizard Plug-In Manager to edit the PLL to specify the device that the
PLL uses.
3. Add the modified PLL source file to your Quartus II project.
This chapter introduces the root port or endpoint design example including a
testbench, BFM, and a test driver module. When you create a PCI Express function
variation using the MegaWizard Plug-In Manager flow as described in Chapter 2,
Getting Started, the PCI Express compiler generates a design example and testbench
customized to your variation. This design example is not generated when using the
SOPC Builder flow.
When configured as an endpoint variation, the testbench instantiates a design
example and a root port BFM, which provides the following functions:
■ A configuration routine that sets up all the basic configuration registers in the
endpoint. This configuration allows the endpoint application to be the target and
initiator of PCI Express transactions.
■ A VHDL/Verilog HDL procedure interface to initiate PCI Express transactions to
the endpoint.
The testbench uses a test driver module, altpcietb_bfm_driver_chaining, to exercise
the chaining DMA of the design example. The test driver module displays
information from the endpoint configuration space registers, so that you can correlate
to the parameters you specified using the parameter editor.
When configured as a root port, the testbench instantiates a root port design example
and an endpoint model, which provides the following functions:
■ A configuration routine that sets up all the basic configuration registers in the root
port and the endpoint BFM. This configuration allows the endpoint application to
be the target and initiator of PCI Express transactions.
■ A Verilog HDL procedure interface to initiate PCI Express transactions to the
endpoint BFM.
The testbench uses a test driver module, altpcietb_bfm_driver_rp, to exercise the
target memory and DMA channel in the endpoint BFM. The test driver module
displays information from the root port configuration space registers, so that you can
correlate to the parameters you specified using the parameter editor. The endpoint
model consists of an endpoint variation combined with the chaining DMA
application described above.
PCI Express link monitoring and error injection capabilities are limited to those
provided by the IP core’s test_in and test_out signals. The following sections
describe the testbench, the design example, root port and endpoint BFMs in detail.
1 The Altera testbench and root port or endpoint BFM provide a simple method to do
basic testing of the application layer logic that interfaces to the variation. However,
the testbench and root port BFM are not intended to be a substitute for a full
verification environment. To thoroughly test your application, Altera suggests that
you obtain commercially available PCI Express verification IP and tools, or do your
own extensive hardware testing or both.
Your application layer design may need to handle at least the following scenarios that
are not possible to create with the Altera testbench and the root port BFM:
■ It is unable to generate or receive vendor-defined messages. Some systems generate vendor-defined messages, and the application layer must be designed to process them. The IP core passes these messages on to the application layer, which in most cases should ignore them; however, in all cases an application that uses the descriptor/data interface must issue an rx_ack to clear the message from the RX buffer.
■ It can only handle received read requests that are less than or equal to the currently set Maximum payload size option specified on the Buffer Setup page in the parameter editor. Many systems are capable of handling larger read requests that are then returned in multiple completions.
■ It always returns a single completion for every read request. Some systems split
completions on every 64-byte address boundary.
■ It always returns completions in the same order the read requests were issued.
Some systems generate the completions out-of-order.
■ It is unable to generate zero-length read requests that some systems generate as
flush requests following some write transactions. The application layer must be
capable of generating the completions to the zero length read requests.
■ It uses fixed credit allocation.
The chaining DMA design example provided with the IP core handles all of the above
behaviors, even though the provided testbench cannot test them.
1 To run the testbench at the Gen1 data rate, you must have the Stratix II GX device
family installed. To run the testbench at the Gen2 data rate, you must have the
Stratix IV GX device family installed.
Additionally, PCI Express link monitoring and error injection capabilities are limited
to those provided by the IP core’s test_in and test_out signals. The testbench and
root port BFM do not NAK any transactions.
Endpoint Testbench
The testbench is provided in the subdirectory <variation_name>_examples
/chaining_dma/testbench in your project directory. The testbench top level is named
<variation_name>_chaining_testbench.
This testbench simulates up to an ×8 PCI Express link using either the PIPE interfaces
of the root port and endpoints or the serial PCI Express interface. The testbench
design does not allow more than one PCI Express link to be simulated at a time.
Figure 15–1 presents a high level view of the testbench.
[Figure 15–1 shows the testbench, including the chaining DMA design example and the test driver module (altpcietb_bfm_driver_chaining).]
The testbench has several VHDL generics/Verilog HDL parameters that control the
overall operation of the testbench. These generics are described in Table 15–1.
This testbench simulates up to an ×8 PCI Express link using either the PIPE interfaces
of the root port and endpoints or the serial PCI Express interface. The testbench
design does not allow more than one PCI Express link to be simulated at a time. The
top-level of the testbench instantiates four main modules:
■ <variation name>_example_rp_pipen1b—This is the example root port design that
includes your variation of the IP core. For more information about this module,
refer to “Root Port Design Example” on page 15–22.
■ altpcietb_pipe_phy—There are eight instances of this module, one per lane. These
modules connect the PIPE MAC layer interfaces of the root port and the endpoint.
The module mimics the behavior of the PIPE PHY layer to both MAC interfaces.
■ altpcietb_bfm_driver_rp—This module drives transactions to the root port BFM.
This is the module that you modify to vary the transactions sent to the example
endpoint design or your own design. For more information about this module, see
“Test Driver Module” on page 15–18.
The testbench has routines that perform the following tasks:
■ Generates the reference clock for the endpoint at the required frequency.
■ Provides a PCI Express reset at start up.
The testbench has several Verilog HDL parameters that control the overall operation
of the testbench. These parameters are described in Table 15–3.
Table 15–3. Testbench Verilog HDL Parameters for the Root Port Testbench

PIPE_MODE_SIM (allowed values: 0 or 1; default: 1)
Selects the PIPE interface (PIPE_MODE_SIM=1) or the serial interface (PIPE_MODE_SIM=0) for the simulation. The PIPE interface typically simulates much faster than the serial interface. If the variation name file only implements the PIPE interface, then setting PIPE_MODE_SIM to 0 has no effect and the PIPE interface is always used.

NUM_CONNECTED_LANES (allowed values: 1, 2, 4, 8; default: 8)
Controls how many lanes are interconnected by the testbench. Setting this generic value to a lower number simulates the endpoint operating on a narrower PCI Express interface than the maximum. If your variation only implements the ×1 IP core, then this setting has no effect and only one lane is used.

FAST_COUNTERS (allowed values: 0 or 1; default: 1)
Setting this parameter to 1 speeds up simulation by making many of the timing counters in the PCI Express IP core operate faster than specified in the PCI Express specification. This parameter should usually be set to 1, but can be set to 0 if there is a need to simulate the true time-out values.
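As a hedged illustration, the following Verilog HDL wrapper shows one way to override these parameters when instantiating the testbench; pcie_rp_testbench is a placeholder for the generated root port testbench top-level module name in your project, and the values shown are examples only.

module testbench_param_override;
  // Hypothetical wrapper; substitute the generated root port testbench top level.
  pcie_rp_testbench #(
    .PIPE_MODE_SIM       (1),  // simulate through the PIPE interface (faster)
    .NUM_CONNECTED_LANES (4),  // model a narrower, x4 link
    .FAST_COUNTERS       (1)   // shortened timing counters
  ) tb ();
endmodule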
1 The chaining DMA design example requires setting BAR 2 or BAR 3 to a minimum of
256 bytes. To run the DMA tests using MSI, you must set the MSI messages requested
parameter on the Capabilities page to at least 2.
Figure 15–3 shows a block diagram of the design example connected to an external
RC CPU.
[Figure 15–3 shows the chaining DMA endpoint design example, containing the DMA write and DMA read engines, the DMA control/status registers, the RC slave, the endpoint memory, and the read and write descriptor tables, connected through Avalon-MM and Avalon-ST interfaces to the PCI Express MegaCore function variation, which links to a PCI Express root port on the root complex side with its CPU and memory.]
■ The design example exercises the optional ECRC module when targeting the hard
IP implementation using a variation with both Implement advanced error
reporting and ECRC forwarding set to On in the “Capabilities Parameters” on
page 3–7.
■ The design example exercises the optional PCI Express reconfiguration block when targeting the hard IP implementation created using the MegaWizard Plug-In Manager if you selected PCIe Reconfig on the System Settings page. Figure 15–4
illustrates this test environment.
Figure 15–4. Top-Level Chaining DMA Example for Simulation—Hard IP Implementation with PCIE Reconfig Block
[The figure shows the <variant>_plus chaining DMA endpoint (DMA write and DMA read engines, endpoint memory, read and write descriptor tables, control register, RC slave, and a CBB test driver in altpcierd_compliance_test.v driven through test_in[5,32]) connected through Avalon-ST, Avalon-MM, and configuration interfaces to the PCI Express MegaCore function variation (hard IP implementation). A PCIE reconfig driver drives the Avalon-MM reconfiguration interface, and the root complex side provides the PCI Express root port, CPU, memory, reset, and calibration.]
The example endpoint design application layer accomplishes the following objectives:
■ Shows you how to interface to the PCI Express IP core in Avalon-ST mode, or in
descriptor/data mode through the ICM. Refer to Appendix C, Incremental
Compile Module for Descriptor/Data Examples.
■ Provides a chaining DMA channel that initiates memory read and write
transactions on the PCI Express link.
■ If the ECRC forwarding functionality is enabled, provides a CRC Compiler IP core
to check the ECRC dword from the Avalon-ST RX path and to generate the ECRC
for the Avalon-ST TX path.
■ If the PCI Express reconfiguration block functionality is enabled, provides a test
that increments the Vendor ID register to demonstrate this functionality.
You can use the example endpoint design in the testbench simulation and compile a
complete design for an Altera device. All of the modules necessary to implement the
design example with the variation file are contained in one of the following files,
based on the language you use:
<variation name>_examples/chaining_dma/example_chaining.vhd
or
<variation name>_examples/chaining_dma/example_chaining.v
These files are created in the project directory when files are generated.
The following modules are included in the design example and located in the
subdirectory <variation name>_example/chaining_dma:
■ <variation name>_example_pipen1b—This module is the top level of the example
endpoint design that you use for simulation. This module is contained in the
following files produced by the MegaWizard interface:
The following modules are provided in both Verilog HDL and VHDL, and reflect each
hierarchical level:
■ altpcierd_example_app_chaining—This top level module contains the logic
related to the Avalon-ST interfaces as well as the logic related to the sideband
bus. This module is fully register bounded and can be used as an incremental
re-compile partition in the Quartus II compilation flow.
■ altpcierd_cdma_ast_rx, altpcierd_cdma_ast_rx_64,
altpcierd_cdma_ast_rx_128—These modules implement the Avalon-ST receive
port for the chaining DMA. The Avalon-ST receive port converts the Avalon-ST
interface of the IP core to the descriptor/data interface used by the chaining
DMA submodules. altpcierd_cdma_ast_rx is used with the descriptor/data IP
core (through the ICM). altpcierd_cdma_ast_rx_64 is used with the 64-bit
Avalon-ST IP core. altpcierd_cdma_ast_rx_128 is used with the 128-bit Avalon-
ST IP core.
■ altpcierd_cdma_ast_tx, altpcierd_cdma_ast_tx_64,
altpcierd_cdma_ast_tx_128—These modules implement the Avalon-ST
transmit port for the chaining DMA. The Avalon-ST transmit port converts the
descriptor/data interface of the chaining DMA submodules to the Avalon-ST
interface of the IP core. altpcierd_cdma_ast_tx is used with the descriptor/data
IP core (through the ICM). altpcierd_cdma_ast_tx_64 is used with the 64-bit
Avalon-ST IP core. altpcierd_cdma_ast_tx_128 is used with the 128-bit Avalon-
ST IP core.
■ altpcierd_cdma_ast_msi—This module converts MSI requests from the
chaining DMA submodules into Avalon-ST streaming data. This module is
only used with the descriptor/data IP core (through the ICM).
■ alpcierd_cdma_app_icm—This module arbitrates PCI Express packets for the
modules altpcierd_dma_dt (read or write) and altpcierd_rc_slave.
alpcierd_cdma_app_icm instantiates the endpoint memory used for the DMA
read and write transfer.
■ altpcierd_compliance_test.v—This module provides the logic to perform CBB
via a push button.
■ altpcierd_rc_slave—This module provides the completer function for all
downstream accesses. It instantiates the altpcierd_rxtx_downstream_intf and
altpcierd_reg_access modules. Downstream requests include programming of
chaining DMA control registers, reading of DMA status registers, and direct
read and write access to the endpoint target memory, bypassing the DMA.
■ altpcierd_rx_tx_downstream_intf—This module processes all downstream
read and write requests and handles transmission of completions. Requests
addressed to BARs 0, 1, 4, and 5 access the chaining DMA target memory
space. Requests addressed to BARs 2 and 3 access the chaining DMA control
and status register space using the altpcierd_reg_access module.
■ altpcierd_reg_access—This module provides access to all of the chaining DMA
control and status registers (BAR 2 and 3 address space). It provides address
decoding for all requests and multiplexing for completion data. All registers
are 32-bits wide. Control and status registers include the control registers in the
altpcierd_dma_prog_reg module, status registers in the
altpcierd_read_dma_requester and altpcierd_write_dma_requester modules,
■ altpcierd_cdma_tx_ecrc_64.v, altpcierd_cdma_tx_ecrc_64_altcrc.v,
altpcierd_cdma_tx_ecrc_64.vo—These modules contain the CRC32 generation
megafunction used in the altpcierd_ecrc_gen module. The .v files are used for
synthesis. The .vo file is used for simulation.
■ altpcierd_tx_ecrc_data_fifo, altpcierd_tx_ecrc_ctl_fifo,
altpcierd_tx_ecrc_fifo—These are FIFOs that are used in the ECRC generator
modules in altpcierd_cdma_ecrc_gen.
■ altpcierd_pcie_reconfig—This module is instantiated when the PCIE reconfig option on the System Settings page is turned on. It consists of an Avalon-MM master which drives the PCIE reconfig Avalon-MM slave of the device under test. The module performs the following sequence using the Avalon-MM interface prior to any PCI Express configuration sequence (a Verilog HDL sketch of the sequence follows the steps below):
a. Turns on PCIE reconfig mode and resets the reconfiguration circuitry in the
hard IP implementation by writing 0x2 to PCIE reconfig address 0x0 and
asserting the reset signal, npor.
b. Reads the PCIE vendor ID register at PCIE reconfig address 0x89.
c. Increments the vendor ID register by one and writes it back to PCIE reconfig
address 0x89.
d. Removes the hard IP reconfiguration circuitry and SERDES from the reset state
by deasserting npor.
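The following Verilog HDL task is a minimal sketch of that sequence, not the altpcierd_pcie_reconfig source. The addresses 0x0 and 0x89, the 0x2 write value, and the npor handling come from steps a through d; the avmm_write and avmm_read tasks are hypothetical stand-ins for single-word transfers on the PCIE reconfig Avalon-MM slave, npor is assumed to be a reg driven by this logic, and the 16-bit vendor ID width is an assumption.

// Hedged sketch of the vendor ID increment sequence (steps a through d above).
task increment_vendor_id_sketch;
  reg [15:0] vendor_id;
  begin
    npor = 1'b0;                     // assert reset: hold reconfig circuitry and SERDES in reset
    avmm_write('h0, 'h2);            // turn on PCIE reconfig mode (write 0x2 to address 0x0)
    avmm_read ('h89, vendor_id);     // read the PCIE vendor ID register at address 0x89
    avmm_write('h89, vendor_id + 1); // write back the incremented vendor ID
    npor = 1'b1;                     // deassert npor: release reconfig circuitry and SERDES
  end
endtask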
■ altpcierd_cplerr_lmi—This module transfers the err_desc_func0 from the application to the PCI Express hard IP using the LMI interface. It also retimes the cpl_err bits from the application to the hard IP. This module is only used with the hard IP implementation of the IP core.
■ altpcierd_tl_cfg_sample—This module demultiplexes the configuration space
signals from the tl_cfg_ctl bus from the hard IP and synchronizes this
information, along with the tl_cfg_sts bus to the user clock (pld_clk)
domain. This module is only used with the hard IP implementation.
Table 15–6 describes the control fields of the DMA read and DMA write control registers.

Table 15–6. Bit Definitions for the Control Field in the DMA Write Control Register and DMA Read Control Register

Bit 16, Reserved.

Bit 17, MSI_ENA: Enables interrupts of all descriptors. When 1, the endpoint DMA module issues an interrupt using MSI to the RC when each descriptor is completed. Your software application or BFM driver can use this interrupt to monitor the DMA transfer status.

Bit 18, EPLAST_ENA: Enables the endpoint DMA module to write the number of each descriptor back to the EPLAST field in the descriptor table. Table 15–10 describes the descriptor table.

Bits [24:20], MSI Number: When your RC reads the MSI capabilities of the endpoint, these register bits map to the PCI Express back-end MSI signals app_msi_num[4:0]. If there is more than one MSI, the default mapping, if all the MSIs are available, is:
■ MSI 0 = Read
■ MSI 1 = Write

Bits [30:28], MSI Traffic Class: When the RC application software reads the MSI capabilities of the endpoint, this value is assigned by default to MSI traffic class 0. These register bits map to the PCI Express back-end signal app_msi_tc[2:0].

Bit 31, DT RC Last Sync: When 0, the DMA engine stops transfers when the last descriptor has been executed. When 1, the DMA engine loops infinitely, restarting with the first descriptor when the last descriptor is completed. To stop the infinite loop, set this bit to 0.
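As a hedged illustration, the following Verilog HDL function packs the Table 15–6 control field bits into a 32-bit register value; the bits not covered by the table (bit 19, bits [27:25], and bits [15:0]) are simply left at zero here.

// Hedged sketch: assembling the Table 15-6 control field bits.
function [31:0] dma_control_word;
  input        msi_ena;         // bit 17: MSI on each completed descriptor
  input        eplast_ena;      // bit 18: write descriptor number back to EPLAST
  input [4:0]  msi_number;      // bits [24:20]: maps to app_msi_num[4:0]
  input [2:0]  msi_tc;          // bits [30:28]: maps to app_msi_tc[2:0]
  input        dt_rc_last_sync; // bit 31: 1 = loop descriptors until cleared
  begin
    dma_control_word        = 32'h0;
    dma_control_word[17]    = msi_ena;
    dma_control_word[18]    = eplast_ena;
    dma_control_word[24:20] = msi_number;
    dma_control_word[30:28] = msi_tc;
    dma_control_word[31]    = dt_rc_last_sync;
  end
endfunction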
Table 15–7 defines the DMA status registers. These registers are read only.
Table 15–8 describes the fields of the DMA write status register. All of these fields are
read only.
Table 15–9 describes the fields in the DMA read status high register. All of these fields
are read only.
1 Note that the chaining DMA descriptor table should not cross a 4 KByte boundary.
Table 15–11 shows the layout of the descriptor fields following the descriptor header.
Table 15–11. Chaining DMA Descriptor Format Map

Reserved (bits [31:22]) | Control Fields (bits [21:16]; refer to Table 15–12) | DMA Length (bits [15:0])
Endpoint Address
RC Address Upper DWORD
RC Address Lower DWORD
Each descriptor provides the hardware information on one DMA transfer. Table 15–13
describes each descriptor field.
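A hedged Verilog HDL sketch of one descriptor follows, packing the four dwords in the row order shown in Table 15–11 (that ordering, and treating each row as one dword, are assumptions); the 6-bit control field is passed through unmodified because its bits are defined in Table 15–12.

// Hedged sketch: one chaining DMA descriptor laid out per Table 15-11.
task build_descriptor_sketch;
  input  [5:0]  control_fields; // bits [21:16]; see Table 15-12
  input  [15:0] dma_length;     // bits [15:0]
  input  [31:0] ep_address;     // endpoint address
  input  [63:0] rc_address;     // root complex (BFM shared memory) address
  output [31:0] dw0, dw1, dw2, dw3;
  begin
    dw0 = {10'b0, control_fields, dma_length}; // Reserved | Control Fields | DMA Length
    dw1 = ep_address;                          // Endpoint Address
    dw2 = rc_address[63:32];                   // RC Address Upper DWORD
    dw3 = rc_address[31:0];                    // RC Address Lower DWORD
  end
endtask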
2. Sets up the chaining DMA descriptor header and starts the transfer data from the
endpoint memory to the BFM shared memory. The transfer calls the procedure
dma_set_header which writes four dwords, DW0:DW3 (Table 15–17), into the
DMA write register module.
After writing the last dword, DW3, of the descriptor header, the DMA write starts
the three subsequent data transfers.
3. Waits for the DMA write completion by polling the BFM shared memory location 0x80c, where the DMA write engine updates the number of completed descriptors. Calls the procedures rcmem_poll and msi_poll to determine when the DMA write transfers have completed.
2. Sets up the chaining DMA descriptor header and starts the transfer data from the
BFM shared memory to the endpoint memory by calling the procedure
dma_set_header which writes four dwords, DW0:DW3, (Table 15–21) into the
DMA read register module.
After writing the last dword of the Descriptor header (DW3), the DMA read starts
the three subsequent data transfers.
3. Waits for the DMA read completion by polling the BFM shared memory location 0x90c, where the DMA read engine updates the number of completed descriptors. Calls the procedures rcmem_poll and msi_poll to determine when the DMA read transfers have completed.
[Figure: the root port design example, <var>_example_rp_pipen1b.v, contains the root port PCI Express variation (variation_name.v), a configuration bus sampler (altpcietb_tl_cfg_sample.v), and VC0 and VC1 Avalon-ST interfaces (altpcietb_bfm_vcintf_ast).]
You can use the example root port design for Verilog HDL simulation. All of the
modules necessary to implement the example design with the variation file are
contained in <variation_name>_example_rp_pipen1b.v. This file is created in the
<variation_name>_examples/root_port subdirectory of your project when the PCI
Express IP core variant is generated.
The MegaWizard interface creates the variation files in the top-level directory of your
project, including the following files:
■ <variation_name>.v—the top level file of the PCI Express IP core variation. The file
instantiates the SERDES and PIPE interfaces, and the parameterized core,
<variation_name>_core.v.
■ <variation_name>_serdes.v —contains the SERDES.
■ <variation_name>_core.v—used in synthesizing <variation_name>.v.
■ <variation_name>_core.vo—used in simulating <variation_name>.v.
The following modules are generated for the design example in the subdirectory
<variation_name>_examples/root_port:
[Figure 15–6: the root port BFM, built around an IP functional simulation model of the root port interface (altpcietb_bfm_rpvar_64b_x8_pipen1b) with VC0 and VC1 interfaces (altpcietb_bfm_vcintf).]
The functionality of each of the modules included in Figure 15–6 is explained below.
■ BFM shared memory (altpcietb_bfm_shmem VHDL package or Verilog HDL
include file)—The root port BFM is based on the BFM memory that is used for the
following purposes:
■ Storing data received with all completions from the PCI Express link.
■ Storing data received with all write transactions received from the PCI Express
link.
■ Sourcing data for all completions in response to read transactions received
from the PCI Express link.
■ Sourcing data for most write transactions issued to the PCI Express link. The
only exception is certain BFM write procedures that have a four-byte field of
write data passed in the call.
■ Storing a data structure that contains the sizes of and the values programmed
in the BARs of the endpoint.
A set of procedures is provided to read, write, fill, and check the shared memory from
the BFM driver. For details on these procedures, see “BFM Shared Memory Access
Procedures” on page 15–40.
■ BFM Read/Write Request Procedures/Functions (altpcietb_bfm_rdwr VHDL
package or Verilog HDL include file)— This package provides the basic BFM
procedure calls for PCI Express read and write requests. For details on these
procedures, see “BFM Read and Write Procedures” on page 15–34.
The ebfm_cfg_rp_ep executes the following steps to initialize the configuration space:
1. Sets the root port configuration space to enable the root port to send transactions
on the PCI Express link.
2. Sets the root port and endpoint PCI Express capability device control registers as
follows:
a. Disables Error Reporting in both the root port and endpoint. BFM does not
have error handling capability.
b. Enables Relaxed Ordering in both root port and endpoint.
c. Enables Extended Tags for the endpoint, if the endpoint has that capability.
d. Disables Phantom Functions, Aux Power PM, and No Snoop in both the root port
and endpoint.
e. Sets the Max Payload Size to what the endpoint supports because the root port
supports the maximum payload size.
f. Sets the root port Max Read Request Size to 4 KBytes because the example
endpoint design supports breaking the read into as many completions as
necessary.
g. Sets the endpoint Max Read Request Size equal to the Max Payload Size
because the root port does not support breaking the read request into multiple
completions.
3. Assigns values to all the endpoint BAR registers. The BAR addresses are assigned
by the algorithm outlined below.
a. I/O BARs are assigned smallest to largest starting just above the ending
address of BFM shared memory in I/O space and continuing as needed
throughout a full 32-bit I/O space. Refer to Figure 15–9 on page 15–33 for more
information.
b. The 32-bit non-prefetchable memory BARs are assigned smallest to largest,
starting just above the ending address of BFM shared memory in memory
space and continuing as needed throughout a full 32-bit memory space.
c. Assignment of the 32-bit prefetchable and 64-bit prefetchable memory BARS
are based on the value of the addr_map_4GB_limit input to the
ebfm_cfg_rp_ep. The default value of the addr_map_4GB_limit is 0.
If the addr_map_4GB_limit input to the ebfm_cfg_rp_ep is set to 0, then the 32-
bit prefetchable memory BARs are assigned largest to smallest, starting at the
top of 32-bit memory space and continuing as needed down to the ending
address of the last 32-bit non-prefetchable BAR.
However, if the addr_map_4GB_limit input is set to 1, the address map is limited to 4 GBytes, and the 32-bit and 64-bit prefetchable memory BARs are assigned largest to smallest, starting at the top of the 32-bit memory space and continuing as needed down to the ending address of the last 32-bit non-prefetchable BAR.
d. If the addr_map_4GB_limit input to the ebfm_cfg_rp_ep is set to 0, then the 64-
bit prefetchable memory BARs are assigned smallest to largest starting at the 4
GByte address assigning memory ascending above the 4 GByte limit
throughout the full 64-bit memory space. Refer to Figure 15–8 on page 15–32.
If the addr_map_4GB_limit input to the ebfm_cfg_rp_ep is set to 1, then the 32-bit and the 64-bit prefetchable memory BARs are assigned largest to smallest, starting at the 4 GByte address and assigning memory in descending order below the 4 GByte address as needed, down to the ending address of the last 32-bit non-prefetchable BAR. Refer to Figure 15–7 on page 15–31.
The above algorithm cannot always assign values to all BARs when there are a few
very large (1 GByte or greater) 32-bit BARs. Although assigning addresses to all
BARs may be possible, a more complex algorithm would be required to effectively
assign these addresses. However, such a configuration is unlikely to be useful in
real systems. If the procedure is unable to assign the BARs, it displays an error
message and stops the simulation.
4. Based on the above BAR assignments, the root port configuration space address
windows are assigned to encompass the valid BAR address ranges.
5. The endpoint PCI control register is set to enable master transactions, memory
address decoding, and I/O address decoding.
The ebfm_cfg_rp_ep procedure also sets up a bar_table data structure in BFM shared
memory that lists the sizes and assigned addresses of all endpoint BARs. This area of
BFM shared memory is write-protected, which means any user write accesses to this
area cause a fatal simulation error. This data structure is then used by subsequent
BFM procedure calls to generate the full PCI Express addresses for read and write
requests to particular offsets from a BAR. This procedure allows the testbench code
that accesses the endpoint application layer to be written to use offsets from a BAR
and not have to keep track of the specific addresses assigned to the BAR. Table 15–22
shows how those offsets are used.
The configuration routine does not configure any advanced PCI Express capabilities
such as the virtual channel capability or advanced error reporting capability.
[Figures 15–7, 15–8, and 15–9 show the resulting address maps. Each map contains the configuration scratch space at 0x001F FF80 and the BAR table at 0x001F FFC0, both used by BFM routines and not writable by user calls or the endpoint, with the endpoint BAR regions starting at 0x0020 0000. One map shows the endpoint non-prefetchable memory space BARs assigned smallest to largest followed by unused space; another adds, at BAR-size-dependent boundaries, the 32-bit prefetchable memory space BARs assigned smallest to largest and, from 0x0000 0001 0000 0000, the 64-bit prefetchable memory space BARs assigned smallest to largest, with unused regions between them; the third shows the endpoint I/O space BARs assigned smallest to largest, with unused space up to 0xFFFF FFFF.]
1 The last subsection describes procedures that are specific to the chaining DMA design
example.
This section describes both VHDL procedures and functions and Verilog HDL functions and tasks where applicable. Although most VHDL procedures are implemented as Verilog HDL tasks, some VHDL procedures are implemented as Verilog HDL functions rather than Verilog HDL tasks to allow these functions to be called by other Verilog HDL functions. Unless explicitly specified otherwise, all procedures in the following sections are also implemented as Verilog HDL tasks.
1 You can see some underlying Verilog HDL procedures and functions that are called by
other procedures that normally are hidden in the VHDL package. You should not call
these undocumented procedures.
ebfm_barwr Procedure
The ebfm_barwr procedure writes a block of data from BFM shared memory to an
offset from the specified endpoint BAR. The length can be longer than the configured
MAXIMUM_PAYLOAD_SIZE; the procedure breaks the request up into multiple
transactions as needed. This routine returns as soon as the last transaction has been
accepted by the VC interface module.
ebfm_barwr_imm Procedure
The ebfm_barwr_imm procedure writes up to four bytes of data to an offset from the
specified endpoint BAR.
Table 15–24. ebfm_barwr_imm Procedure
Location: altpcietb_bfm_rdwr.v or altpcietb_bfm_rdwr.vhd
Syntax: ebfm_barwr_imm(bar_table, bar_num, pcie_offset, imm_data, byte_len, tclass)
Arguments:
bar_table: Address of the endpoint bar_table structure in BFM shared memory. The bar_table structure stores the address assigned to each BAR so that the driver code does not need to be aware of the actual assigned addresses, only the application-specific offsets from the BAR.
bar_num: Number of the BAR used with pcie_offset to determine the PCI Express address.
pcie_offset: Address offset from the BAR base.
imm_data: Data to be written. In VHDL, this argument is a std_logic_vector(31 downto 0). In Verilog HDL, this argument is reg [31:0]. In both languages, the bits written depend on the length as follows: length 4 writes bits 31 downto 0, length 3 writes bits 23 downto 0, length 2 writes bits 15 downto 0, and length 1 writes bits 7 downto 0.
byte_len: Length of the data to be written in bytes. Maximum length is 4 bytes.
tclass: Traffic class to be used for the PCI Express transaction.
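For example, a driver task might call ebfm_barwr_imm as follows. This minimal Verilog HDL sketch assumes it lives inside the BFM driver where the altpcietb_bfm_rdwr tasks are visible and that bar_table holds the shared memory address of the BAR table built by ebfm_cfg_rp_ep; the BAR number, offset, and data are arbitrary illustrations.

// Hedged sketch: write one dword to offset 0x10 of BAR 2, traffic class 0.
task barwr_imm_example_sketch;
  begin
    ebfm_barwr_imm(bar_table,      // BAR table address in BFM shared memory
                   2,              // BAR number (illustrative)
                   32'h0000_0010,  // offset from the BAR base (illustrative)
                   32'hCAFE_0001,  // immediate write data (illustrative)
                   4,              // write all four bytes
                   0);             // traffic class 0
  end
endtask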
ebfm_barrd_wait Procedure
The ebfm_barrd_wait procedure reads a block of data from the offset of the specified
endpoint BAR and stores it in BFM shared memory. The length can be longer than the
configured maximum read request size; the procedure breaks the request up into
multiple transactions as needed. This procedure waits until all of the completion data
is returned and places it in shared memory.
ebfm_barrd_nowt Procedure
The ebfm_barrd_nowt procedure reads a block of data from the offset of the specified
endpoint BAR and stores the data in BFM shared memory. The length can be longer
than the configured maximum read request size; the procedure breaks the request up
into multiple transactions as needed. This routine returns as soon as the last read
transaction has been accepted by the VC interface module, allowing subsequent reads
to be issued immediately.
ebfm_cfgwr_imm_wait Procedure
The ebfm_cfgwr_imm_wait procedure writes up to four bytes of data to the specified
configuration register. This procedure waits until the write completion has been
returned.
ebfm_cfgwr_imm_nowt Procedure
The ebfm_cfgwr_imm_nowt procedure writes up to four bytes of data to the specified
configuration register. This procedure returns as soon as the VC interface module
accepts the transaction, allowing other writes to be issued in the interim. Use this
procedure only when successful completion status is expected.
ebfm_cfgrd_wait Procedure
The ebfm_cfgrd_wait procedure reads up to four bytes of data from the specified
configuration register and stores the data in BFM shared memory. This procedure
waits until the read completion has been returned.
Table 15–29. ebfm_cfgrd_wait Procedure
Location: altpcietb_bfm_rdwr.v or altpcietb_bfm_rdwr.vhd
Syntax: ebfm_cfgrd_wait(bus_num, dev_num, fnc_num, regb_ad, regb_ln, lcladdr, compl_status)
Arguments:
bus_num: PCI Express bus number of the target device.
dev_num: PCI Express device number of the target device.
fnc_num: Function number in the target device to be accessed.
regb_ad: Byte-specific address of the register to be read.
regb_ln: Length, in bytes, of the data read. Maximum length is four bytes. The regb_ln and the regb_ad arguments cannot cross a DWORD boundary.
lcladdr: BFM shared memory address where the read data should be placed.
compl_status: Completion status for the configuration transaction. In VHDL, this argument is a std_logic_vector(2 downto 0) and is set by the procedure on return. In Verilog HDL, this argument is reg [2:0]. In both languages, this is the completion status as specified in the PCI Express specification: 000 (SC, Successful Completion), 001 (UR, Unsupported Request), 010 (CRS, Configuration Request Retry Status), 100 (CA, Completer Abort).
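A minimal Verilog HDL usage sketch follows; the bus, device, and function numbers, the register offset (0, the Vendor ID/Device ID dword), and the shared memory destination address are illustrative assumptions, and the task is assumed to live inside the BFM driver where ebfm_cfgrd_wait is visible.

// Hedged sketch: read one configuration dword and check the completion status.
task cfgrd_check_sketch;
  reg [2:0] compl_status;
  begin
    ebfm_cfgrd_wait(1, 1, 0,        // bus, device, function (illustrative)
                    0,              // byte address of the register (Vendor/Device ID)
                    4,              // read four bytes
                    'h1000,         // BFM shared memory destination (illustrative)
                    compl_status);
    if (compl_status != 3'b000)     // 000 = SC, Successful Completion
      $display("ERROR: configuration read completion status = %b", compl_status);
  end
endtask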
ebfm_cfgrd_nowt Procedure
The ebfm_cfgrd_nowt procedure reads up to four bytes of data from the specified
configuration register and stores the data in the BFM shared memory. This procedure
returns as soon as the VC interface module has accepted the transaction, allowing
other reads to be issued in the interim. Use this procedure only when successful
completion status is expected and a subsequent read or write with a wait can be used
to guarantee the completion of this operation.
ebfm_cfg_rp_ep Procedure
The ebfm_cfg_rp_ep procedure configures the root port and endpoint configuration space registers for operation. Refer to Table 15–31 for a description of the arguments for this procedure.
ebfm_cfg_decode_bar Procedure
The ebfm_cfg_decode_bar procedure analyzes the information in the BAR table for
the specified BAR and returns details about the BAR attributes.
Table 15–33. Constants: VHDL Subtype NATURAL or Verilog HDL Type INTEGER
SHMEM_FILL_ZEROS: Specifies a data pattern of all zeros.
SHMEM_FILL_BYTE_INC: Specifies a data pattern of incrementing 8-bit bytes (0x00, 0x01, 0x02, etc.).
SHMEM_FILL_WORD_INC: Specifies a data pattern of incrementing 16-bit words (0x0000, 0x0001, 0x0002, etc.).
SHMEM_FILL_DWORD_INC: Specifies a data pattern of incrementing 32-bit dwords (0x00000000, 0x00000001, 0x00000002, etc.).
SHMEM_FILL_QWORD_INC: Specifies a data pattern of incrementing 64-bit qwords (0x0000000000000000, 0x0000000000000001, 0x0000000000000002, etc.).
SHMEM_FILL_ONE: Specifies a data pattern of all ones.
shmem_write
The shmem_write procedure writes data to the BFM shared memory.
shmem_read Function
The shmem_read function reads data from the BFM shared memory.
shmem_fill Procedure
The shmem_fill procedure fills a block of BFM shared memory with a specified data
pattern.
shmem_chk_ok Function
The shmem_chk_ok function checks a block of BFM shared memory against a specified
data pattern.
Log Constants
The following constants are defined in the BFM Log package. They define the type of
message and their values determine whether a message is displayed or simulation is
stopped after a specific message. Each displayed message has a specific prefix, based
on the message type in Table 15–39.
You can suppress the display of certain message types. The default values
determining whether a message type is displayed are defined in Table 15–39. To
change the default message display, modify the display default value with a
procedure call to ebfm_log_set_suppressed_msg_mask.
Certain message types also stop simulation after the message is displayed.
Table 15–39 shows the default value determining whether a message type stops
simulation. You can specify whether simulation stops for particular messages with the
procedure ebfm_log_set_stop_on_msg_mask.
All of these log message constants are VHDL subtype natural or type integer for
Verilog HDL.
message: In Verilog HDL, the message string is limited to a maximum of 100 characters. Also, because Verilog HDL does not allow variable-length strings, this routine strips off leading characters of 8'h00 before displaying the message.
Return: Always 0. Applies only to the Verilog HDL routine.
ebfm_log_set_suppressed_msg_mask Procedure
The ebfm_log_set_suppressed_msg_mask procedure controls which message types
are suppressed.
ebfm_log_set_stop_on_msg_mask Procedure
The ebfm_log_set_stop_on_msg_mask procedure controls which message types stop
simulation. This procedure alters the default behavior of the simulation when errors
occur as described in the Table 15–39 on page 15–44.
ebfm_log_open Procedure
The ebfm_log_open procedure opens a log file of the specified name. All displayed
messages are called by ebfm_display and are written to this log file as simulator
standard output.
ebfm_log_close Procedure
The ebfm_log_close procedure closes the log file opened by a previous call to
ebfm_log_open.
himage1
This function creates a one-digit hexadecimal string representation of the input
argument that can be concatenated into a larger message string and passed to
ebfm_display.
himage2
This function creates a two-digit hexadecimal string representation of the input
argument that can be concatenated into a larger message string and passed to
ebfm_display.
himage4
This function creates a four-digit hexadecimal string representation of the input argument that can be concatenated into a larger message string and passed to ebfm_display.
himage8
This function creates an 8-digit hexadecimal string representation of the input
argument that can be concatenated into a larger message string and passed to
ebfm_display.
himage16
This function creates a 16-digit hexadecimal string representation of the input
argument that can be concatenated into a larger message string and passed to
ebfm_display.
dimage1
This function creates a one-digit decimal string representation of the input argument
that can be concatenated into a larger message string and passed to ebfm_display.
dimage2
This function creates a two-digit decimal string representation of the input argument
that can be concatenated into a larger message string and passed to ebfm_display.
dimage3
This function creates a three-digit decimal string representation of the input argument
that can be concatenated into a larger message string and passed to ebfm_display.
dimage4
This function creates a four-digit decimal string representation of the input argument
that can be concatenated into a larger message string and passed to ebfm_display.
dimage5
This function creates a five-digit decimal string representation of the input argument
that can be concatenated into a larger message string and passed to ebfm_display.
dimage6
This function creates a six-digit decimal string representation of the input argument
that can be concatenated into a larger message string and passed to ebfm_display.
dimage7
This function creates a seven-digit decimal string representation of the input
argument that can be concatenated into a larger message string and passed to
ebfm_display.
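As a hedged illustration of combining these functions in Verilog HDL, the fragment below assumes that ebfm_display takes a message-type constant followed by the message string (EBFM_MSG_INFO is assumed to be one of the log constants from Table 15–39) and that its always-zero return value is absorbed by a local reg; check the BFM log include file in your generated testbench for the exact names.

// Hedged sketch: format a value with himage8 and print it through ebfm_display.
task show_bar2_base_sketch;
  input [31:0] bar2_base;
  reg unused_result;  // ebfm_display always returns 0 in Verilog HDL
  begin
    unused_result = ebfm_display(EBFM_MSG_INFO,
                                 {"BAR2 base address = 0x", himage8(bar2_base)});
  end
endtask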
chained_dma_test Procedure
The chained_dma_test procedure is the top-level procedure that runs the chaining DMA read and the chaining DMA write tests.
Use_msi: When set, the root port uses native PCI Express MSI to detect the DMA completion.
Use_eplast: When set, the root port uses BFM shared memory polling to detect the DMA completion.
dma_rd_test Procedure
Use the dma_rd_test procedure for DMA reads from the endpoint memory to the
BFM shared memory.
Use_eplast: When set, the root port uses BFM shared memory polling to detect the DMA completion.
dma_wr_test Procedure
Use the dma_wr_test procedure for DMA writes from the BFM shared memory to the
endpoint memory.
Use_eplast: When set, the root port uses BFM shared memory polling to detect the DMA completion.
dma_set_rd_desc_data Procedure
Use the dma_set_rd_desc_data procedure to configure the BFM shared memory for
the DMA read.
dma_set_wr_desc_data Procedure
Use the dma_set_wr_desc_data procedure to configure the BFM shared memory for
the DMA write.
dma_set_header Procedure
Use the dma_set_header procedure to configure the DMA descriptor table for DMA
read or DMA write.
rc_mempoll Procedure
Use the rc_mempoll procedure to poll a given DWORD in a given BFM shared
memory location.
msi_poll Procedure
The msi_poll procedure tracks MSI completion from the endpoint.
dma_set_msi Procedure
The dma_set_msi procedure sets PCI Express native MSI for the DMA read or the
DMA write.
find_mem_bar Procedure
The find_mem_bar procedure locates a BAR which satisfies a given memory space
requirement.
dma_set_rclast Procedure
The dma_set_rclast procedure starts the DMA operation by writing to the endpoint
DMA register the value of the last descriptor to process (RCLast).
ebfm_display_verb Procedure
The ebfm_display_verb procedure calls the procedure ebfm_display when the global
variable DISPLAY_ALL is set to 1.
Figure 16–1. SOPC Builder Example System with Multiple PCI Express IP cores
[The figure shows a root complex connected over PCI Express links to multiple endpoints, each an SOPC Builder system; one endpoint connects through an external PHY using the PIPE interface.]
Figure 16–2 shows how SOPC Builder integrates components and the PCI Express IP
core using the system interconnect fabric. This design example transfers data between
an on-chip memory buffer located on the Avalon-MM side and a PCI Express memory
buffer located on the root complex side. The data transfer uses the DMA component
which is programmed by the PCI Express software application running on the root
complex processor.
1 This design example uses Verilog HDL. You can substitute VHDL for Verilog HDL.
4. In the Directory, Name, Top-Level Entity page, enter the following information:
a. Specify the working directory for your project. This design example uses the
directory \sopc_pcie.
b. Specify the name of the project. This design example uses pcie_top. You must
specify the same name for both the project and the top-level design entity.
1 The Quartus II software automatically specifies a top-level design entity that has the same name as the project. Do not change this name.
b. In the Target device box, select Auto device selected by the Fitter.
9. Click Next to close this page and display the EDA Tool Settings page.
10. Click Next to display the Summary page.
11. Check the Summary page to ensure that you have entered all the information
correctly.
12. Click Finish to complete the Quartus II project.
f Refer to Volume 4: SOPC Builder of the Quartus II Handbook for more information on
how to use SOPC Builder.
2. In the System Name box, type pcie_top, select Verilog under Target HDL, and
click OK.
1 This example design requires that you specify the same name for the SOPC Builder
system as for the top-level project file. However, this naming is not required for your
own design. If you want to choose a different name for the system file, you must
create a wrapper HDL file of the same name as the project's top level and instantiate
the generated system.
3. To add modules from the System Contents tab, under Interface Protocols in the
PCI folder, double-click the PCI Express Compiler<version_number> component.
3. Click the Avalon page and specify the settings in Table 16–3.
For an example of a system that uses the PCI Express core clock for the Avalon
clock domain see Figure 7–13 on page 7–15.
4. You can retain the default values for all parameters on the Capabilities, Buffer
Setup, and Power Management pages.
1 Your system is not yet complete, so you can ignore any error messages generated by
SOPC Builder at this stage.
3. Click Finish. The DMA Controller module is added to your SOPC Builder system.
4. In the System Contents tab, double-click the On-Chip Memory (RAM or ROM)
in the On-Chip subfolder of the Memory and Memory Controllers folder. This
component contains a slave port.
5. Retain the default settings for all other options and click Finish.
6. The On-chip Memory component is added to your SOPC Builder system.
4. To specify the interrupt number for DMA interrupt sender, irq, type a 0 in the IRQ
column next to the irq port.
5. In the Base column, enter the base addresses in Table 16–7 for all the slaves in your
system.
SOPC Builder generates informational messages indicating the actual PCI BAR
settings.
For this example, BAR1:0 is sized to 4 KBytes or 12 bits; PCI Express requests that match this BAR are able to access the Avalon addresses from 0x80000000–0x80000FFF. BAR2 is sized to 32 KBytes or 15 bits; matching PCI Express requests are able to access Avalon addresses from 0x8000000–0x80007FFF. The DMA control_port_slave is accessible at offsets 0x1000 through 0x103F from the programmed BAR2 base address. The pci_express_compiler_0 Control_Register_Access slave port is accessible at offsets 0x4000–0x7FFF from the programmed BAR2 base address. Refer to “PCI Express-to-Avalon-MM Address Translation” on page 4–19 for additional information on this address mapping.
For Avalon-MM accesses directed to the pci_express_compiler_0 TX_interface port, Avalon-MM address bits [19:0] are passed through to the PCI Express address unchanged because a 1 MByte or 20-bit address page size was selected. Bit 20 selects which one of the two address translation table entries provides the upper bits of the PCI Express address. Avalon address bits [31:21] are used to select the TX_interface slave port. Refer to the section “Avalon-MM-to-PCI Express Address Translation” on page 4–20 for additional information on this address mapping.
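The following Verilog HDL function is a minimal sketch of that translation for this example only; modeling it as a function, the two table-entry arguments, and the 64-bit PCI Express address width are illustrative assumptions rather than the IP core's actual implementation.

// Hedged sketch: Avalon-MM to PCI Express address translation with a 1 MByte
// (20-bit) page and a two-entry address translation table.
function [63:0] avalon_to_pcie_addr;
  input [31:0] avalon_addr;  // address presented on the TX_interface slave port
  input [63:0] table_entry0; // upper PCIe address bits programmed into entry 0
  input [63:0] table_entry1; // upper PCIe address bits programmed into entry 1
  reg   [63:0] upper;
  begin
    // Bit 20 selects one of the two address translation table entries.
    upper = avalon_addr[20] ? table_entry1 : table_entry0;
    // Bits [19:0] pass through unchanged; the selected entry supplies the upper bits.
    avalon_to_pcie_addr = {upper[63:20], avalon_addr[19:0]};
  end
endfunction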
1 You can also use any other supported third-party simulator to simulate your design.
SOPC Builder creates IP functional simulation models for all the system components.
The IP functional simulation models are the .vo or .vho files generated by SOPC
Builder in your project directory.
f For more information about IP functional simulation models, refer to Simulating Altera
Designs in volume 3 of the Quartus II Handbook.
The SOPC Builder-generated top-level file also integrates the simulation modules of
the system components and testbenches (if available), including the PCI Express
testbench. The Altera-provided PCI Express testbench simulates a single link at a
time. You can use this testbench to verify the basic functionality of your PCI Express
Compiler system. The default configuration of the PCI Express testbench is
predefined to run basic PCI Express configuration transactions to the PCI Express
device in your SOPC Builder generated system. You can edit the PCI Express
testbench altpcietb_bfm_driver.v or altpcietb_bfm_driver.vhd file to add other PCI
Express transactions, such as memory read (MRd) and memory write (MWr).
For more information about the PCI Express BFM, refer to Chapter 15, Testbench and
Design Example.
For this design example, perform the following steps:
1. Before simulating the system, if you are running the Verilog HDL design example,
edit the altpcietb_bfm_driver.v file in the
c:\sopc_pci\pci_express_compiler_examples\sopc\testbench directory to
enable target and DMA tests. Set the following parameters in the file to one:
■ parameter RUN_TGT_MEM_TST = 1;
■ parameter RUN_DMA_MEM_TST = 1;
If you are running the VHDL design example, edit the altpcietb_bfm_driver.vhd
in the c:\sopc_pci\pci_express_compiler_examples\sopc\testbench directory to
set the following parameters to one.
■ RUN_TGT_MEM_TST : std_logic := '1';
■ RUN_DMA_MEM_TST : std_logic := '1';
5. To generate waveform output for the simulation, type the following command at the simulator command prompt:
do wave_presets.do
6. To simulate the design, type the following command at the simulator prompt:
run -all
The PCI Express Compiler test driver performs the following transactions, with the status of each transaction displayed in the ModelSim simulation message window:
■ Various configuration accesses to the PCI Express IP core in your system after
the link is initialized
■ Setup of the Address Translation Table for requests that are coming from the
DMA component
■ Setup of the DMA controller to read 4 KBytes of data from the Root Port BFM’s
shared memory
■ Setup of the DMA controller to write the same 4 KBytes of data back to the
Root Port BFM’s shared memory
■ Data comparison and report of any mismatch
7. Exit the ModelSim tool after it reports successful completion.
Example 16–1. Transcript from Simulation of Requester/Completer PCI Express Hard IP Implementation
# INFO: 464 ns Completed initial configuration of Root Port.
# INFO: 3641 ns EP LTSSM State: DETECT.ACTIVE
# INFO: 3657 ns RP LTSSM State: DETECT.ACTIVE
# INFO: 3689 ns EP LTSSM State: POLLING.ACTIVE
# INFO: 6905 ns RP LTSSM State: POLLING.ACTIVE
# INFO: 9033 ns RP LTSSM State: POLLING.CONFIG
# INFO: 9353 ns EP LTSSM State: POLLING.CONFIG
# INFO: 10441 ns EP LTSSM State: CONFIG.LINKWIDTH.START
# INFO: 10633 ns RP LTSSM State: CONFIG.LINKWIDTH.START
# INFO: 11273 ns EP LTSSM State: CONFIG.LINKWIDTH.ACCEPT
# INFO: 11801 ns RP LTSSM State: CONFIG.LINKWIDTH.ACCEPT
# INFO: 12121 ns RP LTSSM State: CONFIG.LANENUM.WAIT
# INFO: 12745 ns EP LTSSM State: CONFIG.LANENUM.WAIT
# INFO: 12937 ns EP LTSSM State: CONFIG.LANENUM.ACCEPT
# INFO: 13081 ns RP LTSSM State: CONFIG.LANENUM.ACCEPT
# INFO: 13401 ns RP LTSSM State: CONFIG.COMPLETE
# INFO: 13849 ns EP LTSSM State: CONFIG.COMPLETE
# INFO: 14937 ns EP LTSSM State: CONFIG.IDLE
# INFO: 15129 ns RP LTSSM State: CONFIG.IDLE
# INFO: 15209 ns RP LTSSM State: L0
# INFO: 15465 ns EP LTSSM State: L0
# INFO: 21880 ns EP PCI Express Link Status Register (1041):
# INFO: 21880 ns Negotiated Link Width: x4
# INFO: 21880 ns Slot Clock Config: System Reference Clock Used
# INFO: 22769 ns RP LTSSM State: RECOVERY.RCVRLOCK
# INFO: 23177 ns EP LTSSM State: RECOVERY.RCVRLOCK
# INFO: 23705 ns EP LTSSM State: RECOVERY.RCVRCFG
# INFO: 23873 ns RP LTSSM State: RECOVERY.RCVRCFG
# INFO: 25025 ns RP LTSSM State: RECOVERY.IDLE
# INFO: 25305 ns EP LTSSM State: RECOVERY.IDLE
# INFO: 25385 ns EP LTSSM State: L0
# INFO: 25537 ns RP LTSSM State: L0
# INFO: 26384 ns Current Link Speed: 2.5GT/s
# INFO: 27224 ns EP PCI Express Link Control Register (0040):
# INFO: 27224 ns Common Clock Config: System Reference Clock Used
# INFO: 28256 ns EP PCI Express Capabilities Register (0001):
# INFO: 28256 ns Capability Version: 1
# INFO: 28256 ns Port Type: Native Endpoint
# INFO: 28256 ns EP PCI Express Link Capabilities Register (0103F441):
# INFO: 28256 ns Maximum Link Width: x4
# INFO: 28256 ns Supported Link Speed: 2.5GT/s
# INFO: 28256 ns L0s Entry: Supported
# INFO: 28256 ns L1 Entry: Not Supported
# INFO: 33008 ns BAR1:0 4 KBytes 00000001 00000000 Prefetchable
# INFO: 33008 ns BAR2 32 KBytes 00200000 Non-Prefetchable
# INFO: 34104 ns Completed configuration of Endpoint BARs.
# INFO: 35064 ns Starting Target Write/Read Test.
# INFO: 35064 ns Target BAR = 0
# INFO: 35064 ns Length = 004096, Start Offset = 000000
# INFO: 47272 ns Target Write and Read compared okay!
# INFO: 47272 ns Starting DMA Read/Write Test.
# INFO: 47272 ns Setup BAR = 2
# INFO: 47272 ns Length = 004096, Start Offset = 000000
# INFO: 55761 ns Interrupt Monitor: Interrupt INTA Asserted
# INFO: 55761 ns Clear Interrupt INTA
# INFO: 56737 ns Interrupt Monitor: Interrupt INTA Deasserted
# INFO: 66149 ns MSI recieved!
# INFO: 66149 ns DMA Read and Write compared okay!
1 You can use the same testbench to simulate the Completer-Only single dword IP core
by changing the settings in the driver file. For the Verilog HDL design example, edit
the altpcietb_bfm_driver.v file in the
c:\sopc_pci\pci_express_compiler_examples\sopc\testbench directory to enable
target memory tests and specify the completer-only single dword variant. Set the
following parameters in the file as shown:
■ parameter RUN_TGT_MEM_TST = 1;
■ parameter RUN_DMA_MEM_TST = 0;
■ parameter AVALON_MM_LITE = 1;
If you are running the VHDL design example, edit the altpcietb_bfm_driver.vhd file
in the c:\sopc_pci\pci_express_compiler_examples\sopc\testbench directory to
set the parameters as shown:
■ RUN_TGT_MEM_TST : std_logic := '1';
■ RUN_DMA_MEM_TST : std_logic := '0';
■ AVALON_MM_LITE : std_logic := '1';
Program a Device
After you compile your design, you can program your targeted Altera device and
verify your design in hardware.
f For more information about IP functional simulation models, see the Simulating Altera
Designs chapter in volume 3 of the Quartus II Handbook.
As you bring up your PCI Express system, you may face a number of issues related to
FPGA configuration, link training, BIOS enumeration, data transfer, and so on. This
chapter suggests some strategies to resolve the common issues that occur during
hardware bring-up.
Link Training
The physical layer automatically performs link training and initialization without
software intervention. This is a well-defined process to configure and initialize the
device's physical layer and link so that PCIe packets can be transmitted. If you
encounter link training issues, viewing the actual data in hardware should help you
determine the root cause. You can use the following tools to provide hardware
visibility:
■ SignalTap® II Embedded Logic Analyzer
■ Third-party PCIe analyzer
f For more information about link training, refer to the “Link Training and Status State
Machine (LTSSM) Descriptions” section of PCI Express Base Specification 2.0.
f For more information about SignalTap, refer to the Design Debugging Using the
SignalTap II Embedded Logic Analyzer chapter in volume 3 of the Quartus II Handbook.
f The PHY Interface for PCI Express Architecture specification is available on the Intel
website (www.intel.com).
This chapter describes the PCI Express IP core that employs the legacy
descriptor/data interface. It includes the following sections:
■ Descriptor/Data Interface
■ Incremental Compile Module for Descriptor/Data Examples
1 Altera recommends choosing the Avalon-ST or Avalon-MM interface for all new
designs for compatibility with the hard IP implementation of the PCI Express IP core.
Descriptor/Data Interface
When you use the MegaWizard Plug-In Manager to generate a PCI Express endpoint
with the descriptor/data interface, the MegaWizard interface generates the
transaction, data link, and PHY layers. Figure B–1 illustrates this interface.
In Figure B–1, the TX path uses tx_desc and tx_data, and the RX path uses rx_desc and rx_data:
■ TX: With information sent by the application layer, the transaction layer generates a TLP, which includes a header and, optionally, a data payload. The data link layer ensures packet integrity and adds a sequence number and link cyclic redundancy code (LCRC) check to the packet. The physical layer encodes the packet and transmits it to the receiving device on the other side of the link.
■ RX: The physical layer decodes the received packet and transfers it to the data link layer. The data link layer verifies the packet's sequence number and checks for errors. The transaction layer disassembles the transaction and transfers data to the application layer in a form that it recognizes.
RX and TX ports use a data/descriptor style interface, which presents the application
with a descriptor bus containing the TLP header and a separate data bus containing
the TLP payload. A single-cycle-turnaround handshaking protocol controls the
transfer of data.
Figure B–2 shows all the signals for the PCI Express IP core using the descriptor/data
interface.
(Figure B–2 signal groups shown: Completion interface: cpl_err[2:0], cpl_pending, ko_cpl_spc_vcn[19:0]. Configuration: cfg_tcvcmap[23:0], cfg_busdev[12:0], cfg_prmcsr[31:0], cfg_devcsr[31:0], cfg_linkcsr[31:0], cfg_msicsr[15:0]. Test Interface: test_in[31:0], test_out[511:0].)
In Figure B–2, the transmit and receive signals apply to each implemented virtual
channel, while configuration and global signals are common to all virtual channels on
a link.
Table B–1 lists the interfaces for this MegaCore with links to the sections that describe
each interface.
Table B–1. Signal Groups in the PCI Express IP Core Using the Descriptor/Data Interface
Logical
■ Descriptor RX: “Receive Datapath Interface Signals” on page B–3
■ Descriptor TX: “Transmit Operation Interface Signals” on page B–12
■ Clock: “Clock Signals—Soft IP Implementation” on page 5–23
■ Reset: “Reset and Link Training Signals” on page 5–24
■ Interrupt: “PCI Express Interrupts for Endpoints” on page 5–29
■ Configuration space: “Configuration Space Signals—Soft IP Implementation” on page 5–39
■ Power management: “PCI Express Reconfiguration Block Signals—Hard IP Implementation” on page 5–41
■ Completion: “Completion Interface Signals for Descriptor/Data Interface” on page B–25
Physical
■ Transceiver Control: “Transceiver Control” on page 5–53
■ Serial: “Serial Interface Signals” on page 5–55
■ Pipe: “PIPE Interface Signals” on page 5–56
Test
■ Test: “Test Interface Signals—Soft IP Implementation” on page 5–60
1 In the following tables, receive interface signal names with a <n> suffix are for
virtual channel <n>. If the IP core implements multiple virtual channels, there is an
additional set of these signals for each virtual channel number.
(Waveform: rx_req and rx_ack handshake; rx_desc[135:128], rx_desc[127:64], and rx_desc[63:0] are valid during the descriptor phase.)
Bit 126 of the descriptor indicates the type of transaction layer packet in transit:
■ rx_desc[126] set to 0: transaction layer packet without data
■ rx_desc[126] set to 1: transaction layer packet with data
rx_ack<n> (I): Receive acknowledge. This signal is asserted for 1 clock cycle when the application interface acknowledges the descriptor phase and starts the data phase, if any. The rx_req signal is deasserted on the following clock cycle and the rx_desc is ready for the next transmission. rx_ack is independent of rx_dv and rx_data. It cannot be used to backpressure rx_data. You can use rx_ws to insert wait states.
rx_abort<n> (I): Receive abort. This signal is asserted by the application interface if the application cannot accept the requested descriptor. In this case, the descriptor is removed from the receive buffer space, flow control credits are updated, and, if necessary, the application layer generates a completion transaction with unsupported request (UR) status on the transmit side.
rx_retry<n> (I): Receive retry. The application interface asserts this signal if it is not able to accept a non-posted request. In this case, the application layer must assert rx_mask<n> along with rx_retry<n> so that only posted and completion transactions are presented on the receive interface for the duration of rx_mask<n>.
The IP core generates the eight MSBs of the rx_desc signal with BAR decoding information.
Refer to Table B–3.
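As a simple illustration of this handshake, the following Verilog sketch shows an application that acknowledges every descriptor phase one clock cycle after rx_req and never retries, aborts, or inserts wait states. It is a minimal sketch, not part of the generated design files; the module name, port widths, and reset style are assumptions.
// Minimal application-side sketch of the rx_req/rx_ack descriptor handshake.
// Assumptions: one virtual channel, an application that can always accept the
// descriptor, active-low asynchronous reset. Not cycle-accurate for corner cases.
module rx_desc_ack_sketch (
  input  wire         clk,
  input  wire         rstn,
  input  wire         rx_req,     // IP core requests a descriptor phase
  input  wire [135:0] rx_desc,    // descriptor (header plus BAR decode byte)
  output reg          rx_ack,     // acknowledge for exactly one clock cycle
  output wire         rx_ws,
  output wire         rx_abort,
  output wire         rx_retry,
  output wire         rx_mask
);
  // Pulse rx_ack one cycle after rx_req; the core then deasserts rx_req.
  always @(posedge clk or negedge rstn)
    if (!rstn) rx_ack <= 1'b0;
    else       rx_ack <= rx_req & ~rx_ack;

  assign rx_ws    = 1'b0;   // no application-induced wait states
  assign rx_abort = 1'b0;   // never discard a descriptor
  assign rx_retry = 1'b0;   // always accept non-posted requests
  assign rx_mask  = 1'b0;
endmodule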
(Waveform: the two possible dword placements on the 64-bit receive data bus. In one case rx_data[63:32] carries DW 0, DW 2, DW 4 and rx_data[31:0] carries DW 1, DW 3; in the other, rx_data[31:0] carries DW 0, DW 2, DW 4 and rx_data[63:32] carries DW 1, DW 3.)
rx_be<n>[7:0] (O): Receive byte enable. These signals qualify data on rx_data[63:0]. Each bit of the signal indicates whether the corresponding byte of data on rx_data[63:0] is valid. These signals are not available in the ×8 IP core.
rx_ws<n> (I): Receive wait states. With this signal, the application layer can insert wait states to throttle data transfer.
Note to Table B–4:
(1) For all signals, <n> is the virtual channel number, which can be 0 or 1.
Each virtual channel has a dedicated datapath and associated buffers and no ordering
relationships exist between virtual channels. While one virtual channel may be
temporarily blocked, data flow continues across other virtual channels without
impact. Within a virtual channel, reordering is mandatory only for non-posted
transactions to prevent deadlock. Reordering is not implemented in the following
cases:
■ Between traffic classes mapped in the same virtual channel
■ Between posted and completion transactions
■ Between transactions of the same type regardless of the relaxed-ordering bit of
the transaction layer packet
In Figure B–6, the IP core receives a memory read request transaction of 4 DWORDS
that it cannot immediately accept. A second transaction (memory write transaction of
one DWORD) is waiting in the receive buffer. Bit 2 of rx_data[63:0] for the memory
write request is set to 1.
In clock cycle three, transmission of non-posted transactions is not permitted for as
long as rx_mask is asserted.
Flow control credits are updated only after a transaction layer packet has been
extracted from the receive buffer and both the descriptor phase and data phase (if
any) have ended. This update happens in clock cycles 8 and 12 in Figure B–6.
In Figure B–7, a memory read of 16 DWORDS is sent to the application layer. Having
determined it will never be able to accept the transaction layer packet, the application
layer discards it by asserting rx_abort. An alternative design might implement logic
whereby all transaction layer packets are accepted and, after verification, potentially
rejected by the application layer. An advantage of asserting rx_abort is that
transaction layer packets with data payloads can be discarded in one clock cycle.
Having aborted the first transaction layer packet, the IP core can transmit the second,
a three DWORD completion in this case. The IP core does not treat the aborted
transaction layer packet as an error and updates flow control credits as if the
transaction were acknowledged. In this case, the application layer is responsible for
generating and transmitting a completion with completer abort status and for signaling a
completer abort event to the IP core configuration space through assertion of cpl_err.
In clock cycle 6, rx_abort is asserted and transmission of the next transaction begins
in a subsequent clock cycle.
Normally, rx_dfr is asserted in the same clock cycle as rx_req or in the following one;
however, in this case the signal remains asserted until clock cycle 7 to signal the end of
transmission of the first transaction. It is immediately reasserted in clock cycle eight
to request a data phase for the second transaction.
■ The application layer deasserts rx_ws at clock cycle 11, thereby ending an
application interface-induced wait state.
Figure B–9. RX Transaction with a Data Payload and Wait States Waveform
Table B–5. RX Minimum and Maximum Latency Values in Clock Cycles Between Receive Signals
■ rx_req to rx_ack: Min 1, Typical 1, Max N.
■ rx_req to rx_dfr: Min 0, Typical 0, Max 0. Always asserted on the same clock cycle if a data payload is present, except when a previous data transfer is still in progress. Refer to Figure B–8 on page B–10.
■ rx_req to rx_dv: Min 1, Typical 1-2, Max N. Assuming data is sent.
■ rx_retry to rx_req: Min 1, Typical 2, Max N. rx_req refers to the next transaction request.
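The rx_ws signal gives the application layer a simple way to apply backpressure during the data phase. The following minimal Verilog sketch assumes received dwords are pushed into an application FIFO with an almost-full flag and drives rx_ws from that flag; the module and signal names are illustrative only, not part of the generated design files.
// Minimal sketch: drive rx_ws from an application FIFO almost-full flag.
// Assumptions: single virtual channel; the almost-full threshold leaves
// enough headroom for data already in flight.
module rx_ws_backpressure_sketch (
  input  wire clk,
  input  wire rstn,
  input  wire fifo_almost_full,  // from the application receive FIFO
  output reg  rx_ws              // wait state request to the IP core
);
  always @(posedge clk or negedge rstn)
    if (!rstn) rx_ws <= 1'b1;    // hold off data until reset is released
    else       rx_ws <= fifo_almost_full;
endmodule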
1 In the following tables, transmit interface signal names suffixed with <n> are for
virtual channel <n>. If the IP core implements additional virtual channels, there is an
additional set of signals suffixed with each virtual channel number.
tx_ws<n> (O): Transmit wait states. The IP core asserts this signal to throttle transmission, for example to give a high-priority virtual channel or the retry buffer transmission priority when the link is initialized with fewer lanes than are permitted by the link. If the IP core is not ready to acknowledge a descriptor phase (through assertion of tx_ack on the following cycle), it automatically asserts tx_ws to throttle transmission. When tx_dv is not asserted, tx_ws should be ignored.
(Waveform: the two possible dword placements on the 64-bit transmit data bus. In one case tx_data[63:32] carries DW 0, DW 2, DW 4 and tx_data[31:0] carries DW 1, DW 3; in the other, tx_data[31:0] carries DW 0, DW 2, DW 4 and tx_data[63:32] carries DW 1, DW 3.)
The application layer must provide a properly formatted TLP on the TX Data interface. The
number of data cycles must be correct for the length and address fields in the header.
Issuing a packet with an incorrect number of data cycles results in the TX interface
hanging and becoming unable to accept further requests.
Note to Table B–7:
(1) For all signals, <n> is the virtual channel number, which can be 0 or 1.
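As an illustration of the descriptor and data phases on the transmit side, the following Verilog sketch sequences a single-dword memory write: it holds tx_req until tx_ack, then presents one dword and holds it while tx_ws throttles the transfer. This is a minimal sketch, not part of the generated design files; the module name, port list, and exact cycle timing are assumptions, and the descriptor contents must be built elsewhere.
// Minimal sketch of transmitting one MEMWR32 TLP with a single-dword payload.
// Assumptions: one virtual channel, 64-bit x1/x4 interface, active-low reset,
// descriptor (desc_in) and payload dword (dw0_in) prepared by other logic.
module tx_memwr32_sketch (
  input  wire         clk,
  input  wire         rstn,
  input  wire         start,     // pulse to send one transaction
  input  wire [127:0] desc_in,   // pre-built TLP descriptor
  input  wire [31:0]  dw0_in,    // single payload dword
  input  wire         tx_ack,
  input  wire         tx_ws,
  output reg          tx_req,
  output reg  [127:0] tx_desc,
  output reg          tx_dfr,
  output reg          tx_dv,
  output reg  [63:0]  tx_data
);
  localparam IDLE = 2'd0, DESC_PH = 2'd1, DATA_PH = 2'd2;
  reg [1:0] state;

  always @(posedge clk or negedge rstn)
    if (!rstn) begin
      state   <= IDLE;
      tx_req  <= 1'b0;  tx_dfr <= 1'b0;  tx_dv <= 1'b0;
      tx_desc <= 128'd0; tx_data <= 64'd0;
    end else case (state)
      IDLE: if (start) begin
        tx_req  <= 1'b1;        // request a descriptor phase
        tx_desc <= desc_in;
        tx_dfr  <= 1'b1;        // a data phase follows this descriptor
        state   <= DESC_PH;
      end
      DESC_PH: if (tx_ack) begin
        tx_req  <= 1'b0;        // descriptor accepted
        tx_dv   <= 1'b1;        // present the single dword
        tx_data <= {32'd0, dw0_in};
        state   <= DATA_PH;
      end
      DATA_PH: if (!tx_ws) begin
        tx_dv   <= 1'b0;        // dword consumed; end the data phase
        tx_dfr  <= 1'b0;
        state   <= IDLE;
      end
      default: state <= IDLE;
    endcase
endmodule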
Table B–9 shows the bit information for tx_cred<n>[21:0] for the ×1 and ×4 IP cores.
Table B–10 shows the bit information for tx_cred<n>[65:0] for the ×8 IP cores.
Figure B–12. TX 64-Bit Completion with Data Transaction of Eight DWORD Waveform
Figure B–13 shows the IP core transmitting a memory write of one DWORD.
Figure B–14. TX State Machine Is Busy with the Preceding Transaction Layer Packet Waveform
Figure B–15 shows that the application layer must wait to receive an acknowledge
before write data can be transferred. Prior to the start of a transaction (that is,
before tx_req is asserted), note that the tx_ws signal is low for the ×1 and ×4
configurations and high for the ×8 configuration.
Figure B–16 shows how the transaction layer extends a data phase by asserting the
wait state signal.
Figure B–16. TX Transfer with Wait State Inserted for a Single DWORD Write
Figure B–17. TX Signal Activity When IP core Has Fewer than Maximum Potential Lanes Waveform
In clock cycle 3, the IP core inserts a wait state because the memory write 64-bit
transaction layer packet request has a 4-DWORD header. In this case, tx_dv could
have been sent one clock cycle later.
In clock cycle five, the IP core asserts tx_ws a second time to throttle the flow of data
because priority was not given immediately to this virtual channel. Priority was given
to either a pending data link layer packet, a configuration completion, or another
virtual channel. The tx_err signal is not available in the ×8 IP core.
In clock cycle five, the second transaction layer packet is not immediately
acknowledged because of additional overhead associated with a 64-bit address, such
as a separate number and an LCRC. This situation leads to an extra clock cycle
between two consecutive transaction layer packets.
In clock cycles 5, 7, 9, and 11, the IP core inserts wait states to throttle the flow of
transmission.
Figure B–21. TX Multiple Wait States that Throttle Data Transmission Waveform
Therefore, for endpoint variations, some rare TLP completion sequences could lead to an RX buffer overflow. For example, a sequence of completion TLPs with one dword of data each, using a qword-aligned address, requires 6 dwords of elapsed time per TLP to be written into the RX buffer: 3 dwords for the TLP header, 1 dword for the TLP data, plus 2 dwords of PHY MAC and data link layer overhead. When using the Avalon-ST 128-bit interface, reading such a TLP from the RX buffer requires 8 dwords of elapsed time. Therefore, in theory, if such completion TLPs are sent back-to-back, without any gap introduced by a DLLP, an update FC, or a skip character, the RX buffer overflows because the read rate cannot keep up with the write rate. This is an extreme case, and in practice such a sequence has a very low probability of occurring. However, to ensure that the RX buffer never overflows with completion TLPs, Altera recommends building a circuit in the application layer that arbitrates upstream memory read request TLPs based on the available space in the completion buffer.
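One way to build such a circuit is to track how many completion dwords are outstanding and to qualify new memory read requests against the space reserved for completions. The following Verilog sketch illustrates the idea; the module name, port list, and the CPL_BUF_DWORDS value are assumptions rather than part of the generated design files, and a real design must also account for completions that arrive split across multiple TLPs.
// Minimal sketch of a completion-space throttle for upstream memory reads.
// Assumptions: at most one read request issued per cycle; the application
// knows the completion length expected for each request; completions are
// reported as drained (cpl_done) with the number of dwords they release.
module cpl_space_throttle #(
  parameter CPL_BUF_DWORDS = 1024   // completion space assumed reserved in the RX buffer
) (
  input  wire        clk,
  input  wire        rstn,
  input  wire        rd_req_valid,     // application wants to issue a memory read
  input  wire [10:0] rd_req_dwords,    // completion data expected for that read
  input  wire        cpl_done,         // one completion fully drained by the application
  input  wire [10:0] cpl_done_dwords,  // dwords released by that completion
  output wire        rd_req_allowed    // qualify the read request with this signal
);
  reg [15:0] outstanding;              // completion dwords currently reserved

  assign rd_req_allowed = rd_req_valid &&
                          ((outstanding + rd_req_dwords) <= CPL_BUF_DWORDS);

  always @(posedge clk or negedge rstn)
    if (!rstn)
      outstanding <= 16'd0;
    else
      outstanding <= outstanding
                     + (rd_req_allowed ? {5'd0, rd_req_dwords}   : 16'd0)
                     - (cpl_done       ? {5'd0, cpl_done_dwords} : 16'd0);
endmodule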
1 The ICM is provided for backward compatibility only. New designs using the
Avalon-ST interface should use the Avalon-ST PCI Express MegaCore instead.
(Block diagram: in a Stratix IV, Stratix III, Stratix II, Stratix II GX, Cyclone II, Cyclone III, Arria GX, or Stratix GX device, the endpoint contains the <variation_name>_icm wrapper, which holds the PCI Express MegaCore function with the descriptor/data interface and the ICM; the wrapper connects to the chaining DMA or user application.)
ICM Features
The ICM provides the following features:
■ A fully registered boundary to the application to support design partitioning for
incremental compilation
■ An Avalon-ST protocol interface for the application at the RX, TX, and interrupt
(MSI) interfaces for designs using the Avalon-ST interface
■ Optional filters and ACKs for PCI Express message packets received from the
transaction layer
■ Maintains packet ordering between the TX and MSI Avalon-ST interfaces
■ TX bypassing of non-posted PCI Express packets for deadlock prevention
<variation_name>_icm Partition
When you generate a PCI Express IP core, the MegaWizard produces a module,
<variation_name>_icm, in the subdirectory
<variation_name>_examples\common\incremental_compile_module. This module is a
wrapper file that contains the IP core and the ICM module. (Refer to Figure B–23.)
Your application connects to this wrapper file. The wrapper interface resembles the
PCI Express IP core interface, but replaces the descriptor/data interface with an
Avalon-ST interface. (Refer to Table B–12.)
1 The wrapper interface omits some signals from the IP core to maximize circuit
optimization across the partition boundary. However, all of the IP core signals are still
available on the IP core instance and can be wired to the wrapper interface by editing
the <variation_name>_icm file as required.
By setting this wrapper module as a design partition, you can preserve timing of the
IP core using the incremental synthesis flow.
Table B–12 describes the <variation_name>_icm interfaces.
(Block diagram of the ICM TX path and sideband: TX boundary registers and an Avalon-ST TX conversion block with an NP-bypass function translate the Avalon-ST signals tx_stream_ready0, tx_stream_valid0, tx_stream_data0, and tx_stream_mask0 into the IP core's tx_req0, tx_desc0, tx_dfr0, tx_dv0, tx_data0, tx_err0, tx_ack0, tx_ws0, cpl_err0, and cpl_pending0 signals. An ICM sideband block registers cpl_pending_icm, cpl_err_icm, pex_msi_num_icm, app_int_sts_icm, app_int_sts_ack_icm, cfg_busdev_icm, cfg_devcsr_icm, cfg_linkcsr_icm, cfg_tcvcmap_icm, cfg_msicsr_icm, and test_out_icm to and from the corresponding IP core signals.)
RX Datapath
The RX datapath contains the RX boundary registers (for incremental compile) and a
bridge to transport data from the PCI Express IP core interface to the Avalon-ST
interface. The bridge autonomously acknowledges all packets received from the PCI Express IP
core. For simplicity, the rx_abort and rx_retry features of the IP core are not used,
and rx_mask is loosely supported. (Refer to Table B–14 on page B–32 for further
details.) The RX datapath also provides an optional message-dropping feature that is
enabled by default. The feature acknowledges PCI Express message packets from the
PCI Express IP core, but does not pass them to the user application. The user can
optionally allow messages to pass to the application by setting the DROP_MESSAGE
parameter in altpcierd_icm_rxbridge.v to 1’b0. The latency through the ICM RX
datapath is approximately four clock cycles.
TX Datapath
The TX datapath contains the TX boundary registers (for incremental compile) and a
bridge to transport data from the Avalon-ST interface to the PCI Express IP core
interface. A data FIFO buffers the Avalon-ST data from the user application until the
PCI Express IP core accepts it. The TX datapath also implements an NPBypass
function for deadlock prevention. When the PCI Express IP core runs out of
non-posted (NP) credits, the ICM allows completions and posted requests to bypass
NP requests until credits become available. The ICM handles any NP requests
pending in the ICM when credits run out and asserts the tx_mask signal to the user
application to indicate that it should stop sending NP requests. The latency through
the ICM TX datapath is approximately five clock cycles.
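If the application generates its own non-posted traffic, it must honor tx_mask. The following Verilog sketch shows one way to gate non-posted requests while letting posted and completion traffic continue; the module name, queue signals, and arbitration policy are assumptions for illustration only.
// Minimal sketch: stop issuing non-posted (NP) TLPs while tx_mask is asserted.
// Assumptions: separate NP and posted/completion queues in the application,
// at most one TLP popped per cycle, simple fixed-priority arbitration.
module tx_mask_gate_sketch (
  input  wire tx_mask,           // from the ICM: stop sending NP requests
  input  wire tx_ready,          // TX interface can accept a TLP this cycle
  input  wire np_queue_empty,
  input  wire p_cpl_queue_empty,
  output wire pop_np,            // pop one NP TLP from its queue
  output wire pop_p_cpl          // pop one posted/completion TLP from its queue
);
  // Posted and completion traffic always has priority here; NP is sent only
  // when it is not masked and nothing else is waiting.
  assign pop_p_cpl = tx_ready && !p_cpl_queue_empty;
  assign pop_np    = tx_ready && !tx_mask && !np_queue_empty && p_cpl_queue_empty;
endmodule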
MSI Datapath
The MSI datapath contains the MSI boundary registers (for incremental compile) and
a bridge to transport data from the Avalon-ST interface to the PCI Express IP core
interface. The ICM maintains packet ordering between the TX and MSI datapaths. In
this design example, the MSI interface supports low-bandwidth MSI requests; that is,
no more than one MSI request can coincide with a single TX packet. The
MSI interface assumes that the MSI function in the PCI Express IP core is enabled. For
other applications, you may need to modify this module to include internal buffering,
MSI-throttling at the application, and so on.
Sideband Datapath
The sideband interface contains boundary registers for non-timing critical signals
such as configuration signals. (Refer to Table B–17 on page B–36 for details.)
ICM Files
This section lists and briefly describes the ICM files. The PCI Express MegaWizard
generates all of these ICM files and places them in the
<variation name>_examples\common\incremental_compile_module folder.
When using the Quartus II software, include the files listed in Table B–13 in your
design:
■ altpcierd_icm_txbridge.v or altpcierd_icm_txbridge.vhd: This module implements the bridging required to connect the application's Avalon-ST TX interface to the IP core's TX interface.
■ altpcierd_icm_tx_pktordering.v or altpcierd_icm_tx_pktordering.vhd: This module contains the NP-Bypass function. It instantiates the npbypass FIFO and altpcierd_icm_npbypassctl.
■ altpcierd_icm_npbypassctl.v or altpcierd_icm_npbypassctl.vhd: This module controls whether a non-posted PCI Express request is forwarded to the IP core or held in a bypass FIFO until the IP core has enough credits to accept it. Arbitration is based on the available non-posted header and data credits indicated by the IP core.
■ altpcierd_icm_sideband.v or altpcierd_icm_sideband.vhd: This module implements incremental-compile boundary registers for the non-timing-critical sideband signals to and from the IP core.
■ altpcierd_icm_fifo.v or altpcierd_icm_fifo.vhd: This is a MegaWizard-generated RAM-based FIFO.
■ altpcierd_icm_fifo_lkahd.v or altpcierd_icm_fifo_lkahd.vhd: This is a MegaWizard-generated RAM-based look-ahead FIFO.
■ altpcierd_icm_defines.v or altpcierd_icm_defines.vhd: This file contains global defines used by the Verilog ICM modules.
RX Ports
Table B–14 describes the application-side ICM RX signals.
■ rx_st_valid0: Clocks rx_st_data into the application. The application must accept the data when rx_st_valid is high.
■ rx_st_data0: Multiplexed rx_desc/rx_data bus.
1st cycle – rx_desc0[127:64]
[81:74]: Byte enable bits. These are valid on the data (3rd to last) cycles of the packet.
[73]: rx_sop_flag. When asserted, indicates that this is the first cycle of the packet.
[72]: rx_eop_flag. When asserted, indicates that this is the last cycle of the packet.
[71:64]: BAR bits. These are valid on the 2nd cycle of the packet.
(Waveform: rx_stream_ready0, rx_stream_valid0, and rx_stream_data0; the source throttles data according to the ICM response time, and rx_stream_data0 carries desc_hi, desc_lo, data0, data1, through the last data cycle, framed by rx_sop_flag and rx_eop_flag.)
TX Ports
Table B–15 describes the application-side TX signals.
1 Altera recommends disabling the OpenCore Plus feature when compiling with this
flow. (On the Assignments menu, click Settings. Under Compilation Process
Settings, click More Settings. Under Disable OpenCore Plus hardware evaluation
select On.)
1 Information for the partition netlist is saved in the db folder. Do not delete this folder.
(Waveforms: the tx_stream and msi_stream interfaces; tx_stream_ready0 and msi_stream_ready0 throttle the source with an allowed response time of 0 to 3 clocks, and tx_stream_data0 carries desc_hi, desc_lo, and data cycles framed by tx_sop_flag and tx_eop_flag.)
Sideband Interface
Table B–17 describes the application-side sideband signals.
The following sections show the resource utilization for the soft IP implementation of
the PCI Express IP core. They include performance and resource utilization numbers for
the following application interfaces:
■ Avalon-ST Interface
■ Avalon-MM Interface
■ Descriptor/Data Interface
Avalon-ST Interface
This section provides performance and resource utilization for the soft IP
implementation of the following device families:
■ Arria GX Devices
■ Arria II GX Devices
■ Stratix II GX Devices
■ Stratix III Family
■ Stratix IV Family
Arria GX Devices
Table C–1 shows the typical expected performance and resource utilization of
Arria GX (EP1AGX60DF780C6) devices for different parameters with a maximum
payload of 256 bytes using the Quartus II software, version 10.1.
Arria II GX Devices
Table C–2 shows the typical expected performance and resource utilization of
Arria II GX (EP2AGX125EF35C4) devices for different parameters with a maximum
payload of 256 bytes using the Quartus II software, version 10.1.
Stratix II GX Devices
Table C–3 shows the typical expected performance and resource utilization of
Stratix II and Stratix II GX (EP2SGX130GF1508C3) devices for a maximum payload of
256 bytes for devices with different parameters, using the Quartus II software, version
10.1.
Table C–3. Performance and Resource Utilization, Avalon-ST Interface - Stratix II and Stratix II GX Devices
Table C–4. Performance and Resource Utilization, Avalon-ST Interface - Stratix III Family
Stratix IV Family
Table C–5 shows the typical expected performance and resource utilization of
Stratix IV GX (EP4SGX290FH29C2X) devices for a maximum payload of 256 bytes
with different parameters, using the Quartus II software, version 10.1.
Table C–5. Performance and Resource Utilization, Avalon-ST Interface - Stratix IV Family
Avalon-MM Interface
This section tabulates the typical expected performance and resource utilization for
the soft IP implementation for various parameters when using the SOPC Builder
design flow to create a design with an Avalon-MM interface and the following
parameter settings:
■ On the Buffer Setup page, for ×1, ×4 configurations:
■ Maximum payload size set to 256 Bytes unless specified otherwise
■ Desired performance for received requests and Desired performance for
completions set to Medium unless specified otherwise
■ 16 Tags
Size and performance tables appear here for the following device families:
■ Arria GX Devices
■ Cyclone III Family
■ Stratix II GX Devices
■ Stratix III Family
■ Stratix IV Family
Arria GX Devices
Table C–6 shows the typical expected performance and resource utilization of
Arria GX (EP1AGX60CF780C6) devices for a maximum payload of 256 bytes with
different parameters, using the Quartus II software, version 10.1.
Table C–6. Performance and Resource Utilization, Avalon-MM Interface - Arria GX Devices (Note 1)
It may be difficult to achieve 125 MHz frequency in complex designs that target the
Arria GX device. Altera recommends the following strategies to achieve timing:
■ Use separate clock domains for the Avalon-MM and PCI Express modules
■ Set the Quartus II Analysis & Synthesis Settings Optimization Technique to
Speed
■ Add non-bursting pipeline bridges to the Avalon-MM master ports
■ Use Quartus II seed sweeping methodology
Table C–7. Performance and Resource Utilization, Avalon-MM Interface - Cyclone III Family
Stratix II GX Devices
Table C–8 shows the typical expected performance and resource utilization of
Stratix II and Stratix II GX (EP2SGX130GF1508C3) devices for a maximum payload of
256 bytes with different parameters, using the Quartus II software, version 10.1.
Table C–8. Performance and Resource Utilization, Avalon-MM Interface - Stratix II GX Devices
Table C–9. Performance and Resource Utilization, Avalon-MM Interface - Stratix III Family
Stratix IV Family
Table C–10 shows the typical expected performance and resource utilization of
Stratix IV (EP4SGX230KF40C2) devices for a maximum payload of 256 bytes with
different parameters, using the Quartus II software, version 10.1.
Table C–10. Performance and Resource Utilization, Avalon-MM Interface - Stratix IV Family
Descriptor/Data Interface
This section tabulates the typical expected performance and resource utilization of the
listed device families for various parameters when using the MegaWizard Plug-In
Manager design flow using the descriptor/data interface, with the OpenCore Plus
evaluation feature disabled and the following parameter settings:
■ On the Buffer Setup page, for ×1, ×4, and ×8 configurations:
■ Maximum payload size set to 256 Bytes unless specified otherwise.
■ Desired performance for received requests and Desired performance for
completions both set to Medium unless specified otherwise.
■ On the Capabilities page, the number of Tags supported set to 16 for all
configurations unless specified otherwise.
Size and performance tables appear here for the following device families:
■ Arria GX Devices
■ Cyclone III Family
■ Stratix II GX Devices
■ Stratix III Family
■ Stratix IV Family
Arria GX Devices
Table C–11 shows the typical expected performance and resource utilization of
Arria GX (EP1AGX60DF780C6) devices for a maximum payload of 256 bytes with
different parameters, using the Quartus II software, version 10.1.
Table C–11. Performance and Resource Utilization, Descriptor/Data Interface - Arria GX Devices
Table C–12. Performance and Resource Utilization, Descriptor/Data Interface - Cyclone III Family
Stratix II GX Devices
Table C–13 shows the typical expected performance and resource utilization of the
Stratix II and Stratix II GX (EP2SGX130GF1508C3) devices for a maximum payload of
256 bytes with different parameters, using the Quartus II software, version 10.1.
Table C–13. Performance and Resource Utilization, Descriptor/Data Interface - Stratix II and
Stratix II GX Devices
Table C–14. Performance and Resource Utilization, Descriptor/Data Interface - Stratix III Family
Stratix IV Family
Table C–15 shows the typical expected performance and resource utilization of
Stratix IV (EP4SGX290FH29C2X) devices for a maximum payload of 256 bytes with
different parameters, using the Quartus II software, version 10.1.
Table C–15. Performance and Resource Utilization, Descriptor/Data Interface - Stratix IV Family
This chapter provides additional information about the document and Altera.
Revision History
The following list shows the revision history for the chapters in this user guide.
■ May 2007, 7.1: Added support for Arria GX device family. Added SOPC Builder support for ×1 and ×4. Added Incremental Compile Module (ICM).
■ December 2006, 7.0: Maintenance release; updated version numbers.
■ April 2006, 2.1.0 rev 2: Minor format changes throughout user guide.
■ May 2007, 7.1: Added support for Arria GX device family. Added SOPC Builder support for ×1 and ×4. Added Incremental Compile Module (ICM).
■ December 2006, 7.0: Added support for Cyclone III device family.
■ December 2006, 6.1: Added support for Stratix III device family. Updated version and performance information.
■ April 2006, 2.1.0: Rearranged content. Updated performance information.
■ October 2005, 2.0.0: Added ×8 support. Added device support for Stratix® II GX and Cyclone® II. Updated performance information.
Typographic Conventions
The following table shows the typographic conventions this document uses.
c: A caution calls attention to a condition or possible situation that can damage or
destroy the product or your work.
w: A warning calls attention to a condition or possible situation that can cause you
injury.
The envelope links to the Email Subscription Management Center page of the Altera
website, where you can sign up to receive update notifications for Altera documents.