PCIe 3.0
Aim: To develop an RTL design for the Physical Coding Sublayer (PCS) in the physical layer of
PCIe 3.0 2nd Generation.
Software: Questasim 10.7.
Overview of Physical Layer:
Physical layer of PCI Express consists of three sublayers as shown in the figure.
1. MAC (Media Access Control)
2. PCS (Physical Coding Sublayer)
3. PMA (Physical Media Attachment Sublayer).
MAC comprises the state machines for the Link Training and Status State Machine (LTSSM) and
lane-to-lane de-skew.
PCS comprises the 8b/10b encoder/decoder, Rx detection, and the elastic buffer.
PMA comprises analog buffers, SerDes, and a 10-bit interface.
Transmitter:
Scrambler:
• In telecommunications, a scrambler is a device that transposes, or inverts signals or
otherwise encodes a message at the sender's side to make the message unintelligible at a
receiver not equipped with an appropriately set descrambling device.
• Whereas encryption usually refers to operations carried out in the digital domain,
scrambling usually refers to operations carried out in the analog domain.
• Scrambling is used for energy dispersal on the carrier, reducing inter-carrier signal
interference. It eliminates the dependence of a signal's power spectrum upon the actual
transmitted data, making it more dispersed to meet maximum power spectral density requirements.
• This matters because, if the power is concentrated in a narrow frequency band, it can
interfere with adjacent channels due to intermodulation (also known as cross-modulation)
caused by non-linearities of the receive chain.
• A randomizer is an algorithm that converts an input string into a seemingly random output
string of the same length (e.g., by pseudo-randomly selecting bits to invert), thus avoiding
long sequences of bits of the same value; in this context, a randomizer is also referred to
as a scrambler.
• A random generator is an analog or digital source of unpredictable (i.e., high entropy),
unbiased, and usually independent (i.e., random) output bits. A "truly" random generator may
be used to feed a (more practical) deterministic pseudo-random number generator, which
extends the random seed value.
• Scramblers are essential components of the physical layer. They are usually defined based on
linear-feedback shift registers (LFSRs) due to their good statistical properties and ease of
implementation in hardware.
• PCI Express employs a technique called data scrambling to reduce the possibility of
electrical resonances on the link.
• PCI Express specification defines a scrambling/descrambling algorithm that is
implemented using a linear feedback shift register.
• PCI Express accomplishes scrambling or descrambling by performing a serial XOR
operation on the data with the seed output of a Linear Feedback Shift Register that is
synchronized between PCI Express devices.
• In the current work, the polynomial used for the scrambler is x^5 + x^3 + 1.
Note: In the polynomial expression, the exponents 3 and 5 indicate that the feedback XOR gate
takes its inputs from the outputs of the 3rd and 5th D flip-flops.
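The scrambling step described above can be sketched in software. This is only an illustrative model, not the RTL of this work: the seed value (0x1F), the shift direction, the choice of the 5th flip-flop as the output, and the LSB-first bit order are all assumptions made for the example, and the names lfsr_step, scramble, and lfsr_period are hypothetical.

```python
def lfsr_step(state):
    """Advance a 5-bit LFSR by one clock; return (new_state, output_bit).

    Bit i of `state` (0-indexed) models the output of D flip-flop i+1.
    """
    out = (state >> 4) & 1                         # output of the 5th flip-flop
    feedback = ((state >> 2) ^ (state >> 4)) & 1   # XOR of 3rd and 5th flip-flop outputs
    new_state = ((state << 1) | feedback) & 0x1F   # shift and insert the feedback bit
    return new_state, out


def scramble(data, seed=0x1F):
    """XOR every data bit (LSB first, an assumed order) with the LFSR output."""
    state = seed                                   # assumed seed, must be non-zero
    result = bytearray()
    for byte in data:
        scrambled = 0
        for bit in range(8):
            state, key = lfsr_step(state)
            scrambled |= (((byte >> bit) & 1) ^ key) << bit
        result.append(scrambled)
    return bytes(result)


def lfsr_period(seed=0x1F):
    """Count the clocks needed for the LFSR to return to `seed`."""
    state, steps = lfsr_step(seed)[0], 1
    while state != seed:
        state, _ = lfsr_step(state)
        steps += 1
    return steps
```

Because XOR is its own inverse, calling scramble a second time with the same seed restores the original bytes, which is why a de-scrambler can reuse the same circuit as long as both LFSRs start from the same state.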
8b/10b Encoder:
• The scrambled 8-bit data is given to the encoder.
• Each Lane of a device's transmitter implements an 8-bit to 10-bit Encoder that encodes 8-
bit data or control characters into 10-bit symbols. The coding scheme was invented by IBM
in 1982.
Purpose of Encoding a Character Stream:
• The primary purpose of this scheme is to embed a clock into the serial bit stream
transmitted on all Lanes. No clock is therefore transmitted along with the serial data bit
stream.
• This eliminates the need for a high-frequency 2.5 GHz clock signal on the Link, which
would generate significant EMI noise and would be a challenge to route on a standard
FR4 board.
• Link wire routing between two ports is much easier given that there is no clock to route,
removing the need to match clock length to Lane signal trace lengths. Two devices are
connected by simply wiring their Lanes together.
• Embedded Clock. The encoding creates sufficient 0-to-1 and 1-to-0 transition density
(i.e., signal changes) to facilitate re-creation of the receive clock on the receiver
end using a PLL (by guaranteeing a limited run length of consecutive ones or
zeros). The recovered receive clock is used to clock inbound 10-bit symbols
into an elastic buffer. The figure illustrates the example case wherein 00h is
converted to 1101000110b, where an 8-bit character with no transitions has
5 transitions when converted to a 10-bit symbol. These transitions keep the
receiver PLL synchronized to the transmit circuit clock:
i) Limited 'run length' means that the encoding scheme ensures the signal line
will not remain in a high or low state for an extended period. The run length
does not exceed five consecutive 1s or 0s.
ii) 1s and 0s are clocked out on the rising edge of the transmit clock. At the
receiver, a PLL can recreate the clock by syncing to the leading edges of 1s
and 0s.
iii) Limited run length ensures minimum frequency drift in the receiver's PLL
relative to the local clock in the transmit circuit.
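The transition count and run length mentioned above can be checked with a small sketch. The helper names count_transitions and max_run_length are hypothetical, and symbols are represented as strings of '0' and '1' purely for illustration.

```python
def count_transitions(bits):
    """Number of 0-to-1 and 1-to-0 changes between adjacent bits."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def max_run_length(bits):
    """Length of the longest run of identical consecutive bits."""
    if not bits:
        return 0
    longest = current = 1
    for a, b in zip(bits, bits[1:]):
        current = current + 1 if a == b else 1
        longest = max(longest, current)
    return longest


unencoded = "00000000"     # character 00h before encoding
encoded = "1101000110"     # the 10-bit symbol quoted in the text
```

For the example in the text, the unencoded character has no transitions, while the 10-bit symbol has five and never holds the line at one level for more than three bit times.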
The disadvantage of the 8b/10b encoding scheme is that each 8-bit character is expanded
into a 10-bit symbol prior to transmission, so 2 of every 10 bits on the Link are coding
overhead. This is a 25% overhead relative to the 8 payload bits or, said another way, a
20% reduction in usable bandwidth.
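The overhead arithmetic can be written out as a short sketch. The 2.5 Gb/s raw bit rate used here is an assumption taken from the 2.5 GHz figure quoted earlier, and the function name effective_rate_gbps is hypothetical.

```python
PAYLOAD_BITS = 8     # bits of the original character
SYMBOL_BITS = 10     # bits actually sent on the Link


def effective_rate_gbps(raw_rate_gbps):
    """Usable data rate once the 8b/10b coding bits are removed."""
    return raw_rate_gbps * PAYLOAD_BITS / SYMBOL_BITS


overhead_vs_payload = (SYMBOL_BITS - PAYLOAD_BITS) / PAYLOAD_BITS   # extra bits per payload bit
payload_fraction = PAYLOAD_BITS / SYMBOL_BITS                       # share of Link bits that is data
usable = effective_rate_gbps(2.5)                                   # assumed raw rate of 2.5 Gb/s
```

Under that assumption, a Lane with a raw rate of 2.5 Gb/s carries 2.0 Gb/s of character data.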
• Each 10-bit symbol is subdivided into two sub-blocks: the first is six bits wide
and the second is four bits wide.
8b/10b conversion lookup tables refer to all 8-bit characters using a special notation
(represented by Dxx.y for Data characters and Kxx.y for Control characters). The figure
illustrates the notation equivalent for any 8-bit D or K character. Below are the
steps to convert an 8-bit number to its notation equivalent.
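A minimal sketch of the notation conversion, assuming the usual split in which xx is the decimal value of the lower five bits (EDCBA) and y is the decimal value of the upper three bits (HGF), as the lookup table headers suggest. The function name char_name is hypothetical.

```python
def char_name(value, is_control=False):
    """Return the Dxx.y / Kxx.y name of an 8-bit character."""
    xx = value & 0x1F          # lower five bits, EDCBA
    y = (value >> 5) & 0x07    # upper three bits, HGF
    prefix = "K" if is_control else "D"
    return f"{prefix}{xx}.{y}"
```

For example, 00h maps to D0.0, 4Ah (010 01010b) maps to D10.2, and the control character BCh (101 11100b) maps to K28.5.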
Disparity:
Character disparity refers to the difference between the number of 1s and 0s in a 10-bit symbol:
• When a symbol has more 0s than 1s, the symbol has negative (–) disparity
(e.g., 0101000101b).
• When a symbol has more 1s than 0s, the symbol has positive (+) disparity
(e.g., 1001101110b).
• When a symbol has an equal number of 1s and 0s, the symbol has neutral
disparity (e.g., 0110100101b).
• Each 10-bit symbol contains one of the following numbers of ones and zeros
(not necessarily contiguous):
o Four 0s and six 1s (+ disparity).
o Six 0s and four 1s (– disparity).
o Five 0s and five 1s (neutral disparity).
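The disparity classification above amounts to counting ones and zeros. A small sketch, using the three example symbols from the text (the function name symbol_disparity is hypothetical):

```python
def symbol_disparity(symbol):
    """Classify a symbol (string of '0'/'1') as '+', '-', or 'neutral'."""
    ones = symbol.count("1")
    zeros = symbol.count("0")
    if ones > zeros:
        return "+"
    if ones < zeros:
        return "-"
    return "neutral"
```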
• The Current Running Disparity (CRD) state indicates the balance of 1s and 0s transmitted since link initialization.
• The CRD's initial state (before any characters are transmitted) can be + or –.
• The CRD's current state can be either positive (if more 1s than 0s have been
transmitted) or negative (if more 0s than 1s).
• Each character is converted via a table lookup with the current state of the
CRD factored in.
• As each new character is encoded, the CRD either remains the same (if the
newly generated 10-bit character has neutral disparity) or it flips to the
opposite polarity (if the newly generated character has + or – disparity).
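The CRD update rule in the last bullet can be sketched as follows. The CRD is modelled as the string '+' or '-', and the function name next_crd is hypothetical.

```python
def next_crd(crd, symbol):
    """Return the CRD after `symbol` (string of '0'/'1') has been encoded."""
    ones = symbol.count("1")
    zeros = len(symbol) - ones
    if ones == zeros:
        return crd                        # neutral symbol: CRD unchanged
    return "-" if crd == "+" else "+"     # non-neutral symbol: CRD flips
```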
Refer to the figure below. The encoding is accomplished by performing two table
lookups in parallel.
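• First Table Lookup: Three elements are submitted to a 5-bit to 6-bit table for
a lookup (see Table 4-1 and Table 4-2): the lower five bits of the character (EDCBA),
the D/K# indicator, and the current state of the CRD.
- The table lookup yields the upper 6 bits of the 10-bit symbol (bits abcdei).
• Second Table Lookup: Three elements are submitted to a 3-bit to 4-bit table
for a lookup (see Table 4-3 and Table 4-4): the upper three bits of the character (HGF),
the D/K# indicator, and the current state of the CRD.
- The table lookup yields the lower 4 bits of the 10-bit symbol (bits fghj).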
The 8b/10b encoder computes a new CRD based on the resultant 10-bit symbol and supplies this
CRD for the 8b/10b encode of the next character. If the resultant 10-bit symbol is neutral (i.e., it
has an equal number of 1s and 0s), the polarity of the CRD remains unchanged. If the resultant 10-
bit symbol is + or –, the CRD flips to its opposite state. It is an error if the CRD is currently + or
– and the next 10-bit symbol produced has the same polarity as the CRD (unless the next symbol
has neutral disparity, in which case the CRD remains the same).
The 8b/10b encoder feeds a Parallel-to-Serial converter which clocks 10-bit symbols out in the bit
order 'abcdeifghj' (shown in the figure above).
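The two lookups can be sketched with a deliberately partial table. Only a handful of Data entries from the standard 8b/10b code are included, so this is an illustration rather than a usable encoder. In this sketch the 3b/4b lookup uses the running disparity left by the 6-bit sub-block, which is one common way to organise the logic. The names FIVE_SIX, THREE_FOUR, and encode_data are hypothetical.

```python
# Partial lookup tables: value -> (code for RD-, code for RD+).
FIVE_SIX = {                      # EDCBA -> abcdei
    0: ("100111", "011000"),
    1: ("011101", "100010"),
    2: ("101101", "010010"),
    3: ("110001", "110001"),
}
THREE_FOUR = {                    # HGF -> fghj
    0: ("1011", "0100"),
    1: ("1001", "1001"),
    2: ("0101", "0101"),
    5: ("1010", "1010"),
    6: ("0110", "0110"),
}


def _update_rd(rd, bits):
    """Flip the running disparity unless the sub-block is balanced."""
    if 2 * bits.count("1") == len(bits):
        return rd
    return "+" if rd == "-" else "-"


def encode_data(byte, crd):
    """Encode a Data byte; return (symbol in 'abcdeifghj' order, new CRD).

    Raises KeyError for characters outside the partial tables.
    """
    xx, y = byte & 0x1F, byte >> 5
    abcdei = FIVE_SIX[xx][0 if crd == "-" else 1]
    rd_after_six = _update_rd(crd, abcdei)
    fghj = THREE_FOUR[y][0 if rd_after_six == "-" else 1]
    return abcdei + fghj, _update_rd(rd_after_six, fghj)
```

With a positive CRD, 00h encodes to 0110001011 in 'abcdeifghj' order. Read in the reverse bit order, this is the 1101000110b pattern quoted earlier in the text.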
The Lookup Tables:
The following four tables define the table lookup for the two sub-blocks of 8-bit
Data and Control characters.
5b/6b table columns: Data Byte Name | Unencoded Bits EDCBA | Current RD – abcdei | Current RD + abcdei
3b/4b table columns: Data Byte Name | Unencoded Bits HGF | Current RD – fghj | Current RD + fghj
Serializer:
• A serializer/de-serializer (SerDes) circuit converts parallel data (in other words, multiple
streams of data) into a serial (one-bit) stream of data that is transmitted over a high-speed
connection, such as LVDS, to a receiver that converts the serial stream back to the original
parallel data. A clock system turns the parallel data into a serial stream by taking bits from
the multiple streams and alternating them on the up and down parts of the signal.
• Both the serializer and de-serializer are functional blocks on the transmitting and receiving
chips. The two functional blocks are Parallel In Serial Out (PISO) and the Serial In Parallel
Out (SIPO).
• LVDS (low-voltage differential signaling) has two wires for one bit of data.
• SerDes has emerged as the primary solution in chips where there is a need for fast data
movement and limited I/O, but this technology is becoming significantly more challenging
to work with as speeds continue to rise to offset the massive increase in data.
• Much of the demand for high-speed SerDes comes from large data centers, where the
current state-of-the-art throughput is 100 Gbps. Standards from IEEE and the Optical
Internetworking Forum are defining higher and higher data rates on a single lane, which
allow data to be aggregated to much larger systems. To move SerDes technology to
the next level of performance, one of the major advancements is the adoption of PAM4
signaling above 28 Gbps.
• The serializer converts the 10-bit parallel data obtained from the encoder into serial form.
• This serial data is sent over the link to the receiver side.
Fig. Serializer.
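The parallel-to-serial step can be modelled as a shift register that is loaded with a 10-bit symbol and emptied one bit per clock. This is a behavioural sketch only, with symbols given as strings in 'abcdeifghj' order. The function name serialize is hypothetical.

```python
def serialize(symbols):
    """Flatten 10-bit symbols into a serial bit list, bit 'a' first."""
    stream = []
    for symbol in symbols:
        if len(symbol) != 10:
            raise ValueError("expected a 10-bit symbol")
        shift_reg = [int(b) for b in symbol]   # parallel load
        while shift_reg:
            stream.append(shift_reg.pop(0))    # one bit shifted out per clock
    return stream
```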
Receiver:
De-serializer:
• The basic SerDes function has two blocks: the Parallel in Serial Out (PISO) block or
parallel-to-serial converter, and the Serial In Parallel Out (SIPO) block or serial-to-parallel
converter. Each end of a communication link has a SerDes with these two fundamental
blocks; the PISO block is used for transmission and the SIPO block is used for reception.
• Embedded clock — This serializes the data and the clock into a single stream. One clock
cycle is transmitted first followed by the actual data, creating a periodic rising edge at the
beginning of the data stream.
• 8b/10b SerDes — This maps the data to a 10-bit code right before serializing. The de-
serializer makes use of the reference clock to monitor the recovered clock from the bit
stream.
• Bit interleaved — This multiplexes multiple slower serial data streams into faster
streams, whereas the receiver demultiplexes the faster streams back into multiple slower
streams.
• The serial data from the transmitter arrives at the first block of the receiver, the de-serializer.
• The de-serializer converts the 10-bit serial data into parallel form and sends it to the buffer.
Fig. De-serializer.
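The serial-to-parallel step can be sketched as a shift register that releases a symbol after every tenth bit. The sketch assumes the receiver already knows where symbol boundaries fall, which a real design has to establish separately. The function name deserialize is hypothetical.

```python
def deserialize(bit_stream, width=10):
    """Group a serial bit stream into `width`-bit symbols (as strings)."""
    shift_reg = []
    for bit in bit_stream:
        shift_reg.append(bit)                  # one bit shifted in per clock
        if len(shift_reg) == width:
            yield "".join(str(b) for b in shift_reg)   # parallel output
            shift_reg = []
    # Any trailing bits that do not fill a whole symbol are discarded.
```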
Buffer:
• A buffer can be used as a delay element to overcome synchronization problems.
• Adding a buffer reduces the wire length, which reduces the net capacitance, and hence
delay from source to destination decreases.
• Here, the buffer is used as a delay element to delay the arrival of parallel data from the
de-serializer before giving it to the decoder.
• The parallel data is given to the decoder all at once from the buffer after a delay of
10 clock cycles.
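A fixed delay of 10 clock cycles can be modelled as a delay line. This is a behavioural sketch with a hypothetical class name DelayBuffer, and the idle value returned before the line fills is an assumption.

```python
from collections import deque


class DelayBuffer:
    """Delay line: a value clocked in reappears `depth` clocks later."""

    def __init__(self, depth=10, idle=None):
        # Pre-fill with the idle value so the first outputs are well defined.
        self._line = deque([idle] * depth, maxlen=depth)

    def clock(self, value):
        oldest = self._line[0]
        self._line.append(value)   # maxlen makes the deque drop the oldest entry
        return oldest


buf = DelayBuffer(depth=10)
outputs = [buf.clock(n) for n in range(12)]
```

The first ten outputs are the idle value, and the value clocked in at cycle 0 appears at cycle 10.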
8b/10b Decoder:
Each receiver Lane incorporates a 10b/8b Decoder which is fed from the buffer. The 8b/10b
Decoder uses two lookup tables (the D and K tables) to decode the 10-bit symbol stream into 8-
bit Data (D) or Control (K) characters plus the D/K# signal. The state of the D/K# signal indicates
that the received symbol is:
• A Data (D) character if a match for the received symbol is discovered in the D table.
D/K# is driven High.
• A Control (K) character if a match for the received symbol is discovered in the K table.
D/K# is driven Low.
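The D and K table lookups can be sketched with dictionaries. Only two characters from the standard 8b/10b code are included here (D0.0 and K28.5, each with its two encodings), so the tables are illustrative, not complete. The names D_TABLE, K_TABLE, and decode are hypothetical.

```python
# Partial tables: 10-bit symbol ('abcdeifghj' order) -> 8-bit character.
D_TABLE = {"1001110100": 0x00, "0110001011": 0x00}   # D0.0
K_TABLE = {"0011111010": 0xBC, "1100000101": 0xBC}   # K28.5


def decode(symbol):
    """Return (character, dk) where dk is 1 for Data and 0 for Control."""
    if symbol in D_TABLE:
        return D_TABLE[symbol], 1      # D/K# driven High
    if symbol in K_TABLE:
        return K_TABLE[symbol], 0      # D/K# driven Low
    raise ValueError("code violation: symbol not in D or K table")
```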
Disparity Calculator:
The decoder determines the initial disparity value based on the disparity of the first symbol
received. After the first symbol, once the disparity is initialized in the decoder, it expects the
calculated disparity for each subsequent symbol received to toggle between + and - unless the
symbol received has neutral disparity in which case the disparity remains the same value.
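The receiver-side check described above can be sketched as follows: the first non-neutral symbol initialises the expected disparity, and each later non-neutral symbol must have the opposite sign of the previous one. The function name check_disparity is hypothetical, and resynchronising to the received disparity after an error is an assumption of this sketch.

```python
def check_disparity(symbols):
    """Return the indices of symbols that break the toggling rule."""
    errors = []
    last = None                       # sign of the most recent non-neutral symbol
    for index, symbol in enumerate(symbols):
        ones = symbol.count("1")
        zeros = len(symbol) - ones
        if ones == zeros:
            continue                  # neutral symbol: expectation unchanged
        sign = "+" if ones > zeros else "-"
        if last is not None and sign == last:
            errors.append(index)      # same sign twice in a row
        last = sign                   # assumed: resynchronise to what was received
    return errors
```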
The error detection logic of the 8b/10b Decoder detects errors in the received symbol
stream. It should be noted that it doesn't catch all possible transmission errors. The
specification requires that these errors be detected and reported as a Receiver Error
indication to the Data Link Layer. The two types of errors detected are:
• Code violation errors (i.e., a 10-bit symbol could not be decoded into a valid
8-bit Data or Control character).
• Disparity errors.
There is no automatic hardware error correction for these errors at the Physical Layer.
Code Violations:
Disparity Errors:
• A character that encodes into a 10-bit symbol with disparity other than neutral is encoded
into a 10-bit symbol with polarity opposite to that of the CRD. If the next symbol does
not have neutral disparity and its disparity is the same as the CRD, a disparity error is
detected.
• If two bits in a symbol flip in error, the error may not be detected (and the symbol may
decode into a valid 8-bit character). The error goes undetected at the Physical Layer.
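A small sketch of why a two-bit error can slip past the disparity check: when one bit flips from 0 to 1 and another from 1 to 0, the count of ones is unchanged, so the symbol's disparity is unchanged. Whether the corrupted pattern is also a valid code word is not checked here. The function name flip_bits is hypothetical.

```python
def flip_bits(symbol, positions):
    """Invert the bits of `symbol` (string of '0'/'1') at the given indices."""
    bits = list(symbol)
    for position in positions:
        bits[position] = "1" if bits[position] == "0" else "0"
    return "".join(bits)


original = "0110100101"
corrupted = flip_bits(original, [0, 1])   # one 0->1 flip and one 1->0 flip
```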
When the Physical Layer logic detects an error, it sends a Receiver Error indication
to the Data Link Layer. The specification lists a few of these errors, but it is far from
being an exhaustive error list. It is up to the designer to determine what Physical Layer
errors to detect and report. Some of these errors include:
• 8b/10b Decoder-related disparity errors
• 8b/10b Decoder-related code violation errors
• Elastic Buffer overflow or underflow caused by loss of symbol(s)
• The packet received is not consistent with the packet format rules
De-scrambler:
• The de-scrambler reverses the scrambling applied by the scrambler.
• The de-scrambler circuit is the same as the scrambler circuit.
• The 8-bit decoded input is given to the de-scrambler to recover the original data that was
initially given to the scrambler.
Fig. De-scrambler.