
USB3.1_GP_Documentation

This document is a graduation project report from Ain Shams University focusing on the design and verification of a USB SuperSpeed (USB 3.0) Serializer/Deserializer (SerDes) system. It details the architecture, components, and verification methodologies used to ensure functionality, including the use of SystemVerilog and the Universal Verification Methodology (UVM). The report covers various aspects of the USB PHY, including its physical layer, coding sublayer, and media attachment synthesis, along with extensive testing strategies and results.

Uploaded by

Mohamed Tarek
Copyright © All Rights Reserved

Ain Shams University,

Faculty of Engineering,
Computer and Systems Engineering

Digital Design and Verification


For
SerDes System

Team members
Hassan Khaled Hassan
Abdelmagid Mohamed Abdelmagid
Alaa Salah Abd El-Fattah

Under Supervision of:
Dr. Ahmed M. Zaki
Associate Professor, Ain Shams University, Egypt

Sponsored By:
https://ic-pedia.com
Contents
GitHub Repo
ABSTRACT
Introduction
Universal Serial Bus (USB)
USB PHY with PIPE Schematic
Physical Interface PCIE and USB (PIPE)
Physical Layer (PHY)
PHY Synthesis
Physical Coding SubLayer (PCS) Synthesis
PCS TX
PCS RX
Physical Media Attachment (PMA) Synthesis
PMA TX
PMA RX Synthesis
Common Block
Transmitter Blocks
PCS TX
Gasket TX
Line 8b/10b Encoding
PMA TX
Parallel to Serial
Receiver Blocks
PCS RX
Elastic Buffer
Receiver Status
Comma Detection
PMA RX
Serial to Parallel
Decoder
Gasket RX
CDR
Phase Mixer
Simulink Model for Phase Mixer
PHY Verification
Test Strategy
Waveform
- Generated clock periods based on the data width used
- Gasket block dividing data based on the bus width
- Encoder block encoding data based on the disparity
- PMA_TX P2S serializes the symbol data
- PMA_RX S2P deserializes the symbol data
- Elastic Buffer storing the data
- Decoding the symbol-encoded data into the original data
- Rx_Gasket collecting the data to the Rx output
- Elastic Buffer underflow
- Elastic Buffer overflow
- Elastic Buffer (threshold monitor) add requests - delete requests
- Decoding error
- Decoder disparity error
- Decoding error - Rx_status
- Disparity error - Rx_status
- Underflow - Rx_status
- Data is OK - Rx_status
CLK Periods
TX Verification
TX Test Strategy
Features
CDR (Clock and Data Recovery)
Typical Receiver and Analog CDR
Digital Implementation of CDR
Analogy to Analog Implementation
Digital PLL Architecture
Phase Detectors
Types of Phase Detectors
Hogge Phase Detector
Bang-Bang Phase Detector
Digital Loop Filter
Phase Interpolator
PI using Direct Programming Interface (DPI)
PI Modelling
Clock and Data Recovery Integration
Channel
Channel DPI
FULL System Verification
Tx_Gasket_Env_test
Tx_PMA_Env_test
Rx_S2P_Env_test
Elastic_Buffer_Env_test
Rx_gasket_Env_test
UVM Register Abstraction Layer (RAL)
Encoder_Decoder_Env_test
CDR System Verification
No Channel - No Offset
No Channel - 300 PPM Offset
Channel Attenuation 10 dB
200 PPM phase difference with channel
500 PPM phase difference with channel
800 PPM phase difference with channel
1000 PPM phase difference with channel
Spread Spectrum Clocking (SSC)
Separate Reference Clock With No Spreading (SRNS) - CDR Test
Separate Reference Clock With Independent Spreading (SRIS) - CDR Test
Separate Reference Clock With Independent Spreading (SRIS) and PPM Offset - CDR Test
Regression Testing
Functional Coverage
Code Coverage
Assertion Coverage
Lint Checking With SPYGLASS
SPYGLASS Built-in Checking
SPYGLASS Debugging Capabilities
SpyGlass Design Constraints (SGDC)
Summary for SpyGlass Tool
Formal Verification
Applied formal verification for digital loop filter
Applied formal verification for Bang-Bang phase detector (BBPD)
Applied formal verification for PMA_RX
Applied formal verification for PMA_TX
Applied formal verification for PCS
References
Table of Figures
Figure 1 SuperSpeed partitions
Figure 2 PIPE interface between PHY and MAC
Figure 3 Top view schematic
Figure 4 PHY top
Figure 5 PHY TX, PHY RX
Figure 6 PCS top
Figure 7 PCS TX, PCS RX
Figure 8 PCS TX
Figure 9 PCS RX
Figure 10 PMA top
Figure 11 PMA TX, PMA RX
Figure 12 PMA TX
Figure 13 PMA RX
Figure 14 Common Block
Figure 15 Inside the Common Block
Figure 16 Transmitter Blocks
Figure 17 Gasket TX
Figure 18 Example 1 of encoding and decoding
Figure 19 MAC layer letter notation with encode and decode
Figure 20 Sample of encoded data and special characters
Figure 21 Code mapping
Figure 22 Line encoding
Figure 23 Parallel to serial
Figure 24 Receiver Blocks
Figure 25 Elastic buffer
Figure 26 Elastic buffer synchronous unit
Figure 27 Flow chart for SKP handling
Figure 28 Elastic buffer synthesis
Figure 29 Receiver Status synthesis
Figure 30 Comma detection synthesis
Figure 31 Phase Mixer models
Figure 32 UVM structure
Figure 33 UVM phases
Figure 34 Width = 32 bits, clock generated
Figure 35 Width = 16 bits, clock generated
Figure 36 Width = 8 bits, clock generated
Figure 37 Balance between 0s and 1s
GitHub Repo
https://github.com/HassanKhaled11/SerDes_GP-ICpedia
ABSTRACT
This document is the graduation project report prepared by senior-year students in the Computer and Systems Engineering Department, Faculty of Engineering, Ain Shams University. It discusses USB (Universal Serial Bus) SuperSpeed, also known as USB 3.0, which introduces significantly higher data transfer rates and enhanced capabilities compared to its predecessors. The USB SuperSpeed standard employs serializer and deserializer (SerDes) technology to enable faster communication between USB devices and hosts.

Serializer and deserializer components play a crucial role in USB SuperSpeed: they convert parallel data into serial data during transmission and serial data back into parallel data during reception. This enables efficient use of the available bandwidth and supports high data transfer rates.

This document also covers the functional verification of the USB chip-to-chip layer using SystemVerilog (SV) and the Universal Verification Methodology (UVM), including its phases, hierarchy, and components, to verify the implemented modules and their functionality.
Introduction
A Serializer/Deserializer (SerDes) is a pair of functional blocks commonly used in high-speed communications to compensate for a limited number of inputs/outputs. These blocks convert data between serial and parallel interfaces in each direction. The term "SerDes" generically refers to such interfaces across many technologies and applications. The primary use of a SerDes is to transmit data over a single line or a differential pair, minimizing the number of I/O pins and interconnects. The basic SerDes function is made up of two functional blocks: the Parallel In Serial Out (PISO) block (also called a parallel-to-serial converter) and the Serial In Parallel Out (SIPO) block (also called a serial-to-parallel converter). There are four common SerDes architectures: (1) parallel clock SerDes, (2) embedded clock SerDes, (3) 8b/10b SerDes, and (4) bit-interleaved SerDes.
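To make the PISO/SIPO pair concrete, here is a minimal behavioral sketch in Python (an illustration only, not the project's SystemVerilog RTL). The function names `piso` and `sipo` are hypothetical, and sending the least significant bit first is an assumption made only for this example.

```python
from typing import List


def piso(word: int, width: int) -> List[int]:
    """Parallel In Serial Out: shift a width-bit word out one bit per serial clock.

    The least significant bit is sent first (an assumption for this sketch).
    """
    return [(word >> i) & 1 for i in range(width)]


def sipo(bits: List[int]) -> int:
    """Serial In Parallel Out: reassemble LSB-first serial bits into a parallel word."""
    word = 0
    for i, bit in enumerate(bits):
        word |= (bit & 1) << i
    return word


if __name__ == "__main__":
    frame = 0b1010010101          # any 10-bit parallel word
    line = piso(frame, 10)        # what travels on the single serial lane
    assert sipo(line) == frame    # the receiver recovers the original word
    print(line)
```

The round trip only works because both ends agree on the width and the bit order; in a real link this agreement (word alignment) is what comma detection on the receive side provides.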

Universal Serial Bus (USB)
The PHY interface for the USB SuperSpeed architecture (PIPE) went through many revisions before reaching its current version.

Its revision history starts with an initial draft (0.1, 7/31/02) and progresses through multiple versions. The document undergoes industry review (0.5, 8/16/02), adds operational details (0.6, 10/4/02), and includes timing diagrams (0.7, 11/4/02). Changes are made to the receiver detection sequence (0.8, 11/22/02), and the document becomes stable for implementation (0.9, 12/16/02).

Later updates reflect the 1.0a Base Spec (0.95, 4/25/03) and include multilane suggestions. The specification is declared stable for implementation (1.00, 6/19/03). Version 1.70 (11/6/05) introduces Gen. 2 PIPE, followed by fixes based on feedback (1.81, 12/4/05; 1.86, 2/27/06). Further updates add handling of CLKREQ# (1.87, 9/28/06) and minor editorial changes (1.90, 3/24/07). Version 2.00 (7/21/07) is a stable revision.

Version 2.7 (12/31/07) introduces updates supporting USB specification revision 3.0. Subsequent versions (2.71, 2/21/08; 2.75, 2/8/08; 2.90, 8/11/08) add SKP handling and USB SuperSpeed PHY power management, with version 2.90 reaching stability for implementation. The final version, 3.0 (3/11/09), completes the series. Overall, the revisions bring technical enhancements, compliance with USB standards, and adjustments for the evolving USB SuperSpeed modes.
USB PHY with PIPE Schematic

The SuperSpeed PHY layer is mainly partitioned into the PCS (Physical Coding Sublayer) and the PMA (Physical Media Attachment).

The Common Block contains the PLL, which uses REF_CLK to generate Bit_Rate_CLK as well as all the other clocks used in the PHY.

The PCS contains the 8b/10b encoder/decoder, the elastic buffer, and RX detection, and it spans more than one clock domain.

The PMA contains the serializer, the serial-to-parallel block (deserializer), and the CDR.

Figure 1 Superspeed partitions

Physical Interface PCIE and USB (PIPE)
The PHY Interface for the PCI Express and USB SuperSpeed Architectures (PIPE) is intended to enable the development of functionally equivalent PCI Express and USB SuperSpeed PHYs. Such PHYs can be delivered as discrete ICs or as macrocells for inclusion in ASIC designs. The specification defines a set of PHY functions that must be incorporated in a PIPE-compliant PHY, and it defines a standard interface between such a PHY and a Media Access Layer (MAC) & Link Layer ASIC. It is not the intent of the specification to define the internal architecture or design of a compliant PHY chip or macrocell; PIPE is defined to allow various approaches to be used. Where possible, PIPE references the PCI Express Base Specification or the USB 3.0 Specification rather than repeating their content. In case of conflicts, the PCI Express Base Specification and the USB 3.0 Specification supersede the PIPE specification. PIPE also provides some information about how the MAC could use the PIPE interface for various LTSSM states and link states. This information should be viewed as ‘guidelines for’ or as ‘one way to implement’ base specification requirements; MAC implementations are free to do things in other ways as long as they meet the corresponding specification requirements.

One of the intents of the PIPE specification is to accelerate PCI Express endpoint and USB SuperSpeed device development. It defines an interface to which ASIC and endpoint device vendors can develop. Peripheral and IP vendors can then develop and validate their designs insulated from the high-speed and analog circuitry issues associated with the PCI Express or USB SuperSpeed PHY interfaces, minimizing the time and risk of their development cycles.

Figure 2 PIPE interface between PHY and MAC
Physical Layer (PHY)

Figure 3 Top view Schematic

PHY Synthesis

Figure 4 PHY top

Figure 5 PHY TX, PHY RX


Physical Coding SubLayer (PCS) Synthesis

Figure 6 PCS Top

Figure 7 PCS TX, PCS RX

PCS TX:

Figure 8 PCS TX

PCS RX

Figure 9 PCS RX

Physical Media Attachment (PMA) Synthesis

Figure 10 PMA Top

Figure 11 PMA TX, PMA RX

PMA TX
Name             Direction  Width  Description
Bit_Rate_CLK     Input      1      Serial clock (5 GHz)
Bit_Rate_CLK_10  Input      1      Word clock
Rst_n            Input      1      Active-low reset
Data_in          Input      10     Encoded data
MAC_Data_En      Input      1      PMA enable
TX_Out_P         Output     1      Serial bit sent on the positive lane
TX_Out_N         Output     1      Serial bit sent on the negative lane
Figure 12 PMA TX
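A rough behavioral model of the PMA TX serializer, based on the port list in the table: each 10-bit Data_in symbol is shifted out over ten Bit_Rate_CLK cycles (one Bit_Rate_CLK_10 period), and TX_Out_N always carries the complement of TX_Out_P. The helper name `pma_tx` and the LSB-first bit order are assumptions for illustration; the reset and MAC_Data_En behavior of the real block is not modeled.

```python
from typing import List, Tuple

SYMBOL_BITS = 10  # one encoded symbol (Data_in) per Bit_Rate_CLK_10 period


def pma_tx(symbols: List[int]) -> Tuple[List[int], List[int]]:
    """Serialize 10-bit symbols onto a differential pair.

    Each symbol occupies ten Bit_Rate_CLK cycles; bits are sent LSB first
    (an assumption). TX_Out_N is the complement of TX_Out_P on every cycle.
    """
    tx_out_p: List[int] = []
    for sym in symbols:
        tx_out_p.extend((sym >> i) & 1 for i in range(SYMBOL_BITS))
    tx_out_n = [1 - bit for bit in tx_out_p]
    return tx_out_p, tx_out_n


if __name__ == "__main__":
    p, n = pma_tx([0b1010010101, 0b0011111010])
    print("TX_Out_P:", p)
    print("TX_Out_N:", n)
```

Driving the two lanes with complementary values is what lets the receiver compare P against N instead of against a fixed threshold.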

PMA RX Synthesis

Figure 13 PMA RX

Common Block
The Common Block contains the PLL (Phase-Locked Loop) used in the design of the USB PHY (physical layer). The PLL is a crucial component: it generates stable and precise clock signals, ensures proper synchronization, and helps meet the timing requirements of USB communication.

The PLL includes a clock multiplication function, which allows the USB PHY to derive higher-frequency clocks from a lower-frequency reference clock; this is what makes high data rates achievable.

In this design, the PLL takes the 100 MHz reference clock (Ref_Clk) as input and generates a faster clock from it:

- the serial clock, 5 GHz (Bit_Rate_Clk)

The fast clock is then fed to a clock divider, which reduces it to lower speeds according to Div_ratio:

- the word clock (Bit_Rate_Clk_10), whose period is the duration of 10 serial bits, i.e. 500 MHz
- the PCLK, which depends on the bus width: with a 32-bit bus its period is 4 word-clock periods, with a 16-bit bus it is 2 word-clock periods, and with an 8-bit bus it equals the word clock
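The clock relationships above can be checked with a small calculation. This Python sketch assumes an integer PLL multiplication factor of 50 (100 MHz x 50 = 5 GHz) and a divider that divides Bit_Rate_Clk by 10 for the word clock and by 10 x (DataBusWidth / 8) for PCLK; the function name `common_block_clocks` and the returned `PCLK_div_ratio` key are hypothetical.

```python
REF_CLK_HZ = 100_000_000   # Ref_Clk: 100 MHz reference clock
PLL_MULT = 50              # assumed PLL multiplication factor: 100 MHz x 50 = 5 GHz
SYMBOL_BITS = 10           # one 8b/10b symbol = 10 serial bits


def common_block_clocks(data_bus_width: int) -> dict:
    """Return the clock frequencies (Hz) and PCLK divide ratio for a given DataBusWidth."""
    if data_bus_width not in (8, 16, 32):
        raise ValueError("DataBusWidth must be 8, 16 or 32")
    bit_rate_clk = REF_CLK_HZ * PLL_MULT              # serial clock
    word_div = SYMBOL_BITS                            # 10 serial bits per word clock
    pclk_div = SYMBOL_BITS * (data_bus_width // 8)    # 1, 2 or 4 word clocks per PCLK
    return {
        "Bit_Rate_Clk": bit_rate_clk,
        "Bit_Rate_Clk_10": bit_rate_clk // word_div,
        "PCLK": bit_rate_clk // pclk_div,
        "PCLK_div_ratio": pclk_div,
    }


if __name__ == "__main__":
    for width in (8, 16, 32):
        print(width, common_block_clocks(width))
```

Under these assumptions PCLK comes out at 500 MHz, 250 MHz, and 125 MHz for 8-, 16-, and 32-bit buses, so the MAC always moves the same number of bytes per second regardless of the bus width.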

Common block “top module”

Name             Direction  Width  Description
Ref_Clk          Input      1      Reference clock (100 MHz)
DataBusWidth     Input      6      Data width (8, 16, 32)
Rst_n            Input      1      Active-low reset
Bit_Rate_Clk     Output     1      Serial clock (5 GHz)
Bit_Rate_Clk_10  Output     1      Word clock
PCLK             Output     1      Parallel clock

PLL

Name     Direction  Width  Description
Ref CLK  Input      1      100 MHz clock used to generate the high-rate clocks
CLK      Output     1      5 GHz clock
Clock Divider

Name         Direction  Width  Description
Ref_CLK      Input      1      Serial clock (5 GHz)
Div_ratio    Input      8      Ratio for dividing the high rate down to a lower one
Rst_n        Input      1      Active-low reset
Divided clk  Output     1      Low-rate clock

Figure 14 Common Block

Figure 15 inside Common block

Transmitter Blocks

Figure 16 Transmitter Blocks

PCS TX
Gasket TX:
Data can be input to the module with a width of 8, 16 or 32 bits.

For transmission, however, the data is divided into 8-bit blocks. Each block is encoded to 10 bits and then sent to the receiver. The gasket module is therefore responsible for dividing the input into 8-bit blocks and passing one block per word-clock cycle. In this way, all the 8-bit blocks of a word are transmitted within one PCLK period.

Name             Direction  Width  Description
PCLK             Input      1      Parallel clock
Bit_Rate_CLK_10  Input      1      Word clock
Reset_n          Input      1      Active-low reset
MAC_TX_Data      Input      32     Parallel data sent from the MAC
MAC_Data_En      Input      1      Enable
MAC_TX_DataK     Input      4      Indication from the MAC of whether the data sent is a command or not
DataBusWidth     Input      5      Width of the data sent
TXDataK          Output     1      Flag for command data
TXData           Output     8      Data to be encoded

Figure 17 Gasket Tx
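The byte-splitting behavior of Gasket TX can be sketched as follows. This assumes that the least significant byte (MAC_TX_Data[7:0]) is sent first and that bit i of MAC_TX_DataK flags byte i as a K (command) symbol. Both are assumptions about this design's ordering, and `gasket_tx` is a hypothetical helper.

```python
# Behavioral sketch of Gasket TX (not the RTL).
# Assumption: byte 0 (bits [7:0]) is sent first, and bit i of
# MAC_TX_DataK marks byte i as a command (K) symbol.

def gasket_tx(mac_tx_data, mac_tx_datak, bus_width):
    """Split one PCLK word into (TXData, TXDataK) pairs, one per word-clock cycle."""
    if bus_width not in (8, 16, 32):
        raise ValueError("DataBusWidth must be 8, 16 or 32")
    out = []
    for i in range(bus_width // 8):
        tx_data = (mac_tx_data >> (8 * i)) & 0xFF   # byte i of the word
        tx_datak = (mac_tx_datak >> i) & 0x1        # matching command flag
        out.append((tx_data, tx_datak))
    return out

# A 32-bit word leaves the gasket as four bytes over four word-clock cycles.
print([(hex(d), k) for d, k in gasket_tx(0x6E162325, 0b0000, 32)])
# [('0x25', 0), ('0x23', 0), ('0x16', 0), ('0x6e', 0)]
```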
Line 8b/10b Encoding
The 8b/10b line coding scheme is used to encode data for transmission over a communication channel. It is designed to ensure reliable data transmission and provides properties such as DC balance (an equal number of 0s and 1s) and error detection.

10-Bit Symbols: In this scheme, each 8-bit byte is encoded into a 10-bit symbol. Each symbol represents either a data byte or a control character. The two extra bits add 25% overhead. In exchange, the line code guarantees DC balance, a bounded run length and enough transitions for clock recovery, which a straightforward binary representation cannot.

Notation "D05.2": The "D05.2" notation names a specific character within the 8b/10b scheme. Let's break it down:

• "D" indicates that it is a data character (a control character uses "K").
• "05" is the decimal value of the five least significant bits (EDCBA) of the byte. In binary, 5 is "00101".
• ".2" is the decimal value of the three most significant bits (HGF). In binary, 2 is "010".

One Byte Representation: "D05.2" therefore names exactly one byte, 010 00101 = 0x45. The mapping from that byte to the line is what is not one-to-one. Each byte has up to two 10-bit encodings, and the encoder chooses between them based on the running disparity so that 0s and 1s stay balanced for reliable transmission.
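The Dx.y naming rule can be expressed directly: x is the low five bits and y is the high three bits, both written in decimal. The function name `symbol_name` is hypothetical. The two-digit x follows this document's "D05.2" style.

```python
def symbol_name(byte, is_control=False):
    """Return the Dx.y / Kx.y name of an 8-bit value (x = EDCBA, y = HGF)."""
    x = byte & 0x1F          # five least significant bits EDCBA
    y = (byte >> 5) & 0x7    # three most significant bits HGF
    prefix = "K" if is_control else "D"
    return f"{prefix}{x:02d}.{y}"

print(symbol_name(0x45))                    # D05.2
print(symbol_name(0xBC, is_control=True))   # K28.5 (the comma byte used in the tests)
```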

Figure 18 example1 of encoding, decoding

The Gen 1 PHY uses the 8b/10b transmission code. The PHY transmits information using an adaptive 8b/10b code that bounds the maximum run length of the code.

There are two types of transmission characters: data and special. Ordered sets are combinations of transmission characters.

The MAC layer uses letter notation to describe the information bits and the control variable, for example the special symbol K28.5. This notation uses a letter and decimal numbers rather than binary, as explained below.

Each information bit can hold only the value zero or one. The control variable holds either the value D, meaning “valid data byte”, or the value K, meaning “special code”.

Figure 19 MAC layer letter notation with encode, decode

• Each 10-bit encoded symbol contains six 1’s and four 0’s, six 0’s and four 1’s, or five 1’s and five
0’s. Symbol encodings with more than six bits of one polarity are not valid.

• The D/K bit accompanying each byte into the encoder ensures that control (K) symbols and data (D) symbols form unique sets. Even if a data byte and a control byte have the same numeric value, their symbol encodings are different.

Figure 20 sample of encoded data and special character

DC Balance and run length

A DC-balanced serial data stream has the same number of 0s and 1s over a given length of the stream. DC balance is important for certain media because it avoids a charge being built up.

The run length is defined as the maximum number of contiguous 0s or 1s in the serial data stream. A data stream with a small run length provides data transitions within a short span of data, and these transitions are essential for clock recovery.
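Both properties can be measured on a bit string. The sketch below computes the run length and the running imbalance (ones minus zeros) for two back-to-back K28.5 symbols. It uses the standard RD- and RD+ encodings, written in transmission order. `stream_stats` is a hypothetical helper.

```python
def stream_stats(bits):
    """Return (max_run_length, ones_minus_zeros) for a string of '0'/'1'."""
    max_run = run = 0
    prev = None
    for b in bits:
        run = run + 1 if b == prev else 1   # extend or restart the current run
        max_run = max(max_run, run)
        prev = b
    disparity = bits.count("1") - bits.count("0")
    return max_run, disparity

# K28.5 (RD- form) followed by K28.5 (RD+ form): balanced, longest run is 5.
print(stream_stats("0011111010" + "1100000101"))   # (5, 0)
```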

The PLL of the CDR generates a phase-adjustable output clock from the reference clock input.
Transitions on the serial data stream provide the transmission clock phase information to the PLL
and allow the PLL to recover the transmission clock with the correct phase. Note that the reference
clock input is always necessary for the CDR. The serial data stream embeds the phase of the
transmission clock, not the clock itself. This reference clock comes from the receiver system, not
the transmitter system.

Code Mapping

The coding scheme breaks the original 8-bit data into two blocks: the 3 most significant bits (y) and the 5 least significant bits (x). From the most significant bit to the least significant bit, they are named H, G, F and E, D, C, B, A. The 3-bit block is encoded into 4 bits named j, h, g, f. The 5-bit block is encoded into 6 bits named i, e, d, c, b, a. As seen in Figure 21, the 4-bit and 6-bit blocks are then combined into a 10-bit encoded value.

Figure 21 code mapping

Running Disparity

• The 8b/10b encoding scheme supports two possible encoded values for each data and control byte. This is because transmitters are required to ensure that the outbound serial data contains, over time, an equal (balanced) number of 0’s and 1’s. The ongoing difference in the number of transmitted bits of each polarity is called the running disparity. Tracking the current running disparity (CRD) and correcting any imbalance is critical on the AC-coupled SuperSpeed link. It is also one of the important motivations for 8b/10b encoding.

• Each symbol out of the 8b/10b encoder has one of the following 10-bit properties:

• It is comprised of six 1’s and four 0’s (a positive disparity) OR

• It is comprised of six 0’s and four 1’s (a negative disparity) OR

• It is comprised of five 1’s and five 0’s (a neutral disparity)
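The three allowed properties translate into a simple classifier. The function name is hypothetical, and the example symbols are the standard K28.5 and D05.2 encodings.

```python
def symbol_disparity(symbol):
    """Classify a 10-bit symbol (string of '0'/'1') by its disparity."""
    if len(symbol) != 10 or set(symbol) - {"0", "1"}:
        raise ValueError("expected a 10-character bit string")
    ones = symbol.count("1")
    if ones == 6:
        return "positive"      # six 1's and four 0's
    if ones == 4:
        return "negative"      # six 0's and four 1's
    if ones == 5:
        return "neutral"       # five 1's and five 0's
    return "invalid"           # more than six bits of one polarity

print(symbol_disparity("0011111010"))  # positive (K28.5, form sent when RD is negative)
print(symbol_disparity("1100000101"))  # negative (K28.5, form sent when RD is positive)
print(symbol_disparity("1010010101"))  # neutral  (D05.2)
```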

Encoder “Top Block”

Name         Direction  Width  Description
MAC_Data_En  Input      1      Enables the encoder
TXDataK      Input      1      Defines whether the data is a command or actual data
Data         Input      8      Data from the MAC
Bit_Rate_10  Input      1      Clock for sending 10-bit symbols to the PMA
Rst          Input      1      Active-low reset
Data_Out     Output     10     Data sent to the PMA

Line_encoding

Name              Direction  Width  Description
Enable            Input      1      Enables the encoder
TXDataK           Input      1      Defines whether the data is a command or actual data
Data              Input      8      Data to be encoded
Encoded_data_pos  Output     10     Encoded data (RD+ form)
Encoded_data_neg  Output     10     Encoded data (RD- form)

FSM_RD “Running Disparity”

Name         Direction  Width  Description
Enable       Input      1      Enables the encoder
TXDataK      Input      1      Defines whether the data is a command or actual data
Data_neg     Input      10     RD (-) data
Data_pos     Input      10     RD (+) data
Bit_Rate_10  Input      1      Clock for sending 10-bit symbols to the PMA
Rst          Input      1      Active-low reset
Data_10      Output     10     Current running disparity data

Figure 22 line encoding
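The selection performed by FSM_RD can be modeled as a two-state machine. This is a simplified sketch, not the RTL, under these assumptions:

- the CRD starts negative after reset;
- the CRD is updated from the whole 10-bit symbol rather than per sub-block;
- the class name is hypothetical.

```python
class RunningDisparityFSM:
    """Pick Data_neg or Data_pos from the current running disparity (CRD)."""

    def __init__(self):
        self.crd_negative = True          # assumption: CRD is negative after reset

    def step(self, data_neg, data_pos):
        """Return the 10-bit symbol to send and update the CRD."""
        symbol = data_neg if self.crd_negative else data_pos
        ones = symbol.count("1")
        if ones == 6:
            self.crd_negative = False     # positive disparity: CRD becomes +
        elif ones == 4:
            self.crd_negative = True      # negative disparity: CRD becomes -
        # neutral symbol (five 1's): CRD unchanged
        return symbol

fsm = RunningDisparityFSM()
K28_5 = ("0011111010", "1100000101")      # (RD- form, RD+ form)
print([fsm.step(*K28_5) for _ in range(3)])
# ['0011111010', '1100000101', '0011111010']
```

Sending K28.5 repeatedly therefore alternates between its two forms, which keeps the line balanced.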

PMA TX
Parallel to serial:
USB (Universal Serial Bus) transmits data to the receiver serially. Transmitting data serially instead of in parallel has several advantages:

• Reduced wiring complexity: transmitting data in parallel requires multiple wires, unlike sending data serially.
• Improved signal integrity: parallel communication over multiple wires suffers from signal skew, where bits arrive at slightly different times because of variations in wire lengths. Serial communication, being a single stream of bits, is less susceptible to these timing discrepancies.
• Longer transmission distances: serial communication is often more suitable for long-distance links, because differential signaling (as used in USB) reduces the impact of noise and interference over longer cables.

The parallel data is received at a relatively slow rate and transmitted at a faster one: the serial bit rate is 10 times the parallel (word) rate.

Figure 23 parallel to serial
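The serializer shifts out each 10-bit symbol over 10 bit-clock cycles, LSB first, and drives complementary values on TX_Out_P and TX_Out_N. The sketch below assumes that bit 0 of the parallel symbol holds the first transmitted bit 'a'. The helper names are hypothetical.

```python
def serialize(symbol10):
    """Parallel-to-serial: return the 10 bits of an integer symbol, LSB first."""
    return [(symbol10 >> i) & 1 for i in range(10)]

def differential(bit):
    """Return (TX_Out_P, TX_Out_N) for one serial bit."""
    return bit, 1 - bit

bits = serialize(0b0101111100)            # K28.5 (RD-) with 'a' stored in bit 0
print(bits)                               # [0, 0, 1, 1, 1, 1, 1, 0, 1, 0]
print([differential(b) for b in bits[:3]])  # [(0, 1), (0, 1), (1, 0)]
```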

Receiver Blocks

Figure 24 Receiver Blocks

PCS RX
Elastic Buffer:
Introduction

Elastic Buffers (also known as Elasticity Buffers, Receiver Synchronization Buffers, and Elastic Stores) are used to ensure data integrity when bridging two different clock domains. This buffer is simply a FIFO (First-In-First-Out). Data is deposited at a rate based on one clock and removed at a rate derived from a different clock. Because these two clocks could (and almost always do) have minor frequency differences, the FIFO can eventually overflow or underflow. To avoid this situation, an Elastic Buffer can insert or remove special symbols during specified intervals, which allows the buffer to compensate for the clock differences.

[Figure: transmitter and receiver, with the elastic buffer bridging the recovered clock domain and the local clock domain]

The first extreme clock difference happens when the TX clock is at -300 ppm, the RX clock is at +300 ppm, and spread-spectrum clocking (SSC) adds -5000 ppm. The total mismatch is 5600 ppm, so every ≈178 symbols (1M/5600) the transmitter falls behind by one symbol. If this is not addressed, the receiver encounters data underflow (no symbols to process).

The other extreme clock difference happens when the TX clock is at +300 ppm, the RX clock is at -300 ppm, and SSC is 0 ppm. The total mismatch is 600 ppm, so every ≈1666 symbols (1M/600) the transmitter has sent one extra symbol. If this is not addressed, the receiver encounters data overflow (a dropped symbol during processing).

The elastic buffer is crucial for handling this mismatch and for avoiding timing errors between the read and write operations, which may lead to data loss. It is mainly an asynchronous FIFO with additional functionality: it handles SKP symbols to avoid losing data. Normally the system clock and the receive clock used in the elastic buffer should be the same (5G). However, they can differ slightly, with deviations ranging from -5300 ppm to +300 ppm.
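The symbol counts quoted above follow from dividing one million by the total mismatch in ppm. The sketch reproduces them from the stated totals (5600 ppm and 600 ppm). `symbols_per_slip` is a hypothetical helper.

```python
def symbols_per_slip(mismatch_ppm):
    """Symbols after which one whole symbol is gained or lost (1e6 / ppm)."""
    return 1e6 / abs(mismatch_ppm)

worst_underflow_ppm = 300 + 300 + 5000   # TX -300, RX +300, SSC -5000 -> 5600 ppm
worst_overflow_ppm = 300 + 300           # TX +300, RX -300, no SSC   -> 600 ppm
print(round(symbols_per_slip(worst_underflow_ppm), 1))  # 178.6
print(round(symbols_per_slip(worst_overflow_ppm), 1))   # 1666.7
```

The elastic buffer and the SKP insertion rate must therefore absorb one symbol of drift roughly every 178 symbols in the worst case.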

The implemented elastic buffer uses the nominal half-full scheme: the buffer should always be kept half full, and SKP symbols are added or removed to keep it there.

Adding SKP to keep the buffer at nominal half full

Removing SKP to keep the buffer at nominal half full

Elastic Buffer Design:

[Figure: write pointer control unit, memory unit and read pointer control unit, with the synchronous unit and threshold monitor]

Figure 25 elastic buffer


The elastic buffer is designed around 5 main blocks:

• Memory unit
It is mainly a FIFO where data is stored and read. The stored elements are the received 10-bit symbols, which can be data or commands such as SKP. The read and write positions are given by the read and write pointers received from the other blocks.
• Write pointer control unit
This module produces the binary and gray-coded write pointer. The pointer selects the correct address of the memory unit, taking into account whether a SKP needs to be deleted. The unit generates the overflow signal by comparing the read and write addresses. The gray code of the read pointer is synchronized to the recovered clock domain and compared with the gray code of the write pointer, and the unit then produces the full flag.
• Read pointer control unit
This module produces the binary and gray-coded read pointer, which is needed to read from the correct address in the memory unit. It also generates the SKP add request after comparing with the write pointer. The gray code of the write pointer is synchronized to the local clock domain and compared with the gray code of the read pointer, and the unit then produces the empty flag.
• Threshold unit
This module checks whether the buffer fill level exceeds half of the array size. If the number of stored elements is greater than 8, a SKP remove request is made.
• Synchronous unit
This module synchronizes the gray codes of the write and read pointers into the opposite clock domains. The read and write pointer control units then check them and produce the empty and full flags.
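The half-full policy can be sketched as a threshold check on the buffer occupancy. This sketch makes several assumptions:

- the buffer is 16 entries deep, so half full is 8 (consistent with the threshold of 8 above);
- the pointers carry one extra wrap bit;
- an add request is raised whenever occupancy drops below half.

The function names are hypothetical. In hardware, a remove can only be acted on when a SKP actually arrives.

```python
DEPTH = 16            # assumption: 16-entry buffer, so half full = 8
HALF = DEPTH // 2

def occupancy(wr_ptr, rd_ptr, depth=DEPTH):
    """Number of stored symbols, from pointers that carry one extra wrap bit."""
    return (wr_ptr - rd_ptr) % (2 * depth)

def skp_request(wr_ptr, rd_ptr):
    """Return 'remove', 'add' or None to steer the buffer back to half full."""
    level = occupancy(wr_ptr, rd_ptr)
    if level > HALF:
        return "remove"    # too full: drop a SKP on the write side
    if level < HALF:
        return "add"       # too empty: insert a SKP on the read side
    return None            # exactly half full: no action

print(skp_request(10, 0), skp_request(5, 0), skp_request(2, 26))  # remove add None
```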

[Figure: the gray-coded write and read pointers each pass through a two-flip-flop (FF FF) synchronizer between the write pointer control unit and the read pointer control unit]

Figure 26 elastic buffer synchronous unit

Gray code is used because consecutive values differ in only one bit. Combining gray-coded pointers with synchronizers avoids errors when crossing between the two clock domains, the recovered clock domain and the local clock domain.
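The single-bit-change property is what makes gray-coded pointers safe to synchronize. The sketch converts between binary and gray code and checks every transition of a 5-bit pointer, including the wrap from the last value back to 0. The 5-bit width (4 address bits plus a wrap bit for a 16-entry buffer) is an assumption for illustration.

```python
def bin_to_gray(n):
    return n ^ (n >> 1)

def gray_to_bin(g):
    n = 0
    while g:          # XOR together all right-shifts of g
        n ^= g
        g >>= 1
    return n

WIDTH = 5   # assumption: 4 address bits + 1 wrap bit
for i in range(2 ** WIDTH):
    a = bin_to_gray(i)
    b = bin_to_gray((i + 1) % 2 ** WIDTH)
    assert bin(a ^ b).count("1") == 1   # consecutive values differ in one bit
    assert gray_to_bin(a) == i          # conversion round-trips
print("all transitions change exactly one bit")
```

If a synchronizer samples a gray pointer mid-change, it sees either the old or the new value. It never sees an unrelated value.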

Flow charts for SKP Delete and SKP add Operations

Figure 27 flow chart for SKP Handling


Figure 28 elastic buffer synthesis

Receiver Status:
This module gives feedback to the MAC with the current status according to the input flags such as
overflow, underflow, disparity error and decode error.

RxStatus[2:0]  Description
000            Received data OK
001            USB SuperSpeed mode: 1 SKP ordered set added
010            USB SuperSpeed mode: 1 SKP ordered set removed
011            Receiver detected
100            8b/10b decode error and (optionally) receive disparity error
101            Elastic buffer overflow
110            Elastic buffer underflow (unused if the elastic buffer is operating in nominal buffer empty mode)
111            Receive disparity error (reserved if receive disparity error is reported with code 0b100)

Figure 29 Receiver Status Synthesis
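When several events occur at once, the module must report a single code. The sketch encodes the table above with a priority order, highest first. That order is a hypothetical choice for illustration and should be checked against the PIPE specification before reuse. The flag names are also hypothetical, and "receiver detected" (011) is not modeled.

```python
# Hypothetical priority order, highest first; verify against the PIPE spec.
RX_STATUS_PRIORITY = [
    ("decode_error",    0b100),
    ("overflow",        0b101),
    ("underflow",       0b110),
    ("disparity_error", 0b111),
    ("skp_added",       0b001),
    ("skp_removed",     0b010),
]

def rx_status(**flags):
    """Return the 3-bit RxStatus code for the given event flags."""
    for name, code in RX_STATUS_PRIORITY:
        if flags.get(name, False):
            return code
    return 0b000    # received data OK

print(format(rx_status(skp_added=True), "03b"))                          # 001
print(format(rx_status(decode_error=True, disparity_error=True), "03b"))  # 100
```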

Comma Detection:
The module checks every 10 bits of received data and raises a flag when a comma is detected. The data that follows the comma can then be read and stored in the elastic buffer. This aligns the data: once a comma is received and the flag is raised, the start and end of each 10-bit symbol are known.

Name          Direction  Width  Description
CLK           Input      1      Bit rate clock (5 GHz)
Rst_n         Input      1      Active-low reset
Detect_Comma  Input      10     Data collected serially
RxValid       Output     1      Symbol lock indication (comma detected)
Comma_pulse   Output     1      Write pulse to write into the buffer

Figure 30 Comma detection Synthesis
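Comma alignment amounts to sliding a 10-bit window over the received bits until it matches either form of K28.5, then cutting symbols from that offset. The bit strings are in transmission order and use the standard K28.5 encodings. The helpers are hypothetical.

```python
COMMA_PATTERNS = ("0011111010", "1100000101")   # K28.5 RD- / RD+, transmission order

def find_comma(bitstream):
    """Return the bit offset of the first K28.5 in a received bit string, or -1."""
    for offset in range(len(bitstream) - 9):
        if bitstream[offset:offset + 10] in COMMA_PATTERNS:
            return offset
    return -1

def align(bitstream):
    """Cut the stream into 10-bit symbols starting at the first comma."""
    start = find_comma(bitstream)
    if start < 0:
        return []
    body = bitstream[start:]
    return [body[i:i + 10] for i in range(0, len(body) - 9, 10)]

rx = "101" + "0011111010" + "1010010101"   # 3 stray bits, K28.5, D05.2
print(find_comma(rx))   # 3
print(align(rx))        # ['0011111010', '1010010101']
```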

PMA RX
Serial To parallel
After the serial data is received from the channel, it is converted back to parallel so it can be decoded. Every 10 received bits are collected into one parallel word.

Name               Direction  Width  Description
Recovered_Bit_Clk  Input      1      Recovered clock
Rst_n              Input      1      Active-low reset
Ser_in             Input      1      Recovered serial bit from the CDR
RXPolarity         Input      1      Indication of whether to invert the serial bit
Data_to_Decoder    Output     10     Parallel data sent to the decoder
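The deserializer reverses the transmit order: the first bit received becomes bit 0. When RXPolarity is set, each bit is inverted first, for example to correct swapped P/N lanes. This is a behavioral sketch with a hypothetical helper name.

```python
def deserialize(serial_bits, rx_polarity=0):
    """Serial-to-parallel: gather 10 bits (first received -> bit 0) per symbol.
    If rx_polarity is 1, every received bit is inverted first."""
    symbols = []
    for i in range(0, len(serial_bits) - 9, 10):
        word = 0
        for k, bit in enumerate(serial_bits[i:i + 10]):
            word |= (bit ^ rx_polarity) << k
        symbols.append(word)
    return symbols

tx_bits = [(0b0101111100 >> i) & 1 for i in range(10)]   # LSB-first serial stream
print(bin(deserialize(tx_bits)[0]))                      # 0b101111100
inverted = [b ^ 1 for b in tx_bits]                      # e.g. swapped P/N lanes
print(deserialize(inverted, rx_polarity=1) == [0b0101111100])   # True
```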

Decoder
After the data is received as 10-bit symbols, the decoder maps these bits back to the original 8-bit data. It also determines whether the data is actual data or a command, according to which table the matching 8-bit value was found in.

The decoder also detects errors, which can be either a decode error or a disparity error.

A decode error occurs when the received symbol is found in neither the positive nor the negative encoding table. A disparity error occurs when the symbol was expected in its positive encoding but was received in its negative encoding, or vice versa.

Name            Direction  Width  Description
CLK             Input      1      Word clock (500 MHz)
Rst_n           Input      1      Active-low reset
Data_in         Input      10     Encoded data
Data_out        Output     8      Data recovered by the decoder
DecodeError     Output     1      Error in decoding
DisparityError  Output     1      Error in data disparity
RxDataK         Output     1      Indication of whether the data is actual data or a command
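The two error checks can be sketched with a tiny decode table. The table below holds only K28.5 and D05.2, which is enough to illustrate the logic. The real table has all 256 data values plus the control codes. A symbol with six 1's is only legal while the running disparity is negative, and a symbol with four 1's only while it is positive. The function name and return layout are hypothetical.

```python
# Illustrative subset of the decode table (10-bit codes in transmission order).
DECODE_TABLE = {
    "0011111010": (0xBC, 1),   # K28.5, RD- form
    "1100000101": (0xBC, 1),   # K28.5, RD+ form
    "1010010101": (0x45, 0),   # D05.2, same code for both disparities
}

def decode(symbol, rd_negative):
    """Return (data, rx_datak, decode_error, disparity_error, next_rd_negative)."""
    if symbol not in DECODE_TABLE:
        return 0, 0, True, False, rd_negative      # unknown code: decode error
    data, k = DECODE_TABLE[symbol]
    ones = symbol.count("1")
    # Six 1's while RD is already positive, or four 1's while RD is
    # already negative, means the disparity did not alternate.
    disparity_error = (ones == 6 and not rd_negative) or (ones == 4 and rd_negative)
    if ones == 6:
        rd_negative = False
    elif ones == 4:
        rd_negative = True
    return data, k, False, disparity_error, rd_negative

print(decode("0011111010", rd_negative=True))    # (188, 1, False, False, False)
print(decode("0011111010", rd_negative=False))   # (188, 1, False, True, False)
```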

Gasket Rx:
In the receiver, after the 10-bit symbols are received and decoded into 8 bits, the gasket collects the 8-bit blocks according to the configured width.

So the output becomes 8, 16 or 32 bits wide.

Name      Direction  Width  Description
Word_Clk  Input      1      Word clock (500 MHz)
Rst_n     Input      1      Active-low reset
PCLK      Input      1      Parallel data clock
Width     Input      6      Width of data (8, 16, 32)
Data_in   Input      8      Word data
Data_out  Output     32     Data sent to the MAC
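Gasket RX is the inverse of Gasket TX: it packs consecutive decoded bytes into one PCLK word. The sketch assumes the first byte received lands in bits [7:0], mirroring the transmit-side ordering assumption. `gasket_rx` is a hypothetical helper.

```python
def gasket_rx(decoded_bytes, bus_width):
    """Pack decoded bytes into PCLK words: first byte received -> bits [7:0]."""
    if bus_width not in (8, 16, 32):
        raise ValueError("Width must be 8, 16 or 32")
    n = bus_width // 8
    words = []
    for i in range(0, len(decoded_bytes) - n + 1, n):
        word = 0
        for k, byte in enumerate(decoded_bytes[i:i + n]):
            word |= (byte & 0xFF) << (8 * k)
        words.append(word)
    return words

print(hex(gasket_rx([0x25, 0x23, 0x16, 0x6E], 32)[0]))   # 0x6e162325
```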

CDR
In the context of USB (Universal Serial Bus) and data communication, "CDR" stands for "Clock and Data Recovery". Clock and data recovery is the process of extracting a clock signal from a data stream. It is particularly important in high-speed communication systems to ensure proper synchronization between the sender and the receiver.

USB, especially in its SuperSpeed versions (USB 3.0 and later), uses complex signaling techniques to achieve high data transfer rates. No separate clock is sent alongside the data. The timing that indicates when the bits should be sampled is embedded in the transitions of the data stream itself, so the receiver must extract that clock accurately from the incoming data.

In USB SuperSpeed, the clock recovery process lets the receiver synchronize with the incoming data stream, ensuring proper sampling and interpretation of the transmitted bits. This is important for maintaining signal integrity and reliable data communication.

CDR circuits are implemented in the receiver circuitry of USB devices to recover the clock from the incoming data. The recovered clock is then used to sample the data accurately, reconstructing the original information sent by the transmitter. The CDR therefore plays a critical role in reliable, high-speed USB communication.

The CDR designed here is a bang-bang CDR. It samples the incoming data with four clock phases: the primary clock, a version shifted by 90 degrees, and their inverses. Two phases sample the data bits, and the other two sample the points between them where data transitions are expected. The "bang-bang" name comes from the binary nature of the system: it only makes early/late decisions about the clock adjustment. Comparing each edge sample with the neighboring data samples shows whether the clock is early or late relative to the data transitions.

Being simple in concept and implementation makes the bang-bang CDR an attractive choice. It also provides robust clock recovery and is resilient to noise and variations in the incoming signal.

[Figure: data bits Data0, Data1, Data2 with sampling points D0, P0, D1, P1]

In the bang-bang CDR, 4 clocks sample the data. Comparing P0 with D0 and with D1 determines whether the clock is early or late, so it can be adjusted. If D0 = P0, the clock is early: the transition has not yet happened at the edge sample. If P0 = D1, the clock is late. For the CDR to work correctly, the data needs DC balance and a bounded maximum run length. A long run of consecutive ones or zeros contains no transitions, so the CDR cannot tell whether it is early or late.

The bang-bang CDR keeps the clock and data accurately synchronized. It uses four clocks for data sampling and relies on the P0-D0 and P0-D1 comparisons to decide whether the clock is early or late, then adjusts it accordingly. These comparisons only carry information when D0 and D1 differ, that is, when a data transition occurred.

• Early detection: if D0 equals P0, the system interprets the clock as early.
• Late detection: if P0 equals D1, the system interprets the clock as late.

Maintaining DC balance is crucial for the proper functioning of the CDR. DC balance ensures an equilibrium
between the number of ones and zeros in the data stream. An imbalance in DC levels can impact the CDR's
ability to accurately detect whether the clock is early or late, potentially leading to synchronization issues.

The system takes into account the concept of maximum run length, which refers to the maximum number
of consecutive ones or zeros in the data stream. Monitoring and limiting the run length are essential to
prevent challenges in accurate clock and data recovery. Excessive consecutive ones or zeros may hinder
the CDR's ability to discern the timing relationship between the clock and data.
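The early/late rule can be written as a small decision function, plus a hypothetical accumulator that sums decisions over many bits. In hardware, such an accumulator would steer the phase mixer. Both helper names and the +1/-1 scoring convention are assumptions for illustration.

```python
def bang_bang(d0, p0, d1):
    """Early/late decision from two data samples and the edge sample between them.
    Returns 'early', 'late' or None when there is no transition to judge."""
    if d0 == d1:
        return None          # no data transition: no timing information
    if p0 == d0:
        return "early"       # edge sample still sees the old bit: clock is early
    return "late"            # edge sample already sees the new bit: clock is late

def vote(samples):
    """Hypothetical accumulator: +1 per 'late', -1 per 'early', 0 otherwise."""
    score = 0
    for d0, p0, d1 in samples:
        score += {"late": 1, "early": -1, None: 0}[bang_bang(d0, p0, d1)]
    return score

print(bang_bang(0, 0, 1), bang_bang(0, 1, 1), bang_bang(1, 1, 1))  # early late None
```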

Phase Mixer
This module is part of the CDR. It changes the phase and frequency of the clock to synchronize it with the received data, so the data can be sampled correctly without losing any bits.

The inputs to the phase mixer are the clock, sin(ω₁t), and a digital control word that defines how the output, sin(ω₁t + φ), changes. If the digital control keeps changing, the frequency changes too, because φ becomes a function of time:

[Figure: Phase Mixer block with input sin(ω₁t), output sin(ω₁t + φ) and a digital control input]

$$\phi(t) = k \;\Rightarrow\; \sin(\omega t + k) \quad \text{(change in phase)}$$

$$\phi(t) = kt \;\Rightarrow\; \sin(\omega t + kt) = \sin\big((\omega + k)\,t\big) \quad \text{(change in frequency)}$$

$$\phi(t) = kt^{2} \;\Rightarrow\; \sin(\omega t + kt^{2}) \quad \text{(spreading)}$$

The Clock and Data Recovery (CDR) Phase Mixer plays a pivotal role in digital communication systems. It
utilizes a digital code structure to select two phases from a predefined set, employing the first n bits for this
purpose. The remaining bits finely control interpolation between the selected phases. In an example scenario
with 4 phases (0, 90, 180, 270 degrees), 2 bits choose two phases, while 8 bits govern precise interpolation.
This mechanism, often used in high-speed interfaces, optimizes recovered clock signals, ensuring reliable
data recovery by adapting to variations in incoming clocks. Its flexibility and high-resolution interpolation
contribute to the system's overall reliability and data integrity.
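The 4-phase, 2 + 8-bit example can be modeled as below. The top 2 bits select the lower of two adjacent reference phases, and the low 8 bits interpolate towards the next one. Linear interpolation is an assumption: real phase interpolators are not perfectly linear. `mixer_phase` is a hypothetical helper.

```python
def mixer_phase(code, n_select_bits=2, n_interp_bits=8, n_phases=4):
    """Output phase in degrees for a phase-mixer control word (linear model)."""
    step = 360.0 / n_phases                                   # 90 degrees for 4 phases
    select = (code >> n_interp_bits) & ((1 << n_select_bits) - 1)
    fine = code & ((1 << n_interp_bits) - 1)
    return (select * step + fine / (1 << n_interp_bits) * step) % 360.0

print(mixer_phase(0))         # 0.0
print(mixer_phase(128))       # 45.0  (halfway between 0 and 90 degrees)
print(mixer_phase(1 << 8))    # 90.0
```

Stepping the code by a constant amount on every update produces a steadily rotating phase. This corresponds to the φ(t) = kt case above, which is how the mixer can track a frequency offset such as SSC.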

[Figure: phase diagram of the four reference phases 0°, 90°, 180° and 270°]

Simulink model for Phase Mixer

Figure 31 Phase Mixer models

PHY Verification
This section verifies the functionality and performance of the PHY top-module design, which implements the USB protocol for data communication. The verification objectives are to ensure that the PHY design conforms to the USB protocol specifications and meets the required functionality, throughput and latency targets. The verification plan outlines the verification strategy, methodology, environment, scenarios and criteria for the PHY design. The verification environment is based on the UVM framework, which provides a standardized and reusable way of creating and executing testbenches for digital designs and systems-on-chip (SoCs).

UVM is a verification methodology that uses a SystemVerilog-based, object-oriented approach. It creates modular, configurable and scalable testbench components that can be easily integrated into the verification process. UVM also provides guidelines and best practices for developing testbenches, running simulations and analyzing results. UVM consists of several key components, such as drivers, monitors, scoreboards, agents, sequences and environments, each with a different role in the verification process. UVM also supports transaction-level modeling (TLM), which enables communication and synchronization between testbench components using abstract transactions rather than signals.

The testbench architecture for the PHY verification is shown in Figure 32. The testbench consists of two main components: the UVM test and the UVM environment. The UVM test is responsible for generating and controlling the test scenarios and stimuli for the DUT. The UVM environment is responsible for providing the testbench components and interfaces for the DUT. The testbench components include the following:

Figure 32 UVM Structure

My agent, which models the behavior and protocol of the PHY interface and consists of a driver, a monitor and a sequencer.

My driver, which drives the transactions from the sequencer into the Bus Functional Model (BFM), which passes the data to the PHY DUT.

My monitor, which monitors the output data from the DUT and sends it over TLM to the scoreboard and coverage components.

The scoreboard, which compares the expected and actual outputs of the DUT and reports any mismatches or errors.

The coverage collector, which collects and analyzes the functional and code coverage of the DUT and reports the coverage metrics against the goals.
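The scoreboard's core job is an in-order comparison of expected against actual transactions. In this testbench it would be a SystemVerilog class fed by the monitor over TLM. The Python below only models the comparison logic and is not UVM API. The class and method names are hypothetical.

```python
from collections import deque

class InOrderScoreboard:
    """Sketch of an in-order scoreboard: expected items come from the stimulus
    side (e.g. MAC_TX_Data), actual items from the receive-side monitor."""

    def __init__(self):
        self.expected = deque()
        self.matches = 0
        self.mismatches = []

    def write_expected(self, item):
        self.expected.append(item)

    def write_actual(self, item):
        if not self.expected:
            self.mismatches.append(("unexpected", item))   # nothing was sent
            return
        exp = self.expected.popleft()
        if exp == item:
            self.matches += 1
        else:
            self.mismatches.append((exp, item))

    def report(self):
        return (f"matches={self.matches} mismatches={len(self.mismatches)} "
                f"pending={len(self.expected)}")

sb = InOrderScoreboard()
sb.write_expected(0xBCBCBCBC)
sb.write_actual(0xBCBCBCBC)
print(sb.report())   # matches=1 mismatches=0 pending=0
```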

The testbench interfaces include the following:

The BFM interface, which connects my_driver to the PHY ports of the DUT and provides the signals and transactions for the USB protocol.

The testbench components and interfaces are configured and connected using the UVM configuration and TLM mechanisms, which make the testbench architecture flexible and reusable. The components interact with the DUT and with each other through UVM sequences. These are predefined or random sequences of transactions that drive the DUT, whose behavior and performance are then monitored. The verification process is organized into phases that define its timeline, one of the most important features of UVM. All phases are shown in the figure:

Figure 33 UVM Phases

Build phases: These phases are used to construct, configure and connect the testbench components. They are executed at the start of the simulation and do not consume simulation time. The build phases are:

• build_phase: Used to create the instances of the testbench components and set their initial configuration.
• connect_phase: Used to connect the TLM ports and exports of the components.
• end_of_elaboration_phase: Used to make any final adjustments to the testbench structure, configuration or connectivity before the simulation starts.

Run-time phases: These phases are used to generate, drive, monitor and check the test stimuli and responses. run_phase is the main phase that consumes simulation time. The scheduled phases from pre_reset_phase to post_shutdown_phase execute in parallel with it, while start_of_simulation_phase runs just before them without consuming simulation time. The phases are:

• start_of_simulation_phase: Used to display the testbench topology or configuration information, or to set any run-time configuration.
• run_phase: Used to generate and control the test scenarios and stimuli for the DUT, and to monitor and check the DUT behavior and performance.
• pre_reset_phase: Used to perform any activity that should occur before the reset, such as waiting for a power-good signal.
• reset_phase: Used to generate a reset and to put the DUT or interface into its default state.
• post_reset_phase: Used to perform any activity that should occur after the reset, such as configuring the DUT or interface.
• pre_configure_phase: Used to perform any activity that should occur before the configuration, such as setting the initial seed or randomization constraints.
• configure_phase: Used to configure the DUT or interface with the required parameters or registers.
• post_configure_phase: Used to perform any activity that should occur after the configuration, such as checking the configuration status or enabling interrupts.
• pre_main_phase: Used to perform any activity that should occur before the main phase, such as setting the initial coverage or scoreboard settings.
• main_phase: Used to generate and drive the main test stimuli and responses, and to monitor and check the main DUT behavior and performance.
• post_main_phase: Used to perform any activity that should occur after the main phase, such as collecting the final coverage or scoreboard data.
• pre_shutdown_phase: Used to perform any activity that should occur before the shutdown, such as sending the end-of-test signal or flushing the queues.
• shutdown_phase: Used to shut down the DUT or interface gracefully, and to perform any cleanup or recovery actions.
• post_shutdown_phase: Used to perform any activity that should occur after the shutdown, such as checking the shutdown status or releasing resources.

Clean-up phases: These phases are used to extract, check, and report the verification results. They are
executed at the end of the simulation, and they do not consume simulation time. The clean-up phases are:

extract_phase: Used to retrieve and process the information from the scoreboards and functional coverage
monitors, and to compute the expected data or coverage metrics.

check_phase: Used to check that the DUT behaved correctly and to identify any errors or mismatches that
may have occurred during the simulation.

report_phase: Used to display or write the verification results, such as the test status, errors, warnings,
messages, coverage, and metrics.

final_phase: Used to complete any other outstanding actions that the testbench has not already completed,
such as closing the files or terminating the processes.

Test Strategy
The verification features for the PHY design are summarized in the table below. The verification results and waveforms show that the PHY design passes all the functional and performance tests and achieves the required coverage and metrics. The results are checked using UVM reports, which provide detailed information and statistics about the verification process, such as the test name, status, duration, errors, warnings, messages, coverage and metrics.

Reset_n  DataBusWidth  MAC_TX_Data         MAC_TX_DataK  MAC_Data_En  RxPolarity  Test Feature
0        8             x                   x             1            0           Resetting the design
0        8             x                   x             0            1
0        16            x                   x             1            0
0        16            x                   x             0            0
0        32            x                   x             1            0
0        32            x                   x             0            0

1        8             32'h0000BC          1             1            0           Collecting data S to P with COMMA detection, DATA WIDTH = 8
1        8             Random 32-bit data  0             1            0
1        8             32'h0000BC          1             1            0
1        8             Random 32-bit data  0             1            0
1        8             32'hxxxxBC          1             1            0
1        8             Random 32-bit data  0             1            0
1        8             32'hxxxxBC          1             1            0
1        8             Random 32-bit data  0             1            0

1        16            32'h0000BCBC        1             1            0           Collecting data S to P with COMMA detection, DATA WIDTH = 16
1        16            Random 32-bit data  0             1            0
1        16            32'h0000BCBC        1             1            0
1        16            Random 32-bit data  0             1            0
1        16            32'hxxxxBCBC        1             1            0
1        16            Random 32-bit data  0             1            0
1        16            32'hxxxxBCBC        1             1            0

1        32            32'hBCBCBCBC        1             1            0           Collecting data S to P with COMMA detection, DATA WIDTH = 32
1        32            Random 32-bit data  0             1            0
1        32            32'hBCBCBCBC        1             1            0
1        32            Random 32-bit data  0             1            0
1        32            32'hBCBCBCBC        1             1            0
1        32            Random 32-bit data  0             1            0
1        32            32'hBCBCBCBC        1             1            0

1        32            32'hBCBCBCBC        1             1            0           Elastic buffer underflow & RX_Status = 3'b110
1        32            Random 32-bit data  0             1            0
1        16            32'hBCBCBCBC        1             1            0
1        16            Random 32-bit data  0             1            0
1        8             32'hBCBCBCBC        1             1            0
1        8             Random 32-bit data  0             1            0

1        32            32'hBCBCBCBC        1             1            0           Elastic buffer underflow & RX_Status = 3'b110
1        32            Random 32-bit data  0             1            0
1        16            32'hBCBCBCBC        1             1            0
1        16            Random 32-bit data  0             1            0
1        8             32'hBCBCBCBC        1             1            0
1        8             32'hBCBCBCBC        1             1            0

1        32            32'hBCBCBCBC        1             1            0           Elastic buffer SKP added & RX_Status = 3'b001
1        32            32'hBCBCBCBC        1             1            0
1        32            Random 32-bit data  0             1            0
1        32            32'hBCBCBCBC        1             1            0
1        32            32'hBCBCBCBC        1             1            0

1        32            32'hBCBCBCBC        1             1            0           Elastic buffer SKP added & RX_Status = 3'b001
1        32            32'hBCBCBCBC        1             1            0
1        32            Random 32-bit data  0             1            0
1        32            32'hBCBCBCBC        1             1            0
1        32            32'hBCBCBCBC        1             1            0
1        32            32'hBCBCBCBC        1             1            0

1        32            Random 32-bit data  0             1            0           Elastic buffer SKP removed & RX_Status = 3'b010
1        32            Random 32-bit data  0             1            0
1        32            32'hBCBCBCBC        1             1            0
1        32            Random 32-bit data  0             1            0

1        32            32'hBCBCBCBC        1             1            0           Decoder decode error & RX_Status = 3'b100
1        32            Random 32-bit data  0             1            0
1        32            32'hBCBCBCBC        1             1            0
1        32            Random 32-bit data  0             1            0
1        32            32'hBCBCBCBC        1             1            0

1        32            32'hBCBCBCBC        1             1            0           Decoder disparity error & RX_Status = 3'b111
1        32            Random 32-bit data  0             1            0
1        32            32'hBCBCBCBC        1             1            0
1        32            Random 32-bit data  0             1            0
1        32            32'hBCBCBCBC        1             1            0

• Testing all main features
• Power and reset
• Random pattern generation with different seeds: test & stress testing
• Testing all events & coverage-driven verification using functional coverage and assertions
• Generated clock periods

Waveform
 Generated Clocks periods based on the Data width Used

Figure 34 Width = 32 bit clock generated

- For the (Width = 32) bit the Clocks generated from the Common Block and extracted from

(Ref_CLk = 100MHz):
 PCLK running at 125MHz.
 Symbol_Clk running at 500MHz.
 Bit_Rate_Clk running at 5GHz.

- For the (Width = 16) bit the Clocks generated from the Common Block and extracted from

(Ref_CLk = 100MHz):
 PCLK running at 250MHz.
 Symbol_Clk running at 500MHz.

Figure 35 Width = 16 bit clock generated

 Bit_Rate_Clk running at 5GHz.

52 | P a g e
- For the (Width = 8) bit the Clocks generated from the Common Block and extracted from
(Ref_CLk = 100MHz):
 PCLK running at 500MHz.

Figure 36 Width = 8 bit clock generated

 Symbol_Clk running at 500MHz.


 Bit_Rate_Clk running at 5GHz.

• Gasket block dividing data based on the bus width

- For (width = 32):

• Max_Tx_Data = 32’hBCBCBCBC is divided into four 8-bit blocks, and each block (8’hBC) is transmitted to the Encoder.
• Max_Tx_Data = 32’h6e162325 is divided into four 8-bit blocks (8’h6e, 8’h16, 8’h23 and 8’h25), which are transmitted to the Encoder.
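A minimal C++ sketch of the gasket behaviour, assuming the least significant byte is the first one handed to the encoder (matching the "LSB byte" wording used in the TX tests). `tx_gasket_split` and `rx_gasket_merge` are illustrative names; the merge function only shows the inverse operation that Rx_Gasket performs on the receive side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical TX gasket model: splits one MAC word into width/8 bytes.
// Assumption: byte 0 (bits 7:0) is passed to the encoder first.
std::vector<uint8_t> tx_gasket_split(uint32_t word, int width_bits) {
    assert(width_bits == 8 || width_bits == 16 || width_bits == 32);
    std::vector<uint8_t> bytes;
    for (int i = 0; i < width_bits / 8; ++i)
        bytes.push_back(static_cast<uint8_t>((word >> (8 * i)) & 0xFFu));
    return bytes;
}

// Hypothetical RX gasket model: collects decoded bytes back into one
// bus-width word, first byte in the least significant position.
uint32_t rx_gasket_merge(const std::vector<uint8_t>& bytes) {
    assert(bytes.size() == 1 || bytes.size() == 2 || bytes.size() == 4);
    uint32_t word = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i)
        word |= static_cast<uint32_t>(bytes[i]) << (8 * i);
    return word;
}
```

Splitting 32'h6e162325 this way yields 8'h25, 8'h23, 8'h16, 8'h6e in that order, and merging them restores the original word.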

• Encoder block encoding data based on the running disparity

Figure 37 balance between 0's and 1's

- The Encoder encodes each 8-bit byte into a 10-bit symbol at the Symbol_Clk rate, selecting the code that matches the current running disparity so that the numbers of 0’s and 1’s stay balanced, according to the disparity table shown in the figures.
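The running-disparity rule the encoder follows (and the decoder later checks) can be sketched as below. This is a simplified, symbol-level check written for illustration, not the project's encoder: it only counts the ones in a 10-bit symbol, whereas a full 8b/10b codec also checks the 6b and 4b sub-blocks separately. The test symbols 10'h0FA (RD-) and 10'h305 (RD+) are the K28.5 encodings quoted in the TX feature list.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified symbol-level running-disparity (RD) check (assumption: the whole
// 10-bit symbol is judged at once). Five ones: neutral, RD unchanged.
// Six ones: only legal at RD-, RD becomes +. Four ones: only legal at RD+,
// RD becomes -. Anything else is reported as a disparity error.
enum class Rd { Neg, Pos };

bool update_running_disparity(uint16_t symbol10, Rd& rd) {
    assert(symbol10 < 1024);
    const std::size_t ones = std::bitset<10>(symbol10).count();
    if (ones == 5) return true;
    if (ones == 6 && rd == Rd::Neg) { rd = Rd::Pos; return true; }
    if (ones == 4 && rd == Rd::Pos) { rd = Rd::Neg; return true; }
    return false;  // RD left unchanged; caller flags a disparity error
}
```

Sending 10'h0FA then 10'h305 toggles RD from negative to positive and back; sending 10'h305 twice in a row is exactly the "two negative symbols without toggling" case the decoder reports as a disparity error.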

• PMA_TX P2S serializes the symbol data

- The transmitter serializes each parallel 10-bit symbol over 10 Bit_Rate_Clk cycles, from LSB to MSB.
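A sketch of the P2S behaviour described above: one 10-bit symbol becomes ten serial bits, LSB first, one per Bit_Rate_Clk cycle. `serialize_lsb_first` is a hypothetical helper; how the encoder maps the code bits onto bit positions of the symbol is not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical PMA_TX P2S model: returns one bit per Bit_Rate_Clk cycle,
// starting with bit 0 (LSB) of the 10-bit symbol.
std::vector<int> serialize_lsb_first(uint16_t symbol10) {
    assert(symbol10 < 1024);
    std::vector<int> bits;
    for (int i = 0; i < 10; ++i)
        bits.push_back((symbol10 >> i) & 1);
    return bits;
}
```

For 10'h0FA (binary 0011111010) the line carries 0, 1, 0, 1, 1, 1, 1, 1, 0, 0.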

• PMA_RX S2P deserializes the symbol data

- After the data is serialized by the TX, the receiver collects it in the S2P (Serial to Parallel) block. When it detects the COMMA ordered set, it raises the Rx_Valid flag, indicating symbol lock. As shown here, it collects the serialized bits, converts them back to parallel symbols and sends them to the next stage, the Elastic Buffer.
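The symbol-lock step can be illustrated with a hypothetical comma search over the serial bit stream. Assumptions: bits arrive LSB first (matching the P2S description), and the whole 10-bit K28.5 symbol (10'h0FA or 10'h305, from the TX tests) is matched, whereas real aligners typically match only the 7-bit comma pattern. The returned offset marks where symbol boundaries lie once Rx_Valid is raised.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical comma-alignment model for the S2P block. Slides a 10-bit
// window over the serial bits and returns the bit offset of the first K28.5
// symbol in either running disparity, or -1 if none is found.
int find_comma_alignment(const std::vector<int>& serial_bits) {
    for (std::size_t start = 0; start + 10 <= serial_bits.size(); ++start) {
        unsigned window = 0;
        for (int i = 0; i < 10; ++i) {
            const int bit = serial_bits[start + i];
            assert(bit == 0 || bit == 1);
            window |= static_cast<unsigned>(bit) << i;  // LSB arrives first
        }
        if (window == 0x0FAu || window == 0x305u)
            return static_cast<int>(start);
    }
    return -1;
}
```

Once the offset is known, every following group of 10 bits starting at that offset is one symbol, which is what allows the S2P block to hand aligned symbols to the Elastic Buffer.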

• Elastic Buffer storing the data

- Once the K28.5 comma detection has locked, the data is stored inside the Elastic Buffer, as shown here; the write pointer increments with each write operation.

• Decoding the symbol-encoded data into the original data

The Decoder converts the 10-bit encoded symbols back into the original 8-bit data:

• Rx_Gasket collecting the data to the Rx output

Rx_Gasket collects the decoded bytes into the original 32-bit data bus width at the output.

• Elastic Buffer underflow

• Elastic Buffer overflow

• Elastic Buffer (Threshold Monitor) add requests and delete requests
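The underflow, overflow and threshold-monitor cases shown in the waveforms above can be summarised with a hypothetical C++ model. The depth, the two thresholds and the rule "below low, request a SKP add; above high, request a SKP delete" are assumptions for illustration; the real buffer crosses clock domains, which this single-threaded sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical elastic-buffer model (not the project RTL).
enum class EbRequest { None, AddSkp, RemoveSkp };

class ElasticBuffer {
public:
    ElasticBuffer(std::size_t depth, std::size_t low, std::size_t high)
        : depth_(depth), low_(low), high_(high) {
        assert(low < high && high <= depth);
    }
    // Write side (fed by the S2P block): returns false on overflow.
    bool write(uint16_t symbol) {
        if (fifo_.size() == depth_) { overflow_ = true; return false; }
        fifo_.push_back(symbol);
        ++write_ptr_;  // write pointer advances on every successful write
        return true;
    }
    // Read side (towards the decoder): returns false on underflow.
    bool read(uint16_t& symbol) {
        if (fifo_.empty()) { underflow_ = true; return false; }
        symbol = fifo_.front();
        fifo_.pop_front();
        ++read_ptr_;
        return true;
    }
    // Threshold monitor: compares the fill level with the two thresholds.
    EbRequest threshold_request() const {
        if (fifo_.size() < low_) return EbRequest::AddSkp;
        if (fifo_.size() > high_) return EbRequest::RemoveSkp;
        return EbRequest::None;
    }
    std::size_t level() const { return fifo_.size(); }
    std::size_t write_ptr() const { return write_ptr_; }
    std::size_t read_ptr() const { return read_ptr_; }
    bool overflow() const { return overflow_; }
    bool underflow() const { return underflow_; }

private:
    std::size_t depth_, low_, high_;
    std::deque<uint16_t> fifo_;
    std::size_t write_ptr_ = 0, read_ptr_ = 0;
    bool overflow_ = false, underflow_ = false;
};
```

In this sketch, a SKP add request corresponds to the RX_Status = 3'b001 rows of the RX test table and a delete request to the 3'b010 rows.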

• Decoding error

• Decoder disparity error

Two positive (or two negative) disparity symbols arrive in sequence without the disparity toggling. This is caused by the end padding command inserted while the buffer is below its threshold and waiting to collect 4 symbols of data, so it is acceptable here.

• Decoding error: Rx_Status.

• Disparity error: Rx_Status.

• Underflow: Rx_Status.

• Data is OK: Rx_Status.
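For reference, the RX_Status codes used in the RX test table can be decoded as below. The codes 3'b001, 3'b010, 3'b100 and 3'b111 come from that table; the remaining entries (000, 011, 101, 110) are taken from the PIPE specification's RxStatus encoding rather than from this report, so treat them as an assumption about the implemented design.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decodes a 3-bit RX_Status value into a readable name.
std::string rx_status_name(uint8_t rx_status) {
    assert(rx_status < 8);
    switch (rx_status) {
        case 0: return "Received data OK";          // 3'b000
        case 1: return "1 SKP added";               // 3'b001
        case 2: return "1 SKP removed";             // 3'b010
        case 3: return "Receiver detected";         // 3'b011
        case 4: return "8b/10b decode error";       // 3'b100
        case 5: return "Elastic buffer overflow";   // 3'b101
        case 6: return "Elastic buffer underflow";  // 3'b110
        default: return "Receive disparity error";  // 3'b111
    }
}
```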

CLK Periods

- DataBusWidth = 32

- DataBusWidth = 16

- DataBusWidth = 8
TX Verification
TX Test Strategy

MAC_En | MAC_Data     | MAC_DataK | Rst_n | Data_Bus_Width | Test Feature
1’b0   | 8’hxx        | 4’hx      | 1’b1  | 6’d8           | Check that MAC Enable = 0 forces all blocks to be disabled
1’b1   | 8’hxx        | 4’hx      | 1’b0  | 6’d8           | Check that the active-low reset drives the data to zeros
1’b1   | 8’h26        | 4’h0      | 1’b1  | 6’d8           | Check that the byte is sent as actual data, not as a command
1’b1   | 8’hBC        | 4’h1      | 1’b1  | 6’d8           | Check that the byte is sent as the command “COMMA”
1’b1   | 16’h719c     | 4’h0      | 1’b1  | 6’d16          | Check that the 2 bytes are sent as actual data
1’b1   | 16’h71BC     | 4’h1      | 1’b1  | 6’d16          | Check that the LSB byte is sent as a command and the other as actual data
1’b1   | 16’hBC71     | 4’h2      | 1’b1  | 6’d16          | Check that the MSB byte is sent as a command
1’b1   | 16’hBCBC     | 4’h3      | 1’b1  | 6’d16          | Check that both bytes are sent as commands
1’b1   | 32’h26c4dd81 | 4’h0      | 1’b1  | 6’d32          | Check that the whole word is sent as data
1’b1   | 32’h26c4ddbc | 4’h1      | 1’b1  | 6’d32          | Check that the LSB (1st) byte is sent as a command
1’b1   | 32’h26c4bc81 | 4’h2      | 1’b1  | 6’d32          | Check that the 2nd byte is sent as a command
1’b1   | 32’h26c4bcbc | 4’h3      | 1’b1  | 6’d32          | Check that the 1st & 2nd bytes are sent as commands
1’b1   | 32’h26bcdd81 | 4’h4      | 1’b1  | 6’d32          | Check that the 3rd byte is sent as a command
1’b1   | 32’h26bcddbc | 4’h5      | 1’b1  | 6’d32          | Check that the 1st & 3rd bytes are sent as commands
1’b1   | 32’h26bcbc81 | 4’h6      | 1’b1  | 6’d32          | Check that the 2nd & 3rd bytes are sent as commands
1’b1   | 32’h26bcbcbc | 4’h7      | 1’b1  | 6’d32          | Check that the 1st, 2nd & 3rd bytes are sent as commands
1’b1   | 32’hbcc4dd81 | 4’h8      | 1’b1  | 6’d32          | Check that the 4th byte is sent as a command
1’b1   | 32’hbcc4ddbc | 4’h9      | 1’b1  | 6’d32          | Check that the 1st & 4th bytes are sent as commands
1’b1   | 32’hbcc4bc81 | 4’hA      | 1’b1  | 6’d32          | Check that the 2nd & 4th bytes are sent as commands
1’b1   | 32’hbcc4bcbc | 4’hB      | 1’b1  | 6’d32          | Check that the 1st, 2nd & 4th bytes are sent as commands
1’b1   | 32’hbcbcdd81 | 4’hC      | 1’b1  | 6’d32          | Check that the 3rd & 4th bytes are sent as commands
1’b1   | 32’hbcbcddbc | 4’hD      | 1’b1  | 6’d32          | Check that the 1st, 3rd & 4th bytes are sent as commands
1’b1   | 32’hbcbcbc81 | 4’hE      | 1’b1  | 6’d32          | Check that the 2nd, 3rd & 4th bytes are sent as commands
1’b1   | 32’hbcbcbcbc | 4’hF      | 1’b1  | 6’d32          | Check that the whole word is sent as commands
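The MAC_DataK column in the table acts as a per-byte mask: bit i of MAC_DataK marks byte i of MAC_Data (counting from the LSB) as a command (K) symbol. The hypothetical helper below reproduces that interpretation, which is what the test features check byte by byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring how the TX tests pair MAC_Data and MAC_DataK:
// bit i of MAC_DataK marks byte i (bits 8i+7:8i) as a command (K) symbol,
// otherwise the byte is sent as data.
struct TxByte {
    uint8_t value;
    bool is_command;  // true: encoded as a K symbol (e.g. 8'hBC as K28.5 COMMA)
};

std::vector<TxByte> classify_tx_bytes(uint32_t mac_data, uint8_t mac_datak,
                                      int width_bits) {
    assert(width_bits == 8 || width_bits == 16 || width_bits == 32);
    std::vector<TxByte> bytes;
    for (int i = 0; i < width_bits / 8; ++i) {
        const uint8_t value = static_cast<uint8_t>((mac_data >> (8 * i)) & 0xFFu);
        const bool is_command = ((mac_datak >> i) & 1u) != 0;
        bytes.push_back(TxByte{value, is_command});
    }
    return bytes;
}
```

For the row 32'hbcc4bc81 with MAC_DataK = 4'hA, the helper marks bytes 2 and 4 (the two 8'hbc bytes) as commands and bytes 1 and 3 as data, matching the "2nd & 4th bytes" test feature.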

Features:
1. Check the data sent when the MAC_DATA_En signal is de-activated (enable = 0).

2. Check that the active-low reset drives the data to zeros.

3. Check that data 8’h26 with width 8 is sent as actual data, not as a command.

4. Check that data 8’hbc is sent as a command with encoded value = 10’h0fa.

5. Check that data 16’h719c is sent as data with encoded values = 10’h0e2, 10’h23c.

6. Check that for data 16’h71bc the LSB byte is sent as a command with encoded value = 10’h305, and the next byte (71) is sent as data with encoded value = 10’h0ea.

7. Check that the patterns 16’hbc9c and 16’hbcbc are sent with the following command/data constraints:
a. 1st pattern (16’hbc9c): the LSB byte is sent as data and the 2nd byte as a command.
b. 2nd pattern (16’hbcbc): both bytes are sent as commands.

Width 32

8. Check that data 32’h26c4dd81 is sent entirely as data with encoded values = 10’h22d, 10’h2e6, 10’h0a6, 10’h199, alternating between positive and negative RD.

9. Check that for data 32’h26c4ddbc the LSB byte is sent as a command and the other bytes are sent as actual data.

10. Check that data 32’h26c4bc81 is sent as data except the 2nd byte, which is sent as a command.

11. Check that data 32’h26c4bcbc is sent as data except the 1st and 2nd bytes, which are sent as commands.

12. Check that data 32’h26bcdd81 is sent as data except the 3rd byte, which is sent as a command.

13. Check that data 32’h26bcddbc is sent as data except the 1st and 3rd bytes, which are sent as commands.

14. Check that data 32’h26bcbc81 is sent as data except the 2nd and 3rd bytes, which are sent as commands.

15. Check that data 32’h26bcbcbc is sent as commands except the MSB byte, which is sent as data.

16. Check that data 32’hbcc4dd81 is sent as data except the MSB byte, which is sent as a command.

17. Check that data 32’hbcc4ddbc is sent as data except the LSB and MSB bytes, which are sent as commands, and that data 32’hbcc4bc81 is sent as data except the 2nd and 4th bytes, which are sent as commands.

18. Check that data 32’hbcc4bcbc is sent as commands except the 3rd byte, which is sent as data, and that for data 32’hbcbcdd81 the two MSB bytes are sent as commands and the two LSB bytes as data.

19. Check that data 32’hbcbcddbc is sent as commands except the 2nd byte, which is sent as data, and that data 32’hbcbcbcbc is sent entirely as commands.

20. Check that data 32’hbcbcbc81 is sent as commands except the LSB byte, which is sent as data.

UVM_Report:

Coverage Reports:

1. Functional Coverage:

2. Code Coverage:
CDR (Clock and Data Recovery)
In USB SuperSpeed (USB 3.0 and above), Clock and Data Recovery (CDR) is a crucial process that lets the receiver correctly interpret the incoming data stream. No separate clock signal is transmitted alongside the data; instead, SuperSpeed relies on an embedded clock, meaning the clock information is encoded within the data itself. This eliminates the need for an extra wire but puts the burden on the receiver to extract the clock from the data stream. The receiver uses a dedicated circuit, the CDR, through which the incoming data stream passes and which employs several techniques to recover the embedded clock:

• Data Encoding: USB SuperSpeed at 5 Gb/s uses 8b/10b encoding. This encoding guarantees enough transitions (edges) within the data to enable clock recovery. This encoding scheme was described earlier in this document.
• Phase-Locked Loop (PLL): The CDR often uses a PLL. A reference clock from the system is fed into the PLL, and the CDR adjusts the phase of this reference clock to align it with the transitions in the incoming data. This creates a "recovered clock" that is synchronized with the original transmitter's clock.
• Digital Implementations: Modern CDRs can be entirely digital, using techniques like Delay-Locked Loops (DLLs) or phase rotators to achieve phase alignment.

To recover the input data, the recovered clock must sample the data in the middle of the input data eye, which minimizes the bit error rate (BER).

One of the problems in a SerDes that does not forward the transmit clock is that the two sides use different clock sources. In such a source-asynchronous system, natural device mismatches between the transmitter and receiver clock sources cause a frequency offset between the transmitted data and the local clock on the receiver side. Because links without a forwarded clock are the most widely used category, researchers have proposed a wide variety of CDR designs for high-speed serial link applications. Let us take a short overview of CDR architectures.

Typical Receiver and Analog CDR

To identify (and limit) the scope of the problem, we refer to the block diagram of a typical high-speed receiver illustrated in the previous figure. Receivers at these speeds typically comprise a bank of slicers that sample the incoming signal at a number of equally spaced phases, some type of deserialization, and a clock recovery unit. A common CDR uses an analog phase-locked loop (PLL), including a bang-bang phase detector, a charge pump loop filter (CPLF) and a voltage-controlled oscillator (VCO), as shown in the figure. Some analog CDR implementations run the phase detector and charge pump at the baud rate, while others deserialize to varying degrees before summing at the loop filter.

The bang-bang phase detector is common to many analog CDRs and to the digital CDR proposed here. Lower-speed transceivers (operating where the baud interval is much larger than multiple gate delays) often use phase detectors with a more linear response. We discuss the different phase detectors later, especially the bang-bang phase detector used in our implementation.

The charge pump acts as an analog feedback mechanism within the CDR loop. Based on the phase detector's comparison between the recovered clock and the incoming data, the charge pump controls the flow of current into a capacitor. If the recovered clock lags the data, the charge pump injects current into the capacitor, raising its voltage; if the recovered clock leads the data, the charge pump draws current from the capacitor, lowering its voltage. The voltage on the capacitor controls the oscillation frequency of the VCO. As the capacitor voltage increases, the VCO adjusts its frequency to become more aligned with the incoming data; a decrease in voltage nudges the VCO frequency in the opposite direction. Together with the loop filter, the charge pump acts as the control element that drives the VCO to adjust its frequency and achieve phase alignment between the recovered clock and the clock embedded in the data stream.

Analog Clock and Data Recovery (CDR) circuits, while serving as the foundation for earlier communication
systems, come with some drawbacks compared to their digital counterparts. Here's a breakdown of the key
disadvantages:

• Susceptibility to Noise
Analog circuits are more prone to picking up electrical noise and interference from the environment
or other parts of the system. This noise can introduce jitter (timing variations) in the recovered clock
signal. Jitter can cause errors in data interpretation, leading to corrupted data transmission.

• Limited Scalability
As data rates climb, designing and fine-tuning analog CDRs becomes increasingly challenging.
Higher frequencies require more precise component selection and circuit design to maintain
accurate clock recovery. This complexity can limit the ability of analog CDRs to handle the ever-
increasing data demands of modern communication standards.

• Process Variations
Manufacturing processes for analog components can have slight inconsistencies. These variations
can cause slight differences in the performance of analog CDRs across different chips produced
even in the same batch. This can lead to inconsistencies in data transmission reliability between
devices.

• Temperature sensitivity
The performance of analog CDRs can be affected by temperature fluctuations. This can necessitate
additional calibration or temperature control measures.

• Limited programmability
Analog CDRs typically offer less flexibility in adapting to different data encoding schemes or
changing operating conditions compared to their digital counterparts.

While analog CDRs laid the groundwork for data recovery, the limitations mentioned above have led to the rise of digital CDRs as data rates continue to increase.
Digital Implementation of CDR
Digital Clock and Data Recovery (CDR) circuits have emerged as the preferred choice for modern communication systems due to several advantages over their analog counterparts. Let us take a closer look at the benefits of digital CDRs:

Enhanced Noise Immunity:
Unlike analog circuits, digital implementations are significantly less susceptible to electrical noise and interference. This translates to a cleaner recovered clock signal with minimal jitter, ensuring data integrity and reducing errors during transmission.
Superior Scalability:
Digital CDRs scale well to ever-increasing data rates. Adjustments can be made through code or logic changes, which offers greater flexibility compared to the complex fine-tuning required for analog circuits at higher frequencies.
Simplified Integration:
Digital CDRs integrate seamlessly with digital receivers. This simplifies chip design by eliminating the need for separate analog components like charge pumps and loop filters, which leads to a more compact and efficient overall system.
Programmable Control:
A significant advantage of digital CDRs is their programmability. They can be easily configured to adapt to the different data encoding schemes used in various communication protocols. Additionally, they can adjust their behavior dynamically based on changing conditions, offering more control over the clock recovery process.
Reduced Cost:
Digital implementations can potentially lower manufacturing costs compared to analog circuits due to simpler design and potentially higher integration levels.
Improved Consistency:
Digital CDRs are less prone to the manufacturing variations that affect the performance of analog circuits. This leads to more consistent performance across different chips.

Overall impact: the shift towards digital CDRs has resulted in several positive outcomes for communication systems:

Reliable Data Transmission: Digital CDRs contribute to more robust and error-free data transfer at high speeds, which is crucial for modern applications like high-definition video streaming and fast internet connectivity.

Increased Flexibility: The programmability of digital CDRs allows easier adaptation to evolving communication standards, ensuring compatibility with future advancements.

Although analog CDRs played a historical role, digital CDRs represent the future of clock and data recovery due to their superior noise immunity, scalability, programmability and ease of integration, which make them the best choice for SuperSpeed and other high-rate applications. Therefore, we use a digital CDR in our implementation, and that is what we discuss in detail in the coming sections.
Analogy to analog implementation
To start discussing the digital architecture and implementation of our CDR, we first look at the analogy with the analog implementation, to illustrate the similarities between the analog and digital approaches.

Simply put, we replace the analog components of the analog CDR introduced before with digital ones. But first, let us introduce a linearized model of the analog CDR, shown in the figure.

The loop gain for this linearized system

Now we map the CPLF and VCO by making a backward difference substitution. The result is the following:

And in realizing the equation, it simplifies into:

By comparing the last two equations, we find that the proportional path in the CPLF is now modeled by the phase update gain (phug), and the integral path in the CPLF is modeled by the frequency update gain (frug). K_DPC models the gain of the VCO, K_VCO, and the added term z^(-N_EL) represents the delay through the control path of the DPC and the delay through the deserialization process.
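Because the equations themselves are not reproduced here, the following is only a sketch of one common way to write such loop gains, not the report's exact expressions. It assumes a bang-bang gain K_PD, a charge-pump current I_CP driving R in series with C, and a VCO integrating frequency into phase; in the digital form, phug replaces the proportional (R) path, frug the integral (C) path, K_DPC replaces K_VCO, and the integrator 1/(z−1) used later in this document replaces 1/s:

$$L(s) = K_{PD}\, I_{CP}\left(R + \frac{1}{sC}\right)\frac{K_{VCO}}{s}$$

$$L(z) = K_{PD}\left(\mathrm{phug} + \frac{\mathrm{frug}}{z-1}\right)\frac{K_{DPC}}{z-1}\, z^{-N_{EL}}$$

Up to scaling constants, R corresponds to phug and 1/C to frug, which is the mapping stated above.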

Digital PLL Architecture
The following figure shows the architecture of our digital phase-locked loop implementation, based on the analogy introduced before. It consists of:

• Bang-Bang phase detector.
• Decimation.
• Phase update gain (phug).
• Frequency update gain (frug).
• Digital to Phase Converter (DPC).

Each component is introduced in more detail later.

And here is the linearized model of the proposed architecture:

And with a sample realization:

In the figure we introduce a case study with the test parameters listed in the table.

First, we describe how many bits are used for the phase integrator. One key technique that we employ in the implementation to achieve fractional gains is sending only the top bits of an -bit integrator to the next stage. In doing so, we achieve an effective gain of 2^(−D), with the lower D bits being termed dither bits. We need to supply 9 bits to the DPC and we want a phug of 2^(−3). Without considering the needs of the frequency register, the size of the phase integrator would simply be bits. However, in the figure it can be seen that the phase integrator is 15 bits wide, but that there is an 8X gain (3-bit shift) in the phase error path to the phase integrator. Thus, the value is as indicated in the Table. Next, we discuss why the need for the extra bits arises. The purpose of the frequency integrator is to compensate for the ppm offset between the local reference clock and the incoming data. The frequency integrator must have enough top bits to reach the target maximum ppm and enough resolution (dither bits) so as not to be a significant source of noise. The maximum ppm value that can be tracked is the fraction of a UI that the maximum frequency register value can move the output phase per UI, times 1 million. To determine this value, we must include the fact that, since the decimation factor is 8, the frequency integrator only gets to move the DPC once every 8 UI, and that the top 9 bits get attenuated by in passing to the DPC. Therefore, the frequency integrator can change the input to the DPC by 3.98 bits every 8 UI. Therefore, since the DPC has a 9-bit input in the implementation, the maximum ppm offset that can be tracked is ppm. The dither bits in the frequency integrator are included to provide the necessary attenuation and frequency resolution. The value is calculated by concatenating the effects of the dither bits in the frequency and phase registers, which yields as indicated in Table II. The frequency resolution of the top bits of the frequency integrator that are passed to the phase integrator is ppm/lsb. In summary, we have truncated the phase to 1/512th of a UI and the frequency to 3.8 ppm/lsb. Simulations have shown that the quantization noise produced by these truncations is well into the noise floor.
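The ppm rule described above can be written as a small C++ function. `max_tracking_ppm` is a hypothetical helper, and it assumes that the 9-bit DPC input spans exactly one UI, so one LSB is 1/512 UI (consistent with the 1/512 UI phase truncation mentioned above).

```cpp
#include <cassert>

// Maximum trackable frequency offset in ppm: the fraction of a UI that the
// saturated frequency register can move the DPC output phase per UI, times
// 10^6. Assumption: a dpc_bits-wide DPC input spans exactly one UI.
double max_tracking_ppm(double dpc_lsb_per_update, int decimation, int dpc_bits) {
    assert(decimation > 0 && dpc_bits > 0 && dpc_bits < 31);
    const double ui_per_lsb = 1.0 / static_cast<double>(1 << dpc_bits);
    // The frequency integrator only updates the DPC once every `decimation` UI.
    const double ui_moved_per_ui = dpc_lsb_per_update * ui_per_lsb / decimation;
    return ui_moved_per_ui * 1e6;
}
```

With the figures quoted in the text, `max_tracking_ppm(3.98, 8, 9)` evaluates to roughly 972 ppm. This is only a cross-check of the arithmetic under the stated assumption; the value given in the Table remains the reference.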

Test Device CDR parameters

In the next sections, we continue with a more detailed study of each component in our implementation.
Phase Detectors
Phase detectors, also known as phase comparators, are a fundamental building block in many electronic
systems, especially those dealing with signals that have a specific frequency or phase relationship. Here's a
breakdown of their key functions. A phase detector takes two input
signals, typically periodic waveforms like sinusoids or pulses. It
analyzes the phase difference between these two signals and
generates an output voltage proportional to that difference.

Phase detectors find use in various applications, including:

Phase-Locked Loops (PLLs): PLLs are circuits that synchronize the frequency and phase of an output
signal with a reference signal. Phase detectors are crucial components within PLLs, constantly monitoring
the phase difference and feeding back a correction signal to maintain synchronization.

Clock and Data Recovery (CDR): In high-speed data transmission systems, like USB 3.0, data often carries
its own clock information embedded within the signal. CDR circuits utilize phase detectors to recover this
clock signal from the incoming data stream.

Topology of Phase Detectors

Analog Multipliers: These are the simplest form, multiplying the two input signals to generate an output
voltage proportional to their product. The resulting voltage will be positive when the signals are in phase and
negative when they are out of phase.

Exclusive-OR (XOR) Gates: In digital implementations, XOR gates can be used as phase detectors. The output of the XOR gate is high whenever the two input signals are at different logic levels and low when they are equal, so for two square waves the fraction of time the output is high grows with their phase difference. Low-pass filtering this output produces a voltage representing the phase difference.
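The XOR behaviour can be illustrated with a sampled model. The sketch below (hypothetical `xor_pd_average`) drives the gate with two 50% duty-cycle square waves and returns the fraction of time the output is high. That fraction rises linearly from 0 at zero offset to 1 at half a period (180°) and then falls again, so the detector is only linear over 0 to 180°.

```cpp
#include <cassert>

// Sampled XOR phase detector: one period is sampled at samples_per_period
// points, and the second square wave is delayed by `offset` samples.
// Returns the average (low-pass filtered) XOR output.
double xor_pd_average(int samples_per_period, int offset) {
    assert(samples_per_period > 0 && samples_per_period % 2 == 0);
    assert(offset >= 0 && offset < samples_per_period);
    const int half = samples_per_period / 2;
    int high = 0;
    for (int n = 0; n < samples_per_period; ++n) {
        const bool ref = n < half;  // reference square wave
        const int shifted_n = (n - offset + samples_per_period) % samples_per_period;
        const bool delayed = shifted_n < half;  // delayed square wave
        if (ref != delayed) ++high;             // XOR output is high
    }
    return static_cast<double>(high) / samples_per_period;
}
```

A quarter-period offset (90°) gives an average of 0.5, and a half-period offset (180°) gives 1.0.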

Choosing a Phase Detector:

The selection of a phase detector depends on factors like:

Required Accuracy: The sensitivity of the phase detector determines how small a phase difference it can
detect.

Input Signal Type: Analog or digital phase detectors are suitable depending on the type of input signals
being used.

Operational Speed: For high-speed applications, the phase detector's response time becomes a crucial
factor.

By understanding the function and types of phase detectors, we can see their importance in electronic systems that rely on precise synchronization or analysis of signal phases. Next, we discuss some types of PDs and the PD used in our implementation.

Types of Phase Detectors


We can classify phase detectors into linear and non-linear phase detectors. Each type has its own advantages and disadvantages, which we address in more detail below. Let us start with linear phase detectors, of which the Hogge phase detector is the best-known example.

Hogge Phase Detector

Hogge phase detectors are a specific type of linear phase detector used primarily in older analog Phase-Locked Loop (PLL) circuits. They offer certain advantages but have limitations compared to more modern linear or digital detectors.

Function:

The Hogge phase detector is a type of phase detector used in high-speed clock and data recovery circuits. It measures the timing difference between a periodic clock input and random NRZ (Non-Return to Zero) data, carried by differential inputs.

The Hogge phase detector is a linear phase detector, which means its output is a linear function of the phase difference between the input signals. This type of phase detector is often used in applications where a robust response is needed, such as in phase-locked loops (PLLs) and clock and data recovery circuits.

It has been a favorite for realizing Clock-Data-Recovery circuits in optical transmission systems due to its well-known loop characteristics and the low complexity of the phase detector, which allows efficient implementation.

However, like all phase detectors, the Hogge phase detector has its own trade-offs and specific design techniques. It is important to consider these factors when designing and implementing a clock and data recovery circuit.

Advantages:

• Simple design: The Hogge phase detector is relatively simple to design and implement, making it a cost-effective choice for many applications.
• Wide dynamic range: It can handle a wide range of input signal amplitudes, making it suitable for a variety of applications.
• Low noise: The Hogge phase detector has low noise performance, which is important for applications where precise phase measurements are required.

Disadvantages:

• Sensitivity to DC offset: The Hogge phase detector is sensitive to DC offset in the input signals, which can degrade its performance. This can be mitigated by using additional circuitry to remove the DC offset.
• Limited accuracy: The accuracy of the Hogge phase detector is limited by the quantization noise of the digital circuit used to implement it. This can be improved by using higher-resolution digital circuits, but this also increases the cost and complexity of the design.
• Not suitable for high frequencies: The Hogge phase detector is not suitable for use with high-frequency signals due to its limited bandwidth.

Overall, the Hogge phase detector is a versatile and cost-effective choice for many phase detection applications. However, it is important to be aware of its limitations, such as its sensitivity to DC offset and limited accuracy, when choosing a phase detector for a specific application.

There are two main reasons why the Hogge phase detector is not suitable for high-frequency signals:

Limited bandwidth: The Hogge phase detector relies on analog components like filters and comparators. These components have inherent limitations in their ability to handle high frequencies. Filters designed for precise phase detection often have narrow bandwidths, meaning they only allow a limited range of frequencies to pass through. As the signal frequency increases, the filter starts to attenuate the signal, reducing its amplitude and making it harder to process accurately. Comparators also have limited switching speed, and at high frequencies they may not be able to keep up with the rapid changes in the signal, leading to errors in the phase detection.

Quantization noise: The Hogge phase detector converts the analog phase difference signal into a digital signal using a quantizer. This process introduces quantization noise, which is essentially an error added to the signal due to the finite resolution of the digital representation. At high frequencies, the quantization error becomes more significant relative to the actual signal level, further degrading the accuracy of the phase measurement. Additionally, high-frequency signals often have smaller phase differences, and the quantization noise can easily obscure these subtle changes, making it difficult to measure the phase accurately.

Bang-Bang Phase Detector
A Bang-Bang Phase Detector (BBPD), also sometimes called a digital phase detector, is a type of phase detector used in Clock and Data Recovery (CDR) circuits, both in analog implementations and in digital ones such as the CDR proposed here. Here is a breakdown of how it works and its key characteristics:

Function:

Unlike linear phase detectors, whose output is proportional to the phase difference, or XOR gates, whose output duty cycle tracks the phase difference, BBPDs take a binary approach.

They compare the phase of the recovered (sampling) clock with the transitions of the incoming data stream.

Based on this comparison, the BBPD generates a digital output with only two possible states: "high" or "low".

Operation:

The BBPD typically uses two samplers within the CDR circuit:

An edge sampler: This samples the data stream at the expected data transition (the boundary between two consecutive bits).

A data sampler: This samples the data stream at the center of the data "eye" (the region where the data signal is most distinguishable from noise).

By comparing the edge sample with the data samples on either side of it, the BBPD determines whether the recovered clock is leading or lagging:

Early: If the edge sample is taken before the data transition (it still equals the previous data bit), the output is set to "high", indicating the recovered clock is leading.

Late: If the edge sample is taken after the data transition (it already equals the next data bit), the output is set to "low", indicating the recovered clock is lagging.
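The early/late decision above can be sketched as an Alexander-type BBPD, which is one common form; the report does not state which variant it uses, so this is an assumption. `bbpd` compares the edge sample with the data samples on either side of it, and the hypothetical `bbpd_vote` sums the decisions over a block, roughly what a decimating stage in front of the loop filter might do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Alexander-type bang-bang phase detector (assumed variant). d_prev and d_cur
// are consecutive data-sampler outputs; edge is the edge-sampler output taken
// halfway between them.
enum class PdOut { Late = -1, None = 0, Early = 1 };

PdOut bbpd(bool d_prev, bool edge, bool d_cur) {
    if (d_prev == d_cur) return PdOut::None;  // no transition: no phase information
    // Edge sample still equals the old bit: the transition happened after the
    // edge sample, so the recovered clock is early (leading).
    return (edge == d_prev) ? PdOut::Early : PdOut::Late;
}

// Hypothetical decimation stage: sums the early/late decisions of one block.
// edges[n] is the edge sample taken between data[n] and data[n + 1].
int bbpd_vote(const std::vector<bool>& data, const std::vector<bool>& edges) {
    assert(data.size() == edges.size() + 1);
    int sum = 0;
    for (std::size_t n = 0; n < edges.size(); ++n)
        sum += static_cast<int>(bbpd(data[n], edges[n], data[n + 1]));
    return sum;
}
```

The output carries only the sign of the phase error, never its size, which is why the accuracy and limit-cycling remarks below apply to this detector.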

Applications:

BBPDs were commonly used in earlier CDR circuits, particularly in analog implementations.

They provide a simple way to control the VCO (Voltage Controlled Oscillator) within the CDR loop.

The high/low output of the BBPD can be used to drive an analog circuit like a charge pump, which adjusts
the VCO's frequency based on the detected phase error (early or late).

Advantages of Bang-Bang:

Bandwidth limitations:

Hogge: As mentioned earlier, Hogge detectors rely on filters and comparators with limited bandwidth. These components struggle to handle high-frequency signals, attenuating them and introducing errors.

Bang-Bang: Unlike Hogge detectors, Bang-Bang detectors don't rely on analog filters or comparators directly in the feedback loop. Instead, they use a digital decision circuit that simply decides the sign of the phase error (early or late). This eliminates the bandwidth limitations of analog components, allowing Bang-Bang detectors to operate effectively at much higher frequencies.

Quantization noise:

Hogge: The analog-to-digital conversion in Hogge detectors introduces quantization noise, especially at high frequencies where the phase difference signal is smaller. This noise degrades the accuracy of the phase measurement.

Bang-Bang: Bang-Bang detectors operate entirely in the digital domain, eliminating the need for analog-to-digital conversion and its associated quantization noise. This allows for more accurate phase detection, even at high frequencies.

Simplicity: Bang-Bang detectors have a simpler design compared to Hogge detectors, potentially making them cheaper and easier to implement.

Fast response: The digital nature of Bang-Bang detectors enables faster response times compared to Hogge detectors, making them suitable for dynamic applications.

Disadvantages

Limited accuracy: While they offer noise immunity at high frequencies, Bang-Bang detectors inherently have lower accuracy compared to Hogge detectors, especially for small phase errors, because their output carries only the sign of the phase error, not its magnitude.

Stability issues: Under certain operating conditions, Bang-Bang detectors can exhibit limit-cycling behavior, which can affect their stability and performance.
Digital Loop Filter

As mentioned before, the analog CDR consists of analog sub-blocks such as the charge pump, the VCO, and a resistor and capacitor, and there is a model block in the digital domain for each.

In this section we focus on the resistor and capacitor, which make up the loop filter as shown in figure 5.

Let us look at RC circuits in more detail:

From Ohm's law and the capacitor law, we can deduce that the capacitor can be considered as the integral path, modeled by the frequency update gain (FRUG), and the resistor as the proportional path, modeled by the phase update gain (PHUG).

The circuit above shows a typical RC integrator: the output voltage across capacitor C is the integral of the input voltage. The capacitive reactance, denoted X_C, depends on the frequency of the input voltage as follows:

$$X_C = \frac{1}{2\pi f C}$$

From the previous equation we can imagine some scenarios such as

(1) If the input voltage has a high frequency (high-speed input data), the capacitor behaves almost like a short circuit, so the input passes through the capacitor (assuming C is unchanged).
(2) If the input voltage has a low frequency (low-speed input data), the reactance is large because of its inverse relation with frequency, so the capacitor behaves almost like an open circuit and the input cannot pass through it (assuming C is unchanged).
(3) Increasing the capacitance decreases X_C, which allows the input to pass.
(4) Using a capacitor with a low capacitance increases its reactance, so it acts as an open circuit, which makes it difficult for data to pass through it.

From the previous scenarios, we can deduce that the capacitor allows some data (with specific characteristics) to pass and blocks the rest, which is filtering of the input data.

Now let us derive the RC model.

Let $V_{in}$ be the input voltage and $i$ the alternating current. Since R is very large compared with the capacitive reactance $X_C$, the voltage drop across R may be taken as equal to the input voltage:

$$V_{in} \approx V_R$$

$$i = \frac{V_R}{R} = \frac{V_{in}}{R}$$

The charge on the capacitor at any instant is then

$$q = \int i\,dt = \frac{1}{R}\int V_{in}\,dt$$

and the output voltage is

$$V_{out} = \frac{q}{C} = \frac{1}{RC}\int V_{in}\,dt$$

or

$$V_{out} \propto \int V_{in}\,dt$$

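The output waveform of an RC integrator depends on the circuit's time constant and on the shape of the input. For a square-wave input the output is a triangular wave: the square wave alternates between two constant levels (zero and one), and integrating a constant gives a linear ramp. In the RC circuit the capacitor charges toward the high level and discharges toward the low level, so the output ramps up and down alternately, and the ramps are nearly linear when the time constant is large compared with the period. The following graphs show the effect of changing the capacitance.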
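The ramp-up/ramp-down behavior can be reproduced numerically. The sketch below integrates the RC equation $dV_{out}/dt = (V_{in} - V_{out})/(RC)$ with a forward-Euler step; the step size, the time constant, and the function names are assumptions chosen for illustration. When RC is large compared with the square-wave period, each half-period produces an almost linear ramp, as in the ideal integrator.

```cpp
#include <cstddef>
#include <vector>

// Square wave alternating between 1 and 0, `half_period` samples per level.
std::vector<double> square_wave(int half_period, int cycles) {
    std::vector<double> v;
    const std::size_t n = static_cast<std::size_t>(half_period);
    for (int c = 0; c < cycles; ++c) {
        v.insert(v.end(), n, 1.0);  // high level
        v.insert(v.end(), n, 0.0);  // low level
    }
    return v;
}

// Forward-Euler simulation of the RC circuit: dVout/dt = (Vin - Vout) / (R*C).
// rc: time constant in seconds; dt: simulation step in seconds.
std::vector<double> simulate_rc(const std::vector<double>& vin, double rc, double dt) {
    std::vector<double> vout(vin.size(), 0.0);
    double v = 0.0;  // capacitor starts discharged
    for (std::size_t n = 0; n < vin.size(); ++n) {
        v += dt * (vin[n] - v) / rc;  // charge (or discharge) toward Vin
        vout[n] = v;
    }
    return vout;
}
```

With RC = 1 µs, a 1 ns step, and 100 samples per level, the output rises during each high half and falls during each low half, and the midpoint of the first ramp sits close to half of its end value, which is what "nearly linear" means here.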

Having seen how the RC circuit acts on a square-wave input as both a filter and an integrator, we reach an important result: the capacitor is the integral path, and the resistor (from Ohm's law, V = IR) is the proportional path. We now need to model these integral and proportional paths in the digital domain.

Digital Filter:

The figure shows the relation between integration and filtering in the digital domain. Consider a 10-bit register used as an add-accumulator that is incremented by one every cycle, while only its top 4 bits are used as the output. The top 4 bits change only when the lower six bits are all ones and roll over, which does not happen often (once every 64 cycles). So not every input value reaches the output, only the accumulated changes that propagate into the top 4 bits: this is filtering through accumulation, and accumulation is integration in the discrete-time domain.
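The 10-bit example above can be written out directly. In this C++ sketch (the struct name is hypothetical), the accumulator adds its input every cycle and wraps at 10 bits, and only the top 4 bits are exposed, so with a constant input of one the output changes once every $2^6 = 64$ cycles.

```cpp
#include <cstdint>

// 10-bit add-accumulator whose output is only its top 4 bits.
struct TopBitsAccumulator {
    std::uint16_t acc = 0;  // holds a 10-bit value

    // Add `inc`, wrap at 10 bits, and return bits [9:6] of the register.
    unsigned step(unsigned inc) {
        acc = static_cast<std::uint16_t>((acc + inc) & 0x3FFu);
        return static_cast<unsigned>(acc >> 6);
    }
};
```

Over 1024 unit increments the register passes through every value once, yet the 4-bit output changes only 16 times.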

The block labeled ∑ in the figure above is the digital integrator. The z-domain response of an integrator is:

$$H_{INT}(z) = \frac{1}{z - 1}$$
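Writing $H_{INT}(z) = 1/(z-1)$ as $z^{-1}/(1 - z^{-1})$ gives the difference equation $y[n] = y[n-1] + x[n-1]$: a register that adds the previous input to its previous output. A minimal C++ sketch (the function name is hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Discrete integrator H(z) = 1/(z - 1):  y[n] = y[n-1] + x[n-1].
std::vector<long> integrate(const std::vector<long>& x) {
    std::vector<long> y(x.size(), 0);
    long acc = 0;     // y[n-1], the accumulator register
    long x_prev = 0;  // x[n-1], the z^-1 delay on the input
    for (std::size_t n = 0; n < x.size(); ++n) {
        acc += x_prev;
        y[n] = acc;
        x_prev = x[n];
    }
    return y;
}
```

An impulse at the input produces a step delayed by one sample, and a constant input produces a ramp, which is the discrete counterpart of the RC integrator's behavior.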

Before going into our models in SystemVerilog and C++, let us recap the flow of the CDR loop.

The figure above shows two paths: the integral path (representing the capacitor) and the proportional path into the phase integrator (representing the resistor). Both are driven by the up and dn signals from the phase detector (PD).

The phase detector reports the state of the loop, that is, the relation between the data and the clock generated by the phase mixer. Based on this state, it decides whether to speed up or slow down the generated clock, using Up (state = late) and Down (state = early). To avoid sharp phase changes, the early/late decisions are scaled by FRUG and accumulated in the frequency integrator, which models the ppm offset of the generated clock and feeds the phase integrator. The phase integrator in turn drives the phase mixer with a code: the top 3 bits (MSBs) select which phases are interpolated, and the remaining bits select the amount added to or subtracted from the selected phase.

The specifications of the phase and frequency integrators are as follows:
- Phase integrator: unsigned and non-saturating, so the phase can move by more than one unit interval (UI). It is an N-bit accumulator of which only the top N-D bits are sent to the phase mixer; the remaining D bits are dither bits, which gives an effective gain of $2^{-D}$.
- Frequency integrator: signed and saturating, since it must track both positive and negative parts-per-million (ppm) offsets. Saturation is required so that the frequency register does not roll over from a large positive value to a large negative value.

The purpose of the frequency integrator is to compensate for the ppm offset between the local reference clock and the incoming data. It must have enough top bits to reach the target maximum ppm, and enough resolution (dither bits) not to be a significant source of noise.

The maximum ppm offset that can be tracked is set by the maximum value the frequency register can hold, which depends on the width of the frequency integrator. Only the top 9 bits of the frequency register feed the phase integrator; since the frequency integrator is signed, these are a sign bit plus 8 bits. The remaining lower bits provide attenuation: they control how strongly the frequency register affects the bits sent to the phase mixer (the code).

- Equations that represent the accumulation process (SystemVerilog model):

The same accumulation was also modeled with two C++ functions:

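#include <cstdint>

// Loop-filter parameters (illustrative values; set them to match the RTL).
static const int          width         = 20;  // frequency-integrator width in bits
static const std::int64_t freq_gain     = 1;   // FRUG
static const std::int64_t phase_gain    = 1;   // PHUG
static const std::int64_t freq_max_val  = (std::int64_t{1} << (width - 1)) - 1;  // largest signed value
static const std::int64_t phase_max_val = std::int64_t{1} << 14;  // phase range (assumed: 11-bit code + 3 dither bits)

static std::int64_t accum_freq_val  = 0;  // frequency register: signed, saturating
static std::int64_t accum_phase_val = 0;  // phase register: unsigned, wrapping

// Frequency integrator (integral path): signed and saturating, so a large
// positive value never rolls over to a large negative value.
// C linkage so the function can be imported into SystemVerilog through DPI-C.
extern "C" int accum_freq(int up, int dn) {
    std::int64_t next = accum_freq_val + freq_gain * (up - dn);
    if (next > freq_max_val)           next = freq_max_val;
    else if (next < -freq_max_val - 1) next = -freq_max_val - 1;
    accum_freq_val = next;
    return static_cast<int>(accum_freq_val);
}

// Phase integrator: unsigned and non-saturating (it wraps), so the phase can
// move by more than one UI. It adds the proportional term and the top 9 bits
// (sign + 8 bits) of the frequency register.
extern "C" int accum_phase(int up, int dn) {
    // Right shift of a negative value is arithmetic on common compilers
    // (guaranteed since C++20), so the sign of the top 9 bits is kept.
    std::int64_t freq_top9 = accum_freq_val >> (width - 9);
    std::int64_t next = accum_phase_val + phase_gain * (up - dn) + freq_top9;
    next %= phase_max_val;
    if (next < 0) next += phase_max_val;  // wrap below zero as well
    accum_phase_val = next;
    return static_cast<int>(accum_phase_val);
}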

The following graphs show the early/late states of the input data and how the digital filter increases or decreases the phase so that the generated clock samples the input data correctly.

(1) using CPP Model

(2) Using System Verilog Model:

Phase Interpolator:

After the phase detector and the loop filter, the next block is the phase interpolator. The basic CDR loop architecture in the figure above shows the overall flow of Din through the loop.

The main task of a CDR circuit is to recover the transmitted data sequence precisely from a highly distorted receiver input signal. By detecting signal transitions, the CDR extracts the clock timing information from the received signal and generates a clock aligned to the center of the bit period.

The architecture of the CDR circuit is mainly determined by the transceiver clocking strategy, which may use a forwarded clock or an embedded clock. In our case the clock is embedded, and generating a clock with the right phase and the right frequency is the responsibility of the phase mixer.

Let us first recap the general idea of interpolation used in our design.

Assume we have two sinusoidal waves and apply interpolation to them as follows:

$$X_I = A\sin(\omega t)$$
$$X_Q = A\sin\left(\omega t - \frac{\pi}{2}\right) = -A\cos(\omega t)$$

$X_I$ and $X_Q$ are two sines with different phases: one reaches its maximum before the other, which corresponds to the early and late states reported by the phase detector.

Interpolating between the two sinusoidal waves generates another sine wave with a different phase:

$$Y = A\sin(\omega t - \phi)$$

$$Y = A\left(\sin(\omega t)\cos(\phi) - \sin(\phi)\cos(\omega t)\right)$$

$$Y = \cos(\phi)\,X_I + \sin(\phi)\,X_Q$$

$$Y = a_1 X_I + a_2 X_Q$$

where $a_1^2 + a_2^2 = 1$. The phase $\phi$ of the resulting sine wave follows from the inverse tangent of the ratio of the two weights:

$$\phi = \tan^{-1}\left(\frac{a_2}{a_1}\right)$$
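The identity $Y = \cos(\phi)X_I + \sin(\phi)X_Q = A\sin(\omega t - \phi)$ can be checked numerically. In the sketch below (function names are illustrative), `interpolate_iq` mixes the two quadrature sines with weights $a_1$ and $a_2$, and `phase_from_weights` recovers $\phi$ from the weights with `std::atan2`, which also places $\phi$ in the correct quadrant.

```cpp
#include <cmath>

// X_I = A sin(wt), X_Q = A sin(wt - pi/2) = -A cos(wt); returns a1*X_I + a2*X_Q.
double interpolate_iq(double A, double w, double t, double a1, double a2) {
    double xi = A * std::sin(w * t);
    double xq = -A * std::cos(w * t);
    return a1 * xi + a2 * xq;
}

// Phase selected by the weights: phi = atan2(a2, a1).
double phase_from_weights(double a1, double a2) {
    return std::atan2(a2, a1);
}
```

With $a_1 = \cos(0.5)$ and $a_2 = \sin(0.5)$, the interpolated samples match $\sin(\omega t - 0.5)$ to rounding error, and the recovered phase is 0.5 rad.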

Calculating $\phi$ gives the new phase. The previous equations also show that the weights affect the amplitude of the generated sine wave: we treat the sum of the two waves as the sum of two vectors, each with a phase and a magnitude. Because $X_I$ and $X_Q$ are 90 degrees apart, the magnitude of Y is $A\sqrt{a_1^2 + a_2^2}$, which equals A when $a_1^2 + a_2^2 = 1$.

How many phases can the generated sine wave take? At $\phi = 0$, $Y = X_I$, and at $\phi = \pi/2$, $Y = X_Q$; any phase in between can be reached, because sine and cosine differ by $\pi/2$. Interpolating between $X_I$ and $X_Q$ (phases 0 and $\pi/2$) covers only the first quadrant, so the general case needs at least four quadrants:

First quadrant: Interpolate between phase = 0 and phase = 90,

Second quadrant: Interpolate between phase = 90 and phase = 180,

Third quadrant: Interpolate between phase = 180 and phase = 270,

Fourth quadrant: Interpolate between phase = 270 and phase = 360

The more segments you use, the finer the resolution of the resulting phase. Our implementation interpolates between eight phases: each of the four quadrants above is split in two, giving segment boundaries at 0, 45, 90, 135, 180, 225, 270, 315, and 360 degrees.

Which segment is used for interpolation depends on the block that drives the interpolator's decisions: increase the phase when the clock is late, or decrease it when the clock is early. That is the role of the digital filter. The abstract block of the phase mixer (pmixer) is shown below, and we discuss its ports and their roles.

The block has two inputs, clk and an 11-bit Code, and one output: the generated clk with the required phase and frequency, determined by the state of the control loop as discussed before.

Input clk port: as mentioned before, we use an embedded-clock architecture rather than a forwarded-clock one, so the receiver needs its own clock generator, which is a PLL in our case. Interpolation is applied to the clock generated by the PLL to obtain the required phase. One point of our implementation needs clarification.

We said that interpolation needs eight phases, from which we select the two that bound the segment containing the new phase. However, the schematic block shows no eight phase inputs, so where do the eight phases come from?

The answer is that the clk input comes from the PLL with a frequency approximately matching our data rate, and this clock carries a ppm offset, as discussed in the digital filter section. We derive the eight phases from this single clock as follows:

1. Calculate the period of the input clk (from the PLL).
2. Generate a lookup table for a sine with zero phase.
3. Access the lookup table with indices separated by the required phase difference.
4. Calculate the index itself from the period of the input clock.

The code in the figure implements these steps. It uses eight arrays of 360 entries, each representing one period, whose values are taken from the lookup table. The key point is that from the samples of a single sine with given characteristics, sines with other phases, and with other frequencies too, can be generated.

This flexibility comes from treating the sine wave as a set of sample points.

The figure shows the eight phases as points.

The index into the lookup table is computed from the sampling time: take the sampling time modulo the period T of the input clock (from the PLL), divide by T, and multiply by the number of sample points per period, which is 360 in our case:

$$index = \frac{t \bmod T}{T} \times 360$$

As a result, a sample taken at the start of the period maps to the start of the table, a sample taken halfway through the period maps to the middle of the table, and so on.

The model above was written in MATLAB. Modeling the index equation in SystemVerilog is harder because the % operator is not allowed on real operands: t must be an integer, otherwise the tool reports an error. We therefore computed the modulus differently.

We used the $realtime system function, which returns the current simulation time as a real value, and applied the formula in the figure to perform (t % T). As the sampling time increases, the index increases, so the whole period is covered as required.

After computing t % T, we divide it by the input clk period to locate the sample within the period, and multiply by the number of sine sample points (360) to get the index. The index is then used as follows (all offsets wrap modulo 360):

- Index accesses the sine lookup table to generate a sine with phase 0
- Index + 45 generates a sine with phase 45
- Index + 90 generates a sine with phase 90
- Index + 135 generates a sine with phase 135
- Index + 180 generates a sine with phase 180
- Index + 225 generates a sine with phase 225
- Index + 270 generates a sine with phase 270
- Index + 315 generates a sine with phase 315

All of these are real values, but we need clock pulses, so we pass them through slicers with a parameterized threshold level, as shown in the figure.
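The index computation and the slicer can be sketched in C++ as follows. `std::fmod` plays the role of the real-valued modulus that SystemVerilog's `%` cannot provide (in SystemVerilog the same value can be obtained as `t - T*$floor(t/T)`). The table size of 360 and the 45-step offsets follow the text; the function names, the zero threshold, and the normalized time units are assumptions.

```cpp
#include <array>
#include <cmath>

// 360-entry lookup table: one sample per degree of a zero-phase sine.
std::array<double, 360> make_sin_lut() {
    const double kPi = std::acos(-1.0);
    std::array<double, 360> lut{};
    for (int k = 0; k < 360; ++k) lut[k] = std::sin(2.0 * kPi * k / 360.0);
    return lut;
}

// index = ((t mod T) / T) * 360, using fmod because t and T are real values.
int lut_index(double t, double T) {
    double frac = std::fmod(t, T) / T;  // position within the period, [0, 1)
    return static_cast<int>(frac * 360.0) % 360;
}

// Samples of the eight phases (0, 45, ..., 315 degrees) at time t.
std::array<double, 8> eight_phases(const std::array<double, 360>& lut, double t, double T) {
    int idx = lut_index(t, T);
    std::array<double, 8> out{};
    for (int p = 0; p < 8; ++p) out[p] = lut[(idx + 45 * p) % 360];  // wrap around the table
    return out;
}

// Slicer: turn a sine sample into a clock level by comparing it with a threshold.
int slice(double v, double threshold = 0.0) { return v > threshold ? 1 : 0; }
```

For example, with T = 1.0, time 2.25 lands a quarter of the way into a period, giving index 90; at t = 0 the 90-degree phase is at its peak, so its slicer output is 1 while the 270-degree phase slices to 0.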

We have now seen how the input clock port works and how eight phases are generated from the single clock coming from the PLL. The remaining question is how to control these phases, i.e., how to select the segment to interpolate in. The answer lies in the second input port, the 11-bit Code.

The 11-bit code comes from the digital loop filter, which accumulates the early/late states from the phase detector and passes its decision to the phase mixer as a code. The 11 bits are divided as follows:

- The top 3 bits (MSBs) select which phases are used for interpolation; at least 3 bits are needed to cover the eight phases, and a case statement acts as the selection mux.
- The remaining 8 bits are the phase value within the selected segment.

With 8 bits, the phase value lies in the range [0, 255], and the generated sine takes a phase within this range of the selected segment.

As discussed before, the interpolated output is:

$$Y = \cos(\phi)\,X_I + \sin(\phi)\,X_Q$$

Since the phase value moves within [0, 255], $\sin(\phi)$ and $\cos(\phi)$ are replaced by integer weights:

$$Y = A\,X_I + B\,X_Q$$

where B is the phase value in [0, 255] and A = 255 - B. This produces the clock with the required phase.
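The code split and the linear weights can be sketched as below. The bit positions (3 select bits above 8 weight bits) and A = 255 - B come from the text; normalizing by 255 and the helper names are assumptions for illustration. Because the weights are linear rather than a cos/sin pair, $A^2 + B^2$ is not constant across the range, which is one source of the phase-interpolation non-linearity shown in the PI modelling results.

```cpp
#include <cstdint>

// Fields of the 11-bit phase-mixer code.
struct PiControl {
    int phase_sel;  // code[10:8]: which adjacent pair of the eight phases
    int b;          // code[7:0]: weight of the later phase, 0..255
    int a;          // 255 - b: weight of the earlier phase
};

PiControl decode_code(std::uint16_t code) {
    PiControl c;
    c.phase_sel = (code >> 8) & 0x7;
    c.b = code & 0xFF;
    c.a = 255 - c.b;
    return c;
}

// Y = (A * X_early + B * X_late) / 255 for the two selected phase samples.
double mix(double x_early, double x_late, const PiControl& c) {
    return (c.a * x_early + c.b * x_late) / 255.0;
}
```

A code of 0 passes the earlier phase unchanged, and a low byte of 255 passes the later phase; values in between move the output phase across the 45-degree segment.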

How does the code change? Is the change sharp? Can it jump directly from one segment to another?

All of this depends on how well the control loop behaves, but the digital filter helps by attenuating bit changes in the code. The 11-bit code is taken from the phase integrator register, whose width is the code width plus some dither bits. Likewise, the frequency integrator only moves the code when its top 9 bits change, which happens only after the remaining (width - 9) dither bits have filled up to all ones. The code change is therefore attenuated by $2^{(width - 9)}$, and the same holds for the phase integrator register, which provides some protection against sharp changes.

The figure shows the change in code.

PI using Direct Programming Interface (DPI)


Another implementation of the PI was written in C++ and imported into SystemVerilog through DPI.

SystemVerilog DPI is a feature that lets users interface SystemVerilog with foreign programming languages such as C, C++, and SystemC.
DPI makes it possible to integrate a SystemVerilog design with external components written in other languages, giving a more powerful and flexible design environment. It also provides an easy and efficient way to connect existing code, usually written in C/C++, without the Verilog Programming Language Interface (PLI) or the Verilog Procedural Interface (VPI).
DPI can be used to build a golden model in another language and compare the output of the C function with the output of the module implemented in SystemVerilog. It can also be used to implement functions that would be complex to write in SystemVerilog.

PI Modelling
- Interpolation in first quadrant

- Interpolation in second quadrant

- Interpolation in fourth quadrant

- Non-Linearity of Phase interpolation

- Phase Interpolation

Clock and Data Recovery Integration
Having covered the Bang-Bang phase detector and the other types of phase detectors, the dynamics of the digital loop filter, and the phase interpolator (also called the phase mixer), we now integrate all these components to form the CDR loop.

To recap the previous picture: the data is sampled by the three clocks, and the Bang-Bang phase detector decides whether the sampling clock leads or lags the data. A decimation block then applies voting to these decisions, and the result is sent to the digital filter, whose phase integrator and frequency integrator accumulate the error to compensate for the difference in frequency and phase. The filter output is then sent to the phase interpolator, which updates the clock by selecting the most suitable phase. The loop repeats, taking the next decision each time, until the clock locks to the middle of the data symbol in the eye diagram.
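The loop can be illustrated with a small floating-point simulation. This is a simplified sketch, not the authors' model: it makes one sign decision per UI (no three-clock sampling, decimation, or voting), skips the 11-bit code quantization, and uses illustrative gains (`kp` standing in for PHUG, `kf` for FRUG). The incoming data drifts by a constant ppm offset; the frequency integrator learns that drift, while the proportional path keeps the phase error small.

```cpp
#include <vector>

struct CdrTrace {
    std::vector<double> err;   // data phase minus clock phase, in UI
    std::vector<double> freq;  // frequency-integrator value, in UI per UI
};

// Second-order bang-bang CDR loop. drift: data phase advance per UI
// relative to the local clock (100 ppm -> 1e-4).
CdrTrace simulate_cdr(double drift, double kp, double kf, int n_ui) {
    CdrTrace tr;
    double phi_data = 0.0, phi_clk = 0.0, freq = 0.0;
    for (int n = 0; n < n_ui; ++n) {
        double e = phi_data - phi_clk;
        int s = (e > 0.0) ? +1 : -1;  // bang-bang PD: +1 = late (Up), -1 = early (Down)
        freq += kf * s;               // frequency integrator (integral path)
        phi_clk += kp * s + freq;     // phase integrator: proportional path + frequency
        phi_data += drift;            // incoming data drifts by the ppm offset
        tr.err.push_back(e);
        tr.freq.push_back(freq);
    }
    return tr;
}
```

With a 100 ppm drift, kp = 1/64, and kf = 1/8192, the phase error is expected to dither within about one proportional step of zero while the frequency value settles near the applied drift, which is the locking behavior the snippets below illustrate.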

The following snippets show the locking of the CDR loop for different FRUG and PHUG parameters, including the saturation of the frequency integrator and the change of phase and code during the locking process.

- 100ps

- 100.1ps

- 100.2ps
The CDR failed to recover the data at this large ppm offset because it exceeds the maximum ppm offset that can be tracked. The frequency integrator only gets to move the DPC once every 8 UI, and its top 9 bits (8 bits + 1 sign bit) are attenuated when passed to the DPC. Therefore, the frequency integrator can change the input to the DPC by only 3.98 bits every 8 UI.

After increasing the top bits to 10, we found that the loop compensates for the offset.

Channel
The channel is the path between the transmitter and the receiver, and it affects the data sent from the transmitter to the receiver. Passing through the channel attenuates the transmitted bits: the channel behaves like a low-pass filter.

In communication systems, channels are often classified as "short channels" and "long channels". These terms refer to the length of the communication channel and its effect on signal integrity and performance. Channel length strongly affects how signals are attenuated, distorted, and delayed, which in turn affects the overall quality of data transmission. One way to visualize and analyze these effects is the eye diagram, a tool used to assess the quality of a digital signal.

Comparison of short and long channels:

Signal attenuation
- Short channel: typically experiences less attenuation than a long channel, because the signal travels a shorter distance and encounters fewer losses. Attenuation is probably less than 20 dB.
- Long channel: more susceptible to significant attenuation. The longer the channel, the greater the power loss as the signal propagates, potentially requiring more amplification and equalization. Attenuation is more than 20 dB.

Distortion and dispersion
- Short channel: suffers less from dispersion and distortion. Dispersion, the spreading of signal components at different frequencies, is smaller over shorter distances.
- Long channel: more prone to dispersion (both chromatic and modal), particularly in fiber-optic channels, and to various forms of distortion such as phase noise and inter-symbol interference (ISI). These effects are exacerbated by the increased channel length.

Noise and interference
- Short channel: generally encounters less external noise and interference, owing to shorter exposure time and limited interaction with the environment.
- Long channel: more likely to accumulate noise and suffer from interference over its length, which can further degrade signal quality.

Eye diagram
- Short channel: the eye tends to be more open, indicating less signal distortion.
- Long channel: the eye may appear more closed due to increased inter-symbol interference (ISI) and jitter.

Bits
- Short channel: transitions between bits are more distinct, making it easier for the receiver to distinguish between a '1' and a '0'.
- Long channel: signal edges become less sharp and the distinction between bits can be blurred, leading to higher bit error rates.

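Modeling the channel as a first-order RC low-pass filter, its attenuation can be expressed as follows:

$$\frac{V_{out}(s)}{V_{in}(s)} = H(s) = \frac{\frac{1}{sC}}{R + \frac{1}{sC}}$$

$$H(s) = \frac{1}{1 + sRC} = \frac{1}{1 + \frac{s}{\omega_c}}, \qquad \omega_c = \frac{1}{RC}$$

Attenuation:

$$A = |H(j\omega)| = \left|\frac{1}{1 + j\frac{\omega}{\omega_c}}\right| = \frac{1}{\sqrt{1 + \left(\frac{\omega}{\omega_c}\right)^2}}$$

Attenuation in dB (expressed as a positive number):

$$A_{dB} = -20\log_{10} A = 20\log_{10}\frac{1}{A} = 20\log_{10}\left(1 + \left(\frac{\omega}{\omega_c}\right)^2\right)^{\frac{1}{2}}$$

$$\left(1 + \left(\frac{\omega}{\omega_c}\right)^2\right)^{\frac{1}{2}} = 10^{\frac{A_{dB}}{20}}$$

$$\left(\frac{\omega}{\omega_c}\right)^2 = 10^{\frac{A_{dB}}{10}} - 1 \;\rightarrow\; \frac{\omega}{\omega_c} = \sqrt{10^{\frac{A_{dB}}{10}} - 1}$$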

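To model the attenuation digitally, we move to the z-domain:

$$H(s) = \frac{1}{1 + sRC} = \frac{1}{1 + \frac{s}{\omega_c}}$$

Using a zero-order hold:

$$H(z) = \mathcal{Z}\left\{\frac{1 - e^{-sT_s}}{s}\cdot\frac{1}{1 + \frac{s}{\omega_c}}\right\} = (1 - z^{-1})\,\mathcal{Z}\left\{\frac{1}{s\left(1 + \frac{s}{\omega_c}\right)}\right\}$$

$$H(z) = (1 - z^{-1})\,\frac{z^{-1}\left(1 - e^{-\omega_c T_s}\right)}{(1 - z^{-1})\left(1 - z^{-1}e^{-\omega_c T_s}\right)}$$

$$H(z) = \frac{Y(z)}{U(z)} = \frac{z^{-1}\left(1 - e^{-\omega_c T_s}\right)}{1 - z^{-1}e^{-\omega_c T_s}}$$

Let $\alpha = e^{-\omega_c T_s}$ and $\beta = 1 - e^{-\omega_c T_s} = 1 - \alpha$:

$$\frac{Y(z)}{U(z)} = \frac{\beta z^{-1}}{1 - \alpha z^{-1}}$$

$$y(k) - \alpha\,y(k-1) = \beta\,u(k-1) \;\rightarrow\; y(k) = \beta\,u(k-1) + \alpha\,y(k-1)$$

In the z-domain, $z^{-1}$ denotes the old (previous) value, so this equation is easy to implement digitally: a flip-flop holds the old value.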

0 dB, F = 2.5e9

10 dB, F = 2.5e9

Attenuation 10 dB:

Our system implementation:

In our system, the channel was implemented with 10 dB attenuation: we substituted 10 dB into the equations above and set the sampling clock to 10 times the system clock, which gave the values of α and β used to build the channel.
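The steps from attenuation to filter coefficients can be sketched as follows. The attenuation is specified at a chosen frequency f (the plots use F = 2.5e9), $\omega_c$ follows from $\omega/\omega_c = \sqrt{10^{A_{dB}/10} - 1}$, and $\alpha = e^{-\omega_c T_s}$. The function names and the sample period in the usage note are assumptions; this is not the authors' `calculate_alpha` DPI function, whose second argument is not described here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Corner frequency (rad/s) giving `atten_db` of attenuation at `f_hz`:
// wc = w / sqrt(10^(A_dB/10) - 1).
double corner_rad_s(double atten_db, double f_hz) {
    const double kPi = std::acos(-1.0);
    double w = 2.0 * kPi * f_hz;
    return w / std::sqrt(std::pow(10.0, atten_db / 10.0) - 1.0);
}

// alpha = exp(-wc * Ts); ts is the sampling period in seconds.
double channel_alpha(double atten_db, double f_hz, double ts) {
    if (atten_db <= 0.0) return 0.0;  // no attenuation: y(k) = u(k-1)
    return std::exp(-corner_rad_s(atten_db, f_hz) * ts);
}

// y(k) = beta*u(k-1) + alpha*y(k-1), with beta = 1 - alpha.
std::vector<double> apply_channel(const std::vector<double>& u, double alpha) {
    const double beta = 1.0 - alpha;
    std::vector<double> y(u.size(), 0.0);
    double y_prev = 0.0, u_prev = 0.0;  // the two z^-1 registers (flip-flops)
    for (std::size_t k = 0; k < u.size(); ++k) {
        double yk = beta * u_prev + alpha * y_prev;
        y[k] = yk;
        y_prev = yk;
        u_prev = u[k];
    }
    return y;
}
```

For example, with a hypothetical 5 GHz system clock and a sampling clock 10 times faster, $T_s$ = 20 ps, which gives α ≈ 0.90 for 10 dB at 2.5 GHz. Since β = 1 - α, the DC gain $\beta/(1-\alpha)$ is 1, so a long run of identical bits still settles to full amplitude while fast transitions are attenuated.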

To check whether the channel meets the system requirements, an eye diagram can be drawn and its eye height and eye width checked against the requirements, which reveals problems in the system such as jitter.

An eye diagram is created by superimposing segments of a digital signal to form a single image that resembles an open eye. The clarity and openness of the "eye" give a visual indication of the system's ability to distinguish between ones and zeros, which translates to the bit error rate of the communication link.

5 dB attenuation eye diagram:

10dB attenuation eye diagram

0dB attenuation eye diagram

Channel DPI

Another implementation of the channel used the Direct Programming Interface (DPI).
The functions are implemented in C and declared as imported from DPI-C so they can be used in SystemVerilog.
Example (in the SystemVerilog file):
import "DPI-C" function real calculate_alpha( input real ATTENUATION, input int N);
DPI was useful in our system for mathematical calculations that are easier in C, for example computing the power of a decimal number. As a result, any required attenuation could be implemented easily.
7 dB:

5 dB:

3 dB:

FULL System Verification

The previous diagram shows the full UVM environment used to verify our system. Nine environments are used to verify the whole system and each individual component, based on the concept of active and passive agents.

- An active agent typically consists of three main components:

Driver: generates stimuli (data, signals) to be sent to the DUT.

Sequencer: controls the sequence in which the driver sends stimuli.

Monitor: continuously observes the DUT's behavior and forwards the observed transactions to the scoreboard for comparison with the expected results.

Functionality: an active agent interacts with the DUT by sending test data or control signals, generated by the sequencer, through the driver, and by monitoring the DUT's outputs so they can be compared with the expected values.

This enables comprehensive verification by exercising the various functionalities and corner cases of the DUT.

- A passive agent has only a monitor component.

Functionality: a passive agent only observes the DUT's behavior. The stimulus is driven by another source (e.g., another agent), and the passive agent monitors the DUT's outputs and sends the observed behavior to the scoreboard, which compares the actual results with the expected ones, generates reports, and collects coverage information.

Here, the agent in the my_env class is the only active agent in our system: it drives the stimulus, and all other environments, one per individual component, are passive. To control the activity status of each agent, a configuration class was implemented for every component, as follows:

As shown, every config class has two members: an instance of the BFM that maps the interface of our system, and a variable of the UVM enum type uvm_active_passive_enum that controls the agent's activity.

Now let us look at my_test to see these configurations and how they determine whether the driver and sequencer are built in each environment component:

My_test class:

my_agent class:

Having covered the structure of the environment and the active and passive agents it uses, let us address each component environment separately and look at the test results.

Tx_Gasket_Env_test
This environment uses a passive agent. The test ensures that the 32-bit input data is divided into 8-bit packets, each with a K flag that indicates whether the packet is a command (K = 1) or data (K = 0).

Tx_PMA_Env_test
This environment uses a passive agent. After the stimulus is driven from the top and passes through the gasket and the encoding block, the test checks that each 10-bit encoded pattern is serialized correctly, one bit at a time, toward the receiver, by collecting the serialized data and comparing it with the data coming from the encoding stage.

Rx_S2P_Env_test
Another passive-agent environment, which monitors the output of the serial-to-parallel stage on the receiver side and checks that the collected data matches the data serialized on the transmitter side.

Mismatches are expected during the CDR locking stage, which was described and tested earlier in this document.

Elastic_Buffer_Env_test

Rx_gasket_Env_test
Checks that the decoded data is collected correctly and that the original 32-bit data is recovered correctly.

What about the encoder and decoder environments? For these two components we used the UVM RAL (Register Abstraction Layer), because of the features RAL offers for working with registers: both components consist mainly of LUTs and registers, so RAL is the best-practice option for them. We now discuss RAL and its integration into our UVM environment.

UVM Register Abstraction Layer (RAL)
The UVM Register Layer provides standard base class libraries that let users implement an object-oriented model to access the DUT registers and memories. The UVM Register Layer is also referred to as the UVM Register Abstraction Layer (UVM RAL). RAL provides a set of base classes and methods, together with a set of rules, that eases the effort required for register access.

Advantages of UVM RAL

The advantages of the UVM RAL model are:

- It provides a high-level abstraction for reading and writing DUT registers, i.e., registers can be accessed by name.
- UVM provides a register test sequence library containing predefined test cases that can be used to verify registers and memories.
- The register layer classes support front-door and back-door access.
- Design registers can be accessed independently of the physical bus interface, i.e., by calling read/write methods.
- The register model can be accessed from multiple concurrent threads; it serializes access to each register internally.
- Reusability: RAL packages can be reused directly in other environments.
- Uniformity: it defines a set of rules, or methodology, for register access that can be followed across the industry.
- Automated RAL model generation: tools and open-source scripts are available for generating RAL models.

The block diagram shows how RAL is used in the verification testbench.

UVM RAL provides a hierarchical structure that mirrors the design hierarchy of the DUT (Design Under Test). This makes it easier to map register models to the actual physical registers in the design. The hierarchy typically has three key levels:

- uvm_reg_block: represents a block of registers within the DUT.
- uvm_reg: represents an individual register within a register block.
- uvm_reg_field: represents an individual field (a bit or group of bits) within a register.

Base Classes:

UVM RAL offers a base class for each hierarchical level:

- uvm_reg_block: defines common functionality for accessing and managing register blocks.

- uvm_reg: defines methods for reading, writing, and sampling coverage of individual registers.

- uvm_reg_field: provides access to specific fields within a register, allowing individual bits or groups of bits to be manipulated.

UVM RAL library classes have built-in methods for accessing the registers. These methods are referred to as Register Access Methods.

The register model has methods to read, write, update, and mirror DUT registers and register field values; these methods are called the API (application programming interface).

API methods can be used with front-door access and back-door access:

Front-door access: the conventional method, in which the proper bus protocol must be followed. It is like using the main entrance of a building: you go through the necessary steps, such as providing valid signals (address, reset, and write signals), to perform read or write transactions to the register.

Back-door access: a more direct method, similar to using an alternate entrance to a building. The usual protocol is bypassed: you specify the hierarchical path to the register, and it is accessed directly without driving signals such as the address or reset.

API Methods

- read(): reads the register (or field) value from the DUT and returns it in the provided data variable; with prediction enabled, the mirrored value is updated to the value read.

- write(): writes the provided value to the register (or field) in the DUT; with prediction enabled, the desired and mirrored values are updated to the written value.

- set(): sets the desired value in the RAL model only; the DUT is not accessed and the mirrored value is not changed.

- get(): returns the desired value of the register (or field) from the RAL model, without accessing the DUT.

- peek(): reads the value through the back door, without using the bus protocol and without triggering read side effects in the DUT.

- poke(): writes a value through the back door, without using the bus protocol and without triggering write side effects in the DUT.

- update(): writes the desired value to the DUT if it differs from the mirrored value (typically used after set()).

- mirror(): reads the register from the DUT and updates the mirrored value; it can optionally check the value read against the current mirrored value.

Mirrored Value:

Represents the current value of a register as stored within the UVM RAL model itself, i.e. the model's best estimate of what the DUT register holds. This value is updated whenever the register is:

- Read from or written to the DUT (Design Under Test), through auto prediction or a predictor.

- Read or deposited through the backdoor with peek() or poke().

- Explicitly predicted using the predict() method.

- Reset in the model, or updated internally by the RAL model through other mechanisms.

The mirrored value acts as a reference point within the verification environment, allowing UVM to track changes and ensure consistency between the DUT's register state and the verification code's understanding of that state.

Desired Value:

Represents the value you intend a register in the DUT to hold. It is typically set using the set() or randomize() method within your verification sequences, and write() also sets it to the value being written.

set() only changes the desired value in the model; update() then writes the desired value to the DUT whenever it differs from the mirrored value, after which both values agree.

To confirm that the DUT actually holds the expected value, mirror() with checking enabled reads the register back and compares it against the mirrored value, flagging any mismatch.
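The interplay between the desired and mirrored values can be sketched with a small Python model. The names are hypothetical, not the UVM classes, and the DUT is reduced to a dictionary. set() only changes the desired value, update() issues a write only when desired and mirrored differ, and mirror() with checking reads the DUT back and flags a value the model did not predict.

```python
class ToyReg:
    """Hypothetical register-model entry with desired and mirrored values."""

    def __init__(self, dut, name, reset=0):
        self.dut, self.name = dut, name
        self.desired = self.mirrored = reset
        self.bus_writes = 0  # number of front-door writes actually issued

    def set(self, value):
        self.desired = value  # model only: the DUT is not touched

    def get(self):
        return self.desired

    def write(self, value):
        self.desired = value
        self.dut[self.name] = value  # front-door write to the DUT
        self.bus_writes += 1
        self.mirrored = value  # prediction after the write

    def update(self):
        if self.desired != self.mirrored:  # only write when the model is out of sync
            self.write(self.desired)

    def mirror(self, check=False):
        value = self.dut[self.name]  # read back from the DUT
        if check and value != self.mirrored:
            raise AssertionError(f"{self.name}: DUT={value:#x}, mirror={self.mirrored:#x}")
        self.mirrored = value
        return value


dut = {"ctrl": 0}
ctrl = ToyReg(dut, "ctrl")
ctrl.set(0xA5)  # desired = 0xA5, mirrored and DUT still 0
ctrl.update()   # one write issued, mirror now 0xA5
ctrl.update()   # nothing to do: desired == mirrored
ctrl.mirror(check=True)
print(hex(dut["ctrl"]), ctrl.bus_writes)
```

If something outside the register model changes the DUT register, the next checked mirror() raises the mismatch.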

To summarize the API methods and visualize the difference:

Frontdoor Access:

1) Write
2) Read
3) Update
4) Mirror
5) Predict
6) Randomize

Backdoor Access:

7) Peek
8) Poke

Two other components related to the RAL model are the adapter and the predictor.

Adapter

The UVM register model (RAL) methods such as write() and read() deal with register transactions (register sequence items), while the DUT accepts or sends signal-level transactions (bus sequence items) from/to the testbench through an interface. A register transaction therefore cannot be applied to the DUT directly: it must first be converted into a bus transaction, and observed bus transactions must be converted back into register transactions. This conversion is done by the register adapter, a user-defined class derived from the uvm_reg_adapter base class.
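In UVM, the conversion is implemented by overriding the adapter's reg2bus() and bus2reg() methods, which translate between uvm_reg_bus_op and the bus sequence item. The Python sketch below mimics that pair of conversions for a hypothetical bus with an address, a write enable, and separate write and read data fields. The class and field names are illustrative, not the project's bus.

```python
from dataclasses import dataclass


@dataclass
class RegOp:
    """Abstract register access, loosely modelled on uvm_reg_bus_op."""
    kind: str  # "READ" or "WRITE"
    addr: int
    data: int = 0


@dataclass
class BusItem:
    """Hypothetical signal-level bus transaction."""
    addr: int
    write_en: bool
    wdata: int = 0
    rdata: int = 0


class ToyAdapter:
    def reg2bus(self, op: RegOp) -> BusItem:
        """Register operation -> bus item, before the driver sends it."""
        is_write = op.kind == "WRITE"
        return BusItem(addr=op.addr, write_en=is_write, wdata=op.data if is_write else 0)

    def bus2reg(self, item: BusItem) -> RegOp:
        """Bus item (e.g. from the monitor) -> register operation."""
        if item.write_en:
            return RegOp("WRITE", item.addr, item.wdata)
        return RegOp("READ", item.addr, item.rdata)


adapter = ToyAdapter()
bus = adapter.reg2bus(RegOp("WRITE", 0x10, 0xAB))
print(bus, adapter.bus2reg(bus))
```

Keeping the two directions symmetric means a write converted to the bus and back yields the original register operation, which is a cheap sanity check for an adapter.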

Predictor

The UVM RAL predictor is a component that updates mirrored values based on the transactions observed on a physical interface; UVM provides the uvm_reg_predictor base class for it. The DUT registers can be changed either by RAL methods (such as read() and write()) or by sequences that run directly on the bus agent with valid addresses and data, so that the driver talks to the DUT without going through the register model. Observing the bus keeps the mirror correct in both cases.

For front-door access, UVM RAL provides three prediction models:

• Implicit (Auto) Prediction: the front-door access methods automatically call predict() when each read or write completes, so the register model updates its own mirror. It is enabled with set_auto_predict() on the register map.

• Explicit Prediction: the default mode. An external predictor component snoops bus transactions (typically from the agent's monitor) and calls predict() to update the mirrored register values. Because it works on bus transactions, it needs a register adapter to convert them into register transactions.

• Passive Prediction: similar to explicit prediction in that a predictor component updates the mirror from observed bus traffic, except that the front-door register access methods can't be used.

Our environment uses explicit prediction.
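uvm_reg_predictor receives bus transactions through its write() method (connected to a monitor's analysis port), converts them with the adapter, and updates the mirror of the register at that address. The Python sketch below reproduces only that last step for a hypothetical address map; it skips the adapter and field-level access policies.

```python
from dataclasses import dataclass


@dataclass
class BusItem:
    """Hypothetical monitored bus transaction."""
    addr: int
    write_en: bool
    wdata: int = 0
    rdata: int = 0


class ToyPredictor:
    """Explicit prediction: update mirrors from every observed bus transaction."""

    def __init__(self, address_map):
        self.address_map = address_map  # {address: register name}
        self.mirror = {name: 0 for name in address_map.values()}

    def write(self, item: BusItem):
        """Named after the analysis-port write() that the monitor calls."""
        name = self.address_map.get(item.addr)
        if name is None:
            return  # address not in the register model: ignore it
        self.mirror[name] = item.wdata if item.write_en else item.rdata


pred = ToyPredictor({0x00: "ctrl", 0x04: "status"})
pred.write(BusItem(0x00, True, wdata=0x3))    # e.g. a write from a plain bus sequence
pred.write(BusItem(0x04, False, rdata=0x81))  # a read observed by the monitor
print(pred.mirror)
```

Because the predictor only watches the bus, it does not matter whether a transaction was started by a RAL method or by an ordinary bus sequence.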

Encoder_Decoder_Env_test
Now that the RAL concept has been covered, this section shows how the feature is used in our environment, specifically for testing the encoder and decoder.

As shown, we used backdoor access to reach the encoder and decoder registers directly through the simulator database, without having to drive the bus and follow its protocol.

The backdoor APIs used for these registers are poke() for writing to and peek() for reading from the specified register.

Here the backdoor access is defined for the register

Encoder Test

After the testing sequences end, the backdoor sequence is started independently of the rest of the environment.

Decoder_Test

CDR System Verification
No Channel – No offset

We start by testing the CDR without the channel effect and without adding any PPM offset

No Channel – 300 PPM offset
Adding a 300 PPM frequency offset to test the ability of the CDR to track it without the channel
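A PPM offset is a relative frequency difference between the transmitter and receiver clocks, so the sampling phase drifts a little every bit. As a rough illustration, assuming the 5 Gb/s SuperSpeed rate (1 UI = 200 ps), the sketch below converts an offset into a bit period and into the number of bits after which the accumulated drift reaches a full UI, which is the drift the CDR must keep correcting.

```python
UI_PS = 200.0  # assumption: 5 Gb/s SuperSpeed rate, so 1 UI = 200 ps


def offset_period_ps(ppm, ui_ps=UI_PS):
    """Bit period of a clock whose frequency is off by `ppm` (negative = slower)."""
    return ui_ps / (1 + ppm * 1e-6)


def drift_per_bit_ps(ppm, ui_ps=UI_PS):
    """Phase drift accumulated every bit relative to the nominal clock."""
    return abs(offset_period_ps(ppm, ui_ps) - ui_ps)


def bits_per_ui_slip(ppm, ui_ps=UI_PS):
    """Bits until the accumulated drift equals one full UI (ppm must be non-zero)."""
    return ui_ps / drift_per_bit_ps(ppm, ui_ps)


for ppm in (-300, -1000, -5000):
    print(f"{ppm:>6} ppm: period {offset_period_ps(ppm):.4f} ps, "
          f"drift {drift_per_bit_ps(ppm):.4f} ps/bit, "
          f"full-UI slip every {bits_per_ui_slip(ppm):.0f} bits")
```

At 300 PPM the phase slips by a full UI roughly every 3,300 bits, while at 5000 PPM it does so about every 200 bits, which is why larger offsets push the frequency path of the loop harder.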

Channel Attenuation 10dB
Adding the channel effect with attenuation of 10dB

PI CLK after Locking

200 PPM frequency offset with channel
Adding a 200 PPM frequency offset in the presence of the channel

Not Locked [Not Correct data sampled]

Locked [Correct data sampled]

500 PPM frequency offset with channel
Frequency saturation

Buffer

Phase interpolator: sine wave resulting from the interpolation

CDR Locking

UP-DN Saturation

800 PPM frequency offset with channel

After CDR Locking

Early-Late Saturation

1000 PPM frequency offset with channel

Before locking [Wrong Data Collection]

After Locking [Right Data Collection]

Channel Attenuation

Early-Late Saturation

Spread Spectrum Clocking (SSC):
Spread spectrum clocking is a technique used in digital electronics to intentionally modulate the clock frequency around its nominal value in a controlled way. An ideal clock concentrates its energy in a sharp peak at its frequency. With SSC, the modulation spreads the energy of the signal over a wider frequency range, which helps to reduce electromagnetic interference (EMI).

EMI is a major concern in electronic devices, as it can interfere with the operation of other devices, and regulations limit how much of it a product may emit. By spreading the energy of the clock signal over a wider frequency range, SSC reduces the peak amplitude of the signal at any one frequency, which makes it less likely to interfere with other devices.

SSC is often used in conjunction with other techniques to reduce EMI, such as shielding and grounding. It is
a particularly effective technique for reducing EMI from high-speed clock signals, which are a major source
of EMI in many electronic devices.

Here are some of the benefits of using SSC:

• Reduces EMI
• Can help to meet regulatory requirements
• Relatively simple to implement

However, there are also some drawbacks to using SSC:

• Increases jitter
• Can reduce the signal-to-noise ratio
• May not be compatible with all devices

Overall, SSC is a valuable technique for reducing EMI in electronic devices. It is important to weigh the
benefits and drawbacks of SSC before using it in a particular application.

SSC can be performed in several ways:

• SSC Applied to the Reference Clock

In this method, spread spectrum modulation is applied directly to the reference clock (ref clk). The phase-locked loop (PLL) follows the modulated reference clock, so both the PLL output and the transmitted data exhibit the spread spectrum characteristics.
It is easy to implement, since it only requires modulating the reference clock, and all components use clocks derived from the reference clock, which ensures consistent behavior. On the other hand, it offers less flexibility in controlling the spread spectrum characteristics of individual components.

• SSC Applied Within the PLL

In this approach, the reference clock remains constant and the PLL itself generates the spread spectrum signal. The PLL modulates the clock signal, which is then used by the physical layer (PHY).
It provides more control over the spread spectrum parameters, which can be optimized for the specific needs of the PLL and PHY. Also, other parts of the system that use the reference clock are unaffected by SSC, reducing potential performance issues.

• SSC Applied at the Transmitter

In this method, neither the reference clock nor the PLL is modulated. Instead, the transmitter applies spread spectrum modulation directly to the transmitted data, and the clock and data recovery (CDR) circuitry at the receiver must adjust to the spread data.
It allows spreading only on the transmitter side, but it therefore requires sophisticated CDR circuitry to handle the spread spectrum modulation and accurately recover the clock and data.

SSC can be implemented using different profiles, such as:

• Triangular SSC with down-spread, which is simple to implement and has smooth transitions that let the CDR adjust easily:

• Asymmetric triangular SSC, with which it is difficult to make the CDR lock because of the sudden changes in the clock frequency.

• Hershey-kiss profile, which is complicated to implement.

We implemented SSC with a triangular profile and down-spreading.

SSC triangle matching the specification (half period 16.6 µs, full period 33.3 µs)

At the top of the triangle the bit period is UI + 5000 PPM (the maximum down-spread).

At the bottom of the triangle the offset is 0 PPM.

Main code for calculating the SSC value using DPI with C++
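As an illustration only (this is not the project's C++ DPI code), the Python sketch below generates the same kind of profile: a triangle with a 33.3 µs period whose peak stretches the bit period by 5000 ppm. It assumes a nominal 200 ps UI. Down-spread lowers the frequency, which to first order lengthens the period by the same number of ppm; that first-order form is what the sketch uses.

```python
SSC_PERIOD_US = 33.3         # one triangle period, as in the figure (~30 kHz)
MAX_DOWNSPREAD_PPM = 5000.0  # period stretch at the top of the triangle
UI_PS = 200.0                # assumption: nominal 5 Gb/s UI


def ssc_ppm(t_us):
    """Triangular down-spread: 0 ppm at t = 0, peak at half period, 0 again at the end."""
    phase = (t_us % SSC_PERIOD_US) / SSC_PERIOD_US  # position within the period, 0..1
    tri = 2 * phase if phase < 0.5 else 2 * (1 - phase)
    return MAX_DOWNSPREAD_PPM * tri


def ssc_period_ps(t_us):
    """Bit period at time t: nominal UI stretched by the current ppm value."""
    return UI_PS * (1 + ssc_ppm(t_us) * 1e-6)


for t in (0.0, 8.325, 16.65, 24.975):
    print(f"t = {t:7.3f} us: {ssc_ppm(t):7.1f} ppm, period {ssc_period_ps(t):.3f} ps")
```

A testbench clock generator would evaluate such a function every cycle and use the returned period for the next half-cycle delays.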

Using the SSC clock

Separate Reference Clock With No Spreading (SRNS) - CDR Test
In our design we tested different clocking techniques. We started with separate reference clocks with no spreading (SRNS), where the transmitter and the receiver each have their own reference clock but no SSC is involved.
5000 PPM
REF and BIT CLK TX

Frequency integrator saturated

The data is confirmed correct by comparing the input stimulus (left-side file) with the output result monitored at the output of the system.

Macros to apply different testing techniques

Running SRNS_TEST from the TCL file (do file)

Separate Reference Clock With Independent Spreading (SRIS) – CDR Test
Period after jitter as expected in jittered period vs PI_clk

Jitter value during simulation

Before lock

After lock

Locking at middle

Running the TCL file (do file) with independent spreading

Separate Reference Clock With Independent Spreading (SRIS) and PPM offset – CDR Test

SRIS with an additional PPM offset

There is a frequency offset between the clocks in addition to a spread, jittered clock

Data wrong before lock

After locking

Regression Testing
It is difficult, or practically impossible, to exhaustively test every part of a large design, so several techniques are used to achieve high coverage and to address the crucial areas before the design is complete. Regression testing is one of them: the test cases are repeated many times with different random seeds to hit corner cases and increase coverage.

Automated Regression Testing:

• Automated tests are run multiple times with varying parameters (such as different seeds) to simulate a wide range of scenarios.
• Automating this process makes it quick to evaluate the system's behavior under different conditions and to identify potential issues.

Creating Test Reports:

• After the tests run, comprehensive reports help you understand the test outcomes.
• These reports often include code coverage, logs, waveforms, or other visualizations that provide insight into test execution and system behavior.

Reviewing Test Results:

• The detailed reports allow engineers and testers to decide whether additional testing is needed or whether specific issues must be addressed.
• Reviewing coverage reveals which parts of the system haven't been thoroughly tested, indicating where to focus additional testing effort.
• Logs and other data help diagnose and fix bugs or performance issues.

In our system, we wrote TCL scripts that run automated regression tests in batch mode, so the EDA tool GUI does not need to be opened: the script runs from the command prompt (CMD), runs the tests with random seeds, and saves the log files, coverage reports, and wave files. After the runs end, the coverage reports are merged to get the total coverage of all runs combined. The logs and wave files can also be inspected individually to find the cause of any bug or issue that appears in a particular run. The reports are saved in folders for easier access.
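Our scripts are written in TCL. Purely to illustrate the flow they follow, the Python sketch below builds (but does not run) the batch command lines for N regression runs: it draws a random seed per run, gives each run its own log, wave, and coverage file, and ends with a vcover merge. The vsim options used (-c, -coverage, -sv_seed, -wlf, -l, -do) and the coverage save command are, to our knowledge, standard Questa options, but check them against your tool version. The top-level name tb_top and the folder layout are placeholders, and the design is assumed to be compiled with coverage enabled.

```python
import random


def regression_plan(num_runs, top="tb_top", master_seed=None):
    """Build one vsim batch command per run plus a final vcover merge.

    `tb_top` and the folder names are placeholders, not the project's names.
    """
    rng = random.Random(master_seed)  # a fixed master seed makes the plan reproducible
    runs = []
    for run in range(1, num_runs + 1):
        seed = rng.randrange(1, 2**31)
        ucdb = f"coverage/run_{run}.ucdb"
        do_cmds = f"coverage save -onexit {ucdb}; run -all; quit -f"
        cmd = ["vsim", "-c", "-coverage", "-sv_seed", str(seed),
               "-wlf", f"wave_files/run_{run}.wlf",
               "-l", f"logs/run_{run}.log",
               top, "-do", do_cmds]
        runs.append({"run": run, "seed": seed, "ucdb": ucdb, "cmd": cmd})
    merge = ["vcover", "merge", "merged_file_name.ucdb"] + [r["ucdb"] for r in runs]
    return runs, merge


runs, merge = regression_plan(3, master_seed=2024)
for r in runs:
    print(f"run {r['run']} seed {r['seed']}: {' '.join(r['cmd'])}")
print(" ".join(merge))
```

Recording the seed next to each run is what makes a failing run reproducible later; in a real flow each command list could be handed to subprocess.run.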

After the regression finishes, the wave file of any run can be opened to inspect the waves recorded during the regression. Type the commands below in the transcript to load the signals, then run the do file to display them in the required format:

vsim -view wave_files/simulation_3_5_16_3_16.wlf

do wave.do

In the commands above, “wave_files/simulation_3_5_16_3_16.wlf” is the .wlf file to be opened from the wave_files folder.

The command do wave.do displays the signals in the format saved in the wave.do TCL file.

The coverage of each run is saved in a UCDB file whose name contains the run number and some parameters. The files can be merged into a single UCDB file using the command:

vcover merge merged_file_name.ucdb file1.ucdb file2.ucdb file3.ucdb

After merging, the UCDB file can be reviewed either as a text file or as an HTML page, which allows navigating between modules to see the coverage of each one:

Txt file:

vcover report merged_file_name.ucdb -details -annotate -all -output merged_functional_Report.txt

HTML folder:

vcover report -html -htmldir merged_coverage -verbose -threshL 50 -threshH 90 merged_file_name.ucdb

The log files can be reviewed to find the random seed used for each run, so that after a bug is found and fixed, the test can be re-run with the same seed and therefore the same inputs.

They also contain any info, warning, error, or print messages that would normally be displayed in the transcript.

The run with different widths shows that after the width changes, the CDR adjusts itself again until it reads data correctly and the frequency integrator saturates.

Above is the run-file script that runs the regression with changing CDR parameters; we also used it to find the most appropriate parameters for the loop integrators.

As shown, the script generates a log file and a wave file for every seed, a coverage file for every run, and a merged report of all the coverage combined.

Functional Coverage:

To hit the SKP-added case, which does not normally occur in the system, the elastic buffer was stress-tested.

The TX write rate was decreased to hit the case where the elastic buffer adds SKP symbols to keep itself half full:

The SKP_added flag is raised and SKP (0x306) is written into the elastic buffer.
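An elastic buffer absorbs the difference between the recovered (write) clock and the local (read) clock by adding or removing SKP symbols, so that its fill level stays near half full. The Python model below is a simplified, hypothetical sketch of that policy, not the project's RTL. A SKP arriving while the buffer is above half full is dropped, and a SKP at the head of the queue is read out twice when the buffer is below half full. The codes 0x0F9 and 0x306 are the two running-disparity forms of the 8b/10b SKP symbol (K28.1); the depth and thresholds are illustrative.

```python
from collections import deque

SKP_CODES = (0x0F9, 0x306)  # 10-bit K28.1 (SKP), RD- and RD+


class ToyElasticBuffer:
    """Hypothetical elastic-buffer model that keeps its fill level near half full."""

    def __init__(self, depth=16):
        self.depth = depth
        self.half = depth // 2
        self.fifo = deque()
        self.skp_added = 0
        self.skp_removed = 0
        self._head_repeated = False  # ensures each SKP is repeated at most once

    def push(self, symbol):
        """Write side (recovered clock domain)."""
        if symbol in SKP_CODES and len(self.fifo) > self.half:
            self.skp_removed += 1  # too full: drop this SKP instead of storing it
            return
        if len(self.fifo) < self.depth:
            self.fifo.append(symbol)  # symbols arriving while full are lost (overflow)

    def pop(self):
        """Read side (local clock domain); returns None on underflow."""
        if not self.fifo:
            return None
        head = self.fifo[0]
        if head in SKP_CODES and len(self.fifo) < self.half and not self._head_repeated:
            self._head_repeated = True  # too empty: output the SKP but keep it queued
            self.skp_added += 1
            return head
        self._head_repeated = False
        return self.fifo.popleft()


eb = ToyElasticBuffer(depth=16)
for sym in (0x0F9, 0x1AA, 0x1AA):  # slow writer: the buffer stays below half full
    eb.push(sym)
print([eb.pop() for _ in range(5)], "SKP added:", eb.skp_added)
```

Slowing the writer, as in the stress test above, is exactly what drives such a buffer into the SKP-adding branch.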

Underflow:

Overflow:

Code Coverage:

Assertion coverage:

Each assertion is listed by label with its description, the check (antecedent and consequent), and the property. In the comma-detection assertions, the states are encoded as idle = 2'b00, comma = 2'b01, and data = 2'b10, and the comma codes are 10'h0fa and 10'h305.

PHY_1: Check that the data sent by the MAC for transmission is received correctly at RX.
  Antecedent: MAC enable is asserted; the data from the MAC is stored in a local variable.
  Consequent: after some latency, the received data equals the stored value.
  @(posedge clk) (enable, data = Data_in) |=> @(posedge clk) ##[0:$] (Data_out === data);

COMMA_1: When the data collected from S2P in the idle state is a COMMA, go to the comma state.
  Antecedent: current state = idle and the collected data is 'h0fa or 'h305.
  Consequent: next state = comma.
  cs_data1(2'b00, 10'h0fa, 10'h305) |-> (ns == 2'b01);

COMMA_2: When the data collected from S2P in the comma state is not a COMMA, go to the idle state.
  Antecedent: current state = comma and the collected data is neither 'h0fa nor 'h305.
  Consequent: next state = idle.
  cs_data2(2'b01, 10'h0fa, 10'h305) |-> (ns == 2'b00);

COMMA_3: While collecting commas, if the second comma (expected at count = 9) is not received, go to the idle state.
  Antecedent: current state = comma, count = 9, and the collected data is neither 'h0fa nor 'h305.
  Consequent: next state = idle.
  cs_data2(2'b01, 10'h0fa, 10'h305) and (count == 9) |-> (ns == 2'b00);

COMMA_4: Same check for the third comma, expected at count = 19.
  Antecedent: current state = comma, count = 19, and the collected data is neither 'h0fa nor 'h305.
  Consequent: next state = idle.
  cs_data2(2'b01, 10'h0fa, 10'h305) and (count == 19) |-> (ns == 2'b00);

COMMA_5: Same check for the fourth comma, expected at count = 29.
  Antecedent: current state = comma, count = 29, and the collected data is neither 'h0fa nor 'h305.
  Consequent: next state = idle.
  cs_data2(2'b01, 10'h0fa, 10'h305) and (count == 29) |-> (ns == 2'b00);

COMMA_6: After the comma pulses finish, if the data collected at count = 39 is not a comma, go back to the idle state.
  Antecedent: current state = data, count = 39, and the collected data is neither 'h0fa nor 'h305.
  Consequent: next state = idle.
  cs_data2(2'b10, 10'h0fa, 10'h305) and (count == 39) |-> (ns == 2'b00);

COMMA_7: While collecting commas, if the second comma is received at count = 9, stay in the comma state.
  Antecedent: current state = comma, count = 9, and the collected data is 'h0fa or 'h305.
  Consequent: next state = comma.
  cs_data1(2'b01, 10'h0fa, 10'h305) and (count == 9) |-> (ns == 2'b01);

COMMA_8: Same check for the third comma, received at count = 19.
  Antecedent: current state = comma, count = 19, and the collected data is 'h0fa or 'h305.
  Consequent: next state = comma.
  cs_data1(2'b01, 10'h0fa, 10'h305) and (count == 19) |-> (ns == 2'b01);

COMMA_9: If the fourth comma is received at count = 29, go to the data state.
  Antecedent: current state = comma, count = 29, and the collected data is 'h0fa or 'h305.
  Consequent: next state = data.
  cs_data1(2'b01, 10'h0fa, 10'h305) and (count == 29) |-> (ns == 2'b10);

COMMA_10: Check that the data collected for the second comma is 'h0fa or 'h305.
  Antecedent: current state = comma and the state one clock cycle earlier was idle.
  Consequent: the data 9 clock cycles later is a comma.
  (cs == 2'b01 and ($past(cs,1) == 2'b00)) |-> ##9 (data == 10'h0fa or data == 10'h305);

COMMA_11: Check all the commas collected across the comma state.
  Antecedent: current state = comma and the state one clock cycle earlier was idle.
  Consequent: the data is a comma after 9 clock cycles and then every 10 clock cycles (sequence comma_num).
  sequence comma_num;
    ##9 (data == 10'h0fa or data == 10'h305)
    ##10 (data == 10'h0fa or data == 10'h305)
    ##10 (data == 10'h0fa or data == 10'h305);
  endsequence
  (cs == 2'b01 and ($past(cs,1) == 2'b00)) |-> comma_num;

COMMA_12: After the transition to the data state, the comma pulse rises 9 clock cycles later (first comma pulse check).
  Antecedent: current state = data and the state one clock cycle earlier was comma.
  Consequent: comma_pulse rises 9 clock cycles later.
  (cs == 2'b10 and ($past(cs,1) == 2'b01)) |-> ##9 $rose(comma_pulse);

COMMA_13: Check that the comma pulse is asserted for only one clock cycle.
  Antecedent: $rose(comma_pulse).
  Consequent: $fell(comma_pulse) on the next clock cycle.
  $rose(comma_pulse) |=> $fell(comma_pulse);

COMMA_14: Check that all 4 pulses rise across the data state.
  Antecedent: current state = data and the state one clock cycle earlier was comma.
  Consequent: sequence pulse_4.
  property check_4_pulses;
    (cs == 2'b10 and ($past(cs,1) == 2'b01)) |-> pulse_4;
  endproperty

COMMA_15: Same check written with the goto repetition operator ([->]), using the number of commas as the repetition count. This form is more flexible because it is parameterized with the number of commas used in the design.
  Antecedent: current state = data and the state one clock cycle earlier was comma.
  Consequent: comma_pulse[->COMMA_NUMBER].
  property check_4_repeated_pulses;
    (cs == 2'b10 and ($past(cs,1) == 2'b01)) |-> comma_pulse[->COMMA_NUMBER];
  endproperty

COMMA_16: When going from idle to comma, the count reset signal is pulsed to reset the comma counter.
  Antecedent: next state = comma and the next state one clock cycle earlier was idle.
  Consequent: $rose(count_rst) ##1 $fell(count_rst).
  property chek_cnt_rst1;
    (ns == 2'b01 and ($past(ns,1) == 2'b00)) |-> $rose(count_rst) ##1 $fell(count_rst);
  endproperty

COMMA_17: When going from comma to data, the count reset signal is pulsed to reset the comma counter.
  Antecedent: next state = data and the next state one clock cycle earlier was comma.
  Consequent: $rose(count_rst) ##1 $fell(count_rst).
  property chek_cnt_rst2;
    (ns == 2'b10 and ($past(ns,1) == 2'b01)) |-> $rose(count_rst) ##1 $fell(count_rst);
  endproperty

COMMA_18: In the data state, when the comma pulse is asserted, data valid is asserted.
  Antecedent: $rose(comma_pulse).
  Consequent: $rose(rx_valid).
  property chk_rx_vld;
    $rose(comma_pulse) |-> $rose(rx_valid);
  endproperty

COMMA_19: In the data state, when the comma pulse is asserted, data valid is de-asserted after one clock cycle.
  Antecedent: $rose(comma_pulse).
  Consequent: $fell(rx_valid) on the next clock cycle.
  property chk_rx_vld_f;
    $rose(comma_pulse) |=> $fell(rx_valid);
  endproperty

COMMA_20: In the data state, data valid is pulsed when the data is ready to be inserted into the buffer.
  Antecedent: current state = data and count is 9, 19, 29, or 39.
  Consequent: $rose(rx_valid) ##1 $fell(rx_valid).
  property chk_rx_vld_c;
    (cs == 2'b10) and (count == 9 or count == 19 or count == 29 or count == 39) |-> $rose(rx_valid) ##1 $fell(rx_valid);
  endproperty

BUFFER_21: When the buffer is empty or an add request is asserted, the data read from the buffer is 'h0f3 and the read pointer does not change.
  property chk_write_empty;
    @(posedge readclk) empty or addreq |=> (data_out == 10'h0f3) and (read_pointer == $past(read_pointer,1));
  endproperty

BUFFER_22: SKP added is asserted if the buffer isn't full, there is no delete request, and the data to be inserted into the buffer is 'h0f9 or 'h306.
  property chk_skp_add;
    @(posedge writeclk) !full and (data_in == 10'h0f9 or data_in == 10'h306) and !deletereq |-> $rose(skpAdd);
  endproperty

BUFFER_23: SKP removed is asserted if the buffer isn't full, there is a delete request, and the data to be inserted into the buffer is 'h0f9 or 'h306.
  property chk_skp_remove;
    @(posedge writeclk) !full and (data_in == 10'h0f9 or data_in == 10'h306) and deletereq |-> $rose(skpRemove);
  endproperty

BUFFER_24: The write pointer increments when the buffer is written and not full.
  property chk_increase_wrptr;
    @(posedge writeclk) !full and (data_in != 10'h0f9 and data_in != 10'h306) |-> ##[1:2] (write_pointer == $past(write_pointer,1) + 1);
  endproperty

BUFFER_25: SKP added and SKP removed are not asserted when the data is neither 'h0f9 nor 'h306 and the buffer is not full.
  property chk_no_add_remov_skp;
    @(posedge writeclk) !full and (data_in != 10'h0f9 and data_in != 10'h306) |=> !skpAdd and !skpRemove;
  endproperty

BUFFER_26: The data is written into memory at the correct address.
  property chk_wrt;
    logic [9:0] data; logic [4:0] ptr;
    @(posedge clk) ((!full && writeclk && (data_in != 10'h0f9) && (data_in != 10'h306)), data = data_in, ptr = write_pointer)
      |=> data == buffer.elastic_mem_inst.buffer[ptr-1];
  endproperty

BUFFER_27: The data is read from memory at the correct address.
  property chk_rd;
    @(posedge readclk) (!empty and !addreq) |-> ##1 $past(data_out,1) == buffer.elastic_mem_inst.buffer[$past(read_pointer,1)-1];
  endproperty

BUFFER_28: The read pointer increments when the buffer is read while not empty and with no add request.
  property chk_rd_addr;
    @(posedge readclk) (!empty and !addreq) |-> ##1 (read_pointer == $past(read_pointer,1)+1);
  endproperty

BUFFER_29: Check the empty condition of the buffer.
  property chk_empty;
    @(posedge readclk) (read_pointer == write_pointer) |-> ##1 empty;
  endproperty

BUFFER_30: The data written into the buffer at a given address is read back correctly from the same address.
  property data_chk;
    logic [4:0] ptr; logic [9:0] data;
    @(posedge clk) ((writeclk && !full && (data_in != 10'h0f9) && (data_in != 10'h306)), ptr = write_pointer, data = data_in)
      |=> @(negedge readclk) first_match(rd_detect(ptr)) ##0 (data_out === data);
  endproperty
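The constants that recur in these assertions are 10-bit 8b/10b code groups: 10'h0fa and 10'h305 are the two running-disparity forms of K28.5 (the comma, COM), and 10'h0f9 and 10'h306 are the two forms of K28.1 (SKP). Each pair is a bitwise complement, which the small Python helper below checks. It is only an illustration of how a reference model or scoreboard might classify received symbols, not part of the project's testbench.

```python
MASK10 = 0x3FF              # 10-bit code-group width
COM_CODES = (0x0FA, 0x305)  # K28.5, RD- and RD+
SKP_CODES = (0x0F9, 0x306)  # K28.1, RD- and RD+


def complement10(symbol):
    """Invert all 10 bits; for these K28 codes this maps the RD- form to the RD+ form."""
    return symbol ^ MASK10


def classify(symbol):
    """Name a received 10-bit symbol the way the RX assertions treat it."""
    if symbol in COM_CODES:
        return "COM"
    if symbol in SKP_CODES:
        return "SKP"
    return "OTHER"


for rd_minus, rd_plus in (COM_CODES, SKP_CODES):
    assert complement10(rd_minus) == rd_plus
print([classify(s) for s in (0x0FA, 0x306, 0x0F3)])
```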

Lint Checking With SPYGLASS:
The SpyGlass platform provides designers with insight about their design, early in the process at RTL, using
many advanced algorithms and analysis techniques. It functions like an interactive guidance system for
design engineers, enabling them to solve various design issues in the early stages of the design, so as to
facilitate successful implementation for complex SoCs.

In a traditional design flow, where there is no mechanism (or an insufficient one) to sign off RTL, you will notice multiple iterations in the flow while design issues are fixed at RTL level, which requires significant design effort and puts the design schedule at risk.

With SpyGlass as an RTL signoff tool, on the other hand, we make sure that IPs are signed off before they are integrated at SoC level, and that the SoC RTL is signed off before moving downstream.

The following figure describes the SpyGlass Flow

Design Setup Stage: This is the first stage when you start a new session in SpyGlass. During this stage,
you create the basic design setup by specifying information, such as design files and design options. In
addition, you check for some basic design issues before proceeding to the next stage.

SpyGlass can read the following types of files:

To run the design read in batch mode, you can use this command:

spyglass -project <file.prj> -batch -designread

Below is a sample project file used in our GP; note that the project file uses the .prj extension:

The design read stage has several options that you can select through the GUI:

After the design read process, SpyGlass reports various violation messages categorized by severity. The
following table describes each type of severity:

SpyGlass generates a number of reports providing details of the analysis; below is a sample from our GP:

After completing the Design Setup stage successfully, click the Run tab to proceed to the next stage.

A goal is a set of checks that you would like to run on the design. The example below shows that we will run the lint_rtl goal.

You can run a goal in batch mode by specifying the following command:

spyglass -batch -project <project_file> -goal <goal_name>

You can also run multiple goals by specifying the following command:

spyglass -batch -project <project_file> -goal <goal_name1>,<goal_name2>

Another useful SpyGlass feature:

You can set up a goal in the project file (or at the SpyGlass command line) by specifying the following commands:

current_goal <goal_name> [-top <top_name>]

set_parameter <parameter> <value>
set_goal_option <option> <value>

The following example sets up a goal for the SpyGlass CDC product:

current_goal cdc/cdc_verify_struct -top myTop

set_parameter use_inferred_clocks yes
set_goal_option addrule W110

Once the goal run is complete, the ‘Analyze Results’ window becomes active, where you can view the results of that run and access the debug capabilities.

This stage lets you analyze the results of a goal run. To view the results, click the Analyze Results tab.

Below is a shell script, used in our GP, that runs lint checking on the PCS_TX module:

Let’s now look more closely at the rules and goals mentioned above.

SPYGLASS Built-in Checking:


While analyzing or synthesizing RTL designs, SpyGlass performs checks on the HDL syntax and structure.
These checks are always performed automatically, independently of which SpyGlass rules are requested to
be checked.

If any syntax or structure issues are found, SpyGlass generates the corresponding standard error or warning
messages (known as built-in messages). These built-in messages are different from the rule messages
generated during rule-checking.

There are the following classes of such built-in messages:

At its core, SpyGlass is a collection of rules, each performing a certain task. For example:

– Rule Clock_info03a reports clock nets not driven by a clock constraint

– Rule Clock_info01 reports all the clocks that SpyGlass detected in the design

A typical user task consists of running a number of rules. These are grouped together as a goal. For example:

– Goal cdc_verify_struct checks the structural integrity of CDC by running the following rules:
- Ac_unsync, Ac_sync, Ar_async*, Clock_sync05a/06a, Setup_quasi_static, Ac_glitch03

– Users can add or remove rules from a goal based on their design needs

A set of goals that addresses the analysis in a particular domain constitutes a methodology. For example:

– The CDC methodology defines how a user would analyze the design for CDC. It consists of the following goals: cdc_setup_check, cdc_verify_struct, cdc_verify

SPYGLASS Debugging Capabilities:
We will show how SpyGlass can be used for debugging after running goals: how to read the analysis results and how to use the SpyGlass help to learn more about each error, using the PCS_TX module from our GP.

The first figure shows the results of the goals that have already been run.

Then go to Analyze Results to see which errors appeared in the figure above.

As you can see in the violations window, there are some lint errors. Clicking on a violation shows detailed information about the rule that has been broken, and double-clicking it takes you directly to the piece of code that causes it, as shown for the case statement violation.

We can also use an incremental schematic to trace the bug in the schematic view, as follows:

This error is known and acceptable in our design, so we can waive it as follows:

Waive message selected

Click Apply

Now we can see there are no violations in our design.

So what is the waiver that we applied to the previous violations?

A waiver is a mechanism to hide specific rule violations and allow for exceptions.

– Hide a violation known to be fixed at a later point of the design flow

– Hide false (or minor) violation which will not be fixed

– Ignore violations for DU/Block without complete functionality

– Hide violations that will not be looked at right now (temporary waivers)

– At SoC integration to hide violations with specific blocks

– Hide exceptions to design policies or company practices

– Hide certain violations from 3rd party IP

You also have access to all the reports, as follows:

SpyGlass Design Constraints (SGDC):

SpyGlass needs to understand the clock/reset sources and other design intent to perform CDC analysis. If you already know the clocks and resets, specify them in an SGDC file.

Specify SGDC files in either of the following ways:

• By using the read_file -type sgdc <SGDC-file-name> command in a project file

• By using the Add Files option under the Add Design Files tab in the SpyGlass GUI

Below is a sample SGDC file used for the PCS_RX module implemented in our GP:

Summary for SpyGlass Tool:

Formal Verification
Formal verification is a process that mathematically checks the behavior of a system using a formal model: it proves whether the design satisfies a set of required properties. This technique is used extensively to ensure that systems behave as expected, particularly in safety-critical domains.

Advantages of Formal Verification

• Exhaustiveness: unlike simulation-based testing, formal verification provides guarantees for all possible inputs and reachable states.
• High Assurance: it provides a high level of assurance, which is crucial for safety-critical systems.

Challenges of Formal Verification

• Complexity: formal methods can be very complex, making them difficult to apply to large systems.
• Scalability: scaling formal verification techniques to large, real-world systems can be challenging.
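To make the exhaustiveness point concrete, the toy Python check below does what a model checker does at a much smaller scale: it explores every state reachable from reset under every input combination and checks an invariant in each one. The design is a hypothetical 4-bit saturating up/down integrator in the spirit of a digital loop filter, not the project's RTL, and the sketch only illustrates the idea rather than the tool flow used in the project.

```python
from collections import deque

LO, HI = -8, 7  # hypothetical 4-bit signed integrator range


def saturating_step(acc, up, dn):
    """One clock of a saturating up/down integrator."""
    delta = int(up) - int(dn)
    return max(LO, min(HI, acc + delta))


def buggy_step(acc, up, dn):
    """Same integrator with the saturation missing (a deliberate bug)."""
    return acc + int(up) - int(dn)


def check_invariant(step, init=0):
    """Breadth-first reachability: visit every reachable state under all inputs.

    Returns (True, reachable_states) if LO <= acc <= HI holds everywhere,
    otherwise (False, first_violating_state).
    """
    seen, frontier = {init}, deque([init])
    while frontier:
        acc = frontier.popleft()
        if not LO <= acc <= HI:
            return False, acc
        for up in (False, True):
            for dn in (False, True):
                nxt = step(acc, up, dn)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return True, seen


print(check_invariant(saturating_step)[0])  # invariant holds in every reachable state
print(check_invariant(buggy_step))          # reports a counterexample state
```

Simulation with random UP/DN inputs might take a long time to drive the buggy integrator out of range, whereas the exhaustive search finds the overflow state directly; real tools achieve the same on far larger state spaces with symbolic methods.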

Applied formal verification for digital loop filter:

Applied formal verification for Bang-Bang phase detector (BBPD):

Applied formal verification for PMA_RX:

Applied formal verification for PMA_TX:

Applied formal verification for PCS:
