
Tutorial 09 DSP IO Transceivers

The document discusses digital signal processing (DSP) basics for field programmable gate arrays (FPGAs), including common DSP blocks, fixed-point and floating-point number representations, basic operations like addition and multiplication, and issues related to overflow and bit width that must be addressed in FPGA implementations of DSP functions.
Copyright © All Rights Reserved

Programmable Logic Devices

Tutorial 9
Michal Kubíček
Department of Radio Electronics, FEEC BUT Brno
Created with the support of the project OP VVV Modern and Open Technical Studies (Moderní a otevřené studium techniky), CZ.02.2.69/0.0/0.0/16_015/0002430.
Tutorial 9

FPGAs in detail
❑ DSP in FPGA
❑ IO cells
❑ Signal integrity
❑ Synchronous data interfaces
❑ High speed transceivers

page 2 [email protected]
DSP in FPGA

Basic DSP blocks

Typical DSP blocks


❑ Modulators, demodulators
❑ Filters (FIR, CIC), correlators
❑ Complex multipliers, adders, dividers
❑ Transforms (FFT, IFFT, Cosine...)
❑ Digital Pre-Distortion (DPD)
❑ Direct Digital Synthesis (DDS)

Basic DSP blocks

CIC and FIR filter

Basic DSP blocks

FFT

Basic DSP blocks

Hilbert Transformer (FIR)

Basic DSP blocks

Direct Digital Synthesis (DDS)
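The DDS principle can be sketched as a phase accumulator (an N-bit register incremented every clock cycle by a frequency tuning word FTW) whose upper bits address a sine lookup table, so that f_out = FTW · f_clk / 2^N. The widths below (32-bit accumulator, 1024-entry table, 12-bit output) and the names `tuning_word` and `dds_samples` are assumptions for illustration, not parameters of any particular DDS core.

```python
import math

ACC_BITS = 32       # phase accumulator width N (assumed)
LUT_ADDR_BITS = 10  # phase bits used to address the sine table (assumed)
AMP_BITS = 12       # signed output amplitude width (assumed)

# One full sine period, quantized to signed AMP_BITS-bit integers
SINE_LUT = [
    round((2 ** (AMP_BITS - 1) - 1) * math.sin(2 * math.pi * i / 2 ** LUT_ADDR_BITS))
    for i in range(2 ** LUT_ADDR_BITS)
]


def tuning_word(f_out, f_clk):
    """Frequency tuning word FTW = round(f_out * 2**N / f_clk)."""
    return round(f_out * 2 ** ACC_BITS / f_clk)


def dds_samples(ftw, count):
    """Return `count` output samples of a phase-accumulator DDS."""
    phase = 0
    out = []
    for _ in range(count):
        # Phase truncation: only the top LUT_ADDR_BITS bits address the table
        out.append(SINE_LUT[phase >> (ACC_BITS - LUT_ADDR_BITS)])
        # The accumulator wraps modulo 2**N like an N-bit hardware register
        phase = (phase + ftw) & (2 ** ACC_BITS - 1)
    return out


ftw = tuning_word(1e6, 100e6)             # 1 MHz output from a 100 MHz clock
print(ftw, ftw * 100e6 / 2 ** ACC_BITS)   # FTW and the actual output frequency in Hz
print(dds_samples(ftw, 8))
```

The frequency resolution is f_clk / 2^N (about 0.023 Hz here); truncating the phase to fewer LUT address bits saves memory at the cost of spurious tones.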

DSP basics

Basic DSP operations


❑ Addition: a fairly simple function that can be implemented in general-purpose logic (LUTs).
❑ Multiplication: a more complex function; large multipliers require a significant amount of FPGA resources (not suitable for a pure LUT implementation).
❑ Memories: for matrix operations, block-oriented data/signal processing, coefficient storage and FIFO memories.
❑ More complex operations (complex arithmetic, division...) are usually implemented algorithmically, using the basic building blocks above.

Binary number representation

DSP basics

Binary number representation


❑ Fixed point
• INTEGER or FRACTIONAL representation is used
• Relatively simple to implement ➔ usually higher performance and lower HW requirements
• Limited dynamic range (large bit widths are sometimes required)

❑ Floating point (IEEE 754)


• More demanding implementation ➔ often more HW resources and lower performance
• Much larger dynamic range
• Problem with operands of very different magnitude (e.g. adding a very small number to a very large one)

DSP basics

INTEGER fixed point: Ones' complement


Binary value   Ones' complement interpretation   Unsigned interpretation
00000000       +0                                0
00000001       1                                 1
⋮              ⋮                                 ⋮
01111101       125                               125
01111110       126                               126
01111111       127                               127
10000000       −127                              128
10000001       −126                              129
10000010       −125                              130
⋮              ⋮                                 ⋮
11111101       −2                                253
11111110       −1                                254
11111111       −0                                255

DSP basics

INTEGER fixed point: Two's complement


Binary value   Two's complement interpretation   Unsigned interpretation
00000000       0                                 0
00000001       1                                 1
⋮              ⋮                                 ⋮
01111110       126                               126
01111111       127                               127
10000000       −128                              128
10000001       −127                              129
10000010       −126                              130
⋮              ⋮                                 ⋮
11111110       −2                                254
11111111       −1                                255

Two's complement interpretation: VHDL SIGNED data type
Unsigned interpretation: VHDL UNSIGNED data type
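The three interpretations of the same 8-bit pattern can be checked with a few lines of Python. This is a sketch; the helper names are hypothetical, and the bit pattern is assumed to be given as a string with the MSB first.

```python
def as_unsigned(bits):
    """Unsigned interpretation (VHDL UNSIGNED)."""
    return int(bits, 2)


def as_ones_complement(bits):
    """Ones' complement: a negative number is the bitwise inverse of its magnitude."""
    value = int(bits, 2)
    if bits[0] == "1":
        # Python has no -0, so the pattern 11111111 ("-0") comes out as 0
        return -(~value & ((1 << len(bits)) - 1))
    return value


def as_twos_complement(bits):
    """Two's complement (VHDL SIGNED): the MSB has the weight -2**(n-1)."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


for pattern in ("01111111", "10000000", "10000001", "11111110", "11111111"):
    print(pattern, as_ones_complement(pattern), as_twos_complement(pattern), as_unsigned(pattern))
```

The two's complement form is the one used in hardware: it has a single zero and addition works identically for signed and unsigned operands.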

DSP basics

FRACTIONAL (Q) fixed point


1 0 0 1 0 0 1 1   Signed integer format: -109 (= 147 - 256)

1 0 0 1 0 0 1 1   Q2.5 format: -3.40625 (= -109 / 2^5)

Alternative calculation: signed INTEGER part (1 sign bit + 2-bit value) plus fractions with the weights 1/2, 1/4, 1/8, 1/16, 1/32:
-4 + 1/2 + 1/16 + 1/32 = -3.40625
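A minimal sketch of signed Qm.n conversion, assuming one sign bit, m integer bits and n fractional bits (so Q2.5 is the 8-bit word of the example above). `q_to_float` and `float_to_q` are hypothetical helper names; the value is simply the two's complement integer scaled by 2^-n.

```python
def q_to_float(bits, int_bits, frac_bits):
    """Value of a signed Qm.n word given as a bit string (MSB first)."""
    if len(bits) != 1 + int_bits + frac_bits:       # sign + m + n bits
        raise ValueError("word length does not match the Q format")
    raw = int(bits, 2)
    if bits[0] == "1":
        raw -= 1 << len(bits)                       # two's complement integer
    return raw / 2 ** frac_bits                     # scale by 2**-n


def float_to_q(value, int_bits, frac_bits):
    """Quantize to signed Qm.n (Python round(): ties to even) and return the bit string."""
    width = 1 + int_bits + frac_bits
    raw = round(value * 2 ** frac_bits)
    if not -(1 << (width - 1)) <= raw < (1 << (width - 1)):
        raise OverflowError(f"{value} does not fit into Q{int_bits}.{frac_bits}")
    return format(raw & ((1 << width) - 1), f"0{width}b")


print(q_to_float("10010011", 2, 5))   # -3.40625
print(float_to_q(-3.40625, 2, 5))     # 10010011
```

Q2.5 covers the range -4 to +3.96875 in steps of 1/32; values outside it raise `OverflowError` here, whereas hardware would wrap or saturate.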

DSP basics

FLOATING POINT: IEEE 754-1985/2008

+ Huge dynamic range


– Rounding problem (addition of large and
small numbers)
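The rounding problem is easy to reproduce with IEEE 754 single precision, whose significand has 24 bits: above 2^24 neighbouring representable numbers are 2 apart, so adding 1.0 is lost entirely. This sketch emulates float32 with Python's standard `struct` module (packing a double as `"<f"` rounds it to single precision); `to_float32` is a hypothetical helper.

```python
import struct


def to_float32(x):
    """Round a Python float (IEEE 754 double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


big = to_float32(2.0 ** 24)         # 16 777 216: all 24 significand bits in use
total = to_float32(big + 1.0)       # exact sum 16 777 217 needs 25 significant bits
print(total == big)                 # True: the small operand is lost completely
print(to_float32(2.0 ** 24 + 2.0))  # 16777218.0: spacing of float32 here is 2
```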

DSP basics

Fixed-point and Floating point in VHDL


❑ VHDL-2008
• Fixed-point support added
• Floating-point support added
• Arithmetic functions for these new types added
• Conversion functions for these new types added

❑ The problem
• Very limited support in CAD tools (both synthesis and simulation tools)
• Xilinx ISE: no support (neither synthesis nor simulation)
• Xilinx Vivado: only a limited subset of VHDL-2008 is supported (see UG900 and UG901 for the appropriate version of your tool, e.g. 2019.2)

DSP basics

VHDL - alternative solution?


Accellera VHDL-TC + IEEE P1076 Working Group
VHDL-2008 DSP function support using VHDL-93 language syntax (via a free VHDL package).
Warning!!! Only limited support in XST (Xilinx ISE synthesis)!!!

use ieee.fixed_pkg.all;
....
signal a, b : sfixed (7 downto -6);
signal c    : sfixed (8 downto -6);   -- the sum a + b grows by one integer bit
begin
....
a <= to_sfixed (-3.125, 7, -6);
b <= to_sfixed (inp1, b'high, b'low);
c <= a + b;

-- The binary point is assumed to be between the "0" and "-1" index.
-- If signal y : ufixed (4 downto -5) is used as the data type (unsigned fixed point,
-- 10 bits wide, 5 fractional bits), then y = 6.5 = "00110.10000", or
-- simply: y <= "0011010000";

Bit width reduction, rounding

DSP basics

Add 8b + 8b (unsigned)
254 + 254 = 508 = 1_1111_1100'b

255 + 254 = 509 = 1_1111_1101'b

255 + 255 = 510 = 1_1111_1110'b

Mathematics: The result is one bit wider than the wider of the two operands.
VHDL: The result is only as wide as the wider of the two operands.
Solution: Extend the width of at least one operand to prevent overflow (this can be described as a dummy addition of a zero constant of the target width, at no HW cost).

DSP basics

Add 8b + 8b (signed) two's complement


127 + 127 = 254 = 0_1111_1110'b

127 - 128 = -1 = 1_1111_1111'b

-128 - 128 = -256 = 1_0000_0000'b

Mathematics: The result is one bit wider than the wider of the two operands.
VHDL: The result is only as wide as the wider of the two operands.
Solution: Extend (sign-extend) the width of at least one operand to prevent overflow (this can be described as a dummy addition of a zero constant of the target width, at no HW cost).
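The same model for SIGNED operands, again assuming `ieee.numeric_std` semantics (result as wide as the longer operand, operands sign-extended, carry dropped). It also shows why the widened operand must be sign-extended rather than zero-padded. `to_signed` and `add_signed` are hypothetical helpers.

```python
def to_signed(value, width):
    """Keep `width` bits of value and read them as a two's complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def add_signed(a, b, width_a, width_b):
    """Model numeric_std "+" on SIGNED: max(width_a, width_b) bits, carry dropped."""
    width = max(width_a, width_b)
    return to_signed(a + b, width), width


print(add_signed(127, 127, 8, 8))    # (-2, 8): overflow flips the sign
print(add_signed(-128, -128, 8, 8))  # (0, 8)
print(add_signed(-128, -128, 9, 8))  # (-256, 9): one extra bit is enough

# Widening must replicate the sign bit (numeric_std resize on SIGNED does this).
pattern = -128 & 0xFF                # bit pattern 1000_0000
print(to_signed(pattern, 9))         # 128: zero-padding to 0_1000_0000 changes the value
print(to_signed(-128, 9))            # -128: sign extension to 1_1000_0000 keeps it
```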

DSP basics

Multiply 8b x 8b (unsigned)
254 * 254 = 64516 = 1111_1100_0000_0100'b

255 * 254 = 64770 = 1111_1101_0000_0010'b

255 * 255 = 65025 = 1111_1110_0000_0001'b

Mathematics: The result width equals the sum of the operand widths.
VHDL: The result width equals the sum of the operand widths.
Solution: No action needed.

DSP basics

Multiply 8b x 8b (signed) two's complement


127 * 127 = 16129 = 0011_1111_0000_0001'b = 011_1111_0000_0001'b
-128 * 127 = -16256 = 1100_0000_1000_0000'b = 100_0000_1000_0000'b
-128 * -128 = 16384 = 0100_0000_0000_0000'b

Mathematics: In fact, both numbers have 7 bits + 1 sign bit ➔ the result should need only 7 + 7 + 1 (sign) = 15 bits (there is no need for two sign bits in the result). The only exception is the product of the two most negative numbers (-FS = negative full-scale value): this single operation requires the 16th bit.

DSP basics

Multiply 8b x 8b (signed) two's complement



VHDL: The result width equals the sum of the operand widths.
Solution: To save hardware resources, the MSB of the result can be dropped. The operation -FS * -FS then produces a numerical error, which can be handled by checking the input values (a -FS value at the multiplier input is treated as an overflow exception).
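The 15-bit claim and its single exception can be verified exhaustively for 8-bit operands. The sketch keeps only the 15 LSBs of the product (the VHDL equivalent would be taking bits 14 downto 0 of the 16-bit result) and, as one possible way of handling the exception, clamps a -FS input to -FS + 1. `mult_drop_msb` and `saturate_input` are hypothetical names.

```python
def to_signed(value, width):
    """Keep `width` bits of value and read them as a two's complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def mult_drop_msb(a, b):
    """Signed 8b x 8b product with only 15 of the 16 result bits kept."""
    return to_signed(a * b, 15)


def saturate_input(x):
    """Hypothetical overflow handling: replace -FS (-128) by -FS + 1 (-127)."""
    return max(x, -127)


# Exhaustive check: which products do not fit into 15 bits?
overflows = [(a, b) for a in range(-128, 128) for b in range(-128, 128)
             if mult_drop_msb(a, b) != a * b]
print(overflows)                      # [(-128, -128)]
print(mult_drop_msb(-128, -128))      # -16384 instead of +16384
print(mult_drop_msb(saturate_input(-128), saturate_input(-128)))  # 16129
```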

DSP basics

Multiply 8b x 8b (signed) two's complement



Motivation: Some DSP blocks (like FIR filters) contain many multiplication stages. The extra bits, when kept, accumulate through the block structure, which either requires large bit widths throughout the DSP block (much higher HW requirements) or degrades the output dynamic range (only a small portion of the result represents a useful signal).

DSP basics

Bit-width reduction
Why?
❑ To reduce hardware requirements (number of utilized FFs, LUTs, DSPs, BRAMs...)
❑ Adjust bit-width to the required output width
Problems:
❑ A small error appears at the output (noise and/or DC offset)
❑ Bit-width analysis and optimization are often challenging tasks, even for an experienced engineer
[Figure: example DSP chain with bit widths of 16b, 24b and 31b at individual stages]

DSP basics

Example of CIC filter analysis


[Figure: CIC filter block diagram (Input ➔ Output) with z^-1 delay elements, a rate-change factor of 32 and Convert blocks (bit-width reduction) between the stages]
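Hogenauer's classic analysis gives the register width a CIC decimator needs: B_out = B_in + ceil(N · log2(R · M)) for N integrator/comb pairs, rate change R and differential delay M; with that width, two's complement wrap-around inside the integrators is harmless. The bit-true model below is a sketch under these textbook assumptions; the names and the example parameters (8-bit input, N = 3, R = 4, M = 1) are hypothetical and are not read from the figure.

```python
import math
from typing import List, Optional


def cic_register_width(b_in: int, n_stages: int, r: int, m: int = 1) -> int:
    """Hogenauer bit growth: B_out = B_in + ceil(N * log2(R * M))."""
    return b_in + math.ceil(n_stages * math.log2(r * m))


def wrap(value: int, width: int) -> int:
    """Keep `width` bits and read them as a two's complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def cic_decimate(x: List[int], r: int, n_stages: int, m: int = 1,
                 width: Optional[int] = None) -> List[int]:
    """N-stage CIC decimator: integrators at the input rate, combs at the output rate.

    If `width` is given, every register wraps to `width` bits like real hardware.
    """
    def reg(v: int) -> int:
        return wrap(v, width) if width else v

    integrators = [0] * n_stages
    comb_delays = [[0] * m for _ in range(n_stages)]  # m-sample delay line per comb
    out = []
    for i, sample in enumerate(x):
        acc = sample
        for k in range(n_stages):            # integrator: y[n] = y[n-1] + x[n]
            integrators[k] = reg(integrators[k] + acc)
            acc = integrators[k]
        if (i + 1) % r == 0:                 # decimation: keep every r-th sample
            for k in range(n_stages):        # comb: y[n] = x[n] - x[n-m]
                delayed = comb_delays[k].pop(0)
                comb_delays[k].append(acc)
                acc = reg(acc - delayed)
            out.append(acc)
    return out


W = cic_register_width(8, 3, 4)              # 8 + 3 * 2 = 14 bits
x = [127 if (i // 10) % 2 else -128 for i in range(200)]
print(W, cic_decimate(x, 4, 3, width=W) == cic_decimate(x, 4, 3))
```

The DC gain of the filter is (R · M)^N (64 here), which is exactly what the extra N · log2(R · M) bits accommodate.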

DSP basics

Example of FIR filter analysis


FIR filter

DSP basics

Methods of bit-width reduction


Each method has different HW requirements (complexity), added noise and DC offset error. Algorithms with a smaller output error typically require more complex HW.
❑ Truncation – trim the least significant bits; the simplest algorithm, but it results in the largest error
❑ Round-half-up – ties are rounded toward +∞: 3.5 => 4, -3.5 => -3
❑ Round-half-down – ties are rounded toward -∞: 3.5 => 3, -3.5 => -4
❑ Round-half-even – ties go to the nearest even value; removes the DC offset of the previous two algorithms
❑ Round-half-odd – ties go to the nearest odd value
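The five methods can be compared on integers from which k LSBs are dropped (the integer v represents v / 2^k). A sketch assuming two's complement inputs; the shift-and-add forms of truncation and round-half-up/down map directly onto hardware (one adder plus discarded bits), and `mean_error` measures the DC offset over a uniform range of inputs. All names are hypothetical.

```python
from fractions import Fraction


def truncate(v, k):
    """Drop k LSBs: an arithmetic shift right rounds toward -inf."""
    return v >> k


def round_half_up(v, k):
    """Add half an LSB, then truncate: ties go toward +inf (-3.5 -> -3)."""
    return (v + (1 << (k - 1))) >> k


def round_half_down(v, k):
    """Add half an LSB minus one, then truncate: ties go toward -inf (3.5 -> 3)."""
    return (v + (1 << (k - 1)) - 1) >> k


def round_half_even(v, k):
    """Ties go to the even neighbour, so the tie errors cancel on average."""
    q, rem, half = v >> k, v & ((1 << k) - 1), 1 << (k - 1)
    if rem > half or (rem == half and q & 1):
        q += 1
    return q


def round_half_odd(v, k):
    """Ties go to the odd neighbour."""
    q, rem, half = v >> k, v & ((1 << k) - 1), 1 << (k - 1)
    if rem > half or (rem == half and not q & 1):
        q += 1
    return q


def mean_error(method, k=2, lo=-64, hi=64):
    """Average (rounded - exact) over all inputs in [lo, hi): the DC offset in LSBs."""
    errors = [Fraction(method(v, k)) - Fraction(v, 1 << k) for v in range(lo, hi)]
    return sum(errors) / len(errors)


for f in (truncate, round_half_up, round_half_down, round_half_even, round_half_odd):
    # 7, -7 and 5 with k = 1 represent 3.5, -3.5 and 2.5
    print(f.__name__, [f(v, 1) for v in (7, -7, 5)], mean_error(f))
```

For 3.5, -3.5 and 2.5 the results are 3, -4, 2 (truncation), 4, -3, 3 (half-up), 3, -4, 2 (half-down), 4, -4, 2 (half-even) and 3, -3, 3 (half-odd). With k = 2 the DC offset is -3/8 LSB for truncation, +1/8 and -1/8 for half-up and half-down, and 0 for the two tie-alternating methods.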

DSP basics

Methods of bit-width reduction


Other methods of bit-width reduction (rounding)
❑ Round-alternate
❑ Round-random
❑ Round-ceiling
❑ Round-floor
❑ Round-toward-zero
❑ Round-away-from-zero
❑ Round-up
❑ Round-down
https://www.eetimes.com/an-introduction-to-different-rounding-algorithms/

DSP functions:
maximum operating frequency

DSP basics

Maximum operating (clock) frequency


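[Figure: register (REG) ➔ combinational logic (COMB LOG) ➔ register (REG), both registers clocked by CLK; within one clock period the data path consumes T_CKO, T_LOG + T_ROUTE and T_SU]

$$f_{MAX} = \frac{1}{T_{CKO} + T_{LOG} + T_{ROUTE} + T_{SU}}$$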
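The timing budget of a register-to-register path translates directly into a maximum clock frequency. The delays below are hypothetical round numbers, not values from any data sheet, and the second case assumes that the logic and routing delay split evenly when an extra register is inserted in the middle of the path.

```python
def f_max_mhz(t_cko_ns, t_log_ns, t_route_ns, t_su_ns):
    """f_MAX = 1 / (T_CKO + T_LOG + T_ROUTE + T_SU); delays in ns, result in MHz."""
    return 1000.0 / (t_cko_ns + t_log_ns + t_route_ns + t_su_ns)


# Hypothetical delays (not measured values from the slides)
single = f_max_mhz(0.5, 2.0, 1.5, 0.4)   # 4.4 ns period -> ~227 MHz
# Same logic split in half by an extra register (assumed even split)
split = f_max_mhz(0.5, 1.0, 0.75, 0.4)   # 2.65 ns period -> ~377 MHz
print(f"{single:.1f} MHz -> {split:.1f} MHz")
```

T_CKO and T_SU are paid once per register stage, so shortening the logic between registers raises f_MAX less than proportionally.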

DSP basics

CIC filter

FMAX ...?

DSP basics

FIR filter

FMAX ...?

Implementation of elementary arithmetic
functions in FPGA

DSP in FPGA (implementation)

Spartan-3 (xc3s500e): c <= a + b (19b <= 18b + 18b)

Integer

19 x LUT (<1%)

maximum clock frequency


250 MHz

DSP in FPGA (implementation)

Spartan-3 (xc3s500e): c <= a + b (33b <= 32b + 32b)

Integer

33 x LUT (<1%)

maximum clock frequency


205 MHz

DSP in FPGA (implementation)

Spartan-3 (xc3s500e): c <= a * b (36b <= 18b * 18b)

Integer

382 x LUT (~4%)

maximum clock frequency


65 MHz

DSP in FPGA (implementation)

Spartan-3 (xc3s500e): c <= a * b (36b <= 18b * 18b)

DSP in FPGA (implementation)

Spartan-3 (xc3s500e): c <= a * b (32b <= 32b * 32b)

Integer

1097 x LUT (~11%)

maximum clock frequency


51 MHz

DSP in FPGA (implementation)

Spartan-3 (xc3s500e): c <= a * b (32b <= 32b * 32b)

Floating Point

647 x LUT (~6%)

maximum clock frequency


38 MHz

DSP in FPGA – dedicated blocks for DSP

Virtex-II Pro MULT18 slice (~400 MHz), up to 444/FPGA


Spartan-3 MULT18 slice (~200 MHz), up to 104/FPGA

DSP in FPGA – dedicated blocks for DSP

Virtex-II Pro MULT18 slice (~400 MHz), up to 444/FPGA


Spartan-3 MULT18 slice (~200 MHz), up to 104/FPGA

DSP in FPGA – dedicated blocks for DSP

Spartan-3 (xc3s500e): c <= a * b (36b <= 18b * 18b)

1 x MULT18X18 (1 of 20)
maximum clock frequency
240 MHz

DSP in FPGA – dedicated blocks for DSP

Spartan-3 (xc3s500e): c <= a * b (32b <= 32b * 32b)

Integer

108 x LUT (~1%)


4 x MULT18X18 (4 of 20)

maximum clock frequency


80 MHz

DSP in FPGA – dedicated blocks for DSP

Spartan-3 (xc3s500e): c <= a * b (32b <= 32b * 32b)

Floating Point

153 x LUT (~1%)


4 x MULT18X18 (4 of 20)

maximum clock frequency


158 MHz

DSP in FPGA – dedicated blocks for DSP

Virtex-4 DSP48 slice (500 MHz), up to 512/FPGA


Spartan-3A DSP DSP48A slice (250 MHz), up to 126/FPGA

DSP in FPGA – dedicated blocks for DSP

Virtex-5 DSP48E slice (550 MHz), up to 1 052/FPGA

DSP in FPGA – dedicated blocks for DSP

Spartan-6 DSP48A1 slice (250 MHz), up to 180/FPGA

DSP in FPGA – dedicated blocks for DSP

Virtex-6,7 DSP48E1 slice (up to 890 MHz), up to 12 288/FPGA

DSP in FPGA – dedicated blocks for DSP

New generation of FPGAs: DSP blocks (hard macros) with native support of IEEE 754 single-precision floating point

Floating-Point DSP

Current FPGAs: only soft IP cores


Xilinx LogiCORE IP Floating-Point Operator v5.0:
• multiply
• add/subtract
• divide
• square-root
• comparison
• conversion from floating-point to fixed-point
• conversion from fixed-point to floating-point
• conversion between floating-point types

Floating-Point DSP

Fixed-point versus Floating-point

Floating-Point DSP

Fixed-point versus Floating-point

Resource sharing

Resource sharing
Try to share resource-hungry components
(like multipliers) to reduce HW requirements

Resource sharing

Resource sharing
FIR filter example:
• Symmetric impulse response with 256 samples
• Sampling frequency 16 MS/s
• Clock frequency 128 MHz

• 16 MS/s at 128 MHz clock ➔ 8 clock cycles for each data sample
• Symmetric impulse response ➔ only 128 multiplications needed (255 additions)
• 8 clock cycles for 128 multiplications ➔ at least 16 multipliers are required

Resource sharing

Resource sharing
FIR filter example:
• Symmetric impulse response with 128 samples
• Sampling frequency 122.88 MS/s
• Clock frequency 250 MHz

• 122.88 MS/s at 250 MHz clock ➔ 2 (integer) clock cycles for each data sample
• Symmetric impulse response ➔ only 64 multiplications needed (127 additions)
• 2 clock cycles for 64 multiplications ➔ at least 32 multipliers are required
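The arithmetic of both examples fits in one function. This is a sketch assuming that each hardware multiplier performs one multiplication per clock cycle and that only whole clock cycles per sample can be used; `fir_multipliers` is a hypothetical name.

```python
import math


def fir_multipliers(taps, fs_msps, fclk_mhz, symmetric=True):
    """Return (clock cycles per sample, multiplications per sample, multipliers needed)."""
    cycles_per_sample = math.floor(fclk_mhz / fs_msps)  # only whole cycles are usable
    # A symmetric response shares coefficients pairwise (the pre-adder trick)
    mults_per_sample = math.ceil(taps / 2) if symmetric else taps
    multipliers = math.ceil(mults_per_sample / cycles_per_sample)
    return cycles_per_sample, mults_per_sample, multipliers


print(fir_multipliers(256, 16, 128))       # (8, 128, 16)
print(fir_multipliers(128, 122.88, 250))   # (2, 64, 32)
```

In the second case 250 MHz / 122.88 MS/s is about 2.03, so the fraction of a cycle left over per sample cannot be exploited and 32 multipliers are still required.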

Resource sharing

FIR filter

Exploiting the symmetry of the filter impulse response ➔ reduction of HW requirements by using a pre-adder (lower HW cost than a multiplier).

This is not resource sharing!
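A behavioral sketch of the pre-adder idea: for symmetric coefficients h[k] = h[N-1-k], the two samples that share a coefficient are added first, so only ceil(N/2) multiplications are needed per output sample. Both functions are hypothetical reference models for checking results, not a hardware description.

```python
def fir_direct(x, h):
    """Direct-form FIR: one multiplication per tap per output sample."""
    n = len(h)
    padded = [0] * (n - 1) + list(x)      # delay line starts filled with zeros
    y, mults = [], 0
    for i in range(len(x)):
        window = padded[i:i + n][::-1]    # window[k] = x[i - k]
        acc = 0
        for k in range(n):
            acc += h[k] * window[k]
            mults += 1
        y.append(acc)
    return y, mults


def fir_preadder(x, h):
    """Symmetric FIR: pre-add the sample pairs sharing a coefficient, then multiply once."""
    n = len(h)
    if h != h[::-1]:
        raise ValueError("coefficients must be symmetric")
    padded = [0] * (n - 1) + list(x)
    y, mults = [], 0
    for i in range(len(x)):
        window = padded[i:i + n][::-1]
        acc = 0
        for k in range(n // 2):
            acc += h[k] * (window[k] + window[n - 1 - k])   # pre-adder
            mults += 1
        if n % 2:                                           # centre tap of an odd-length filter
            acc += h[n // 2] * window[n // 2]
            mults += 1
        y.append(acc)
    return y, mults


h = [1, 3, 5, 5, 3, 1]                 # symmetric 6-tap example (hypothetical)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y1, m1 = fir_direct(x, h)
y2, m2 = fir_preadder(x, h)
print(y1 == y2, m1, m2)                # True 48 24
```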


FPGA
versus

Digital Signal Processor

FPGA versus Digital Signal Processor

Prediction of an expert (?) from Texas Instruments (2009)


2015: The Death of the FPGA. An important footnote in the history of programmability
is the demise of the FPGA. Small multi-core CPUs consume significantly less power as well
as provide a richer set of mapping options for complex algorithms and communication
patterns than does the distributed fabric of ALUs and LUTs that make up FPGAs.

Alan Gatherer
CTO Communications Infrastructure Group, Texas Instruments

FPGA versus Digital Signal Processor

❑ RAW DSP performance: FPGAs are superior to digital signal processors
Texas Instruments TMS320C6678 (8 cores @ 1.4 GHz): 360 GMAC @ 230 USD
Texas Instruments 66AK2L06 (4 cores @ 1.2 GHz): 154 GMAC @ 200 USD
Xilinx Kintex-7 XC7K160T (600 MAC @ 550 MHz): 330 GMAC @ 200 USD
Xilinx Virtex-7 XC7VX1140T (5280 MAC @ 638 MHz + logic): 6 737 GMAC @ 16 288 USD
Xilinx Kintex UltraScale (5520 MAC @ 594 MHz): 3 278 GMAC @ 5 635 USD
Xilinx Virtex UltraScale+ (11904 MAC @ 741 MHz + logic): 21 213 GMAC @ price not available

(numbers valid as of 2018)

❑ The problem: performance x price x development time

FPGA versus Digital Signal Processor

FPGAs and Digital Signal Processors are often found cooperating in a single DSP system.

FPGA versus Digital Signal Processor

Typical use case

FPGA: parallel processing
precisely controlled (and low) latency
suitable for simple algorithms

Processor: more complex algorithms (branch control...)
for tasks that are not so time-critical
much easier to (re)program

DSP to FPGA

There are many non-HDL tools for DSP implementation in FPGAs

❑ High level synthesis (C, C++, SystemC, SystemVerilog, Matlab...)

❑ Matlab/Simulink (DSP Builder, System Generator)

So far, none of these methods is suitable for end products. For serious development, stay with HDL!
[Figure: High level synthesis (HLS)]
DSP to FPGA

Matlab/Xilinx System Generator

An example of simple DSP module implementation
and verification

Matlab + Altera Quartus II + ModelSim

Tutorial 9

FPGAs in detail
❑ DSP in FPGA
❑ IO cells
❑ Signal integrity
❑ Synchronous data interfaces
❑ High speed transceivers

Input / Output Cells
(Tiles, Blocks)

IO cells

FPGA – what is inside?


❑ I/O cells
• Interfacing of the FPGA core logic to the
outside world
• The core uses low voltage levels (to
improve performance and power
consumption)
• The core logic is very sensitive (ESD...)
• I/O cells are ruggedized, equipped with
many advanced features to enable fast
and safe interfacing

IO cells

Requirements on an IO cell
❑ Support of multiple logic standards (different voltage levels)
❑ Support of differential pairs
❑ Parallel bus synchronization support
❑ Fast data transmission (over 1 Gbps)
❑ Integrated termination resistors
❑ ESD protection, pull-up / pull-down resistors...

IO cells

IO cell basic structure


❑ Input and output buffers (OUTBUF,
INBUF)
❑ Tri-state output capability ➔ enables
bidirectional communication
❑ Programmable input inverter
❑ Coupling to adjacent IO cells to enable
implementation of differential logic
standards

IO cells

IO cell structure – development


❑ Input and output buffers (OUTBUF,
INBUF)
❑ Tri-state output capability ➔ enables
bidirectional communication
❑ Programmable input inverter
❑ Coupling to adjacent IO cells to enable
implementation of differential logic standards
❑ Input and output Flip-Flop
❑ Pull-up, Pull-down resistors
❑ Adjustable delay element

IO cells

IO cell structure – development
❑ Support for both SDR and DDR
interfaces (dedicated SDR/DDR Flip-Flops in
the IO cell)
❑ Support for many different logic
standards (single-ended and differential),
programmable output driver (output
current, slew rate)
❑ Internal termination (Digitally
Controlled Impedance; DCI)
❑ Internal pull-up / pull-down resistor
❑ Integrated ESD protection

IO cells

Single-ended logic standards


DCI = Digitally Controlled Impedance (Xilinx designation)
Some standards require termination of the signal wire (transmission line) with its characteristic impedance. This can be done inside the IO cell for both inputs and outputs.

Logic standards

Logic standards

Single-ended logic standards


Suitable for low speed signals (up to several hundred Mbps per wire)

Logic standards

Single-ended logic standards


Special variants for high speed communication (typical for DDRx):
termination required, speeds over 2 Gbps

Logic standards

Single-ended logic standards
Input and output logic levels.
There are many standards supported
by modern FPGAs.

Logic standards

Differential logic standards


Benefits:
❑ Much better noise immunity
❑ Enables use of smaller voltage swing
❑ Smaller EM emission
❑ Lower power consumption

Where are differential logic standards used today:


❑ General purpose high-speed inter-chip communication (AD/DA converters, backplane interfaces,
LCD panels, interface PHY chips...)
❑ Standard interfaces (Ethernet, PCI-Express, SATA, USB...)

Logic standards

Differential logic standards


Various voltage levels

Logic standards

Differential logic standards

LVDS

Logic standards

Differential logic standards

FPGA banks

FPGA: IO Banks

FPGA bank
❑ IO pins of an FPGA are grouped into so-called BANKs
❑ Each BANK has its own power supply input
❑ Depending on its power supply voltage, each BANK supports only some logic standards (for example, with a 3.3V power supply the bank can use 3.3V LVCMOS or LVTTL logic but not 2.5V LVCMOS logic).
❑ Very useful for interfacing with chips that use different I/O voltage standards

FPGA: IO Banks

FPGA bank
FPGA: IO Banks

FPGA bank

Synchronous data interfaces

Synchronous data interfaces

Examples of synchronous data interfaces: SPI, I2C

Synchronous data interfaces

Example of a synchronous data interface: SPI

Synchronous data interfaces

Examples of synchronous data interfaces

Synchronous data interfaces

Examples of synchronous data interfaces

TTL / LVTTL logic levels

Synchronous data interfaces

Examples of synchronous data interfaces


DDR, DDR2, DDR3, DDR4... logic: SSTL_2, SSTL_18, SSTL_15,
SSTL_135, HSUL_12...

Synchronous data interfaces

Examples of synchronous data interfaces


ADC (ADS4229 @ 250 MSPS) => FPGA, LVDS logic (differential)

Tbit = 2 000 ps
tSUmin = 600 ps
tHmin = 330 ps

1 mm of PCB trace ≈ 7 ps
10 cm of PCB trace ≈ 700 ps

Synchronous data interfaces

Timing requirements

Tbit = 2 000 ps

tSUmin = 600 ps
tHmin = 330 ps

1 mm of PCB trace ≈ 6.5-7.5 ps
10 cm of PCB trace ≈ 650-750 ps

Synchronous data interfaces

Timing requirements
Each data line must have the same propagation delay so that the data are valid at the same moment at the receiver ➔ the electrical lengths of all data paths must be matched.
Meanders are used to stretch the PCB traces so that their length matches the longest one.
FPGA interface clock input

Synchronous data interfaces

Clock signal distribution for synchronous interfaces


"SYSTEM synchronous"
All the system components (chips on a PCB) are connected to a single (common)
source of the clock signal.
Problem with a phase shift of the clock signal (depends on the physical length of wires)
and clock signal propagation delay in components ➔ not suitable for fast interfaces.

ADC FPGA

Synchronous data interfaces

Clock signal distribution for synchronous interfaces


"SOURCE synchronous"
Each data output is accompanied by a phase-aligned clock signal.
Suitable for high speed interfaces (DDRx memories, fast AD/DA converters, PHY interfaces...)

ADC FPGA

Synchronous data interfaces

Spartan-3: Digital Clock Manager

Due to propagation delay, SKEW is introduced into the system.

Result: smaller timing margin on the synchronous interfaces.

Synchronous data interfaces

Spartan-3: Digital Clock Manager

Elimination of the I/O skew

Phase alignment of the clock signal to minimize skew throughout the system.

Synchronous data interfaces

Source synchronous: data and clock input



Synchronous data interfaces

Source synchronous: data and clock input


IO delay blocks: variable/fixed delay on inputs/outputs
Can be used to compensate for different electrical lengths of PCB traces
IO Tile



Further IO cell features



Synchronous data interfaces

DDR interface
Native support of DDR functionality directly in IO cells (DDR cannot be implemented
without such components!)

Spartan-3
7-series



Synchronous data interfaces

DDR interface
Native support of DDR functionality directly in IO cells.



IO cells

Support for precise timing adjustment


Adjustable delay blocks (IDELAY, ODELAY)
Can be used to compensate for different electrical lengths of PCB traces (wires).
Virtex-5: 64 steps with a fixed resolution of 78 ps.
Virtex UltraScale: 31 steps with a variable resolution of 2.5-15 ps.

Signal propagation delay on an FR4-based PCB: ~7 ps/mm.
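A quick estimate of how many delay taps compensate a given trace-length mismatch, using the Virtex-5 figures (78 ps per tap, 64 steps, i.e. tap values 0 to 63) and the ~7 ps/mm propagation delay of FR4. `idelay_taps` and `residual_skew_ps` are hypothetical helpers; a real design would take the tap resolution from the device data sheet and also account for the delay calibration.

```python
def idelay_taps(mismatch_mm, tap_ps=78.0, ps_per_mm=7.0, max_tap=63):
    """Nearest whole number of delay taps that compensates a trace-length mismatch."""
    taps = round(mismatch_mm * ps_per_mm / tap_ps)
    if taps > max_tap:
        raise ValueError(f"{taps} taps needed, only 0..{max_tap} available")
    return taps


def residual_skew_ps(mismatch_mm, tap_ps=78.0, ps_per_mm=7.0):
    """Skew left over after compensation with whole taps."""
    return abs(idelay_taps(mismatch_mm, tap_ps, ps_per_mm) * tap_ps - mismatch_mm * ps_per_mm)


print(idelay_taps(20), residual_skew_ps(20))   # 20 mm ~ 140 ps -> 2 taps, 16 ps left
```

With a 78 ps step the residual skew can be up to half a tap (39 ps); finer-resolution delay elements reduce this error.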



IO cells

Serial communication support


❑ SERDES – serializer / deserializer for fast data interfaces: fewer wires on the PCB ➔ saves board area



IO cells

Usage of special IO blocks


1) Inference from generic HDL code: easy to use, portable code, but works only for a very limited number of components (e.g. IO cell flip-flops)
2) IP Core Wizard: easy to use, but not portable
3) Manual instantiation: the preferred method, either in HDL code or sometimes via schematic capture.
In the Xilinx Vivado tool, the component instantiation templates can be found in the Language Templates menu ("device primitives" category).
For details on component usage and instantiation, see the corresponding documentation (user guide), for example ug786: Xilinx 7 Series FPGA and Zynq-7000 All Programmable SoC Libraries Guide for HDL Designs



IO cells

Usage of special IO blocks



IO cells

Usage of special IO blocks



Signal Integrity (SI)



Signal Integrity (SI)



Signal Integrity (SI)

Signal integrity problems


❑ Inter-symbol interference (ISI)
❑ Inter-signal interference (crosstalk)
❑ Different delays of parallel data interface traces (skew)
❑ Overshoot / undershoot (ringing)
❑ Ground bounce
❑ Power supply noise



Signal Integrity (SI)

Inter-symbol interference (ISI)


❑ Source: limited bandwidth of the media or of the Rx/Tx buffers
❑ Result: incorrect data interpretation

Solution:
❑ Compensate channel attenuation (equalization, pre-emphasis, de-emphasis)
❑ Increase edge density in the data signal
Signal Integrity (SI)

Inter-signal interference (crosstalk)


❑ Source: parasitic inductive or capacitive coupling of adjacent wires
❑ Result: incorrect data interpretation

Solution:
❑ Wire separation (physical distance, shielding)
❑ Usage of differential pairs instead of single-ended wires



Signal Integrity (SI)

Overshoot / undershoot (ringing)


❑ Source: reflections on the transmission line (impedance discontinuities)
❑ Result: excessive positive/negative voltage (can damage electronics), incorrect data interpretation (multiple edges...)

Solution:
❑ Use properly terminated transmission lines
❑ Decrease slew rate (slower edges)



Signal Integrity (SI)

Rule of thumb:
Treat a wire as a transmission line whenever its propagation delay is longer than about one sixth of the edge (rise/fall) time of the transmitted signal.
Using a transmission line means using a wire with a defined characteristic impedance and terminating it with this characteristic impedance on both sides.
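The rule of thumb can be turned into a critical trace length. This sketch uses the common form t_pd > t_edge / 6 together with the ~7 ps/mm FR4 delay quoted elsewhere in this tutorial; the factor 6 and the delay per mm are parameters, and the function names are hypothetical.

```python
def critical_length_mm(edge_time_ps, ps_per_mm=7.0, ratio=6.0):
    """Length above which the propagation delay exceeds edge_time / ratio."""
    return edge_time_ps / ratio / ps_per_mm


def is_transmission_line(length_mm, edge_time_ps, ps_per_mm=7.0, ratio=6.0):
    """Rule of thumb: t_pd > t_edge / ratio (ratio = 6) -> treat as a transmission line."""
    return length_mm * ps_per_mm > edge_time_ps / ratio


for edge_ps in (1000, 300, 100):
    print(f"edge {edge_ps} ps -> critical length {critical_length_mm(edge_ps):.1f} mm")
```

With 100 ps edges, which are common for fast IO and transceiver outputs, even a 2.4 mm trace already behaves as a transmission line.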



Signal Integrity (SI)

Single-ended transmission lines (PCB)


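❑ Microstrip
$$Z_0\,(\Omega) = \frac{87}{\sqrt{\varepsilon_r + 1.41}}\,\ln\!\left(\frac{5.98\,H}{0.8\,W + T}\right)$$
❑ Embedded Microstrip
❑ Stripline
$$Z_0\,(\Omega) = \frac{60}{\sqrt{\varepsilon_r}}\,\ln\!\left(\frac{1.9\,(2H + T)}{0.8\,W + T}\right)$$
❑ Asymmetric Stripline
[Figure: cross-sections with trace width W, trace thickness T, dielectric heights H and H1, relative permittivity εr]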
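The microstrip and stripline formulas can be evaluated directly. A sketch: the example dimensions (in mm) and εr = 4.3 are illustrative assumptions for an FR4-like stack-up, and such closed-form approximations hold only over a limited range of W/H, so final values should come from a field solver or the PCB manufacturer's stack-up data.

```python
import math


def z0_microstrip(w, h, t, er):
    """Surface microstrip: Z0 = 87 / sqrt(er + 1.41) * ln(5.98 H / (0.8 W + T))."""
    return 87.0 / math.sqrt(er + 1.41) * math.log(5.98 * h / (0.8 * w + t))


def z0_stripline(w, h, t, er):
    """Centered stripline, H above and below the trace: 60 / sqrt(er) * ln(1.9 (2H + T) / (0.8 W + T))."""
    return 60.0 / math.sqrt(er) * math.log(1.9 * (2 * h + t) / (0.8 * w + t))


# Illustrative FR4-like stack-up (all lengths in mm, er = 4.3 assumed)
print(f"microstrip: {z0_microstrip(w=0.35, h=0.2, t=0.035, er=4.3):.1f} ohm")
print(f"stripline:  {z0_stripline(w=0.15, h=0.2, t=0.035, er=4.3):.1f} ohm")
```

Both examples come out close to 50 Ω (about 48.6 Ω and 48.4 Ω); note that the stripline needs a much narrower trace because it is fully surrounded by dielectric.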



Signal Integrity (SI)

Differential transmission lines (PCB)


❑ Dual Stripline
Dual Stripline drawbacks:
• Thicker dielectric layer
• Larger skin-effect losses (8%)
• Larger dielectric losses
• Hard to maintain symmetry (VIAs)
❑ Differential Stripline
❑ (Differential offset Stripline)
❑ Differential Microstrip
[Figure: cross-sections with trace width W, thickness T, spacing S, dielectric height H and relative permittivity εr]



Signal Integrity (SI)

Differential logic standards


Benefits:
❑ Much better noise immunity
❑ Enables use of smaller voltage swing
❑ Smaller EM emission
❑ Lower power consumption

Where are differential logic standards used today:


❑ General purpose high-speed inter-chip communication (AD/DA converters, backplane interfaces,
LCD panels, interface PHY chips...)
❑ Standard interfaces (Ethernet, PCI-Express, SATA, USB...)
❑ Up to 56 Gb/s on each differential pair (when using transceivers).



Signal Integrity (SI)

Permittivity: material and frequency dependent



Signal Integrity (SI)

Trace impedance
Test patterns

Er ~ 6

Er ~ 3



Signal Integrity (SI)
Mitigation of glass epoxy structure

Trace impedance

Er ~ 6

Er ~ 3



Signal Integrity (SI)

Minimize impedance discontinuities


Transmission line with a series pass element (resistor, capacitor)



Signal Integrity (SI)

Minimize impedance discontinuities


Differential VIA – maintain symmetry of the traces with respect to the ground layer and provide a good return current path.
Beware of long VIA stubs!!!



Signal Integrity (SI)

Beware of discontinuities in the ground plane

Signal Integrity (SI)

Proper trace termination


Many integrated circuits (not only FPGAs) feature optional internal (on-chip)
termination resistors ➔ can save PCB space.



Signal Integrity (SI)

Proper trace termination


The split termination is often easier to implement but requires significantly
more power (static current of VCCO/4R for each input)!
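The static power of a split termination adds up quickly on wide buses. This sketch assumes the usual arrangement of two resistors of 2R each (one from VCCO to the input, one from the input to GND), whose parallel combination equals the line impedance R, so a DC current VCCO/(4R) flows continuously. The values VCCO = 2.5 V, R = 50 Ω and the 64-bit bus are illustrative assumptions; `split_termination` is a hypothetical helper.

```python
def split_termination(vcco, r):
    """Static current (A) and power (W) of one split (Thevenin) termination.

    Assumes two resistors of 2*R each, VCCO-to-input and input-to-GND: their
    parallel combination matches the line impedance R and biases the input at VCCO/2.
    """
    current = vcco / (4 * r)   # DC path through 2R + 2R in series
    return current, vcco * current


i, p = split_termination(vcco=2.5, r=50.0)
print(f"{i * 1e3:.1f} mA, {p * 1e3:.2f} mW per input, {64 * p:.2f} W for a 64-bit bus")
```

Roughly 12.5 mA and 31 mW per input become 2 W for 64 inputs, dissipated whether or not the bus is toggling.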



Signal Integrity (SI)

Some of the available modes of FPGA internal IO termination (Xilinx DCI)



Signal Integrity (SI)

Signal integrity analysis


❑ Simulation
• SPICE models – transistor-resistor-capacitor models
• IBIS models – same behavior, hidden functionality (preserves company know-how)
• EM field solvers – CST, Ansoft...

❑ Measurement
• S-parameters – vector network analyzer
• Time Domain Reflectometry – dedicated measurement instrument
• Eye diagram – oscilloscopes (both real-time and sampling oscilloscopes can be used)



Signal Integrity (SI)

Simulation - CST



Signal Integrity (SI)

S-parameters



Signal Integrity (SI)

Time-domain reflectometry (TDR)



Signal Integrity (SI)

Eye diagram measurement and analysis



Optics on a PCB?

What's next?
Today we are able to communicate at about 50 Gb/s over short distances (a few centimeters) using differential pairs on FR4-based, copper-plated PCBs.
Optical waveguides on a PCB can significantly increase bandwidth and enable longer paths.
Problem: coupling of the optical signal into/out of the waveguide.



Optics on a PCB?

An optics interface experimental package


12 channels (aggregated speed 120 Gbps)



Optics on a PCB?

An optics interface experimental package


12 channels (aggregated speed 120 Gbps)



Optics on a PCB?



High speed transceivers
(Multi-gigabit transceivers)



High speed transceivers

Serial interface + differential pair: why?


• Lower trace (wire) count ➔ cheaper/smaller PCB (cable)
• It is very difficult to synchronize high-speed parallel buses (skew between lines, race conditions)
• Lower EM emission, higher EM immunity



High speed transceivers

Asynchronous transmission: why?


• Lower trace (wire) count ➔ cheaper/smaller PCB (cable)
• It is difficult (or impossible) to keep the data and the corresponding clock signal synchronized
• The CDR (clock and data recovery) block occupies only a small silicon area in the receiver and is cheaper than an additional wire

[Figure: synchronous vs. asynchronous transmission]



High speed transceivers

Serial asynchronous transmission – usage


• Fibre Channel 128GFC (4 x 28.05 Gb/s)
• 1G Ethernet (1.25 Gb/s), 10G Ethernet
• 100G, 200G, 400G Ethernet
• Hybrid Memory Cube (HMC) (up to 128 x 30 Gb/s)
• OC-48 (2.49 Gb/s), OC-768 (39.8 Gb/s), OC-1920 (99.5 Gb/s)
• PCI Express (PCIe) Gen1 (2.5 Gb/s), Gen2 (5 Gb/s), Gen3 (8 Gb/s)
• SATA 3.2 (16 Gb/s), USB 3.1 (10 Gb/s)
• JESD204B
• XAUI, CPRI, 10GFC, InfiniBand, Interlaken...
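SONET optical carrier rates are integer multiples of the OC-1 line rate of 51.84 Mb/s, so the OC-n figures in the list can be checked directly. A minimal sketch:

```python
OC1_LINE_RATE_MBPS = 51.84  # SONET OC-1 (STS-1) line rate in Mb/s


def oc_line_rate_gbps(n: int) -> float:
    """Line rate of SONET OC-n in Gb/s (n times the OC-1 rate)."""
    return n * OC1_LINE_RATE_MBPS / 1000.0


if __name__ == "__main__":
    for n in (48, 768, 1920):
        print(f"OC-{n}: {oc_line_rate_gbps(n):.2f} Gb/s")
```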



High speed transceivers

Serial asynchronous transmission – problems


• Signal integrity
o Inter-symbol interference (ISI) ➔ encoding (8B/10B, 64B/66B)
• Symbol timing recovery
o sufficient edge density ➔ encoding
o low jitter ➔ precise clock recovery and timing blocks
• PCB attenuation
o equalization ➔ pre-emphasis, de-emphasis
• Bandwidth increase by using several transmission lines (lanes), for example PCI Express x16
o channel bonding ➔ encoding + FIFO
• Very high slew rate is required
o use CML (current mode logic) drivers

General purpose IO cells cannot fulfil these requirements
➔ dedicated TRANSCEIVERS are used



High speed transceivers

SERDES – external chip


In the past, FPGAs had no integrated transceivers
➔ external TRANSCEIVER or SERDES devices were used



High speed transceivers

Today integrated in FPGAs
[Figure: integrated transceiver channels (CH0 Tx/Rx, CH1 Tx/Rx) with a low noise power supply]
High speed transceivers



High speed transceivers

Logic standard: Current Mode Logic (CML), also known as Source-Coupled Logic (SCL)



High speed transceivers

Symbol timing recovery (Clock and data recovery)


❑ Sufficient edge density and often zero DC bias are required ➔ use scrambling or
encoding (8B/10B or 64B/66B)
❑ Timing recovery needs some time to lock ➔ use a synchronization preamble or
an idle sequence; a precise reference clock is always required at the receiver
❑ Slow changes of the source clock frequency (clock wander) must be tracked, while
fast changes (jitter) must be filtered out
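The two properties named above, edge density and DC balance, can be quantified as the longest run of identical bits and the disparity (ones minus zeros) of a block. The sketch below compares four unencoded 0x00 bytes with the 8B/10B K28.5 code group for running disparity minus (0011111010 in a..j order); the helper names are illustrative.

```python
from itertools import groupby


def max_run_length(bits: str) -> int:
    """Longest run of identical consecutive bits, i.e. the longest edge-free stretch."""
    if not bits:
        return 0
    return max(len(list(run)) for _, run in groupby(bits))


def disparity(bits: str) -> int:
    """Number of ones minus number of zeros; 0 means the block is DC balanced."""
    return bits.count("1") - bits.count("0")


if __name__ == "__main__":
    raw = "00000000" * 4          # four 0x00 bytes sent without any coding
    coded = "0011111010"          # K28.5 code group (running disparity -), a..j order
    print(max_run_length(raw), disparity(raw))      # 32 -32
    print(max_run_length(coded), disparity(coded))  # 5 2
```

The raw bytes give the CDR no edge for 32 bit periods and a strong DC offset, whereas 8B/10B limits runs to 5 bits and keeps each code group's disparity at 0 or ±2, alternating the sign to stay balanced on average.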



High speed transceivers

8B/10B encoding



High speed transceivers

8B/10B encoding
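Block coding costs bandwidth: 8B/10B sends 10 line bits for every 8 data bits, 64B/66B sends 66 for every 64. The sketch below computes the payload rate for a few links from the usage list; the link rates are the standard line rates of those interfaces.

```python
def payload_rate(line_rate: float, data_bits: int, code_bits: int) -> float:
    """Payload bit rate after removing block-coding overhead (same units as line_rate)."""
    return line_rate * data_bits / code_bits


if __name__ == "__main__":
    links = [
        ("1G Ethernet, 8B/10B", 1.25, 8, 10),
        ("PCIe Gen1, 8B/10B", 2.5, 8, 10),
        ("10G Ethernet (10GBASE-R), 64B/66B", 10.3125, 64, 66),
    ]
    for name, rate, k, n in links:
        efficiency = k / n
        print(f"{name}: {payload_rate(rate, k, n):.4g} Gb/s payload "
              f"({efficiency:.1%} efficiency)")
```

This is why 1G Ethernet runs its line at 1.25 Gb/s: 20 % of the line rate is spent on the code, while 64B/66B reduces the overhead to about 3 %.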



High speed transceivers

8B/10B decoder (1G Ethernet)


10011100101111100011100101100101010111
0101111100 ≠ K28.5
0101111100 ≠ K28.5
0101111100 = K28.5

11110000 10010110
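The receiver does not know where 10-bit symbols start, so it slides a 10-bit window along the incoming stream until the window equals the K28.5 comma; from then on it cuts the stream into 10-bit symbols. The sketch below uses the bit stream and the K28.5 pattern exactly as printed on the slide. Bit-ordering conventions (a..j vs. j..a) differ between documents, so treat the pattern as an assumption. The resulting 10-bit symbols would then be passed to an 8B/10B decoding table, which is not shown.

```python
K28_5 = "0101111100"  # K28.5 pattern exactly as printed on the slide
SYMBOL_BITS = 10


def align_to_comma(stream: str, comma: str = K28_5):
    """Slide a 10-bit window over the stream one bit at a time.

    Returns (offset of the comma, list of the complete 10-bit symbols that
    follow it), or (None, []) if no comma is found.
    """
    for offset in range(len(stream) - len(comma) + 1):
        if stream[offset:offset + len(comma)] == comma:
            start = offset + len(comma)
            symbols = [
                stream[i:i + SYMBOL_BITS]
                for i in range(start, len(stream) - SYMBOL_BITS + 1, SYMBOL_BITS)
            ]
            return offset, symbols
    return None, []


if __name__ == "__main__":
    rx = "10011100101111100011100101100101010111"  # bit stream from the slide
    offset, symbols = align_to_comma(rx)
    print(offset, symbols)
```

In hardware the same search is done in parallel for all bit positions of the deserializer output, and the chosen offset is kept until the link loses synchronization.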



High speed transceivers

PCB attenuation
Skin effect and proximity effect: the higher the signal frequency, the higher the trace
resistance (at 10 GHz a copper trace has a resistance of about 1 Ω per inch)
Dielectric attenuation: FR4 has a large dissipation factor (0.02-0.03). For demanding
applications a high quality dielectric can be used (dissipation factor 0.001 or lower)

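Skin depth: $\delta = \sqrt{\frac{2}{2\pi f \mu \sigma}}$, where $f$ is the signal frequency, $\mu$ the permeability and $\sigma$ the conductivity of the conductor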
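Evaluating the skin depth δ = sqrt(2 / (2π f μ σ)) for copper shows why resistance rises with frequency: the current is squeezed into a layer far thinner than typical 18 µm or 35 µm copper foil. The sketch assumes textbook values σ ≈ 5.8·10^7 S/m and μ = μ0.

```python
import math

MU0 = 4e-7 * math.pi   # permeability of free space [H/m]
SIGMA_CU = 5.8e7       # approximate conductivity of copper [S/m]


def skin_depth(f_hz: float, sigma: float = SIGMA_CU, mu: float = MU0) -> float:
    """Skin depth [m]: delta = sqrt(2 / (2*pi*f*mu*sigma))."""
    return math.sqrt(2.0 / (2.0 * math.pi * f_hz * mu * sigma))


if __name__ == "__main__":
    for f in (100e6, 1e9, 10e9):
        print(f"{f / 1e9:5.1f} GHz: skin depth {skin_depth(f) * 1e6:.2f} um")
```

Since δ scales with 1/sqrt(f), a tenfold increase in frequency shrinks the conducting layer by about 3.2 times; at 10 GHz it is only about 0.66 µm.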
High speed transceivers

PCB attenuation (combined contribution)


Unit conversions: 8 mil ≈ 200 µm; copper thickness 1 oz ≈ 35 µm, ½ oz ≈ 18 µm
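The dielectric part of the attenuation can be estimated with the common low-loss approximation α = π f sqrt(εr) tanδ / c (in Np/m), valid for a line whose field lies entirely in the dielectric (for example a stripline). The sketch compares FR4 (tanδ = 0.02) with a low-loss laminate (tanδ = 0.001) at 10 GHz; εr = 4.0 for both materials is an assumption chosen to isolate the effect of the dissipation factor.

```python
import math

C0 = 299_792_458.0                 # speed of light in vacuum [m/s]
NP_TO_DB = 20.0 / math.log(10.0)   # 1 Np = 8.686 dB
M_PER_INCH = 0.0254


def dielectric_loss_db_per_inch(f_hz: float, eps_r: float, tan_delta: float) -> float:
    """Dielectric attenuation of a low-loss TEM line: pi*f*sqrt(eps_r)*tan_delta/c."""
    alpha_np_per_m = math.pi * f_hz * math.sqrt(eps_r) * tan_delta / C0
    return alpha_np_per_m * NP_TO_DB * M_PER_INCH


if __name__ == "__main__":
    f = 10e9  # 10 GHz
    for name, tan_d in (("FR4", 0.02), ("low-loss laminate", 0.001)):
        loss = dielectric_loss_db_per_inch(f, eps_r=4.0, tan_delta=tan_d)  # eps_r assumed
        print(f"{name:18s}: {loss:.3f} dB/inch at 10 GHz")
```

Under these assumptions FR4 loses roughly 0.9 dB per inch at 10 GHz from the dielectric alone, about 20 times more than the low-loss material, and the loss grows linearly with frequency.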
High speed transceivers

Equalization
❑ Signal attenuation degrades the signal quality, namely the edge slope. Signal quality is
measured at the receiver using the eye diagram: a wide open eye (both vertically
and horizontally) is required for reliable data transmission.
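A common way to realize transmit pre-emphasis/de-emphasis is a short FIR filter in the transmitter. The sketch below models a 2-tap filter y[n] = x[n] - α·x[n-1] on NRZ levels (+1/-1): bits following a transition get a larger amplitude than repeated bits, which pre-compensates the low-pass behavior of the PCB. The tap weight α = 0.25 and the idle level before the first bit are assumptions.

```python
def de_emphasis(symbols, alpha=0.25, normalize=False):
    """2-tap transmit FIR: y[n] = x[n] - alpha * x[n-1].

    symbols are NRZ levels (+1/-1). With normalize=True the output is divided
    by (1 + alpha), so transition bits keep the nominal peak swing and repeated
    bits are attenuated (de-emphasis view). Without it, transition bits are
    boosted above the nominal swing (pre-emphasis view).
    """
    out = []
    prev = 0.0  # assume the line idles at 0 before the first symbol
    for x in symbols:
        y = x - alpha * prev
        out.append(y / (1.0 + alpha) if normalize else y)
        prev = x
    return out


if __name__ == "__main__":
    bits = [1, 1, 1, -1, -1, 1]
    print(de_emphasis(bits))
    print(de_emphasis(bits, normalize=True))
```

With α = 0.25 the ratio between repeated-bit and transition-bit amplitude is 0.75/1.25, i.e. about 4.4 dB of de-emphasis; real transceivers make the tap weights programmable so they can be tuned to the channel loss.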



High speed transceivers

Eye diagram
❑ Eye opening is directly related to the bit error rate
(BER)
❑ Even a relatively low error rate of 10^-12 can be
unacceptable for very high speed data transmissions
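Why 10^-12 can be too high becomes clear when the BER is converted into a mean time between errors: the faster the link, the more often an error occurs. The sketch also relates the eye opening to the BER through the Q factor, using the standard formula BER = 0.5·erfc(Q/√2), which assumes Gaussian noise and a binary (NRZ) decision.

```python
import math


def mean_time_between_errors(bit_rate_bps: float, ber: float) -> float:
    """Average time [s] between bit errors for a given line rate and BER."""
    return 1.0 / (bit_rate_bps * ber)


def ber_from_q(q: float) -> float:
    """BER of an NRZ decision with Gaussian noise: 0.5 * erfc(Q / sqrt(2))."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


if __name__ == "__main__":
    for rate in (1e9, 10e9, 100e9):
        t = mean_time_between_errors(rate, 1e-12)
        print(f"{rate / 1e9:5.0f} Gb/s at BER 1e-12: one error every {t:.0f} s")
    print(f"Q = 7.03 -> BER = {ber_from_q(7.03):.2e}")
```

At 10 Gb/s a BER of 10^-12 still means one error about every 100 s, and at 100 Gb/s one every 10 s; a Q factor of about 7 is needed to reach that BER.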



High speed transceivers

Multigigabit transceivers - Ethernet


• from 1G up to 100G Ethernet (200G and 400G in the near future)



High speed transceivers

Multigigabit transceivers - backplane



High speed transceivers

Multigigabit transceivers – JESD204B



High speed transceivers

Multigigabit transceivers – JESD204B



High speed transceivers

Device                  | Max. number of channels | Max. data rate (Gbps) | Backplane support | Optical module support
------------------------|-------------------------|-----------------------|-------------------|-----------------------
Stratix 10 GT (14 nm)   | ??                      | 56                    | Yes (28 Gbps)     | Yes
Stratix V GT (28 nm)    | 66                      | 28                    | Yes               | Yes
Stratix V GX (28 nm)    | 66                      | 12.5                  | Yes               | Yes
Stratix IV GT (40 nm)   | 48                      | 11.3                  | Yes               | Yes
Stratix IV GX (40 nm)   | 48                      | 8.5                   | Yes               | Yes
HardCopy® IV GX (40 nm) | 36                      | 6.5                   | Yes               | Yes
Arria® II GZ (40 nm)    | 24                      | 6.375                 | Yes               | Yes
Arria II GX (40 nm)     | 16                      | 6.375                 | Up to 3.75 Gbps   | -
Cyclone IV GX (60 nm)   | 8                       | 3.125                 | -                 | -



High speed transceivers

What's next?
Current transceivers are capable of 30 Gbps per differential pair with NRZ encoding, or 56
Gbps with PAM-4 encoding. This is close to the physical limits of PCB traces ➔ alternative
solutions (like optical links) need to be explored.
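PAM-4 reaches the higher rate by sending log2(4) = 2 bits per symbol instead of 1, so the symbol rate (and thus the bandwidth the PCB trace must pass) stays the same as for NRZ. A minimal sketch; the 28 GBd symbol rate is an assumed example value.

```python
import math


def bit_rate(symbol_rate_baud: float, levels: int) -> float:
    """Bit rate of a line code carrying log2(levels) bits per symbol."""
    return symbol_rate_baud * math.log2(levels)


if __name__ == "__main__":
    baud = 28e9  # assumed symbol rate
    print(f"NRZ  : {bit_rate(baud, 2) / 1e9:.0f} Gb/s")
    print(f"PAM-4: {bit_rate(baud, 4) / 1e9:.0f} Gb/s")
```

The price is a smaller eye: four levels share the same voltage swing, so each of the three eyes is roughly one third of the NRZ eye height, which requires a better SNR for the same BER.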



Thank You for Your Attention!

100 Gbps development platform
