Tutorial 09 DSP IO Transceivers
Tutorial 9
Michal Kubíček
Department of Radio Electronics, FEEC BUT Brno
Created with the support of the OP VVV project Moderní a otevřené studium techniky (Modern and Open Study of Technology), CZ.02.2.69/0.0/0.0/16_015/0002430.
Tutorial 9
FPGAs in detail
❑ DSP in FPGA
❑ IO cells
❑ Signal integrity
❑ Synchronous data interfaces
❑ High speed transceivers
page 2 [email protected]
DSP in FPGA
Basic DSP blocks
FFT
DSP basics
Binary number representation
alternative calculation:
signed integer part (1 sign bit + 2 value bits) + fractional bits (weights 1/2, 1/4, 1/8, 1/16, 1/32)
-4 + 1/2 + 1/16 + 1/32 = -3.40625
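The same value can be cross-checked by reading the whole word as a two's-complement integer and scaling it by the number of fractional bits. The bit pattern 100.10011 used below is an assumption inferred from the weights in the sum above (integer part -4, fractional bits 1/2, 1/16 and 1/32 set); it is not stated explicitly in the slide.

```latex
\begin{aligned}
100.10011_2 \;&\rightarrow\; 10010011_2 = -128 + 16 + 2 + 1 = -109 \\
\frac{-109}{2^{5}} &= -3.40625
\end{aligned}
```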
❑ The problem
• Very limited support in CAD tools (both synthesis and simulation tools)
• Xilinx ISE: no support (neither synthesis, nor simulation)
• Xilinx Vivado: only a limited subset of VHDL-2008 is supported (see UG900 and UG901 for the version of your tool, e.g. 2019.2)
VHDL-2008 DSP function support using VHDL-93 language syntax (via a free VHDL package).
Warning: only limited support in XST (Xilinx ISE synthesis)!
use ieee.fixed_pkg.all;
....
signal a, b : sfixed (7 downto -6);
signal c : sfixed (8 downto -6);        -- the sum is one bit wider than the operands
begin
....
a <= to_sfixed (-3.125, 7, -6);         -- value, left index, right index
b <= to_sfixed (inp1, b'high, b'low);   -- convert inp1 to the index range of b
c <= a + b;
-- The decimal point is assumed to be between the "0" and "-1" index.
-- With "signal y : ufixed (4 downto -5)" as the data type (unsigned fixed point,
-- 10 bits wide, 5 bits of decimal), y = 6.5 = "00110.10000", or
-- simply: y <= "0011010000";
Bit width reduction, rounding
Add 8b + 8b (unsigned)
254 + 254 = 508 = 1_1111_1100'b
Mathematics: The result is one bit wider than the wider of the two operands.
VHDL: The result is only as wide as the wider of the two operands.
Solution: Extend at least one operand to the required result width to prevent overflow (this can be described as a dummy addition of a zero constant of that width, at no HW cost).
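A minimal sketch of this fix, assuming the operands are numeric_std unsigned vectors. The entity and port names (add8, a, b, sum) are illustrative and not taken from the slides. One operand is widened with resize rather than by adding a zero constant; the effect is the same, because the "+" operator returns a result as wide as its wider operand.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add8 is
  port (
    a, b : in  unsigned(7 downto 0);
    sum  : out unsigned(8 downto 0)   -- one bit wider than the operands
  );
end entity add8;

architecture rtl of add8 is
begin
  -- Widening one operand to 9 bits makes "+" return a 9-bit result,
  -- so 254 + 254 = 508 is represented without overflow.
  sum <= resize(a, sum'length) + b;
end architecture rtl;
```

Writing `sum <= a + b;` instead would be rejected by the compiler or fail at elaboration, because the 8-bit result does not match the 9-bit target.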
Multiply 8b x 8b (unsigned)
254 * 254 = 64516 = 1111_1100_0000_0100'b
Mathematics: The result width equals the sum of the widths of both operands.
VHDL: The result width equals the sum of the widths of both operands.
Solution: No action needed.
Mathematics: For signed 8-bit operands, each number has 7 value bits + 1 sign bit ➔ the result needs only 7+7+1(sign) = 15 bits (there is no need for two sign bits in the result). The only exception is the product of the two most negative numbers (-FS = negative full-scale value); this is the only operation that requires the 16th bit.
VHDL: The result width equals the sum of the widths of both operands.
Solution: To save hardware resources, the MSB of the result can be dropped. The operation -FS*-FS then produces a numerical error, which can be handled by checking the input values (-FS at a multiplier input is treated as an overflow exception).
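The MSB drop can be sketched as follows for signed 8-bit operands, again assuming numeric_std. The entity name mult8s and the ovf flag are hypothetical; the slides only say that -FS at the input is treated as an overflow exception, not how it is signalled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult8s is
  port (
    a, b : in  signed(7 downto 0);
    p    : out signed(14 downto 0);  -- 15-bit product, duplicate sign bit dropped
    ovf  : out std_logic             -- '1' only for (-128) * (-128)
  );
end entity mult8s;

architecture rtl of mult8s is
  signal full : signed(15 downto 0); -- full-width result of numeric_std "*"
begin
  full <= a * b;
  -- Bits 15 and 14 are equal for every input pair except -FS * -FS,
  -- so bit 15 can be dropped without changing the value.
  p    <= full(14 downto 0);
  -- Flag the single case whose product (+16384) does not fit into 15 bits.
  ovf  <= '1' when (a = to_signed(-128, 8) and b = to_signed(-128, 8)) else '0';
end architecture rtl;
```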
Motivation: In some DSP blocks (like FIR filters) there are many stages of multiply
operation. The extra bits (when kept) accumulate through the block structure, which either
results in a requirement for large bit widths throughout the DSP block (much higher HW
requirements) or degrades output dynamic range (only a small portion of the result
represents a useful signal).
Bit-width reduction
Why?
❑ To reduce hardware requirements (number of utilized FFs, LUTs, DSPs, BRAMs...)
❑ Adjust bit-width to the required output width
Problems:
❑ A small error appears at the output (noise and/or DC offset)
❑ The bit-width analysis and optimization is often a challenging task even for an
experienced engineer
[Figure: example data path annotated with signal bit widths of 16b, 24b and 31b]
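Plain truncation in two's complement always rounds toward minus infinity, which is one source of the DC offset mentioned above. Adding half of the new LSB before dropping bits removes most of that bias. The sketch below assumes a 24-bit to 16-bit reduction of numeric_std signed data; the entity name, port names and widths are illustrative only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity round24to16 is
  port (
    din  : in  signed(23 downto 0);
    dout : out signed(15 downto 0)
  );
end entity round24to16;

architecture rtl of round24to16 is
  -- One guard bit on top so that adding half an LSB cannot wrap around.
  signal sum : signed(24 downto 0);
begin
  -- Add 2**7, i.e. half the weight of the new LSB (bit 8 of din).
  sum  <= resize(din, 25) + to_signed(128, 25);
  -- Drop the 8 LSBs; saturate if the rounding pushed the value past
  -- the positive 24-bit range (only positive overflow is possible here).
  dout <= to_signed(32767, 16) when (sum(24) = '0' and sum(23) = '1')
          else sum(23 downto 8);
end architecture rtl;
```

This rounds ties toward plus infinity; other rounding schemes (for example convergent rounding) trade a little extra logic for a smaller residual bias.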
CIC filter
[Figure: CIC filter block diagram with z^-1 delay elements]
DSP functions:
maximum operating frequency
[Timing diagram: CLK and Data waveforms; the register-to-register path delay consists of TCKO, TLOG + TROUTE and TSU]
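The delays in the timing diagram add up to the minimum clock period of a register-to-register path, which gives a simplified estimate of the maximum operating frequency. Clock skew and jitter are ignored here, and the numeric values are hypothetical, chosen only to illustrate the arithmetic.

```latex
T_{CLK,\min} = T_{CKO} + T_{LOG} + T_{ROUTE} + T_{SU}, \qquad
F_{MAX} = \frac{1}{T_{CLK,\min}}

% hypothetical values: 0.5 + 1.5 + 1.8 + 0.2 = 4.0 ns
F_{MAX} = \frac{1}{4.0\ \mathrm{ns}} = 250\ \mathrm{MHz}
```

Here T_CKO is the clock-to-output delay of the source flip-flop, T_LOG the combinational logic delay, T_ROUTE the routing delay and T_SU the setup time of the destination flip-flop.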
CIC filter
FMAX ...?
FIR filter
FMAX ...?
Implementation of elementary arithmetic
functions in FPGA
DSP in FPGA (implementation)
Integer
19 x LUT (<1%)
Integer
33 x LUT (<1%)
Integer
Integer
Floating Point
DSP in FPGA – dedicated blocks for DSP
1 x MULT18X18 (1 of 20)
maximum clock frequency
240 MHz
Integer
Floating Point
Floating-Point DSP
Resource sharing
Try to share resource-hungry components
(like multipliers) to reduce HW requirements
FIR filter example:
• Symmetric impulse response with 256 samples
• Sampling frequency 16 MS/s
• Clock frequency 128 MHz
• 16 MS/s at 128 MHz clock ➔ 8 clock cycles for each data sample
• Symmetric impulse response ➔ only 128 multiplications needed (255 additions)
• 8 clock cycles for 128 multiplications ➔ at least 16 multipliers are required
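The multiplier count in this example follows from two steps: the number of clock cycles available per sample, and the number of products that must be computed in those cycles. The symbol names below are chosen here for illustration.

```latex
N_{cycles} = \left\lfloor \frac{f_{clk}}{f_{s}} \right\rfloor
           = \left\lfloor \frac{128\ \mathrm{MHz}}{16\ \mathrm{MS/s}} \right\rfloor = 8
\qquad
N_{mult} = \left\lceil \frac{N_{products}}{N_{cycles}} \right\rceil
         = \left\lceil \frac{128}{8} \right\rceil = 16
```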
FIR filter example:
• Symmetric impulse response with 128 samples
• Sampling frequency 122.88 MS/s
• Clock frequency 250 MHz
• 122.88 MS/s at 250 MHz clock ➔ 2 (integer) clock cycles for each data sample
• Symmetric impulse response ➔ only 64 multiplications needed (127 additions)
• 2 clock cycles for 64 multiplications ➔ at least 32 multipliers are required
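In this second example the clock-to-sample ratio is not an integer, so it has to be rounded down before the number of multipliers is derived. As a worked check:

```latex
\left\lfloor \frac{250\ \mathrm{MHz}}{122.88\ \mathrm{MS/s}} \right\rfloor
  = \lfloor 2.03 \rfloor = 2
\qquad
\left\lceil \frac{64}{2} \right\rceil = 32
```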
FIR filter
FPGA versus Digital Signal Processor
Alan Gatherer
CTO Communications Infrastructure Group, Texas Instruments
DSP to FPGA
An example of simple DSP module implementation
and verification
Input / Output Cells
(Tiles, Blocks)
IO cells
Requirements on an IO cell
❑ Support of multiple logic standards (different voltage levels)
❑ Support of differential pairs
❑ Parallel bus synchronization support
❑ Fast data transmission (over 1 Gbps)
❑ Integrated termination resistors
❑ ESD protection, pull-up / pull-down resistors...
IO cell structure – development
❑ Support for both SDR and DDR interfaces (dedicated SDR/DDR Flip-Flops in the IO cell)
❑ Support for many different logic standards (single-ended and differential), programmable output driver (output current, slew rate)
❑ Internal termination (Digitally Controlled Impedance; DCI)
❑ Internal pull-up / pull-down resistor
❑ Integrated ESD protection
Logic standards
Single-ended logic standards
Input and output logic levels.
There are many standards supported by modern FPGAs.
LVDS
FPGA banks
FPGA: IO Banks
FPGA bank
❑ IO pins of an FPGA are grouped into so-called BANKs
❑ Each BANK has its own power supply input
❑ The power supply voltage determines which logic standards a BANK can support (for example, with a 3.3V power supply the bank can use 3.3V LVCMOS or LVTTL logic but not 2.5V LVCMOS logic).
❑ Very useful for interfacing with chips that use different interface voltage standards
Synchronous data interfaces
Tbit = 2 000 ps
1 mm of PCB trace ≈ 7 ps
10 cm of PCB trace ≈ 700 ps
Timing requirements
Tbit = 2 000 ps
tSUmin = 600 ps
tHmin = 330 ps
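These three numbers define how much of the bit period is left for skew between the data lines and the clock. The sketch below assumes an ideal transmitter (no jitter, no duty-cycle distortion), so the whole remaining window is attributed to trace mismatch, and it reuses the figure of roughly 7 ps per mm of PCB trace.

```latex
t_{margin} = T_{bit} - t_{SU,\min} - t_{H,\min}
           = 2000 - 600 - 330 = 1070\ \mathrm{ps}

\Delta l_{\max} \approx \frac{1070\ \mathrm{ps}}{7\ \mathrm{ps/mm}} \approx 153\ \mathrm{mm}
```

In a real design a large part of this window is consumed by clock jitter and by the output timing spread of the transmitter, so the usable length mismatch is considerably smaller.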
Each data line must have the same propagation delay so that the data are valid at the receiver at the same moment ➔ the electrical lengths of all data paths must be matched.
Meanders are used to lengthen the PCB traces so that each one matches the length of the longest trace.
FPGA interface clock input
ADC FPGA
DDR interface
Native support of DDR functionality directly in IO cells (DDR cannot be implemented without such components!)
Spartan-3
7-series
Rule of thumb:
Treat a wire as a transmission line whenever its propagation delay is more than 1/6 of the edge time of the transmitted signal.
Using a transmission line means using a wire with a defined characteristic impedance and terminating it with this characteristic impedance on both sides.
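A commonly quoted form of this rule compares the one-way propagation delay of the wire with one sixth of the signal edge time. As a rough worked example, assume an edge time of 1 ns (a hypothetical value) and a PCB delay of about 7 ps per mm:

```latex
t_{pd} > \frac{t_{edge}}{6}
\quad\Rightarrow\quad
l_{crit} \approx \frac{t_{edge}}{6 \cdot 7\ \mathrm{ps/mm}}
         = \frac{1000\ \mathrm{ps}}{42\ \mathrm{ps/mm}} \approx 24\ \mathrm{mm}
```

Under these assumptions, traces longer than roughly 24 mm would be treated as transmission lines.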
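❑ Stripline: $Z_0\,(\Omega) = \dfrac{60}{\sqrt{\varepsilon_r}} \ln \dfrac{1.9\,(2H + T)}{0.8\,W + T}$
(H: dielectric height between the trace and each reference plane, W: trace width, T: trace thickness, εr: relative permittivity of the dielectric)
❑ Asymmetric Stripline
❑ Differential Stripline
❑ Differential Microstrip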
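As a rough numerical check of the stripline approximation, the evaluation below uses hypothetical dimensions that do not come from the slides: H = 0.2 mm, W = 0.15 mm, T = 0.035 mm and εr = 4.3.

```latex
Z_0 = \frac{60}{\sqrt{4.3}}\,
      \ln\frac{1.9\,(2 \cdot 0.2 + 0.035)}{0.8 \cdot 0.15 + 0.035}
    = 28.9 \cdot \ln\frac{0.8265}{0.155}
    \approx 28.9 \cdot 1.674 \approx 48\ \Omega
```

Closed-form expressions of this kind are approximations; for a controlled-impedance design the result is normally confirmed with a field solver or with the PCB manufacturer's stack-up data.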
Trace impedance
Test patterns
Er ~ 6
Er ~ 3
Signal Integrity (SI)
❑ Measurement
• S-parameters – vector network analyzer
• Time Domain Reflectometry – dedicated measurement instrument
• Eye diagram – oscilloscopes (either real-time or sampling)
Simulation - CST
S-parameters
What's next?
Today we are able to communicate at about 50 Gb/s over short distances (a few centimeters) using differential pairs on FR4-based Cu-plated PCBs.
Optical waveguides on a PCB can significantly increase bandwidth and enable longer paths.
The main problem is coupling the optical signal into and out of the waveguide.
[Figure: synchronous and asynchronous links; channels CH0 and CH1, each with Tx and Rx]
High speed transceivers
Logic standard
Current Mode Logic (CML)
Source-Coupled Logic (SCL)
8B/10B encoding
11110000 10010110
PCB attenuation
Skin effect and proximity effect: the higher the signal frequency, the higher the wire resistance (for a 10 GHz signal a copper trace has a resistance of about 1 Ω per inch).
Dielectric attenuation: FR4 has a large dissipation factor (0.02-0.03). For demanding applications a high-quality dielectric can be used (dissipation factor 0.001 or lower).
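Skin depth: $\delta = \sqrt{\dfrac{2\rho}{2\pi f \mu}}$ (ρ: resistivity of the conductor, f: signal frequency, μ: permeability of the conductor)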
1 oz = 35 µm
½ oz = 18 µm
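To put the copper thicknesses into perspective, the skin depth at 10 GHz can be estimated with the standard formula for a good conductor. The material constants below are textbook values for copper and are not taken from the slides (ρ ≈ 1.68·10⁻⁸ Ω·m, μ ≈ μ0 = 4π·10⁻⁷ H/m).

```latex
\delta = \sqrt{\frac{\rho}{\pi f \mu}}
       = \sqrt{\frac{1.68 \cdot 10^{-8}}{\pi \cdot 10^{10} \cdot 4\pi \cdot 10^{-7}}}
       \approx 0.65\ \mathrm{\mu m}
```

At this frequency the current therefore flows in a layer far thinner than even the 18 µm foil, so a thicker copper layer does little to reduce the high-frequency resistance.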
Equalization
❑ Signal attenuation: degradation of signal quality, namely of the edge slope. The quality of the signal is measured at the receiver using the eye diagram; a wide-open eye (both vertically and horizontally) is required for reliable data transmission.
Eye diagram
❑ Eye opening is directly related to the bit error rate (BER)
❑ Even a relatively low error rate of 10^-12 can be unacceptable for very high speed data transmissions
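The reason becomes clear once the error rate is multiplied by the line rate. The estimate below assumes a line rate of 30 Gbps, the NRZ figure quoted for current transceivers, and statistically independent bit errors.

```latex
\text{errors per second} = \mathrm{BER} \cdot R
  = 10^{-12} \cdot 30 \cdot 10^{9}\ \mathrm{s^{-1}} = 0.03\ \mathrm{s^{-1}}
\quad\Rightarrow\quad
\text{one error about every } 33\ \mathrm{s}
```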
What's next?
Current transceivers are capable of 30 Gbps per differential pair with NRZ encoding, or 56 Gbps with PAM-4 encoding. This is close to the physical limits of PCB traces ➔ alternative solutions (such as optical links) need to be found.