USERGUIDE_FPGA
Extended Spartan-3A,
Spartan-3E, and Spartan-3
FPGA Families
Xilinx is disclosing this user guide, manual, release note, and/or specification (the “Documentation”) to you solely for use in the development
of designs to operate with Xilinx hardware devices. You may not reproduce, distribute, republish, download, display, post, or transmit the
Documentation in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise,
without the prior written consent of Xilinx. Xilinx expressly disclaims any liability arising out of your use of the Documentation. Xilinx reserves
the right, at its sole discretion, to change the Documentation without notice at any time. Xilinx assumes no obligation to correct any errors
contained in the Documentation, or to advise you of any corrections or updates. Xilinx expressly disclaims any liability in connection with
technical support or assistance that may be provided to you in connection with the Information.
THE DOCUMENTATION IS DISCLOSED TO YOU “AS-IS” WITH NO WARRANTY OF ANY KIND. XILINX MAKES NO OTHER
WARRANTIES, WHETHER EXPRESS, IMPLIED, OR STATUTORY, REGARDING THE DOCUMENTATION, INCLUDING ANY
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT OF THIRD-PARTY
RIGHTS. IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL
DAMAGES, INCLUDING ANY LOSS OF DATA OR LOST PROFITS, ARISING FROM YOUR USE OF THE DOCUMENTATION.
© 2006–2010 Xilinx, Inc. XILINX, the Xilinx logo, Virtex, Spartan, ISE, and other designated brands included herein are trademarks of Xilinx
in the United States and other countries. All other trademarks are the property of their respective owners.
Revision History
The following table shows the revision history for this document.
Spartan-3 Generation FPGA User Guide www.xilinx.com UG331 (v1.7) August 19, 2010
Date Version Revision
12/03/09 1.6 Updated “Extended Spartan-3A Family Features,” page 32. Updated “I/O Capabilities,”
page 38. Updated “Global Clock Resources” and Figure 2-1 on page 46 and “Additional
Information,” page 62 to note that local clocking is not recommended. Added Figure 2-6
and described CLK0 to CLK1 switchover in “BUFGMUX Multiplexing Details,” page
54. Updated “Using Clock Buffers/Multiplexers in a Design,” page 55 and added
Figure 2-7. Updated “XST Synthesis of Clock Buffers,” page 56. Added Table 2-8 with
clock quadrant locations. Clarified “Digital Frequency Synthesizer (DFS)” and “Phase
Shift (PS),” page 70. Updated “Output Availability Depends on DLL Frequency Mode,”
page 118. Updated “Fine Phase Shifting,” page 119. Revised Figure 4-1 to show parity
integers on the data paths. Updated “Carry and Floorplanning,” page 303. Added
“Clamp Diodes,” page 336. Removed references to older software versions in
“Specifying an I/O Standard with the IOSTANDARD Attribute,” page 343,
“LVCMOS/LVTTL Slew Rate Control and Drive Strength,” page 345, and “Differential
I/O Standards,” page 351. Clarified BSDL termination in “BLVDS Output Termination,”
page 353. Removed references to older software versions in “IOBs Organized into
Banks,” page 355. Corrected VCC value for MINI_LVDS_33/Input with DIFF_TERM
in Table 10-20. Revised VREF note in Table 10-20, Table 10-21, and Table 10-22 to state that
VREF is not used for the differential I/O standards described in the table. Removed
references to older software versions in “Floorplanning,” page 413 and “Constraints
Editor,” page 423. Updated “Differences in Packages Between Spartan-3 Generation
Families,” page 467. Added footnote in Table 17-1 to indicate that the CP132 and CPG132
packages are being discontinued. Added recommended power-down sequence to “Hot
Swap,” page 491. Updated “Application State Retained during Suspend Mode,” page
507. Updated “Extended Spartan-3A Family FPGA: Turn Off VCCO,” page 516. Updated
the FG900/FGG900 package drawing in Figure 17-12.
08/19/10 1.7 Updated values for FG400 in upper half of Table 3-18. Updated note under “Address
Input”. Added “Timing Parameters” section. Added “Note relevant to Figure 10-1”.
Added last sentence to paragraph immediately following Figure 10-15. Added last
sentence to second paragraph under “ODDR2”. Added “to VCCO” to last sentence under
“ESD Protection”. Added “Parasitic Leakage” section. Added last paragraph to “Supply
Sequencing”.
Table of Contents
Revision History
Chapter 1: Overview
Introduction
Spartan-3 Generation Families
Extended Spartan-3A Family Features
Spartan-3AN Platform Additional Features
Spartan-3A DSP Platform Additional Features
Spartan-3 Generation Resources
Architectural Overview
Configuration
I/O Capabilities
Package Marking
Ordering Information
IBUF
IBUFG
IBUFDS
OBUF
OBUFT
IOBUF
DDR and Adjustable Delay I/O Components
HDL Entry
Architectural Details
Input Delay Functions
Programmable Delay
Dynamic Delay in the Extended Spartan-3A Family
Storage Element Functions
Double-Data-Rate Transmission
Register Cascade Feature
IDDR2
ODDR2
Pull-Up and Pull-Down Resistors
FPGA Pull-Up Resistor Values
Keeper Circuit
JTAG Boundary-Scan Capability
SelectIO Signal Standards
Overview of I/O Standards
Clamp Diodes
LVTTL — Low-Voltage TTL
LVCMOS — Low-Voltage CMOS
PCI — Peripheral Component Interface
GTL — Gunning Transceiver Logic Terminated
GTL+ — Gunning Transceiver Logic Plus
HSTL — High-Speed Transceiver Logic
SSTL3 — Stub Series Terminated Logic for 3.3V
SSTL2 — Stub Series Terminated Logic for 2.5V
SSTL18 — Stub Series Terminated Logic for 1.8V
LVDS — Low Voltage Differential Signal
BLVDS — Bus LVDS
LVPECL — Low Voltage Positive Emitter Coupled Logic
LDT — HyperTransport (formerly known as Lightning Data Transport)
mini-LVDS
LVDS Extended — Extended Mode LVDS
RSDS — Reduced Swing Differential Signaling
TMDS — Transition Minimized Differential Signaling
PPDS — Point-to-Point Differential Signaling
I/O Standard Differences between Spartan-3 Generation Families
Specifying an I/O Standard with the IOSTANDARD Attribute
Timing Analysis
LVCMOS/LVTTL Slew Rate Control and Drive Strength
Simultaneously Switching Outputs
HSTL/SSTL VREF Reference Voltage
Single-Ended I/O Termination Techniques
Differential I/O Standards
On-Chip Differential Termination
DCI Digitally Controlled Impedance
Supply Voltages for the IOBs
Preface
Guide Contents
This user guide contains the following chapters:
• “Section I: Designing with Spartan-3 Generation FPGAs”
• Chapter 1, “Overview”
• Chapter 2, “Using Global Clock Resources”
• Chapter 3, “Using Digital Clock Managers (DCMs)”
• Chapter 4, “Using Block RAM”
• Chapter 5, “Using Configurable Logic Blocks (CLBs)”
• Chapter 6, “Using Look-Up Tables as Distributed RAM”
• Chapter 7, “Using Look-Up Tables as Shift Registers (SRL16)”
• Chapter 8, “Using Dedicated Multiplexers”
Additional Resources
To find additional documentation, see the Xilinx website at:
http://www.xilinx.com/support/documentation/index.htm.
To search the Answer Database of silicon, software, and IP questions and answers, or to
create a technical support WebCase, see the Xilinx website at:
http://www.xilinx.com/support.
Conventions
This document uses the following conventions. An example illustrates each convention.
Typographical
The following typographical conventions are used in this document:
Online Document
The following conventions are used in this document:
Chapter 1
Overview
This chapter provides an overview of the Spartan®-3 generation platforms. Refer to the
links in Table 1-1 for more information.
Introduction
The Spartan-3 generation of FPGAs includes the Extended Spartan-3A family
(Spartan-3A, Spartan-3AN, and Spartan-3A DSP platforms), along with the earlier
Spartan-3 and Spartan-3E families. These families of Field Programmable Gate Arrays
(FPGAs) are specifically designed to meet the needs of high-volume, cost-sensitive
electronic applications, such as consumer products. The Spartan-3 generation includes 25
devices offering densities ranging from 50,000 to 5 million system gates, as shown in
Table 1-5 through Table 1-7.
The Spartan-3 platform was the industry’s first 90 nm FPGA, delivering more functionality
and bandwidth per dollar than was previously possible, setting new standards in the
programmable logic industry. The Spartan-3E platform builds on the success of the earlier
Spartan-3 platform by adding new features that improve system performance and reduce
the cost of configuration. The Extended Spartan-3A family builds on the success of the
earlier Spartan-3E platform by further enhancing configuration and reducing power to
provide the lowest total cost. The Spartan-3AN platform provides the additional benefits
of non-volatility and large amounts of on-board user flash. The Spartan-3A DSP platform
extends the density range and adds resources often required in digital signal processing
(DSP) applications.
Because of their exceptionally low cost, Spartan-3 generation FPGAs are ideally suited to a
wide range of consumer electronics applications, including broadband access, home
networking, display/projection, and digital television equipment.
The Spartan-3 generation FPGAs provide a superior alternative to mask-programmed
ASICs. FPGAs avoid the high initial cost, the lengthy development cycles, and the inherent
inflexibility of conventional ASICs. Also, FPGA programmability permits design upgrades
in the field with no hardware replacement necessary, an impossibility with ASICs.
Chapter 1: Overview
Notes:
1. + = supported, ++ = better, +++ = best.
XC3SD3400A 3400K 53,712 104 58 5,968 23,872 373K 2,268K 126 8 469 213
XC3S1400AN 1400K 25,344 2816 11,264 176K 576K 32 8 502 227 16M
XC3S5000 5000K 74,880 104 80 8,320 33,280 520K 1,872K 104 4 633 300
Architectural Overview
The Spartan-3 generation architecture consists of five fundamental programmable
functional elements:
• Configurable Logic Blocks (CLBs) contain flexible Look-Up Tables (LUTs) that
implement logic plus storage elements used as flip-flops or latches. CLBs perform a
wide variety of logical functions as well as store data.
• Input/Output Blocks (IOBs) control the flow of data between the I/O pins and the
internal logic of the device. IOBs support bidirectional data flow plus 3-state
operation, as well as a variety of signal standards, including several high-performance
differential standards. Double Data-Rate (DDR) registers are included.
• Block RAM provides data storage in the form of 18-Kbit dual-port blocks.
• Multiplier Blocks accept two 18-bit binary numbers as inputs and calculate the
product. The Spartan-3A DSP platform includes special DSP multiply-accumulate
blocks.
• Digital Clock Manager (DCM) Blocks provide self-calibrating, fully digital solutions
for distributing, delaying, multiplying, dividing, and phase-shifting clock signals.
These elements are organized as shown in Figure 1-1, using the Spartan-3A FPGA array as
an example. A dual ring of staggered IOBs surrounds a regular array of CLBs in the
Spartan-3 and Extended Spartan-3A family. The Spartan-3E family has a single ring of
inline IOBs. Each block RAM column consists of several 18-Kbit RAM blocks. Each block
RAM is associated with a dedicated multiplier. The DCMs are positioned with two at the
top and two at the bottom of the device, plus additional DCMs on the sides for the larger
devices.
The Spartan-3 generation features a rich network of traces that interconnect all five
functional elements, transmitting signals among them. Each functional element has an
associated switch matrix that permits multiple connections to the routing.
Figure 1-1: Spartan-3A FPGA array (diagram not reproduced; labeled elements: IOBs, CLBs, Block RAM / Multiplier, DCM). Drawing reference DS312-1_01_032606.
Configuration
Spartan-3 generation FPGAs are programmed by loading configuration data into robust,
reprogrammable, static CMOS configuration latches (CCLs) that collectively control all
functional elements and routing resources. The FPGA’s configuration data is stored
externally in a PROM or some other non-volatile medium, either on or off the board. The
Spartan-3AN platform contains its own internal SPI flash configuration memory. After
applying power, the configuration data is written to the FPGA using one of several
different modes:
• Master Serial from a Xilinx Platform Flash PROM
• Serial Peripheral Interface (SPI) from an industry-standard SPI serial Flash
(Spartan-3E and Extended Spartan-3A family FPGAs only)
• Byte Peripheral Interface (BPI) from an industry-standard x8 or x8/x16 parallel NOR
Flash (Spartan-3E and Extended Spartan-3A family FPGAs only)
• Slave Serial, typically downloaded from a processor
• Slave Parallel, typically downloaded from a processor
• Boundary Scan (JTAG), typically downloaded from a processor or system tester
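The mode list above can be captured as a small lookup table. The following Python sketch is illustrative only: the family labels and the function name `modes_for` are hypothetical, and the sketch assumes that any mode not flagged as "Spartan-3E and Extended Spartan-3A family FPGAs only" is available on every family. It covers only the modes named in the list.

```python
# Configuration modes taken from the list above.
# Assumption: modes without a family restriction apply to all families.
ALL_FAMILIES = {"Spartan-3", "Spartan-3E", "Extended Spartan-3A"}
RESTRICTED = {"Spartan-3E", "Extended Spartan-3A"}

CONFIG_MODES = {
    "Master Serial": ALL_FAMILIES,
    "SPI": RESTRICTED,   # flagged as Spartan-3E / Extended Spartan-3A only
    "BPI": RESTRICTED,   # flagged as Spartan-3E / Extended Spartan-3A only
    "Slave Serial": ALL_FAMILIES,
    "Slave Parallel": ALL_FAMILIES,
    "JTAG": ALL_FAMILIES,
}


def modes_for(family):
    """Return the listed configuration modes available for a family label."""
    if family not in ALL_FAMILIES:
        raise ValueError(f"unknown family label: {family!r}")
    return [mode for mode, families in CONFIG_MODES.items() if family in families]


if __name__ == "__main__":
    for fam in sorted(ALL_FAMILIES):
        print(fam, "->", ", ".join(modes_for(fam)))
```

A lookup like this is mainly useful in board-level scripts that must reject an SPI or BPI configuration request for a Spartan-3 family device.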
I/O Capabilities
The Spartan-3 generation SelectIO interface supports many popular single-ended and
differential standards, as shown in Table 1-8 and Table 1-9. Table 1-10 through Table 1-14
show the number of user I/Os as well as the number of differential I/O pairs available for
each device/package combination for the Extended Spartan-3A family, Spartan-3E, and
Spartan-3 families, respectively. Some of the user I/Os are unidirectional input-only pins
as indicated in the tables. See “LVCMOS/LVTTL Slew Rate Control and Drive Strength”
for specific drive strengths supported.
Table 1-10: Spartan-3A DSP FPGA Available User I/Os and Differential (Diff) I/O Pairs
CS484 FG676
Device CSG484 FGG676
Notes:
1. The number in bold indicates the maximum number of I/O and input-only pins. The number in italics
indicates the number of input-only pins. The differential (Diff) input-only pin count includes both
differential pairs on input-only pins and differential pairs on I/O pins within I/O banks that are
restricted to differential inputs.
Table 1-11: Spartan-3AN Available User I/Os and Differential (Diff) I/O Pairs
              TQG144              FTG256              FGG400              FGG484              FGG676
Device        User      Diff      User      Diff      User      Diff      User      Diff      User      Diff
XC3S50AN      108 (7)   50 (24)   -         -         -         -         -         -         -         -
XC3S200AN     -         -         195 (35)  90 (50)   -         -         -         -         -         -
XC3S400AN     -         -         -         -         311 (63)  142 (78)  -         -         -         -
XC3S700AN     -         -         -         -         -         -         372 (84)  165 (93)  -         -
XC3S1400AN    -         -         -         -         -         -         -         -         502 (94)  227 (131)
Notes:
1. The number in bold indicates the maximum number of I/O and input-only pins. The number in italics indicates the number of
input-only pins. The differential (Diff) input-only pin count includes both differential pairs on input-only pins and differential pairs
on I/O pins within I/O banks that are restricted to differential inputs.
2. Spartan-3AN FPGAs are available in Pb-free packaging options. The Pb-free packages include a ‘G’ character in the ordering code.
Leaded (non-Pb-free) packages might be available for selected devices, with the same pinout and without the ‘G’ in the ordering
code. Contact Xilinx sales for more information.
Table 1-12: Spartan-3A FPGA Available User I/Os and Differential (Diff) I/O Pairs
              VQ100               TQ144               FT256               FG320               FG400               FG484               FG676
Device        VQG100              TQG144              FTG256              FGG320              FGG400              FGG484              FGG676
              User      Diff      User      Diff      User      Diff      User      Diff      User      Diff      User      Diff      User      Diff
XC3S50A       68 (13)   60 (24)   108 (7)   50 (24)   144 (32)  64 (32)   -         -         -         -         -         -         -         -
Notes:
1. The number in bold indicates the maximum number of I/O and input-only pins. The number in italics indicates the number of
input-only pins. The differential (Diff) input-only pin count includes both differential pairs on input-only pins and differential pairs
on I/O pins within I/O banks that are restricted to differential inputs.
Table 1-13: Spartan-3E FPGA Available User I/Os and Differential (Diff) I/O Pairs
              VQ100               CP132               TQ144               PQ208               FT256               FG320               FG400               FG484
Device        VQG100              CPG132              TQG144              PQG208              FTG256              FGG320              FGG400              FGG484
              User      Diff      User      Diff      User      Diff      User      Diff      User      Diff      User      Diff      User      Diff      User      Diff
XC3S100E      66 (7)    30 (2)    83 (11)   35 (2)    108 (28)  40 (4)    -         -         -         -         -         -         -         -         -         -
XC3S250E      66 (7)    30 (2)    92 (7)    41 (2)    108 (28)  40 (4)    158 (32)  65 (5)    172 (40)  68 (8)    -         -         -         -         -         -
Notes:
1. The number in bold indicates the maximum number of I/O and input-only pins. The number in italics indicates the number of
input-only pins.
Table 1-14: Spartan-3 FPGA Available User I/Os and Differential (Diff) I/O Pairs
VQ100 TQ144 PQ208 FT256 FG320 FG456 FG676 FG900
Device VQG100 TQG144 PQG208 FTG256 FGG320 FGG456 FGG676 FGG900
User Diff User Diff User Diff User Diff User Diff User Diff User Diff User Diff
XC3S50 63 29 97 46 124 56 - - - - - - - - - -
XC3S200 63 29 97 46 141 62 173 76 - - - - - - - -
XC3S400 - - 97 46 141 62 173 76 221 100 264 116 - - - -
XC3S1000 - - - - - - 173 76 221 100 333 149 391 175 - -
XC3S1500 - - - - - - - - 221 100 333 149 487 221 - -
XC3S2000 - - - - - - - - - - 333 149 489 221 565 270
XC3S4000 - - - - - - - - - - - - 489 221 633 300
XC3S5000 - - - - - - - - - - - - 489 221 633 300
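When choosing a device/package combination, Table 1-14 is typically searched for the entries that meet a required pin count. The Python sketch below transcribes the Table 1-14 rows above and filters them; the names `TABLE_1_14`, `PACKAGES`, and `options` are hypothetical, and `None` stands for a "-" entry. It is a sketch of the lookup, not a substitute for the data sheet.

```python
# Package columns of Table 1-14 (Pb-free names omitted for brevity).
PACKAGES = ["VQ100", "TQ144", "PQ208", "FT256", "FG320", "FG456", "FG676", "FG900"]

# (user I/Os, differential pairs) per package; None where the table shows "-".
TABLE_1_14 = {
    "XC3S50":   [(63, 29), (97, 46), (124, 56), None, None, None, None, None],
    "XC3S200":  [(63, 29), (97, 46), (141, 62), (173, 76), None, None, None, None],
    "XC3S400":  [None, (97, 46), (141, 62), (173, 76), (221, 100), (264, 116), None, None],
    "XC3S1000": [None, None, None, (173, 76), (221, 100), (333, 149), (391, 175), None],
    "XC3S1500": [None, None, None, None, (221, 100), (333, 149), (487, 221), None],
    "XC3S2000": [None, None, None, None, None, (333, 149), (489, 221), (565, 270)],
    "XC3S4000": [None, None, None, None, None, None, (489, 221), (633, 300)],
    "XC3S5000": [None, None, None, None, None, None, (489, 221), (633, 300)],
}


def options(min_user, min_diff=0):
    """List (device, package, user, diff) entries meeting both minimums, in table order."""
    hits = []
    for device, row in TABLE_1_14.items():
        for package, cell in zip(PACKAGES, row):
            if cell is not None and cell[0] >= min_user and cell[1] >= min_diff:
                hits.append((device, package, cell[0], cell[1]))
    return hits


if __name__ == "__main__":
    for hit in options(min_user=300, min_diff=150):
        print(hit)
```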
Package Marking
Figure 1-2 provides a top marking example for a Spartan-3A FPGA in the quad-flat
packages. Figure 1-3 shows the top marking for a Spartan-3A FPGA in a BGA package. The
markings for the BGA packages are nearly identical to those for the quad-flat packages,
except that the marking is rotated with respect to the ball A1 indicator.
On Spartan-3E and Extended Spartan-3A family FPGAs, the “5C” and “4I” part
combinations can be dual marked as “5C/4I”.
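Figure 1-2: Top marking example for a Spartan-3A FPGA in a quad-flat package (diagram not reproduced; labels include the fabrication code and the Pin P1 indicator). Drawing reference DS529-1_03_080406.
Figure 1-3: Top marking example for a Spartan-3A FPGA in a BGA package (diagram not reproduced; the example shows an XC3S50A in the FT256 package marked 4C, with labels for device type, package, speed grade, temperature range, fabrication code, process code, date code, and lot code). Drawing reference DS529-1_02_021206.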
Ordering Information
Spartan-3 generation FPGAs are available in both standard and Pb-free packaging options
for most device/package combinations. The Pb-free packages include a ‘G’ character in the
ordering code. Automotive device part numbers begin with XA instead of XC, and the
automotive temperature ranges include both the I (Industrial) range and the Q (Automotive)
range, between -40°C and +125°C.
Figure 1-4 shows an example of the part ordering code. The Industrial Temperature Range
is available exclusively for the Standard (-4) Speed Grade. See Table 1-10 through
Table 1-14 for specific part/package combinations, and see the XA data sheets for specific
automotive ordering codes available.
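The ordering-code rules described above (the XC or XA prefix, the 'G' that marks a Pb-free package, the speed grade, and the temperature range letter) can be checked mechanically. The Python sketch below is a hedged illustration: because Figure 1-4 is not reproduced here, the overall field layout `<device>-<speed><package><temperature>`, the accepted speed grades (4 and 5), the temperature letters (C, I, Q), and the sample part numbers are assumptions. The function name `parse_ordering_code` is hypothetical.

```python
import re

# Assumed layout: XC3S50A-4FTG256C
#   prefix XC (standard) or XA (automotive), device, "-", speed grade,
#   two-letter package type, optional "G" for Pb-free, pin count, temperature.
ORDER_RE = re.compile(
    r"^(?P<prefix>XC|XA)(?P<device>3S[0-9A-Z]+)"
    r"-(?P<speed>[45])"
    r"(?P<pkg>[A-Z]{2})(?P<pbfree>G?)(?P<pins>\d+)"
    r"(?P<temp>[CIQ])$"
)


def parse_ordering_code(code):
    """Split an ordering code into its fields under the assumed layout."""
    m = ORDER_RE.match(code.strip().upper())
    if m is None:
        raise ValueError(f"not a recognized ordering code: {code!r}")
    return {
        "device": m.group("prefix") + m.group("device"),
        "automotive": m.group("prefix") == "XA",   # XA instead of XC
        "speed_grade": "-" + m.group("speed"),
        "package": m.group("pkg") + m.group("pbfree") + m.group("pins"),
        "pb_free": m.group("pbfree") == "G",       # 'G' marks Pb-free packages
        "temperature": m.group("temp"),
    }


if __name__ == "__main__":
    print(parse_ordering_code("XC3S50A-4FTG256C"))
```

Whether a parsed combination can actually be ordered still has to be confirmed against Table 1-10 through Table 1-14 and the XA data sheets.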
Chapter 2
Using Global Clock Resources
Introduction
Each Spartan-3 generation FPGA offers eight high-speed, low-skew global clock resources
to optimize performance. These resources are used automatically by the Xilinx tools. Even
if the clock rate is relatively slow, it is still important to use the global routing resources to
eliminate any potential for timing hazards. It is important to understand how to define and
best take advantage of these resources.
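Figure 2-1: Clock path overview (diagram not reproduced; labeled elements: GCLK pad, DCM, BUFGMUX, global routing, clocks, double lines). Drawing reference UG331_c2_01_100209.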
The primary clock path is shown with bold lines, with a dedicated clock pad (GCLK)
driving a global clock buffer (BUFGMUX) that connects through global routing resources
to clock inputs on flip-flops and other clocked elements. The GCLK pads can be used as
general-purpose I/O, and include the LHCLK and RHCLK inputs described later. A DCM
can be inserted into the path between the clock pad and clock buffer to manipulate the
clock, or the DCM can acquire the clock signal from general-purpose resources. The
BUFGMUX can multiplex between two clock sources or be used as a simple BUFG clock
buffer. The clock buffer can drive only the clock routing resources, which in turn can drive only
clock inputs. Clock inputs on flip-flops can also be driven from general-purpose
routing, although this is not recommended because of the higher skew.
Clocking Infrastructure
The detailed Spartan-3E and Extended Spartan-3A family clocking infrastructure is shown
in Figure 2-2.
(Figure 2-2 diagram not reproduced. Its labels include the Top, Bottom, Left, Right, and Horizontal Spines; the Top Left (TL), Top Right (TR), Bottom Left (BL), and Bottom Right (BR) quadrants; the DCMs; the GCLK, LHCLK, and RHCLK clock inputs; coordinate labels such as X0Y2 and X3Y9; and letter labels A through H.)
Notes:
1. The diagram presents electrical connectivity. The diagram locations do not necessarily match the physical location on the device,
although the coordinate locations shown are correct.
2. Number of DCMs and locations of these DCM varies for different device densities. See Table 2-1.
3. See Figure 2-13a, which shows how the eight clock lines are multiplexed on the left-hand side of the device.
4. See Figure 2-13b, which shows how the eight clock lines are multiplexed on the right-hand side of the device.
5. For best direct clock inputs to a particular clock buffer, not a DCM, see Table 2-7.
6. For best direct clock inputs to a particular DCM, not a BUFGMUX, see Chapter 3, “Using Digital Clock Managers (DCMs).”
Figure 2-2: Spartan-3E and Extended Spartan-3A Family Internal Quadrant-Based Clock Structure
Table 2-1: Spartan-3E and Extended Spartan-3A Family DCM Location Designations
Device | Top, Left | Top, Right | Right, Bottom | Right, Top | Bottom, Right | Bottom, Left | Left, Top | Left, Bottom
Spartan-3A DSP FPGAs
XC3SD1800A X1Y3 X2Y3 X3Y1 X3Y2 X2Y0 X1Y0 X0Y2 X0Y1
XC3SD3400A X1Y3 X2Y3 X3Y1 X3Y2 X2Y0 X1Y0 X0Y2 X0Y1
Spartan-3A/3AN FPGAs
XC3S50A/AN X0Y0 X1Y0 N/A N/A N/A N/A N/A N/A
XC3S200A/AN X0Y1 X1Y1 N/A N/A X1Y0 X0Y0 N/A N/A
XC3S400A/AN X0Y1 X1Y1 N/A N/A X1Y0 X0Y0 N/A N/A
XC3S700A/AN X1Y3 X2Y3 X3Y1 X3Y2 X2Y0 X1Y0 X0Y2 X0Y1
XC3S1400A/AN X1Y3 X2Y3 X3Y1 X3Y2 X2Y0 X1Y0 X0Y2 X0Y1
Spartan-3E FPGAs
XC3S100E N/A X0Y1 N/A N/A X0Y0 N/A N/A N/A
XC3S250E X0Y1 X1Y1 N/A N/A X1Y0 X0Y0 N/A N/A
XC3S500E X0Y1 X1Y1 N/A N/A X1Y0 X0Y0 N/A N/A
XC3S1200E X1Y3 X2Y3 X3Y1 X3Y2 X2Y0 X1Y0 X0Y2 X0Y1
XC3S1600E X1Y3 X2Y3 X3Y1 X3Y2 X2Y0 X1Y0 X0Y2 X0Y1
Clock Inputs
Clock pins accept external clock signals and connect directly to DCMs and BUFGMUX
elements. Clock pins can also be used as general-purpose I/Os. Each Spartan-3E and
Extended Spartan-3A family FPGA has:
• 16 Global Clock inputs (GCLK0 through GCLK15) located along the top and bottom
edges of the FPGA
• 8 Right-Half Clock inputs (RHCLK0 through RHCLK7) located along the right edge
• 8 Left-Half Clock inputs (LHCLK0 through LHCLK7) located along the left edge
Clock input pins are used automatically when external signals drive clock buffers. The
user can specify a particular pin using a LOC constraint in order to force a clock onto the
left or right regional clocks, or to force a clock into a particular clock buffer and then into a
desired clock routing resource. Table 2-2, page 49 through Table 2-4, page 51 show the
clock inputs for each package with the Extended Spartan-3A family, Spartan-3E, and
Spartan-3 families, respectively.
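As an illustration of forcing a clock onto a particular input with a LOC constraint, the Python sketch below looks up a pin for a clock pad and emits a UCF-style constraint line. The pad-to-pin entries are a small subset copied from Table 2-2 (Extended Spartan-3A family, VQ100 and FT256 packages only); the net name `clk_in`, the dictionary `CLOCK_PADS`, and the function `loc_constraint` are hypothetical. The emitted `NET ... LOC = ...;` form follows common UCF usage and should be checked against the constraints documentation for the tool version in use.

```python
# Subset of Table 2-2: clock pad -> {package: pin}.
CLOCK_PADS = {
    "GCLK0": {"VQ100": "P43", "FT256": "N9"},
    "GCLK4": {"VQ100": "P83", "FT256": "C10"},
    "LHCLK0": {"VQ100": "P9", "FT256": "G2"},
    "RHCLK0": {"VQ100": "P59", "FT256": "K15"},
}


def loc_constraint(net, pad, package):
    """Return a UCF-style LOC line that places `net` on the pin for `pad`."""
    try:
        pin = CLOCK_PADS[pad][package]
    except KeyError:
        raise ValueError(f"no entry for pad {pad!r} in package {package!r}") from None
    return f'NET "{net}" LOC = "{pin}";'


if __name__ == "__main__":
    # Place a hypothetical net named clk_in on the left-half clock input LHCLK0.
    print(loc_constraint("clk_in", "LHCLK0", "FT256"))
```

Choosing an LHCLK or RHCLK pad in this way is what steers the clock toward the left or right half of the device rather than a global buffer.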
Table 2-2: Global Clock Input Pads for Extended Spartan-3A Family FPGAs
Pad Bank VQ100 TQ144 FT256 FG320 FG400 CS484 FG484 FG676
GCLK0 2 P43 P57 N9 U10 Y11 U12 AA12 Y14
GCLK1 2 P44 P59 P9 T10 V11 V12 AB12 AA14
GCLK2 2 N/A P58 R9 V11 U11 AB13 V12 AF14
GCLK3 2 N/A P60 T9 U11 V12 AA14 U12 AE14
GCLK4 0 P83 P124 C10 B10 D11 E12 C12 K14
GCLK5 0 P84 P126 D9 C9 E11 F11 E12 J14
GCLK6 0 P85 P125 C9 A10 A10 A9 A12 B14
GCLK7 0 P86 P127 A9 B9 C10 B9 A11 A14
GCLK8 0 P88 P129 C8 A8 D10 F10 B11 F13
GCLK9 0 P89 P131 D8 B7 E10 E11 C11 G13
GCLK10 0 N/A P130 A8 B8 A9 A8 D11 B13
GCLK11 0 P90 P132 B8 C8 A8 B8 E11 C13
GCLK12 2 N/A N/A R7(1) U8 W9 Y11 U11 AA13
GCLK13 2 N/A N/A T7(1) V8 Y9 Y10 V11 Y13
GCLK14 2 P40 P54 P8 U9 V10 AA12 W12 AF13
GCLK15 2 P41 P55 T8 V9 W10 AB12 Y12 AE13
LHCLK0 3 P9 P12 G2 H3 J1 L6 L5 N6
LHCLK1 3 P10 P13 H1 J3 K2 M5 L3 N7
LHCLK2 3 P12 P15 H3 J2 K3 K1 K1 P1
LHCLK3 3 P13 P16 J3 J1 L3 L1 L1 P2
LHCLK4 3 N/A P18 J2 J4 K4 L3 M1 P4
LHCLK5 3 N/A P20 J1 K5 L5 M2 M2 P3
LHCLK6 3 P15 P19 K3 K2 L1 M6 M3 N9
LHCLK7 3 P16 P21 K1 K3 M1 N7 M4 P10
RHCLK0 1 P59 P83 K15 L18 M19 N18 M22 P21
RHCLK1 1 P60 P85 K14 K17 M20 M17 L22 P20
RHCLK2 1 P61 P87 K16 K18 L19 N21 L21 P26
RHCLK3 1 P62 P88 J16 J17 L18 M20 L20 P25
RHCLK4 1 N/A P90 J14 J16 K18 L21 M18 P23
RHCLK5 1 N/A P92 H14 K15 L17 L20 M20 N24
RHCLK6 1 P64 P91 H15 H18 K20 M18 K20 P18
RHCLK7 1 P65 P93 H16 H17 J20 L17 K19 N19
Notes:
1. N/A in XC3S50A.
Table 2-3: Global Clock Input Pads for Spartan-3E FPGAs (Cont’d)
Pad Bank VQ100 CP132 TQ144 PQ208 FT256 FG320 FG400 FG484
RHCLK4 1 P65 H13 P91 P132 H15 J17 K13 L20
RHCLK5 1 P66 H12 P92 P133 H14 J16 K14 L21
RHCLK6 1 P67 G14 P93 P134 H12 J15 K20 L18
RHCLK7 1 P68 G13 P94 P135 H11 J14 J20 L19
Notes:
1. The CP(G)132 package is being discontinued and is not recommended for new designs. See
http://www.xilinx.com/support/documentation/customer_notices/xcn08011.pdf for details.
2. The FG(G)1156 package is being discontinued and is not recommended for new designs. See
http://www.xilinx.com/support/documentation/customer_notices/xcn07022.pdf for details.
odd value. For example, GCLK0 and GCLK1 are a differential pair as are LHCLK6 and
LHCLK7.
In the Spartan-3E and Extended Spartan-3A families, two clock inputs are available for
each clock buffer, allowing up to twelve differential global clock inputs. In the Spartan-3
family, only four differential clock inputs are allowed.
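The pairing rule in the examples above (GCLK0 with GCLK1, LHCLK6 with LHCLK7) can be sketched in a few lines of Python. This is only an illustration: the function name `differential_partner` is hypothetical, and the sketch assumes that every pair consists of an even-numbered input and the next higher odd-numbered input, with 16 GCLK inputs and 8 inputs on each of the left and right edges.

```python
import re

# Number of clock inputs per group, as listed in the "Clock Inputs" section.
INPUTS_PER_GROUP = {"GCLK": 16, "LHCLK": 8, "RHCLK": 8}


def differential_partner(pad: str) -> str:
    """Return the other half of the differential pair for a clock pad name.

    Hypothetical helper: assumes pairs are (0, 1), (2, 3), and so on.
    """
    match = re.fullmatch(r"(GCLK|LHCLK|RHCLK)(\d+)", pad)
    if match is None:
        raise ValueError(f"not a clock pad name: {pad!r}")
    group, index = match.group(1), int(match.group(2))
    if index >= INPUTS_PER_GROUP[group]:
        raise ValueError(f"{group} index out of range: {index}")
    # Flipping the lowest bit maps an even index to the next odd index
    # and an odd index back to the preceding even index.
    return f"{group}{index ^ 1}"
```

For example, `differential_partner("LHCLK6")` gives the name of the pad that shares a pair with LHCLK6.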
IBUFG
IBUFG (see Figure 2-3) represents the dedicated input buffers for driving the BUFGMUX
or its alternatives, or the DCM.
Figure 2-3: IBUFG symbol (input I, output O)
IBUFGDS
IBUFGDS (see Figure 2-4) is a dedicated differential signaling input buffer for connection
to the clock buffer (BUFG) or DCM. In IBUFGDS, a design level interface signal is
represented as two distinct ports (I and IB), one called the master and the other called the
slave. The master and the slave are opposite phases of the same logical signal (for example,
MYNET and MYNETB).
Figure 2-4: IBUFGDS symbol (inputs I and IB, output O)
Inputs Outputs
I IB O
0 0 -(1)
0 1 0
1 0 1
1 1 -(1)
Notes:
1. The dash (-) means no change.
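As a reading aid, the truth table above can be expressed as a small behavioral model in Python. This is a sketch only, not a timing-accurate simulation; the function name `ibufgds` and the `previous_o` argument (the output value before the inputs changed) are illustrative assumptions.

```python
def ibufgds(i: int, ib: int, previous_o: int) -> int:
    """Behavioral model of the IBUFGDS truth table.

    i, ib      -- logic levels (0 or 1) on the master (I) and slave (IB) ports
    previous_o -- output value before this evaluation, returned for "no change"
    """
    if i != ib:
        # Valid differential input: the output follows the master input.
        return i
    # I and IB at the same level: the table shows a dash, meaning no change.
    return previous_o
```

With a valid differential input the result follows I; when both inputs are at the same level the model simply returns the previous output, matching the dash entries in the table.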
Clock Buffers/Multiplexers
Clock buffers/multiplexers either drive clock input signals directly onto a clock line
(BUFG) or optionally provide a multiplexer to switch between two unrelated, possibly
asynchronous clock signals (BUFGMUX).
Each BUFGMUX element, shown in Figure 2-5, is a 2-to-1 multiplexer. The select line, S,
chooses which of the two inputs, I0 or I1, drives the BUFGMUX output signal, O, as
described in Table 2-5. As specified in each data sheet’s “DC and Switching
Characteristics” section, the S input has a setup time requirement. It also has
programmable polarity.
Figure 2-5: BUFGMUX symbol (inputs I0, I1, and S; output O)
The S input selects clock input I0 when Low and I1 when High, but also has built-in
programmable polarity, equivalent to swapping I0 and I1. Programmable polarity on the
clock signal is available at each flip-flop, which can be rising-edge or falling-edge
triggered, avoiding having to generate and propagate two separate clock signals.
If only one clock input is needed, the second clock input and the select line do not need to
be used.
The BUFGMUX is initialized with I0 selected at power-up and after the assertion of the
Global Set/Reset (GSR). Simulation should also start with S = 0 at time 0. If S = 1 at time 0,
the output is unknown until the next falling edge of I1.
The select line can change at almost any time, independent of the clock states or transitions.
The only exception is a short setup window before a Low-to-High transition on the selected
clock input; changing S inside this window can produce an undefined runt pulse on the output.
Figure 2-6 shows a switchover from CLK0 to CLK1.
Figure 2-6: Switchover from CLK0 to CLK1 (waveforms: CLK0, Switch, CLK1, OUT)
BUFG
The BUFGMUX is the physical clock buffer in the device, but it can be used as a simple
single-input clock buffer. The BUFG clock buffer primitive (see Figure 2-7) drives a single
clock signal onto the clock network and is essentially the same element as a BUFGMUX,
just without the clock select mechanism. BUFG is the generic primitive for clock buffers
across multiple architectures.
Figure 2-7: BUFG symbol (input I, output O)
Figure: BUFG implemented with a BUFGMUX (inputs I0, I1, and S; output O)
The dedicated zero on the select line is actually implemented with a dedicated VCC source
and using the programmable polarity on the S input.
BUFGCE
Figure: BUFGCE symbol (inputs I and CE, output O)
Inputs Outputs
I CE O
X 0 0
I 1 I
The BUFGCE is built from the BUFGMUX by multiplexing a fixed value for one input. The
default value is Low when disabled. The BUFGCE_1 primitive is similar with VCC
connected to I1, making the output High when disabled. It also uses the BUFGMUX_1
primitive to guarantee there are no glitches during the transition between inputs.
Figure 2-10 shows the equivalent functionality, although the library component truly is a
primitive. The CE inversion is built into the BUFGMUX functionality. The "0" source can be
fed from any convenient unused LUT.
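The relationship between BUFGMUX and BUFGCE described above can be sketched as a purely combinational Python model. It is an illustration under stated assumptions: the function names and the `invert_s` parameter (standing in for the programmable polarity of S) are hypothetical, and the model deliberately ignores the glitch-free switching behavior of the real buffer.

```python
def bufgmux(i0: int, i1: int, s: int, invert_s: bool = False) -> int:
    """Select I0 when the effective select is Low and I1 when it is High."""
    select = (s ^ 1) if invert_s else s
    return i1 if select else i0


def bufgce(i: int, ce: int) -> int:
    """BUFGCE: clock on I0, constant 0 on I1, inverted CE on S.

    When CE is Low the inverted select is High, so the constant 0 is output.
    """
    return bufgmux(i0=i, i1=0, s=ce, invert_s=True)


def bufgce_1(i: int, ce: int) -> int:
    """BUFGCE_1: same structure, but I1 is tied High (VCC)."""
    return bufgmux(i0=i, i1=1, s=ce, invert_s=True)
```

Evaluating `bufgce` over all input combinations reproduces the truth table: the output is 0 whenever CE is 0 and follows I whenever CE is 1.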
Figure 2-10: BUFGCE equivalent built from a BUFGMUX (clock input on I0, GND on I1, inverted CE on S, output O)
If a common clock enable is used for all loads on a clock net, the BUFGCE = YES constraint
can be used to move the high-fanout clock enable to a single line on a BUFGCE:
NET "primary_clock_signal" bufgce={yes|no|true|false};
CLOCK_SIGNAL is a synthesis constraint. When a clock signal passes through combinatorial
logic before reaching the clock input of a flip-flop, XST cannot identify which input pin is
the real clock pin. This constraint can be used to define the clock pin:
NET "primary_clock_signal" clock_signal={yes|no|true|false};
BUFGMUX Inputs
The I0 and I1 inputs to a BUFGMUX element originate from clock input pins, DCMs, or
Double-Line interconnect, as shown in Figure 2-11. As shown in Figure 2-2, page 47, there
are 24 BUFGMUX elements distributed around the four edges of the device. Clock signals
from the four BUFGMUX elements at the top edge and the four at the bottom edge are
truly global and connect to all clocking quadrants. The eight left-edge BUFGMUX
elements only connect to the two clock quadrants in the left half of the device. Similarly, the
eight right-edge BUFGMUX elements only connect to the right half of the device.
Figure 2-11: Spartan-3E and Extended Spartan-3A Family Clock Switch Matrix for BUFGMUX Pair Connectivity
BUFGMUX elements are organized in pairs and share I0 and I1 connections with adjacent
BUFGMUX elements from a common clock switch matrix as shown in Figure 2-11. For
example, the input on I0 of one BUFGMUX is also a shared input to I1 of the adjacent
BUFGMUX.
The clock switch matrix for the left- and right-edge BUFGMUX elements receives signals
from any of the following three sources: an LHCLK or RHCLK pin as appropriate, a
Double-Line interconnect, or, in the larger devices, a DCM. The larger devices are the
XC3S1200E and XC3S1600E in the Spartan-3E family, and the XC3S700A/AN,
XC3S1400A/AN, XC3SD1800A, and XC3SD3400A in the Extended Spartan-3A family.
By contrast, the clock switch matrixes on the top and bottom edges receive signals from
any of the five following sources: two GCLK pins, two DCM outputs, or one Double-Line
interconnect.
Table 2-7 indicates permissible connections between clock inputs and BUFGMUX
elements. The I0 input provides the best input path to a clock buffer. The I1 input provides
the secondary input for the clock multiplexer function.
Table 2-7: Spartan-3E and Extended Spartan-3A Family Connections from Clock Inputs to BUFGMUX Elements and Associated Quadrant Clock
Quadrant       Left-Half BUFGMUX                 Top or Bottom BUFGMUX                                                                                    Right-Half BUFGMUX
Clock Line(1)  Location(2)  I0 Input  I1 Input   Location(2)  I0 Input                                     I1 Input                                     Location(2)  I0 Input  I1 Input
H              X0Y9         LHCLK7    LHCLK6     X1Y10        Extended Spartan-3A FPGAs: GCLK6 or GCLK10   Extended Spartan-3A FPGAs: GCLK7 or GCLK11   X3Y9         RHCLK3    RHCLK2
                                                              Spartan-3E FPGAs: GCLK7 or GCLK11            Spartan-3E FPGAs: GCLK6 or GCLK10
G              X0Y8         LHCLK6    LHCLK7     X1Y11        Extended Spartan-3A FPGAs: GCLK7 or GCLK11   Extended Spartan-3A FPGAs: GCLK6 or GCLK10   X3Y8         RHCLK2    RHCLK3
                                                              Spartan-3E FPGAs: GCLK6 or GCLK10            Spartan-3E FPGAs: GCLK7 or GCLK11
F              X0Y7         LHCLK5    LHCLK4     X2Y10        Extended Spartan-3A FPGAs: GCLK4 or GCLK8    Extended Spartan-3A FPGAs: GCLK5 or GCLK9    X3Y7         RHCLK1    RHCLK0
                                                              Spartan-3E FPGAs: GCLK5 or GCLK9             Spartan-3E FPGAs: GCLK4 or GCLK8
E              X0Y6         LHCLK4    LHCLK5     X2Y11        Extended Spartan-3A FPGAs: GCLK5 or GCLK9    Extended Spartan-3A FPGAs: GCLK4 or GCLK8    X3Y6         RHCLK0    RHCLK1
                                                              Spartan-3E FPGAs: GCLK4 or GCLK8             Spartan-3E FPGAs: GCLK5 or GCLK9
D              X0Y5         LHCLK3    LHCLK2     X1Y0         GCLK3 or GCLK15                              GCLK2 or GCLK14                              X3Y5         RHCLK7    RHCLK6
C              X0Y4         LHCLK2    LHCLK3     X1Y1         GCLK2 or GCLK14                              GCLK3 or GCLK15                              X3Y4         RHCLK6    RHCLK7
B              X0Y3         LHCLK1    LHCLK0     X2Y0         GCLK1 or GCLK13                              GCLK0 or GCLK12                              X3Y3         RHCLK5    RHCLK4
A              X0Y2         LHCLK0    LHCLK1     X2Y1         GCLK0 or GCLK12                              GCLK1 or GCLK13                              X3Y2         RHCLK4    RHCLK5
Notes:
1. See “Quadrant Clock Routing,” page 59 for connectivity details for the eight quadrant clocks.
2. See Figure 2-2 for specific BUFGMUX locations, and Figure 2-13 for information on how BUFGMUX elements drive onto a specific clock line
within a quadrant.
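The left-half and right-half columns of Table 2-7 lend themselves to a simple lookup. The Python sketch below transcribes those two column groups and reports which BUFGMUX elements, inputs, and quadrant clock lines a given LHCLK or RHCLK pad can reach. The dictionary names and the `buffers_for_pad` helper are illustrative assumptions, and the top and bottom BUFGMUX columns are omitted because they differ between families.

```python
# Quadrant clock line -> (BUFGMUX location, I0 input, I1 input), from Table 2-7.
LEFT_HALF = {
    "H": ("X0Y9", "LHCLK7", "LHCLK6"),
    "G": ("X0Y8", "LHCLK6", "LHCLK7"),
    "F": ("X0Y7", "LHCLK5", "LHCLK4"),
    "E": ("X0Y6", "LHCLK4", "LHCLK5"),
    "D": ("X0Y5", "LHCLK3", "LHCLK2"),
    "C": ("X0Y4", "LHCLK2", "LHCLK3"),
    "B": ("X0Y3", "LHCLK1", "LHCLK0"),
    "A": ("X0Y2", "LHCLK0", "LHCLK1"),
}
RIGHT_HALF = {
    "H": ("X3Y9", "RHCLK3", "RHCLK2"),
    "G": ("X3Y8", "RHCLK2", "RHCLK3"),
    "F": ("X3Y7", "RHCLK1", "RHCLK0"),
    "E": ("X3Y6", "RHCLK0", "RHCLK1"),
    "D": ("X3Y5", "RHCLK7", "RHCLK6"),
    "C": ("X3Y4", "RHCLK6", "RHCLK7"),
    "B": ("X3Y3", "RHCLK5", "RHCLK4"),
    "A": ("X3Y2", "RHCLK4", "RHCLK5"),
}


def buffers_for_pad(pad: str) -> list:
    """Return (clock line, BUFGMUX name, input) tuples reachable from a pad."""
    if pad.startswith("LHCLK"):
        table = LEFT_HALF
    elif pad.startswith("RHCLK"):
        table = RIGHT_HALF
    else:
        raise ValueError(f"only LHCLK and RHCLK pads are covered: {pad!r}")
    matches = []
    for line, (location, i0, i1) in table.items():
        if pad == i0:
            matches.append((line, "BUFGMUX_" + location, "I0"))
        elif pad == i1:
            matches.append((line, "BUFGMUX_" + location, "I1"))
    return matches
```

Each half-edge pad appears twice in the table, once as the preferred I0 input of one BUFGMUX and once as the I1 input of its paired neighbor, which is what the lookup returns.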
The four BUFGMUX elements on the top edge are paired together and share inputs from
the eight global clock inputs along the top edge. Each BUFGMUX pair connects to four of
the eight global clock inputs, as shown in Figure 2-2, page 47. This optionally allows
differential inputs to the global clock inputs without wasting a BUFGMUX element.
The connections for the bottom-edge BUFGMUX elements are similar to the top-edge
connections (see Figure 2-11). On the left and right edges, only two clock inputs feed each
pair of BUFGMUX elements.
BUFGMUX Outputs
The BUFGMUX drives the global clock routing, which in turn connects to clock inputs on
device resources. The BUFGMUX can also connect to a DCM, typically used for internal
feedback to the DCM CLKFB input, as shown in Figure 2-12.
For more details on using the DCMs, see Chapter 3, “Using Digital Clock Managers
(DCMs).”
a. Left (TL and BL Quadrants) Half of Die b. Right (TR and BR Quadrants) Half of Die
Figure 2-13: Spartan-3E and Extended Spartan-3A Family Clock Sources for the Eight Clock Lines within
a Clock Quadrant
Other Information
unavailable in the bottom left quadrant. However, the top left (TL) quadrant clock A can
still solely use the output from either BUFGMUX_X2Y1 or BUFGMUX_X0Y2 as the source.
To estimate the quadrant location for a particular I/O, see the footprint diagrams in the
device data sheets. For exact quadrant locations, use the PlanAhead floorplanning tool. In
the QFP packages the quadrant borders fall in the middle of each side of the package, at a
GND pin. The clock inputs fall on the quadrant boundaries, as shown in Table 2-8.
this, concentrate logic in the fewest possible clock column regions. Use floorplanning to
reduce the number of clock columns in use. To further reduce clock power, reduce the
number of rows that the clock is driving.
Using a slower clock also reduces power. The DCM can be used to divide clocks, or slow
clocks can be further divided using registers. A design can be organized according to
required clock frequency and then each part clocked at the lowest possible frequency.
Stopping a clock eliminates the power consumed by the clock routing and by the elements
it drives. If possible, stop the clock externally where it enters the FPGA. If you cannot stop
the clock externally, disable it inside the FPGA by using the BUFGMUX or BUFGCE.
Gating a clock through internal CLB logic is not recommended because it introduces
route-dependent skew, makes the design sensitive to lot-to-lot variations, and might require
manual routing.
The alternative is to use the clock enables to disable the clock loads. This is useful when the
clock is still needed in some locations, but it does not reduce the clock distribution power.
Summary
Global clock inputs, buffers, and routing are automatically used for a design’s highest
fanout clock signals. Implementation reports should be checked to verify the usage of
clock buffers where desired. The user can specify the details of global clock usage in order
to take advantage of special features such as multiplexing and clock enables, or to
maximize the number of clocks using global resources in a design.
Additional Information
For other types of routing resources, see Chapter 12, “Using Interconnect.”
For more details on the DCMs, see Chapter 3, “Using Digital Clock Managers (DCMs).”
This chapter focuses on the Spartan-3E and Extended Spartan-3A family architectures. For
details on Spartan-3 FPGA clocks, see DS099, Spartan-3 FPGA Family Data Sheet.
For more information on input delay elements and IOSTANDARD options, see
Chapter 10, “Using I/O Resources.”
For information on the clocked resources in the FPGA, such as the CLB flip-flops and the
block RAM, see the appropriate chapters elsewhere in this user guide.
For information on setting clock performance constraints, see the ISE Constraints Guide on
the Xilinx website.
Chapter 3
Using Digital Clock Managers (DCMs)
Introduction
DCMs integrate advanced clocking capabilities directly into the FPGA’s global clock
distribution network. Consequently, DCMs solve a variety of common clocking issues,
especially in high-performance, high-frequency applications:
• Eliminate Clock Skew, either within the device or to external components, to
improve overall system performance and to eliminate clock distribution delays.
• Phase Shift a clock signal, either by a fixed fraction of a clock period or by
incremental amounts.
• Multiply or Divide an Incoming Clock Frequency or synthesize a completely new
frequency by a mixture of clock multiplication and division.
• Condition a Clock, ensuring a clean output clock with a 50% duty cycle.
• Mirror, Forward, or Rebuffer a Clock Signal, often to deskew and convert the
incoming clock signal to a different I/O standard—for example, forwarding and
converting an incoming LVTTL clock to LVDS.
• Any or all the above functions, simultaneously.
Table 3-1: Digital Clock Manager Features and Capabilities
Feature Description DCM Signals
Digital Clock Managers (DCMs) per Device Two to eight DCMs, depending on All
array size. See Figure 3-1, page 68.
Clock Input Sources • Global buffer input pad CLKIN
• Global buffer output
• General-purpose I/O (no deskew)
• Internal logic (no deskew)
Frequency Synthesizer Output Multiply CLKIN by the fraction (M/D) • CLKFX
where M = {2..32}, D = {1..32} • CLKFX180
Document Overview
This chapter covers an assortment of topics related to Digital Clock Managers, not all of
which are relevant to every specific FPGA application.
The “DCM Functional Overview” section provides a brief introduction to the DCM and its
functions. The “DCM Primitive” section describes all the connection ports and
attributes or constraints associated with a DCM. The “Clocking Wizard” and the
“VHDL and Verilog Instantiation” sections demonstrate the various methods to specify a
DCM design.
The “DCM Clock Requirements” and the “Input and Output Clock Frequency
Restrictions” sections explain the frequency requirements on the DCM clock input and the
various DCM clock outputs. Similarly, the “Clock Jitter or Phase Noise” section highlights
the effect jitter has on output clock quality.
Finally, the “Eliminating Clock Skew”, “Clock Conditioning”, “Phase Shifting – Delaying
Clock Outputs by a Fraction of a Period”, “Clock Multiplication, Clock Division, and
Frequency Synthesis”, and “Clock Forwarding, Mirroring, Rebuffering” sections illustrate
various applications using the DCM block.
Figure: DCM and global buffer locations for the XC3S250E, XC3S500E, XC3S200A, XC3S400A, and all other Spartan-3 family FPGAs
The DCM blocks have dedicated connections to the global buffer inputs and global buffer
multiplexers on the same edge of the device, either top or bottom. They are an integral part
of the FPGA’s global clocking infrastructure. DCMs are an optional element in the clock
distribution network and are available when required by the application. In Figure 3-2a, a
clock input feeds directly into the low-skew, high-fanout global clock network via a global
input buffer and global clock buffer.
If the application requires some or all of the DCM’s advanced clocking features, the DCM
fits neatly between the global buffer input and the buffer itself, as shown in Figure 3-2b.
a. Global Buffer Inputs and Clock Buffers Drive a Low-Skew Global Network in the FPGA
b. A Digital Clock Manager (DCM) Inserts Directly into the Global Clock Path (GCLK pad, IBUFG global buffer input, DCM_SP with CLKIN and CLKFB, BUFG global clock buffer, low-skew global clock network)
Figure 3-2: DCMs are an Integral Part of the FPGA's Global Clock Network
Figure: DCM block diagram showing the DLL (input stage, delay taps, output stage), the DFS, the Phase Shifter, and the Status Logic. Inputs: CLKIN, CLKFB, RST, PSINCDEC, PSEN, PSCLK. Outputs: CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, CLKFX180, PSDONE, LOCKED, STATUS[7:0].
The input signals to the Phase Shift unit are PSINCDEC, PSEN, and PSCLK. The output
signals are PSDONE and the STATUS[0] signal.
Status Logic
The Status Logic indicates the current state of the DCM via the LOCKED and STATUS[0]
(Extended Spartan-3A family FPGAs only), STATUS[1], and STATUS[2] output signals.
The LOCKED output signal indicates whether the DCM outputs are in phase with the
CLKIN input. The STATUS output signals indicate the state of the DLL and PS operations.
The RST input signal resets the DCM logic and returns it to its post-configuration state.
Likewise, a reset forces the DCM to reacquire and lock to the CLKIN input.
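A status word read from the DCM can be unpacked with a few bit masks, as in the following Python sketch. The function name and dictionary keys are illustrative. STATUS[0] and STATUS[1] follow the port descriptions in this chapter; treating STATUS[2] as the CLKFX-stopped indicator is an assumption taken from the CLKFX_STOPPED label on the reset-logic figure later in the chapter.

```python
def decode_status(status: int) -> dict:
    """Unpack the three documented STATUS bits from an 8-bit status word."""
    return {
        # STATUS[0]: Variable Phase Shift overflow.
        "phase_shift_overflow": bool(status & 0b001),
        # STATUS[1]: CLKIN input stopped indicator.
        "clkin_stopped": bool(status & 0b010),
        # STATUS[2]: assumed here to flag a stopped CLKFX output.
        "clkfx_stopped": bool(status & 0b100),
    }
```

For instance, `decode_status(0b010)` reports only the CLKIN-stopped condition.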
DCM Primitive
The DCM design primitive, shown in Figure 3-4, represents all the sub-features within the
Digital Clock Manager. The name of the DCM primitive differs slightly between
Spartan-3 generation FPGA families, as shown in Table 3-4. Spartan-3 FPGAs support the
DCM primitive, while Spartan-3E and Extended Spartan-3A family FPGAs support the
more advanced DCM_SP primitive. The Xilinx ISE® software automatically maps a
Spartan-3 FPGA DCM primitive to the appropriate equivalent in a Spartan-3E or
Extended Spartan-3A family FPGA design.
Table 3-4: Digital Clock Manager Primitive by Spartan-3 Generation FPGA Family
FPGA Family                        Primitive
Spartan-3E FPGA                    DCM_SP
Extended Spartan-3A family FPGA    DCM_SP
Spartan-3 FPGA                     DCM
The DCM’s Connection Ports and Attributes, Properties, or Constraints are summarized
below.
Symbol
Figure 3-4: DCM primitive symbol (Spartan-3 FPGA: DCM; Spartan-3E/3A/3AN/3A DSP FPGAs: DCM_SP). Inputs: CLKIN, CLKFB, RST, PSEN, PSINCDEC, PSCLK. Outputs: CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, CLKFX180, STATUS[7:0], LOCKED, PSDONE.
Connection Ports
Table 3-6 lists the various connection ports to the Digital Clock Manager. Each port
connection has a brief description, which includes the signal direction, and which DCM
function units require the connection. Table 3-5 provides the abbreviated name for each
function unit used in Table 3-6.
0    No effect.
1    Reset DCM block. Hold RST pulse High for at least three valid CLKIN cycles.
PSEN    Input          Variable Phase Shift enable. Invertible within DCM block. Non-inverted behavior shown below. See “Variable Fine Phase Shifting,” page 123.    ✓
PSCLK   Clock Input    Clock input to Variable Phase Shifter, clocked on rising edge. Invertible within DCM block. See “Variable Fine Phase Shifting,” page 123.    ✓
CLKFX180    Clock Output    Synthesized clock output CLKFX, phase shifted by 180° (appears to be an inverted version of CLKFX). Always has a 50% duty cycle. If only CLKFX or CLKFX180 clock outputs are used on the DCM, then no feedback loop is required. See “Frequency Synthesizer (CLKFX, CLKFX180),” and “Half-Period Phase Shifted Outputs.”    ✓
STATUS[0]   Output          Variable Phase Shift Overflow. Control output for “Variable Fine Phase Shifting.” The Variable Phase Shifter has reached its minimum or maximum limit value. The limit value is either ±255 or a lesser value if the phase shifter reached the end of the delay line. See “Variable Fine Phase Shifting,” page 123.    ✓
                            Note: This function is not supported in the Spartan-3E family.
                            In the Spartan-3 family, STATUS[0] also indicates overflow for a fixed phase shift selection.
                            0    The Phase Shifter has not yet reached its limit value.
                            1    The Phase Shifter has reached its limit value.
STATUS[1]   Output          CLKIN Input Stopped Indicator. Available only when the CLKFB feedback input is connected. Held in reset until the LOCKED output is asserted. Requires at least one CLKIN cycle to become active. Never asserted if CLKIN never toggles.    ✓ ✓ ✓
PSDONE      Output          Variable Phase Shift operation complete. See “Variable Fine Phase Shifting,” page 123.    ✓
LOW Default. The DLL function unit operates in its low-frequency mode.
All DLL-related outputs are available. The frequency for all clock
inputs and outputs must fall within the low-frequency DLL limits
specified in the Spartan-3 FPGA Data Sheet.
HIGH The DLL function unit operates in its high-frequency mode. The
Clock Doubler (CLK2X, CLK2X180) outputs are not available. The
Quadrant Phase Shifted Outputs CLK90 and CLK270 are not
available. The duty cycle for the CLKDV output is not 50% if the
CLKDV_DIVIDE attribute has a non-integer value. The frequency
for all clock inputs and outputs must fall within the high-frequency
DLL limits specified in the Spartan-3 FPGA Family Data Sheet.
CLKIN_PERIOD Specifies in ns the period of the clock used to drive the CLKIN pin of the DCM.
Optional input, primarily used only for DRC checks. On Spartan-3E and
Extended Spartan-3A family FPGAs, setting CLKIN_PERIOD helps reduce DFS
jitter and results in faster locking time.
CLK_FEEDBACK Defines the frequency of the feedback clock.
DUTY_CYCLE_CORRECTION Spartan-3 FPGAs only. Enables or disables the 50% duty-cycle correction for the
CLK0, CLK90, CLK180, and CLK270 outputs from the DLL unit. The duty cycles
for all outputs on Spartan-3E and Extended Spartan-3A family FPGAs are
always corrected to 50%.
CLKDV_DIVIDE Defines the frequency of the CLKDV output. Allowable values for
CLKDV_DIVIDE include 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 9, 10, 11, 12,
13, 14, 15, 16.
$$F_{CLKDV} = \frac{F_{CLKIN}}{\text{CLKDV\_DIVIDE}}$$
The locking time is longer, and there is more output jitter when CLKDV_DIVIDE
is a non-integer value.
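The CLKDV relationship is easy to check numerically. The Python sketch below divides the input frequency by CLKDV_DIVIDE after confirming that the divider is one of the allowable values listed above. The function and constant names are illustrative, and the sketch does not check the data sheet frequency limits.

```python
# Allowable CLKDV_DIVIDE values as listed in the attribute description.
ALLOWED_CLKDV_DIVIDE = (
    1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5,
    8, 9, 10, 11, 12, 13, 14, 15, 16,
)


def clkdv_frequency(f_clkin_mhz: float, clkdv_divide: float) -> float:
    """Return F_CLKDV = F_CLKIN / CLKDV_DIVIDE in MHz."""
    if clkdv_divide not in ALLOWED_CLKDV_DIVIDE:
        raise ValueError(f"CLKDV_DIVIDE value not allowed: {clkdv_divide}")
    return f_clkin_mhz / clkdv_divide
```

A 50 MHz input with CLKDV_DIVIDE = 2.5 yields 20 MHz; as noted above, a non-integer divider such as this one comes with longer lock time and more output jitter.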
CLKFX_DIVIDE Defines the division factor for the frequency of the CLKFX and CLKFX180
outputs. Used in conjunction with the CLKFX_MULTIPLY attribute. Allowable
values for CLKFX_DIVIDE include integers ranging from 1 to 32. Default value
is 1.
$$F_{CLKFX} = F_{CLKIN} \cdot \frac{\text{CLKFX\_MULTIPLY}}{\text{CLKFX\_DIVIDE}}$$
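The following Python sketch evaluates the CLKFX formula and searches for the multiplier and divider pair that comes closest to a target frequency. It assumes the ranges quoted in this chapter (multiplier 2 to 32, divider 1 to 32); the function names are illustrative, and the search ignores the CLKFX output frequency limits in the data sheet, which a real design must also satisfy.

```python
MULTIPLY_RANGE = range(2, 33)  # CLKFX_MULTIPLY: 2..32
DIVIDE_RANGE = range(1, 33)    # CLKFX_DIVIDE: 1..32


def clkfx_frequency(f_clkin_mhz: float, multiply: int, divide: int) -> float:
    """Return F_CLKFX = F_CLKIN * CLKFX_MULTIPLY / CLKFX_DIVIDE in MHz."""
    if multiply not in MULTIPLY_RANGE:
        raise ValueError(f"CLKFX_MULTIPLY out of range: {multiply}")
    if divide not in DIVIDE_RANGE:
        raise ValueError(f"CLKFX_DIVIDE out of range: {divide}")
    return f_clkin_mhz * multiply / divide


def closest_clkfx_settings(f_clkin_mhz: float, f_target_mhz: float) -> tuple:
    """Brute-force search for the (multiply, divide) pair nearest the target.

    Smaller dividers are tried first, so the first exact match found is the
    one with the smallest divider.
    """
    best = None
    for divide in DIVIDE_RANGE:
        for multiply in MULTIPLY_RANGE:
            error = abs(clkfx_frequency(f_clkin_mhz, multiply, divide) - f_target_mhz)
            if best is None or error < best[0]:
                best = (error, multiply, divide)
    return best[1], best[2]
```

For a 50 MHz input and a 75 MHz target, the search settles on a multiplier of 3 and a divider of 2.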
NONE Default. CLKIN and CLKFB are in phase (no skew) and phase
relationship cannot be changed. Equivalent to FIXED setting
with a PHASE_SHIFT value of 0.
FIXED Phase relationship is set at configuration by the PHASE_SHIFT
attribute value and cannot be changed by the application.
VARIABLE Phase relationship is set at configuration by the PHASE_SHIFT
attribute value but can be changed by the application using the
Variable Phase Shift controls, PSEN, PSCLK, PSINCDEC, and
PSDONE.
Do not use this setting to phase shift DCM clock outputs. Instead, use the
CLKOUT_PHASE_SHIFT and PHASE_SHIFT constraints to achieve accurate
phase shifting.
DFS_FREQUENCY_MODE Spartan-3 FPGA Family Only. Specifies the allowable frequency range for the
CLKFX and CLKFX180 output clocks from the DCM’s Digital Frequency
Synthesizer (DFS). If any DLL clock outputs are used, then the more restrictive
DLL_FREQUENCY_MODE limits the CLKIN input frequency.
LOW Default. The DFS function unit operates in its low-frequency mode.
The frequency for the CLKFX and CLKFX180 outputs must fall
within the low-frequency DFS limits specified in the Spartan-3
FPGA Data Sheet. The frequency limits for the CLKIN input
depend on whether any DLL clock outputs are used.
HIGH The DFS function unit operates in its high-frequency mode. The
frequency for the CLKFX and CLKFX180 outputs must fall within
the high-frequency DFS limits specified in the Spartan-3 FPGA Data
Sheet. The frequency limits for the CLKIN input depend on whether any
DLL clock outputs are used.
STARTUP_WAIT Controls whether the FPGA configuration signal DONE waits for the DCM to
assert its LOCKED signal before going High.
If more than one DCM is so configured, the FPGA waits until all DCMs are
locked.
FACTORY_JF Spartan-3 FPGA Family Only. Controls how often the DCM’s DLL unit adjusts
its tap settings. The FACTORY_JF setting affects the jitter characteristics of the
DLL element.
The settings are automatically adjusted based on the DLL_FREQUENCY_MODE
attribute.
DLL_FREQUENCY_MODE FACTORY_JF
LOW 0x8080
HIGH 0x8080
Table 3-8: DFS Unit Clock Input Frequency Requirements (-4 Speed Grade)
Function Minimum Frequency Maximum Frequency Units
Data Sheet Specification CLKIN_FREQ_FX_MIN CLKIN_FREQ_FX_MAX
Spartan-3 FPGA 1 280 MHz
Spartan-3E FPGA 0.200 333 MHz
Extended Spartan-3A family FPGAs 0.200 333 MHz
Table 3-8 shows the clock input, CLKIN, frequency range for the Digital Frequency
Synthesizer (DFS) unit. The DFS unit, if used stand-alone, has a wider frequency range
than the DLL unit. If the application uses both units, then the more restrictive DLL
requirements apply. The table shows the data sheet specification name and an estimated
value. The actual value depends on which speed grade is required for the design and the
value specified in the data sheet takes precedence over the estimate.
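A quick range check against Table 3-8 can be sketched as follows. The values are the estimated -4 speed grade figures from the table, so the data sheet remains the authority; the dictionary and function names are illustrative, and the check covers the stand-alone DFS case only, not the more restrictive DLL limits.

```python
# (minimum, maximum) CLKIN frequency in MHz for the DFS unit, from Table 3-8.
# These are estimates; the data sheet values take precedence.
DFS_CLKIN_RANGE_MHZ = {
    "Spartan-3": (1.0, 280.0),
    "Spartan-3E": (0.200, 333.0),
    "Extended Spartan-3A": (0.200, 333.0),
}


def dfs_clkin_in_range(family: str, f_clkin_mhz: float) -> bool:
    """Return True if CLKIN falls inside the stand-alone DFS input range."""
    minimum, maximum = DFS_CLKIN_RANGE_MHZ[family]
    return minimum <= f_clkin_mhz <= maximum
```

A 500 kHz input, for example, passes for the Spartan-3E and Extended Spartan-3A families but falls below the 1 MHz minimum listed for Spartan-3 FPGAs.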
Table 3-9 and Table 3-10 show the clock input, CLKIN, frequency range for the
Delay-Locked Loop (DLL) unit. The DLL frequency restrictions apply regardless of whether
the DLL is used stand-alone or with the DFS unit. The tables show the data sheet
specification name and value. The actual value depends on which speed grade is required for
the design, and the value specified in the data sheet takes precedence over any values shown
in this user guide.
Spartan-3E and Extended Spartan-3A family FPGAs have a single DLL operating range, as
shown in Table 3-9. The frequencies shown for Spartan-3E FPGAs are for the Stepping 1
revision.
Table 3-9: Extended Spartan-3A Family FPGAs: DLL Unit Clock Input Frequency Requirements
Table 3-10 shows the frequency range for Spartan-3 FPGAs, where the DLL has two
distinct operating frequency ranges, called Low and High. The operating mode is
controlled by the DLL_FREQUENCY_MODE attribute.
Table 3-10: Spartan-3 FPGAs: DLL Unit Clock Input Frequency Requirements
                   DLL Frequency Mode Attribute (DLL_FREQUENCY_MODE)
                   LOW                    HIGH
                   Minimum    Maximum     Minimum    Maximum
Spartan-3 FPGAs    18 MHz     167 MHz     48 MHz     280 MHz
Spartan-3E and Extended Spartan-3A family FPGA DLLs support input clock frequencies
as low as 5 MHz, whereas the Spartan-3 FPGA DLL requires at least 18 MHz.
certain amount of clock jitter on the CLKIN input and a reasonable amount of frequency
variation on both the CLKIN input and the CLKFB clock feedback input.
There are two types of jitter tolerance on the CLKIN input.
• Cycle-to-cycle jitter
• Period jitter
Cycle-to-Cycle Jitter
Cycle-to-cycle jitter indicates how much the CLKIN input period is allowed to change
from one cycle to the next. The maximum allowable cycle-to-cycle change is shown in
Table 3-11, including the data sheet specification name and an estimated value. The table
also indicates when the specification applies. While Spartan-3E and Extended Spartan-3A
family FPGAs have one distinct operating range, the acceptable amount of cycle-to-cycle
jitter decreases at input frequencies above 150 MHz. For Spartan-3 FPGAs, the limits apply
depending on the DLL_FREQUENCY_MODE attribute setting.
Period Jitter
The other applicable type of jitter is called period jitter. Period jitter indicates the maximum
variation in the clock period over millions of clock cycles. Cycle-to-cycle jitter shows the
change from one clock cycle to the next while period jitter indicates the maximum range of
change over time. The maximum allowable period jitter appears in Table 3-12, including
the data sheet specification name and an estimated value.
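The difference between the two jitter measures can be illustrated with a short Python sketch that operates on a list of measured clock periods. Both function names are illustrative. The sketch interprets period jitter as the peak-to-peak spread of the measured periods, which is an assumption made for this example; the data sheet defines the actual limits and measurement conditions.

```python
def cycle_to_cycle_jitter(periods_ps: list) -> int:
    """Largest change in period between two consecutive cycles."""
    return max(abs(b - a) for a, b in zip(periods_ps, periods_ps[1:]))


def period_jitter(periods_ps: list) -> int:
    """Spread between the longest and shortest period over the whole record.

    Assumes a peak-to-peak interpretation of "maximum range of change".
    """
    return max(periods_ps) - min(periods_ps)


# Five consecutive periods of a nominal 100 MHz clock, in picoseconds.
measured = [10000, 10050, 9980, 10020, 9950]
```

For the sample record, the largest step between adjacent cycles is 70 ps, while the overall spread is 100 ps, showing that a clock can meet a cycle-to-cycle limit and still drift further over a longer record.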
Table 3-14: Spartan-3E FPGA: Direct Input Connections and Optional External Feedback to Associated
DCMs
I/O Bank 0
Differential Pair Differential Pair Differential Pair Differential Pair
N P N P N P N P
Package Pin Number for Single-Ended Input Pin Number for Single-Ended Input
VQ100 P91 P90 P89 P88 P86 P85 P84 P83
CP132 B7 A7 C8 B8 A9 B9 C9 A10
TQ144 P131 P130 P129 P128 P126 P125 P123 P122
PQ208 P186 P185 P184 P183 P181 P180 P178 P177
FT256 D8 C8 B8 A8 A9 A10 F9 E9
FG320 D9 C9 B9 B8 A10 B10 E10 D10
FG400 A9 A10 G10 H10 E10 E11 G11 F11
FG484 B11 C11 H11 H12 C12 B12 E12 F12
↓ ↓ ↓ ↓ Associated Global Buffers ↓ ↓ ↓ ↓
GCLK11 GCLK10 GCLK9 GCLK8 GCLK7 GCLK6 GCLK5 GCLK4
BUFGMUX_X1Y10
BUFGMUX_X1Y11
BUFGMUX_X2Y10
BUFGMUX_X2Y11
Top Left DCM Top Right DCM
XC3S100E: N/A XC3S100E: DCM_X0Y1
XC3S250E, XC3S500E: DCM_X0Y1 XC3S250E, XC3S500E: DCM_X1Y1
XC3S1200E, XC3S1600E: DCM_X1Y3 XC3S1200E, XC3S1600E: DCM_X2Y3
↓ ↓ ↓ ↓
H G F E
Global Clock Line
D C B A
↑ ↑ ↑ ↑
BUFGMUX_X1Y0
BUFGMUX_X1Y1
BUFGMUX_X2Y0
BUFGMUX_X2Y1
Table 3-15: Spartan-3E FPGA: Direct Input and Optional External Feedback to Left-Edge DCMs
(XC3S1200E and XC3S1600E)
Diff. Single-Ended Pin Number by Package Type Left Edge
Clock VQ100 CP132 TQ144 PQ208 FT256 FG320 FG400 FG484 LHCLK DCM/BUFGMUX
BUFGMUX_X0Y5 → D
BUFGMUX_X0Y4 → C
P P9 F3 P14 P22 H5 J5 K3 M5 → LHCLK0
Clock Lines
Pair
BUFGMUX_X0Y3 → B
BUFGMUX_X0Y2 → A
BUFGMUX_X0Y9 → H
BUFGMUX_X0Y8 → G
P P15 G3 P20 P28 J2 K3 M1 M1 → LHCLK4
Clock Lines
Pair
Table 3-16: Spartan-3E FPGA: Direct Input and Optional External Feedback to Right-Edge DCMs
(XC3S1200E and XC3S1600E)
Right Edge Single-Ended Pin Number by Package Type Diff.
DCM/BUFGMUX RHCLK VQ100 CP132 TQ144 PQ208 FT256 FG320 FG400 FG484 Clock
D ← BUFGMUX_X3Y5
C ← BUFGMUX_X3Y4
RHCLK7 ← P68 G13 P94 P135 H11 J14 J20 L19 N
Clock Lines
Pair
RHCLK6 ← P67 G14 P93 P134 H12 J15 K20 L18 P
DCM_X3Y2
RHCLK5 ← P66 H12 P92 P133 H14 J16 K14 L21 N
Pair
RHCLK4 ← P65 H13 P91 P132 H15 J17 K13 L20 P
I/O Bank 1
B ← BUFGMUX_X3Y3
A ← BUFGMUX_X3Y2
H ← BUFGMUX_X3Y9
G ← BUFGMUX_X3Y8
RHCLK3 ← P63 J14 P88 P129 J13 K14 L14 M16 N
Clock Lines
Pair
Table 3-18: Extended Spartan-3A Family: Direct Input Connections and Optional External DCM Feedback
I/O Bank 0
Differential Pair Differential Pair Differential Pair Differential Pair
N P N P N P N P
Package Pin Number for Single-Ended Input Pin Number for Single-Ended Input
VQ100 P90 N/A P89 P88 P86 P85 P84 P83
TQ144 P132 P130 P131 P129 P127 P125 P126 P124
FT256 B8 A8 D8 C8 A9 C9 D9 C10
FG320 C8 B8 B7 A8 B9 A10 C9 B10
FG400 A8 A9 E10 D10 C10 A10 E11 D11
CS484 B8 A8 E11 F10 B9 A9 F11 E12
FG484 E11 D11 C11 B11 A11 A12 E12 C12
FG676 C13 B13 G13 F13 A14 B14 J14 K14
↓ ↓ ↓ ↓ Associated Global Buffers ↓ ↓ ↓ ↓
GCLK11 GCLK10 GCLK9 GCLK8 GCLK7 GCLK6 GCLK5 GCLK4
BUFGMUX_X1Y10
BUFGMUX_X2Y10
BUFGMUX_X1Y11
BUFGMUX_X2Y11
Top Left DCM Top Right DCM
XC3S50A: DCM_X0Y0 XC3S50A: DCM_X1Y0
XC3S200A, XC3S400A: DCM_X0Y1 XC3S200A, XC3S400A: DCM_X1Y1
XC3S700A/1400A, Spartan-3A DSP XC3S700A/1400A, Spartan-3A DSP
FPGA: DCM_X1Y3 FPGA: DCM_X2Y3
↓ ↓ ↓ ↓
H G F E
Global Clock Line
D C B A
↑ ↑ ↑ ↑
Bottom Left DCM Bottom Right DCM
BUFGMUX_X1Y0
BUFGMUX_X1Y1
BUFGMUX_X2Y0
BUFGMUX_X2Y1
Table 3-19: Extended Spartan-3A Family FPGA: Direct Clock Input and Optional External Feedback to
Left-Edge DCMs (XC3S700A/AN, XC3S1400A/AN, and Spartan-3A DSP FPGAs)
Diff. Single-Ended Pin Number by Package Type Left Edge
Clock FT256 FG400 FG484 CS484 FG676 LHCLK DCM/BUFGMUX
BUFGMUX_X0Y5 Î D
BUFGMUX_X0Y4 Î C
P G2 J1 L5 L6 N6 Î LHCLK0
Clock Lines
Pair
N H1 K2 L3 M5 N7 Î LHCLK1
DCM_X0Y2
P H3 K3 K1 K1 P1 Î LHCLK2
Pair
N J3 L3 L1 L1 P2 Î LHCLK3
I/O Bank 3
BUFGMUX_X0Y3 Î B
BUFGMUX_X0Y2 Î A
BUFGMUX_X0Y9 Î H
BUFGMUX_X0Y8 Î G
P J2 K4 M1 L3 P4 Î LHCLK4
Clock Lines
Pair
N J1 L5 M2 M2 P3 Î LHCLK5
DCM_X0Y1
P K3 L1 M3 M6 N9 Î LHCLK6
Pair
N K1 M1 M4 N7 P10 Î LHCLK7
BUFGMUX_X0Y7 Î F
BUFGMUX_X0Y6 Î E
Table 3-20: Extended Spartan-3A Family FPGA: Direct Clock Input and Optional External Feedback to
Right-Edge DCMs (XC3S700A/AN, XC3S1400A/AN, and Spartan-3A DSP FPGAs)
Right Edge Single-Ended Pin Number by Package Type Diff.
DCM/BUFGMUX RHCLK FT256 FG400 FG484 CS484 FG676 Clock
D Í BUFGMUX_X3Y5
C Í BUFGMUX_X3Y4
RHCLK7 Í H16 J20 K19 L17 N19 N
Clock Lines
Pair
RHCLK6 Í H15 K20 K20 M18 P18 P
DCM_X3Y2
RHCLK5 Í H14 L17 M20 L20 N24 N Pair
RHCLK4 Í J14 K18 M18 L21 P23 P
I/O Bank 1
B Í BUFGMUX_X3Y3
A Í BUFGMUX_X3Y2
H Í BUFGMUX_X3Y9
G Í BUFGMUX_X3Y8
RHCLK3 Í J16 L18 L20 M20 P25 N
Clock Lines
Pair
FPGA
Configuration
Startup Phase
RST Input
asserted
LOCKED output
is LOW
LOCKED output
is HIGH
x462_05_120206
While the CLKIN input stays within the specified limits, the DCM continues to adjust its
internal delay taps to maintain lock. However, if the CLKIN input strays well beyond the
specified limits, then the DCM potentially loses lock and deasserts the LOCKED output.
Once the DCM loses lock, it does not automatically attempt to reacquire lock. When the
DCM loses lock—i.e., LOCKED was High, then goes Low—the FPGA application must
take the appropriate action. For example, once lock is lost, resetting the DCM via the RST
input forces the DCM to reacquire lock.
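The recovery behavior described above can be sketched as a small behavioral model. This is only an illustration in Python, not HDL from this guide: the function name `lock_monitor` and the three-sample reset width are assumptions, and the required RST pulse width should be taken from the applicable data sheet.

```python
def lock_monitor(locked_samples, reset_cycles=3):
    """Yield the RST level for each sampled LOCKED value.

    A High-to-Low transition on LOCKED (loss of lock) starts a reset
    pulse lasting reset_cycles samples. The pulse width is an assumed
    placeholder value, not a figure from the data sheet.
    """
    prev_locked = False
    remaining = 0
    for locked in locked_samples:
        locked = bool(locked)
        if prev_locked and not locked:
            remaining = reset_cycles  # lock was lost: start a reset pulse
        rst = remaining > 0
        if rst:
            remaining -= 1
        prev_locked = locked
        yield rst


# LOCKED is Low during startup, goes High, then drops (lock lost).
print(list(lock_monitor([0, 1, 1, 0, 0, 0, 0, 1])))
```

LOCKED being Low before the first lock is not treated as a loss of lock; only a High-to-Low transition triggers the reset.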
DCM_SP
(from FPGA application)
DCM_RESET STATUS[2]
OR RESET LOCKED
AND
DCM_LOCKED
CLKFX_STOPPED
EN035_02_101806
Clocking Wizard
To simplify applications using DCMs, the Xilinx ISE development software includes a
software wizard that provides step-by-step instructions for configuring a DCM. As shown
in Figure 3-7, Clocking Wizard generates a vendor-specific logic synthesis file instantiating
the DCM in either VHDL or Verilog syntax. Similarly, Clocking Wizard generates a user
constraints file (UCF) for the specific implementation. Finally, all the user specifications are
saved in a Xilinx Architecture Wizard (XAW) settings file.
Clocking Wizard
Graphically configure a
Spartan-3 Digital Clock
Manager (DCM)
Xilinx Architecture
Vendor-specific Wizard (XAW)
VHDL or Verilog settings file
User constraints
file (UCF)
UG331_c3_28_022407
Figure 3-7: Clocking Wizard Provides a Graphical Interface for Configuring Digital
Clock Managers
Choose the
language for the
generated output
Click OK
when finished UG331_c3_05_120206
Select Clocking
Wizard
Choose a
specific wizard
Click OK
when finished UG332_c3_06_120206
General Setup
Specify most of the DCM’s options using the Xilinx Clocking Wizard General Setup panel,
as shown in Figure 3-10. The text in blue ovals shows the DCM primitive attribute name
for the corresponding setting.
• To select the outputs and functions used in the final application, check the option
boxes next to the desired DCM clock outputs. Checking the output boxes enables
related option settings below.
• Enter the frequency of the CLKIN clock input. Either specify the frequency in MHz, or
specify the clock period in nanoseconds. The specified value also sets the DCM’s
DLL_FREQUENCY_MODE attribute for Spartan-3 FPGA designs.
• Specify whether the CLKIN source is internal or external to the FPGA. If External,
then Clocking Wizard automatically inserts a global buffer input (IBUFG) primitive. If
Internal, then the source signal is provided as a top-level input within the generated
HDL source file.
• If the CLKDV output box is checked, then specify the Divide by Value for the Clock
Divider circuit. This setting defines the DCM’s CLKDV_DIVIDE attribute.
• Specify the feedback path to the DCM. If only the CLKFX or CLKFX180 outputs are
used, then select None. Otherwise, feedback is required. If the feedback is from within
the FPGA, choose Internal. If the feedback loop is from outside the FPGA, choose
External. Furthermore, specify the source of the DCM feedback, either from CLK0
(1X) or from CLK2X (2X). This setting defines the DCM’s CLK_FEEDBACK attribute.
[Figure 3-10 callouts: Check CLKDV to enable the Clock Divider options. Check CLKFX or CLKFX180 to enable the Frequency Synthesizer options. Enter the input clock frequency, with full accuracy, in MHz or ns. Select Fixed to phase shift all outputs by the value defined below. Select Variable mode to dynamically adjust phase shifting using the PSEN, PSINCDEC, and PSCLK inputs. DCM attribute names shown: DLL_FREQUENCY_MODE, PHASE_SHIFT, DUTY_CYCLE_CORRECTION.]
Figure 3-10: A Majority of DCM Options are Set in the General Setup Panel
• Specify whether to phase shift all DCM outputs. By default, there is no phase shifting
(None). If phase shifting is required by the application, choose whether the phase shift
value is Fixed or Variable. Selecting Variable also enables the Variable Phase Shift
controls, PSEN, PSINCDEC, PSCLK, and PSDONE. This setting defines the DCM’s
CLKOUT_PHASE_SHIFT attribute. For both Fixed and Variable modes, specify the
related Phase Shift Value, which provides either the fixed phase shift value or the
initial value for the Variable Phase Shift. This setting defines the DCM’s
PHASE_SHIFT attribute.
• To open the Advanced Options window, click Advanced.
• When finished, click Next > to continue to the Clock Buffers panel.
Advanced Options
Various advanced DCM options are grouped together in the Advanced Options window,
shown in Figure 3-11:
• By default, the DCM has no effect on the FPGA’s configuration process. Click Wait for
DCM lock before DONE signal goes high to have the FPGA wait for the DCM to
assert its LOCKED output before asserting the DONE signal at the end of
configuration. This setting defines the DCM’s STARTUP_WAIT attribute. If checked,
additional bitstream generation option changes are required, as described in the
“Setting Configuration Logic to Wait for DCM LOCKED Output” section.
• If the CLKIN input frequency is too high for a particular DCM feature, check Divide
Input Clock by 2 to reduce the input frequency by half with nearly ideal 50% duty
cycle before entering the DCM block. This setting defines the DCM’s
CLKIN_DIVIDE_BY_2 attribute.
• If required for source-synchronous data transfer applications, modify the DCM
Deskew Adjust value to SOURCE_SYNCHRONOUS. Do not use any values other
than SOURCE_SYNCHRONOUS or SYSTEM_SYNCHRONOUS without first
consulting Xilinx. This setting defines the DCM’s DESKEW_ADJUST attribute. See
“Skew Adjustment.”
• Click OK when finished to apply any changes and return to the General Setup
window.
Click OK to apply
DCM attribute name changes and
close this window UG331_c3_08_022407
Clock Buffers
Define the clock buffer output type for each DCM clock output, shown in Figure 3-12. By
default, Clocking Wizard automatically assigns all outputs to a global buffer (BUFG).
However, there are only four global buffers along each of the top and bottom edges of the
device, shared by two DCMs. In the XC3S50, there is a single DCM along the top or bottom
edge that optionally connects to all four global buffers along the edge.
• To assign clock buffer types for each DCM clock output, click Customize under Clock
Buffer Settings.
• For each DCM clock output, select a Clock Buffer output type using the drop-down
list. Table 3-21 lists the available Clock Buffer options.
• If using an Enabled Buffer output type, either specify a signal name for the buffer
enable (CE) input or use the automatically generated name.
• If using a Clock Mux output type, either specify a signal name for the select (S) input
or use the automatically generated name.
• When finished, click Next > or Finish to continue. The Next > option only appears if
the CLKFX or CLKFX180 outputs were selected in the General Setup panel.
Otherwise, click Finish to generate the HDL output (see “Generating HDL Output”).
[Figure 3-12 callouts: By default, Clocking Wizard places global buffers (BUFG) on all the selected DCM clock outputs. Optionally, customize how the DCM clock outputs connect to the other FPGA logic using the buttons below. Click Next to continue.]
Figure 3-12: Clocking Wizard Provides a Variety of Buffer Options for each DCM Output
Enabled Buffer (BUFGCE; inputs I0 and CE, output O): Connect to one of the four global buffers configured as an enabled clock buffer (BUFGCE). The CE input enables the buffer when High. When CE is Low, the buffer output is zero.
    CE | O
    0  | 0
    1  | I0
Clock Mux (BUFGMUX; inputs I0, I1, and S, output O): Connect to one of the four global buffers configured as a clock multiplexer (BUFGMUX). The S input selects the clock source.
    S  | O
    0  | I0
    1  | I1
I0
• Optionally, click Use Multiply (M) and Divide (D) values and enter the desired
values. Click Calculate to calculate the resulting output frequency and jitter,
displayed under Generated Output.
• Finally, click Next to generate the HDL output (see “Generating HDL Output”).
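The Multiply (M) and Divide (D) arithmetic can be sketched as follows. This is an illustrative helper, not part of Clocking Wizard: the function name is hypothetical, and the value ranges (2 to 32 for CLKFX_MULTIPLY, 1 to 32 for CLKFX_DIVIDE) are assumed from the DCM primitive documentation and should be confirmed against the applicable data sheet. The sketch does not model the CLKFX frequency limits or the jitter that the Calculate button reports.

```python
def clkfx_frequency_mhz(f_clkin_mhz, multiply, divide):
    """Return the synthesized CLKFX frequency in MHz: F_CLKIN * M / D.

    The allowed ranges below are assumptions taken from the DCM
    primitive documentation, not values stated in this section.
    """
    if not 2 <= multiply <= 32:
        raise ValueError("CLKFX_MULTIPLY is assumed to range from 2 to 32")
    if not 1 <= divide <= 32:
        raise ValueError("CLKFX_DIVIDE is assumed to range from 1 to 32")
    return f_clkin_mhz * multiply / divide


# Example: a 50 MHz input with M = 7 and D = 2.
print(clkfx_frequency_mhz(50.0, 7, 2))
```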
DFS_FREQUENCY_MODE
CLKFX_MULTIPLY CLKFX_DIVIDE
Figure 3-13: Set the Multiply and Divide Values for the Digital Frequency Synthesizer and Calculate the
Resulting Jitter
Review options
Click Finish to
generate output file UG331_c3_11_120206
Choose a language
Choose the
appropriate DCM
function
UG331_c3_12_120206
Use the file either as a reference or cut the content of the window into a new source file.
FPGA
B
Other
A C Device on
Board
C
Δb
Δc X462_16_123105
Similarly, the clock source is rebuffered in the FPGA and drives another device on the
board. In this case, again the clock source enters the FPGA via an input pin, is distributed
via the global clock network, feeds an output pin on the FPGA, and finally connects to the
other device via a trace on the printed circuit board (PCB). Because there is more total delay
in this clock path, the resulting skew, Δc, is also larger.
Make It Go Away!
Is there a way to eliminate clock skew? Fortunately, a DCM provides such capabilities.
Figure 3-17 shows the same example design as Figure 3-16, except this time implemented
in a Spartan-3 generation FPGA. Two DCMs eliminate the clock skew: one DCM
eliminates the skew for clocked items within the FPGA, the other DCM eliminates the
skew when clocking the other device on the board. The result is practically ideal alignment
between the clock at Points (A), (B), and (C)!
How is clock skew elimination accomplished? Remember, clock skew is caused by the
delay in the clock path. In Figure 3-17, the clock at Point (B) was skewed by Δb and the
clock at Point (C) was skewed by Δc. What if there was a way to provide Point (B) with an
early version of the clock, advanced by Δb and a way to provide Point (C) with an early
version of the clock, advanced by Δc? The result would be that all clocks would arrive at
their destinations with perfect clock edge alignment. Such perfect alignment reduces setup
times, shortens clock-to-output delays, and increases overall system performance.
DCM
Other
A C Device on
DCM
Board
A
Delay=T-Δb
B Δb
Delay=T-Δc
C Δc
x462_18_123105
Figure 3-18: Delaying a Fixed Frequency Clock Appears to Predict the Future
The clock period, T, is easy to derive knowing the frequency of the incoming monotonic
clock signal. But what are the clock skew delays Δb and Δc? With careful analysis, they can
be determined after examining the behavior of multiple systems under different
conditions. In reality, this is impractical. Furthermore, the values of Δb and Δc are different
between devices and vary with temperature and voltage on the same device.
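The arithmetic behind Figure 3-18 can be sketched with a short worked example. For a fixed-frequency clock, delaying the signal by T minus the path skew makes the total delay exactly one period, which is indistinguishable from a clock with no skew. The function names and the numeric values below are illustrative assumptions only.

```python
def dll_inserted_delay(period_ns, skew_ns):
    """Delay to insert so that inserted delay + path skew = one period."""
    if not 0 <= skew_ns < period_ns:
        raise ValueError("this sketch assumes 0 <= skew < period")
    return period_ns - skew_ns


def arrival_offset(period_ns, skew_ns):
    """Offset of the clock edge at the destination, modulo one period."""
    total_delay = dll_inserted_delay(period_ns, skew_ns) + skew_ns
    return total_delay % period_ns


# Assumed example: T = 10 ns and a path skew of 1.5 ns.
print(dll_inserted_delay(10.0, 1.5))  # inserted delay in ns
print(arrival_offset(10.0, 1.5))      # residual offset in ns
```

In a real device the skew is not known in advance, which is why the DCM measures it continuously through a feedback loop instead of using a precomputed value.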
Instead of attempting to determine the Δb and Δc delays in advance, the Spartan-3 FPGA
DCM employs a DLL that constantly monitors the delay via a feedback loop, as shown in
Figure 3-17. In this particular example, two DCMs are required—one to compensate for the
clock skew to internal signals and another to compensate for the skew to external devices,
each with their own clock feedback loop. The DLL constantly adapts to subtle changes
caused by temperature and voltage.
Locked on Target
In order to determine and insert the correct delay, the DCM samples up to several
thousand clock cycles. Once the DCM inserts the correct delay, the DCM asserts its
LOCKED output signal.
Do not use the DCM clock outputs until the DCM asserts its LOCKED signal. Until the
DCM locks onto the input clock signal, the output clocks are invalid. While the DCM
attempts to lock onto the clock signal, the output clocks can exhibit glitches, spikes, or
other spurious movements.
In an application, the LOCKED signal qualifies the output clock. Think of LOCKED as a
“clock signal good” indicator.
the pad, through the global buffer, to the DCM is eliminated from the deskewed output.
Other paths are possible, however, as shown in Table 3-22. The signal driving the CLKIN
input can also originate from a general-purpose input pin (IBUF primitive) via general-purpose
interconnect, from a global buffer input (IBUFG), or from a global buffer multiplexer
(BUFGMUX, BUFGCE). Similarly, an LVDS clock input can provide the clock signal. The
deskew logic is characterized for a single-ended clock input such as LVCMOS or LVTTL.
Differential signals might incur a slight amount of phase error due to I/O timing. See the
corresponding FPGA data sheet for specific I/O timing differences.
Global Clock Buffer (BUFG, BUFGCE, or BUFGMUX): A global clock buffer, using either a BUFG, BUFGCE, or BUFGMUX primitive, is a preferred source for an internally generated clock to the DCM. The delay through the global buffer is characterized, and this delay is removed from the deskewed clock output. When using BUFGCE or BUFGMUX, the input clock might change frequency or stop, depending on the design. The DCM should be reset after enabling a BUFGCE or changing inputs on a BUFGMUX. Also see “Momentarily Stopping CLKIN,” page 152.
Via general-purpose I/O (IBUF): Any user-I/O pin, IBUF, becomes an alternate source for an external clock. The pad-to-DCM delay cannot be predetermined due to the numerous potential input paths, and consequently, the delay is not compensated by the DCM.
Derived from internal logic: Logic within the FPGA also can be the clock source. Again, the logic-to-DCM delay cannot be predetermined and it is not compensated by the DCM.
(or BUFGMUX,
or BUFGCE)
IBUFG BUFG
Clock to
I O I O internal
CLKIN CLK0
(or CLK2X) FPGA logic
(alternate clock inputs DCM or DCM_SP
possible, but not fully
skew adjusted) CLKFB LOCKED “Clock Good”
(Internal Feedback)
UG331_c3_23_022407
SRL16
D Q RESET Feedback path delay
WCLK must match the
forward path delay to
A[3:0] guarantee skew
Recommended elimination
INIT=000F
The LOCKED signal indicates when the DCM achieves lock, qualifying the clock signal.
The LOCKED signal can enable external devices or an inverted version can connect to an
active-Low chip enable.
Why Reset?
Why is this extra reset pulse required? For an optimum locking process, a DCM configured
with external feedback requires both the CLKIN and either the CLK0 or CLK2X signals to
be present and stable when the DCM begins to lock. During the configuration process, the
external feedback, CLKFB, is not available because the FPGA’s I/O buffers are not yet
active.
At the end of configuration, the DCM begins the capture process once the device enters the
startup sequence. Because the FPGA’s global 3-state signal (GTS) still is asserted at this
time, any output pins remain in a 3-state (high-impedance, floating) condition.
Consequently, the CLKFB signal is in an unknown logic state.
When CLKFB eventually appears after the GTS is deasserted, the DCM proceeds to
capture. However, without the reset pulse, the DCM might not lock at the optimal point,
which potentially introduces slightly more jitter and greater clock cycle latency through
the DCM.
Without the reset, another possible issue might occur if the CLKFB signal, while in the
3-state condition, cross-couples with another signal on the board due to a printed-circuit
board signal integrity problem. The DCM might sense this invalid cross-coupled signal as
CLKFB and use it to proceed with a lock. This possibly prevents the DCM from properly
locking once the GTS signal deasserts and the true CLKFB signal appears.
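The SRL16-based reset generator referenced above (INIT=000F) can be modeled behaviorally to see the pulse it produces. This is a sketch under stated assumptions: the read address of 15 and the D input tied Low are assumed here because the figure is not reproduced in this text, and the function name is hypothetical.

```python
def srl16_outputs(init=0x000F, address=15, d_in=0, clocks=20):
    """Model a 16-bit shift register look-up table (SRL16).

    Q reads the register bit selected by address. On every clock the
    contents shift up by one position and d_in enters at bit 0.
    Entry n of the returned list is Q after n clock edges.
    """
    state = init & 0xFFFF
    outputs = []
    for _ in range(clocks):
        outputs.append((state >> address) & 1)
        state = ((state << 1) | (d_in & 1)) & 0xFFFF
    return outputs


q = srl16_outputs()
print(q)  # Low for 12 clocks, High for 4 clocks, then Low
```

With these assumed settings, the four ones loaded by INIT=000F emerge as a four-clock reset pulse after a twelve-clock delay, and the output then stays Low.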
CLKOUT Clock
CLKIN Variable Distribution
Delay Line Network
Control
CLKFB
x462_21_061903
CLKOUT Clock
Voltage Controlled Distribution
Oscillator Network
CLKIN Control
CLKFB
x462_22_061903
Implementation
A DLL or PLL is assembled using either analog or digital circuitry; each approach has its own advantages. An
analog implementation with careful circuit design produces a DLL or PLL with a finer timing resolution.
Additionally, analog implementations sometimes consume less silicon area.
Conversely, digital implementations offer advantages in noise immunity, lower power consumption and better
jitter performance. Digital implementations also provide the ability to stop the clock, facilitating power
management. Analog implementations can require additional power supplies, require close control of the power
supply, and pose problems in migrating to new process technologies.
Skew Adjustment
Most of this section discusses how to remove skew and how to phase align an internal or
external clock to the clock source. In actuality, the DCM purposely adds a small amount of
skew via an advanced attribute called DESKEW_ADJUST. In Clocking Wizard, the
DESKEW_ADJUST attribute is controlled via the Advanced Options window.
There are two primary applications for this attribute, SYSTEM_SYNCHRONOUS and
SOURCE_SYNCHRONOUS. The overwhelming majority of applications use the default
SYSTEM_SYNCHRONOUS setting. The purpose of each mode is described below.
System Synchronous
In a System Synchronous design, all devices within a data path share a common clock
source, as shown in Figure 3-23. This is the traditional and most-common system
configuration. The SYSTEM_SYNCHRONOUS option, which is the default value, adds a
small amount of clock delay so that there is zero hold time when capturing data. Hold time
is essentially the timing difference between the best-case data path and the worst-case
clock path. The DCM’s clock skew elimination function advances the clock, which
dramatically shortens the worst-case clock path. However, if the clock path is advanced
so far that the clock appears before the data, then a positive hold time requirement results. The
SYSTEM_SYNCHRONOUS setting injects enough additional skew on the clock path to
guarantee zero hold times, but at the expense of a slightly longer clock-to-output time.
DATA_OUT DATA_IN
Clock
Source
x462_23_061903
Source Synchronous
SOURCE_SYNCHRONOUS mode is an advanced setting, used primarily in high-speed
data communications interfaces. In Source Synchronous applications, both the data and
the clock are derived from the same clock source, as shown in Figure 3-24. The transmitting
device sends both data and clock to the receiving device. The receiving device then
adjusts the clock timing for best data reception. High-speed Dual-Data Rate (DDR) and
LVDS connections are examples of such systems.
DATA_OUT DATA_IN
Clock
Source DATA_CLK
x462_24_061903
Similarly, the following application note delves into more details on system-level timing.
Although the application note is written for the Virtex-II and Virtex-II Pro FPGA
architectures, most of the concepts apply directly to Spartan-3 FPGAs.
• XAPP259: System Interface Timing Parameters
http://www.xilinx.com/support/documentation/application_notes/xapp259.pdf
Timing Comparisons
Figure 3-25 compares the effect of both SYSTEM_SYNCHRONOUS and
SOURCE_SYNCHRONOUS settings using a Dual-Data Rate (DDR) application. In DDR
applications, two data bits appear on each data line—one during the first half-period of the
clock, the second during the second half-period.
In SYSTEM_SYNCHRONOUS mode, a small amount of skew is purposely added to the
DCM clock path so that there is zero hold time.
In SOURCE_SYNCHRONOUS mode, no additional skew is inserted into the DCM clock
path. However, the FPGA application must insert additional skew or phase shifting so that
the clock appears at the ideal location in the data window.
DATA_IN
SYSTEM_SYNCHRONOUS
SOURCE_SYNCHRONOUS
SOURCE_SYNCHRONOUS
+ Fixed or Dynamic Phase Shift
x462_25_061903
Clock Conditioning
Clock conditioning is a function where an incoming clock with a duty cycle other than 50%
is reshaped to have a 50% duty cycle. Figure 3-26 shows an example where an incoming
clock, with roughly a 45% High time and a 55% Low time (45%/55% duty cycle), is
reshaped into a nearly perfect 50% duty cycle—nearly perfect because there is some
residual duty-cycle distortion specified by the CLKOUT_DUTY_CYCLE_DLL and
CLKOUT_DUTY_CYCLE_FX values in the applicable FPGA family data sheet. The DCM
itself adds little to no distortion. Most of the distortion is caused by the difference in rise
and fall times in the internal routing and clock networks. The distortion is estimated at
100 ps to 400 ps, depending on the device.
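To put the 100 ps to 400 ps estimate in context, the distortion can be expressed as a fraction of the clock period. The helper below is a simple arithmetic sketch with a hypothetical name; it only converts units and does not model how the distortion is distributed between the rising and falling edges.

```python
def distortion_percent_of_period(f_clk_mhz, distortion_ps):
    """Express a duty-cycle distortion (in ps) as a percentage of the period."""
    period_ps = 1e6 / f_clk_mhz  # 1 MHz corresponds to a 1,000,000 ps period
    return 100.0 * distortion_ps / period_ps


# Assumed example: the estimated range evaluated at 100 MHz and 250 MHz.
for f_mhz in (100.0, 250.0):
    low = distortion_percent_of_period(f_mhz, 100.0)
    high = distortion_percent_of_period(f_mhz, 400.0)
    print(f"{f_mhz} MHz: {low:.1f}% to {high:.1f}% of the period")
```

The same absolute distortion consumes a larger share of the period as the clock frequency rises, which is why it matters most for high-speed interfaces.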
Clock Conditioning
45% 55%
CLKIN
Conditioned
Clock Output
50% 50%
UG331_c3_04_120206
Figure 3-26: DCM Duty-Cycle Correction Feature Provides 50% Duty Cycle Outputs
Clocks with 50% duty cycle are mandatory for high-speed communications interfaces such
as LVDS or Dual-Data Rate (DDR) and for clock forwarding or clock mirroring
applications. See “Dual-Data Rate (DDR) Clocking Example.”
Table 3-23: Spartan-3 FPGA Family: Clock Outputs with Conditioned 50% Duty Cycle
DCM Clock Output | 50% Duty Cycle Output, DLL_FREQUENCY_MODE = LOW | 50% Duty Cycle Output, DLL_FREQUENCY_MODE = HIGH
-----------------|--------------------------------------------------|--------------------------------------------------
CLK0, CLK180     | When DUTY_CYCLE_CORRECTION attribute set to TRUE | When DUTY_CYCLE_CORRECTION attribute set to TRUE
CLK90, CLK270    | When DUTY_CYCLE_CORRECTION attribute set to TRUE | Outputs not available
CLK2X, CLK2X180  | Always                                           | Outputs not available
CLKDV            | Always                                           | When CLKDV_DIVIDE attribute is an integer value
CLKFX, CLKFX180  | Always                                           | Always
The Quadrant Phase Shifted Outputs, CLK0, CLK90, CLK180, and CLK270 have optional
clock conditioning, controlled by the DUTY_CYCLE_CORRECTION attribute. By default,
the DUTY_CYCLE_CORRECTION attribute is set to TRUE, meaning that these outputs
are conditioned to a 50% duty cycle. Setting this attribute to FALSE disables the
clock-conditioning feature, in which case the affected clock outputs have roughly the same duty
cycle as the incoming clock. Exact replication of the CLKIN duty cycle is not guaranteed.
The Half-Period Phase Shift outputs are ideal for duty-cycle critical applications such as
high-speed Dual-Data Rate (DDR) designs and clock mirrors. The Half-Period Phase Shift
output pairs provide two clocks, one with a rising edge at the beginning of the clock
period, and another rising edge precisely aligned at half the clock period, as shown in
Figure 3-27.
Delay (fraction of
clock period) 0 ½T 1T
Phase Shift (degrees) 0˚ 180˚ 360˚
CLKx
CLKx180
ODDR2
D0 Q
D1
DCM or DCM_SP
BUFG CE
CLKIN CLKx C0
C1
CLKx
(50% duty cycle)
Duty-cycle distortion
CLKx at Flip-Flop
(with duty-cycle distortion)
Factor in distortion
when using a single,
inverted clock
UG331_c3_25_120306
Figure 3-28: Dual-Data Rate (DDR) Output Using Both Edges of a Single Clock
Induces Duty-Cycle Distortion
Figure 3-29 shows a slightly modified circuit compared to Figure 3-28. In this case, the
DCM provides both a non-shifted and a 180° phase-shifted output to the DDR output flip-
flop. The CLKx clock signal precisely triggers the DDR flip-flop’s C0 input at the start of
the clock period. Similarly, the CLKx180 clock signal precisely triggers the DDR flip-flop’s
C1 input halfway through the clock period. The cost of this approach is an additional
global buffer and global clock line, but it can reduce the duty-cycle
distortion by approximately 300 ps.
ODDR2
D0 Q
D1
DCM or DCM_SP
BUFG CE
CLKIN CLKx C0
CLKx180 C1
BUFG
CLKx
(50% duty cycle)
CLKx at Flip-Flop
(with duty-cycle distortion)
180 °
Phase Shift
CLKx180 at Flip-Flop
(with duty-cycle distortion)
UG331_c3_24_120
Figure 3-29: Using Half-Period Phase Shift Outputs Reduces Potential Duty-Cycle
Distortion
Table 3-26 shows the specified duty-cycle distortion values as measured using DDR output
flip-flops and LVDS outputs. There might be additional distortion on other output types
caused by asymmetrical rise and fall times, which can be simulated using IBIS.
When using the DCM to generate high speed clocks to drive the double data rate ODDR2,
BUFGMUX_X1Y1 is recommended for CLKFX and BUFGMUX_X2Y0 is recommended for
CLKFX180 to minimize period jitter.
Delay (fraction of
clock period) 0 ¼T ½T ¾T 1T
Phase Shift (degrees) 0˚ 90˚ 180˚ 270˚ 360˚
CLK0
CLK90
CLK180
CLK270
Figure 3-30: Quadrant Phase Shift Outputs Shift CLKIN, Each by a Quarter Period
(Shown with Duty-Cycle Correction Enabled)
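Table 3-27: Quadrant Phase Shift Output Availability by DLL Frequency Mode
Output | Spartan-3 FPGAs: DLL_FREQUENCY_MODE = LOW; Spartan-3E and Extended Spartan-3A family FPGAs: CLKIN ≤ 167 MHz | Spartan-3 FPGAs: DLL_FREQUENCY_MODE = HIGH; Spartan-3E and Extended Spartan-3A family FPGAs: CLKIN > 167 MHz
-------|-----------|--------------
CLK0   | Available | Available
CLK90  | Available | Not available
CLK180 | Available | Available
CLK270 | Available | Not available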
2. Variable Fine Phase Shift mode has an initial phase shift value, similar to Fixed Fine
Phase Shift, which is set during FPGA configuration. However, the phase shift value
can be changed by the application after the DCM’s LOCKED output goes High.
Note: There are important differences between the Variable phase shift feature on Spartan-3
FPGAs and that found on Spartan-3E and Extended Spartan-3A family FPGAs. See “Important
Differences Between Spartan-3 Generation FPGA Families,” page 123.
0°
270° 90°
1
of CLKIN clock period
256
(=1.40625°)
180°
0° 90° 180° 270° 0°
0 64 128 192 255
PHASE_SHIFT =
CLKIN
Clock Outputs
The minimum and maximum limits of the PHASE_SHIFT attribute depend on two values.
1. The period of the CLKIN input, TCLKIN, measured in nanoseconds.
2. For Spartan-3 family FPGAs, FINE_SHIFT_RANGE defines the maximum guaranteed
delay achievable by the phase shift delay line. The actual delay line within a given
device can be longer, but only the delay up to FINE_SHIFT_RANGE is guaranteed.
The Extended Spartan-3A family does not have a FINE_SHIFT_RANGE limit for fixed
phase shifting.
Using these two values, calculate the SHIFT_DELAY_RATIO using Equation 3-1. The
limits for the PHASE_SHIFT attribute are different, depending on whether the result is less
than or if it is greater than or equal to one.
$$\text{SHIFT\_DELAY\_RATIO} = \frac{\text{FINE\_SHIFT\_RANGE}}{T_{CLKIN}} \qquad \text{(Equation 3-1)}$$
SHIFT_DELAY_RATIO < 1
If the Spartan-3 FPGA clock period is longer than the specified FINE_SHIFT_RANGE, then
the SHIFT_DELAY_RATIO < 1, meaning that maximum fine phase shift is limited by
FINE_SHIFT_RANGE. When SHIFT_DELAY_RATIO < 1, then the PHASE_SHIFT limits
are set according to Equation 3-2:
$$\text{PHASE\_SHIFT}_{LIMITS} = \pm\,\text{INTEGER}\left(256 \cdot \frac{\text{FINE\_SHIFT\_RANGE}}{T_{CLKIN}}\right) \qquad \text{(Equation 3-2)}$$
For example, assume that FCLKIN is 75 MHz (TCLKIN = 13.33 ns) and FINE_SHIFT_RANGE
is 10.00 ns. In this case, the PHASE_SHIFT value is limited to ±191.
Consequently, the phase shift value when SHIFT_DELAY_RATIO < 1 is shown by
Equation 3-3. To determine the phase shift resolution, set PHASE_SHIFT = 1.
$$T_{PhaseShift} = \left(\frac{\text{PHASE\_SHIFT}}{\text{PHASE\_SHIFT}_{LIMITS}}\right) \cdot \text{FINE\_SHIFT\_RANGE} \qquad \text{(Equation 3-3)}$$
SHIFT_DELAY_RATIO ≥ 1
By contrast, if the Spartan-3 FPGA clock period is shorter than the specified
FINE_SHIFT_RANGE, then the SHIFT_DELAY_RATIO ≥ 1, meaning that maximum fine
phase shift is limited to ±255.
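The two cases can be combined into one small calculation. The sketch below follows Equation 3-1 and Equation 3-2 as written, treating INTEGER as truncation; the function names and the example periods are illustrative assumptions, and results near a boundary depend on how the period is rounded. The second helper converts a PHASE_SHIFT value to degrees using the 1/256-of-a-period unit.

```python
def fixed_phase_shift_limit(t_clkin_ns, fine_shift_range_ns):
    """Return the magnitude of the PHASE_SHIFT limit for Fixed mode."""
    ratio = fine_shift_range_ns / t_clkin_ns  # Equation 3-1
    if ratio >= 1:
        return 255                            # SHIFT_DELAY_RATIO >= 1
    return int(256 * ratio)                   # Equation 3-2 (truncating)


def phase_shift_degrees(phase_shift):
    """Convert a PHASE_SHIFT value to degrees (one unit = 360/256 degrees)."""
    return phase_shift * 360.0 / 256


# Assumed examples with FINE_SHIFT_RANGE = 10 ns.
print(fixed_phase_shift_limit(20.0, 10.0))  # ratio < 1
print(fixed_phase_shift_limit(8.0, 10.0))   # ratio >= 1
print(phase_shift_degrees(64))              # a quarter period
```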
Clocking Wizard
To use Fixed Phase Shift mode, select Fixed in the Phase Shift section of Clocking Wizard’s
General Setup panel, shown in Figure 3-33. This action sets the CLKOUT_PHASE_SHIFT
attribute to FIXED.
Enter the phase shift Value, which must be an integer within the limits described above.
This action sets the PHASE_SHIFT attribute value. Clocking Wizard checks that the phase
shift value is within the limits.
Select FIXED
Enter Fixed
phase shift
Value Allowable
phase shift
range at
frequency
Equivalent phase
shift for specified
Value
UG331_c3_13_120206
Table 3-30: FIXED and VARIABLE Phase Shift Implementations by Spartan-3 Generation FPGA Family
Feature | Spartan-3 FPGA | Spartan-3E FPGA | Extended Spartan-3A Family FPGA
--------|----------------|-----------------|--------------------------------
FIXED Phase Shift increment or decrement unit | 1/256th of CLKIN period | 1/256th of CLKIN period | 1/256th of CLKIN period
FIXED Phase Shift measurement unit | Degrees | Degrees | Degrees
VARIABLE Phase Shift control mechanism | PSEN, PSINCDEC, PSCLK, and PSDONE signals on the DCM | Same | Same
VARIABLE Phase Shift increment or decrement unit | 1/256th of CLKIN period (1.40625°) | DCM_DELAY_STEP, between 20 and 40 ps | DCM_DELAY_STEP, between 15 and 35 ps
Figure showing VARIABLE Phase Shift logic | Figure 3-31, page 120 | Figure 3-34, page 124 | Figure 3-34, page 124
VARIABLE Phase Shift equation | Equation 3-5 | Equation 3-7, Equation 3-8 | Equation 3-7, Equation 3-8
VARIABLE Phase Shift measurement unit | Degrees | Time | Time
Does Phase Shift, measured in degrees, change with CLKIN input frequency? | No | Yes | Yes
Does Phase Shift, measured in time, change with CLKIN input frequency? | Yes | No | No
On Spartan-3E and Extended Spartan-3A family FPGAs, however, a Variable Phase Shift
operation results in a delay change, not a phase change. The phase shift is implemented by
cascaded delay elements, as shown in Figure 3-34. The delay of each DCM_DELAY_STEP element
falls between the minimum and maximum values shown in Table 3-29, page 122.
Consequently, the actual amount of phase shift time added to the clock outputs ranges
between the cumulative minimum and maximum delay through all the selected elements.
This time is relatively constant and does not change with the CLKIN frequency. The
corresponding phase shift, measured in degrees, does change with frequency.
MAX_STEPS
Time
DCM_DELAY_STEP
Phase Shift
Time > Phase Shift DCM_DELAY_STEP_MIN
< Phase Shift DCM_DELAY_STEP_MAX
Time
Phase = 360°
T CLKIN UG331_c3_20_120306
Figure 3-34: Spartan-3E and Extended Spartan-3A Family FPGA Variable Phase
Shift Logic
Based on the results from Equation 3-7 and Equation 3-8, the resulting phase shift,
measured in degrees, is determined from Equation 3-9. TCLKIN is the period of the CLKIN
input.
$$\text{PHASE SHIFT} = \frac{T_{VariableShift}}{T_{CLKIN}} \cdot 360^{\circ} \qquad \text{(Equation 3-9)}$$
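A short calculation shows how the same number of delay steps maps to different phase angles at different CLKIN frequencies. This sketch assumes that the minimum and maximum shift times are the step count multiplied by the minimum and maximum DCM_DELAY_STEP values, which matches the description of cumulative delay above. The default step values are the Spartan-3E figures from Table 3-30, and the function name is hypothetical.

```python
def variable_shift_phase_range(steps, t_clkin_ns, step_min_ps=20.0, step_max_ps=40.0):
    """Return (min_degrees, max_degrees) for a number of delay steps.

    Assumes shift time = steps * DCM_DELAY_STEP, then applies
    Equation 3-9: phase = shift time / T_CLKIN * 360 degrees.
    """
    t_a_ns = steps * step_min_ps / 1000.0
    t_b_ns = steps * step_max_ps / 1000.0
    phases = (360.0 * t_a_ns / t_clkin_ns, 360.0 * t_b_ns / t_clkin_ns)
    return tuple(sorted(phases))  # keeps (min, max) order for negative steps


# The same 10 steps at 100 MHz (10 ns) and at 50 MHz (20 ns).
print(variable_shift_phase_range(10, 10.0))
print(variable_shift_phase_range(10, 20.0))
```

The shift time is the same in both calls, but the phase angle halves when the clock period doubles.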
Operation
Use the phase shift control inputs to adjust the current phase shift value, as shown in
Figure 3-35. The rising edge of PSCLK synchronizes all Variable Phase Shift operations. A
valid operation starts by asserting the PSEN enable input for one and only one PSCLK
clock period. Asserting PSEN for more than one rising PSCLK clock edge might cause
undesired behavior.
PSCLK
PSEN
PSINCDEC
0 = Decrement phase shift
1 = Increment phase shift
PSDONE
Operation complete.
Okay to start new
operation.
STATUS[0]
(Variable Phase
Shift Overflow)
If phase shift incremented or
decremented to limit value,
STATUS[0] stays High until new
operation shifts away from limit.
UG331_c3_29_100509
The value on the PSINCDEC increment/decrement control input determines the phase
shift direction. When PSINCDEC is High, the present Variable Phase Shift value is
incremented by one unit. Similarly, when PSINCDEC is Low, the present Variable Phase
Shift value is decremented by one unit.
The actual phase shift operation timing varies and the operation completes when the DCM
asserts the PSDONE output High for a single PSCLK clock period. From the PSEN pulse
until PSDONE is asserted, the DCM output clocks slide, bit by bit, from their original
phase shift value to their new phase shift value. During this time, the DCM remains locked
on the incoming clock and continues to assert its LOCKED output.
The phase adjustment might require as many as 100 CLKIN cycles plus 3 PSCLK cycles to
take effect, at which point the DCM’s PSDONE output goes High for one PSCLK cycle.
This pulse indicates that the PS unit completed the previous adjustment and is now ready
for the next request.
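As a rough illustration of the bound stated above, the following Python sketch converts "100 CLKIN cycles plus 3 PSCLK cycles" into nanoseconds. The function name and the example frequencies (100 MHz CLKIN, 50 MHz PSCLK) are assumptions chosen for illustration, not values from this guide.

```python
def worst_case_phase_shift_latency_ns(f_clkin_mhz: float, f_psclk_mhz: float) -> float:
    """Upper bound for one adjustment: 100 CLKIN cycles plus 3 PSCLK cycles."""
    t_clkin_ns = 1000.0 / f_clkin_mhz   # CLKIN period in ns
    t_psclk_ns = 1000.0 / f_psclk_mhz   # PSCLK period in ns
    return 100 * t_clkin_ns + 3 * t_psclk_ns


if __name__ == "__main__":
    # Hypothetical clocks: 100 MHz CLKIN (10 ns) and 50 MHz PSCLK (20 ns).
    print(worst_case_phase_shift_latency_ns(100.0, 50.0), "ns")
```

An application that polls rather than waits for PSDONE could use such a bound as a timeout.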
To enable Dynamic Fine Phase Shift mode, set the CLKOUT_PHASE_SHIFT attribute to
VARIABLE. The PHASE_SHIFT attribute value sets the initial phase shift location,
established after FPGA configuration. The FPGA application can then dynamically adjust
the skew or phase shift on the DCM’s output clocks after the DCM's LOCKED output goes
High. If the DCM is reset, the PHASE_SHIFT value reverts to its initial configuration value.
(Variable Phase Shift ports. Inputs: PSEN, Enable; PSINCDEC, Increment/Decrement; PSCLK, Phase Shift Clock. Outputs: PSDONE, Phase Shift Done; STATUS[0], Variable Phase Shift Overflow.)
The maximum dynamic fine phase shift value is limited by FINE_SHIFT_RANGE, the
maximum delay tap value. The Variable Phase Shift limits are set according to
Equation 3-10.
$$DynamicPhaseShift_{LIMITS} = \pm\,\mathrm{INTEGER}\left(128 \cdot \frac{FINE\_SHIFT\_RANGE}{T_{CLKIN}}\right) \qquad \text{(Equation 3-10)}$$
For example, assume that FCLKIN is 75 MHz (TCLKIN = 13.33 ns) and FINE_SHIFT_RANGE
is 10.00 ns. In this case, the Variable Phase Shift value is limited to ±96.
The Variable Phase Shift value is shown by Equation 3-11. To determine the Variable Phase
Shift resolution, set Variable Phase Shift = 1.
$$T_{PhaseShift} = \left(\frac{DynamicPhaseShift}{DynamicPhaseShift_{LIMITS}}\right) \cdot FINE\_SHIFT\_RANGE \qquad \text{(Equation 3-11)}$$
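Equation 3-10 and Equation 3-11 can be checked numerically. The sketch below is illustrative only: the function names are made up, INTEGER() is assumed to truncate toward zero, and integer picoseconds and megahertz are used so that the division is exact.

```python
def dynamic_phase_shift_limit(fine_shift_range_ps: int, f_clkin_mhz: int) -> int:
    """Equation 3-10 magnitude: INTEGER(128 * FINE_SHIFT_RANGE / T_CLKIN).

    T_CLKIN in ps is 1_000_000 / f_clkin_mhz, so the ratio is computed with
    integer arithmetic (truncating division is an assumption about INTEGER).
    """
    return (128 * fine_shift_range_ps * f_clkin_mhz) // 1_000_000


def phase_shift_time_ps(steps: int, limit: int, fine_shift_range_ps: int) -> float:
    """Equation 3-11: time shift for a given Variable Phase Shift value."""
    return steps / limit * fine_shift_range_ps


if __name__ == "__main__":
    limit = dynamic_phase_shift_limit(10_000, 75)   # 10.00 ns range, 75 MHz CLKIN
    print("limit: +/-", limit)
    # Resolution: set the Variable Phase Shift value to 1.
    print("resolution (ps):", phase_shift_time_ps(1, limit, 10_000))
```

With the 75 MHz example from the text, the limit evaluates to ±96, which gives a resolution of roughly 104 ps per step.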
For example, assume that the CLKIN clock entering the DCM is 100 MHz, which equates
to a clock period of TCLKIN = 10 ns. Using the equation in Table 3-31, the Variable Phase
Shifter is limited to phase shift operations of ±105 steps. On a Spartan-3E FPGA, this
equates to a maximum variable phase shift, measured in time, of between ±2.1 ns and ±4.2 ns.
Measured in degrees, this equates to a maximum of between ±75.6° and ±151.2°.
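The conversion from time to degrees in this example follows Equation 3-9. A minimal sketch (the function name is illustrative) reproduces the figures and also shows that the same time shift corresponds to a different angle at a different CLKIN frequency:

```python
def phase_shift_degrees(t_shift_ns: float, t_clkin_ns: float) -> float:
    """Equation 3-9: phase shift in degrees for a given time shift."""
    return t_shift_ns / t_clkin_ns * 360.0


if __name__ == "__main__":
    # 100 MHz CLKIN (10 ns period), Spartan-3E example from the text.
    print(round(phase_shift_degrees(2.1, 10.0), 1))   # minimum-delay case
    print(round(phase_shift_degrees(4.2, 10.0), 1))   # maximum-delay case
    # The same 2.1 ns shift at a hypothetical 50 MHz CLKIN (20 ns period).
    print(round(phase_shift_degrees(2.1, 20.0), 1))
```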
Controls
As shown in Figure 3-35, page 125 and Figure 3-36, page 126, the DCM’s Variable Phase
Shift control signals allow the FPGA application to adjust the present phase relationship
between the CLKIN input and the DCM clock outputs. Table 3-32 shows the detailed
relationship between control inputs, the current and next phase relationship, how the
operation affects the delay tap, and the control outputs.
Notes:
X = don’t care.
? = indeterminate, depends on current application state.
1* = PSDONE asserted High for one PSCLK period.
-Limit = minimum delay line position.
+Limit = maximum delay line position.
Assert PSEN for only one PSCLK cycle.
When PSEN is Low, the Variable Phase Shifter is disabled and all other inputs are ignored.
All present shift values and the delay line position remain unchanged.
If the delay line has not reached its limits (-Limit or –255 when decrementing, +Limit or
+255 when incrementing), then the FPGA application can change the existing phase shift
value by asserting PSEN High and the appropriate increment/decrement value on
PSINCDEC before the next rising edge of PSCLK. The phase shift value increments or
decrements as instructed. At the end of the operation, PSDONE goes High for a single
PSCLK period indicating that the phase shift operation is complete. STATUS[0] remains
Low because no phase shift overflow condition occurred.
When the DCM is incremented beyond +255 or below –255, the delay line position remains
unchanged at its limit value of +255 or –255 and no phase change occurs. STATUS[0] goes
High, indicating a Variable Phase Shift overflow (not available in Spartan-3E FPGAs).
When a new phase shift operation changes the value in the opposite direction (that is,
away from the limit value), STATUS[0] returns Low.
If the phase shift does not reach +255 or –255, but the phase shift exceeds the delay-line
range—indicated by +Limit and –Limit in Table 3-32—then no phase change occurs.
However, STATUS[0] again goes High. In the Spartan-3 and Extended Spartan-3A families
only, the STATUS[0] output indicates when the delay tap reaches the end of the delay line.
In the FPGA application, however, use the limit value calculated using Equation 3-10. The
calculated delay limit is a guaranteed value. A specific device, due to processing, voltage,
or temperature, might have a longer line delay, but this cannot be guaranteed from device
to device. The phase shift value, but not the delay line position, continues to increment
or decrement until it reaches its +255 or –255 limit. When a new phase shift operation
changes the value in the opposite direction (that is, away from the limit value), the
STATUS[0] signal returns Low. The phase shift value is incremented or decremented back
to a value that corresponds to a valid absolute delay in the delay line.
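The control behavior described in this section can be approximated with a small behavioral model. This is a deliberately simplified sketch, not a description of the DCM hardware: the class and method names are hypothetical, a single `limit` stands in for both the ±255 counter limit and the calculated delay-line limit, and PSDONE latency is not modeled.

```python
class VariablePhaseShifterModel:
    """Simplified model of the Variable Phase Shift counter and STATUS[0]."""

    def __init__(self, limit: int = 255, initial: int = 0):
        self.limit = limit          # assumed symmetric limit (+limit / -limit)
        self.value = initial        # present Variable Phase Shift value
        self.overflow = False       # models STATUS[0]

    def clock(self, psen: bool, psincdec: bool) -> bool:
        """Apply one rising PSCLK edge; return True if the value changed."""
        if not psen:
            return False            # PSEN Low: shifter disabled, nothing changes
        target = self.value + (1 if psincdec else -1)
        if abs(target) > self.limit:
            self.overflow = True    # request beyond the limit: no phase change
            return False
        self.value = target
        self.overflow = False       # moving within range clears the overflow flag
        return True


if __name__ == "__main__":
    dcm = VariablePhaseShifterModel(limit=2)
    for _ in range(3):
        dcm.clock(psen=True, psincdec=True)
    print(dcm.value, dcm.overflow)   # held at the limit, overflow flagged
    dcm.clock(psen=True, psincdec=False)
    print(dcm.value, dcm.overflow)   # moved away from the limit
```

The model asserts the overflow flag only when a request would exceed the limit, following the prose above. Check the exact STATUS[0] timing against Table 3-32 for the target family.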
Clocking Wizard
The Variable Phase Shift options are part of the Clocking Wizard’s General Setup panel,
shown in Figure 3-37. To enable dynamic fine phase shifting, select VARIABLE, as shown
in Figure 3-37. Enter an initial Phase Shift Value in the text box provided. The initial value
behaves exactly like the Fixed Fine Phase Shifting mode described above.
Figure 3-37: Selecting Variable Fine Phase Shift Mode in Clocking Wizard
Choosing Variable mode also enables the Variable Phase Shift control signals, PSEN,
PSINCDEC, PSCLK, and PSDONE. For the Spartan-3 family, check the STATUS output box
to enable the STATUS[0] signal. STATUS[0] indicates when the Variable Phase Shifter
reaches its maximum or minimum limit value (not available in Spartan-3E family).
Example Applications
See application note XAPP268 for an example of how to use the Variable Phase Shift
function to perform dynamic phase alignment.
• XAPP268: Dynamic Phase Alignment
http://www.xilinx.com/support/documentation/application_notes/xapp268.pdf
(DCM or DCM_SP clock output diagram, summarized:)
Deskewed Clock: CLK0, F = FCLKIN
Clock Doubler: CLK2X and CLK2X180, F = 2 × FCLKIN
Clock Divider: CLKDV, F = FCLKIN / CLKDV_DIVIDE; usually 50% duty cycle, depending on conditions
Frequency Synthesizer: CLKFX and CLKFX180, F = FCLKIN × CLKFX_MULTIPLY / CLKFX_DIVIDE; 50% duty cycle
Clock Feedback Loop: A clock feedback loop to CLKFB, through the clock distribution delay, is required when using the CLK0, CLK2X, CLK2X180, or CLKDV outputs. Use only CLK0 or CLK2X as the feedback source. Feedback is not required when using only the CLKFX or CLKFX180 outputs.
All the frequency synthesis outputs, except CLKDV, always have a 50/50 duty cycle.
CLKDV usually has a 50% duty cycle except when dividing by a non-integer value at high
frequency, as shown in Table 3-37. The Clock Doubler (CLK2X, CLK2X180) circuit is not
available at high frequencies.
All the DCM clock outputs, except CLKFX and CLKFX180, are generated by the DCM’s
Delay-Locked Loop (DLL) unit and consequently require some form of clock feedback to
the CLKFB pin. The DCM’s Digital Frequency Synthesizer (DFS) unit generates the CLKFX
and CLKFX180 clock outputs. If the application uses only the CLKFX or CLKFX180
outputs, then the feedback path can be eliminated, which also extends the DCM’s
operating range. The Frequency Synthesizer has a feedback path within the DCM, based
on CLKIN.
Output Alignment
If clock feedback is used, then all the output clocks are phase aligned. Obviously, full clock-
edge alignment across all the DCM outputs occurs only occasionally because some of the
outputs are divided clock values. For example, the CLKDV output is aligned to CLKIN
and CLK0 every CLKDV_DIVIDE cycles. Similarly, the CLK2X output is aligned to CLK0
every other clock cycle. The CLKFX output is aligned to CLKIN every CLKFX_DIVIDE
cycles of CLKIN and every CLKFX_MULTIPLY cycles of CLKFX.
Individual outputs are aligned to CLKIN, but when using divided clocks the DCM
arbitrarily picks a rising edge to align to; therefore, the rising edge of the CLKFX output
might not be aligned to the other outputs. For example, a divide-by-two function on
CLKDV and a divide-by-four function on CLKFX could be aligned on a falling edge
instead of a rising edge. To align the rising edges in this case, use CLKIN_DIVIDE_BY_2 on
the input, and use the CLK0 output for the divide-by-two and the CLKDV output (with D
= 2) for the divide-by-four. If this is not possible, the CLKDV output of one DCM can be
cascaded to a second DCM and CLKDV, with D = 2 for both. Also note that the first rising
edge of CLKFX after LOCKED is High is not always the one aligned to the rising edge of
CLK0. For example, if CLKFX is set to a 1.5X multiple of CLK0, the first rising edge of
CLK0 after LOCKED is achieved might be aligned to the falling edge of CLKFX, or it might
be aligned to the rising edge of CLKFX. In this case, you will have alignment on rising
edges at every other CLK0, but not for the very first CLK0 after LOCKED is High.
See Module 3 of DS099, Spartan-3 FPGA Family: Complete Data Sheet for details.
Figure 3-40: Input and Output Clock Frequency Restrictions (Spartan-3 FPGA Family, Low-Frequency
Mode Example)
Table 3-34: DCM Frequency Restriction Examples (Spartan-3 FPGA Family, Low-Frequency Mode Example)

| Input Frequency | Output Frequency | Comments |
|---|---|---|
| 1.2 MHz | 12.8 MHz | Not possible in a single DCM. FCLKIN is within acceptable range for DFS unit, but FCLKFX requires at least an 18 MHz output frequency. |
| 1.2 MHz | 32.4 MHz | Possible in a single DCM using DFS unit. Set CLKFX_MULTIPLY = 27. FCLKFX is within the DFS output frequency range. |
| 25 MHz | 2.5 MHz and 30 MHz | Possible in a single DCM using both the DFS and DLL units. Use the CLKDV output for a 2.5 MHz signal, setting CLKDV_DIVIDE = 10. Use the CLKFX output for a 30 MHz signal, setting CLKFX_MULTIPLY = 6 and CLKFX_DIVIDE = 5. All input and output frequencies are within appropriate ranges. |
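The arithmetic behind the Table 3-34 examples can be reproduced with exact fractions. In this sketch the function names are illustrative, and the only limit encoded is the 18 MHz DFS output minimum quoted in the table. A real design must also check the other data sheet limits, which are not modeled here.

```python
from fractions import Fraction

DFS_MIN_OUTPUT_MHZ = Fraction(18)   # lower CLKFX bound quoted in Table 3-34


def clkfx_mhz(f_clkin_mhz, multiply: int, divide: int = 1) -> Fraction:
    """F_CLKFX = F_CLKIN * CLKFX_MULTIPLY / CLKFX_DIVIDE (exact arithmetic)."""
    return Fraction(f_clkin_mhz) * multiply / divide


def clkdv_mhz(f_clkin_mhz, clkdv_divide) -> Fraction:
    """F_CLKDV = F_CLKIN / CLKDV_DIVIDE."""
    return Fraction(f_clkin_mhz) / Fraction(clkdv_divide)


def meets_dfs_minimum(f_clkfx_mhz) -> bool:
    """True if the requested CLKFX frequency is at or above the quoted minimum."""
    return Fraction(f_clkfx_mhz) >= DFS_MIN_OUTPUT_MHZ


if __name__ == "__main__":
    print(meets_dfs_minimum("12.8"))          # 12.8 MHz is below the DFS minimum
    print(float(clkfx_mhz("1.2", 27)))        # 1.2 MHz x 27
    print(float(clkdv_mhz(25, 10)))           # 25 MHz / 10
    print(float(clkfx_mhz(25, 6, 5)))         # 25 MHz x 6 / 5
```

Passing decimal frequencies as strings, such as "1.2", keeps the fractions exact.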
CLK2X, CLK2X180: On Spartan-3 FPGAs, the CLK2X frequency limits are determined by the DLL_FREQUENCY_MODE attribute.

| FPGA Family | Minimum Frequency | Maximum Frequency |
|---|---|---|
| Spartan-3 FPGA | CLKOUT_FREQ_2X_LF_MIN: 36 MHz | CLKOUT_FREQ_2X_LF_MAX: 334 MHz |
| Spartan-3E FPGA (Stepping 1) | CLKOUT_FREQ_2X_MIN: 10 MHz | CLKOUT_FREQ_2X_MAX: -4: 311 MHz; -5: 334 MHz |

| FPGA Family | Minimum Frequency | Maximum Frequency |
|---|---|---|
| Spartan-3 FPGA | CLKOUT_FREQ_DV_LF_MIN: 1.125 MHz | CLKOUT_FREQ_DV_LF_MAX: 110 MHz |
| Spartan-3 FPGA | CLKOUT_FREQ_DV_HF_MIN: 3.0 MHz | CLKOUT_FREQ_DV_HF_MAX: 185 MHz |
| Spartan-3E FPGA (Stepping 1) | CLKOUT_FREQ_DV_MIN: 0.3125 MHz (312.5 kHz) | CLKOUT_FREQ_DV_MAX: -4: 160 MHz; -5: 183 MHz |
| Extended Spartan-3A FPGA | CLKOUT_FREQ_DV_MIN: 0.3125 MHz (312.5 kHz) | CLKOUT_FREQ_DV_MAX: -4: 166 MHz; -5: 186 MHz |
60%/40% (or 40%/60%) or better duty cycle. Consequently, the CLKDV output, when divided
by 1.5 in high-frequency mode, cannot provide a clock input to a second cascaded DCM.
Table 3-37: CLKDV Duty Cycle with DLL_FREQUENCY_MODE=HIGH

| CLKDV_DIVIDE Attribute | Duty Cycle | High Time/Total Cycle |
|---|---|---|
| Integer | 50.000% | 1/2 |
| 1.5 | 33.333% | 1/3 |
| 2.5 | 40.000% | 2/5 |
| 3.5 | 42.857% | 3/7 |
| 4.5 | 44.444% | 4/9 |
| 5.5 | 45.454% | 5/11 |
| 6.5 | 46.154% | 6/13 |
| 7.5 | 46.667% | 7/15 |
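The rows of Table 3-37 follow a simple pattern: for a half-integer divider D, the high-time fraction is floor(D) / (2 × D). That closed form is inferred from the table values and is not stated in this guide. The sketch below (function name illustrative) regenerates the table from it.

```python
import math
from fractions import Fraction


def clkdv_duty_cycle_high_freq(clkdv_divide) -> Fraction:
    """High time / total cycle for CLKDV with DLL_FREQUENCY_MODE=HIGH.

    Integer dividers give 1/2; half-integer dividers follow the pattern
    floor(D) / (2 * D) inferred from Table 3-37.
    """
    d = Fraction(clkdv_divide)
    if d.denominator == 1:
        return Fraction(1, 2)
    return Fraction(math.floor(d)) / (2 * d)


if __name__ == "__main__":
    for divide in ("1.5", "2.5", "3.5", "4.5", "5.5", "6.5", "7.5", "2"):
        duty = clkdv_duty_cycle_high_freq(divide)
        print(f"{divide:>4}: {duty}  ({float(duty) * 100:.3f}%)")
```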
Clocking Wizard
The Clock Divider controls are in Clocking Wizard’s General Setup window. Check the
CLKDV output box, shown in Figure 3-41a. Then, choose the Clock Divider’s Divide by
Value using the drop-down list, shown in Figure 3-41b.
a. Check the CLKDV Output Box b. Select the Divide by Value from the Drop-Down List
Figure 3-41: Specifying the Clock Divider in Clocking Wizard
frequency as CLKFX but is phase shifted 180°, or half a clock period. Because both
Frequency Synthesizer outputs have 50% duty cycles, CLKFX180 appears to be an inverted
version of CLKFX.
Two attributes, set at design time, control the synthesized output frequency, as shown in
the equation in Table 3-39. The CLKIN clock input is multiplied by the fraction formed by
CLKFX_MULTIPLY as the numerator and CLKFX_DIVIDE as the denominator. For
example, to create a 155 MHz output using a 75 MHz CLKIN input, the Frequency
Synthesizer multiplies CLKIN by the fraction 31/15. Note that it does not multiply CLKIN
by 31 first and then divide the result by 15. Multiplying CLKIN by 31 would result in a
2.325 GHz output frequency, well outside the frequency range of the Spartan-3 FPGA
DCM.
The multiplier and divider values should be reduced to their simplest form, which results
in faster lock times. For example, reduce the fraction 6/8 to 3/4.
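Reducing the ratio to lowest terms is easy to automate. The following sketch uses Python's standard `fractions` module, with an illustrative helper name. It does not check the legal ranges of CLKFX_MULTIPLY and CLKFX_DIVIDE, which must be taken from the data sheet.

```python
from fractions import Fraction


def clkfx_multiply_divide(f_out_mhz: int, f_in_mhz: int) -> tuple[int, int]:
    """Return (CLKFX_MULTIPLY, CLKFX_DIVIDE) as the reduced ratio f_out / f_in."""
    ratio = Fraction(f_out_mhz, f_in_mhz)   # Fraction reduces automatically
    return ratio.numerator, ratio.denominator


if __name__ == "__main__":
    print(clkfx_multiply_divide(155, 75))   # 155 MHz from 75 MHz
    print(clkfx_multiply_divide(6, 8))      # the 6/8 example reduces to 3/4
```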
Frequency synthesis always requires some form of clock feedback. However, the DFS unit
has an internal feedback loop based on CLKIN and does not require a separate loop on
CLKFB if used without the DLL unit.
The CLKFX output is phase aligned with the CLKIN input every CLKFX_DIVIDE cycles of
CLKIN and every CLKFX_MULTIPLY cycles of CLKFX. For example, if
CLKFX_MULTIPLY = 3 and CLKFX_DIVIDE = 5, then the CLKFX output is phase aligned
with the CLKIN input every five CLKIN cycles and every three CLKFX cycles. After the
DCM asserts its LOCKED output, the DFS unit is resynchronized to the CLKIN input at
each concurrence and phase alignment is nearly perfect at these edges.
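The concurrence interval can be worked out from the two attributes. This sketch models ideal clocks only and uses an illustrative function name. It divides out the greatest common divisor, so an unreduced ratio such as 6/8 gives the same interval as 3/4.

```python
from math import gcd


def concurrence_cycles(clkfx_multiply: int, clkfx_divide: int) -> tuple[int, int]:
    """Return (CLKIN cycles, CLKFX cycles) between aligned rising edges.

    Ideal edges coincide when k * T_CLKIN == j * T_CLKFX, that is, when
    k * MULTIPLY == j * DIVIDE; the smallest positive solution is returned.
    """
    g = gcd(clkfx_multiply, clkfx_divide)
    return clkfx_divide // g, clkfx_multiply // g


if __name__ == "__main__":
    print(concurrence_cycles(3, 5))   # example from the text
    print(concurrence_cycles(6, 8))   # same interval as the reduced ratio 3/4
```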
Table 3-39: Frequency Synthesizer Summary

| Item | Description |
|---|---|
| DCM Output(s) | CLKFX; CLKFX180 (same as CLKFX, phase shifted 180°) |
| Output Frequency | $F_{CLKIN} \cdot \frac{CLKFX\_MULTIPLY}{CLKFX\_DIVIDE}$ |
Clocking Wizard
To enable the Frequency Synthesizer in Clocking Wizard, check the CLKFX, CLKFX180, or
both clock outputs in the General Setup window, as shown in Figure 3-42.
If using the CLKFX or CLKFX180 clock outputs stand-alone, then optionally extend the
frequency limits by disabling any DLL clock outputs and any feedback.
• Disable DCM feedback by selecting None, as shown in Figure 3-43. Without feedback,
the CLKFX and CLKFX180 frequency range extends to both lower and higher
frequencies, but the CLK0 and other DLL outputs are disabled.
(Feedback options shown in the dialog: Source: Internal, External, or None; Value: 1X or 2X.)
Finally, enter the desired output frequency or the Multiply and Divide values, as described
in the Clocking Wizard Clock Frequency Synthesizer panel section.
performance, both the clock input and the clock feedback paths require differential global
buffer inputs (IBUFGDS), which unfortunately consumes all the global buffer inputs along
one edge of the device. However, this solution provides the best-quality clock forwarding
solution at high frequencies.
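(Circuit diagram labels, inside the FPGA: IBUFGDS on the clock input to DCM CLKIN; IBUFGDS on the feedback path to DCM CLKFB; DCM outputs CLKx and CLKx180, each through a BUFG, to the C0 and C1 inputs of an OFDDRCPE/ODDR2 element with D0 = VCC, D1 = GND, and a CE input; the Q output drives an OBUFDS.)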
Figure 3-44: High-Frequency (250+ MHz) LVDS Clock Forwarding Circuit with 50% Duty Cycle
Cycle-to-Cycle Jitter
Cycle-to-cycle jitter, also called adjacent cycle jitter, indicates the maximum clock period
variance from one clock cycle to the next, as shown in Figure 3-46. In this simple example,
the maximum change from one cycle to the next is +100 ps and –100 ps, or put simply,
±100 ps. Although the clock period can change by larger absolute amounts when measured
over millions of clock cycles, the clock period never changes by more than ±100 ps from one
clock cycle to the next.
(Example clock periods: T0; T1 = T0 + 100 ps; T2 = T1 − 100 ps)
Cycle-to-cycle jitter is an important measure of the quality of a clock output or oscillator but has
little use in analyzing the timing of an application.
Period Jitter
Period jitter is the summation of all the cycle-to-cycle jitter values over millions of clock
cycles. Peak jitter indicates the earliest and the latest transition times compared to the ideal
clock transition time over consecutive clocks.
Period jitter for Digital Clock Managers is random and is expressed as peak-to-peak jitter.
Conceptually, the position of the clock transition is a probabilistic distribution or
histogram, centered around the ideal, desired clock position, as shown in Figure 3-47. The
actual distribution might not appear purely Gaussian and can be bimodal. Regardless,
most actual clock transitions occur near the desired ideal position. However, measured
over millions of clock cycles, some clock transitions occur far from the desired position.
The statistical distance from the desired position is measured in standard deviations, also
called σ (sigma). Because the DCM is an all-digital design, it is highly stable and Xilinx
specifies jitter deviation to ±7σ or peak-to-peak jitter to 14σ. As a point of reference, ±7σ
guarantees that 99.99999999974% of the jitter values are less than the specified worst-case
jitter value. A 14σ peak-to-peak jitter, ±7σ jitter deviation, equates to a maximum bit error
rate (BER) of 1.28 × 10^-12.
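The ±7σ figures can be reproduced under the assumption of a purely Gaussian distribution, which the text notes is only an approximation. In this sketch the function name is illustrative. The two-sided tail gives the quoted percentage, and the quoted BER numerically matches the one-sided tail.

```python
import math


def fraction_outside(n_sigma: float) -> float:
    """Two-sided Gaussian tail probability beyond +/- n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    outside = fraction_outside(7.0)
    print(f"outside +/-7 sigma: {outside:.3e}")
    print(f"inside  +/-7 sigma: {(1.0 - outside) * 100:.12f} %")
    print(f"one-sided tail:     {outside / 2.0:.3e}")
```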
(Diagram labels: −7σ, +7σ, Peak-to-Peak Period Jitter, Bit Period)
Peak-to-Peak

$$JITTER_{PK-PK} = \sqrt{(JITTER_{INPUT})^2 + (JITTER_{SPEC})^2} \qquad \text{(Equation 3-12)}$$

Peak-to-Peak Deviation

$$JITTER_{PK} = \pm\frac{\sqrt{(JITTER_{INPUT})^2 + (JITTER_{SPEC})^2}}{2} \qquad \text{(Equation 3-13)}$$
where
JITTERINPUT = The input period jitter, measured at the clock input pin of the FPGA
JITTERSPEC = The DLL clock output period jitter, as specified in the FPGA family
data sheet for the associated output port
Example
Assume that an input clock has 150 ps peak-to-peak period jitter, optionally expressed as
±75 ps. The incoming clock is duty-cycle corrected, using the same frequency, on the CLK0
DCM output.
In this case, JITTERINPUT = 150 ps. The value for JITTERSPEC is the Spartan-3 FPGA Data
Sheet specification called CLKOUT_JITT_PER_0, which is estimated here as ±100 ps, or
200 ps peak-to-peak.
$$JITTER_{PK-PK} = \sqrt{(150\ \text{ps})^2 + (200\ \text{ps})^2} = 250\ \text{ps} \qquad \text{(Equation 3-14)}$$
Consequently, the total jitter on the DCM output is 250 ps peak-to-peak or ±125 ps.
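As a quick check of this example, the sketch below applies Equation 3-12 and Equation 3-13 directly. The function names are illustrative, and the 200 ps specification value is the estimate used in the text, not a data sheet number.

```python
import math


def jitter_pk_pk_ps(jitter_input_ps: float, jitter_spec_ps: float) -> float:
    """Equation 3-12: root-sum-square of input and DCM output jitter (pk-pk)."""
    return math.sqrt(jitter_input_ps ** 2 + jitter_spec_ps ** 2)


def jitter_deviation_ps(jitter_input_ps: float, jitter_spec_ps: float) -> float:
    """Equation 3-13: magnitude of the +/- deviation (half of peak-to-peak)."""
    return jitter_pk_pk_ps(jitter_input_ps, jitter_spec_ps) / 2.0


if __name__ == "__main__":
    print(jitter_pk_pk_ps(150.0, 200.0))       # peak-to-peak, in ps
    print(jitter_deviation_ps(150.0, 200.0))   # +/- deviation, in ps
```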
Figure 3-49: Calculating Jitter for Cascaded DCMs Depends on which DCM Outputs are Used
Consequently, the jitter at any point in the cascaded DCM chain depends on the factors
described above. The following examples illustrate how to calculate total jitter at the
various points in the circuit.
Assume that DCM (B) uses the CLKDV output with an integer divider value. Use the
Spartan-3 Data Sheet specification called CLKOUT_PER_JITT_DV1 for the DCM output
jitter, estimated here as 300 ps (±150 ps). Calculate the total period jitter on clock (B) using
Equation 3-12. Because there are now three elements involved—the input jitter, the jitter
from DCM (A), and the jitter from DCM (B)—expand the RMS equation appropriately.
$$JITTER_{PK-PK}(B) = \sqrt{(150\ \text{ps})^2 + (400\ \text{ps})^2 + (300\ \text{ps})^2} = 522\ \text{ps} = \pm 261\ \text{ps} \qquad \text{(Equation 3-16)}$$
Finally, assume that DCM (C) phase shifts the output from DCM (B) by 90°. Use the
Spartan-3 Data Sheet specification called CLKOUT_PER_JITT_90 for the DCM output jitter,
estimated here as 300 ps (±150 ps). Calculate the total period jitter on clock (C) using
Equation 3-12. Because there are now four elements involved—the input jitter, the jitter
from DCM (A), the jitter from DCM (B), and the jitter from DCM (C)—expand the RMS
equation appropriately.
$$JITTER_{PK-PK}(C) = \sqrt{(150\ \text{ps})^2 + (400\ \text{ps})^2 + (300\ \text{ps})^2 + (300\ \text{ps})^2} = 602\ \text{ps} = \pm 301\ \text{ps} \qquad \text{(Equation 3-17)}$$
Finally, assume again that DCM (C) phase shifts the output from DCM (B) by 90°. Use the
Spartan-3 Data Sheet specification called CLKOUT_PER_JITT_90 for the DCM output jitter,
estimated here as 300 ps (±150 ps). Calculate the total period jitter on clock (C) using the
following equation. Because the preceding DCM used the CLKFX output, the total
incoming jitter is set at 700 ps, worst-case. Use the RMS equation to calculate the resulting
output jitter as shown below.
$$JITTER_{PK-PK}(C) = \sqrt{(700\ \text{ps})^2 + (300\ \text{ps})^2} = 762\ \text{ps} = \pm 381\ \text{ps} \qquad \text{(Equation 3-19)}$$
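The cascaded calculations can be expressed as a running root-sum-square along the DCM chain. This sketch uses an illustrative function name and the estimated values from the examples. It returns the total peak-to-peak jitter after each stage.

```python
import math


def cascaded_jitter_pk_pk_ps(input_ps: float, stage_specs_ps: list[float]) -> list[float]:
    """Total pk-pk jitter after each cascaded DCM stage (root-sum-square)."""
    total_sq = input_ps ** 2
    totals = []
    for spec_ps in stage_specs_ps:
        total_sq += spec_ps ** 2          # add this stage's jitter contribution
        totals.append(math.sqrt(total_sq))
    return totals


if __name__ == "__main__":
    # 150 ps input jitter, then DCM (A) 400 ps, DCM (B) 300 ps, DCM (C) 300 ps.
    print([round(t) for t in cascaded_jitter_pk_pk_ps(150.0, [400.0, 300.0, 300.0])])
    # CLKFX case: incoming jitter fixed at 700 ps, then DCM (C) at 300 ps.
    print([round(t) for t in cascaded_jitter_pk_pk_ps(700.0, [300.0])])
```

The second and third entries of the first result correspond to clocks (B) and (C). Halving any total gives the ± deviation.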
(SDR) applications, the clock period and the bit period are equal. However, in dual-data-
rate (DDR) applications, the bit period is half the clock period.
where
If the total jitter is specified as a positive value instead of a deviation from the clock
period—e.g., 200 ps instead of ±100 ps—subtract half the positive value—i.e., 100 ps. The
bit period is only shortened by the negative deviation. The positive deviation adds to the
bit period, adding more timing slack.
Example
Assume that an incoming clock signal enters the FPGA at 75 MHz and that the clock source
has ±100 ps of jitter. The application clocks data on the rising edge of an internally
generated 150 MHz clock, or a total bit period, TBIT, of 6.67 ns. How long is the available bit
period, TAVAILABLE, after considering the effects of jitter?
The CLK2X output from the Clock Doubler generates a 150 MHz clock from the 75 MHz
clock input. The Clock Doubler output, CLK2X, has ±200 ps of worst-case jitter according to
the CLKOUT_PER_JITT_2X specification in the Spartan-3 Data Sheet. Adding the DCM’s
±200 ps of jitter to the clock source’s ±100 ps of jitter using root-mean-square (RMS), the
total jitter, tTOTAL_JITTER, is ±0.223 ns.
$$t_{TOTAL\_JITTER} = \sqrt{(\pm 100\ \text{ps})^2 + (\pm 200\ \text{ps})^2} = \pm 223.60\ \text{ps} = \pm 0.223\ \text{ns} \qquad \text{(Equation 3-21)}$$
Because data is only clocked on the rising clock edge, there are no duty-cycle distortion
effects and tDUTY_CYCLE_DISTORTION = 0.
Therefore, the total available clock period, TAVAILABLE, is reduced to 6.444 ns from a
total bit period of 6.667 ns. Effectively, this forces the logic to operate at 155.1831 MHz
instead of 150 MHz.
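The numbers in this example can be reproduced as follows. The subtraction formula is an assumption inferred from the surrounding text (bit period minus the negative jitter deviation minus duty-cycle distortion), the function name is illustrative, and the rounded inputs 6.667 ns and 0.223 ns are used so that the result matches the figures quoted above.

```python
def available_bit_period_ns(t_bit_ns: float,
                            jitter_deviation_ns: float,
                            duty_cycle_distortion_ns: float = 0.0) -> float:
    """Bit period remaining after jitter and duty-cycle distortion (assumed form)."""
    return t_bit_ns - jitter_deviation_ns - duty_cycle_distortion_ns


if __name__ == "__main__":
    t_available = available_bit_period_ns(6.667, 0.223)
    print(f"T_AVAILABLE = {t_available:.3f} ns")
    print(f"Equivalent clock = {1000.0 / t_available:.4f} MHz")
```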
Optionally Place Virtual Ground Pins Around DCM Input and Output
Connections
On sensitive, high frequency DCM inputs or outputs, use additional user-I/O pins to
create extra connections to the PCB ground—i.e., create virtual ground pins. Place these
virtual ground pins on the I/O pads adjacent to the sensitive DCM signal. Make sure that
the I/O pads are on adjacent pads on the FPGA die level, not just on adjacent pins or balls
on the package. Adjacent balls on BGA packages do not necessarily connect to adjacent
pads on the FPGA. These techniques reduce the internal voltage drop and improve the
jitter.
To create a “virtual ground”, configure an IOB as a high-drive output driving GND (Low
logic level) and connect the IOB externally directly to the ground plane, as shown in
Figure 3-50.
Figure 3-50: Place Virtual Ground Pins Adjacent to Sensitive DCM Input or Output
Clock Signals
The same technique can be used to provide a virtual VCC rail connection. Turning I/O into
virtual GND or virtual VCC can not only help with sensitive signals, but also help with pin
migration. For more information on virtual grounds, see white paper
WP323: Signal Integrity Tips and Tricks.
(VCCAUX supply, measured at the FPGA: keep the VCCAUX noise envelope below 200 mV, peak-to-peak. Avoid sudden changes from one DC level to another. Keep dV/dt < 10 mV/ms, that is, dV < 10 mV over dt < 1 ms.)
3. If VCCAUX and VCCO are of the same power plane, every VCCAUX/VCCO pin must be
properly decoupled or bypassed (see “Properly Design the Power Distribution
System”). Separate the VCCAUX supply from any VCCO supplies if Guidelines 1 and 2
above cannot be maintained.
4. The CLK2X output is especially affected by the power or ground shift. Consequently,
the CLKFX output, using CLKFX_MULTIPLY = 2 and CLKFX_DIVIDE = 1, might
provide a better quality output when all IOBs and CLBs are switching. The CLKFX
circuitry updates the tap every three input clocks in the DFS mode, as opposed to the
slower update rate for the CLK2X output.
See the "Configuration Bitstream Generator (BitGen) Settings" chapter in UG332: Spartan-3
Generation Configuration User Guide for more information.
(Start-up sequence diagram: start-up cycles 0 through 6 of the Start-up CLK, with events A = GTS_cycle, B = LCK_cycle, C = GWE_cycle, and D = DONE_cycle.)
b. Set the cycle where the start-up logic waits for the DCM(s) to assert LOCKED after
the GTS_cycle. The DCMs require some form of external input—a clock and
possibly a feedback signal—before the DCM can lock on the clock signal.
c. After achieving valid DCM lock, assert the FPGA's internal Global Write Enable
(GWE_cycle) signal.
d. Finally, assert the DONE signal.
Figure 3-54 shows these same option settings from within Project Navigator.
The specific start-up phase timing and the timing of both the GWE_cycle and DONE_cycle
are flexible. However, if using the STARTUP_WAIT attribute on a DCM, the GTS_cycle
must always happen before the LCK_cycle. Otherwise, the DCM never locks and
configuration never completes! Similarly, if using External Feedback, the FPGA’s outputs
must first be enabled (GTS_cycle) so that the external feedback signal can propagate back
to the DCM.
Figure 3-55: Configuration Option Allows DCM Reset During Reconfiguration Process
Chapter 4
Introduction
All Spartan-3 generation FPGAs feature multiple block RAMs, organized in columns. The
total amount of block RAM depends on the size of the Spartan-3 generation FPGA as
shown in Table 4-1.
Notes:
1. 1Kbit = 1,024 bits, per memory conventions.
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of which is allocated to
data storage and, in some memory configurations, an additional 2K bits allocated to parity
or additional "plus" data bits. Physically, the block RAM has two completely independent
access ports, labeled Port A and Port B. The structure is fully symmetrical, and both ports
are interchangeable and support data read and write operations. Each memory port is
synchronous with its own clock, clock enable, and write enable. Read operations are also
synchronous and require a clock edge and clock enable.
Though physically a dual-port memory, block RAM simulates single-port memory in an
application, as shown in Figure 4-1. Furthermore, each block memory supports multiple
configurations or aspect ratios. Table 4-2 summarizes the essential SelectRAM features.
Cascade multiple block RAMs to create deeper and wider memory organizations with a
minimal timing penalty incurred through specialized routing resources.
The block RAMs in the Spartan-3A DSP platform include an optional output register
similar to the block RAM output register of the Virtex®-4 FPGA. The output register
enables full-speed operation at over 250 MHz for all data widths.
(a) Dual-port RAMB16_SwA_SwB. Port A: WEA, ENA, SSRA, CLKA, ADDRA[rA-1:0], DIA[wA-pA-1:0], DIPA[pA-1:0], DOA[wA-pA-1:0], DOPA[pA-1:0]. Port B: WEB, ENB, SSRB, CLKB, ADDRB[rB-1:0], DIB[wB-pB-1:0], DIPB[pB-1:0], DOB[wB-pB-1:0], DOPB[pB-1:0].
(b) Single-port RAMB16_Sw: WE, EN, SSR, CLK, ADDR[r-1:0], DI[w-p-1:0], DIP[p-1:0], DO[w-p-1:0], DOP[p-1:0].
Figure 4-1: SelectRAM 18K Blocks Perform as Dual-Port (a) and Single-Port (b) Memory
Table 4-2: SelectRAM 18K Block Memory Features and Applications (Cont’d)

| Feature | Value |
|---|---|
| Single-Port | Yes |
| True Dual-Port | Yes |
| ROM, Initial RAM Contents | Yes |
| Mixed Data Port Widths | Yes |
| Power-Up Condition | User-defined data, defaults to zero |
| Potential Applications | Local data storage, FIFOs, elastic stores, register files, buffers, stacks, circular buffers, shift registers, delay lines, waveform storage and generation, direct digital synthesis, CAMs, associative memories, function tables, function generators, wide logic functions, code converters, encoders, decoders, counters, state machines, microsequencers, program storage for embedded processor(s) |
The Xilinx CORE Generator system supports various modules containing block RAM for
Spartan-3 devices including:
(Floorplan diagram. Devices shown: XC3S50/A/AN, XC3S100E, XC3S200/A/AN, XC3S250E, XC3S400/A/AN, XC3S500E, XC3S700A/AN, XC3S1000, XC3S1200E, XC3S1400A/AN, XC3S1500, XC3S1600E, XC3S2000, XC3S4000, XC3S5000, XC3SD1800A, XC3SD3400A. Labels in the detailed view: Embedded Multipliers, 2 CLBs.)
Figure 4-2: Block RAMs Arranged in Columns with Detailed Floorplan of XC3S200
digital signal processing functions. In the Spartan-3A DSP platform, the multiplier is
extended into the DSP48A block.
Special interconnect surrounding the block RAM provides efficient signal distribution for
address and data. Furthermore, special provisions allow multiple block RAMs to be
cascaded to create wider or deeper memories.
Data Flows
Spartan-3 generation block RAM is constructed of true dual-port memory and
simultaneously supports all the data flows and operations shown in Figure 4-3. Both ports
access the same set of memory bits but with two potentially different address schemes
depending on the port’s data width.
1. Port A behaves as an independent single-port RAM supporting simultaneous read and
write operations using a single set of address lines.
2. Port B behaves as an independent single-port RAM supporting simultaneous read and
write operations using a single set of address lines.
3. Port A is the write port with a separate write address, and Port B is the read port with
a separate read address. The data widths for Port A and Port B can be different also.
4. Port B is the write port with a separate write address, and Port A is the read port with
a separate read address. The data widths for Port B and Port A can be different also.
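When the two ports use different data widths, each port addresses the same data bits with a different depth. The sketch below illustrates the idea for data-only organizations such as 16Kx1 and 8Kx2. It assumes that address a of a w-bit-wide port covers data bits a×w through a×w + w − 1, it ignores parity bits, and the function names are illustrative only.

```python
DATA_BITS = 16 * 1024   # data portion of one block RAM, excluding parity


def port_depth(width_bits: int) -> int:
    """Number of addressable locations for a given data width."""
    return DATA_BITS // width_bits


def overlapping_addresses(addr: int, from_width: int, to_width: int) -> range:
    """Addresses on a to_width port that share bits with addr on a from_width port.

    Assumes address a on a w-bit port covers bits [a*w, a*w + w - 1].
    """
    first_bit = addr * from_width
    last_bit = first_bit + from_width - 1
    return range(first_bit // to_width, last_bit // to_width + 1)


if __name__ == "__main__":
    print(port_depth(1), port_depth(2))              # 16Kx1 and 8Kx2 depths
    print(list(overlapping_addresses(3, 2, 1)))      # 2-bit word 3 seen from a x1 port
    print(list(overlapping_addresses(6, 1, 2)))      # bit 6 seen from a x2 port
```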
Figure 4-3: Block RAM Support Single- and Dual-Port Data Transfers
Signals
The signals connected to a block RAM primitive divide into four categories, as listed
below. Table 4-4 lists the block RAM interface signals, the signal names for both single-port
and dual-port memories, and signal direction.
operation, the behavior of the data output latches is controlled by the WRITE_MODE
attribute (see “Read Behavior During Simultaneous Write — WRITE_MODE,” page 173).
(Memory organization diagram: shows how addresses in the 8Kx2 and 16Kx1 no-parity configurations map onto the bytes of the 16 Kbits of data.)
Address Input
As dual-port RAM, both ports operate independently while accessing the same set of
18 Kbit memory cells.
Note: Whenever a block RAM port is enabled (ENA or ENB = High), all address transitions must
meet the data sheet setup and hold times with respect to the port clock (CLKA or CLKB). This
requirement must be met even if the RAM read output is of no interest, or WE is deasserted, including
ROM mode. In some instances, these requirements cannot be met; for
instance, if there is a multi-cycle path on the address input signals, or while the clock is stabilizing.
Work around this by disabling the port via ENA/ENB during the time that the address inputs do not
meet setup and hold requirements. Deasserting ENA/ENB disables the port so that violating the
address input setup and hold requirements does not affect block RAM contents. Assert ENA/ENB
again when resuming normal read/write functionality.
Control Inputs
Clock — CLK (CLKA, CLKB)
Each port is fully synchronous with independent clock pins. All port input pins have setup
time referenced to the port CLK pin. The data bus has a clock-to-out time referenced to the
CLK pin. Clock polarity is configurable and is rising edge triggered by default.
With default polarity, a Low-to-High transition on the clock (CLK) input controls read,
write, and reset operations.
Table 4-6: RAMB16BWE/R Write Operations (Extended Spartan-3A Family FPGAs Only)
In synchronous mode, if RST and the enable signal EN are High, the data output latches
and optional output registers for the DO and DOP outputs are synchronously set to a ‘0’ or
‘1’ according to the SRVAL parameter.
In asynchronous mode, if RST and the enable signal EN are High, the data output latches
and optional output registers for the DO and DOP outputs are asynchronously set to a ‘0’
or ‘1’ according to the SRVAL parameter.
The mode is set by setting the RSTTYPE attribute to “SYNC” for synchronous operation or
“ASYNC” for asynchronous operation. The default for RSTTYPE is synchronous. Due to
improved timing and circuit stability, it is recommended to always have this set to "SYNC"
unless an asynchronous reset is absolutely necessary.
A RST operation does not affect block RAM cells and does not disturb write operations on
the other port.
The polarity of RST is configurable and is active High by default.
The RST input is available on the RAMB16BWER component for the Spartan-3A DSP
platform. The RAMB16 and RAMB16BWE components provide the SSR input instead.
Unused Inputs
Tie any unused data or address inputs to logic ‘1’. Connecting the unused inputs High
saves logic and routing resources compared to connecting the inputs Low.
Attributes
A block RAM has a number of attributes that control its behavior as shown in Table 4-7 for
VHDL and Verilog. The CORE Generator system uses slightly different values, as
described below.
Table 4-7: Block RAM Attributes and VHDL/Verilog Attribute Names
Function | VHDL or Verilog Attribute | Default Value
Initial Content for Data Memory, Loaded during Configuration | INIT_xx | Initialized to zero
Initial Content for Parity Memory, Loaded during Configuration | INITP_xx | Initialized to zero
Data Output Latch Initialization | INIT (single-port); INIT_A, INIT_B (dual-port) | Initialized to zero
Data Output Latch Synchronous Set/Reset Value | SRVAL (single-port); SRVAL_A, SRVAL_B (dual-port) | Reset to zero
Data Output Latch Behavior during Write | WRITE_MODE | WRITE_FIRST
Block RAM Location | LOC | N/A
Reset Type (Spartan-3A DSP FPGA only) | RSTTYPE | SYNC
Number of Ports
Although physically dual-port memory, each block RAM performs as either single-port or
dual-port memory. The method to specify the number of ports depends on the design
entry tool.
Memory Size
Figure 4-6: Selecting Memory Width and Depth in CORE Generator System
$$\mathrm{START\_ADDRESS}_X = \mathrm{INTEGER}\left(\frac{\mathrm{ADDRESS}_Y \cdot \mathrm{WIDTH}_Y}{\mathrm{WIDTH}_X}\right)$$

$$\mathrm{END\_ADDRESS}_X = \mathrm{INTEGER}\left(\frac{\left(\left(\mathrm{ADDRESS}_Y + 1\right) \cdot \mathrm{WIDTH}_Y\right) - 1}{\mathrm{WIDTH}_X}\right)$$
If, due to the memory organization, one port includes parity bits and the other does not, then
the above equations are invalid and the values for width should only include the data bits.
The parity bits are not available on any port that is less than 8 bits wide.
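As a worked illustration of the two address equations, the following Python sketch computes the range of Port X addresses that overlap one Port Y location. The function name and the example port widths are illustrative assumptions, and INTEGER is taken to mean truncation toward zero.

```python
def port_x_address_range(address_y, width_y, width_x):
    """Return (START_ADDRESS_X, END_ADDRESS_X) for one location on port Y.

    Widths count data bits only, as required when the two ports differ
    in whether they include parity.
    """
    start = (address_y * width_y) // width_x
    end = ((address_y + 1) * width_y - 1) // width_x
    return start, end


# A 32-bit word at address 1 on port Y spans byte addresses 4..7 on an 8-bit port X.
print(port_x_address_range(1, 32, 8))   # (4, 7)
# A byte at address 5 on port Y falls inside 32-bit word 1 on port X.
print(port_x_address_range(5, 8, 32))   # (1, 1)
```

When the start and end values are equal, the Port Y location lies entirely within a single Port X word.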
Content Initialization
By default, block RAM is initialized with all zeros during the device configuration
sequence. However, the contents can also be initialized with user-defined data.
Furthermore, the RAM contents are protected against spurious writes during
configuration.
memory_initialization_radix=16;
memory_initialization_vector= 80, 0F, 00, 0B, 00, 0C, …, 81;
To include the coefficients file, locate the appropriate section in the CORE Generator
wizard and check Load Init File, as shown in Figure 4-8. Then, click Load File and select
the coefficients file.
The INITP_xx attributes define the initial contents of the memory cells corresponding to
parity bits, i.e., those bits that connect to the DIP/DOP buses. By default these memory
cells are also initialized to all zeros.
The eight initialization attributes from INITP_00 through INITP_07 represent the
memory contents of parity bits. Each INITP_xx is a 64-digit (256-bit) hex-encoded bit
vector and behaves like an INIT_xx attribute. The same formula calculates the bit
positions initialized by a particular INITP_xx attribute.
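The following Python sketch shows one way to work out which parity bits a given INITP_xx attribute covers. It assumes that each attribute covers one consecutive 256-bit block and that the attribute index is read as a hexadecimal number; the function name is hypothetical.

```python
BITS_PER_ATTRIBUTE = 256  # 64 hex digits x 4 bits per digit


def init_bit_range(index):
    """Return (lowest_bit, highest_bit) covered by attribute number `index`.

    Assumption: attribute xx covers bits xx*256 through (xx+1)*256 - 1.
    """
    low = index * BITS_PER_ATTRIBUTE
    high = (index + 1) * BITS_PER_ATTRIBUTE - 1
    return low, high


for name in ("00", "01", "07"):
    low, high = init_bit_range(int(name, 16))
    print(f"INITP_{name}: parity bits {low}..{high}")
```

Under this assumption the eight INITP_xx attributes together cover 8 x 256 = 2,048 parity bits.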
Figure 4-9: Specifying Initial Value for Block RAM Data Output Latches
also the default behavior for Virtex-II and Virtex-II Pro devices. However, READ_FIRST
mode is the most useful as it increases the efficiency of block RAM at each clock cycle,
allowing designs to use maximum bandwidth. In READ_FIRST mode, a memory port
supports simultaneous read and write operations to the same address on the same clock
edge, free of any timing complications.
Table 4-11 outlines how the WRITE_MODE setting affects the output data latches on the
same port, and how it affects the output latches on the opposite port during a
simultaneous access to the same address.
Table 4-11: WRITE_MODE Affects Data Output Latches During Write Operations
Write Mode | Effect on Same Port | Effect on Opposite Port (Dual-Port Mode Only, Same Address)
WRITE_FIRST, Read After Write (Default) | Data on DI, DIP inputs written into specified RAM location and simultaneously appears on DO, DOP outputs. | Invalidates data on DO, DOP outputs.
READ_FIRST, Read Before Write (Recommended) | Data from specified RAM location appears on DO, DOP outputs. Data on DI, DIP inputs written into specified location. | Data from specified RAM location appears on DO, DOP outputs.
NO_CHANGE, No Read on Write | Data on DO, DOP outputs remains unchanged. Data on DI, DIP inputs written into specified location. | Invalidates data on DO, DOP outputs.
Mode selection is set by configuration. One of these three modes is set individually for
each port by an attribute. The default mode is WRITE_FIRST.
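The same-port behavior of the three write modes can be sketched as a small behavioral model. This Python class is an illustration only: the class and method names are assumptions, and parity, timing, set/reset, and the opposite-port effects are not modeled.

```python
class BlockRamPort:
    """Behavioral sketch of one block RAM port (no parity, no timing)."""

    MODES = ("WRITE_FIRST", "READ_FIRST", "NO_CHANGE")

    def __init__(self, depth, write_mode="WRITE_FIRST", init=0):
        if write_mode not in self.MODES:
            raise ValueError(f"unknown write mode: {write_mode}")
        self.mem = [0] * depth
        self.write_mode = write_mode
        self.do = init  # data output latch

    def clock(self, en, we, addr, di=0):
        """Apply one active clock edge and return the DO latch value."""
        if not en:
            return self.do              # port disabled: nothing changes
        if not we:
            self.do = self.mem[addr]    # plain read
            return self.do
        old = self.mem[addr]
        self.mem[addr] = di             # the write happens in every mode
        if self.write_mode == "WRITE_FIRST":
            self.do = di                # new data appears on the output
        elif self.write_mode == "READ_FIRST":
            self.do = old               # previous contents appear on the output
        # NO_CHANGE: the output latch keeps its earlier value
        return self.do


for mode in BlockRamPort.MODES:
    port = BlockRamPort(16, mode, init=0x11)
    port.mem[3] = 0xAA
    print(mode, hex(port.clock(en=1, we=1, addr=3, di=0x55)))
```

In all three modes the memory location ends up holding the new data; only the value captured by the output latch differs.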
Figure 4-12 demonstrates that a valid write operation during a valid read operation results
in the write data appearing on the data output.
Figure 4-14 demonstrates that the older RAM data always appears on the data output,
regardless of a simultaneous write operation.
This mode is particularly useful for building circular buffers and large, block-RAM-based
shift registers. Similarly, this mode is useful when storing FIR filter taps in digital signal
processing applications. Old data is copied out from RAM while new data is written into
RAM.
NO_CHANGE Mode
In NO_CHANGE mode, the output latches are disabled and remain unchanged during a
simultaneous write operation, as shown in Figure 4-15. This behavior mimics that of
simple synchronous memory where a memory location is either read or written during a
clock cycle, but not both.
The NO_CHANGE mode is useful in a variety of applications, including those where the
block RAM contains waveforms, function tables, coefficients, and so forth. The memory
can be updated without affecting the memory output.
Figure 4-16 shows that the data output retains the last read data if there is a simultaneous
write operation on the same port.
Figure: block RAM location coordinates. With n total columns and m total rows of block RAM, RAMB16_X0Y0 is at the lower left, RAMB16_X(n-1)Y0 at the lower right, RAMB16_X0Y(m-1) at the upper left, and RAMB16_X(n-1)Y(m-1) at the upper right. Devices covered: XC3S50/A/AN, XC3S100E, XC3S200/A/AN, XC3S400/A/AN, XC3S700A/AN, XC3S1400A/AN, XC3S250E, XC3S500E, XC3S1200E, XC3S1600E, XC3S1000, XC3S1500, XC3S2000, XC3S4000, XC3S5000, XC3SD1800A, XC3SD3400A. (UG331_c4_13_033007)
Location attributes cannot be specified directly in the CORE Generator system. However,
location constraints can be added to VHDL or Verilog instantiations.
Notes:
1. No Chg = No Change, addr = address to RAM, data = RAM data, pdata = RAM parity data.
2. Refer to “Content Initialization,” page 171.
3. Refer to “Data Output Latch Initialization,” page 172.
4. Refer to “Data Output Latch Synchronous Set/Reset Value,” page 173.
5. Refer to “WRITE_FIRST or Transparent Mode (Default),” page 174.
6. Refer to “READ_FIRST or Read-Before-Write Mode,” page 175.
7. Refer to “NO_CHANGE Mode,” page 176.
General Characteristics
• A write operation requires only one clock edge.
• A read operation requires only one clock edge.
• All inputs are registered with the port clock and have a setup-to-clock timing
specification.
• All outputs have a read-through function or one of three read-during-write functions,
depending on the state of the WE pin. The outputs relative to the port clock are
available after the clock-to-out timing interval.
• Block RAM cells are true synchronous RAMs and do not have a combinatorial path
from the address to the output.
• The ports are completely independent of each other without arbitration. Each port has
its own clocking, control, address, read/write functions, initialization, and data
width.
• Output ports are latched with a self-timed circuit, guaranteeing glitch-free read
operations. The state of the output port does not change until the port executes
another read or write operation.
The first rising edge on CLK_A violates the clock-to-clock setup parameter, because it
occurs too soon after the last CLK_B clock edge. The write operation on port B is valid
because Data_in_B, Address_B, and WE_B all had sufficient setup time before the rising
edge on CLK_B. Unfortunately, the read operation on port A is invalid because it depends
on the RAM contents being written to Address_B and the read clock, CLK_A, happened
too soon after the write clock, CLK_B.
On the second rising edge of CLK_B, there is another valid write operation to port B. The
memory location at address (bb) contains 4444. Data on the Data_out_A port is still invalid
because there has not been another rising clock edge on CLK_A. The second rising edge of
CLK_A reads the new data at location (bb), which now contains 4444. This time, the read
operation is valid because there has been sufficient setup time between CLK_B and
CLK_A.
Notes:
1. ADDRA=ADDRB, ENA=1, ENB=1, DIPA ≠ DIPB, DIA ≠ DIB, ?=Unknown or invalid data.
Notes:
1. ADDRA=ADDRB, ENA=1, ENB=1, ?=Unknown or invalid data
Conflict Resolution
There is no dedicated monitor to arbitrate the result of identical addresses on both ports.
The application must time the two clocks appropriately. However, conflicting
simultaneous writes to the same location never cause any physical damage.
Both modules are parameterizable as with most CORE Generator modules. To create a
module, specify the component name and choose to include or exclude control inputs, and
choose the active polarity for the control inputs. For the Dual-Port Block Memory, once the
organization or aspect ratio for Port A is selected, only the valid options for Port B are
displayed.
Optionally, specify the initial memory contents. Unless otherwise specified, each memory
location initializes to zero. Enter user-specified initial values via a Memory Initialization
File, consisting of one line of binary data for every memory location. A default file is
generated by the CORE Generator system. Alternatively, create a coefficients file (.coe),
which not only defines the initial contents in a radix of 2, 10, or 16, but also defines all the
other control parameters for the CORE Generator system.
The output from the CORE Generator system includes a report on the options selected and
the device resources required. If a very deep memory is generated, some external
multiplexing might be required, and these resources are reported as the number of logic
slices required. In addition, the software reports the number of bits available in block RAM
that are less than 100% utilized. For simulation purposes, the CORE Generator system
creates VHDL or Verilog behavioral models.
• CORE Generator: Single-Port Block Memory module (RAM or ROM)
• CORE Generator: Dual-Port Block Memory module (RAM or ROM)
Instantiation Templates
For VHDL- and Verilog-based designs, various instantiation templates are available to
speed development. Within the Xilinx ISE Project Navigator, select Edit → Language
Templates from the menu, and then select VHDL or Verilog, followed by Component
Instantiation → Block RAM from the selection tree.
The appendices include example code showing how to instantiate block RAM in both
VHDL and Verilog.
In VHDL, each template has a component declaration section and an architecture section.
Each part of the template must be inserted within the VHDL design file. The port map of
the architecture section must include the signal names used in the application.
The SelectRAM_Ax templates (with x = 1, 2, 4, 9, 18, or 36) are single-port modules and
instantiate the corresponding RAMB16_Sx module.
FIFOs
First-In, First-Out (FIFO) memories, also known as elastic stores, are perhaps the most
common application of block RAM, other than for random data storage. FIFOs typically
resynchronize data, either between two different clock domains, or between two parts of a
system that have different data rates, even though they operate from a single clock. The
Xilinx CORE Generator system provides two parameterizable FIFO modules, one a
synchronous FIFO where both the read and write clocks are synchronous to one another
and the other an asynchronous FIFO where the read and write clocks are different.
Application note XAPP261 demonstrates that the FIFO read and write ports can be
different data widths, integrating the data width converter into the FIFO.
Application note XAPP291 describes a self-addressing FIFO that is useful for throttling
data in a continuous data stream.
• CORE Generator: Synchronous FIFO module
• CORE Generator: Asynchronous FIFO module
• XAPP258: FIFOs Using Block RAM, includes reference design
• XAPP261: Data-Width Conversion FIFOs Using Block RAM Memory, includes reference
design
• XAPP291: Self-Addressing FIFO
Data2MEM Utility
Figure 4-20: The Data2BRAM Utility Updates Block RAM Contents in a Bitstream
Figure 4-21: One Block RAM Becomes Two Independent Single-Port RAMs
Both ports are independent, each with its own memory organization, data inputs and
outputs, clock input, and control signals. For example, Port A could be 256x36 while Port
B is 2Kx4.
Figure 4-21 splits the available memory evenly between the two ports. With additional
logic on the upper address lines, the memory can be split into other ratios.
The most-significant address line, ADDR[8], is tied High on one port and Low on the other.
Both ports share the same address inputs, control inputs, and clock input.
Figure 4-24 describes the hardware implementation to create a circular buffer using block
RAM. A modulo-n counter drives the address inputs to a single-port block RAM. For
simple data delay lines, the block RAM writes new data on every clock cycle.
The circular buffer also reads the delayed data value on every clock edge. Using block
RAM’s READ_FIRST write mode, both the incoming write data and the outgoing read
data use the same clock input and the same clock edge, both simplifying the design and
improving overall performance. The actual write and read behavior is described in
Figure 4-17.
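The delay-line behavior can be sketched in a few lines of Python. This is a behavioral illustration under simplifying assumptions (a single port, memory initialized to zero, no output-latch timing); the function name is hypothetical.

```python
def delay_line(samples, n):
    """Delay `samples` by n clocks using a READ_FIRST-style single-port memory."""
    mem = [0] * n   # block RAM contents, initialized to zero
    addr = 0        # modulo-n address counter
    out = []
    for di in samples:
        out.append(mem[addr])   # READ_FIRST: the old data appears on the output...
        mem[addr] = di          # ...while the new data is written on the same edge
        addr = (addr + 1) % n
    return out


print(delay_line([1, 2, 3, 4, 5, 6], 3))  # [0, 0, 0, 1, 2, 3]
```

Because the read and the write share one address and one clock edge, the delay length is set entirely by the counter modulus n.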
Figure 4-24: Circular Buffer Implementation Using Block RAM and Counter
In Figure 4-24, the width of the IN and OUT data ports is identical, although they need not
be. Using dual-port mode, the ports can be different widths. Figure 4-25 shows an
example where byte-wide data enters the block RAM and a 32-bit word exits the block
RAM. Furthermore, the data can be delayed up to 2,048 byte-clock cycles.
Figure 4-25: Merge Circular Buffer and Port-Width Converter into a Single Block RAM
A single block RAM is configured as dual-port memory. The incoming byte-wide data
feeds Port B, which is configured as a 2Kx9 memory. The outgoing 32-bit data appears on
Port A and consequently, Port A is configured as a 512x36 memory.
Manipulating the addresses that feed both ports creates the 4n-byte clock delay. Every
32-bit output word requires four incoming bytes. Consequently, a divide-by-4 counter
feeds the two lower address bits, ADDRB[1:0]. After four bytes are stored, a terminal
count, TC, from the lower counter enables Port A plus a separate divide-by-n counter. The
enable signal latches the 32-bit output data on Port A and increments the upper counter.
The combination of the divide-by-4 counter and the divide-by-n counter effectively creates
a divide-by-4n counter. The output from the divide-by-n counter forms the more-significant
address bits to Port B, ADDRB[10:2], and the entire address to Port A, ADDRA[8:0].
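The address generation described above can be sketched as two cascaded counters. This Python generator is an illustration only: the function name is hypothetical, and the exact clock edge on which Port A captures its word relative to the fourth byte write is not modeled.

```python
def address_sequence(n, cycles):
    """Yield (port_b_address, tc, port_a_address) for each byte clock.

    n is the modulus of the upper (divide-by-n) counter.
    """
    low = 0    # divide-by-4 counter: the two lower Port B address bits
    high = 0   # divide-by-n counter: upper Port B bits and the Port A address
    for _ in range(cycles):
        tc = (low == 3)               # terminal count on the fourth byte
        addr_b = (high << 2) | low    # byte address presented to Port B
        yield addr_b, tc, high
        low = (low + 1) % 4
        if tc:
            high = (high + 1) % n     # upper counter advances once per word


for addr_b, tc, addr_a in address_sequence(n=2, cycles=8):
    print(addr_b, int(tc), addr_a)
```

With n = 2 the byte address cycles through 0 to 7, which is the divide-by-4n sequence for that modulus.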
Figure 4-27: 128-State Finite State Machine with 38 Outputs in a Single Block RAM
A dual-port block RAM is divided into two completely independent half-size, single-port
memories by tying the most-significant address bit of one port High and the other one
Low, similar to Figure 4-21. Port A is configured as 2Kx9 but used as a 1Kx9 single-port
ROM. Seven outputs feed back as address inputs, stepping through the 128 states. The
1Kx9 ROM has ten total address lines, seven of which are the current-state inputs and the
remaining three address inputs determine the eight-way branch. Any of the 128 states can
conditionally branch to any set of eight new states, under the control of these three address
inputs.
Port B is configured as 512 x 36 and used as a 256 x 36 single-port ROM. It receives the same
7-bit current-state value from Port A, and drives 36 outputs that can be arbitrarily defined
for each state. However, due to the synchronous nature of block ROM, the 36 outputs from
the 256x36 ROM are delayed by one clock cycle. The eighth address input can invoke an
alternate definition of the 36 outputs. Two additional state bits are available from the 1Kx9
block, but are not delayed by one clock.
This same basic architecture can be modified to form a 256-state finite state machine with
four-way branch, or a 64-state state machine with 16-way branch.
If additional branch-control inputs are needed, they can be combined using an input
multiplexer. The advantages of this design are its low cost (a single block RAM), its high
performance (125+ MHz), the absence of layout or routing issues, and complete design
freedom.
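One way to produce the next-state ROM contents is to evaluate a next-state function for every address. The Python sketch below assumes the three branch inputs occupy the upper address bits and the seven state bits the lower ones; that ordering, the function names, and the example transition rule are all illustrative assumptions.

```python
STATE_BITS = 7
BRANCH_BITS = 3


def build_next_state_rom(next_state):
    """Build 1K next-state entries: address = {branch[2:0], state[6:0]}."""
    rom = []
    for addr in range(1 << (STATE_BITS + BRANCH_BITS)):
        state = addr & ((1 << STATE_BITS) - 1)
        branch = addr >> STATE_BITS
        rom.append(next_state(state, branch) & 0x7F)
    return rom


def example_next_state(state, branch):
    # Hypothetical rule: branch 0 advances, branch 1 holds, anything else restarts.
    if branch == 0:
        return (state + 1) % 128
    if branch == 1:
        return state
    return 0


rom = build_next_state_rom(example_next_state)
state = 0
for branch in (0, 0, 1, 0, 7):
    state = rom[(branch << STATE_BITS) | state]   # one clock edge
    print(state)
```

The resulting list could then be formatted into initialization attributes or a coefficients file by a separate step.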
Figure 4-28: Two 10-Bit Counters Create a 20-Bit Binary Counter Using a Single Block RAM
A 20-bit binary counter can be constructed from two identical 10-bit binary counters, with
the lower 10-bit counter enabling the upper 10-bit counter every 1024 clock cycles. In this
example, Port B is a 1Kx18 ROM (WEB is Low) that forms the lower 10-bit counter. The 10
less-significant data outputs, representing the current state, connect directly to the 10
address inputs, ADDRB[9:0]. The next state is looked up in the ROM using the current state
applied to the address pins. The 11th data bit, D[10], forms the terminal-count output from
the counter. In this example, the upper seven data bits, DOB[17:11] are unused.
The next-state logic for a binary counter appears in Table 4-15. The counter starts at state
0—or the value specified by the INIT or SRVAL attributes—and counts through to 0x3FF
(1023 decimal) at which time the terminal count, D[10], is active and the counter rolls over
back to 0.
ADDR[9:0] (Hex) | TC, D[10] | COUNT, D[9:0] (Hex)
0 | 0 | 1
1 | 0 | 2
2 | 0 | 3
… | … | …
3FF | 1 | 0
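The next-state contents for such a counter can be generated programmatically. This Python sketch builds the ROM words for a 10-bit up counter with the terminal count in bit 10; the function name is an assumption, and converting the list into initialization attributes is left out.

```python
def build_counter_rom(bits=10):
    """Return next-state words: D[bits-1:0] = next count, D[bits] = terminal count."""
    size = 1 << bits
    rom = []
    for addr in range(size):
        tc = 1 if addr == size - 1 else 0      # active only in the last state
        rom.append((tc << bits) | ((addr + 1) % size))
    return rom


rom = build_counter_rom()
print(hex(rom[0x000]))  # 0x1
print(hex(rom[0x3FF]))  # 0x400: terminal count set, count rolls over to 0
```

Changing the expression for the next count is enough to obtain other counting patterns from the same structure.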
Port A is configured nearly identically to Port B, except that Port A is enabled by the
terminal count output from Port B. The 10-bit counter in Port A has the identical counting
pattern as Port B, except that it increments at 1/1024th the rate of Port B.
With a simple modification, the 20-bit up counter becomes an 18-bit up/down counter.
Using the most-significant address input as a direction control, the same basic counter
architecture either increments or decrements its count, as shown in Table 4-16. In this
example, the counter increments when the Up/Down control is Low and decrements
when High. The ROM is split between the incrementing and decrementing next-state logic.
Up/Down | ADDR (Hex) | TC | COUNT (Hex)
1 (Down) | 1FF | 0 | 1FE
1 (Down) | 1FE | 0 | 1FD
1 (Down) | 1FD | 0 | 1FC
… | … | … | …
1 (Down) | 0 | 1 | 1FF
• Counters with other incrementing and decrementing patterns including fast gray-
code counters.
• A six-digit BCD counter in one block ROM, configured as 512x36, plus one CLB.
Four-Port Memory
Each block RAM is physically a dual-port memory. However, due to the block RAM’s fast
access performance, it is possible to create multi-port memories by time-division
multiplexing the signals in and out of the memory. A block RAM with some additional
logic easily supports up to four ports but at the cost of additional access latency for each
port. The following application note provides additional details and a reference design.
• XAPP228: Quad-Port Memories in Virtex Devices, includes reference design
• Various other combinations are possible, but might have restrictions to the number of
inputs, the number of shared inputs, or the complexity of the logic function.
Due to the flexibility and speed of CLB logic, block RAM might not be faster or more
efficient for simple wide functions like an address decoder, where multiple inputs are
ANDed together. Block RAM is faster and more efficient for complex logic functions, such
as majority decoders, pattern matching, and correlators.
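Figure: fuzzy pattern matching circuit. The matching bits are summed and compared against a Threshold, the number of bits that must match; the MATCH output indicates whether the number of matching bits is greater than the threshold. (X463_29_040403)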
If the application requires a new matching pattern or different logic function, it could be
loaded via the second memory port.
Implemented in CLB logic, this function would require numerous logic cells and multiple
layers of logic. However, because the MATCH, MASK, and Threshold values are known in
advance, the function can be pre-computed and then stored in block RAM. For each input
condition, i.e., starting at address 0 and incremented through the entire memory, the
output condition can be precomputed. A 14-input fuzzy pattern matching circuit requires
a single block RAM and performs the operation in a single clock cycle.
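The precomputation can be sketched as a loop over every address. In this Python illustration the function name is hypothetical, a MASK bit of 1 is assumed to mean "compare this bit", and the output is assumed to be asserted when the number of matching bits is strictly greater than the threshold.

```python
def build_match_rom(match, mask, threshold, inputs=14):
    """Precompute a (2**inputs) x 1 ROM for fuzzy pattern matching."""
    all_ones = (1 << inputs) - 1
    rom = []
    for addr in range(1 << inputs):
        agree = ~(addr ^ match) & mask & all_ones   # compared bits that match
        count = bin(agree).count("1")
        rom.append(1 if count > threshold else 0)
    return rom


# Small 4-input case: more than two of the four bits must equal 1010.
rom = build_match_rom(match=0b1010, mask=0b1111, threshold=2, inputs=4)
print(rom[0b1010], rom[0b1011], rom[0b0011])  # 1 1 0
```

With the default of 14 inputs the table has 16,384 one-bit entries, which corresponds to a 16Kx1 organization.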
flop with whatever LUT logic is driving it. No register is packed into block RAM without
LUT logic, and vice versa.
To specify which register outputs are converted to block RAM outputs, create a file
containing a list of the net names connected to the register output(s). Set the environment
variable XIL_MAP_BRAM_FILE to the file name, which instructs the mapping software to
use this file. The MAP program looks for this environment variable whenever the –bp
option is specified. Only those output nets listed in the file are converted into block RAM
outputs.
• PCs:
set XIL_MAP_BRAM_FILE=file_name
• Workstations:
setenv XIL_MAP_BRAM_FILE file_name
Figure 4-30: Dual-Port Block RAM Facilitates Waveform Storage and Updates
Conclusion
The Spartan-3 generation FPGA’s abundant, fast, and flexible block RAMs provide
invaluable on-chip local storage for scratchpad memories, FIFOs, buffers, look-up tables,
and much more. Using unique capabilities, block RAM implements such functions as shift
registers, delay lines, counters, and wide, complex logic functions.
Block RAM is supported in applications using the broad spectrum of Xilinx ISE
development software, including the CORE Generator system and can be inferred or
directly instantiated in VHDL or Verilog synthesis designs.
"000000000000000000000000000000000000000000000000FEDCBA9876543210";
attribute INIT_01 of U_RAMB16_S36: label is
"0000000000000000000000000000000000000000000000000000000000000000";
... (snip)
attribute INIT_3F of U_RAMB16_S36: label is
"0000000000000000000000000000000000000000000000000000000000000000";
--
-- Signal Declarations:
--
-- signal VCC : std_logic;
-- signal GND : std_logic;
signal CLK_BUFG: std_logic;
signal INV_SET_RESET : std_logic;
--
begin
-- VCC <= '1';
-- GND <= '0';
--
-- Instantiate the clock buffer
U_BUFG: BUFG
port map (
I => CLK,
O => CLK_BUFG
);
--
-- Use of the free inverter on SSR pin
INV_SET_RESET <= NOT SET_RESET;
-- Block SelectRAM Instantiation
U_RAMB16_S36: RAMB16_S36
port map (
DI => DATA_IN (31 downto 0), -- insert 32 bits data-in bus (<31 downto 0>)
DIP => DATA_IN (35 downto 32), -- insert 4 bits parity data-in bus (or <35
-- downto 32>)
ADDR => ADDRESS (8 downto 0), -- insert 9 bits address bus
EN => ENABLE, -- insert enable signal
WE => WRITE_EN, -- insert write enable signal
SSR => INV_SET_RESET, -- insert set/reset signal
CLK => CLK_BUFG, -- insert clock signal
DO => DATA_OUT (31 downto 0), -- insert 32 bits data-out bus (<31 downto 0>)
DOP => DATA_OUT (35 downto 32) -- insert 4 bits parity data-out bus (or <35
-- downto 32>)
);
--
end XC3S_RAMB_1_PORT_arch;
/* attribute
WRITE_MODE "READ_FIRST"
INIT "000000000"
SRVAL "012345678"
INITP_00
"0123456789ABCDEF000000000000000000000000000000000000000000000000"
INITP_01
"0000000000000000000000000000000000000000000000000000000000000000"
... (snip)
INITP_07
"0000000000000000000000000000000000000000000000000000000000000000"
INIT_00
"0123456789ABCDEF000000000000000000000000000000000000000000000000"
INIT_01
"0000000000000000000000000000000000000000000000000000000000000000"
... (snip)
INIT_3F
"0000000000000000000000000000000000000000000000000000000000000000"
*/
endmodule
Chapter 5
CLB Array
The CLBs are arranged in a regular array of rows and columns as shown in Figure 5-1.
Each density varies by the number of rows and columns of CLBs (see Table 5-1).
Slices
Each CLB comprises four interconnected slices, as shown in Figure 5-3. These slices are
grouped in pairs. Each pair is organized as a column with an independent carry chain. The
left pair supports both logic and memory functions and its slices are called SLICEM. The
right pair supports logic only and its slices are called SLICEL. Therefore half the LUTs
support both logic and memory (including both RAM16 and SRL16 shift registers) while
half support logic only, and the two types alternate throughout the array columns. The
SLICEL reduces the size of the CLB and lowers the cost of the device, and can also provide
a performance advantage over the SLICEM.
Notes:
1. Options to invert signal polarity as well as other options that enable lines for various functions are not shown.
2. The index i can be 6, 7, or 8, depending on the slice. The upper SLICEL has an F8MUX, and the upper SLICEM has an
F7MUX. The lower SLICEL and SLICEM both have an F6MUX.
Slice Overview
A slice includes two LUT function generators and two storage elements, along with
additional logic, as shown in Figure 5-4.
Both SLICEM and SLICEL have the following elements in common to provide logic,
arithmetic, and ROM functions:
• Two 4-input LUT function generators, F and G
• Two storage elements
• Two wide-function multiplexers, F5MUX and FiMUX
• Carry and arithmetic logic
Logic Cells
The combination of a LUT and a storage element is known as a “Logic Cell”. The
additional features in a slice, such as the wide multiplexers, carry logic, and arithmetic
gates, add to the capacity of a slice, implementing logic that would otherwise require
additional LUTs. Benchmarks have shown that the overall slice is equivalent to 2.25 simple
logic cells. This calculation provides the equivalent logic cell count shown in Table 5-1.
Slice Details
Figure 5-2 is a detailed diagram of the SLICEM. It represents a superset of the elements and
connections to be found in all slices. The dashed and gray lines (blue when viewed in
color) indicate the resources found only in the SLICEM and not in the SLICEL.
Each slice has two halves, which are differentiated as top and bottom to keep them distinct
from the upper and lower slices in a CLB. The control inputs for the clock (CLK), Clock
Enable (CE), Slice Write Enable (SLICEWE1), and Set/Reset (SR) are shared between the
two halves.
The LUTs located in the top and bottom portions of the slice are referred to as "G" and "F",
respectively, or the "G-LUT" and the "F-LUT". The storage elements in the top and bottom
portions of the slice are called FFY and FFX, respectively.
Each slice has two multiplexers with F5MUX in the bottom portion of the slice and FiMUX
in the top portion. Depending on the slice, the FiMUX takes on the name F6MUX, F7MUX,
or F8MUX, according to its position in the multiplexer chain. The lower SLICEL and
SLICEM both have an F6MUX. The upper SLICEM has an F7MUX, and the upper SLICEL
has an F8MUX.
The carry chain enters the bottom of the slice as CIN and exits at the top as COUT. Five
multiplexers control the chain: CYINIT, CY0F, and CYMUXF in the bottom portion and
CY0G and CYMUXG in the top portion. The dedicated arithmetic logic includes the
exclusive-OR gates XORF and XORG (bottom and top portions of the slice, respectively) as
well as the AND gates FAND and GAND (bottom and top portions, respectively).
See Table 5-2 for a description of all the slice input and output signals.
Look-Up Tables
The Look-Up Table or LUT is a RAM-based function generator and is the main resource for
implementing logic functions. Furthermore, the LUTs in each SLICEM pair can be
configured as Distributed RAM or a 16-bit shift register, as described later.
Each of the two LUTs (F and G) in a slice has four logic inputs (A1-A4) and a single output
(D). Any four-variable Boolean logic operation can be implemented in one LUT. Functions
with more inputs can be implemented by cascading LUTs or by using the wide function
multiplexers that are described later.
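A 4-input LUT can be pictured as a 16-entry truth table addressed by its inputs. The Python sketch below derives that table for an arbitrary Boolean function; the function names are hypothetical, and treating A1 as the least-significant address bit is an assumption made for the illustration.

```python
def lut4_init(func):
    """Return the 16-bit truth table of func(a4, a3, a2, a1); A1 is address bit 0."""
    init = 0
    for addr in range(16):
        a1, a2, a3, a4 = [(addr >> i) & 1 for i in range(4)]
        if func(a4, a3, a2, a1):
            init |= 1 << addr
    return init


def lut4_read(init, a4, a3, a2, a1):
    """Look up output D for the given inputs."""
    addr = (a4 << 3) | (a3 << 2) | (a2 << 1) | a1
    return (init >> addr) & 1


and4 = lut4_init(lambda a4, a3, a2, a1: a4 & a3 & a2 & a1)
xor4 = lut4_init(lambda a4, a3, a2, a1: a4 ^ a3 ^ a2 ^ a1)
print(f"{and4:04X} {xor4:04X}")  # 8000 6996
```

Any function of four variables reduces to one such 16-bit value, which is why a single LUT can implement it.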
The output of the LUT can connect to the wide multiplexer logic, the carry and arithmetic
logic, or directly to a CLB output or to the CLB storage element. See Figure 5-5.
Wide Multiplexers
Wide-function multiplexers effectively combine LUTs in order to permit more complex
logic operations. Each slice has two of these multiplexers with F5MUX in the bottom
portion of the slice and FiMUX in the top portion. The F5MUX multiplexes the two LUTs in
a slice. The FiMUX multiplexes two CLB inputs which connect directly to the F5MUX and
FiMUX results from the same slice or from other slices. For more information on the wide
multiplexers, see Chapter 8, “Using Dedicated Multiplexers.”
Storage Elements
The storage element, which is programmable as either a D-type flip-flop or a level-
sensitive transparent latch, provides a means for synchronizing data to a clock signal,
among other uses. The storage elements in the top and bottom portions of the slice are
called FFY and FFX, respectively. FFY has a fixed multiplexer on the D input selecting
either the combinatorial output Y or the bypass signal BY. FFX selects between the
combinatorial output X or the bypass signal BX.
The functionality of a slice storage element is identical to that described earlier for the I/O
storage elements. All signals have programmable polarity; the default active-High
function is described.
The control inputs R, S, CE, and C are all shared between the two flip-flops in a slice.
[Figure: FDRSE component with D, CE, C, R, and S inputs and Q output]
Figure 5-6: FD Flip-Flop Component with Synchronous Reset, Set, and Clock Enable
Table 5-4: FD Flip-Flop Functionality with Synchronous Reset, Set, and Clock Enable

           Inputs                Outputs
R    S    CE    D    C           Q
1    X    X     X    ↑           0
0    1    X     X    ↑           1
0    0    0     X    X           No Change
0    0    1     1    ↑           1
0    0    1     0    ↑           0
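The priority implied by Table 5-4 (R over S, S over CE) can be sketched as a behavioral model. This is an illustrative sketch only: the module name `fdrse_model` and the power-up value are assumptions, and it covers only the synchronous, active-High configuration described above.

```verilog
// Behavioral sketch of a slice storage element configured as a flip-flop
// with synchronous reset, set, and clock enable (Table 5-4).
// The module name and the initial value are assumptions for illustration.
module fdrse_model (
    input  wire C,        // clock
    input  wire CE,       // clock enable
    input  wire R,        // synchronous reset, highest priority
    input  wire S,        // synchronous set
    input  wire D,        // data input
    output reg  Q = 1'b0  // assumed power-up state of 0
);
    always @(posedge C) begin
        if (R)
            Q <= 1'b0;    // R overrides S, CE, and D
        else if (S)
            Q <= 1'b1;    // S overrides CE and D
        else if (CE)
            Q <= D;       // normal load
        // otherwise Q keeps its value (No Change row)
    end
endmodule
```

Writing HDL with the same R, S, CE ordering lets the control signals map onto the dedicated flip-flop inputs rather than onto extra LUT logic.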
Initialization
The CLB storage elements are initialized at power-up, during configuration, by the global
GSR signal, and by the individual SR or REV inputs to the CLB. The storage elements can
also be re-initialized using the GSR input on the STARTUP primitive. See “Global
Controls,” page 401.
Table 5-5: Slice Storage Element Initialization
Signal  Description
SR      Set/Reset input. Forces the storage element into the state specified by the attribute SRHIGH or SRLOW. SRHIGH forces a logic "1" when SR is asserted. SRLOW forces a logic "0". For each slice, set and reset can be set to be synchronous or asynchronous.
REV     Reverse of Set/Reset input. A second input (BY) forces the storage element into the opposite state. The reset condition is predominant over the set condition if both are active. Same synchronous/asynchronous setting as for SR.
GSR     Global Set/Reset. GSR defaults to active High but can be inverted by adding an inverter in front of the GSR input of the STARTUP element. The initial state after configuration or GSR is defined by a separate INIT0 and INIT1 attribute. By default, setting the SRLOW attribute sets INIT0, and setting the SRHIGH attribute sets INIT1.
Timing Parameters
There are several possible paths through the CLB. For any timing parameter, examine the
source and destination to help define the path. Most timing parameters have names based
on the source and destination. Setup time parameters typically are named according to the
input pin followed by “CK”, such as tCECK for setup from CE to CLK. Hold time
parameters are named with “CK” followed by the input pin, such as tCKCE for hold time
from CLK to CE. Table 5-6 defines the most common CLB timing parameters.
Distributed RAM
The LUTs in the SLICEM can be programmed as distributed RAM. This type of memory
affords moderate amounts of data buffering anywhere along a data path. One SLICEM
LUT stores 16 bits (RAM16). For more information on the distributed RAM, see Chapter 6,
“Using Look-Up Tables as Distributed RAM.”
Shift Registers
It is possible to program each SLICEM LUT as a 16-bit shift register. Used in this way, each
LUT can delay serial data anywhere from 1 to 16 clock cycles without using any of the
dedicated flip-flops. The resulting programmable delays can be used to balance the timing
of data pipelines. For more information on the shift registers, see Chapter 7, “Using Look-
Up Tables as Shift Registers (SRL16).”
Related Materials
The following documents provide supplementary information useful with this chapter:
• WP272: Get Smart About Reset: Think Local, Not Global
Applying a global reset to your FPGA designs is not a very good idea and should be
avoided. This is a controversial issue, so this white paper looks at the reasons why
such a design policy should be considered.
• WP273: Performance + Time = Memory (Cost Saving with 3-D Design)
Operating logic at a higher rate than the processing rate allows operations to be
achieved sequentially. As with a processor, logic is timeshared over multiple clock
cycles. Memory holds values not being used on a given clock cycle. The FPGA can be
considered to be a three-dimensional volume to be filled. "Performance + Time =
Memory" is a strange formula, but when understood, it can often result in
significantly lower cost implementations with Xilinx devices.
• WP275: Get Your Priorities Right - Make Your Design up to 50% Smaller
This white paper describes a rarely noticed design technique that can make a
difference in the size and the performance of your FPGA design. Control signals on
FPGA flip-flops have a built-in priority. If you can learn to write code that is
sympathetic to the priorities, the results will be rewarding. This white paper provides
some simple VHDL and Verilog examples to explain key points.
Chapter 6
Introduction
In addition to the embedded 18-Kbit block RAMs,
Spartan-3 generation FPGAs feature distributed RAM within each
Configurable Logic Block (CLB). Each SLICEM function generator or LUT within a CLB
resource optionally implements a 16-deep x 1-bit synchronous RAM. The LUTs within a
SLICEL slice do not have distributed RAM.
Distributed RAM writes synchronously and reads asynchronously. However, if required
by the application, use the register associated with each LUT to implement a synchronous
read function. Each 16 x 1-bit RAM is cascadable for deeper and/or wider memory
applications, with a minimal timing penalty incurred through specialized logic resources.
Spartan-3 generation CLBs support various RAM primitives up to 64-deep by 1-bit-wide.
Two LUTs within a SLICEM slice combine to create a dual-port 16x1 RAM—one LUT with
a read/write port, and a second LUT with a read-only port. One port writes into both 16x1
LUT RAMs simultaneously, but the second port reads independently.
Distributed RAM is crucial to many high-performance applications that require relatively
small embedded RAM blocks, such as FIFOs or small register files. The Xilinx
CORE Generator software automatically generates optimized distributed RAMs for the
Spartan-3 generation architecture. Similarly, the CORE Generator system creates
Asynchronous and Synchronous FIFOs using distributed RAMs.
Distributed RAM supports the following memory types:
• Single-port RAM with synchronous write and asynchronous read. Synchronous reads
are possible using the flip-flop associated with distributed RAM.
• Dual-port RAM with one synchronous write and two asynchronous read ports. As
above, synchronous reads are possible.
As illustrated in Figure 6-1, dual-port distributed RAM has one read/write port and an
independent read port.
[Figure 6-1: dual-port distributed RAM. Both ports are shown with WCLK; the read port has its own read address and the DPO output]
Any write operation on the D input and any read operation on the SPO output can occur
simultaneously with and independently from a read operation on the second read-only
port, DPO.
Write Operations
The write operation is a single clock-edge operation, controlled by the write-enable input,
WE. By default, WE is active High, although it can be inverted within the distributed RAM.
When the write enable is High, the clock edge latches the write address and writes the data
on the D input into the selected RAM location.
When the write enable is Low, no data is written into the RAM.
Read Operation
A read operation is purely combinatorial. The address port—either for single- or dual-port
modes—is asynchronous with an access time equivalent to a LUT logic delay.
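The write and read behavior described above can be summarized in a short behavioral model. This is a sketch under assumptions: the module name `dist_ram16x1s_model` is hypothetical and the address is written as a 4-bit bus rather than as separate A0 to A3 pins. Synthesis tools commonly map this style of code onto distributed RAM, but that depends on the tool and its settings.

```verilog
// Behavioral sketch of a 16 x 1 single-port distributed RAM:
// synchronous write, asynchronous (combinatorial) read.
// Module name and bus-style address port are assumptions for illustration.
module dist_ram16x1s_model #(
    parameter [15:0] INIT = 16'h0000   // initial contents, all zeros by default
) (
    input  wire       WCLK,  // write clock
    input  wire       WE,    // write enable, active High
    input  wire [3:0] A,     // address for both read and write
    input  wire       D,     // data input
    output wire       O      // data output
);
    reg [15:0] mem = INIT;

    // Write: a single clock edge stores D at address A when WE is High.
    always @(posedge WCLK)
        if (WE)
            mem[A] <= D;

    // Read: purely combinatorial, follows the address.
    assign O = mem[A];
endmodule
```

Registering `O` in a flip-flop clocked by `WCLK` would give the synchronous read option mentioned earlier.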
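[Figure: distributed RAM write and read timing diagram showing WCLK, DATA_IN (d), ADDRESS (aa), WRITE_EN, and DATA_OUT, with the twrite and tread intervals marked; DATA_OUT changes from the previous data MEM(aa) to the new data d]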
Characteristics
• A write operation requires only one clock edge.
• A read operation requires only the logic access time.
• Outputs are asynchronous and dependent only on the LUT logic delay.
• Data and address inputs are latched with the write clock and have a setup-to-clock
timing specification. There is no hold time requirement.
• For dual-port RAM, the A[#:0] port is the write and read address, and the DPRA[#:0]
port is an independent read-only address.
cascading the G-LUT address lines, which are used for both read and write operations, to
the F-LUT write address lines (WF[4:1] in Figure 5-2, page 206), and by cascading the
G-LUT data input DI through the DIF_MUX in Figure 5-2 and to the DI input on the
F-LUT. One CLB provides a 16x1 dual-port memory as shown in Figure 6-5, page 224.
The INIT attribute can be used to preload the memory with data during FPGA
configuration. By default, the initial RAM contents are all zeros. If WE is held Low, the
element can be considered a ROM. The ROM function can also be implemented in a SLICEL.
Table 6-1: Distributed RAM Resources by FPGA Family and Device (Cont'd)
Feature      Distributed RAM Blocks    Distributed RAM Bits
XC3S4000     27,648                    442,368
XC3S5000     33,280                    532,480
Table 6-3 lists the various single- and dual-port distributed RAM primitives supported by
the different Xilinx FPGA families. For each type of RAM, the table indicates how many
instances of a particular primitive fit within a single CLB. For example, two 32x1 single-
port RAM primitives fit in a single Spartan-3 generation CLB. Similarly, two 16x1 dual-
port RAM primitives fit in a Spartan-3 generation CLB but a single 32x1 dual-port RAM
primitive does not.
Table 6-3: Single- and Dual-port RAM Primitives Supported in a CLB by Family

                                        Single-Port RAM             Dual-Port RAM
Family                                  16x1  32x1  64x1  128x1     16x1  32x1  64x1
Spartan-3 Generation FPGAs              4     2     1     -         2     -     -
Spartan-II/Spartan-IIE FPGAs and
Virtex/Virtex-E FPGAs                   4     2     1     -         2     -     -
Virtex-II/Virtex-II Pro FPGAs           8     4     2     1         4     2     1
Virtex-4 FPGAs                          4     2     1     -         2     -     -
Virtex-5 FPGAs                          8     6     4     2         4     4     2
Library Primitives
There are four library primitives that support Spartan-3 generation distributed RAM,
ranging from 16 bits deep to 64 bits deep. All the primitives are one bit wide. Three
primitives are single-port RAMs and one primitive is dual-port RAM, as shown in
Table 6-4.
The input and output data are one bit wide. However, several distributed RAMs,
connected in parallel, easily implement wider memory functions.
Figure 6-3 shows generic single-port and dual-port distributed RAM primitives. The
A[#:0] and DPRA[#:0] signals are address buses.
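[Figure 6-3: single-port RAMyX1S primitive (inputs D, WE, WCLK, A[#:0]; output O) and dual-port RAMyX1D primitive (R/W port inputs D, WE, WCLK, A[#:0] with output SPO; read port input DPRA[#:0] with output DPO)]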
Notes:
1. data_a = word addressed by bits A#-A0.
2. data_d = word addressed by bits DPRA#-DPRA0.
As shown in Table 6-6, wider library primitives are available for 2-bit and 4-bit RAMs.
Signal Ports
Each distributed RAM port operates independently of the other while reading the same set
of memory cells.
Clock — WCLK
The clock is used for synchronous writes. The data and the address input pins have setup
times referenced to the WCLK pin. Active on the positive edge by default with built-in
programmable polarity.
Enable — WE
The enable pin affects the write functionality of the port. An inactive Write Enable prevents
any writing to memory cells. An active Write Enable causes the clock edge to write the data
input signal to the memory location pointed to by the address inputs. Active High by
default with built-in programmable polarity.
Data In — D
The data input provides the new data value to be written into the RAM.
Attributes
The INIT attribute is required for any ROM instantiation. The ROM is initialized to the
INIT value at configuration and does not change during operation. For example, on a
ROM16X1, the parameter INIT = 10A7 produces the following datastream:
0001 0000 1010 0111
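To make the INIT-to-contents mapping concrete, the following behavioral sketch models a 16 x 1 ROM loaded with 10A7. The module name `rom16x1_model` and the bus-style address are assumptions; the sketch also assumes that address 0 reads the least significant INIT bit, which matches the usual convention for these primitives.

```verilog
// Behavioral sketch of a 16 x 1 ROM whose contents come from INIT.
// Assumption: address n returns bit n of INIT (address 0 = LSB).
module rom16x1_model #(
    parameter [15:0] INIT = 16'h10A7   // 0001 0000 1010 0111
) (
    input  wire [3:0] A,   // address
    output wire       O    // selected bit
);
    wire [15:0] contents = INIT;

    assign O = contents[A];
endmodule
```

Under that assumption, INIT = 10A7 reads as 1 at addresses 0, 1, 2, 5, 7, and 12, and as 0 everywhere else.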
[Figure 6-4: slice coordinates within a CLB. SLICEM slices X0Y1 and X0Y0 support logic/ROM, distributed RAM, and shift registers; SLICEL slices X1Y1 and X1Y0 support logic/ROM only]
When a LOC property is assigned to a distributed RAM instance, the Xilinx ISE® software
places the instance in the specified location. Figure 6-4 shows the X,Y coordinates for the
slices in a Spartan-3 generation CLB. Again, only SLICEM slices support memory.
Distributed RAM placement locations use the slice location naming convention, allowing
LOC properties to transfer easily from array to array.
For example, the single-port RAM16X1S primitive fits in any LUT within any SLICEM. To
place the instance U_RAM16 in slice X0Y0, use the following LOC assignment:
INST "U_RAM16" LOC = "SLICE_X0Y0";
The 16x1 dual-port RAM16X1D primitive requires both 16x1 LUT RAMs within a single
SLICEM slice, as shown in Figure 6-5. The first 16x1 LUT RAM, with output SPO,
implements the read/write port controlled by address A[3:0] for read and write. The
second LUT RAM implements the independent read-only port controlled by address
DPRA[3:0]. Data is presented simultaneously to both LUT RAMs, again controlled by
address A[3:0], WE, and WCLK.
[Figure 6-5: RAM16X1D in one SLICEM. D, A[3:0], WE, and WCLK drive the read/write 16x1 LUT RAM (output SPO); DPRA[3:0] addresses the read-only 16x1 LUT RAM (output DPO). Each output has an optional register]
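The dual-port behavior can be sketched behaviorally as one storage array with a write port and two asynchronous read ports. The module name `dist_ram16x1d_model` is hypothetical. In the device the data is held in two LUT RAMs that are written together; a single array is used here because the two copies always hold the same contents.

```verilog
// Behavioral sketch of a 16 x 1 dual-port distributed RAM.
// One write port (address A) and two asynchronous read ports (A and DPRA).
// Module name and bus-style ports are assumptions for illustration.
module dist_ram16x1d_model #(
    parameter [15:0] INIT = 16'h0000
) (
    input  wire       WCLK,  // write clock
    input  wire       WE,    // write enable
    input  wire       D,     // data input
    input  wire [3:0] A,     // read/write port address
    input  wire [3:0] DPRA,  // read-only port address
    output wire       SPO,   // read/write port output
    output wire       DPO    // read-only port output
);
    reg [15:0] mem = INIT;

    always @(posedge WCLK)
        if (WE)
            mem[A] <= D;     // both LUT RAMs receive this write in hardware

    assign SPO = mem[A];     // follows the write address
    assign DPO = mem[DPRA];  // independent read address
endmodule
```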
A 32x1 single-port RAM32X1S primitive fits in one slice, as shown in Figure 6-6. The 32 bits
of RAM are split between two 16x1 LUT RAMs within the SLICEM slice. The A4 address
line selects the active LUT RAM via the F5MUX multiplexer within the slice.
[Figure 6-6: RAM32X1S in one SLICEM. D, A[3:0], WE, and WCLK drive two 16x1 LUT RAMs; A4 controls the F5MUX that selects between them. Optional registers are available on the outputs]
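A behavioral sketch of the 32 x 1 arrangement follows. The names `dist_ram32x1s_model`, `lut_lo`, and `lut_hi` are hypothetical, and which physical LUT holds which half of the address space is an assumption made only for illustration.

```verilog
// Behavioral sketch of a 32 x 1 single-port RAM built from two 16 x 1
// LUT RAMs, with A[4] steering both the write and the output multiplexer.
// Names and the low/high half assignment are assumptions.
module dist_ram32x1s_model (
    input  wire       WCLK,
    input  wire       WE,
    input  wire       D,
    input  wire [4:0] A,
    output wire       O
);
    reg [15:0] lut_lo = 16'h0000;  // locations 0 to 15  (A[4] = 0)
    reg [15:0] lut_hi = 16'h0000;  // locations 16 to 31 (A[4] = 1)

    always @(posedge WCLK)
        if (WE) begin
            if (A[4])
                lut_hi[A[3:0]] <= D;
            else
                lut_lo[A[3:0]] <= D;
        end

    // Models the F5MUX: A[4] selects the active LUT RAM on the read path.
    assign O = A[4] ? lut_hi[A[3:0]] : lut_lo[A[3:0]];
endmodule
```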
The 64x1 single-port RAM64X1S primitive occupies both SLICEM slices in the CLB. The
read path uses both F5MUX and F6MUX multiplexers within the CLB.
Table 6-8 shows all Distributed RAM design elements and the number of slices required in
the Spartan-3 generation FPGA families.
memory_initialization_radix=16;
memory_initialization_vector= 80, 0F, 00, 0B, 00, 0C, …, 81;
Figure 6-7: A Simple Coefficients File (.coe) Example for a Byte-Wide Memory
The output from the CORE Generator system includes a report on the options selected and
the device resources required. If a very deep memory is generated, then some external
multiplexing might be required; these resources are reported as the number of logic slices
required. For simulation purposes, the CORE Generator system creates VHDL or Verilog
behavioral models.
The CORE Generator FIFO Generator supports both distributed and block RAMs.
• CORE Generator: Distributed Memory Module
http://www.xilinx.com/support/documentation/ip_documentation/dist_mem_gen_ds322.pdf
• CORE Generator: FIFO Generator
http://www.xilinx.com/support/documentation/ip_documentation/fifo_generator_ds317.pdf
It is still possible to directly instantiate distributed RAM, even if portions of the design
infer distributed RAM.
Instantiation Templates
For VHDL- and Verilog-based designs, various instantiation templates are available to
speed development. Within the Xilinx ISE Project Navigator, select Edit → Language
Templates from the menu, and then select VHDL or Verilog, followed by Device
Primitive Instantiation → FPGA → RAM/ROM → Distributed RAM from the selection
tree. Cut and paste the template into the source code for the application and modify it as
appropriate.
There are also downloadable VHDL and Verilog templates available for all single-port and
dual-port primitives. The RAM_xS templates (where x = 16, 32, or 64) are single-port
modules and instantiate the corresponding RAMxX1S primitive. The ‘S’ indicates single-
port RAM. The RAM_16D template is a dual-port module and instantiates the
corresponding RAM16X1D primitive. The ‘D’ indicates dual-port RAM.
• VHDL Distributed RAM Templates
xapp464_vhdl.zip
• Verilog Distributed RAM Templates
xapp464_verilog.zip
The following are single-port templates:
• RAM_16S
• RAM_32S
• RAM_64S
The following is a dual-port template:
• RAM_16D
In VHDL, each template has a component declaration section and an architecture section.
Insert both sections of the template within the VHDL design file. The port map of the
architecture section must include the design signal names.
Templates for the RAM_16S module are provided below as examples in both VHDL and
Verilog code.
-- Copy the following two statements and paste them before the
-- Entity declaration, unless they already exist.
Library UNISIM;
use UNISIM.vcomponents.all;
-- <-----Cut code below this line and paste into architecture body---->
RAM16X1S_inst : RAM16X1S
generic map (
INIT => X"0000")
port map (
O => O, -- RAM output
A0 => A0, -- RAM address[0] input
A1 => A1, -- RAM address[1] input
A2 => A2, -- RAM address[2] input
A3 => A3, -- RAM address[3] input
D => D, -- RAM data input
WCLK => WCLK, -- Write clock input
WE => WE -- Write enable input
);
RAM16X1S #(
.INIT(16'h0000) // Initial contents of RAM
) RAM16X1S_inst (
.O(O), // RAM output
.A0(A0), // RAM address[0] input
.A1(A1), // RAM address[1] input
.A2(A2), // RAM address[2] input
.A3(A3), // RAM address[3] input
.D(D), // RAM data input
.WCLK(WCLK), // Write clock input
.WE(WE) // Write enable input
);
Conclusion
Frequently FPGA designs require multiple small, fast, and flexible memories for system
configuration, control, and status functions. These memories are usually distributed
throughout the design. The distributed RAM in the Spartan-3 generation FPGAs is ideal
for such applications, and allows the CLBs to be changed from logic to memory "on
demand". These memories can then be linked together for various data width or depth
requirements. The Xilinx tools automatically infer distributed RAM for small arrays, or
the primitives can be instantiated directly in a design.
Chapter 7
Introduction
Spartan-3 generation FPGAs can configure the look-up table (LUT) in a SLICEM slice as a
16-bit shift register without using the flip-flops available in each slice. Shift-in operations
are synchronous with the clock, and output length is dynamically selectable. A separate
dedicated output allows the cascading of any number of 16-bit shift registers to create
whatever size shift register is needed. Each CLB resource can be configured using four of
the eight LUTs as a 64-bit shift register.
This document provides generic VHDL and Verilog submodules and reference code
examples for implementing from 16-bit up to 64-bit shift registers. These submodules are
built from 16-bit shift-register primitives and from dedicated MUXF5, MUXF6, and
MUXF7 multiplexers.
These shift registers enable the development of efficient designs for applications that
require delay or latency compensation. Shift registers are also useful in synchronous FIFO
and Content-Addressable Memory (CAM) designs. To quickly generate a Spartan-3 shift
register without using flip-flops (i.e., using the SRL16 element(s)), use the CORE Generator
RAM-based Shift Register module.
LUT Structure
The LUT can be described as a 16:1 multiplexer with the four inputs serving as binary
select lines, and the values programmed into the LUT serving as the data being selected
(see Figure 7-1).
[Figure 7-1: LUT modeled as a 16:1 multiplexer. Inputs A[3:0] select one of 16 programmed values to drive output D. Example contents for addresses 0000 through 1111: 1 0 1 1 1 0 0 0 1 1 1 0 1 0 0 1]
With the SRL16 configuration, the fixed LUT values are configured instead as an
addressable shift register (see Figure 7-2). The shift register inputs are the same as those for
the synchronous RAM configuration of the LUT: a data input, clock, and clock enable (not
shown). A special output for the shift register is provided from the last flip-flop, called Q15
on the library primitives or MC15 in the FPGA Editor. The LUT inputs asynchronously (or
dynamically) select one of the 16 storage elements in the shift register.
[Figure 7-2: SRL16 as a chain of 16 flip-flops clocked by CLK with input DIN. A[3:0] selects one of the 16 stages (0000 through 1111) to drive output D; the last stage also drives Q15 (MC15)]
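The addressable shift register described above can be sketched as a behavioral model. The module name `srlc16e_model` and the bus-style address are assumptions; the port names follow the library primitive naming (D, CE, CLK, Q, Q15) rather than the slice-level names.

```verilog
// Behavioral sketch of a 16-bit addressable shift register with clock
// enable and cascade output. Module name is an assumption.
module srlc16e_model #(
    parameter [15:0] INIT = 16'h0000   // initial shift register contents
) (
    input  wire       CLK,
    input  wire       CE,    // clock enable, active High
    input  wire       D,     // serial data input
    input  wire [3:0] A,     // selects which stage appears on Q
    output wire       Q,     // addressable output
    output wire       Q15    // last stage, for cascading
);
    reg [15:0] sr = INIT;

    // Shift toward the higher bit positions on each enabled clock edge.
    always @(posedge CLK)
        if (CE)
            sr <= {sr[14:0], D};

    assign Q   = sr[A];      // asynchronous selection of one stage
    assign Q15 = sr[15];
endmodule
```

With this model, address n selects the input delayed by n + 1 clock cycles, so A = 0000 gives a 1-cycle delay and A = 1111 gives 16 cycles.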
Each shift register provides a shift output MC15 for the last bit in each LUT, in addition to
providing addressable access to any bit in the shift register through the normal D output
(Figure 7-3). The address inputs A[3:0] are the same as the distributed RAM address lines,
which come from the LUT inputs F[4:1] or G[4:1].
[Figure 7-3: logic cell SRL structure. An SRLC16 with inputs SHIFTIN or DI (BY), A[3:0], CE (SR), and CLK provides the addressable D output, an optional registered output through the flip-flop, and the MC15 output driving SHIFTOUT or YB]
Registered Output
Each SRL16 LUT has an associated flip-flop that makes up the overall logic cell. The
addressable bit of the shift register can be stored in the flip-flop for a synchronous output
or can be fed directly to a combinatorial output of the CLB. When using the register, it is
best to have fixed address lines selecting a static shift register length to avoid timing
hazards. The CLB flip-flop can be used to provide one more shift delay for the addressable
bit. Since the clock-to-output delay of the flip-flop is faster than the shift register,
performance can be improved by addressing the second-to-last bit and then using the flip-
flop as the last stage of the shift register. Using the flip-flop also allows for asynchronous or
synchronous set or reset of the output.
The shift register input can come from a dedicated SHIFTIN signal, and the Q15/MC15
signal from the last stage of the shift register can drive a SHIFTOUT output. The
addressable D output is available in all SRL primitives, while the Q15/MC15 signal that
can drive SHIFTOUT is only available in the cascadable SRLC16 primitive.
The SRL16 can shift from either LSB to MSB or MSB to LSB according to the application.
Although the device arbitrarily names the output MC15, it can be the LSB of the user
function.
Slice Structure
The two logic cells within a slice are connected for cascading a shift register up to 32 bits
(see Figure 7-4). These connect the Q15/MC15 of the first shift register to the DI (or Q0 flip-
flop) of the second shift register.
[Figure 7-4: two SRL16 logic cells (LC) in a SLICEM, with MC15 of the first connected to DI of the second]
If dynamic addressing (or "dynamic length adjustment") is desired, the two separate data
outputs from each SRL16 must be multiplexed together. One of the two SRL16 bits can be
selected by using the F5MUX to make the selection (see Figure 7-5).
[Figure 7-5: the outputs of two SRL16 logic cells, both addressed by A[3:0], selected by the F5MUX under control of A4]
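The cascade and the multiplexed read path can be sketched behaviorally as a 32-bit dynamically addressable shift register. The module and signal names (`srl32_dynamic_model`, `sr0`, `sr1`) are hypothetical; the conditional operator stands in for the F5MUX.

```verilog
// Behavioral sketch of two cascaded 16-bit shift registers in one slice,
// with A[4] selecting between their addressable outputs.
// Names are assumptions for illustration.
module srl32_dynamic_model (
    input  wire       CLK,
    input  wire       CE,
    input  wire       D,
    input  wire [4:0] A,    // A[3:0] addresses both SRL16s, A[4] picks one
    output wire       Q
);
    reg [15:0] sr0 = 16'h0000;  // first SRL16, receives D
    reg [15:0] sr1 = 16'h0000;  // second SRL16, fed by the first one's last stage

    always @(posedge CLK)
        if (CE) begin
            sr0 <= {sr0[14:0], D};
            sr1 <= {sr1[14:0], sr0[15]};  // Q15/MC15 to DI cascade
        end

    // Models the F5MUX selecting one of the two SRL16 outputs.
    assign Q = A[4] ? sr1[A[3:0]] : sr0[A[3:0]];
endmodule
```

In this model a 5-bit address n selects a delay of n + 1 clock cycles, from 1 to 32.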
CLB Structure
The Spartan-3 generation CLB contains four slices, each with two LUTs, but only two allow
LUTs to be used as SRL16 components or distributed RAM. The two left-hand SLICEM
components allow their two LUTs to be configured as a 16-bit shift register. SHIFTOUT to
SHIFTIN connections are available to cascade the two SLICEM components. The four left-
hand LUTs of a single CLB can be combined to produce delays up to 64 clock cycles (see
Figure 7-6). It is also possible to combine shift registers across more than one CLB.
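[Figure 7-6: one shift chain in a CLB. IN enters SLICEM S1 and passes through its two SRLC16s (each with a DI input, an MC15 cascade output, and a D output to a flip-flop), leaves on SHIFTOUT, enters SLICEM S0 on SHIFTIN, passes through two more SRLC16s, and exits as OUT and CASCADABLE OUT]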
The multiplexers can be used to address multiple SLICEMs similar to the description for
combining the two LUTs within a SLICEM. The F6MUX can be used to select from three or
four SRL16 components in a CLB, providing up to 64 bits of addressable shift register (see
Figure 7-7).
[Figure 7-7: the four LUTs in SLICEM S1 and SLICEM S0 of a CLB, combined by an F5 multiplexer in each slice and one F6 multiplexer to drive output D]
Library Primitives
The shift register element is known as the SRL16 (Shift Register LUT 16-bit), with a C
added to signify a cascade ability (Q15 output) and E to indicate a clock enable. See
Figure 7-8 for an example of the SRLC16E component.
[Figure 7-8: SRLC16E component with inputs D, CE, CLK, and A0 through A3, and outputs Q and Q15]
Eight library primitives are available that offer optional clock enable (CE), inverted clock
(CLK), and cascadable output (Q15) combinations.
Table 7-1 lists all of the available primitives for synthesis and simulation.
Table 7-1: Shift Register Primitives
Primitive    Length    Control    Address Inputs    Output
SRL16        16 bits   CLK        A3, A2, A1, A0    Q
SRL16E       16 bits   CLK, CE    A3, A2, A1, A0    Q
SRL16_1      16 bits   CLK        A3, A2, A1, A0    Q
SRL16E_1     16 bits   CLK, CE    A3, A2, A1, A0    Q
SRLC16       16 bits   CLK        A3, A2, A1, A0    Q, Q15
SRLC16E      16 bits   CLK, CE    A3, A2, A1, A0    Q, Q15
SRLC16_1     16 bits   CLK        A3, A2, A1, A0    Q, Q15
SRLC16E_1    16 bits   CLK, CE    A3, A2, A1, A0    Q, Q15
Port Signals
Clock — CLK
Either the rising edge or the falling edge of the clock is used for the synchronous shift-in.
The data and clock enable input pins have set-up times referenced to the chosen edge of
CLK.
Data In — D
The data input provides new data (one bit) to be shifted into the shift register.
Data Out — Q
The data output Q provides the data value (1 bit) selected by the address inputs.
GSR
The global set/reset (GSR) signal has no impact on shift registers.
Attributes
Content Initialization — INIT
The INIT attribute defines the initial shift register contents. The INIT attribute is a hex-
encoded bit vector with four digits (0000). The left-most hexadecimal digit holds the most
significant bits. By default the shift register is initialized with all zeros during the device
configuration sequence, but any other configuration value can be specified.
Location Constraints
Figure 7-9 shows how the slices are arranged within a CLB. Each CLB has four slices, but
only the two at the bottom-left of the CLB can be used as shift registers. These are both
designated SLICEM in CLB positions S0 and S1. The relative position coordinates are X0Y0
and X0Y1. To constrain placement, these coordinates can be used in a LOC property
attached to the SRL primitive. Note that the dedicated CLB shift chain runs from the top to
the bottom, but the start and end of the shift register can be in any of the four SLICEM
LUTs.
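[Figure 7-9: arrangement of slices within a CLB. Slices X1Y1 and X1Y0 form one column and slices X0Y1 and X0Y0 the other; X0Y1 and X0Y0 are linked by SHIFTOUT and SHIFTIN. Carry connections run from CIN to COUT, and a switch matrix connects the CLB to the interconnect to neighbors]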
Notes:
1. m = 0, 1, 2, 3.
Data Flow
Each shift register (SRL16 primitive) supports:
• Synchronous shift-in
• Asynchronous 1-bit output when the address is changed dynamically
• Synchronous shift-out when the address is fixed
[Figure: data flow through cascaded shift registers, showing SRLC16E and SRL16E primitives with D, Address, CE, and CLK inputs, Q outputs, and Q15 cascade connections]
Shift Operation
The shift operation is a single clock-edge operation with an active-High clock enable
feature. When enable is High, the input (D) is loaded into the first bit of the shift register,
and each bit is shifted to the next highest bit position. In a cascadable shift register
configuration (such as SRLC16), the last bit is shifted out on the Q15 output.
The bit selected by the 4-bit address appears on the Q output.
[Figure: shift timing diagram showing CLK, CE, D, Q15, and Address (changing from 7 to 10), with the tshift and taccess intervals marked]
Characteristics
• A shift operation requires one clock edge.
• Dynamic-length read operations are asynchronous (Q output).
• Static-length read operations are synchronous (Q output).
• The data input has a setup-to-clock timing specification.
• In a cascadable configuration, the Q15 output always contains the last bit value.
• The Q15 output changes synchronously after each shift operation.
or reset inputs, and does not have access to all bits at the same time, using such capabilities
precludes the use of the SRL16, and the function is implemented in flip-flops. The
cascadable shift register (SRLC16) might be inferred if the shift register is larger than 16 bits
or if only the Q15 is used.
In fact, adding a reset is one way to force a synthesis tool to use flip-flops instead of the
SRL16 when flip-flops are preferred for performance or other reasons. If a reset is not
needed, simply connect a dummy signal and use an appropriate KEEP attribute to prevent
the synthesis tool from optimizing it out of the design.
Although the SRL16 shift register does not have a parallel load capability, an equivalent
function can be implemented simply by anticipating the load requirement and shifting in
the proper data. This requires predictable timing for the load command.
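library IEEE;
use IEEE.std_logic_1164.all;
-- Entity and signal declarations reconstructed; the entity name is assumed
entity shift_16 is
  port (C, D : in std_logic; Q : out std_logic);
end shift_16;
architecture Behavioral of shift_16 is
  signal Q_INT : std_logic_vector(15 downto 0);
begin
  process(C)
  begin
    if (C'event and C='1') then
      Q_INT <= Q_INT(14 downto 0) & D;
    end if;
  end process;
  Q <= Q_INT(15);
end Behavioral;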
An inverted clock (SRL16_1) is inferred by replacing C='1' with C='0'. A clock enable
(SRL16E) is inferred by inserting if (CE='1') then after the first if-then statement.
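module shift_16 (input C, input D, output reg Q); // header reconstructed; names assumed
reg [15:0] Q_INT;
always @(posedge C) begin
  Q_INT <= {Q_INT[14:0], D};
end
always @(Q_INT)
  Q = Q_INT[15];
endmodule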
An inverted clock (SRL16_1) is inferred by replacing (posedge C) with (negedge C). A
clock enable (SRL16E) is inferred by inserting if(CE) after the begin statement.
The submodules are based on SRLC16E primitives, which are associated with dedicated
multiplexers (MUXF5, MUXF6, and so forth). This implementation allows a fast static- and
dynamic-length mode, even for very large shift registers.
Figure 7-12 represents the cascadable shift registers (32-bit and 64-bit) implemented by the
submodules in Table 7-3.
[Figure 7-12: 32-bit and 64-bit cascadable shift registers. The 32-bit version cascades two SRLC16E primitives through Q15 and selects between their Q outputs with a MUXF5 controlled by A4. The 64-bit version cascades four SRLC16E primitives and combines their Q outputs with two MUXF5s (A4) and one MUXF6 (A5); the final Q15 provides Q63]
All clock enable (CE) and clock (CLK) inputs are connected to one global clock enable and
one clock signal per submodule. If a global static- or dynamic-length mode is not required,
the SRLC16E primitive can be cascaded without multiplexers.
[Figure: SRLC16E with D, Address, CE (write enable), and CLK inputs; its Q output feeds a flip-flop (FF) to produce a synchronous output, and the Q15 cascade output remains available]
This configuration provides a better timing solution and simplifies the design. Because the
flip-flop must be considered to be the last register in the shift-register chain, the static or
dynamic address should point to the desired length minus one. If needed, the cascadable
output can also be registered in a flip-flop. The delay from the SRL16 to the flip-flop is a
fixed CLB setup time delay and is not controlled by a PERIOD constraint.
[Figure: two implementations of a 40-bit shift register built from cascaded SRLC16 LUTs. Without an output flip-flop, the final SRLC16 uses address "0111"; with a flip-flop (FF) as the last stage, the final SRLC16 uses address "0110"]
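The "desired length minus one" rule can be checked with the 40-bit case: two full SRLC16s contribute 16 + 16 = 32 stages, a third addressed at "0110" contributes 7, and the flip-flop adds 1, for 32 + 7 + 1 = 40. Without the flip-flop, address "0111" supplies 8 stages and 32 + 8 = 40. The following behavioral sketch expresses the same idea; the module name `delay_srl_ff` and the `LENGTH` parameter are hypothetical, and whether a tool maps the shift portion onto SRL16s depends on the tool.

```verilog
// Behavioral sketch of a fixed-length delay whose last stage is a
// flip-flop. The first LENGTH-1 stages have no reset, so they are
// candidates for SRL16 mapping. Names are assumptions for illustration.
module delay_srl_ff #(
    parameter LENGTH = 40          // total delay in clock cycles, assumed >= 3
) (
    input  wire CLK,
    input  wire CE,
    input  wire D,
    output reg  Q = 1'b0           // final stage: the CLB flip-flop
);
    reg [LENGTH-2:0] sr = {(LENGTH-1){1'b0}};   // LENGTH-1 shift stages

    always @(posedge CLK)
        if (CE) begin
            sr <= {sr[LENGTH-3:0], D};
            Q  <= sr[LENGTH-2];    // flip-flop supplies the last cycle of delay
        end
endmodule
```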
VHDL Template:
-- Module: SHIFT_REGISTER_C_16
-- Description: VHDL instantiation template
-- CASCADABLE 16-bit shift register with enable (SRLC16E)
-- Device: Spartan-3 Generation Family
---------------------------------------------------------------------
-- Components Declarations:
--
component SRLC16E
-- pragma translate_off
generic (
-- Shift Register initialization ("0" by default) for functional
-- simulation:
INIT : bit_vector := X"0000"
);
-- pragma translate_on
port (
D : in std_logic;
CE : in std_logic;
CLK : in std_logic;
A0 : in std_logic;
A1 : in std_logic;
A2 : in std_logic;
A3 : in std_logic;
Q : out std_logic;
Q15 : out std_logic
);
end component;
-- Architecture Section:
--
-- Attributes for Shift Register initialization ("0" by default):
attribute INIT: string;
--
attribute INIT of U_SRLC16E: label is "0000";
--
-- ShiftRegister Instantiation
U_SRLC16E: SRLC16E
port map (
D => , -- insert input signal
CE => , -- insert Clock Enable signal (optional)
CLK => , -- insert Clock signal
A0 => , -- insert Address 0 signal
A1 => , -- insert Address 1 signal
A2 => , -- insert Address 2 signal
A3 => , -- insert Address 3 signal
Q => , -- insert output signal
Q15 => -- insert cascadable output signal
);
Verilog Template:
// Module: SHIFT_REGISTER_16
// Description: Verilog instantiation template
// Cascadable 16-bit Shift Register with Clock Enable (SRLC16E)
// Device: Spartan-3 Generation Family
//-------------------------------------------------------------------
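// Shift register initialization ("0" by default) for functional simulation:
defparam U_SRLC16E.INIT = 16'h0000;
//SelectShiftRegister-II Instantiation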
SRLC16E U_SRLC16E ( .D(),
.A0(),
.A1(),
.A2(),
.A3(),
.CLK(),
.CE(),
.Q(),
.Q15()
);
[Figure: shift register module symbol with inputs D[N:0], A[M:0], CE, CLK, ASET, and SSET, and output Q[N:0]]
Applications
Delay Lines
The register-rich nature of the Xilinx FPGA architecture allows for the addition of pipeline
stages to increase throughput. Data paths must be balanced to keep the desired
functionality. The SRL16 can be used when additional clock cycles of delay are needed
anywhere in the design (see Figure 7-16).
[Figure 7-16: delay-line balancing. Operation A (4 cycles) followed by Operation B (8 cycles) takes 12 cycles, while the parallel Operation C takes 3 cycles, a 9-cycle imbalance. Adding a 9-cycle pipeline built from SRL16s after Operation C statically balances both paths at 12 cycles]
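A delay line for balancing a whole data bus can be sketched as follows. The module name `balance_delay` and its `WIDTH` and `CYCLES` parameters are hypothetical. The code deliberately has no reset, since a reset would generally prevent mapping onto SRL16s; actual inference depends on the synthesis tool.

```verilog
// Behavioral sketch of a bus-wide delay line used to balance two data
// paths, for example the 9-cycle case of Figure 7-16.
// Names and default parameter values are assumptions for illustration.
module balance_delay #(
    parameter WIDTH  = 8,   // data bus width
    parameter CYCLES = 9    // delay in clock cycles, assumed >= 2
) (
    input  wire             CLK,
    input  wire [WIDTH-1:0] DIN,
    output wire [WIDTH-1:0] DOUT
);
    reg [WIDTH-1:0] pipe [0:CYCLES-1];
    integer i;

    always @(posedge CLK) begin
        pipe[0] <= DIN;
        for (i = 1; i < CYCLES; i = i + 1)
            pipe[i] <= pipe[i-1];
    end

    // DOUT is DIN delayed by CYCLES clock edges.
    assign DOUT = pipe[CYCLES-1];
endmodule
```

In simulation the pipeline contents are unknown until CYCLES clock edges have passed, because the array is not initialized.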
with the feedback coming from bits 49 and 52. A third method is to duplicate the LFSR in
multiple SRLs and address different bits from each one. Yet another method is to generate
multiple addresses in one SRL clock cycle to capture multiple bit positions. The XNOR gate
required for any LFSR can be conveniently located in the SLICEL part of the CLB. More
detail is available in XAPP210.
[Figure: 52-bit LFSR built from SRL16s (Address = 15) and flip-flops, with XNOR feedback from bits 49 and 52]
[Figure: duplicated LFSRs (LFSR 1 and LFSR 2)]
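The LFSR with feedback from bits 49 and 52 can be sketched behaviorally as follows. This is an illustration under stated assumptions, not the XAPP210 reference code: the module name `lfsr52` is hypothetical, and whether the register maps to SRL16s depends on the synthesis tool.

```verilog
// Illustrative sketch only: lfsr52 is a hypothetical module name.
module lfsr52 (
    input  wire clk,
    input  wire ce,
    output wire out
);
    // Bits are numbered 1 to 52 to match the text. With XNOR feedback the
    // all-zeros state is valid, so the default power-up value can be used;
    // the all-ones state is the lock-up state.
    reg [52:1] sr = 52'd0;

    wire feedback = ~(sr[52] ^ sr[49]);   // XNOR of bits 52 and 49

    always @(posedge clk)
        if (ce)
            sr <= {sr[51:1], feedback};   // new bit enters at bit 1

    assign out = sr[52];
endmodule
```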
FIFOs
Synchronous FIFOs can be built out of the SRL16 components. These are useful when other
resources become scarce, providing up to 64 bits per CLB. For larger FIFOs, the block RAM
is the most efficient resource to use. See XAPP256 for more detail.
[Figure: SRL16-based synchronous FIFO. Inputs: CLK, SINIT, DATA_IN, RD_EN, WR_EN. Blocks: SRL16-based FIFO, address counter, FIFO count, and status flag generation. Outputs: DATA_OUT, FIFO_COUNT, FULL, EMPTY.]
Counters
Any desired repeated sequence of 16 states can be achieved by feeding each output with an
SRL16. Cascading the SRL16 allows even longer arbitrary count sequences. A terminal
count can be generated by using the standard carry chain (see Figure 7-20).
[Figure 7-20: four SRLs drive the counter outputs Q3 to Q0; a carry chain of 2:1 muxes, starting from VCC, generates the terminal count TC]
Conclusion
The SRL16 configuration of the Spartan-3 generation LUT provides a space-efficient shift
register that would otherwise require 16 flip-flops. This feature is automatically used when a
small shift register is described in HDL code. However, creative consideration of the uses
of the SRL16 as described here can provide even more significant advantages in many
applications.
Chapter 8
Introduction
A multiplexer, or mux, is a common building block of almost every logic design, selecting
one of several possible input signals. Spartan-3 generation FPGAs are very efficient at
implementing multiplexers: small ones in the look-up tables and larger ones using
dedicated multiplexer resources. Any Spartan-3 generation device easily implements:
• a 4:1 mux in one slice
• a 16:1 mux in one CLB
• a 32:1 mux in two CLBs
The same logic resources also can be used for wide, general-purpose logic functions. For
applications like comparators, encoder-decoders, or case statements, these resources
provide an optimal solution. These resources are used automatically by the Xilinx
development system, especially when a CASE statement is used, and then optimized for
the timing requirements of a given design. This chapter explains how to further optimize
the use of dedicated multiplexers and how to analyze their use in a design.
This chapter describes the dedicated multiplexer resources in the Spartan-3 generation
architecture. The signals and parameters associated with the multiplexers are defined. The
many methods to include multiplexers in a design are described along with
recommendations and guidelines for their use.
[Figure: LUTs connected through nets of general interconnect]
[Figure: LUTs combined through F5MUX and F6MUX resources]
[Figure: slice with two 4-input LUTs and registers, an F5MUX, and an FiMUX]
The F5MUX always combines the two LUTs in a slice. If those two LUTs contain 2:1 muxes
with the same control input, then the overall result is a 4:1 mux (see Figure 8-4).
[Figure 8-4: two LUTs and the F5MUX forming a 4:1 mux]
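The 4:1 structure described above can be written out explicitly. The sketch below assumes the Xilinx primitive library (which provides MUXF5 with ports I0, I1, S, and O) is available to the tools; the module and signal names are hypothetical.

```verilog
// Illustrative sketch only: mux4_f5 and its signal names are hypothetical.
module mux4_f5 (
    input  wire [3:0] d,
    input  wire [1:0] sel,
    output wire       o
);
    // Two 2:1 muxes with the same control input, one per LUT
    wire lo = sel[0] ? d[1] : d[0];
    wire hi = sel[0] ? d[3] : d[2];

    // The F5MUX combines the two LUT outputs in the same slice
    MUXF5 u_muxf5 (
        .I0 (lo),       // selected when S is Low
        .I1 (hi),       // selected when S is High
        .S  (sel[1]),
        .O  (o)
    );
endmodule
```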
The F5MUX is so named because it generates any possible Boolean logic function of five
inputs (see Figure 8-5). If the two LUTs contain independent functions of the same four
inputs, the mux select line becomes the fifth input. The F5MUX becomes a function
expander that is just as efficient as another 3-input LUT for implementing any 5-input
function. This is a significant advantage over other FPGA architectures.
[Figure 8-5: two LUTs sharing the same four inputs, combined by the F5MUX]
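The function-expander role of the F5MUX can be illustrated by splitting a 32-entry truth table into two 16-entry halves, one per LUT, with the fifth input selecting between them. This is a behavioral sketch; `func5` and `TRUTH` are hypothetical names, and the default value (5-input odd parity) is only an example.

```verilog
// Illustrative sketch only: func5 and TRUTH are hypothetical names.
module func5 #(
    parameter [31:0] TRUTH = 32'h9669_6996   // example: 5-input XOR
) (
    input  wire [4:0] in,
    output wire       out
);
    wire [15:0] lut_f = TRUTH[15:0];    // function when in[4] = 0
    wire [15:0] lut_g = TRUTH[31:16];   // function when in[4] = 1

    wire f = lut_f[in[3:0]];            // first 4-input LUT
    wire g = lut_g[in[3:0]];            // second 4-input LUT

    assign out = in[4] ? g : f;         // the role played by the F5MUX
endmodule
```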
As shown in Figure 8-6, the F5MUX also produces some functions of up to nine inputs, if
they can be partitioned into two 4-input LUTs and a mux.
[Figure 8-6: F5MUX combining two 4-input LUTs for functions of up to nine inputs]
Consequently, the F5MUX implements any 5-input function, a 4:1 mux (a 6-input function), or
some 9-input functions.
FiMUX
The second mux, called the FiMUX, functions as either an F6MUX, F7MUX, or F8MUX,
depending on its location and connections to the other muxes.
[Figure: FiMUX assignment within a CLB: F8MUX in SLICEL S3, F6MUX in SLICEL S2, F7MUX in SLICEM S1, F6MUX in SLICEM S0]
Each FiMUX receives inputs from muxes of the next lower number; for example, the two
F6MUX results drive the F7MUX. Like the F5MUX, the FiMUX has the flexibility to
implement other types of functions besides just multiplexers. The F6MUX is so named
because it creates any function of six inputs. Similarly, the F7MUX generates any function
of seven inputs, and the F8MUX generates any function of eight inputs.
Table 8-1: Mux Capabilities
                                  Total Number of Inputs per Function
Mux    Usage  Input Source  For Any Function  For Mux         For Limited Functions
F5MUX  F5MUX  LUTs          5                 6 (4:1 mux)     9
FiMUX  F6MUX  F5MUX         6                 11 (8:1 mux)    19
FiMUX  F7MUX  F6MUX         7                 20 (16:1 mux)   39
FiMUX  F8MUX  F7MUX         8                 37 (32:1 mux)   79
Naming Conventions
In this document and in the Spartan-3 generation data sheets, the mux that serves as either
F6MUX, F7MUX, or F8MUX generically is called an FiMUX (i = 6, 7, or 8). This name
avoids confusion with the static CLB mux that generates the X output, which the FPGA
Editor refers to as the "FXMUX". The FiMUX is always referred to as the "F6MUX" in the
FPGA Editor. The timing analyzer also refers to the path through the FiMUX to the CLB
pin as "TIF6Y", although it can be used as an F7MUX or F8MUX.
The library components are called MUXF5, MUXF6, MUXF7, and MUXF8. MUXF6,
MUXF7, and MUXF8 use the FiMUX and restrict the placement to a specific relative
location in the CLB.
[Figure: F5 and FX outputs of each slice feeding the FXINA and FXINB inputs of the F6, F7, and F8 muxes]
[Figure 8-9: dedicated multiplexers in a slice. The FiMUX selects between FXINA and FXINB under control of BY and drives FX (local feedback to FXIN), Y (general interconnect), and the YQ flip-flop. The F5MUX selects between the F and G LUT outputs under control of BX and drives F5 (local feedback to FXIN), X (general interconnect), and the XQ flip-flop.]
Implementation Examples
Wide-Input Multiplexers
Each LUT optionally implements a 2:1 multiplexer. In each slice, the F5MUX and two LUTs
can implement a 4:1 multiplexer. As shown in Figure 8-10, the F6MUX and two slices
implement an 8:1 multiplexer. The F7MUX and the four slices of any CLB implement a 16:1
multiplexer, and the F8MUX and two CLBs implement a 32:1 multiplexer.
[Figure 8-10: each LUT selects between a pair of DATA inputs. Slices S2 and S3 form an 8:1 mux of DATA[7:0] and slices S0 and S1 form an 8:1 mux of DATA[15:8], both controlled by SELECT[2:0]; the F7 mux in S1 selects between them with SELECT[3] to produce the 16:1 output.]
Wide-Input Functions
Slices S0 and S2 have an F6MUX, designed to combine the outputs of two F5MUX
resources. Figure 8-11 illustrates a combinatorial function up to 19 inputs in the slices S0
and S1, or in the slices S2 and S3.
[Figure 8-11: two slices (SLICEL or SLICEM), each with two 4-input LUTs, registers, and an F5MUX (select S_F5); an F6MUX (select S_F6) combines the two F5MUX outputs into OUT_F6]
The slice S1 has an F7MUX, designed to combine the outputs of two F6MUXs. Figure 8-12
illustrates a combinatorial function up to 39 inputs in a Spartan-3 generation CLB.
[Figure 8-12: the four slices of a CLB (SLICEL S3 and S2, SLICEM S1 and S0), each with two 4-input LUTs, registers, and an F5MUX (select S_F5); F6MUXes (select S_F6) combine the slice pairs, and the F7MUX in S1 (select S_F7) produces OUT_F]
The slice S3 of each CLB has an F8MUX. Combinatorial functions of up to 79 inputs fit in
two CLBs as shown in Figure 8-13. The outputs of two F7MUXs are combined through
dedicated routing resources between two adjacent CLBs in a column.
[Figure 8-13: two CLBs, each with F6MUXes in slices S0 and S2 and an F7MUX in slice S1; the F8MUX in slice S3 combines the two F7MUX results into OUT_F8]
Timing Parameters
There are several possible paths through the CLB multiplexers. The two types of
multiplexers are considered separately (F5MUX and FiMUX). Each multiplexer type has
two types of inputs: data inputs and select lines. The output of the mux drives the local
interconnect through the F5 and FX CLB pins, the general interconnect through the X and
Y CLB pins, or the D input on the flip-flop. See Figure 8-9, page 259 for a block diagram
showing dedicated multiplexers in a Spartan-3 generation CLB. Note that although the
mux functionality is identical between the slices with memory and those without, the
timing values are independent and can vary slightly.
Although the multiplexers are connected in series inside the CLB, each mux actually feeds
a CLB output pin, which feeds back to an input pin through zero-delay local interconnect.
Thus each reported block delay element will have only one mux from input to output. The
Spartan-3 generation architecture improves on the Virtex®-II architecture by providing a
direct path from the F5MUX or FiMUX to the flip-flop in the CLB.
Programmable Polarity
As with most resources in the Spartan-3 generation FPGA, inverters are free in large
multiplexers. The functions in the LUT can have inverters added to inputs or outputs with
no effect on performance or utilization. The control inputs to the F5MUX (BX) and FiMUX
(BY) have programmable polarity inside the CLB.
Floorplanning Multiplexers
The wide multiplexers force a particular placement on the LUTs being combined. The
LUTs must always be in the same slice for the F5MUX and in adjacent vertical slices for the
wider muxes. This vertical orientation aligns nicely with the arithmetic logic.
The wide multiplexers cannot be used in conjunction with the arithmetic logic because the
arithmetic XOR gate is multiplexed with the F5MUX result. Also, the 32x1 configuration of
the distributed RAM uses the F5MUX for the fifth address input.
Other Multiplexers
The CLB also contains other multiplexers for routing signals through the logic resources.
The CYMUX for propagating carry signals is the only other dynamic mux. Several other
muxes are used for selecting one of multiple paths. One is called the FXMUX in the FPGA
Editor, since it routes the F LUT signal to the X CLB output. Do not confuse this static mux
with the FXMUX name that is sometimes used for the FiMUX described here.
When multiplexing clock signals, remember to use the BUFGMUX, which helps eliminate
glitches on the resulting clock. Another special multiplexer is found in the I/O to support
DDR interfaces. The DDR mux combines two signals onto one output by automatically
muxing back and forth between them as they are clocked into the IOB. See the Spartan-3
generation data sheets for more information on these other multiplexing features.
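As an illustration of clock multiplexing, the sketch below instantiates the BUFGMUX primitive (ports I0, I1, S, and O) from the Xilinx primitive library. The wrapper module and its port names are hypothetical; consult the data sheet for the switching behavior of the select input.

```verilog
// Illustrative sketch only: clock_select and its port names are hypothetical.
module clock_select (
    input  wire clk_a,
    input  wire clk_b,
    input  wire sel,
    output wire clk_out
);
    BUFGMUX u_bufgmux (
        .I0 (clk_a),    // selected when S is Low
        .I1 (clk_b),    // selected when S is High
        .S  (sel),
        .O  (clk_out)
    );
endmodule
```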
Inference
Multiplexers are typically inferred by a conditional statement, most commonly the CASE
or IF-THEN-ELSE statement. The IF statement generally produces priority-encoded logic.
The CASE statement is more likely to generate an optimized multiplexer.
Synthesis options can determine whether multiplexers are inferred and how they are
implemented. For XST, the MUX_EXTRACT constraint specifies whether multiplexers are
inferred, and the MUX_STYLE constraint specifies whether they are implemented in the
dedicated logic multiplexers or the carry multiplexers (CY_MUX). By default, XST
automatically infers the best resource.
CASE statements should be full (all branches defined) to avoid creating a latch. Undefined
branches assume the current value needs to be maintained, implying memory. They also
should be parallel (branch conditions all mutually exclusive) to avoid a priority encoder.
Some synthesis tools, such as XST, have options to assume full and parallel CASE
statements even if not written that way. It is good practice to include a “When Others”
(VHDL) or “Default” (Verilog) branch to make sure even undefined inputs do not generate
a latch.
An IF statement can contain a set of different expressions while a CASE statement is
evaluated against a common controlling expression. In general, use the CASE statement
for complex decoding and use the IF statement for speed critical paths.
Most current synthesis tools can determine if the IF-ELSIF conditions are mutually
exclusive, and will not create extra logic to build the priority tree. The following are points
to consider when writing IF statements:
• Make sure that all outputs are defined in all branches of an IF statement. If not, they
can create latches or long equations on the CE signal. A good way to prevent this is to
have default values for all outputs before the IF statements.
• Limit the number of input signals into an IF statement to reduce the number of logic
levels. If there are a large number of input signals, see if some of them can be
predecoded and registered before the IF statement.
• Avoid bringing the dataflow into a complex IF statement. Only control signals should
be generated in complex IF-ELSE statements.
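The first point in the list above, default values for all outputs before the IF statement, can be sketched as follows. The module and signal names are hypothetical and serve only to illustrate the coding style.

```verilog
// Illustrative sketch only: if_defaults and its signals are hypothetical.
module if_defaults (
    input  wire       req_a,
    input  wire       req_b,
    output reg  [1:0] grant,
    output reg        busy
);
    always @* begin
        // Defaults first: every output has a value on every path,
        // so no latch is implied.
        grant = 2'b00;
        busy  = 1'b0;
        if (req_a) begin
            grant = 2'b01;
            busy  = 1'b1;
        end else if (req_b) begin
            grant = 2'b10;
            busy  = 1'b1;
        end
    end
endmodule
```

Because `req_a` is tested before `req_b`, this IF chain is priority encoded; a CASE statement on a common expression would describe a parallel selection instead.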
Make sure you do not write the code such that your synthesis tool will infer BUFT-based
multiplexers. A BUFT-based multiplexer usually requires a statement with a "Z" value.
Some synthesis tools might automatically or optionally convert BUFT logic to
multiplexers.
A decoder is a special case of a multiplexer where the inputs are fixed as one-hot values.
Decoders of up to 4:16 in size are easily implemented in individual LUTs for each output
and do not need to use the dedicated multiplexers, or they can even use the Carry muxes
for high performance.
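A 4:16 decoder of the kind described above can be sketched in one line of behavioral code. Each output bit depends only on the four select inputs, so it fits in a single 4-input LUT. The module name is hypothetical.

```verilog
// Illustrative sketch only: decoder_4_16 is a hypothetical module name.
module decoder_4_16 (
    input  wire [3:0]  sel,
    output wire [15:0] dec
);
    // One-hot output: bit number sel is High, all others are Low
    assign dec = 16'b1 << sel;
endmodule
```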
The following subsections provide examples of 2:1 muxes described using the CASE
statement in Verilog and VHDL code.
Verilog Inference
module MUX_2_1 (DATA_I, SELECT_I, DATA_O);
input [1:0]DATA_I;
input SELECT_I;
output DATA_O;
reg DATA_O;
// Combinatorial process: re-evaluated whenever any input changes
always @ (DATA_I or SELECT_I)
case (SELECT_I)
1'b0 : DATA_O <= DATA_I[0];
1'b1 : DATA_O <= DATA_I[1];
default : DATA_O <= 1'bx;
endcase
endmodule
VHDL Inference
entity MUX_2_1 is
port (
DATA_I: in std_logic_vector (1 downto 0);
SELECT_I: in std_logic;
DATA_O: out std_logic
);
end MUX_2_1;
Library Primitives
Four library primitives are available that offer access to the dedicated multiplexers in each
slice: MUXF5, MUXF6, MUXF7, and MUXF8. These use the F5MUX and FiMUX CLB
resources (see “Naming Conventions,” page 257). Each of the multiplexer primitives looks
identical (see Figure 8-14). The actual selection simply determines where in the CLB the
multiplexer can be located, as shown in Table 8-5.
[Figure 8-14: multiplexer primitive symbol with inputs I0, I1, and S and output O]
The generic multiplexer components also can take advantage of the dedicated
multiplexers. The M2_1 schematic library component is implemented in a LUT, while the
larger multiplexers in the library use the F5MUX and FiMUX components.
for the final result, respectively, while the M16_1E schematic library component keeps the
enable on the final mux, forcing it into a LUT instead of the F7MUX. Figure 8-15 shows the
M4_1E schematic library component logic.
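[Figure 8-15: M4_1E logic. Two M2_1E components select between D0/D1 and D2/D3 using S0, both gated by E; a MUXF5 controlled by S1 selects between their outputs, M01 and M23, to drive O.]
[Symbols: MUXF5_L (inputs I0, I1, S; output LO) and MUXF5_D (inputs I0, I1, S; outputs LO and O)]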
Figure 8-16: MUXF5_L and MUXF5_D Primitives to Model Local Output Timing
Submodules
In addition to the primitives, five submodules that implement multiplexers from 2:1 to 32:1
are provided in VHDL and Verilog code. Synthesis tools can automatically infer the above
primitives (MUXF5, MUXF6, MUXF7, and MUXF8); however, the submodules described
in this section use instantiation of the multiplexers to guarantee an optimized result.
Table 8-6 lists available submodules.
• xapp466_vhdl.zip
• xapp466_vhdl.zip
Port Signals
Data In — DATA_I
The data input provides the data to be selected by the SELECT_I signal(s).
Control In — SELECT_I
The select input signal or bus determines the DATA_I signal to be connected to the output
DATA_O. For example, the MUX_4_1_SUBM multiplexer has a 2-bit SELECT_I bus and a
4-bit DATA_I bus. Table 8-7 shows the DATA_I selected for each SELECT_I value.
Table 8-7: Selected Inputs
SELECT_I[1:0] DATA_O
00 DATA_I[0]
01 DATA_I[1]
10 DATA_I[2]
11 DATA_I[3]
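The selection in Table 8-7 corresponds to the following behavioral description. This is a sketch for illustration, not the MUX_4_1_SUBM code from the submodule archive; the module name `mux_4_1_example` is hypothetical, while the port names follow the conventions above.

```verilog
// Illustrative sketch only: mux_4_1_example is a hypothetical module name.
module mux_4_1_example (
    input  wire [3:0] DATA_I,
    input  wire [1:0] SELECT_I,
    output reg        DATA_O
);
    always @*
        case (SELECT_I)
            2'b00   : DATA_O = DATA_I[0];
            2'b01   : DATA_O = DATA_I[1];
            2'b10   : DATA_O = DATA_I[2];
            2'b11   : DATA_O = DATA_I[3];
            default : DATA_O = 1'bx;      // undefined select values
        endcase
endmodule
```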
Applications
Multiplexers are used in various applications. These are often inferred by synthesis tools
when a “case” statement is used (see the example below). Comparators, encoder-decoders,
and wide-input combinatorial functions are optimized when they are based on one level of
LUTs and dedicated multiplexer resources of the Spartan-3 generation CLBs.
source code. The submodule code can also be “cut and pasted” into the designer source
code.
VHDL Template
-- Module: MUX_16_1_SUBM
-- Description: Multiplexer 16:1
--
-- Device: Spartan-3 Family
---------------------------------------------------------------------
library IEEE;
use IEEE.std_logic_1164.all;
library UNISIM;
use UNISIM.VCOMPONENTS.ALL;
entity MUX_16_1_SUBM is
port (
DATA_I: in std_logic_vector (15 downto 0);
SELECT_I: in std_logic_vector (3 downto 0);
DATA_O: out std_logic
);
end MUX_16_1_SUBM;
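--
library IEEE;
use IEEE.numeric_std.all;
architecture MUX_16_1_SUBM_arch of MUX_16_1_SUBM is
signal DATA_LSB, DATA_MSB: std_logic;
begin
-- Two 8:1 multiplexers on SELECT_I (2 downto 0), matching the Verilog template
DATA_LSB <= DATA_I(to_integer(unsigned(SELECT_I(2 downto 0))));
DATA_MSB <= DATA_I(8 + to_integer(unsigned(SELECT_I(2 downto 0))));
--
-- MUXF7 instantiation
U_MUXF7: MUXF7
port map (
I0 => DATA_LSB,
I1 => DATA_MSB,
S => SELECT_I (3),
O => DATA_O
);
--
end MUX_16_1_SUBM_arch;
--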
--
Verilog Template
// Module: MUX_16_1_SUBM
//
// Description: Multiplexer 16:1
// Device: Spartan-3 Family
//-------------------------------------------------------------------
//
module MUX_16_1_SUBM (DATA_I, SELECT_I, DATA_O);
input [15:0]DATA_I;
input [3:0]SELECT_I;
output DATA_O;
wire [2:0]SELECT;
reg DATA_LSB;
reg DATA_MSB;
/*
//If synthesis tools support MUXF7 :
always @ (DATA_I or SELECT_I)
case (SELECT_I)
4'b0000 : DATA_O <= DATA_I[0];
4'b0001 : DATA_O <= DATA_I[1];
4'b0010 : DATA_O <= DATA_I[2];
4'b0011 : DATA_O <= DATA_I[3];
4'b0100 : DATA_O <= DATA_I[4];
4'b0101 : DATA_O <= DATA_I[5];
4'b0110 : DATA_O <= DATA_I[6];
4'b0111 : DATA_O <= DATA_I[7];
4'b1000 : DATA_O <= DATA_I[8];
4'b1001 : DATA_O <= DATA_I[9];
4'b1010 : DATA_O <= DATA_I[10];
4'b1011 : DATA_O <= DATA_I[11];
4'b1100 : DATA_O <= DATA_I[12];
4'b1101 : DATA_O <= DATA_I[13];
4'b1110 : DATA_O <= DATA_I[14];
4'b1111 : DATA_O <= DATA_I[15];
default : DATA_O <= 1'bx;
endcase
*/
assign SELECT = SELECT_I[2:0];
// 8:1 mux for the lower eight data inputs
always @ (SELECT or DATA_I)
case (SELECT)
3'b000 : DATA_LSB <= DATA_I[0];
3'b001 : DATA_LSB <= DATA_I[1];
3'b010 : DATA_LSB <= DATA_I[2];
3'b011 : DATA_LSB <= DATA_I[3];
3'b100 : DATA_LSB <= DATA_I[4];
3'b101 : DATA_LSB <= DATA_I[5];
3'b110 : DATA_LSB <= DATA_I[6];
3'b111 : DATA_LSB <= DATA_I[7];
default : DATA_LSB <= 1'bx;
endcase
// 8:1 mux for the upper eight data inputs
always @ (SELECT or DATA_I)
case (SELECT)
3'b000 : DATA_MSB <= DATA_I[8];
3'b001 : DATA_MSB <= DATA_I[9];
3'b010 : DATA_MSB <= DATA_I[10];
3'b011 : DATA_MSB <= DATA_I[11];
3'b100 : DATA_MSB <= DATA_I[12];
3'b101 : DATA_MSB <= DATA_I[13];
3'b110 : DATA_MSB <= DATA_I[14];
3'b111 : DATA_MSB <= DATA_I[15];
default : DATA_MSB <= 1'bx;
endcase
// MUXF7 instantiation
MUXF7 U_MUXF7 (.I0(DATA_LSB),
.I1(DATA_MSB),
.S(SELECT_I[3]),
.O(DATA_O)
);
endmodule
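[Figure: bit multiplexer symbol with inputs M[N:0] and S[M:0], output O, and an optional output register with CE, CLK, ASET/SSET, and ACLR/SCLR]
[Figure: bus multiplexer symbol with inputs MA[N:0] through MCH[N:0] and S[M:0], output O[N:0], and an optional output register Q[N:0] with CE, CLK, and ASET/SSET]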
The CORE Generator system also offers the specific functions of the BUFT-based
Multiplexer (and the equivalent BUFE-based Multiplexer). As with the generic Bit and Bus
Multiplexers, they are implemented in LUTs and/or muxes.
Related Materials
The following document provides supplementary information useful with this chapter:
WP274: Multiplexer Selection
This white paper considers a variety of ways in which multiplexers can be implemented
within Xilinx FPGA devices, including some alternative techniques that can lead to more
efficient and lower cost implementations.
Summary
The dedicated multiplexers in the Spartan-3 generation architecture enable wider
functions than possible in the four-input LUTs. These multiplexers are automatically used
by the software tools but careful coding can help optimize their use to minimize resource
requirements and improve performance of designs.
Chapter 9
Introduction
The basic building block of the FPGA is the look-up table, or LUT. Although arithmetic
functions can be implemented in the LUTs, they require the generation of a sum and a
carry for every input and could quickly use up LUT and routing resources. Arithmetic
functions are common enough to warrant dedicating their own special resources for
implementation. The arithmetic logic allows the generation of a sum outside the LUT and
the carry logic provides dedicated routing resources for cascading a carry signal between
slices of a CLB and between CLBs. The carry chain cascades from the bottom to the top of
each column of CLB slices.
The arithmetic logic consists of a discrete XOR component for single level sum completion,
an AND gate for multiplication, and multiplexers for controlling signal flow. These gates
work in conjunction with the LUTs to implement efficient arithmetic functions, including
counters and multipliers, typically at two bits per slice. Each CLB provides two separate
carry chains of four bits each. The resources can be used to improve the performance of
arithmetic functions and can also be used to cascade LUTs for wide-input logic functions.
This logic is known as a half adder because it does not include a Carry input. Accounting
for a Carry input can be done simply by repeating the half adder to add the first Sum and
the Carry input and then generating a final Carry output if either half-adder generated a
Carry (using an OR gate). A full adder is created as shown in Figure 9-1.
[Figure 9-1: full adder built from two half adders and an OR gate, with Carry In and Carry Out]
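The construction described above, two half adders and an OR gate, can be written directly in Verilog. The module and signal names in this sketch are hypothetical.

```verilog
// Illustrative sketch only: full_adder_ha and its signals are hypothetical.
module full_adder_ha (
    input  wire a,
    input  wire b,
    input  wire cin,
    output wire sum,
    output wire cout
);
    // First half adder: adds the two operand bits
    wire half_sum = a ^ b;
    wire carry1   = a & b;

    // Second half adder: adds the first Sum and the Carry input
    wire carry2   = half_sum & cin;
    assign sum    = half_sum ^ cin;

    // Final Carry output if either half adder generated a Carry
    assign cout   = carry1 | carry2;
endmodule
```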
This logic can easily be implemented in two LUTs with three inputs each to generate Sum
and Carry. The problem with this implementation is that it requires two LUTs for every
input bit, and the Carry propagates through the full LUT delay for each bit.
A better implementation is to "look ahead" and determine if the input Carry signal needs to
be propagated (the inputs are different) or generated (both inputs are High). See Table 9-2.
This case is similar to the half Sum and Carry values described earlier. The Propagate
signal, which is the same as the half Sum in the first half adder, can be implemented using
the same XOR gate.
If Propagate is not True, then A = B and either signal can be used directly as the Generate
signal. Thus the Carry output can be defined by a multiplexer controlled by Propagate that
allows the Carry input through when Propagate = 1 and allows A (or B) through when
Propagate = 0.
The full Sum is still generated by a second XOR gate, resulting in the logic shown in
Figure 9-2.
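[Figure 9-2: an XOR gate produces Propagate/Half Sum, which controls a 2:1 mux selecting between Generate (0 input) and CIN (1 input) to drive COUT; a second XOR gate combines Propagate and CIN to produce Sum]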
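The propagate and generate formulation can be sketched behaviorally as follows. It computes the same result as a conventional full adder; the module and signal names are hypothetical.

```verilog
// Illustrative sketch only: full_adder_pg and its signals are hypothetical.
module full_adder_pg (
    input  wire a,
    input  wire b,
    input  wire cin,
    output wire sum,
    output wire cout
);
    wire propagate = a ^ b;               // also the half Sum

    // Propagate = 1: pass the Carry input through.
    // Propagate = 0: A equals B, so either one serves as Generate.
    assign cout = propagate ? cin : a;

    assign sum  = propagate ^ cin;        // second XOR gate
endmodule
```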
The logic has been split into three functions that cannot all be combined into one or two
LUTs. To optimize the implementation of this logic, the Spartan-3 generation CLB provides
a dedicated XOR gate outside the LUT to generate the Sum, called XORCY, and a dedicated
mux to provide the Carry, called MUXCY, as shown in Figure 9-3.
[Figure 9-3: the function generator (LUT) combines A and B; its output controls the MUXCY, which drives COUT, and feeds the XORCY together with CIN to produce Sum]
An advantage of this structure is that it provides a very fast carry propagation, since it only
requires the delay of a 2:1 mux. Also, this structure uses only one LUT, or one half of a slice,
allowing two bits per slice and therefore an efficient, high-density implementation. With
dedicated connections at each cascade point, from COUT to the CIN in the other half of a
slice, to the CIN in the other slice in a CLB, and to the CIN of the next CLB, the carry chain
can propagate up a column of CLBs with very high performance.
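The structure described above can be instantiated explicitly with the MUXCY (ports O, CI, DI, S) and XORCY (ports O, CI, LI) primitives. The following is a minimal sketch of a ripple adder, assuming the Xilinx primitive library is available to the tools; the module name `carry_adder` and the `WIDTH` parameter are hypothetical. In practice the synthesis tools infer this structure from a `+` operator.

```verilog
// Illustrative sketch only: carry_adder and WIDTH are hypothetical names.
module carry_adder #(
    parameter WIDTH = 8
) (
    input  wire [WIDTH-1:0] a,
    input  wire [WIDTH-1:0] b,
    input  wire             cin,
    output wire [WIDTH-1:0] sum,
    output wire             cout
);
    wire [WIDTH:0] c;                 // carry chain, c[i] enters bit i
    assign c[0] = cin;
    assign cout = c[WIDTH];

    genvar i;
    generate
        for (i = 0; i < WIDTH; i = i + 1) begin : g_bit
            wire p = a[i] ^ b[i];     // Propagate, computed in the LUT

            // Carry: pass c[i] when p is High, otherwise use a[i]
            MUXCY u_muxcy (.O(c[i+1]), .CI(c[i]), .DI(a[i]), .S(p));

            // Sum: Propagate XOR incoming carry
            XORCY u_xorcy (.O(sum[i]), .CI(c[i]), .LI(p));
        end
    endgenerate
endmodule
```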
Resource Details
The Spartan-3 generation carry and arithmetic logic consists of dedicated CLB resources
and inter-CLB routing. The logic is almost identical within the two logic cells of each
slice and is identical in both the logic-only SLICEL and the SLICEM that adds distributed
RAM capability. A simplified view of one logic cell is shown in Figure 9-4.
[Diagram: LUT with inputs A and B, MULT_AND, MUXCY with 0/1 initialization and bypass inputs, XORCY producing Sum, optional D flip-flop, CIN, and COUT]
Figure 9-4: Simplified View of Spartan-3 FPGA Carry and Arithmetic Logic in One Logic Cell
This implementation adds flexibility to the MUXCY beyond the standard functionality
described earlier. The A/B "Generate" input to the MUXCY can come from either signal, or
even an AND of the two signals. The carry chain can be initialized with a 1 or 0 or fed by
an independent bypass input to MUXCY. The MUXCY control input can be fixed to 1 to
always propagate the carry. The output sum, the XORCY, can be optionally registered.
Figure 9-5 shows the entire carry logic and connections for one slice. The dashed lines
indicate an additional fixed multiplexer that is only found in the SLICEM half of the CLB.
[Figure 9-5: carry logic for one slice. Top half: G-LUT (G1 to G4), GAND, CY0G, CYSELG, CYMUXG, XORG, outputs Y, YQ, and YB, bypass input BY, and COUT. Bottom half: F-LUT (F1 to F4), FAND, CY0F, CYSELF, CYMUXF, XORF, CYINT, outputs X, XQ, and XB, bypass input BX, and CIN.]
MUXCY
The dynamic mux generically referred to as MUXCY is available at both the bottom (called
CYMUXF) and top (CYMUXG) of each slice.
The "0" input to the MUXCY typically comes from one of the LUT inputs. It can be fed by
two of the four LUT inputs (F1 or F2 on the bottom and G1 or G2 on the top). In addition,
the "0" input can come from a dedicated AND gate (MULT_AND) of those two inputs (for
multiplier functions, as discussed later). It can also be fed directly by a 0 or 1 to use it as a
simple wide gate (to be discussed later). A sixth input comes from the CLB bypass input
(BX or BY) as an alternative to using a LUT input, allowing the carry chain to be initialized
or continued from anywhere in the device. This fixed 6:1 mux driving the 0 input on the
MUXCY is called CY0F in the bottom half of the slice and CY0G in the top half.
The "1" input to the MUXCY is the carry input CIN, which also feeds the XORCY input.
The select input to the MUXCY is the LUT output, where the LUT is typically configured as
an XOR gate for the Propagate selection. This is the same LUT output that provides the
other XORCY input.
XORCY
The XOR gate generically referred to as XORCY is available at both the bottom (called
XORF) and top (XORG) of each slice. The inputs are sourced by the carry signal input CIN
and the LUT output. The XORCY output goes to the primary output of the logic cell (to
both the combinatorial output and the flip-flop). This path goes through a fixed mux that
chooses between the LUT, the wide multiplexers, or the XORCY, which is called FXMUX at
the bottom and GYMUX at the top of the slice.
[Figure 9-6: carry chains in a CLB. The first carry chain runs from CIN up through SLICEL X1Y0 and SLICEL X1Y1 to COUT; the second carry chain runs through SLICEM X0Y0 and SLICEM X0Y1. Each slice contains two stages of LUT, MUXCY, and flip-flop, and each COUT continues to the next CLB.]
As a result of the dedicated routing structure, the carry chain runs vertically up the
columns of CLBs, with four bits per CLB in the two slices on one side. The other side of the
CLB has a completely independent carry chain, so there are two chains per column.
The total number of carry chains is twice the number of CLB columns, as shown in
Table 9-3. The number of bits per column is limited by the number of logic cells per
column. In the Spartan-3E and Extended Spartan-3A family architectures, some of the CLB
columns are interrupted by the DCMs and provide fewer bits per column.
XC3S100E 24 88
XC3S250E 36 136
XC3S500E 52 184
XC3S1200E 76 240
XC3S1600E 100 304
XC3S50A/AN 24 64
XC3S200A/AN 32 128
XC3S400A/AN 48 160
XC3S700A/AN 64 192
XC3S1400A/AN 80 288
XC3SD1800A 96 352
XC3SD3400A 116 416
Carry chains can be split or cascaded to provide even more flexibility. Splitting the carry
chain means connecting the COUT of one MUXCY to the CIN signal of multiple MUXCYs,
continuing the carry into two chains without having to duplicate the logic. Cascading the
carry chain means connecting COUT through normal logic to a CIN other than one directly
above. This can be used to continue a carry chain into a second column.
Splitting and cascading can be done since the COUT from each MUXCY not only feeds the
next MUXCY up the column, but is also available at a CLB bypass output (XB on the
bottom, YB on the top). These CLB outputs can only be driven by the MUXCY or by the
SRL16 shiftout. Also, the CIN can come from the CLB bypass inputs BX/BY or from a LUT
input in either one of the two MUXCY components.
Multiplication Resources
Special resources are also available for multiplication. One-bit multiplication is logically
very simple, requiring only sets of AND gates.
[Figure 9-7: multiplication of A3 to A0 by B1 and B0; AND gates form the partial products, and a chain of adders starting from '0' produces P0 to P4]
While the latter stages of the addition tree are pure add functions, look at the way in which
the first two partial products are formed and then applied to the first stage adder in
Figure 9-7. In the majority of cases, the two adder inputs are each driven by a 2-input AND
gate. As these AND gates would each occupy a LUT, a multiplier suddenly becomes very
large in an FPGA. In the case of a 12-bit by 8-bit multiplier it would require 12 x 8 = 96
LUTs (48 slices) just to implement the AND gates.
However, an optimization is quickly visible. The AND gate associated with one of the
adder inputs can be absorbed into the LUT forming the half sum for addition. This in itself
reduces the size of the 12-bit by 8-bit multiplier by 48 LUTs (24 slices).
[Figure: one bit of the multiplier adder. A LUT combines Am, Bn+1, Am+1, and Bn; a carry mux between CIN and COUT and an XOR gate produce Pm+1.]
An ideal situation would be to absorb the other AND gate into the LUT, but the signal it
produces is also required by the MUXCY part of the addition function. So this circuit
appears to be the optimal that can be achieved.
However, Spartan-3 generation FPGAs allow this second AND gate to be absorbed. Next
to each LUT is yet another component called the MULT_AND. It has the effect of recreating
the same input to the MUXCY, even though the desired signal is now buried within the
LUT.
[Figure: the same multiplier bit with both AND gates absorbed into the LUT and the MULT_AND driving the carry mux]
The generic MULT_AND gate is called FAND at the bottom of the slice, combining the F1
and F2 LUT inputs into the MUXCY at the bottom, CYMUXF. GAND at the top of the slice
combines G1 and G2 into the MUXCY at the top, CYMUXG (see Figure 9-5).
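The arrangement described above can be sketched with the MULT_AND (ports LO, I0, I1), MUXCY, and XORCY primitives. The example multiplies an operand by a 2-bit value, which is the first stage of the addition tree. It assumes the Xilinx primitive library is available to the tools; the module name `mult_row` and the `WIDTH` parameter are hypothetical.

```verilog
// Illustrative sketch only: mult_row and WIDTH are hypothetical names.
module mult_row #(
    parameter WIDTH = 4
) (
    input  wire [WIDTH-1:0] a,
    input  wire [1:0]       b,
    output wire [WIDTH+1:0] p          // p = a * b
);
    wire [WIDTH:0]   a0 = {1'b0, a};   // a aligned with b[0]
    wire [WIDTH:0]   a1 = {a, 1'b0};   // a shifted left, aligned with b[1]
    wire [WIDTH+1:0] c;                // carry chain

    assign c[0]       = 1'b0;
    assign p[WIDTH+1] = c[WIDTH+1];

    genvar i;
    generate
        for (i = 0; i <= WIDTH; i = i + 1) begin : g_bit
            // LUT: both AND gates plus the half sum of the two partial products
            wire half_sum = (a0[i] & b[0]) ^ (a1[i] & b[1]);

            // MULT_AND recreates one partial product for the carry mux
            wire pp;
            MULT_AND u_and (.LO(pp), .I0(a0[i]), .I1(b[0]));

            MUXCY u_muxcy (.O(c[i+1]), .CI(c[i]), .DI(pp), .S(half_sum));
            XORCY u_xorcy (.O(p[i]),   .CI(c[i]), .LI(half_sum));
        end
    endgenerate
endmodule
```

When `half_sum` is Low the two partial products are equal, so either one is the correct carry; this is why a single MULT_AND output is enough to drive the MUXCY.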
This dedicated AND gate can also be used for functions other than multiplication, but it can
only connect to a MUXCY, which in turn can feed a CLB output to any logic.
COUT and YB are always the same signal in the SLICEL components (SLICEM allows
driving YB from the SRL16 output). In the same way the CYMUXF output always drives
the XB output in the SLICEL.
Table 9-6 summarizes the carry logic functions. For a detailed picture of the CLB slice, see
Chapter 5, “Using Configurable Logic Blocks (CLBs).”
Performance
Performance of carry logic based functions is determined by three components: the delay
to get into the carry chain, the delay for each bit of the function in the carry chain, and the
delay to generate the last result (see Figure 9-10). The delay to get into the carry chain,
from the F inputs to the COUT output (tOPCYF), is approximately 0.9 ns (see Figure 9-11).
The delay for each slice is the zero delay routing plus the delay from CIN to COUT, which
is tBYP, approximately 0.2 ns (see Figure 9-12). Some functions can fit four bits per slice
while most fit two bits per slice. The delay to generate the final result is typically the CIN
delay to the YQ output or tCINCK, which is approximately 1.3 ns, or tCINY for a
combinatorial result, which is approximately 1.2 ns (see Figure 9-13). Thus the total delay
is 2.1 ns for four bits (2.2 ns registered) plus 0.2 ns for every additional two bits.
[Figure 9-10: carry chain through two CLBs, from Q0 to Q7. TOPCYF is the delay into the chain, TBYP is the delay through each slice, and the connections between slices have TNET = 0.]
[Figure 9-11: TOPCYF path in the slice carry logic, from the F inputs to COUT]
[Figure 9-12: TBYP path in the slice carry logic, from CIN to COUT]
[Figure 9-13: path in the slice carry logic from CIN to the final result (Y or YQ)]
Optimize the delays to get to and from the carry-based function. Bring the source inputs
close to the carry function; use registered signals in a column to the left of the carry
function if possible to provide near-zero routing delay to the carry function. The output of
the carry function can be registered directly in the same CLB to provide high performance
through pipelining.
The SLICEL timing is often faster than the SLICEM timing due to the simpler structure,
and therefore the SLICEL should be favored for the highest performance.
Use the following estimates for Spartan-3 generation carry-based adders, counters, and
accumulators:
• 8 bits: 3.0 ns or 333 MHz
• 16 bits: 3.8 ns or 263 MHz
• 32 bits: 5.4 ns or 185 MHz
• 64 bits: 8.6 ns or 116 MHz
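As a quick cross-check of the list above, the following sketch converts each period to a frequency and computes the incremental delay per added bit. The helper names are hypothetical; the only data used are the four estimates in the list.

```python
# Period estimates from the list above, in nanoseconds, keyed by width in bits.
ESTIMATES_NS = {8: 3.0, 16: 3.8, 32: 5.4, 64: 8.6}


def max_frequency_mhz(period_ns):
    """Convert a clock period in ns to a frequency in MHz."""
    return 1000.0 / period_ns


def ns_per_added_bit(narrow_bits, wide_bits):
    """Incremental delay per bit between two entries of the list."""
    delta = ESTIMATES_NS[wide_bits] - ESTIMATES_NS[narrow_bits]
    return round(delta / (wide_bits - narrow_bits), 3)


for bits, period in ESTIMATES_NS.items():
    print(f"{bits:2d} bits: {period} ns, {max_frequency_mhz(period):.0f} MHz")
```

The increment between entries works out to the same value per bit throughout, which matches the per-slice carry delay of roughly 0.2 ns for every two bits.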
Specifications
The carry and arithmetic logic is defined by multiple timing specifications to cover each of
the possible signal paths, as described in Table 9-7.
Primitives
MUXCY
[Figure: MUXCY primitive symbol with select input S, data input DI (0 side), and carry input CI (1 side)]
The MUXCY primitive is used to implement a 1-bit high-speed carry propagate function.
DI is mapped to a CLB Direct Input while CI is the Carry Input. The select input S comes
from the LUT; when Low, S selects DI; when High, S selects CI. The O output can cascade
to the CI of the next MUXCY above or be fed to a CLB output.
The MUXCY primitive gets mapped to the CYMUXF or CYMUXG components at the
bottom and top of the Slice, respectively. The S select input is normally driven by an XOR
gate in a LUT, but the LUT can be fixed to zero to always select the DI input. A fixed 1 on
the S input always selects the CI carry input, and can be implemented inside the mux itself,
saving the LUT for other functions.
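The select behavior described above can be modeled in a few lines. This is a behavioral sketch only, not a simulation model from the Xilinx libraries; the `propagate` helper is a hypothetical name for cascading stages from bottom to top.

```python
def muxcy(s, di, ci):
    """Carry multiplexer: O = DI when S is Low, O = CI when S is High."""
    return ci if s else di


def propagate(select_bits, data_bits, carry_in):
    """Cascade MUXCYs bottom to top; each O feeds the next stage's CI."""
    carry = carry_in
    for s, di in zip(select_bits, data_bits):
        carry = muxcy(s, di, carry)
    return carry


# With every S High the carry input passes straight through the chain.
print(propagate([1, 1, 1], [0, 0, 0], carry_in=1))
# A Low S anywhere in the chain substitutes that stage's DI value.
print(propagate([1, 0, 1], [0, 0, 0], carry_in=1))
```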
The MUXCY is also available as two additional primitives with "local" outputs. Local
outputs reflect the dedicated connections between logic elements, in this case the direct
connections from COUT to CIN. The local output on the primitive does not control the
routing but allows the design tools to better estimate the timing before implementation. An
O pin on a MUXCY connected to a CI pin on another MUXCY almost always uses zero-
delay connections, reflected by the local output (LO). The general-purpose output reflects
the longer block and routing delays for splitting the carry chain by feeding it to the bypass
outputs XB or YB. MUXCY_L will model the zero delay of the COUT path while
MUXCY_D has both local and general-purpose outputs. Both paths are always available
and can be used at the same time. If the O pin connects to a CI of another MUXCY, use the
LO output of the MUXCY_L or MUXCY_D. If O connects to anything else, use a generic
MUXCY or the O output of a MUXCY_D.
[Figure: MUXCY_L symbol (LO output only) and MUXCY_D symbol (LO and O outputs), each with inputs S, DI, and CI]
VHDL Instantiation
-- Component Declaration for MUXCY should be placed
-- after architecture statement but before begin keyword
component MUXCY
port (O : out STD_ULOGIC;
CI : in STD_ULOGIC;
DI : in STD_ULOGIC;
S : in STD_ULOGIC);
end component;
-- Component Attribute specification for MUXCY
-- should be placed after architecture declaration but
-- before the begin keyword
-- Attributes should be placed here
-- Component Instantiation for MUXCY should be placed
-- in architecture after the begin keyword
MUXCY_INSTANCE_NAME : MUXCY
port map (O => user_O,
CI => user_CI,
DI => user_DI,
S => user_S);
Verilog Instantiation
MUXCY MUXCY_instance_name (.O (user_O),
.CI (user_CI),
.DI (user_DI),
.S (user_S));
XORCY
The XORCY primitive is a special dedicated XOR with general output used for generating
faster and smaller arithmetic functions. The XORCY primitive gets mapped to the XORF or
XORG component in the bottom or top of the slice, respectively. The Logic Input (LI) is
driven by the LUT output, typically the same as the S input on the MUXCY. The Carry
Input (CI) is driven by the output of a MUXCY or initialized by another signal. The O
output drives the combinatorial or registered output of the slice.
[Figure: XORCY primitive symbol with inputs LI and CI and output O]
XORCY is also available as two additional primitives with Local Outputs or LO pins.
Although there is no special routing for the output of an XORCY as there is for the
MUXCY, the fast delay modeled by the Local Output can be used when direct connections
are to be used to the adjacent CLB or back to the same CLB, or when the XORCY directly
feeds a flip-flop.
As mentioned earlier, the XORCY is used to complete the sum initiated by an XOR in the
LUT. The XOR in the LUT is represented by a general-purpose XOR2 component or similar
function.
VHDL Instantiation
-- Component Declaration for XORCY should be placed
-- after architecture statement but before begin keyword
component XORCY
port (O : out STD_ULOGIC;
CI : in STD_ULOGIC;
LI : in STD_ULOGIC);
end component;
-- Component Attribute specification for XORCY
-- should be placed after architecture declaration but
-- before the begin keyword
-- Attributes should be placed here
-- Component Instantiation for XORCY should be placed
-- in architecture after the begin keyword
XORCY_INSTANCE_NAME : XORCY
port map (O => user_O,
CI => user_CI,
LI => user_LI);
Verilog Instantiation
XORCY XORCY_instance_name (.O (user_O),
.CI (user_CI),
.LI (user_LI));
MULT_AND
The MULT_AND primitive is an AND component used almost exclusively for building
faster and smaller multipliers. The MULT_AND primitive maps into the FAND or GAND
gate in the Spartan-3 FPGA Slices. The inputs come from two specific LUT inputs, F1 and
F2 or G1 and G2. The output can only connect to the DI input on a MUXCY, so the
primitive is only available with an "LO" Local Output to reflect the lack of any routing
delays in pre-implementation timing analysis.
[Figure: MULT_AND primitive symbol with inputs I1 and I0 and output LO]
Even if a generic AND2 gate is used, if it feeds into the MUXCY data input, it will likely be
placed in the MULT_AND for efficiency. In the same way, a MULT_AND feeding into a
generic M2_1 mux typically forces the mux into the MUXCY. The MULT_AND is used to
"duplicate" one of two AND gates in the LUT in a typical multiplier. Those AND gates are
designated by general-purpose AND2 components.
VHDL Instantiation
-- Component Declaration for MULT_AND should be placed
-- after architecture statement but before begin keyword
component MULT_AND
port (LO : out STD_ULOGIC;
I0 : in STD_ULOGIC;
I1 : in STD_ULOGIC);
end component;
-- Component Attribute specification for MULT_AND
-- should be placed after architecture declaration but
-- before the begin keyword
-- Attributes should be placed here
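-- Component Instantiation for MULT_AND should be placed
-- in architecture after the begin keyword
MULT_AND_INSTANCE_NAME : MULT_AND
port map (LO => user_LO,
I0 => user_I0,
I1 => user_I1);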
Verilog Instantiation
MULT_AND MULT_AND_instance_name (.LO (user_LO),
.I0 (user_I0),
.I1 (user_I1));
Macros
Table 9-8 shows the library macros that use the carry logic. All adders, adder/subtractors,
and accumulators use the carry logic. Only counters that begin with "CC" use the carry
logic, along with Magnitude Comparators with "MC". Wide gates of 16 bits use the carry
logic while 12-bit and smaller gates do not. None of the standard library macros use the
MULT_AND gate.
Adder
The adder and adder/subtractor components in the CORE Generator software
automatically use the carry logic in a similar fashion to the library macros. The CORE
Generator version allows for more flexibility in terms of data widths and registered
outputs, among other functions.
[Figure: CORE Generator adder symbol with inputs A[N:0], B[M:0], A_SIGNED, B_SIGNED, ADD, C_IN, BYPASS, CE, CLK, ASET, and SSET; combinatorial outputs S[P:0], C_OUT, and OVFL; registered outputs Q[P:0], Q_C_OUT, and Q_OVFL]
Accumulator
The CORE Generator system's Accumulator is similar to the Adder/Subtractor but with
the registered output Q feeding back in the A input of the adder. Functions built using the
Accumulator use the Spartan-3 generation carry logic.
Comparator
The CORE Generator system's Comparator function also takes advantage of the carry logic
in implementation.
Multiplier
The CORE Generator system also includes a multiplier that can be targeted to either the
carry logic or the dedicated 18 x 18 multipliers. The graphical interface for the Multiplier
core includes the Multiplier Construction option. If Use LUTs is selected, the multiplier is
built using the carry and MULT_AND logic. If Use 18 x 18 Multiplier Blocks is selected, the
multiplier is built using the dedicated 18 x 18 multipliers.
Logic Gates
The 16-input gates in the library are implemented using carry logic. However, the logic
gates in the CORE Generator system basic elements get implemented in LUTs, not the
carry logic. When Use RPMs is selected, the function is placed in a column similar to the
carry-based functions. The arithmetic functions always use the carry logic even if Use
RPMs is not selected.
MUX_STYLE Constraint
The MUX_STYLE constraint guides the Xilinx XST synthesis tool to the type of multiplexer
implementation desired. This constraint controls the way the macrogenerator implements
the multiplexer functions. Allowed values are:
MULT_STYLE Constraint
The MULT_STYLE constraint guides the XST synthesis tool to the type of multiplier
implementation desired. This constraint controls the way the macrogenerator implements
the multiplier macros. Allowed values are:
• Auto: let synthesis tools decide which implementation is best (default in Project
Navigator)
• Block: use dedicated MULT18X18 multipliers
• LUT (default at command line): implement using carry and MULT_AND logic
• Pipe_block: use dedicated MULT18X18S pipeline multipliers (to be supported in a
future release)
• KCM: Constant Coefficient Multiplier (command line only)
• Pipe_LUT: pipeline multiplier using carry and MULT_AND logic with registered
inputs and outputs
Try selecting specific types of multiplier implementations to determine if the performance
or density improves for your own designs.
complete whole anywhere in the device. For example, an adder can be defined as requiring
adjacent slices in one column, but that stack of slices can be placed by the tools wherever it
is most efficient inside the device.
RLOC constraints can be applied to any of the carry primitives (MUXCY, XORCY, and
MULT_AND), along with flip-flops. An RLOC constraint cannot be applied directly to a
gate primitive but it can be applied to a LUT defined via the FMAP component. In
Spartan-3 generation designs, the RLOC constraint is specified using the slice-based XY
coordinate system (RLOC = X0Y0, etc.). Slices are numbered on an XY grid beginning in
the lower left corner of the chip. X ascends in value horizontally from left to right. Y
ascends in value vertically from bottom to top. A CLB actually encompasses two rows and
two columns of the coordinate system; slices S2 (X1Y0) and S3 (X1Y1) are considered
horizontally adjacent to slices S0 (X0Y0) and S1 (X0Y1) in the CLB. Each part of the
hierarchy can have its own independent set of relative constraints, and the user can even
define multiple sets per hierarchical block.
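The coordinate scheme can be illustrated with a short sketch. The function names and the zero-based CLB column and row indices are assumptions made for illustration; only the 2 x 2 slice arrangement and the S0 to S3 names come from the text. The second helper assumes a carry-based adder that packs two bits per slice in a single column.

```python
# Slice names within a CLB, keyed by (x % 2, y % 2) as described in the text.
SLICE_NAMES = {(0, 0): "S0", (0, 1): "S1", (1, 0): "S2", (1, 1): "S3"}


def slice_in_clb(x, y):
    """Map a slice XY coordinate to (clb_column, clb_row, slice_name)."""
    if x < 0 or y < 0:
        raise ValueError("slice coordinates start at X0Y0")
    return x // 2, y // 2, SLICE_NAMES[(x % 2, y % 2)]


def adder_rlocs(width, column=0):
    """RLOC strings for a carry-chain adder: two bits per slice, one column."""
    return [f"X{column}Y{bit // 2}" for bit in range(width)]


print(slice_in_clb(1, 0))   # slice to the right of X0Y0, same CLB
print(adder_rlocs(4))       # relative locations for a 4-bit adder
```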
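[Figure: arrangement of slices within the CLB, with slices X0Y0 and X0Y1 in the left column (linked by SHIFTIN and SHIFTOUT), slices X1Y0 and X1Y1 in the right column, CIN and COUT on the carry chains, and the switch matrix providing interconnect to neighbors]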
RLOC=X0Y0, the next two RLOC=X0Y1, and so on. See the example for the ADD4 library
component in Figure 9-21.
[Figure 9-21: RLOC constraints on the ADD4 library component; the FMAPs and carry logic for bits 0 and 1 carry RLOC=X0Y0, and those for bits 2 and 3 carry RLOC=X0Y1]
For more information on relative location constraints, see the Libraries and Constraints
guides. If relative locations are to be applied to both slices and other resources such as
block RAM, consider using the RPM GRID system described in XAPP416.
Applications
Although the carry logic is most directly applicable to arithmetic functions, it can also be
used for other types of logic.
Wide Gates
The MUXCY is useful for general-purpose logic because it is controlled by the LUT and can
have a fixed 0 or 1 input without using resources. An AND gate can be implemented in a
mux by selecting the input "A" when B is High, as shown in Figure 9-22. The 0 data on the
0 side of the mux is available as a fixed input within the slice and does not require any
resources.
[Figure 9-22: AND gate in a MUXCY; B drives the select, A the 1 input, and a fixed 0 the 0 input, producing A*B]
With two LUTs and two MUXCYs per slice, two four-input functions can be combined into
one result in each slice, as shown in Figure 9-23.
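[Figure 9-23: two LUT functions F and G combined as F*G through two cascaded MUXCYs, with a fixed 0 on each 0 input and an initial 1 on the 1 input of the lower MUXCY]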
The initial 1 on the 1 input of the MUXCY can be sourced by one of the global 1 signals
available within the FPGA structure. Alternatively, it can be connected to a ninth input to
create a 9-input AND gate. With appropriate inversions, the AND function can be turned
into a NAND, OR, or NOR gate, with the same efficiency. A 1 is also available on the
MUXCY data input, while an initial zero can be generated in any unused LUT.
This implementation of wide logic functions provides higher performance and more
efficient utilization of the FPGA resources. The carry chain eliminates multiple levels of
logic and provides a fast path to the final result. The only limit on the width of the gate is
the number of LUTs in a column, allowing over 400 inputs in one function. Common
applications include wide input decoding, comparators, and counters.
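A behavioral sketch of a carry-based wide AND gate follows. It assumes each four-input LUT ANDs one group of inputs and drives the select of a MUXCY whose 0 input is tied to 0; the function names are hypothetical.

```python
def muxcy(s, di, ci):
    """Carry multiplexer: O = DI when S is Low, O = CI when S is High."""
    return ci if s else di


def lut_and(group):
    """A LUT programmed as an AND of up to four inputs."""
    return int(all(group))


def wide_and(inputs, carry_in=1):
    """AND any number of inputs, four per LUT, chained through MUXCYs."""
    carry = carry_in            # initial 1, or an extra input to the gate
    for start in range(0, len(inputs), 4):
        select = lut_and(inputs[start:start + 4])
        carry = muxcy(select, 0, carry)   # a Low LUT output forces 0
    return carry


print(wide_and([1] * 16))                 # 16-input AND, all inputs High
print(wide_and([1] * 8, carry_in=0))      # ninth input on the carry input
```

Each stage adds only one carry-multiplexer delay, which is why the gate width is limited by the height of the column rather than by logic levels.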
The 16-input gates in the Xilinx library use the MUXCY logic (see Figure 9-24). 12-input
gates and smaller use multiple levels of LUTs.
[Figure 9-24: 16-input AND gate built from four 4-input LUTs and four MUXCYs in two slices, with a fixed 0 on each MUXCY 0 input and VCC on the initial carry input]
Remember that another alternative for wide logic functions is to use the F5MUX and
FiMUX (F6MUX, F7MUX, F8MUX). These multiplexers are more efficient for registered
functions since they feed directly into the flip-flop in the same CLB, and can create any
function of up to 8 inputs in one level of logic, and some functions of up to 79 inputs. See
Chapter 8, “Using Dedicated Multiplexers” for more details.
Sum of Products
Generic logic descriptions will always be optimized into the four-input LUTs of the
Spartan-3 architecture, minimizing the number of resources required. The Xilinx software
is very efficient at optimizing logic to fit into the LUT structure, where the only limit is the
number of inputs, not the type of function.
Some logic architectures, including CPLDs, incorporate a sum-of-products structure, using
wide AND gates followed by OR gates. The wide AND gates can be implemented in the
Spartan-3 FPGA using the carry logic. The OR gates would be implemented in LUTs, with
up to four wide AND gates able to be combined in one fast LUT. Since carry-based AND
gates will be vertical with the result at the top, the OR LUT should be placed in a CLB
above the column-based AND gates.
Note that the ORCY function, used in the Virtex-II and Virtex-II Pro families to OR together
carry-based AND gates, is not available in the Spartan-3 family. Using the LUT instead
provides for a smaller CLB (and therefore lower cost), and offers more placement
flexibility.
Comparators
The AND function in the MUXCY can be extended to implement an equality comparator
of two four-bit values per slice.
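The equality comparator can be sketched as follows, assuming each LUT compares two bit pairs (four LUT inputs) and gates the carry chain. The function name and the bit grouping are illustrative assumptions.

```python
def muxcy(s, di, ci):
    """Carry multiplexer: O = DI when S is Low, O = CI when S is High."""
    return ci if s else di


def equal_via_carry(a, b, width):
    """Return 1 if the low `width` bits of a and b match, else 0."""
    carry = 1                                   # initial 1 on the first CI
    for low in range(0, width, 2):
        bits = min(2, width - low)
        mask = (1 << bits) - 1
        # One four-input LUT: A and B, two bits each, compared for equality.
        lut = int(((a >> low) & mask) == ((b >> low) & mask))
        carry = muxcy(lut, 0, carry)            # any mismatch forces 0
    return carry


print(equal_via_carry(0b1010, 0b1010, 4))
print(equal_via_carry(0b1010, 0b1011, 4))
```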
[Figure: equality comparator A=B in one slice; the lower LUT compares A(0:1) with B(0:1) and the upper LUT compares A(2:3) with B(2:3), each driving the select of a MUXCY with a fixed 0 on its 0 input]
A magnitude comparator can also be implemented using the carry logic, at two bits per
slice.
[Figure: magnitude comparator in one slice; each LUT computes the per-bit equality (A0=B0, A1=B1) that drives the select of a MUXCY in the carry chain]
As with the logic gates, inverters can be used to generate inequality or other comparisons.
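One common way to build the magnitude comparator is sketched below. It is an assumption rather than the exact wiring of the figure: each LUT computes per-bit equality for the MUXCY select, the B bit drives the DI input, and the chain starts with a 0 so that equal operands produce 0.

```python
def muxcy(s, di, ci):
    """Carry multiplexer: O = DI when S is Low, O = CI when S is High."""
    return ci if s else di


def less_than_via_carry(a, b, width):
    """Return 1 if a < b over the low `width` bits, else 0."""
    carry = 0                          # equal operands give "not less than"
    for i in range(width):             # LSB first, so the MSB decides last
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        lut = int(a_i == b_i)          # per-bit equality in the LUT
        carry = muxcy(lut, b_i, carry) # bits differ: A < B exactly when B is 1
    return carry


def greater_or_equal(a, b, width):
    """Inverting the result gives the complementary comparison."""
    return 1 - less_than_via_carry(a, b, width)


print(less_than_via_carry(5, 9, 4))
print(greater_or_equal(5, 9, 4))
```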
Adders
The adder is the fundamental function of the carry logic, as described earlier. Each Logic
Cell adds one pair of operand bits (one bit of A and one bit of B).
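A behavioral sketch of the carry-based adder follows, assuming the usual arrangement: the LUT forms A XOR B, the MUXCY produces the carry out, and the XORCY produces the sum. The function names are hypothetical, and using the A bit on the DI input is an assumption (B works equally well, because A and B are equal whenever DI is selected).

```python
def muxcy(s, di, ci):
    """Carry multiplexer: O = DI when S is Low, O = CI when S is High."""
    return ci if s else di


def xorcy(li, ci):
    """Dedicated XOR that completes the sum."""
    return li ^ ci


def carry_adder(a, b, width, cin=0):
    """Ripple adder built from LUT, MUXCY, and XORCY, one bit per logic cell."""
    carry = cin
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        lut = a_i ^ b_i                      # function generator output
        total |= xorcy(lut, carry) << i      # sum bit
        carry = muxcy(lut, a_i, carry)       # propagate CIN or generate from A
    return total, carry


print(carry_adder(0b1011, 0b0110, 4))        # (sum, carry out)
```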
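[Figure: one bit of the carry-based adder; the function generator combines A and B to drive the MUXCY select, the MUXCY passes CIN or a data input to COUT, and the XORCY forms the Sum, which can be registered]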
Counters
A binary counter can be implemented by toggling each flip-flop when all the lower-order
flip-flops are High. Figure 9-28 shows a typical binary up counter using toggle flip-flops.
An AND gate is required for each bit, and the gates get wider as the counter gets larger.
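[Figure 9-28: binary up counter built from five toggle flip-flops; the lowest bit has T tied to 1 and each higher bit toggles on the AND of all lower-order outputs]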
These wide AND gates are also a candidate for the carry logic, especially since it avoids
having to duplicate the gate at different widths. Figure 9-29 shows the same binary counter
using the MUXCY in place of the AND gates. Each MUXCY expands the width of the AND
gate by the additional bit needed for each stage of the counter, and there is no redundant
logic. The performance limit is only the propagation of the carry chain instead of wide
AND gates.
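The carry-based counter can be sketched behaviorally as follows. It assumes each MUXCY is selected by its stage's flip-flop output with a fixed 0 on the 0 input, so the carry at each stage is the AND of all lower-order bits; the function names are hypothetical.

```python
def muxcy(s, di, ci):
    """Carry multiplexer: O = DI when S is Low, O = CI when S is High."""
    return ci if s else di


def next_state(q_bits):
    """One clock of the up counter; q_bits is a list of 0/1 values, LSB first."""
    carry = 1                         # T input of the lowest bit is tied to 1
    following = []
    for q in q_bits:
        t = carry                     # High when all lower-order bits are High
        following.append(q ^ t)       # toggle; with a D flip-flop this is the LUT XOR
        carry = muxcy(q, 0, carry)    # widen the AND by one bit per stage
    return following


def to_int(bits):
    """Interpret an LSB-first bit list as an unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))


state = [0, 0, 0, 0, 0]
for _ in range(3):
    state = next_state(state)
print(to_int(state))
```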
[Figure 9-29: the same binary counter with a MUXCY chain in place of the AND gates; each MUXCY has a fixed 0 on its 0 input and feeds the T input of the next toggle flip-flop]
Figure 9-30 shows the implementation of the carry-based binary counter using the D flip-
flops available in the slices. The inverter and XOR gates are implemented in the LUT
preceding each flip-flop.
[Figure 9-30: carry-based binary counter using D flip-flops, with a MUXCY per bit and the inverter and XOR gates implemented in the LUT ahead of each flip-flop]
Multipliers
Multiplication is typically done by generating partial products and then adding the results.
The carry logic optimizes both aspects of the design.
One-bit multiplication is logically very simple, requiring only sets of AND gates. These
gates either allow the input value to be passed or force the partial product completely to
zero.
[Figure: first and second partial products, formed by ANDing A3 through A0 with B0 and with B1, respectively]
Then, all the partial products need to be added together with the appropriate bit
weighting. If there is sufficient time (enough clock cycles), the classical serial "shift and
add" technique can be adopted based on an accumulator; however, for a maximum
performance parallel multiplier, an addition tree is needed. This tree is effectively
implemented using carry-based adders, with pipelining registers available if desired for
better performance.
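The partial-product view of multiplication can be sketched in a few lines. The function names are hypothetical; each partial product is the A operand gated by one bit of B and shifted to that bit's weight.

```python
def partial_products(a, b, b_width):
    """One partial product per B bit: A passed through or forced to zero."""
    return [(a if (b >> i) & 1 else 0) << i for i in range(b_width)]


def multiply(a, b, b_width):
    """Sum the weighted partial products (the job of the addition tree)."""
    return sum(partial_products(a, b, b_width))


print(partial_products(0b1011, 0b0110, 4))
print(multiply(0b1011, 0b0110, 4))
```

A serial shift-and-add design would accumulate one entry of this list per clock cycle, while a parallel multiplier adds them all in a tree.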
[Figure: two partial products (A3 through A0 gated by B0 and by B1) summed by a carry chain with CIN = '0', producing P0 through P4 and COUT]
These AND gates can be implemented in the MULT_AND and the LUTs available in the
CLBs.
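The following sketch shows how the MULT_AND fits into one multiplier-adder stage that sums two partial products. It is an illustrative model under stated assumptions: the LUT holds both AND gates and their XOR, and the MULT_AND duplicates the AND of A[i] and B0 to drive the MUXCY DI input. The function names are hypothetical.

```python
def muxcy(s, di, ci):
    """Carry multiplexer: O = DI when S is Low, O = CI when S is High."""
    return ci if s else di


def xorcy(li, ci):
    """Dedicated XOR that completes the sum."""
    return li ^ ci


def mult_and(i0, i1):
    """Dedicated AND gate feeding the MUXCY DI input."""
    return i0 & i1


def multiply_by_two_bits(a, b, a_width):
    """Multiply A by a 2-bit B using one carry-chain multiplier-adder."""
    b0, b1 = b & 1, (b >> 1) & 1
    carry = 0
    product = 0
    for i in range(a_width + 1):
        a_i = (a >> i) & 1 if i < a_width else 0
        a_prev = (a >> (i - 1)) & 1 if i >= 1 else 0
        lut = (a_i & b0) ^ (a_prev & b1)     # two ANDs and an XOR in the LUT
        di = mult_and(a_i, b0)               # duplicate of the first AND
        product |= xorcy(lut, carry) << i
        carry = muxcy(lut, di, carry)
    return product | (carry << (a_width + 1))


print(multiply_by_two_bits(0b1111, 0b11, 4))
```

Without the dedicated AND, the DI value would need a second LUT per bit, which is the saving the MULT_AND provides.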
[Figure: multiplier cell combining the LUT, CY_MUX, CY_XOR, and the dedicated MULT_AND gate to form AxB]
[Figure: 12 x 8 multiplier (12-bit A, 8-bit B) built as an addition tree; 3 levels of logic (3-stage pipeline), 54 slices fully pipelined]
[Figure: 8 x 12 multiplier (8-bit A, 12-bit B) built as an addition tree with a compensation delay; 4 levels of logic, 72 slices fully pipelined]
The fact that 8 is a power of 2 means that the "12 x 8" breaks down nicely into 4 multiplier
adders in the first stage; hence, it leads to a symmetrical addition tree of 3 levels. In
contrast, the "8 x 12" is less elegant: the 6 multiplier adders of the first stage do not sum
easily, leading to more adders and 4 levels of logic. For a fully pipelined multiplier, there is
even the requirement for delay compensation.
The multiplier adders and pure adders of the "8 x 12" are generally a smaller number of
bits than in the "12 x 8"; but with the efficient carry-based adders in the Spartan-3
architecture, this has a very minimal impact on performance. In any case, both multipliers
have the same largest-size adder at the final stage. Combinatorial multiplier performance
will be set by the number of logic levels, and in this case, the "12 x 8" will definitely win.
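The level counts quoted above can be reproduced with a short calculation. It assumes each first-stage multiplier adder consumes two bits of the B operand and that the remaining adders pair up in a binary tree; the function names are hypothetical.

```python
def first_stage_adders(b_width):
    """Multiplier adders in the first stage: one per two bits of B."""
    return (b_width + 1) // 2


def logic_levels(b_width):
    """Levels of logic: the first stage plus a binary tree of pure adders."""
    remaining = first_stage_adders(b_width)
    levels = 1
    while remaining > 1:
        remaining = (remaining + 1) // 2   # an odd result waits for the next level
        levels += 1
    return levels


for name, b_width in (("12 x 8", 8), ("8 x 12", 12)):
    print(name, first_stage_adders(b_width), "adders,", logic_levels(b_width), "levels")
```

An odd count at any level leaves one result waiting, which is where a fully pipelined design needs a compensation delay.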
It is difficult to know for certain how each design entry tool handles the implementation of
complex functions, so experiment with alternative implementations. Even simply
switching the order of the inputs could have a significant effect on the performance and
resource requirements.
possibly using Muxes to expand to more inputs, and removes the column-based
requirement of the MULT_AND based multipliers.
[Figure: 119*x formed by adding x*2^6, x*2^5, x*2^4, x*2^2, x*2^1, and x]
[Figure: 119*x formed as x*2^7 minus x*2^3 minus x]
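Both decompositions of the constant 119 can be checked with shifts and adds. The function names are hypothetical; the shift amounts are the powers of two shown in the two figures.

```python
def times_119_adders_only(x):
    """119 = 64 + 32 + 16 + 4 + 2 + 1: five adders."""
    return (x << 6) + (x << 5) + (x << 4) + (x << 2) + (x << 1) + x


def times_119_with_subtractors(x):
    """119 = 128 - 8 - 1: two subtractors."""
    return (x << 7) - (x << 3) - x


print(times_119_adders_only(3), times_119_with_subtractors(3))
```

The subtractor form needs fewer carry chains for the same result, which is why recoding a constant can reduce both area and delay.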
Conclusion
Dedicated carry logic provides fast arithmetic addition and subtraction. The
Spartan-3 generation CLB has two separate carry chains with two bits per slice. The
dedicated carry path and carry multiplexer can also be used to cascade function generators
for implementing wide logic functions. The arithmetic logic includes an XOR gate that
allows a two-bit full adder to be implemented within a slice. In addition, a dedicated AND
improves the efficiency of multiplier implementations. These resources are used
automatically by synthesis tools or can be explicitly called out by the user.
Chapter 10
IOB Overview
The Input/Output Block (IOB) provides a programmable, unidirectional or bidirectional
interface between a package pin and the FPGA’s internal logic, supporting a wide variety
of standard interfaces. The robust feature set includes programmable control of output
strength and slew rate, registered or combinatorial inputs and outputs with dedicated
double data rate (DDR) registers, programmable input delays, on-chip termination, and
hot-swap capability.
Figure 10-1, page 317 is a simplified diagram of the IOB’s internal structure. There are
three main signal paths within the IOB: the output path, input path, and 3-state path. Each
path has its own pair of storage elements that can act as either registers or latches. For more
information, see “Storage Element Functions,” page 328. The three main signal paths are as
follows:
• The input path carries data from the pad, which is bonded to a package pin, through
an optional programmable delay element directly to the I line. After the delay
element, there are alternate routes through a pair of storage elements to the IQ1 and
IQ2 lines. The IOB outputs I, IQ1, and IQ2 lead to the FPGA’s internal logic. The delay
element can be set to ensure a hold time of zero (see “Input Delay Functions”).
• The output path, starting with the O1 and O2 lines, carries data from the FPGA’s
internal logic through a multiplexer and then a three-state driver to the IOB pad. In
addition to this direct path, the multiplexer provides the option to insert a pair of
storage elements.
• The 3-state path determines when the output driver is high impedance. The T1 and T2
lines carry data from the FPGA’s internal logic through a multiplexer to the output
driver. In addition to this direct path, the multiplexer provides the option to insert a
pair of storage elements.
All signal paths entering the IOB, including those associated with the storage elements,
have an inverter option. Any inverter placed on these paths is automatically absorbed into
the IOB.
[Figure 10-1: simplified IOB diagram showing the three-state path (T1, T2, TFF1, TFF2), the output path (O1, O2, OFF1, OFF2, programmable output driver, pull-up, pull-down, keeper latch, ESD), and the input path (I, IQ1, IQ2, IFF1, IFF2, programmable delays, VREF pin, I/O pin from adjacent IOB)]
The devices offer complementary solutions for different applications. The Spartan-3A DSP
platform is optimized for digital signal processing and similar logic-intensive applications.
The Spartan-3AN platform offers a non-volatile FPGA solution. The Spartan-3A platform
has the highest number of I/Os per gate, and is most cost-effective for applications that are
I/O intensive. The Spartan-3E family offers a higher number of gates per I/O, making it
cost-effective for applications requiring more logic than I/O. The I/O ratios differ
primarily because the Extended Spartan-3A family has a dual, staggered I/O ring around
the device, while the Spartan-3E family has a single in-line I/O ring. The Spartan-3 family
offers even higher density solutions for both gates and I/Os, and also has a staggered I/O
ring.
Input-Only Pins
To optimize the I/O ring and reduce cost, some I/O blocks in the Extended Spartan-3A
and Spartan-3E families are input-only pins. Dedicated Inputs are IOBs usable only as
inputs. Pin names designate a Dedicated Input if the name starts with IP, for example, IP_x
or IP_Lxxx_x. Dedicated inputs retain the full functionality of the IOB for input functions
with a single exception for differential inputs (IP_Lxxx_x). For the differential Dedicated
Inputs, the on-chip differential termination is not available. To use the on-chip differential
termination, either choose a differential pair that supports outputs (IO_Lxxx_x) or use an
external 100Ω termination resistor on the board.
The unidirectional, input-only block has a subset of the full IOB capabilities. Thus there are
no connections or logic for an output path. The following paragraphs assume that any
reference to output functionality does not apply to the input-only blocks. The number of
input-only blocks varies with device size but is never more than 25% of the total IOB count.
For details on the number of input-only pins in each part/package combination, see
Chapter 1, “Overview.”
Summary of Differences
Table 10-2 highlights the major differences between the I/O resources of the different
Spartan-3 generation FPGA families. Some of these differences are described in more detail
later in this chapter.
Design Entry
In many cases the I/O resources are automatically selected by the implementation tools.
Users might want to specify particular components for special purposes, such as using
dedicated clock inputs. The components listed below can be instantiated in HDL code or in
a schematic.
Library Components
The Xilinx library includes an extensive list of components designed to provide support for
the variety of I/O features (Table 10-3). Most of these components represent variations of
the five generic I/O elements:
• IBUF (input buffer)
• IBUFG (global clock input buffer)
• OBUF (output buffer)
• OBUFT (3-state output buffer)
• IOBUF (input/output buffer)
Notes:
1. Must use a bidirectional differential IOSTANDARD such as BLVDS.
2. Must be differential if the DDR_ALIGNMENT = C0/C1 feature is used.
Earlier families had additional I/O components, but these are not recommended for use in
new designs. These components included:
• Bus I/O (Example: IBUF4)
These are still available for schematic entry only at 4, 8, and 16 bits wide, but
individual components allow more control over constraints.
• Registered I/O (Example: IFD)
These are also available for schematic entry and include both registered and latched
I/Os. However, it is recommended that the software be allowed to optimize to either
the IOB or the CLB, whichever is more efficient.
• I/O Standard Suffix (Example: IBUF_LVCMOS18)
These components included the IOSTANDARD as part of the component name. It is
recommended to apply an IOSTANDARD constraint to a generic component instead.
Registered I/O
The Spartan-3 generation IOB includes an optional flip-flop or latch on the input path,
output path, and 3-state control input. However, there are no special library components
for the I/O registers. To simplify design, especially synthesis, the standard register
primitives are automatically absorbed into the IOB when possible. This feature is selected
by the user by turning on the Map Property "Pack I/O Registers/Latches into IOBs", which
can be set to Off (default), For Inputs Only, For Outputs Only, or For Inputs and Outputs.
Alternatively, the IOB = TRUE property can be placed on a register to force the mapper to
place the register in an IOB.
An optional delay element is associated with the input path in each logic input primitive
(IBUF or IOBUF). When the buffer drives an input register within the IOB, the delay
element activates by default to ensure a zero hold time requirement. The delay element is
not used for non-registered inputs, to provide higher performance. The user can override
the defaults; see “Input Delay Functions,” page 325 for more details.
Differential I/O
The Spartan-3 generation IOBs include differential I/O standards such as LVDS, BLVDS,
and RSDS. Differential I/O requires two pins for every signal, which toggle in opposite
directions. To support differential signaling, most I/O components have differential
versions with DS in the name and two I/O pins on the component.
On the inputs, if only the P side of the differential pair is called out, the N side is
automatically configured as the other half of the differential pair. If the N input is called
out in a design for simulation and system-level integration, it is trimmed during the
mapping process, although physically it is still used in conjunction with the P input, and
the software does not allow it to be used for any other purpose.
On the outputs, both the P and N sides of the differential pair must be defined. Both IOBs
must have the same nets sourcing the control pins: clock, set/reset, three-state, three-state
clock enable, and output clock enable. In addition, the output pins must be inverted with
respect to each other, and, if output registers are used, the D inputs must be inverted with
respect to each other and the INIT states must be opposite values (one High and one Low).
Three-state registers must have the same inputs and have the same INIT states. INIT states must
be set correctly for the power-up state even if the INIT function is not used in the design
(INIT is connected to ground).
The pins that can be used as differential pairs are specified in the Module 4 pinout tables,
including the special pairs that can be used for clock inputs.
IBUF
Signals used as inputs to the device must source an input buffer (IBUF) via an external
input port. Figure 10-2 shows the generic IBUF symbol.
[Figure 10-2: IBUF symbol with input I and output O]
IBUFG
IBUFG is a special global clock input buffer that can connect directly to the BUFG (global
clock buffer) and DCM components. A standard input driving a clock signal is put onto an
IBUFG by the Xilinx tools, or the user can instantiate the IBUFG directly. See Chapter 2,
“Using Global Clock Resources,” for more details. Figure 10-3 shows the generic IBUFG
symbol.
[Figure 10-3: IBUFG symbol with input I and output O]
IBUFDS
IBUFDS is an input buffer that supports differential signaling. In IBUFDS, a design level
interface signal is represented as two ports (I and IB), one deemed the "master" and the
other the "slave." The master and the slave are opposite phases of the same logical signal
(for example, MYNET and MYNETB). Figure 10-4 shows the generic IBUFDS symbol.
[Figure 10-4: IBUFDS symbol with inputs I and IB and output O]
OBUF
An OBUF must drive outputs through an external output port. Figure 10-5 shows the
generic output buffer (OBUF) symbol.
[Figure 10-5: OBUF symbol with input I and output O]
OBUFT
The generic 3-state output buffer OBUFT, shown in Figure 10-6, typically implements
3-state outputs. Unused I/Os are configured with a disabled OBUFT.
[Figure: OBUFT symbol with inputs I and T and output O]
IOBUF
Use the IOBUF symbol for bidirectional signals that require both an input buffer and a
3-state output buffer with an active high 3-state pin. This symbol combines the
functionality of the OBUFT and IBUF symbols. Figure 10-7 shows the generic
input/output buffer IOBUF.
[Figure 10-7: IOBUF symbol with inputs I and T and bidirectional pin IO]
HDL Entry
I/O components can be easily instantiated in VHDL or Verilog code. The Xilinx
development system includes language templates for any of the standard I/O
components.
Following is an example of the template for the IOBUF input/output buffer component.
Registers can automatically be merged into the I/O block, simplifying the generation of
the HDL code.
-- INOUT_PORT : inout STD_LOGIC;
--**Insert the following between the
-- 'architecture' and 'begin' keywords**
signal IN_SIG, OUT_SIG, T_ENABLE: std_logic;
component IOBUF
port (I, T: in std_logic;
O: out std_logic;
IO: inout std_logic);
end component;
--**Insert the following after the 'begin' keyword**
U1: IOBUF port map (I => OUT_SIG, T => T_ENABLE,
O => IN_SIG, IO => INOUT_PORT);
Architectural Details
Figure 10-8: Simplified View of Data and Clock Routing to Input Flip-Flop
Two flip-flops on the input path, IFF1 and IFF2, support double data rate signaling.
They generate IOB signals IQ1 and IQ2, respectively, as shown in
Figure 10-1, page 317. The delay element choice affects both flip-flops.
The delay element is not used for non-registered (combinatorial) inputs in order to provide
higher performance. An IOB can supply both a registered and a non-registered version of
the same input pad if required in the application. When both paths are used, the delay
element choice is independent for the two paths, for example, allowing the registered path
to be delayed while the combinatorial path is not.
The user can override the defaults, either adding the delay to a combinatorial input or
removing it from a registered input. Extra delay might be required on some clock or data
inputs, for example, in interfaces to various types of RAM. If the design uses a DCM in the
clock path, then the delay element can be removed from registered inputs, still without a
hold time requirement.
Programmable Delay
In the Spartan-3E and Extended Spartan-3A families, the delay block itself has
programmable delay values.
Each IOB has a programmable delay block that can delay the input signal by a
programmable amount. In Figure 10-9, the signal path has a coarse delay element that can
be bypassed. The input signal then feeds a 6-tap delay line in the Spartan-3E family (an
8-tap delay line in the Extended Spartan-3A family). All six taps are available via a
multiplexer for use as an asynchronous input directly into the FPGA fabric. Three of the six
taps are also available via a multiplexer to the D inputs of the synchronous storage
elements. The coarse delay element is common to both asynchronous and synchronous
paths, and must be either used or not used for both paths.
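Figure 10-9: Programmable delay path from PAD through the coarse delay element and tap delay line; IFD_DELAY_VALUE selects the tap for the synchronous input (IQ1)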
These delay values are set up in the silicon once at configuration time through the
IBUF_DELAY_VALUE and the IFD_DELAY_VALUE parameters. The default
IBUF_DELAY_VALUE is 0, bypassing the delay elements for the asynchronous input. The
user can set this parameter to 0-12 in the Spartan-3E family. The default
IFD_DELAY_VALUE is AUTO; the Xilinx software chooses the default value automatically
because the value depends on device size. The default values are shown in the data sheet
timing specifications, and are indicated in the Map report generated by the
implementation tools. The user can select a specific IFD_DELAY_VALUE from 0-6 in the
Spartan-3E family, and the resulting timing is reported by the Timing Analyzer tool.
IBUF_DELAY_VALUE and IFD_DELAY_VALUE are independent for each input. If the
same input pin uses both registered and non-registered input paths, both parameters can
be used, but they must both be in the same half of the total delay (both either bypassing or
using the initial delay element).
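As an illustration of setting the combinatorial-path delay from HDL, the sketch below instantiates an IBUF and passes the delay as a generic. It assumes the UNISIM IBUF component exposes IBUF_DELAY_VALUE as a string generic; the entity name delayed_input and its ports are hypothetical, and the value "5" is only an arbitrary point inside the 0-12 range quoted above for the Spartan-3E family.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

library UNISIM;
use UNISIM.vcomponents.all;

-- Hypothetical entity; names are illustrative only.
entity delayed_input is
  port (
    PAD_IN : in  std_logic;   -- input pin
    D_COMB : out std_logic    -- combinatorial (non-registered) input to the fabric
  );
end entity delayed_input;

architecture rtl of delayed_input is
begin
  -- Assumption: the delay setting is a string generic on the IBUF primitive.
  -- The value is fixed at configuration time and cannot change afterwards.
  U_IBUF : IBUF
    generic map (
      IBUF_DELAY_VALUE => "5",
      IOSTANDARD       => "DEFAULT")
    port map (
      I => PAD_IN,
      O => D_COMB);
end architecture rtl;
```

The resulting input delay should be confirmed in the Timing Analyzer report rather than inferred from the generic value alone.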
changes to the delay amount. The choice of the coarse delay element is still fixed as part of
the device configuration.
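Figure: Programmable delay path with dynamic adjustment through select inputs S[2:0] (PAD, coarse delay element, IFD_DELAY_VALUE tap selection for the synchronous input IQ1)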
The delay values at configuration are still controlled by the IFD_DELAY_VALUE and
IBUF_DELAY_VALUE parameters. To use the dynamic adjustment delay for
combinatorial inputs, replace the IBUF component with the IBUF_DLY_ADJ component
(see Figure 10-11) and connect the three select inputs. The IBUF_DLY_ADJ component is
only used for the combinatorial (non-registered) path, and has no effect on the IFD
(registered) path.
Figure 10-11: IBUF_DLY_ADJ component (input I, output O, select inputs S[2:0])
The IBUF_DLY_ADJ component can adjust the delay only within one half of the total delay
range. The DELAY_OFFSET parameter selects either the first half or the second half of the
delay amounts. DELAY_OFFSET = ON feeds the coarse delay element into the dynamic
mux, while DELAY_OFFSET = OFF bypasses the coarse delay element. Table 10-4 shows
how the IBUF_DELAY_VALUE corresponds to the Select lines. The binary equivalents of
the Select lines, 0 to 7, correspond to the IBUF_DELAY_VALUE options of 1-8 or 9-16. An
IBUF_DELAY_VALUE of 0 corresponds to completely bypassing the delay functions, and
is available with the IBUF component only, not IBUF_DLY_ADJ.
DELAY_OFFSET   S[2:0]   IBUF_DELAY_VALUE
OFF            0        1
OFF            1        2
OFF            2        3
OFF            3        4
OFF            4        5
OFF            5        6
OFF            6        7
OFF            7        8
ON             0        9
ON             1        10
ON             2        11
ON             3        12
ON             4        13
ON             5        14
ON             6        15
ON             7        16
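The mapping above can be exercised with a small wrapper around the component. This is a minimal sketch under stated assumptions: the target is an Extended Spartan-3A device, the UNISIM IBUF_DLY_ADJ component has the generics DELAY_OFFSET and IOSTANDARD and the ports I, O, and S, and the entity name adj_input and its port names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

library UNISIM;
use UNISIM.vcomponents.all;

-- Hypothetical entity; names are illustrative only.
entity adj_input is
  port (
    PAD_IN  : in  std_logic;                     -- input pin
    DLY_SEL : in  std_logic_vector(2 downto 0);  -- driven by user logic at run time
    D_COMB  : out std_logic                      -- delayed combinatorial input
  );
end entity adj_input;

architecture rtl of adj_input is
begin
  -- DELAY_OFFSET = "OFF" bypasses the coarse delay element, so DLY_SEL
  -- values 0 to 7 correspond to IBUF_DELAY_VALUE settings 1 to 8.
  -- With DELAY_OFFSET = "ON" the same select values map to 9 to 16.
  U_DLY : IBUF_DLY_ADJ
    generic map (
      DELAY_OFFSET => "OFF",
      IOSTANDARD   => "DEFAULT")
    port map (
      I => PAD_IN,
      S => DLY_SEL,
      O => D_COMB);
end architecture rtl;
```

Because DELAY_OFFSET is fixed at configuration, moving between the two halves of the range requires a new bitstream; only the position within the chosen half is adjustable through S[2:0].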
Figure 10-12 shows how the two types of delay specifications control the muxes.
Figure 10-12: Delay mux control. S[2:0] (dynamic) selects one of taps 0 to 7; IBUF_DELAY_VALUE (0 or greater than 0) selects between the bypass and the delayed path.
This is accomplished by taking data synchronized to the clock signal’s rising edge and
converting it to bits synchronized on both the rising and the falling edge. The combination
of two registers and a multiplexer is referred to as a Double-Data-Rate D-type flip-flop
(ODDR2).
Table 10-5 describes the signal paths associated with the storage element.
As shown in Figure 10-1, page 317, the upper registers in both the output and three-state
paths share a common clock. The OTCLK1 clock signal drives the CK clock inputs of the
upper registers on the output and three-state paths. Similarly, OTCLK2 drives the CK
inputs for the lower registers on the output and three-state paths. The upper and lower
registers on the input path have independent clock lines: ICLK1 and ICLK2.
Clock routing resources are often shared between adjacent IOBs, including differential
pairs. In these situations, the two OTCLK1, OTCLK2, ICLK1, and ICLK2 signals must be
identical when both IOBs use them. The software can swap between the upper and lower
registers if necessary, unless both are used in a DDR configuration.
The OCE enable line controls the CE inputs of the upper and lower registers on the output
path. Similarly, TCE controls the CE inputs for the register pair on the three-state path and
ICE does the same for the register pair on the input path.
The Set/Reset (SR) line entering the IOB controls all six registers, as does the Reverse (REV)
line.
In addition to the signal polarity controls, each storage element supports the
controls described in Table 10-6.
Double-Data-Rate Transmission
Double-Data-Rate (DDR) transmission describes the technique of synchronizing signals to
both the rising and falling edges of the clock signal. Register pairs are available in all three
IOB paths to perform DDR operations.
The pair of storage elements on the IOB’s Output path (OFF1 and OFF2), used as registers,
combine with a special multiplexer to form a DDR D-type flip-flop (ODDR2). This
primitive permits DDR transmission where output data bits are synchronized to both the
rising and falling edges of a clock. DDR operation requires two clock signals (usually 50%
duty cycle), one the inverted form of the other. These signals trigger the two registers in
alternating fashion, as shown in Figure 10-13. The Digital Clock Manager (DCM) generates
the two clock signals by mirroring an incoming signal, and then shifting it 180 degrees.
This approach ensures minimal skew between the two signals. Alternatively, the inverter
inside the IOB can be used to invert the clock signal, thus only using one clock line and
both rising and falling edges of that clock line as the two clocks for the DDR flip-flops.
The storage-element pair on the Three-State path (TFF1 and TFF2) also can be combined
with a local multiplexer to form a DDR primitive. This permits synchronizing the output
enable to both the rising and falling edges of a clock. This DDR operation is realized in the
same way as for the output path.
The storage-element pair on the input path (IFF1 and IFF2) allows an I/O to receive a DDR
signal. An incoming DDR clock signal triggers one register, and the inverted clock signal
triggers the other register. The registers take turns capturing bits of the incoming DDR data
signal. The primitive to allow this functionality is called IDDR2.
Note that the ODDR2 and IDDR2 primitives must be used to access the special DDR logic
in the IOBs.
Aside from high bandwidth data transfers, DDR outputs also can be used to reproduce, or
mirror, a clock signal on the output. This approach is used to transmit clock and data
signals together (source synchronously). A similar approach is used to reproduce a clock
signal at multiple outputs. The advantage for both approaches is that skew across the
outputs is minimal.
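The clock-mirroring technique can be sketched with an ODDR2 whose data inputs are tied to constants. This is an illustrative sketch, not the guide's own example: the entity name clk_forward and its ports are hypothetical, and it uses the IOB-local inversion option (a single clock net with an inverted copy) rather than two DCM outputs.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

library UNISIM;
use UNISIM.vcomponents.all;

-- Hypothetical entity; names are illustrative only.
entity clk_forward is
  port (
    CLK     : in  std_logic;  -- internal clock to be reproduced
    CLK_OUT : out std_logic   -- forwarded clock on an output pin
  );
end entity clk_forward;

architecture rtl of clk_forward is
  signal clk_n : std_logic;
begin
  -- Inverted clock for the second register of the pair.
  clk_n <= not CLK;

  -- D0 = '1' is output after the rising edge of C0 and D1 = '0' after the
  -- rising edge of C1, so the pin follows CLK.
  U_CLK_FWD : ODDR2
    generic map (
      DDR_ALIGNMENT => "NONE",
      INIT          => '0',
      SRTYPE        => "SYNC")
    port map (
      Q  => CLK_OUT,
      C0 => CLK,
      C1 => clk_n,
      CE => '1',
      D0 => '1',
      D1 => '0',
      R  => '0',
      S  => '0');
end architecture rtl;
```

Sending data through ODDR2 instances clocked by the same nets gives the forwarded clock and the data the same IOB output path, which is what keeps the skew between them small.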
Figure 10-13: Two methods for clocking the DDR register pair (D1, D2, Q1, Q2, CLK1, CLK2): a DCM supplying 0° and 180° outputs, or a DCM supplying the 0° output only
IDDR2
As a DDR input pair, the master IOB registers incoming data on the rising edge of ICLK1
(= D1) and the rising edge of ICLK2 (= D2), which is typically the same as the falling edge
of ICLK1. This data is then transferred into the FPGA fabric. At some point, both signals
must be brought into the same clock domain, typically ICLK1. This can be difficult at high
frequencies because the available time is only one half of a clock cycle assuming a 50%
duty cycle. See Figure 10-14 for a graphical illustration of this function.
Figure 10-14: Input DDR path. PAD feeds two D flip-flops clocked by ICLK1 and ICLK2, producing D1 and D2 to the fabric.
Figure: IDDR2 symbol (pins D, C0, C1, CE, R, S, Q0, Q1)
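The input DDR arrangement described above can be sketched as an IDDR2 followed by fabric registers that bring both bits into the ICLK1 domain. The entity name ddr_rx and its ports are hypothetical, and the inverted clock is generated locally as an assumption of this sketch rather than from a DCM.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

library UNISIM;
use UNISIM.vcomponents.all;

-- Hypothetical entity; names are illustrative only.
entity ddr_rx is
  port (
    CLK    : in  std_logic;   -- plays the role of ICLK1
    DDR_IN : in  std_logic;   -- DDR data pin
    D1_OUT : out std_logic;   -- bit captured on the rising edge of CLK
    D2_OUT : out std_logic    -- bit captured on the falling edge of CLK
  );
end entity ddr_rx;

architecture rtl of ddr_rx is
  signal clk_n, q_rise, q_fall : std_logic;
begin
  clk_n <= not CLK;           -- plays the role of ICLK2

  U_IDDR : IDDR2
    generic map (
      DDR_ALIGNMENT => "NONE",
      INIT_Q0       => '0',
      INIT_Q1       => '0',
      SRTYPE        => "SYNC")
    port map (
      D  => DDR_IN,
      C0 => CLK,
      C1 => clk_n,
      CE => '1',
      R  => '0',
      S  => '0',
      Q0 => q_rise,
      Q1 => q_fall);

  -- Bring both bits into the CLK domain. The q_fall path has only half a
  -- clock period to reach its register, which is the timing concern noted
  -- in the text.
  process (CLK)
  begin
    if rising_edge(CLK) then
      D1_OUT <= q_rise;
      D2_OUT <= q_fall;
    end if;
  end process;
end architecture rtl;
```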
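Figure: Input DDR path using the register cascade (IQ2 to IDDRIN2). PAD feeds flip-flops clocked by ICLK1 and ICLK2; D2 passes through a second flip-flop before reaching the fabric.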
ODDR2
As a DDR output pair, the master IOB registers data coming from the FPGA fabric on the
rising edge of OCLK1 (= D1) and the rising edge of OCLK2 (= D2), which is typically the
same as the falling edge of OCLK1. These two bits of data are multiplexed by the DDR mux
and forwarded to the output pin. The D2 data signal must be resynchronized from the
OCLK1 clock domain to the OCLK2 domain using FPGA slice flip-flops. Placement is
critical at high frequencies, because the time available is only one half a clock cycle. See
Figure 10-17 for a graphical illustration of this function.
In the ODDR2 component for the Extended Spartan-3A family, the DDR_ALIGNMENT
attribute allows both data bits to be captured on C0 or C1 (DDR_ALIGNMENT=C0 or
DDR_ALIGNMENT=C1). DDR_ALIGNMENT is only available on differential pairs (see
“Register Cascade Feature,” page 331).
Note: The Spartan-3E family does not support using the C0 or C1 alignment feature of the ODDR2
flip-flop. The ODDR2 flip-flop without the alignment feature is fully supported, as is the IDDR2 flip-flop
with alignment. Without the alignment feature, the ODDR2 component behaves equivalently to the
ODDR flip-flop components on previous Xilinx FPGA families. The Spartan-3A/3AN production
devices and Spartan-3A DSP devices fully support this feature.
Clock routing resources are often shared between adjacent IOBs, including differential
pairs. In these situations, the two OTCLK1, OTCLK2, ICLK1, and ICLK2 signals must be
identical when both IOBs use them. The software can swap between the upper and lower
registers if necessary, unless both are used in a DDR configuration.
Figure 10-17: Output DDR path. D1 and D2 from the fabric feed two D flip-flops clocked by OCLK1 and OCLK2, multiplexed to the PAD.
Figure: ODDR2 symbol (pins D0, D1, C0, C1, CE, R, S, Q)
During configuration, a Low logic level on the PUDC_B pin (HSWAP in the
Spartan-3E family and HSWAP_EN in the Spartan-3 family) activates pull-up resistors on
all I/O and Input-only pins not actively used in the selected configuration mode.
Notes:
3. Spartan-3AN FPGAs require VCCAUX = 3.0 to 3.6V
Keeper Circuit
Each I/O has an optional keeper circuit (see Figure 10-19) that keeps bus lines from
floating when not being actively driven. The KEEPER circuit retains the last logic level on
a line after all drivers have been turned off. Apply the KEEPER attribute or use the
KEEPER library primitive to use the KEEPER circuitry. Pull-up and pull-down resistors
override the KEEPER settings.
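A minimal sketch of the library-primitive approach follows. It assumes the UNISIM KEEPER component with a single bidirectional port O; the entity name kept_bus and its ports are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

library UNISIM;
use UNISIM.vcomponents.all;

-- Hypothetical entity; names are illustrative only.
entity kept_bus is
  port (
    DATA     : in    std_logic;  -- value to drive when enabled
    RELEASE  : in    std_logic;  -- '1' = stop driving the line
    BUS_LINE : inout std_logic   -- shared bus pin
  );
end entity kept_bus;

architecture rtl of kept_bus is
begin
  -- 3-state driver; T is active High.
  U_DRV : OBUFT
    port map (I => DATA, T => RELEASE, O => BUS_LINE);

  -- Weak keeper on the same pad net; it holds the last logic level
  -- once every driver on the line has been turned off.
  U_KEEP : KEEPER
    port map (O => BUS_LINE);
end architecture rtl;
```

Do not combine this with a pull-up or pull-down on the same pin, since those resistors override the keeper.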
Figure 10-19: Keeper circuit on the pad, shown with the pull-up, pull-down, output path, and input path
Clamp Diodes
In the Spartan-3/3E FPGAs, internal clamp diodes protect all device pins against excess
voltage transients. Each I/O pin has two clamp diodes that are always connected to the
pin, regardless of the signal standard selected. One diode extends P-to-N from the pin to
VCCO, and the second diode extends N-to-P from the pin to GND. During normal
operation, these diodes are reverse biased. The VIN absolute maximum rating (see
DS099, Spartan-3 FPGA Family Data Sheet and DS312, Spartan-3E FPGA Family Data Sheet)
specifies the voltage range that each I/O pin can tolerate. When interfacing to a signal that
exceeds the VIN absolute maximum rating, use external components to limit the applied
voltage to the device I/O pin. For example, inserting a resistor between the signal and the
I/O pin will form a voltage divider with the internal clamp diodes when they become
forward biased. Other methods can also limit excess voltage transients, including
external level translators, external signal clamps, and external resistor voltage
dividers.
In the Extended Spartan-3A family FPGAs, the I/O uses a floating-well technique to
provide superior hot-swap capability. A clamp diode between the I/O pin and VCCO is
provided for PCI bus applications, but this diode is typically disabled because it would
defeat the purpose of the floating-well technique. As in Spartan-3/3E FPGAs, when
interfacing with signals that exceed the VIN absolute maximum rating, external
components must be used to limit the applied voltage to the device I/O pin. Unlike
Spartan-3/3E FPGAs, it is not possible to use a series resistor, because no clamp diodes are
present. However, all other methods listed are applicable.
See XAPP459, Eliminating I/O Coupling Effects when Interfacing Large-Swing Single-Ended
Signals to User I/O Pins on Spartan-3 Generation FPGAs for more information.
Table 10-8 provides a brief overview of the I/O standards supported by the
Spartan-3 generation FPGAs, including the sponsors and common uses for the standard.
The standard numbers are indicated where appropriate.
mini-LVDS
A serial, intra-flat panel solution that serves as an interface between the timing control
function and an LCD source driver.
Table 10-10 and Table 10-11 show the available I/O standards for the Spartan-3 generation
families.
Notes:
1. Outputs are restricted to banks 1 and 3. Inputs are unrestricted.
Notes:
1. These differential outputs are restricted to banks 0 and 2. Inputs are unrestricted.
2. These high-drive outputs are restricted to banks 1 and 3. Inputs are unrestricted.
Earlier libraries had I/O components with IOSTANDARD already specified as part of the
component, such as IBUF_LVTTL. These are not recommended for use in new designs. The
preferred method is to use the I/O component IBUF and assign IOSTANDARD = LVTTL.
IOSTANDARD can be attached to a net or signal when the net or signal is connected to a
pad. In this case, IOSTANDARD is treated as attached to the IOB primitive. When attached
to a design element, IOSTANDARD propagates to all applicable elements in the hierarchy
within the design element.
In VHDL, before using IOSTANDARD, it must be declared with the following syntax:
attribute iostandard: string;
After IOSTANDARD has been declared, specify the VHDL constraint as follows:
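As a hedged illustration, the sketch below shows one common way to apply the attribute to a top-level port, together with the preferred IBUF method on a second port. The entity name io_std_demo, the port names, and the chosen standards are hypothetical; the placement of the attribute specification in the entity declarative part is a choice of this sketch.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

library UNISIM;
use UNISIM.vcomponents.all;

-- Hypothetical entity; names and standards are illustrative only.
entity io_std_demo is
  port (
    DATA_IN : in  std_logic;
    AUX_IN  : in  std_logic;
    BOTH    : out std_logic
  );
  -- Declare the attribute, then attach it to the port (a signal).
  attribute iostandard : string;
  attribute iostandard of DATA_IN : signal is "LVCMOS25";
end entity io_std_demo;

architecture rtl of io_std_demo is
  signal aux_buf : std_logic;
begin
  -- Preferred component method: a generic IBUF with IOSTANDARD assigned,
  -- instead of a legacy component such as IBUF_LVTTL.
  U_AUX : IBUF
    generic map (IOSTANDARD => "LVTTL")
    port map (I => AUX_IN, O => aux_buf);

  BOTH <= DATA_IN and aux_buf;
end architecture rtl;
```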
Timing Analysis
The choice of IOSTANDARD affects the timing for the I/O pin. The data sheet provides
example timing for the LVCMOS25 I/O standard with Fast slew rate and 12 mA drive.
This delay requires adjustment whenever a signal standard other than LVCMOS25 is
assigned to an Input or a standard other than LVCMOS25 with 12 mA drive and Fast slew
rate is assigned to an Output. The adjustments are automatically included in the Timing
Analyzer reports generated by the Xilinx development tools.
When measuring timing parameters at the programmable I/Os, different signal standards
call for different test conditions. The data sheets list the conditions to use for each standard.
The method for measuring Input timing is as follows: a signal that swings between a Low
logic level of VL and a High logic level of VH is applied to the Input under test. Some
standards also require the application of a bias voltage to the VREF pins of a given bank to
properly set the input-switching threshold. The measurement point of the Input signal
(VM) is commonly located halfway between VL and VH.
For the Output test setup, one end of the termination resistor RT is connected to a
termination voltage VT and the other end is connected to the Output. For each standard, RT
and VT generally take on the standard values recommended for minimizing signal
reflections. If the standard does not ordinarily use terminations (for example, LVCMOS,
LVTTL), then RT is set to 1MΩ to indicate an open connection, and VT is set to zero. The
same measurement point (VM) that was used at the Input is also used at the Output.
Figure: Output test setup. The FPGA output connects through RT (RREF) to VT (VREF), with measurement point VM (VMEAS) and load CL (CREF).
The capacitive load (CL) is connected between the output and GND. The Output timing for
all standards, as published in the speed files and the data sheet, is always based on a CL
value of zero. High-impedance probes (less than 1 pF) are used for all measurements. Any
delay that the test fixture might contribute to test measurements is subtracted from those
measurements to produce the final timing numbers as published in the speed files and
data sheet.
High output current drive strength and FAST output slew rates generally result in the fastest
I/O performance. However, these same settings can also result in transmission line effects
on the PCB for all but the shortest board traces. Each IOB has independent slew rate and
drive strength controls. Use the slowest slew rate and lowest output drive current that
meet the performance requirements for the end application. Note that in the
Extended Spartan-3A family, the 16 mA drive setting is faster than the 24 mA drive setting
for the slow slew rate. If 24 mA drive and the highest performance is needed, use the fast
slew rate instead.
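The per-IOB controls can be set directly on an output buffer. The following is a minimal sketch, assuming the UNISIM OBUF component with DRIVE, SLEW, and IOSTANDARD generics; the entity name quiet_output, the port names, and the particular values are hypothetical starting points, not recommendations from this guide.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

library UNISIM;
use UNISIM.vcomponents.all;

-- Hypothetical entity; names and values are illustrative only.
entity quiet_output is
  port (
    D_INT   : in  std_logic;  -- signal from the fabric
    PAD_OUT : out std_logic   -- output pin
  );
end entity quiet_output;

architecture rtl of quiet_output is
begin
  -- Start with a slow slew rate and a modest drive current, then raise
  -- either setting only if timing analysis shows the output is too slow.
  U_OBUF : OBUF
    generic map (
      DRIVE      => 8,          -- output drive current in mA
      SLEW       => "SLOW",
      IOSTANDARD => "LVCMOS33")
    port map (
      I => D_INT,
      O => PAD_OUT);
end architecture rtl;
```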
LVCMOS25/33 and LVTTL standards have about 100 mV of hysteresis on inputs.
Figure: SSTL2 Class I termination (VTT = 1.25V, VCCO = 2.5V, resistors 50Ω and 25Ω, Z = 50, VREF = 1.25V)
Once a configuration data file is loaded into the FPGA that calls for the inputs of a given
bank to use HSTL/SSTL, a few specifically reserved I/O pins on the same bank
automatically convert to VREF inputs. All the VREF inputs on a bank need to be connected,
and all need to connect to the same voltage. As a result, HSTL and SSTL inputs can only be
combined in a bank if they use the same VREF voltage (for example, the 1.8V versions of the
SSTL and HSTL standards, where VREF = 0.9V). For banks that do not contain HSTL or
SSTL, VREF pins remain available for user I/Os or input pins.
VREF is also required for inputs using the GTL and GTLP I/O standards, which are
supported only in the Spartan-3 family. LVTTL and LVCMOS standards do not require
VREF. The only differential standards that require VREF are the differential forms of the
HSTL and SSTL I/O standards (DIFF_HSTL and DIFF_SSTL).
Sample circuits illustrating valid termination techniques for several HSTL and SSTL
standards appear in Figure 10-23 through Figure 10-29. LVTTL, LVCMOS, and PCI
standards require no termination. For GTL or DCI termination in the Spartan-3 family, see
the Spartan-3 FPGA data sheet at DS099.
Figure: HSTL Class I termination (VTT = 0.75V, VCCO = 1.5V, resistor 50Ω, Z = 50, VREF = 0.75V)
Figure: Termination circuit with VREF = 0.9V
Figure: HSTL Class IV termination (two 50Ω resistors to VTT = 1.5V, VCCO = 1.5V, Z = 50, VREF = 0.9V)
Figure: SSTL3 Class I termination (VTT = 1.5V, VCCO = 3.3V, resistors 50Ω and 25Ω, Z = 50, VREF = 1.5V)
Figure: SSTL3 Class II termination (two 50Ω resistors to VTT = 1.5V, VCCO = 3.3V, resistor 25Ω, Z = 50, VREF = 1.5V)
Figure: SSTL2 Class I termination (VTT = 1.25V, VCCO = 2.5V, resistors 50Ω and 25Ω, Z = 50, VREF = 1.25V)
Figure: SSTL2 Class II termination (two 50Ω resistors to VTT = 1.25V, VCCO = 2.5V, resistor 25Ω, Z = 50, VREF = 1.25V)
Figure: Differential termination. A differential output drives a pair of Z0 = 50Ω lines into either a Spartan-3E differential input with a 100Ω termination resistor, or a Spartan-3A differential input with the on-chip differential terminator (~100Ω).
The on-chip differential termination is powered by VCCO. Therefore, the VCCO level in a
bank must match the voltage standard for any input using differential termination. In the
Extended Spartan-3A family, on-chip differential termination is specified at 100Ω nominal
in banks with VCCO = 3.3V. The on-chip differential termination can be used in banks
powered by VCCO = 2.5V, but a wider resistance range is specified. See Module 3 of DS529,
Spartan-3A FPGA Family Data Sheet for specific values. Figure 10-31 shows the details of
using the differential termination in the Extended Spartan-3A family.
Figure 10-31: External Input Termination Resistors for Extended Spartan-3A Family FPGA LVDS, RSDS, MINI_LVDS, and PPDS I/O Standards (panel b: differential pairs using the DIFF_TERM=Yes constraint)
In the Spartan-3E family, on-chip differential termination is only supported on banks with
VCCO = 2.5V, and is specified at 120Ω nominal (see Module 3 of DS312, Spartan-3E FPGA
Family Data Sheet).
The DIFF_TERM attribute is set to TRUE to enable differential termination on a differential
I/O pin pair. This attribute uses the following syntax in the UCF:
INST <I/O_BUFFER_INSTANTIATION_NAME> DIFF_TERM = "<TRUE/FALSE>";
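The same setting can also be expressed in HDL on the differential input buffer. This sketch assumes the UNISIM IBUFDS component exposes DIFF_TERM as a boolean generic; the entity name lvds_rx and its port names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

library UNISIM;
use UNISIM.vcomponents.all;

-- Hypothetical entity; names are illustrative only.
entity lvds_rx is
  port (
    RX_P : in  std_logic;   -- positive pin of the differential pair
    RX_N : in  std_logic;   -- negative pin of the differential pair
    RX   : out std_logic    -- single-ended signal to the fabric
  );
end entity lvds_rx;

architecture rtl of lvds_rx is
begin
  -- DIFF_TERM => TRUE enables the on-chip differential termination.
  -- The bank's VCCO must match the selected standard, because the
  -- termination is powered by VCCO.
  U_RX : IBUFDS
    generic map (
      DIFF_TERM  => TRUE,
      IOSTANDARD => "LVDS_25")
    port map (
      I  => RX_P,
      IB => RX_N,
      O  => RX);
end architecture rtl;
```

An external 100Ω resistor across the pair would then be omitted on the board; placing the pair on input-only pins should be avoided where the family does not support on-chip termination there.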
TMDS_33 Termination
The Extended Spartan-3A family TMDS_33 standard requires pull-up resistors as shown
in Figure 10-32.
Figure 10-32: External Input Resistors Required for TMDS_33 I/O Standard (two 50Ω pull-up resistors to 3.3V; VCCO = 3.3V, VCCAUX = 3.3V)
Figure 10-33: External Output and Input Termination Resistors for BLVDS I/Os (resistor values shown: 140Ω, 165Ω, 100Ω; Z0 = 50Ω)
For a bidirectional BLVDS connection, the design must be simulated using the IBIS models
to verify resistor values and their effect on rise and fall times. Refer to XAPP243, Bus LVDS
with Virtex-E Devices for information on an example termination method.
In the Extended Spartan-3A family, the BLVDS outputs are allowed on any bank, and there
is no VCCO restriction on inputs, as shown in Figure 10-34.
Figure 10-34: Extended Spartan-3A Family FPGA External Output and Input Termination Resistors for BLVDS I/Os (resistor value shown: 165Ω)
ESD Protection
Protection circuitry on all Spartan-3 generation I/Os protects all device pads against
damage from electro-static discharge (ESD) as well as excessive voltage transients. ESD
protection specifications are typically ±2000V for the Human Body Model. Details are
provided in Module 3 of each family’s data sheet.
In the Extended Spartan-3A family, this protection circuitry does not limit I/O voltage
range.
In the Spartan-3E and Spartan-3 families, clamp diodes protect all device pads against
damage from both ESD and excessive voltage transients. Each I/O has two clamp
diodes: one diode extends P-to-N from the pad to VCCO, and a second diode extends
N-to-P from the pad to GND. During operation, these diodes are normally biased in the off
state. These clamp diodes are always connected to the pad, regardless of the signal
standard selected. The presence of diodes limits the ability of Spartan-3/3E FPGA I/Os to
tolerate high signal voltages. The VIN absolute maximum rating in Module 3 of each data
sheet specifies the voltage range that I/Os can tolerate. Input voltages outside the VIN max
voltage range are permissible provided that the IIK input clamp diode rating is met
and no more than 100 pins exceed the range simultaneously.
The Extended Spartan-3A family of FPGAs has clamp diodes to VCCO only after
configuring the I/Os to the PCI33 or PCI66 I/O standards.
available under one of the five possible VCCO values. In addition, inputs often do not need
to match the voltage applied to VCCO.
Further flexibility is achieved by offering multiple VCCO levels in a single device. The
VCCO power rails are provided independently for each bank of I/Os, or side of the device.
Most Spartan-3 generation devices organize IOBs into four I/O banks as shown in
Figure 10-35. Each bank maintains separate VCCO and VREF supplies. The separate
supplies allow each bank to independently set VCCO (which provides current to the
outputs and additionally powers the on-chip differential termination) and VREF (which
supplies the reference voltage for HSTL and SSTL). Refer to Table 10-17 through
Table 10-19 for VCCO and VREF requirements. Most members of the Spartan-3 family have
eight I/O banks—see DS099, Spartan-3 FPGA Family Data Sheet for more details.
Figure 10-35: I/O bank arrangement (Bank 0, Bank 1, Bank 2, and Bank 3)
The design implementation tools automatically assign pins to separate banks when
necessary to meet VCCO or VREF requirements. The user can also assign pins using any of
the floorplanning tools available. In the pinout, bank numbers are specified for each I/O
pin on the device. For example, IO_L18P_0 is a part of differential pair L18 on bank 0.
Table 10-17: Extended Spartan-3A Family FPGA Single-Ended IOSTANDARD Bank Compatibility

Single-Ended   VCCO Supply/Compatibility                                             Input Requirements
IOSTANDARD     1.2V          1.5V          1.8V          2.5V          3.3V          VREF     Board Termination Voltage (VTT)
LVTTL          Input         Input         Input         Input         Input/Output  N/R(1)   N/R
LVCMOS33       Input         Input         Input         Input         Input/Output  N/R      N/R
LVCMOS25       Input(2)      Input(2)      Input(2)      Input/Output  Input(2)      N/R      N/R
LVCMOS18       Input         Input         Input/Output  Input         Input         N/R      N/R
LVCMOS15       Input         Input/Output  Input         Input         Input         N/R      N/R
LVCMOS12       Input/Output  Input         Input         Input         Input         N/R      N/R
PCI33_3        Input         Input         Input         Input         Input/Output  N/R      N/R
PCI66_3        Input         Input         Input         Input         Input/Output  N/R      N/R
HSTL_I_18      Input         Input         Input/Output  Input         Input         0.9      0.9
HSTL_II_18     Input         Input         Input/Output  Input         Input         0.9      0.9
HSTL_III_18    Input         Input         Input/Output  Input         Input         1.1      1.8
HSTL_I         Input         Input/Output  Input         Input         Input         0.75     0.75
HSTL_III       Input         Input/Output  Input         Input         Input         0.9      1.5
SSTL18_I       Input         Input         Input/Output  Input         Input         0.9      0.9
SSTL18_II      Input         Input         Input/Output  Input         Input         0.9      0.9
SSTL2_I        Input         Input         Input         Input/Output  Input         1.25     1.25
SSTL2_II       Input         Input         Input         Input/Output  Input         1.25     1.25
SSTL3_I        Input         Input         Input         Input         Input/Output  1.5      1.5
SSTL3_II       Input         Input         Input         Input         Input/Output  1.5      1.5

Notes:
1. N/R - Not required for input operation.
2. To use LVCMOS25 inputs when VCCO is not 2.5V, VCCAUX must be set to 2.5V.
Notes:
1. N/R - Not required for input operation.
Notes:
1. Banks 4 and 5 of any Spartan-3 device in a VQ100 package do not support signal standards using VREF.
2. The VCCO level used for the GTL and GTLP standards must be no lower than the termination voltage (VTT), nor can it be lower than
the voltage at the I/O pad.
Table 10-20: Extended Spartan-3A Family FPGA Differential IOSTANDARD Bank Compatibility
Notes:
1. Banks 0 and 2 can each support any two of the following 2.5V differential standards: LVDS_25 outputs, MINI_LVDS_25 outputs,
RSDS_25 outputs, PPDS_25 outputs, or any two of the following 3.3V differential standards: LVDS_33 outputs, MINI_LVDS_33
outputs, RSDS_33 outputs, PPDS_33 outputs, TMDS_33 outputs. Other I/O bank restrictions might apply.
2. VREF is not used for the differential I/O standards.
3. Spartan-3AN FPGAs require VCCAUX = 3.3V
4. Power VCCAUX rails at 3.3V and set CONFIG VCCAUX = 3.3.
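LVDS_25           Input          Input, On-chip Differential Termination(2), Output  Input  Applies to Outputs Only
RSDS_25           Input          Input, On-chip Differential Termination(2), Output  Input  Applies to Outputs Only
MINI_LVDS_25      Input          Input, On-chip Differential Termination(2), Output  Input  Applies to Outputs Only
LVPECL_25         Input          Input                                               Input  No Differential Bank Restriction (other I/O bank restrictions might apply)
BLVDS_25          Input          Input, Output                                       Input  No Differential Bank Restriction (other I/O bank restrictions might apply)
DIFF_HSTL_I_18    Input, Output  Input                                               Input  No Differential Bank Restriction (other I/O bank restrictions might apply)
DIFF_HSTL_III_18  Input, Output  Input                                               Input  No Differential Bank Restriction (other I/O bank restrictions might apply)
DIFF_SSTL18_I     Input, Output  Input                                               Input  No Differential Bank Restriction (other I/O bank restrictions might apply)
DIFF_SSTL2_I      Input          Input, Output                                       Input  No Differential Bank Restriction (other I/O bank restrictions might apply)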
Notes:
1. Each bank can support one of the following: LVDS_25 outputs, MINI_LVDS_25 outputs, RSDS_25 outputs.
2. On-chip differential termination is not supported on input-only pins (differential pad type IP).
3. VREF is not used for the differential I/O standards.
LVDS_25            Input          Input, Output  Input  Applies to Outputs Only
RSDS_25            Input          Input, Output  Input  Applies to Outputs Only
LDT_25 (ULVDS_25)  Input          Input, Output  Input  Applies to Outputs Only
LVDSEXT_25         Input          Input, Output  Input  Applies to Outputs Only
LVPECL_25          Input          Input, Output  Input  No Differential Bank Restriction (other I/O bank restrictions might apply)
BLVDS_25           Input          Input, Output  Input  No Differential Bank Restriction (other I/O bank restrictions might apply)
DIFF_HSTL_II_18    Input, Output  Input          Input  No Differential Bank Restriction (other I/O bank restrictions might apply)
DIFF_SSTL2_II      Input          Input, Output  Input  No Differential Bank Restriction (other I/O bank restrictions might apply)
Notes:
1. Each bank can support any two of the following: LVDS_25 outputs, RSDS_25 outputs, LDT_25 (ULVDS_25) outputs, LVDSEXT_25
outputs.
2. VREF is not used for the differential I/O standards.
turning on the input protection diodes (see Module 3 of DS312, Spartan-3E FPGA Family
Data Sheet for the specifications). The Extended Spartan-3A family FPGA maximum VIN
values are independent of VCCO, except for the PCI standards —see Module 3 of
DS529, Spartan-3A FPGA Family Data Sheet for the specifications.
In some applications it might be desirable to receive signals with a greater voltage swing
than the I/Os ordinarily permit. The most common case is receiving 5V signals on pins set
to a 3.3V I/O standard. These large-swing signals might be by design or can be a result of
severe overshoot.
A similar situation might exist on the outputs, where the Spartan-3 generation FPGA
needs to drive external devices supporting standards with larger swing. The
Spartan-3 generation outputs at 3.3V can directly drive most 5V devices, although with
less margin. Similarly, the LVCMOS25 dedicated configuration outputs can directly drive
most 3.3V external devices.
For the specific case of interfacing to the PCI bus, see XAPP457, Powering and Configuring
Spartan-3 Generation FPGAs in Compliant PCI Applications.
Voltage Translators
Xilinx recommends the use of voltage level translators as the preferred solution when
interfacing with large-swing signals. A voltage level translator can be as simple as a two-
resistor voltage divider, or as complex as a PCI-to-PCI bridge.
Open-Drain Interfacing
In another approach, the outputs of certain external devices can be configured
as open-drain outputs. Such outputs with pull-up resistors tied to a low voltage rail can be
used to limit the signal swing so that the FPGA’s Power Diode does not turn on. See
Figure 10-36.
Figure 10-36: Open-drain output (ROH, ROL, IOH, VOH) driving an FPGA input
The open-drain output is comparatively slow with reduced noise margin; it is most
suitable for cases where timing is not too critical. Fortunately, Dedicated
Inputs (PROG_B, TDI, TMS, and TCK) do not usually need to switch very fast.
In this type of solution, the internal Power Diode between the FPGA’s output and its
associated power rail is allowed to turn on, and a resistor in series with the output is used
to limit the current. The current, which flows back into the regulator, is known as reverse
current. Another resistor can be put across the power supply’s output to help ensure proper
regulation.
In this solution, parasitic leakage current between user I/Os in differential pin pairs can
occur, even though the I/Os are configured with single-ended standards. This parasitic
leakage can occur when the VIN is beyond the recommended operating conditions, either
positive or negative, even if the input current is limited. This parasitic current can cause
unexpected device behavior, but does not damage the device. If you use this technique for
large-swing signals, you should either leave the other pin of the differential pair unused, or
manage the potential effects of the increased leakage. See XAPP459, “Eliminating I/O
Coupling Effects when Interfacing Large-Swing Single-Ended Signals to User I/O Pins on
Spartan-3 Generation FPGAs.”
Parasitic Leakage
Parasitic leakage current between user I/Os in differential pin pairs can occur even though
the I/Os are configured with single-ended standards. To provide flexibility, differential
I/O pairs can be used either as a differential I/O or as two single-ended I/Os using
transistors to disable the DIFF_TERM resistor when it is not needed, as shown in
Figure 10-37. This affects Spartan-3, Spartan-3E, and Spartan-3A family differential I/O
pairs.
Parasitic leakage occurs in all Spartan-3 generation families. Although the Spartan-3
family does not support DIFF_TERM resistors, parasitic leakage still exists and must be
accounted for. The Spartan-3 family general recommended operating conditions limit the
negative input voltage to –0.3V.
[Figure 10-37: differential I/O pair used as two single-ended inputs (victim IBUF and
aggressor IBUF) with the IBUFDS DIFF_TERM resistor disabled, showing the parasitic
leakage path between the two pins]
If either input is driven negative for an extended period of time, parasitic leakage
might exist. Pins are not affected in these cases:
• The input is located on an input-only pin: IP_L<XXY>_<Z>
    • XX: Differential pair numbering
    • Y: P/N
    • Z: Bank number
• The input is located on a pin that only supports single-ended I/O standards: IO_<Z>
    • Z: Bank number
• The aggressor undershoot is 0.0V to –0.2V
• The victim is an input with a pull-up resistor less than 1 kΩ with the aggressor pulse
  less than 1 ns for the following IOSTANDARDs:
    • LVCMOS33
    • LVCMOS25
    • LVCMOS18
    • LVCMOS15
• The victim is an input pin actively driven High with at least 7.0 mA
• The victim is an output pin with any of the following IOSTANDARDs:
    • LVCMOS33_[FAST, SLOW, QUIETIO]_[6, 8, 12, 16, 24]
    • LVCMOS25_[FAST, SLOW, QUIETIO]_[6, 8, 12, 16, 24]
    • LVCMOS18_[FAST, SLOW, QUIETIO]_[6, 8, 12, 16]
    • LVCMOS15_[FAST, SLOW, QUIETIO]_[4, 6, 8, 12]
    • LVCMOS12_[FAST, SLOW, QUIETIO]_[6]
    • PCI33_3, PCI66_3
    • HSTL_I, HSTL_III, HSTL_I_18, HSTL_II_18, HSTL_III_18
    • SSTL18_I, SSTL18_II, SSTL2_I, SSTL2_II, SSTL3_I, SSTL3_II
Two typical situations might cause parasitic leakage to affect a design. In the first,
more common situation, undershoot on input voltages can cause fast-falling edges to drop
below 0V for short periods of time (see Figure 10-38). Undershoot is only a problem if the
aggressor's input voltage drops below –0.2V long enough for the differential termination
resistors to be enabled.
[Figure 10-38: input undershoot on a fast-falling edge, with the VCCO, 0V, –0.2V, and
–0.5V levels marked]
To analyze the effect of the parasitic leakage associated with undershoot, testing is
performed using a weak pull-up termination resistor, as shown in Figure 10-39. A negative
pulse is injected into the aggressor. The leakage current is then measured on the victim.
Leakage current for undershoot pulses of 1 ns is shown in Table 10-23.
In the case of LVCMOS12, the victim is pulled down to 0.62V when the aggressor voltage
is –0.5V. For LVCMOS12 inputs, this is below the recommended input voltage
(VIH = 0.7V) required to ensure that a high value is correctly detected.
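The 0.62V level can be related to the leakage current through the pull-up by Ohm's law. The following is a minimal sketch only: it assumes VTT equals the nominal 1.2V LVCMOS12 supply (the text does not state the VTT value), and the function name is hypothetical.

```python
def pullup_current_ma(vtt, v_victim, r_pullup_ohms):
    """Current through the pull-up resistor (mA) when the victim pin sits at v_victim."""
    return (vtt - v_victim) / r_pullup_ohms * 1000.0

# Assumption: VTT = 1.2 V. The 1 kOhm pull-up and the 0.62 V level come from the text.
i_ma = pullup_current_ma(vtt=1.2, v_victim=0.62, r_pullup_ohms=1000.0)
print(f"implied leakage current: {i_ma:.2f} mA")
```

Under that assumption the victim pin would be sinking roughly 0.58 mA into the aggressor while the pulse is applied.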
[Figure 10-39: undershoot test setup; the victim IBUF input is pulled up to VTT, a
negative pulse is applied to the aggressor IBUF input, and DIFF_TERM is disabled]
Notes:
1. 1 kΩ pull-up resistor terminated to VTT, TUNDERSHOOT = 1 ns
In the second situation, the undershoot lasts long enough that it is important to
understand how the parasitic leakage behaves in a steady state. For example, during
hot-swap, input signals coming over a backplane might cause an input voltage to switch
below 0.0V long enough to cause parasitic leakage. Figure 10-40 shows the leakage current
for the different voltages of the victim pin. Because the leakage curves change with
different aggressor voltages, four curves are shown in the figure with the aggressor
voltage set at –0.2V, –0.3V, –0.4V, and –0.5V.
[Figure 10-40: IVICTIM (mA) versus VVICTIM (V), with one leakage curve each for
VAGGRESSOR = –0.2V, –0.3V, –0.4V, and –0.5V]
[Two further plots with the same axes and the same four VAGGRESSOR curves]
[Plot: IVICTIM (mA) versus VVICTIM (V) leakage curves for VAGGRESSOR = –0.2V, –0.3V,
–0.4V, and –0.5V, overlaid with pull-up load lines for RPULLUP = 254Ω, 691Ω, and 2955Ω
at VCCO = 3.3V; VIH = 2.0V is marked]
Figure 10-43: Resistor Sizing Using Parasitic Leakage for LVCMOS33 (85°C)
Figure 10-44 and Figure 10-45 show three resistor values plotted against the leakage curves
for LVCMOS25 and LVCMOS12, respectively.
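One way to read the resistor values in these figures is as load lines, I = (VCCO – V) / RPULLUP, evaluated at the VIH threshold. The sketch below is an illustration under that assumption, not the method stated in this guide. It uses only the RPULLUP, VCCO, and VIH values that are legible in Figures 10-43 through 10-45, and the function and table names are hypothetical.

```python
def load_line_current_ma(vcco, v_pin, r_pullup_ohms):
    """Current (mA) a pull-up to VCCO supplies when the pin voltage is v_pin."""
    return (vcco - v_pin) / r_pullup_ohms * 1000.0

# Values as printed in the figures (only two LVCMOS25 resistor values are legible).
FIGURE_VALUES = {
    "LVCMOS33": {"vcco": 3.3, "vih": 2.0, "r_ohms": [254, 691, 2955]},
    "LVCMOS25": {"vcco": 2.5, "vih": 1.7, "r_ohms": [438, 1889]},
    "LVCMOS12": {"vcco": 1.2, "vih": 0.7, "r_ohms": [107, 302, 1388]},
}

for standard, p in FIGURE_VALUES.items():
    for r in p["r_ohms"]:
        i_ma = load_line_current_ma(p["vcco"], p["vih"], r)
        print(f"{standard}: RPULLUP = {r} ohms supplies {i_ma:.2f} mA at VIH")
```

If the leakage read from the relevant VAGGRESSOR curve at VIH is below the printed current, the pull-up holds the victim above VIH; a larger resistor supplies less current and leaves less margin.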
[Plot: IVICTIM (mA) versus VVICTIM (V) leakage curves for VAGGRESSOR = –0.2V, –0.3V, and
–0.4V, overlaid with pull-up load lines including RPULLUP = 438Ω and 1889Ω at
VCCO = 2.5V; VIH = 1.7V is marked]
Figure 10-44: Resistor Sizing Using Parasitic Leakage Curve for LVCMOS25 (85°C)
[Plot: IVICTIM (mA) versus VVICTIM (V) leakage curves for VAGGRESSOR = –0.2V, –0.3V,
–0.4V, and –0.5V, overlaid with pull-up load lines for RPULLUP = 107Ω, 302Ω, and 1388Ω
at VCCO = 1.2V; VIH = 0.7V is marked]
Figure 10-45: Resistor Sizing Using Parasitic Leakage Curve for LVCMOS12 (85°C)
Similarly, it is possible to determine if an output driver is strong enough to keep the output
at the logical High level, VOH. Using the Spartan-3A FPGA IBIS models, the output drivers
can be compared against the parasitic leakage curves. As shown in Figure 10-46,
LVCMOS33 (drive = 2 or 4 mA) crosses the VAGGRESSOR = –0.5V curve close to the VOH
limit of 2.9V. Using LVCMOS33 with an output drive of 6 mA or more provides additional
margin.
[Figure 10-46: IVICTIM (mA) versus VVICTIM (V); LVCMOS33 output driver curves for 2/4,
6/8, 12, 16, and 24 mA drive plotted against the leakage curves for VAGGRESSOR = –0.2V,
–0.3V, –0.4V, and –0.5V, with VOH = 2.9V marked]
PUDC_B revert to the user settings and PUDC_B is available as a general-purpose I/O. For
more information on PULLUP and PULLDOWN, see “Pull-Up and Pull-Down Resistors,”
page 334.
For more information on the power-up and configuration processes, see UG332, Spartan-3
Generation Configuration User Guide.
Chapter 11
Introduction
Spartan-3 generation FPGAs have a number of features to fortify the chip’s arithmetic
capabilities. Carry logic and dedicated carry routing continue to be provided as in past
generations. Dedicated AND gates in the CLBs accelerate array multiplication operations.
The most significant addition is the dedicated 18x18 two’s-complement multiplier block.
With 3 to 104 of these dedicated multipliers in each device, fast arithmetic functions can be
implemented with minimal use of the general-purpose resources. In addition to the
performance advantage, dedicated multipliers require less power than CLB-based
multipliers.
The embedded multipliers offer fast, efficient means to create 18-bit signed by 18-bit
signed multiplication products. The multiplier blocks share routing resources with the
Block SelectRAM™ memory, allowing for increased efficiency for many applications.
Applications such as signed-signed, signed-unsigned, and unsigned-unsigned
multiplication; logical, arithmetic, and barrel shifters; and two’s-complement and
magnitude return are easily implemented.
High-level synthesis tools usually automatically infer the dedicated multiplier for generic
multiplication operations in VHDL or Verilog. To allow more user control or to use special
features of the multiplier, it can be instantiated in a design or defined using the
CORE Generator™ system.
[Figure: MULT18X18SIO primitive symbol with inputs A[17:0], B[17:0], BCIN[17:0], CEA,
CEB, CEP, CLK, RSTA, RSTB, and RSTP, and outputs P[35:0] and BCOUT[17:0]]
Optional registers are available on both the A and B inputs and the P output. The
registered paths share a common clock CLK and have independent active-High clock
enables and synchronous resets. The CLK, CE, and RST inputs all have programmable
polarity.
The 18-bit width of the Spartan-3 generation multiplier is unusual but matches the
18-bit width of the block RAM, which includes parity bits. Standard 8-bit or 16-bit
multipliers can be created by using part of the multiplier block, or a 32-bit multiplier
can be created via cascading. The Xilinx architecture allows any non-standard bit width to
be implemented, exactly matching the needs of the application. Unused multiplier inputs
are connected automatically to zero via connections to unused LUTs that are set to zero.
Location Constraints
MULT18X18SIO embedded multiplier instances can have LOC properties attached to them
to constrain placement. MULT18X18SIO placement locations differ from the convention
used for naming CLB locations, allowing LOC properties to transfer easily from array to
array.
[Figure: RAMB16BWE block RAM and MULT18X18SIO sharing routing; block RAM Port A output
DOA[17:0] shares connections with multiplier input A[17:0], and Port B (DIB[31:16],
DOB[17:0]) shares connections with multiplier input B[17:0]]
The Spartan-3A DSP platform offers enhanced routing and features that avoid conflicts
between block RAM and multiplier routing.
[Figure: embedded multiplier with optional registers; AREG (CEA, RSTA) on A[17:0], BREG
(CEB, RSTB) on B[17:0], and PREG (CEP, RSTP) on P[35:0], all clocked by CLK]
Timing Specification
Multiplier performance can be enhanced by limiting the number of bits or putting critical
signals on the LSBs, or by pipelining. When pipelining, the registers boost the multiplier
clock rate, beneficial for higher performance applications.
The result is generated faster for the LSBs than the MSBs, since the MSBs require more
levels of addition, so timing specifications are different for each of the 36 multiplier
outputs. Designs should use only as many output bits as are necessary. For example, if two
unsigned numbers will never have a product of $2^{35}$ or higher, the P[35] output is always
zero. For any pair of signed numbers of n bits, if you will never have
$-2^{n-1} \times -2^{n-1}$, then the MSB is always identical to the next lower-order bit
(P[2n-1] = P[2n-2]). Also consider that if some outputs must have longer routing delays,
they should be put on the output LSBs to balance with the MSB delays.
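The redundancy of the top product bit can be checked exhaustively for a small operand width. The sketch below is a software model only (the helper names are hypothetical and n = 6 is chosen just to keep the search small); it lists every operand pair for which the two most-significant product bits differ.

```python
def product_bits(a, b, n):
    """2n-bit two's-complement bit pattern of the product of two n-bit signed values."""
    return (a * b) & ((1 << (2 * n)) - 1)

def top_two_bits_equal(a, b, n):
    p = product_bits(a, b, n)
    return ((p >> (2 * n - 1)) & 1) == ((p >> (2 * n - 2)) & 1)

n = 6
lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
exceptions = [(a, b)
              for a in range(lo, hi + 1)
              for b in range(lo, hi + 1)
              if not top_two_bits_equal(a, b, n)]
print(exceptions)  # [(-32, -32)]
```

The only exception is the most-negative value multiplied by itself, which matches the condition given in the text.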
For the same reason, the data input setup time for the registered output multiplier is
shorter for the MSBs than the LSBs, but the timing parameters do not differentiate between
pins for setup time. For additional safety margin in a design, slower inputs should be put
on the MSBs.
Expanding Multipliers
Multiplication using inputs with more than 18 bits is possible by decomposing the
multiplication process into smaller subprocesses. The binary representation of either input
can be split at any point, provided the proper weighting and sign of the MSBs is taken into
account. Splitting off the 18 MSBs of the input makes the best use of the 18-bit signed
multipliers.
Cascading Multipliers
The Spartan-3E/3A/3AN FPGA MULT18X18SIO primitive has two additional ports
called BCIN and BCOUT to cascade or share the multiplier’s B input among several
multiplier blocks. The 18-bit BCIN “cascade” input port offers an alternate input source
from the more typical B input. The B_INPUT attribute specifies whether the specific
implementation uses the BCIN or B input path. Setting B_INPUT to DIRECT chooses the B
input. Setting B_INPUT to CASCADE selects the alternate BCIN input. The BREG register
then optionally holds the selected input value, if required.
BCOUT is an 18-bit output port that always reflects the value applied to the multiplier’s
second input. This value is the B input, the cascaded value from the BCIN input, or the
output of the BREG, if it is inserted.
Figure 11-4 illustrates the four possible configurations using different settings for the
B_INPUT attribute and the BREG attribute.
[Figure 11-4: the four input configurations; BREG = 1 with B_INPUT = CASCADE, BREG = 0
with B_INPUT = CASCADE, BREG = 1 with B_INPUT = DIRECT, and BREG = 0 with
B_INPUT = DIRECT. In each case the selected source (BCIN[17:0] or B[17:0]) drives the
multiplier and BCOUT[17:0], through the BREG register (CEB, CLK, RSTB) when BREG = 1]
The BCIN and BCOUT ports have associated dedicated routing that connects adjacent
multipliers within the same column. Via the cascade connection, the BCOUT port of one
multiplier block drives the BCIN port of the multiplier block directly above it. There is no
connection to the BCIN port of the bottom-most multiplier block in a column or a
connection from the BCOUT port of the top-most block in a column. As an example,
Figure 11-5 shows the multiplier cascade capability for a column of multipliers four blocks
tall. For clarity, the figure omits the register control inputs.
[Figure 11-5: column of four multiplier blocks, each with A and B inputs and a P output;
the BCOUT port of each block drives the BCIN port of the block above it]
When using the BREG register, the cascade connection forms a shift register structure
typically used in DSP algorithms such as direct-form FIR filters. When the BREG register is
omitted, the cascade structure essentially feeds the same input value to more than one
multiplier. This parallel connection serves to create wide-input multipliers and implement
transpose FIR filters. It is used in any application requiring several multipliers to have the
same input value.
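The two cascade modes can be modeled in software. The following is a behavioral sketch under stated assumptions: the function names are hypothetical, bit widths are not modeled, and the adder that sums the products is assumed to be built outside the multiplier blocks.

```python
def fir_with_breg_cascade(samples, coeffs):
    """BREG used: samples shift up the column, one block per clock (direct-form FIR)."""
    breg = [0] * len(coeffs)            # one BREG per multiplier, bottom block first
    outputs = []
    for x in samples:
        # Each BREG loads the BCOUT of the block below it; the bottom block
        # loads the new sample from its B input.
        breg = [x] + breg[:-1]
        outputs.append(sum(a * b for a, b in zip(coeffs, breg)))
    return outputs

def products_without_breg(x, coeffs):
    """BREG omitted: every block in the column sees the same B value."""
    return [a * x for a in coeffs]

print(fir_with_breg_cascade([1, 0, 0, 0, 0], [3, 5, 7]))  # [3, 5, 7, 0, 0]
print(products_without_breg(2, [3, 5, 7]))                # [6, 10, 14]
```

An impulse on the input returns the coefficients in order, which is the expected impulse response of a direct-form FIR filter.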
Examples
For example, Figure 11-6 shows how a 22x16 multiplier could be implemented. The 22-bit
value is decomposed into an 18-bit signed value and a 4-bit unsigned value from the LSBs.
Two partial products are formed. The first is a 20-bit signed product, which is the result of
multiplying the 16-bit signed value by the 4-bit unsigned section. The second is a 34-bit
signed product, formed by multiplying the 16-bit signed value by the 18-bit signed section.
The addition process restores the weighting of the products (note the least significant bits
of the first product bypass the addition) and forms the final 38-bit product. Since the first
product is signed, the 20-bit value needs to be sign-extended before addition. The adder
itself only needs to be 34 bits, requiring 17 slices.
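The decomposition can be verified with a short software model. This is a sketch of the arithmetic only; the function name is hypothetical and Python integers stand in for the hardware buses.

```python
def mul_22x16(a22, b16):
    """Multiply a 22-bit signed value by a 16-bit signed value via two partial products."""
    a_lo = a22 & 0xF        # 4-bit unsigned section (LSBs)
    a_hi = a22 >> 4         # 18-bit signed section (arithmetic shift keeps the sign)
    p_lo = b16 * a_lo       # 20-bit signed partial product
    p_hi = b16 * a_hi       # 34-bit signed partial product
    # The four LSBs of p_lo bypass the adder; the remaining bits of p_lo are
    # sign-extended and added to p_hi in the 34-bit adder.
    upper = p_hi + (p_lo >> 4)
    return (upper << 4) | (p_lo & 0xF)

for a in (-(1 << 21), -98765, -1, 0, 1, 12345, (1 << 21) - 1):
    for b in (-(1 << 15), -1, 0, 1, (1 << 15) - 1):
        assert mul_22x16(a, b) == a * b
print("22x16 decomposition matches direct multiplication")
```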
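[Figure 11-6: 22x16 multiplier; the 18 MSBs of the 22-bit input A and the 16-bit input B
feed a MULT18X18SIO that produces a 34-bit product, the 4 unsigned LSBs of A and B feed a
second multiplier that produces a 20-bit product, and a 34-bit adder combines them, with
the 4 LSBs of the second product bypassing the adder, to form the 38-bit result P]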
The implementation can vary depending on the performance needs and available
resources. The second multiplier can be implemented in the MULT18X18SIO resource or in
CLBs if it is small. Pipelining can be added to improve performance, using the built-in
capabilities of the dedicated multipliers. If both inputs are greater than 18 bits, then
four partial products are formed, but the purely unsigned result from the LSBs can simply
be concatenated with the 36-bit signed product of the MSBs and added to the other two
results.
Figure 11-7 represents the cascaded scheme used to implement a 35-bit by 35-bit signed
multiplier utilizing four embedded multipliers and two adders.
The fixed adder is 53 bits wide (17 LSBs are always 0 on one input).
The 34-bit by 34-bit unsigned submodule is constructed in a similar manner with the most
significant bit on each operand being tied to logic Low.
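The four-multiplier scheme can be checked with a software model. The sketch below assumes the split described above (the signed upper 18 bits and the lower 17 bits with a leading zero); the helper names are hypothetical, and the range check stands in for the 18-bit signed input limit of each multiplier block.

```python
def mult18x18(a, b):
    """Model of one 18x18 signed multiplier; operands must fit 18-bit two's complement."""
    assert -(1 << 17) <= a < (1 << 17) and -(1 << 17) <= b < (1 << 17)
    return a * b

def mul_35x35(a, b):
    a_hi, a_lo = a >> 17, a & 0x1FFFF   # A[34:17] signed, {0, A[16:0]} unsigned
    b_hi, b_lo = b >> 17, b & 0x1FFFF   # B[34:17] signed, {0, B[16:0]} unsigned
    hh = mult18x18(a_hi, b_hi)          # weight 2**34
    ll = mult18x18(a_lo, b_lo)          # weight 2**0, purely unsigned
    hl = mult18x18(a_hi, b_lo)          # weight 2**17
    lh = mult18x18(a_lo, b_hi)          # weight 2**17
    return (hh << 34) + ((hl + lh) << 17) + ll

edge = [-(1 << 34), -(1 << 17) - 1, -1, 0, 1, (1 << 17), (1 << 34) - 1]
for a in edge:
    for b in edge:
        assert mul_35x35(a, b) == a * b
print("35x35 decomposition matches direct multiplication")
```

Because hh and ll do not overlap in weight, they can be concatenated rather than added, which is why only two adders are needed.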
[Figure 11-7: 35x35 signed multiplier built from four MULT18X18SIO blocks. The products
A[34:17] x B[34:17] (36 bits, P[69:34]) and {0, A[16:0]} x {0, B[16:0]} (34 bits,
P[33:0]) are concatenated into a 70-bit value. The products A[34:17] x {0, B[16:0]} and
{0, A[16:0]} x B[34:17] (36 bits each) are summed and added into bits [52:17], with the
sum sign-extended from bit 53 to bit 69 and bits [16:0] set to 0]
Figure 11-8 represents the MULT18X18SIO connections for calculating the square of both a
6-bit signed number and a 5-bit unsigned number.
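[Figure 11-8: one MULT18X18SIO computing two products; inputs A and B each carry a 6-bit
signed value in bits [17:12], zeros in bits [11:5], and a 5-bit unsigned value in bits
[4:0]. Output P[35:24] is the signed product P_6S, P[23:10] is not connected, and P[9:0]
is the unsigned product P_5U]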
Design Entry
There are many options for including the Spartan-3 generation multiplier in a design. The
library primitive MULT18X18SIO described earlier can be instantiated in the schematic or
HDL code. Synthesis tools, including Xilinx XST, Synplicity Synplify, and Mentor
Precision, can infer a multiplier block from the multiply operator. They also infer the
register when the operation is clocked, producing a synchronous multiplier.
Mentor synthesis features a pipeline multiplier that involves putting levels of registers in
the logic to introduce parallelism and, as a result, use CLB resources instead of the
dedicated multipliers. A certain construct in the input RTL source code description is
required to allow the pipelined multiplier feature to take effect. See the Synthesis and
Simulation Design Guide for more information.
The following VHDL example will infer the MULT18X18SIO with the PREG output
register:
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;  -- unsigned "*" on std_logic_vector
entity mult18x18sio is
  port ( a    : in  std_logic_vector(7 downto 0);
         b    : in  std_logic_vector(7 downto 0);
         clk  : in  std_logic;
         prod : out std_logic_vector(15 downto 0));
end mult18x18sio;
architecture arch_mult18x18sio of mult18x18sio is
begin
  -- Registering the product on the rising clock edge lets the tool infer PREG.
  process(clk) is begin
    if clk'event and clk = '1' then
      prod <= a*b;
    end if;
  end process;
end arch_mult18x18sio;
The following is a Synchronous Multiplier VHDL example coded for Mentor:
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
entity mult18x18sio is
port( clk: in std_logic;
a: in std_logic_vector(7 downto 0);
b: in std_logic_vector(7 downto 0);
prod: out std_logic_vector(15 downto 0));
end mult18x18sio;
architecture arch_mult18x18sio of
mult18x18sio is
signal reg_prod : std_logic_vector(15 downto 0);
begin
process(clk)
begin
if(rising_edge(clk))then
reg_prod <= a * b;
prod <= reg_prod;
end if;
end process;
end arch_mult18x18sio;
The following is a Synchronous Multiplier Verilog example coded for Synplify and XST:
module mult18x18sio(a,b,clk,prod);
input [7:0] a;
input [7:0] b;
input clk;
output [15:0] prod;
reg [15:0] prod;
always @(posedge clk) prod <= a*b;
endmodule
The following is a Synchronous Multiplier Verilog example coded for Mentor:
module mult18x18sio (a,b,clk,prod);
  input [7:0] a;
  input [7:0] b;
  input clk;
  output [15:0] prod;
  reg [15:0] reg_prod, prod;
  // Two register stages: the product register, then the output register.
  always @(posedge clk) begin
    reg_prod <= a*b;
    prod <= reg_prod;
  end
endmodule
MULT_STYLE Constraint
The MULT_STYLE constraint controls the implementation of the MULT18X18SIO
primitives. In the Project Navigator, the default is that the Xilinx Synthesis Tool (XST) will
select the best type of implementation. To ensure that the embedded multipliers are used,
set MULT_STYLE = Block or select "Block" for the "Multiplier Style" property in the Project
Navigator. The MULT_STYLE constraint can also be applied globally at the XST command
line. For the MULT18X18SIO, the MULT_STYLE constraint is attached to the component,
not the output bus. See the Constraints Guide for more information.
[Figure: Multiplier Generator core symbol with inputs A, A_SIGNED, B, LOADB, SWAPB, ND,
CE, CLK, ACLR, and SCLR, and outputs O, Q, LOAD_DONE, RFD, and RDY]
The CORE Generator system uses the embedded multiplier for the default Parallel
multiplier type. The Multiplier Construction XCO parameter option or the c_mult_type
Generic option gives the user the choice to implement the function in look-up tables
instead.
Figure 11-10 shows the timing diagram for the Multiplier Generator.
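[Figure 11-10: Multiplier Generator timing diagram showing CLK, SCLR, ND, the A and B
inputs, RDY, and DOUT. New multiplier inputs A(n) and B(n) are accepted when ND is
asserted, the input interval depends on the multiplier latency, RDY marks each new
multiplier output, and DOUT = 0 after SCLR was 1]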
System Generator
The Multiplier Generator is used by the System Generator for DSP when the MULT block
is used. System Generator presents a high level and abstract view of the design, but also
exposes key features in the underlying silicon, making it possible to build extremely high-
performance FPGA implementations. The System Generator also provides blocks for
compiling MATLAB M-code into synthesizable HDL code. The System Generator uses the
embedded multiplier when a parallel multiplier is selected.
MAC Cores
The CORE Generator system and the System Generator can also implement more complex
functions using the multiplier as a building block. The Multiply Accumulator (MAC) core
supports up to 32-bit inputs and optional user-defined pipelining. The options of an
Embedded or LUT based implementation control whether the dedicated multipliers or
CLB resources are used for the function. The MAC implementation uses relatively few CLB
resources beyond the dedicated multipliers and provides flexibility that is key to matching
a design to the lowest density and lowest cost solution possible.
The MAC and MAC-based FIR filters include an automatic pipeline control that is based on
the required system clock performance. Pipeline levels are inserted automatically, based
on the design requirement, to balance speed against area.
Notes:
1. The control signals CLK, CEA, RSTA, CEB, RSTB, CEP, and RSTP have the option of inverted polarity.
Data Flow
Each embedded multiplier block (MULT18X18SIO primitive) supports two independent
dynamic data input ports: 18-bit signed or 17-bit unsigned. The two inputs are referred to
as the multiplicand and the multiplier, or the factors, while the output is the product.
Multipliers with inputs less than 18 bits are implemented by sign-extending the inputs
(i.e., replicating the most-significant bit). Wider multiplication operations are performed
by combining the dedicated multipliers and slice-based logic in any viable combination or
by time-sharing a single multiplier.
Unsigned multiplication is performed by restricting the inputs to the positive range. The
most-significant bit is tied Low and the unsigned value is represented in the remaining 17
less-significant bits.
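The mapping of narrower operands onto the 18 input bits can be sketched in software. The helper names below are hypothetical, and the model only illustrates the sign-extension and tie-Low rules described above.

```python
def to_18bit_pins(value, width, signed):
    """Return the 18-bit input pattern for an operand that is `width` bits wide."""
    assert width <= (18 if signed else 17), "unsigned operands need the MSB tied Low"
    mask = (1 << width) - 1
    bits = value & mask
    if signed and (bits >> (width - 1)) & 1:
        bits |= ((1 << 18) - 1) & ~mask   # replicate the MSB into the upper pins
    return bits                            # upper pins stay Low otherwise

def pins_to_int(bits18):
    """Interpret an 18-bit pattern as a two's-complement value."""
    return bits18 - (1 << 18) if bits18 & (1 << 17) else bits18

a = pins_to_int(to_18bit_pins(-3, 8, signed=True))     # 8-bit signed operand
b = pins_to_int(to_18bit_pins(200, 8, signed=False))   # 8-bit unsigned operand
print(a, b, a * b)  # -3 200 -600
```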
[Figure: device architecture floorplan showing IOBs around the perimeter, DCMs, columns
of block RAM and multipliers, and the CLB array]
Notes:
1. The XC3S700A/AN and XC3S1400A/AN have two additional DCMs on both the left and right sides as indicated by
the dashed lines. The XC3S50A/AN has only two DCMs at the top and only one Block RAM/Multiplier column.
Shifter
A multiplier can be used as a shifter. One operand is routed to the output, shifted by n
positions, if the other operand is a power of two ($2^n$). Since the sign-bit (MSB) cannot
be used to control the shift, the 18x18 two’s-complement multiplier can shift by 0 to 16
positions.
Of the 36 output lines, those less significant than the shifted data lines are automatically
filled with zeros; those more significant than the shifted data are filled with zeros or ones,
depending on the state of the MSB input. This is the natural result of the two’s-complement
multiplication.
The user can either perform a logic shift of 17 input bits by holding the MSB input Low, or
perform an arithmetic shift of an 18-bit two’s-complement number, effectively sign-
extending the MSB.
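The shift behavior follows directly from multiplying by a power of two, as the following software sketch shows. The function name is hypothetical; the 36-bit mask models the P[35:0] output pattern.

```python
MASK36 = (1 << 36) - 1

def shift_with_multiplier(value, n):
    """Shift an 18-bit two's-complement value left by n positions using P = value * 2**n."""
    assert -(1 << 17) <= value < (1 << 17), "value must fit 18-bit two's complement"
    assert 0 <= n <= 16, "2**17 would set the sign bit of the other operand"
    return (value * (1 << n)) & MASK36   # 36-bit output pattern

print(hex(shift_with_multiplier(5, 3)))    # 0x28
print(hex(shift_with_multiplier(-5, 3)))   # 0xfffffffd8
```

For the negative input, the bits above the shifted data are all ones, which is the arithmetic (sign-extended) shift described above; holding the MSB input Low gives the logical shift of a 17-bit value.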
A conventional CLB-based shifter would use an array of n multiplexers, each with n
inputs, and require a large amount of routing resources. Multiplier-based shifters larger
than 18 bits, and barrel shifters of any length, require external OR gating of the outputs, but
use far fewer CLB resources.
Magnitude Return
To generate the absolute value of a number by using multiplication, multiply by 1 if it is
positive (MSB is zero), and multiply by -1 if it is negative (MSB is one). In two’s-
complement notation, 1 is all zeros ending in a one as the LSB, and -1 is all ones, including
the LSB. Therefore, a magnitude return or absolute value generator can be implemented by
multiplying by a value with a one as the LSB and the MSB of the input value in all the other
bit positions. Figure 11-12 shows a magnitude return generator.
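A software sketch of the same construction follows. The function name is hypothetical; the second operand is built bit by bit as described, with a one in the LSB and the input's MSB copied into the other 17 positions.

```python
def magnitude_with_multiplier(value):
    """Absolute value of an 18-bit two's-complement number via one multiplication."""
    assert -(1 << 17) <= value < (1 << 17)
    msb = 1 if value < 0 else 0
    # Second operand: LSB = 1, bits [17:1] = MSB of the input.
    b_bits = 1 | (((1 << 18) - 2) if msb else 0)
    b = b_bits - (1 << 18) if b_bits & (1 << 17) else b_bits   # evaluates to +1 or -1
    return value * b

print(magnitude_with_multiplier(-7), magnitude_with_multiplier(7))  # 7 7
```

The most-negative input, $-2^{17}$, yields $2^{17}$, which does not fit in 18 signed bits but is represented correctly in the 36-bit product.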
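[Figure 11-12: magnitude return generator; the input value drives A[17:0], the MSB of the
input drives B[17:1], B[0] is tied to 1, and the magnitude is taken from the P output]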
Two’s-Complement Return
Generating the two’s complement of a number typically requires only one LUT per bit
with the carry logic used for larger numbers. However, if LUTs are heavily used, the
multiplier can be used to return the two’s complement of the input. Multiplying an input
number by an equivalent length number of all ones generates the two’s complement of the
number over the same length of the output bits. Any extraneous higher-order bits are
ignored. Figure 11-13 shows a two’s complement return generator.
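The identity behind this technique is that multiplying by $2^n - 1$ gives $a \cdot 2^n - a$, whose low n bits equal $-a$ modulo $2^n$. The sketch below checks this exhaustively for an 11-bit input; it assumes the upper bits of both operands are zero, and the function name is hypothetical.

```python
def twos_complement_with_multiplier(a, n):
    """Low n product bits when the n-bit pattern a is multiplied by n ones."""
    all_ones = (1 << n) - 1
    product = a * all_ones
    return product & all_ones          # higher-order product bits are ignored

n = 11
for a in range(1 << n):
    assert twos_complement_with_multiplier(a, n) == (-a) & ((1 << n) - 1)
print(bin(twos_complement_with_multiplier(0b00000000101, n)))  # 0b11111111011
```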
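[Figure 11-13: two’s-complement return generator for an 11-bit value; the input drives
A[10:0], B[10:0] is tied to all ones, the result is taken from P[10:0], and P[35:11] is
not connected]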
Complex Multiplication
Complex multiplication is multiplication of complex numbers, which contain real and
imaginary components with the imaginary unit i equal to the square root of -1. Complex
multiplication can be carried out using only three real multiplications: ac, bd, and
(a + b)(c + d). The real part of (a + ib)(c + id) is ac - bd, and the imaginary part is
(a + b)(c + d) - ac - bd. The large number of multipliers in the Spartan-3 generation
architecture makes it convenient to do even complex multiplication.
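The three-multiplication form can be checked against the direct four-multiplication result. The function name is hypothetical, and the sketch ignores bit widths; in hardware, the sums a + b and c + d each need one more bit than their operands, so a mapping onto 18-bit multipliers would have to leave that headroom.

```python
def complex_multiply_3(a, b, c, d):
    """(a + ib)(c + id) using three real multiplications."""
    ac = a * c
    bd = b * d
    cross = (a + b) * (c + d)
    return ac - bd, cross - ac - bd    # (real part, imaginary part)

re, im = complex_multiply_3(3, 4, 5, 6)
print(re, im)                          # -9 38
assert complex(re, im) == complex(3, 4) * complex(5, 6)
```

The saving of one multiplier is paid for with three extra additions or subtractions, which would be built in CLB carry logic.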
Floating-Point Multiplication
Floating-point values add an exponent to the number and sign bit used in binary
multiplication. A 32-bit floating-point multiplier can be implemented using four of the
dedicated multiplier blocks and CLB resources. Such multipliers are available from Xilinx
AllianceCORE partners.
Conclusion
FPGAs have a significant advantage over general-purpose DSP chips because their logic
can be customized for the specific application. Some functions can run over 100 times
faster, at much lower cost, in an FPGA. A key feature to take advantage of is the
dedicated multiplier block. Rely on the automatic optimization of multiplication logic,
and use the user controls when necessary to get the exact results desired.
The CORE Generator system can create simple multipliers or combine them into more
complex functions such as MACs.
Chapter 12
Using Interconnect
Interconnect is the programmable network of signal pathways between the inputs and
outputs of functional elements within the FPGA, such as IOBs, CLBs, DCMs, and block
RAM.
Overview
Interconnect, also called routing, is segmented for optimal connectivity. There are four
kinds of interconnect: long lines, hex lines, double lines, and direct lines. The Xilinx ISE®
Place and Route (PAR) software exploits the rich interconnect array to deliver optimal
system performance and the fastest compile times. Knowledge of the interconnect details
can help guide design techniques but is typically not necessary for efficient FPGA design.
Some types of global interconnect are controlled by the design. These include the clock
routing, selected via the use of global clock buffers, and discussed in more detail in
Chapter 2, “Using Global Clock Resources.” Two other global signals, GTS (Global Three-
State) and GSR (Global Set/Reset), are selected via the use of the STARTUP component,
which is described at the end of this chapter.
Switch Matrix
The switch matrix connects to the different kinds of interconnects across the device. An
interconnect tile, shown in Figure 12-1, is defined as a single switch matrix connected to a
functional element, such as a CLB, IOB, or DCM. If a functional element spans across
multiple switch matrices such as the block RAM or multipliers, then an interconnect tile is
defined by the number of switch matrices connected to that functional element. A device
can be represented as an array of interconnect tiles, where the interconnect resources run
in the channels between any two adjacent interconnect tile rows or columns, as shown in
Figure 12-2.
[Diagram: a CLB with one switch matrix, an IOB with one switch matrix, a DCM with one
switch matrix, and an 18Kb block RAM with 18 x 18 multiplier spanning several switch
matrices]
Figure 12-1: Four Types of Interconnect Tiles (CLBs, IOBs, DCMs, and Block
RAM/Multiplier)
[Figures: array of interconnect tiles with switch matrices; horizontal and vertical long
lines (24 per set, horizontal channel shown, tapping every sixth CLB); horizontal and
vertical hex lines (8 per set); horizontal and vertical double lines (8 per set); direct
connections between neighboring CLBs]
Long Lines
Each set of 24 long line signals spans the die both horizontally and vertically and connects
to one out of every six interconnect tiles. At any tile, four of the long lines drive or receive
signals from a switch matrix. Because of their low capacitance, these lines are well-suited
for carrying high-frequency signals with minimal loading effects (e.g. skew). If all global
clock lines are already committed and additional clock signals remain to be assigned, long
lines serve as a good alternative.
Hex Lines
Each set of eight hex lines is connected to one out of every three tiles, both
horizontally and vertically. Thirty-two hex lines are available between any given
interconnect tiles. Hex lines are only driven from one end of the route.
Double Lines
Each set of eight double lines is connected to every other tile, both horizontally and
vertically, in all four directions. Thirty-two double lines are available between any
given interconnect tiles. Double lines offer more connections and more flexibility than
long lines and hex lines.
Direct Connections
Direct connect lines route signals to neighboring tiles: vertically, horizontally, and
diagonally. These lines most often drive a signal from a "source" tile to a double, hex, or
long line and conversely from the longer interconnect back to a direct line accessing a
"destination" tile.
Global Controls
In addition to the general-purpose interconnect, Spartan-3 generation FPGAs have two
global logic control signals, as described in Table 12-1.
The Global Set/Reset (GSR) signal replaces the global reset signal included in many ASIC-
style designs. Use the GSR control instead of a separate global reset signal in the design to
free up CLB inputs, resulting in a smaller, more efficient design. However, the GSR signal
always re-initializes every flip-flop. The GSR signal is asserted automatically during the
FPGA configuration process, guaranteeing that the FPGA starts up in a known state.
STARTUP_SPARTAN3 Primitives
The GSR and GTS signal sources are defined and connected using a special primitive for
each family: STARTUP_SPARTAN3, STARTUP_SPARTAN3E, or STARTUP_SPARTAN3A
(used for Spartan-3A, Spartan-3AN, and Spartan-3A DSP FPGAs). GSR and GTS are active
during configuration, and connecting signals to them on the STARTUP primitive defines
how they are controlled after configuration. By default, they are disabled on a selected
clock cycle of the start-up phase, enabling the flip-flops and I/Os in the device. The
primitives also include one or two other signals used specifically during configuration.
Each family has a CLK input that is an alternate clock for the start-up process (see the
“Sequence of Events” chapter in UG332, Spartan-3 Generation Configuration User Guide). The
Spartan-3E family has an additional input MBT for the MultiBoot Trigger (see the
“Reconfiguration and MultiBoot” chapter in UG332, Spartan-3 Generation Configuration User
Guide).
Summary
The flexible interconnect resources of the Spartan-3 generation FPGA families allow
efficient implementation of almost any configuration of the logic and I/O resources. The
Xilinx ISE software automatically places and routes designs to take best advantage of these
resources. Customers can control the usage of the global clock signals by the use of global
clock buffers. The global set/reset and global three-state signals are controlled by the use
of the STARTUP component.
Chapter 13
Introduction
Combined with the Spartan-3 generation FPGA family, the ISE optimized design tools
help you finish faster and lower your project costs. The ISE package is a collection of Xilinx
software design tools that concentrate on delivering the most productivity available for
your Spartan-3 generation logic performance. With ProActive Timing Closure technology,
you get the fastest runtimes in programmable logic ensuring you reach your performance
goals quicker. Incremental Design delivers faster re-compile times with guaranteed
performance, and the optional Xilinx ChipScope™ Pro verification tools provide real-time
debug with advantages that are not possible in ASIC designs. The ISE development system
makes sure you get through the logic design process faster, saving both time and project
costs, and getting you to market ahead of your competition.
Design Flow
The standard design flow for Spartan-3 generation FPGAs consists of the following three
major steps. The entire design implementation flow is run simply by selecting the desired
result in the Xilinx Graphical User Interface (GUI). The tools automatically determine
which programs and files are needed to bring the appropriate output up to date.
1. Design Entry and Synthesis
In this step of the design flow, you create your design using a Xilinx-supported
schematic editor, a Hardware Description Language (HDL) for text-based entry, or
both. If you use an HDL for text-based entry, you must synthesize the HDL file into an
industry-standard Electronic Data Interchange Format (EDIF) file. If you use the Xilinx
Synthesis Technology (XST) tool, a Xilinx-specific NGC netlist file is created, which can
be converted to an EDIF file.
2. Design Implementation
By implementing the design in a specific Spartan-3 generation device, you convert the
logical design file format, such as EDIF, that you created in the design entry or
synthesis stage into a physical file format. The physical information is contained in the
Native Circuit Description (NCD) file. Then you create a bitstream file from these files
and optionally program a PROM for subsequent programming of your
Spartan-3 generation device.
3. Design Verification
Using a gate-level simulator, you ensure that your design meets your timing
requirements and functions properly. In-circuit verification can be performed by
downloading your design to the device using Xilinx iMPACT Programming Software.
Design verification can begin immediately after design entry and can be repeated after
various steps of design implementation.
Figure 13-1 shows the general overall design flow for Spartan-3 generation FPGAs.
Figure 13-1 (image not reproduced): Design flow. Blocks shown: Design Entry, Design
Synthesis, Optimization, FPGAs (Mapping, Placement, Routing), Bitstream Generation, and
Download to a Xilinx Device, with Design Verification alongside (Functional Simulation,
Back Annotation, Timing Simulation, In-Circuit Verification).
Figure 13-2 (image not reproduced): Blocks shown: Schematic Libraries, Synthesis
Libraries, HDL, and NGDBuild.
Hierarchical Design
Design hierarchy is important in both schematic and HDL entry for the following reasons:
• Helps you conceptualize your design
• Adds structure to your design
• Promotes easier design debugging
• Makes it easier to combine different design entry methods (schematic, HDL, or state
editor) for different parts of your design
• Makes it easier to design incrementally, which consists of designing, implementing,
and verifying individual parts of a design in stages
• Reduces optimization time
• Facilitates concurrent design, which is the process of dividing a design among a
number of people who develop different parts of the design in parallel, such as in
Modular Design
Xilinx strongly recommends that you name the components and nets in your design. These
names are preserved and used by the Xilinx tools. These names are also used for back-
annotation and appear in the debug and analysis tools. If you do not name your
components and nets, the tools automatically generate the names, making it difficult to
analyze circuits.
Schematic Entry
Schematic tools provide a graphical interface for design entry. You can use these tools to
connect symbols representing the logic components in your design. You can build your
design with individual gates, or you can combine gates to create functional blocks.
Primitives and macros are the “building blocks” of a device library. The Xilinx
Spartan-3 generation libraries provide primitives as well as common high-level macro
functions, all optimized for the Spartan-3 generation architecture. Primitives are basic
circuit elements, such as AND and OR gates, and special device resources, such as the
DCM and block RAM. Each primitive has a unique library name, symbol, and description.
Macros contain multiple library elements, which can include primitives and other macros.
Soft macros have pre-defined functionalities, but have flexible mapping, placement, and
routing. Relationally Placed Macros (RPMs) have fixed mapping and relative placement.
Macros are not available for synthesis because synthesis tools have their own module
generators and do not require RPMs. If you wish to override the module generation, you
can instantiate Xilinx-provided CORE Generator™ modules, which include pre-built
optimization for the Spartan-3 generation architecture. For most leading-edge synthesis
tools, this is not needed unless it is for a module that cannot be inferred.
Constraints
You might want to constrain your design within certain timing or placement parameters, for
example to fix required pin locations or to state timing requirements. You can specify logic mapping,
block placement, and timing specifications. Constraints can be entered as parameters or
attributes on library components. You can enter constraints by hand or use one of several
graphical tools for generating constraint files and evaluating the results. Constraints found
in the design are written to an NCF file (Netlist Constraints File). Constraints created
separately are written to a UCF file (User Constraints File).
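As a rough illustration of what a hand-written UCF might contain, the Python sketch below generates pin-location and clock-period constraint lines. The net names, the package pins (P55, P12), and the 20 ns period are hypothetical. The constraint forms used (LOC, TNM_NET, and TIMESPEC with PERIOD) follow common UCF usage and should be checked against the Constraints Guide for your ISE release.

```python
def ucf_lines(pin_sites, clock_net, period_ns):
    """Build UCF text lines: one LOC per net, plus a PERIOD spec for the clock."""
    lines = [f'NET "{net}" LOC = "{site}";' for net, site in sorted(pin_sites.items())]
    # Group everything clocked by clock_net, then constrain that group's period.
    lines.append(f'NET "{clock_net}" TNM_NET = "{clock_net}";')
    lines.append(
        f'TIMESPEC "TS_{clock_net}" = PERIOD "{clock_net}" {period_ns} ns HIGH 50%;'
    )
    return lines


# Hypothetical nets and package pins, for illustration only.
example_ucf = ucf_lines({"clk": "P55", "led": "P12"}, clock_net="clk", period_ns=20)
print("\n".join(example_ucf))
```

Generating the file from a table of pins is only one option; the same lines can be typed directly into a UCF or produced by the graphical constraint tools.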
Design Implementation
Design Implementation begins with the translating and then mapping of a logical design
file to a specific Spartan-3 generation device. It is complete when the physical design is
successfully routed and a bitstream is generated. You can alter constraints during
implementation in the same way as during the Design Entry step.
Figure 13-3 shows an overall view of the design implementation flow for
Spartan-3 generation FPGAs.
Figure 13-3 (image not reproduced): Design implementation flow. Blocks shown: UCF,
NGDBuild, Floorplanner, MAP, TRACE and Timing Analyzer, PAR, NCD, BitGen, BIT, and iMPACT.
Translating
NGDBuild performs all the steps necessary to read a netlist file in EDIF or NGC format and
create an NGD file describing the logical design. A logical design is in terms of logic
elements, such as AND gates, OR gates, decoders, flip-flops, and RAMs. The NGD file
resulting from an NGDBuild run contains both a logical description of the design reduced
to Xilinx primitives and a description in terms of the original hierarchy expressed in the
input netlist. The output NGD file then can be mapped to the Spartan-3 generation
resources.
NGDBuild performs the following steps to convert a netlist to an NGD file:
1. Reads the source netlist(s). NGDBuild invokes the Netlist Launcher. The Netlist
Launcher determines the type of the input netlist and starts the appropriate netlist
reader program. The netlist readers incorporate NCF files associated with each netlist.
NCF files contain timing and layout constraints for each module.
Mapping
The MAP program maps a logical design to a Spartan-3 generation FPGA. The input to
MAP is an NGD file, which contains a logical description of the design in terms of both the
hierarchical components used to develop the design and the lower-level Xilinx primitives.
Additionally, it contains any number of hard placed-and-routed physical macro files. MAP
then maps the logic to the components (logic cells, I/O cells, and other components) in the
Spartan-3 generation architecture. The output design is a Native Circuit Description
(NCD) file, which is a physical representation of the design mapped to the components in
the Spartan-3 generation architecture. The NCD file then can be placed and routed.
MAP performs the following steps when mapping a design:
1. Selects the target Xilinx device, package, and speed.
2. Reads the information in the input design file.
3. Performs a Logical DRC (Design Rule Check) on the input design. If any DRC errors
are detected, the MAP run is aborted. If any DRC warnings are detected, the warnings
are reported, but MAP continues to run.
4. Removes unused logic, where all unused components and nets are removed.
5. Maps pads and their associated logic into IOBs.
6. Maps the logic into Xilinx components (IOBs, CLBs, etc.). If any Xilinx mapping
control symbols appear in the design hierarchy of the input file, MAP uses the existing
mapping of these components in preference to re-mapping them. The mapping is
influenced by various constraints.
7. Updates the information received from the input NGD file and writes this updated
information into an NGM file. This NGM file contains both logical information about
the design and physical information about how the design was mapped. The NGM file
is used only for back-annotation.
8. Creates a physical constraints (PCF) file. This text file contains any constraints
specified during design entry. If no constraints were specified during design entry, an
empty file is created so that you can enter constraints directly into the file using a text
editor.
9. Runs a physical Design Rule Check (DRC) on the mapped design. If DRC errors are
found, MAP does not write an NCD file.
10. Creates an NCD file, which represents the physical design. The NCD file describes the
design in terms of Xilinx components (CLBs, IOBs, and so forth).
11. Writes a MAP report (MRP) file, which lists any errors or warnings found in the
design, details how the design was mapped, and supplies statistics about component
usage in the mapped design.
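The file types named in this chapter can be kept straight with a small lookup table. The sketch below is only a summary aid, assuming the read and write relationships described in the text and figures of this chapter; the dictionary and helper names are made up for illustration and are not part of any Xilinx tool.

```python
# Tool -> file types it reads and writes, as described in this chapter.
FLOW_FILES = {
    "NGDBuild": {"reads": ["EDIF", "NGC", "NCF", "UCF"], "writes": ["NGD"]},
    "MAP": {"reads": ["NGD"], "writes": ["NGM", "PCF", "NCD", "MRP"]},
    "PAR": {"reads": ["NCD", "PCF"], "writes": ["NCD"]},
    "BitGen": {"reads": ["NCD"], "writes": ["BIT"]},
}


def writers_of(file_type):
    """Return the tools that produce the given file type, in flow order."""
    return [tool for tool, files in FLOW_FILES.items() if file_type in files["writes"]]


def readers_of(file_type):
    """Return the tools that consume the given file type, in flow order."""
    return [tool for tool, files in FLOW_FILES.items() if file_type in files["reads"]]


print(writers_of("NCD"))  # MAP writes the mapped NCD, PAR the placed and routed NCD
print(readers_of("PCF"))
```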
Placing
The PAR placer runs in multiple phases. PAR writes the NCD file after all the
phases are completed. During placement, PAR places components into sites based on
factors such as constraints specified in the PCF file, the length of connections, and the
available routing resources. Timing-driven placement is automatically invoked if PAR
finds timing constraints in the physical constraints file.
Routing
The next stage is routing the placed design. PAR writes the NCD file when the design is
fully routed. At this point the design can be analyzed against timing. A new NCD is
written as the routing improves. The router performs a procedure to converge on a
solution that routes the design to completion and meets timing constraints. Timing-driven
routing is automatically invoked if PAR finds timing constraints in the physical constraints
file.
Floorplanning
Floorplanning is the process of specifying user placement constraints. The PlanAhead tool
provides a graphical view of placement, while the FPGA Editor provides a graphical view
of both placement and routing. Both tools can be used before or after PAR to analyze or
constrain the design.
Bitstream Generation
After the design has been completely routed, it is necessary to configure the device so that
it can execute the desired function. This configuration is done using files generated by
BitGen, the Xilinx bitstream generation program. BitGen takes a fully routed NCD file as
its input and produces a configuration bitstream (binary BIT file).
The BIT file contains all of the configuration information from the NCD file defining the
internal logic and interconnections of the Spartan-3 generation FPGA, plus device-specific
information from other files associated with the target device. The binary data in the BIT
file then can be downloaded into the FPGA memory cells or it can be used to create a
PROM file.
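The GUI normally runs the translate, map, place-and-route, and bitstream steps for you, but the same stages can be scripted. The Python sketch below builds, without executing, one plausible command sequence. The design name `top`, the part string, and the file names are assumptions for illustration. The options shown (`-p`, `-uc`, `-o`, `-w`) are the commonly used ones and should be confirmed against the command-line documentation for your ISE release.

```python
def implementation_commands(design, part, ucf=None):
    """Return the argument lists for NGDBuild, MAP, PAR, and BitGen, in order."""
    ngdbuild = ["ngdbuild", "-p", part]
    if ucf:
        ngdbuild += ["-uc", ucf]  # user constraints file, if one exists
    ngdbuild += [f"{design}.ngc", f"{design}.ngd"]
    return [
        ngdbuild,
        # MAP: logical NGD in, mapped NCD and physical constraints (PCF) out
        ["map", "-p", part, "-o", f"{design}_map.ncd", f"{design}.ngd", f"{design}.pcf"],
        # PAR: mapped NCD in, placed and routed NCD out (-w overwrites old results)
        ["par", "-w", f"{design}_map.ncd", f"{design}.ncd", f"{design}.pcf"],
        # BitGen: routed NCD in, BIT file out
        ["bitgen", "-w", f"{design}.ncd", f"{design}.bit", f"{design}.pcf"],
    ]


# Hypothetical design and part, for illustration only.
commands = implementation_commands("top", "xc3s500e-fg320-4", ucf="top.ucf")
for command in commands:
    print(" ".join(command))
```

Each list could be passed to `subprocess.run` on a machine with the ISE tools installed; the sketch stops at printing so that it runs anywhere.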
Design Verification
Design verification is the process of testing the functionality and performance of your
design. You can verify Xilinx designs in the following ways:
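• Simulation (functional and timing)
• Static timing analysis
• In-circuit verification
Figure 13-4 (image not reproduced): Verification paths. Blocks shown: Design Entry, Input
Stimulus, Integrated Tool, Functional Simulator, Translate to Simulator Format, Simulation
Netlist, NGD, Back-Annotation, NGA, Timing Simulation Path, Static Timing, BitGen, In-Circuit
Verification, and Xilinx FPGA.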
Simulation
Design simulation involves testing your design using software models. It is most effective
when testing the functionality of your design and its performance under worst-case
conditions. You can easily probe internal nodes to check your circuit’s behavior, and then
use these results to make changes in your design. Simulation is performed using Xilinx or
third-party tools that are linked to the Xilinx Development System. The software models
provided for your simulation tools are designed to perform detailed characterization of
your design. You can perform functional or timing simulation.
Functional Simulation
Functional simulation determines if the logic in your design is correct before you
implement it in a device. Functional simulation can take place at the earliest stages of the
design flow. Because timing information for the implemented design is not available at this
stage, the simulator tests the logic in the design using unit delays.
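To make the idea of unit delays concrete, the Python sketch below evaluates a tiny combinational netlist and charges every gate exactly one time unit, so a net settles at a time equal to its logic depth. The netlist, net names, and function names are hypothetical, and the sketch assumes no feedback loops. It illustrates the concept only and is not how any particular simulator is implemented.

```python
OPS = {
    "AND": lambda bits: all(bits),
    "OR": lambda bits: any(bits),
    "NOT": lambda bits: not bits[0],
}


def simulate_unit_delay(gates, inputs):
    """Return {net: (value, settle_time)} with every gate costing one time unit."""
    state = {net: (bool(value), 0) for net, value in inputs.items()}

    def resolve(net):
        if net not in state:
            op, fanin = gates[net]
            drivers = [resolve(name) for name in fanin]
            value = OPS[op]([v for v, _ in drivers])
            # Output settles one unit after its slowest input.
            state[net] = (value, 1 + max(t for _, t in drivers))
        return state[net]

    for net in gates:
        resolve(net)
    return state


# Hypothetical netlist: y = NOT((a AND b) OR c)
GATES = {
    "n1": ("AND", ["a", "b"]),
    "n2": ("OR", ["n1", "c"]),
    "y": ("NOT", ["n2"]),
}
result = simulate_unit_delay(GATES, {"a": 1, "b": 1, "c": 0})
print(result["y"])
```

The settle times here say nothing about real device speed; they only order events, which is all functional simulation needs.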
Timing Simulation
Timing simulation verifies that your design runs at the desired speed for your device
under worst-case conditions. This process is performed after your design is mapped,
placed, and routed. At this time, all design delays are known. Timing simulation is
valuable because it can verify timing relationships and determine the critical paths for the
design under worst-case conditions. It also can determine whether or not the design
contains setup or hold violations. Before you can simulate your design, you must go
through the back-annotation process, as described below. During this process, the Xilinx
netlist writers create suitable formats for various simulators.
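The setup and hold checks mentioned above can be pictured as a window around each capturing clock edge in which the data input must not change. The Python sketch below classifies a single data transition against that window. The function name and the example times are hypothetical, and real simulation models take their setup and hold values from the back-annotated delay data rather than from hand-entered numbers.

```python
def check_capture(data_change_ps, clock_edge_ps, setup_ps, hold_ps):
    """Classify one data transition relative to one capturing clock edge.

    Data must be stable from (edge - setup) until (edge + hold).
    All times are integers in picoseconds.
    """
    if clock_edge_ps - setup_ps < data_change_ps <= clock_edge_ps:
        return "setup violation"
    if clock_edge_ps < data_change_ps < clock_edge_ps + hold_ps:
        return "hold violation"
    return "ok"


# Hypothetical flip-flop: 1000 ps setup, 500 ps hold, clock edge at 10000 ps.
for change in (8000, 9500, 10200):
    print(change, check_capture(change, 10000, setup_ps=1000, hold_ps=500))
```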
Note that naming the nets during your design entry is important for both functional and
timing simulation because it allows you to find the nets in the simulations more easily than
looking for a software-generated name.
Back-Annotation
Before timing simulation can occur, the physical design information must be translated
and distributed back to the logical design. This back-annotation process is done with a
program called NGDAnno. This program creates a database for the netlist writers, which
translate the back-annotated information into a netlist format that can be used for timing
simulation.
NGDAnno is a command line program that distributes information about delays, setup
and hold times, clock to out, and pulse widths found in the physical NCD design file back
to the logical NGD file. NGDAnno reads an NCD file as input. The NCD file can be a
mapped-only design, or a partial or fully placed and routed design. An NGM file, created
by MAP, is an optional source of input. NGDAnno merges mapping information from the
NGM file with placement, routing, and timing information from the NCD file. NGDAnno
outputs a Native Generic Annotated (NGA) file, which is a back-annotated NGD file. This
file is input to the appropriate netlist writer, which converts the binary Xilinx database
format back to an ASCII netlist.
Netlist Writers (NGD2EDIF, NGD2VER, or NGD2VHDL) take the output of NGDAnno
and create a simulation netlist in the specified format. An NGD or NGA file is input to each
of the netlist writers. The NGD file is a logical design file containing primitive components,
while the NGA file is a back-annotated logical design file.
You can run static timing analysis using the Timing Reporter And Circuit Evaluator
(TRACE) program, which is accessible through the Timing Analyzer GUI. Use either tool
to evaluate how well the place and route tools met the input timing constraints.
In-Circuit Verification
As a final test, you can verify how your design performs in the target application. In-circuit
verification tests the circuit under typical operating conditions. Because you can program
your Xilinx devices repeatedly, you can easily load different iterations of your design into
your device and test it in-circuit. To verify your design in-circuit, download your design
bitstream into a device using the iMPACT programming software with the Parallel Cable
IV or Platform Cable USB.
Design Entry
• HDL Editor
• Schematic Editor - Engineering Capture System (ECS)
• CORE Generator system
Synthesis
• XST - Xilinx Synthesis Technology
• Integration with Precision synthesis from Mentor Graphics
• Integration with Synplify/Pro and Amplify synthesis from Synplicity
Simulation
• ISE Simulator
• Integration with ModelSim Simulator from Model Technology
Implementation
• Translate
• Map
• Place and Route (PAR)
• Floorplanner
• FPGA Editor
• Timing Analyzer
• XPower Power Analysis
Device Download
• BitGen Bitstream Generator
• iMPACT Configuration Tool
• ChipScope Pro Logic Analyzer
ISE Versions
The ISE development systems are available in the following configurations.
• ISE WebPACK™ Tool
The ISE WebPACK tool is the easiest development system to obtain. This free tool can be
downloaded from the Xilinx website at http://www.xilinx.com/webpack.
ISE WebPACK software combines support for advanced HDL entry, synthesis, and
verification capabilities for all Xilinx CPLDs and lower-density FPGAs. All
Spartan-3 generation FPGA families are supported. The original Spartan-3 family is
supported up to the XC3S1500 density, while all densities are supported for the
Spartan-3E and Spartan-3A/3AN families.
• ISE Foundation™ Tool
The ISE Foundation tool is a complete, ready-to-use design environment that
integrates schematic, synthesis, and verification technologies into an intuitive, yet
highly advanced design solution. The tool has full device support as well as the full
suite of tools. See more information at
http://www.xilinx.com/ise/logic_design_prod/foundation.htm.
To see a table comparison of these versions, see the Development Systems Overview at
http://www.xilinx.com/ise/devsys_feature_guide.pdf.
Development system updates are provided on a regular basis. These are available as
Service Packs that can be downloaded from the Xilinx website
(http://www.xilinx.com/support/download/index.htm). Always use the latest
development system update for the best results.
Project Navigator
Project Navigator is the primary user interface for the Xilinx ISE tools. You can create,
define, and compile your Spartan-3 generation design using a suite of tools accessible from
Project Navigator. Each step of the design process, from design entry to downloading the
design to the device, is managed from Project Navigator as part of a project. These steps include:
• Design Entry
• Constraint Entry
• Synthesis
• Simulation
• Implementation
• Device Programming
Project
The ISE development system organizes and tracks your design as a project. A project is a
collection of all files necessary to create and download your design to the selected device.
The following information is required for each project:
• A unique project name
• A specified target device family (architecture)
• A specified target device
• A specified design flow
Each project has a directory, device family, device, and design flow associated with it as
project properties. The project properties enable Project Navigator to display and run only
those processes appropriate for the targeted device and design flow.
Sources
A source is any element that contains information about a design. In Project Navigator, you
can create and add sources to your project. Each project can contain many sources, each
one representing a different part of the overall design. Sources can include the description
of circuits (as represented by schematics and hardware description language files), state
diagrams, simulation models, test files, and documentation of the design.
Source Hierarchy
One source file in a project is the top-level source for the design. The top-level source
defines the inputs and outputs to be mapped into the device, and references the logic
descriptions contained in lower-level sources in a hierarchical design. A project must
contain at least one source as the top-level source. All source files and their accompanying
icons are displayed in the Sources in Project window below the project file.
The term instantiation describes when one source references another. Lower-level sources
also can instantiate sources to build as many levels of logic hierarchy as necessary to
describe your design.
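The relationship between instantiation and the top-level source can be sketched in a few lines of Python: the top-level source is the one that no other source instantiates. The file names and the dictionary layout below are hypothetical, and Project Navigator determines the hierarchy by parsing the sources themselves rather than from a hand-built table like this one.

```python
def top_level_sources(instantiates):
    """Return the sources that are not instantiated by any other source."""
    instantiated = {child for children in instantiates.values() for child in children}
    return sorted(name for name in instantiates if name not in instantiated)


# Hypothetical project: top.v instantiates uart.v and fifo.v; uart.v also uses fifo.v.
project = {
    "top.v": ["uart.v", "fifo.v"],
    "uart.v": ["fifo.v"],
    "fifo.v": [],
}
print(top_level_sources(project))
```

A well-formed project yields exactly one name; more than one result would indicate sources that are not yet connected into the hierarchy.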
Valid top-level source types include the following:
• Schematics
• HDL files (VHDL or Verilog)
• EDIF
For more information on the Project Navigator, see
http://www.xilinx.com/products/design_tools/logic_design/design_entry/projnav.htm.
ISE Tools
The ISE development system includes a number of individual tools and capabilities that
can be accessed standalone or within the Project Navigator.
HDL Editor
The HDL Editor is a text editor designed especially for editing HDL source files. In
addition to regular editing features, the editor provides syntax coloring. The syntax-
coloring feature supports both VHDL and Verilog. The HDL Editor operates as a standard
text editor as well. The ISE HDL Editor provides optimized, ready-to-use language and
synthesis templates for easy insertion into an HDL source file.
HDL Advisor
The HDL Advisor gives advisory messages in the XST synthesis report files. The messages
are designed to make suggestions on how code can be changed to reduce design size and
meet timing requirements. These HDL advisors allow designers to produce better code
earlier, reducing design time, and resulting in better space utilization in the
Spartan-3 generation FPGA.
Partner Tools
The Xilinx tools provide easy integration with third-party tools, including Precision
synthesis from Mentor Graphics and Synplify/Pro and Amplify synthesis from Synplicity.
These tools can be purchased separately from the vendor.
ModelSim simulators from Model Technology can provide the simulation functions for an
ISE development system. ModelSim Xilinx Edition III (MXE-III) is available as an option
from Xilinx. It offers a complete PC HDL simulation environment that enables you to
verify the HDL source code as well as the functional and timing models of your designs.
Clocking Wizard
To reduce the complexities of new device technologies like Digital Clock Managers (DCM),
ISE tools include Architecture Wizards, allowing users access through an intuitive easy-to-
use dialog. Through the use of the ISE Architecture Wizards, designers can access these
leading edge technologies quickly by creating the component through a push-button flow
rather than learning all the attributes in HDL. Then the component simply can be
instantiated in the user’s design by copying the instantiation template created by the
Architecture Wizard. The Clocking Wizard supports all the capabilities of the
Spartan-3 generation DCMs.
Data2MEM Tool
Data2MEM is fundamentally a data translation tool. It translates contiguous fragments of
data into the proper initialization records for Block RAMs. It automates distribution of that
data across multiple physical Block RAMs that constitute a contiguous logical data space.
Data2MEM is also a simplified means for initializing block RAMs.
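As a conceptual sketch only, the Python below shows what distributing a contiguous data space across several physical block RAMs can look like. It assumes a 32-bit-wide logical memory built from four byte-wide block RAMs, with each RAM holding one byte lane of every word. That arrangement, the function name, and the sample data are assumptions for illustration; this is not Data2MEM's actual algorithm, input format, or output format.

```python
def split_across_brams(words, num_brams=4, bits_per_bram=8):
    """Split each logical word into per-RAM slices, least significant slice first."""
    mask = (1 << bits_per_bram) - 1
    lanes = [[] for _ in range(num_brams)]
    for word in words:
        for lane in range(num_brams):
            # Lane 0 receives bits [7:0], lane 1 bits [15:8], and so on.
            lanes[lane].append((word >> (lane * bits_per_bram)) & mask)
    return lanes


# Hypothetical two-word image of a 32-bit-wide memory.
lanes = split_across_brams([0x11223344, 0xAABBCCDD])
for index, contents in enumerate(lanes):
    print(index, [hex(value) for value in contents])
```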
Incremental Design
Incremental Design gets your overall design to market faster by minimizing the impact
from late-arriving design changes. The Incremental Design flow facilitates more debug
cycles in a day when making small design changes. A designer quickly and easily can
floorplan design areas along hierarchy boundaries, and then finish the design as normal.
Later, if a design change is required, Incremental Design ensures that only the area of the
design change need be re-implemented; the rest of the design stays locked and intact,
delivering overall design completion faster.
For more information on Incremental Design, see
http://www.xilinx.com/products/design_tools/logic_design/advanced/incrementaldesign.htm.
Modular Design
Modular Design lets you implement a “divide and conquer” approach to multi-million
gate FPGA designs. Partitioning a design into smaller functional modules reduces the
complexities of design, implementation, and verification. These design modules then can
be brought through the design flow independently, leveraging all of the powerful tools
within the Xilinx FPGA design flow. Once completed, a module's implementation is
preserved, guaranteeing the timing in the finished device. This technology is a
requirement for any organization employing a team design methodology for the design of
a multi-million gate FPGA.
Constraints Editor
Constraints are user instructions placed on elements of a schematic or HDL design, either
in the design itself or in a separate file. They can indicate a number of things such as
placement, implementation, naming, signal direction, and timing considerations. In the
Xilinx development system, logical constraints are placed in a file called the UCF (User
Constraints File). The Constraints Editor is a graphical program that can be used to create
and modify those constraints.
PlanAhead Tool
The optional PlanAhead™ tool provides an intuitive environment that delivers a faster,
more efficient design solution, allowing designers to find and fix problems early and
helping to achieve performance goals. The PlanAhead tool provides hierarchical, block-
based, modular and incremental design methodologies, enabling designers to change only
part of the design, leaving placement of the rest intact, thus shortening design iterations. It
helps designers consistently maintain the required performance, even while making
frequent changes.
For more information on the PlanAhead tool, see http://www.xilinx.com/planahead.
FPGA Editor
The FPGA Editor is a graphical application for displaying and configuring FPGAs. The
FPGA Editor requires an NCD file. This file contains the logic of your design mapped to
components such as CLBs and IOBs. In addition, the FPGA Editor reads from and writes to
a Physical Constraints File (PCF).
The following is a list of a few of the functions you can perform on your designs in the
FPGA Editor:
• Place and route critical components before running automatic place and route
• Fine-tune placement and routing after running automatic place and route
• Add probes to design to examine the signal states of the targeted device
• Run the Bitstream Generator and download the resulting file to the targeted device
• Create an entire design by hand (for advanced users)
For more information on the FPGA Editor PROBE tool, see
http://www.xilinx.com/products/design_tools/logic_design/verification/fpgaeditorprobe.htm.
ISE Simulator
ISE Simulator provides a complete, full-featured HDL simulator integrated within the ISE
development system. ISE Simulator comes in two versions:
• Free ISE Simulator Lite, included with all ISE configurations, is ideal for low-density
FPGA designs and is limited to 15,000 lines of HDL source code.
• ISE Simulator full version supports any design density and is a low-cost optional
add-on to ISE Foundation.
For more information on the ISE Simulator, see
http://www.xilinx.com/products/design_tools/logic_design/verification/ise_simulator.htm.
Conclusion
The ISE design environment brings you the fastest, most complete family of design tools
available. The ISE tools are available in multiple configurations with various optional tools
and interfaces to third-party tools, allowing you to customize the set of tools for your own
needs. The ISE development system combines advanced technologies such as ProActive
Timing Closure with a flexible, easy-to-use graphical interface to help you achieve the best
possible designs with the least time and effort, regardless of your experience level.
Chapter 14
Using IP Cores
Summary
This chapter provides an overview of the Xilinx CORE Generator™ System and the Xilinx
Intellectual Property (IP) offerings that facilitate the Spartan®-3 generation design process.
For more detailed and complete information, consult the CORE Generator System on-line
help available at
http://toolbox.xilinx.com/docsan/xilinx92/help/iseguide/mergedProjects/coregen/coregen.htm,
and the Xilinx IP Center available at http://www.xilinx.com/ipcenter/index.htm.
This chapter applies to all Spartan-3 generation FPGA families.
A complete catalog of Xilinx cores and IP tools resides on the IP Center, including:
• LogiCORE Products
• AllianceCORE Products
• Candidate Core Products
• Design Files
• Alliance Program Partner Services
LogiCORE Products
LogiCORE products are designed, sold, licensed, and supported by Xilinx. LogiCORE
products include a wide selection of generic, parameterized functions, such as muxes,
adders, multipliers, and memory cores, which are bundled with the Xilinx CORE
Generator software at no additional cost to licensed software customers. System-level
cores, such as PCI, Reed-Solomon, ADPCM, HDLC, POS-PHY, and Color Space
Converters are also available as optional, separately licensed products. The CORE
Generator commonly is used to quickly generate Spartan-3 generation block and
distributed memories. A more detailed listing of available Spartan-3 generation
LogiCORE products is available in Table 14-1, page 429 and on the Xilinx IP Center website
(http://www.xilinx.com/ipcenter).
Types of IP currently offered by the Xilinx LogiCORE program include:
• Basic Elements: logic gates, registers, multiplexers, adders, multipliers
• Communications and Networking: ADPCM modules, HDLC controllers, ATM
building blocks, forward error correction modules, cable modem solutions, 10/100
Ethernet MAC, SPI-4.2, and POS-PHY interfaces
• DSP and Video Image Processing: cores ranging from small building blocks (e.g., Time
Skew Buffers) to larger system-level functions (e.g., FIR Filters and FFTs)
• System Logic: accumulators, adders, subtracters, complementers, multipliers,
integrators, pipelined delay elements, single and dual-port distributed and block
RAM, ROM, and synchronous and asynchronous FIFOs
• Standard Bus Interfaces: PCI Interfaces, PCI Express PIPE Endpoint, I2C, CAN
• Processor Solutions: MicroBlaze™ 32-bit soft processor, PicoBlaze™ 8-bit soft
processor, and peripherals
AllianceCORE Products
AllianceCORE products are intellectual property (IP) cores developed, sold, and
supported by third-party Xilinx Alliance Program members. AllianceCORE certification
provides a showcase for the most popular IP cores offered.
To receive the AllianceCORE designation, members must submit netlist deliverables of the
core in the form of an ISE® project that includes both VHDL and Verilog wrapper files for
the current devices recommended for new designs. Xilinx then performs a “flow check” on
these deliverables to verify they run through the Xilinx design tools, and the reported
implementation results (device utilization and clock rates) are repeatable. Xilinx does not
verify core functionality or compliance with specific standards.
Design Files
Xilinx offers two types of design files: XAPP application notes developed by Xilinx and
reference designs developed by Xilinx and its partners. Both types are extremely valuable
to customers looking for guidance when designing systems. Application notes developed
by Xilinx usually include supporting design files. They are supplied free of charge, without
technical support or warranty. Reference designs often can be used as starting points for
implementing a broad spectrum of functions in Xilinx programmable logic.
SignOnce
The SignOnce IP License has been the industry’s first and only set of common license terms
for programmable logic soft IP cores. Xilinx and leading third-party providers have agreed
to offer cores to FPGA customers under a common set of terms known as the SignOnce IP
License, simplifying the process by which customers can access IP from multiple suppliers.
For more information see http://www.xilinx.com/ipcenter/signonce.htm.
1394a Link Layer Controller (C1394A) | CAST, Inc. | Candidate Core | X | Spartan-3E, Spartan-3 FPGAs
Adaptive Image Enhancement (Iridix) | Apical Limited | Candidate Core | | Spartan-3E, Spartan-3 FPGAs
Compact Video Controller (logiCVC) | Xylon d.o.o. | AllianceCORE | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
Gamma Correction, Dynamic | Digital Design Corporation | Candidate Core | | Spartan-3 FPGAs
H.264 Encoder, Baseline (4i2i) | 4i2i Communications Ltd. | AllianceCORE | X | Spartan-3 FPGAs
I2S Transmitter (CWda06) | Coreworks | Candidate Core | X | Spartan-3, Spartan-IIE FPGAs
JPEG, Codec (JPEG_C) | CAST, Inc. | AllianceCORE | X | Spartan-3E, Spartan-3 FPGAs
JPEG, Decoder (JPEG_D) | CAST, Inc. | AllianceCORE | X | Spartan-3E, Spartan-3 FPGAs
JPEG, Encoder (JPEG_E) | CAST, Inc. | AllianceCORE | X | Spartan-3E, Spartan-3 FPGAs
JPEG, Motion Decoder (CS6150) | Amphion Semiconductor, Ltd. | AllianceCORE | X | Spartan-3 FPGAs
MPEG-2 Video Decoder (CS6651) | Amphion Semiconductor, Ltd. | AllianceCORE | X | Spartan-3 FPGAs
MPEG4 Simple Profile Decoder | Xilinx, Inc. | LogiCORE | X | Spartan-3, Spartan-3A FPGAs
MPEG4 Simple Profile Encoder | Xilinx, Inc. | LogiCORE | X | Spartan-3, Spartan-3A FPGAs
MPEG-4 Video Compression Decoder | 4i2i Communications Ltd. | AllianceCORE | X | Spartan-3 FPGAs
MPEG-4 Video Compression Encoder | 4i2i Communications Ltd. | AllianceCORE | X | Spartan-3 FPGAs
Basic Logic
8b/10b Decoder | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
8b/10b Encoder | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
Binary Counter | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3 FPGAs
Binary Decoder | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
BUFE-based Multiplexer Slice | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
Comparator | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3 FPGAs
FD-based Shift Register | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
LD-based Parallel Latch | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
Linear Feedback Shift Register (LFSR) | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE FPGAs
RAM-based Shift Register | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3 FPGAs
8b/10b Decoder | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
8b/10b Encoder | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
Binary Counter | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3 FPGAs
Binary Decoder | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
BUFE-based Multiplexer Slice | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
Comparator | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3 FPGAs
FD-based Parallel Register | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
FD-based Shift Register | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
LD-based Parallel Latch | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
Linear Feedback Shift Register (LFSR) | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE FPGAs
RAM-based Shift Register | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3 FPGAs
1394 Link Layer Controller (SI16FW10) | Silicon Interfaces America Inc. | Candidate Core | | Spartan-3 FPGAs
1394a Link Layer Controller (C1394A) | CAST, Inc. | Candidate Core | X | Spartan-3E, Spartan-3 FPGAs
CAN (DO-DI-CAN) | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-3A, Spartan-3A DSP FPGAs
CAN 2.0 B Compatible Network Controller (LogiCAN) | Xylon d.o.o. | AllianceCORE | X | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
CAN 2.0B Bus Controller (iCAN) | Intelliga Integrated Design, Ltd. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
CAN Bus Controller 2.0B | CAST, Inc. | AllianceCORE | X | Spartan-3E, Spartan-3 FPGAs
I2C Bus Controller (I2C) | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
I2C Bus Controller Master (DI2CM) | Digital Core Design | AllianceCORE | X | Spartan-3E, Spartan-3 FPGAs
I2C Bus Controller Slave (DI2CS) | Digital Core Design | Candidate Core | X | Spartan-3E, Spartan-3 FPGAs
LIN Controller | CAST, Inc. | AllianceCORE | X | Spartan-3E, Spartan-3 FPGAs
OPB Arbiter | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE FPGAs
OPB IPIF Architecture | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-3A, Spartan-IIE FPGAs
OPB to OPB Bridge (Lite Version) | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE FPGAs
OPB/PLB PCI32 Interface | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE FPGAs
PCI 32-bit Master Interface (PCI-M32) | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
PCI 32-bit Target Interface (PCI-T32) | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
PCI Host Bridge (PCI-HB) | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
PCI32 Single Project Spartan Series (DO-DI-PCI32-SP) | Xilinx, Inc. | LogiCORE | X | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
PCI32 Spartan Series Interface (DO-DI-PCI32-IP) | Xilinx, Inc. | LogiCORE | X | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
PCI64 Virtex® and Spartan Series Interface, IP Only (DO-DI-PCI64-IP) | Xilinx, Inc. | LogiCORE | X | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
PCI64 Interface Design Kit (DO-DI-PCI64-DKT) | Xilinx, Inc. | LogiCORE | X | Spartan-3A, Spartan-3A DSP, Spartan-3, Spartan-IIE, Spartan-II FPGAs
PCI-PCI Bridge (EP440) | Eureka Technology | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
PowerPC® Bus Master (EP201) | Eureka Technology | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
PowerPC Bus Slave (EP100) | Eureka Technology | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
SPI-Master/Slave (DSPI) | Digital Core Design | Candidate Core | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
SPI-Slave (DSPIS) | Digital Core Design | Candidate Core | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
USB 1.1 Function Controller (CUSB) | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
V3.0 PCI 32-bit/(66-33 MHz) Core and Design Kit | Xilinx, Inc. | LogiCORE | X | Spartan-3A DSP, Spartan-3A, Spartan-3, Spartan-IIE, Spartan-II FPGAs
V3.0 PCI 64-bit/(66-33 MHz) and PCI 32-bit/(66-33 MHz) Core | Xilinx, Inc. | LogiCORE | X | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
1394 Link Layer Controller (SI16FW10) | Silicon Interfaces America Inc. | Candidate Core | | Spartan-3 FPGAs
1394a Link Layer Controller (C1394A) | CAST, Inc. | Candidate Core | X | Spartan-3E, Spartan-3 FPGAs
16550 UART w/ FIFOs (D16550) | Digital Core Design | AllianceCORE | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
16750 UART w/ FIFOs (D16750) | Digital Core Design | Candidate Core | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
8b/10b Decoder | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
8b/10b Encoder | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
AES Encryption | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
Convolutional Encoder | Xilinx, Inc. | LogiCORE | | Spartan-3A DSP, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
DES Encryption | CAST, Inc. | Candidate Core | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
DES3 Encryption | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
Ethernet MAC, 10/100 (MAC) | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
Ethernet Statistics | Xilinx, Inc. | LogiCORE | X | Spartan-3A DSP, Spartan-3E, Spartan-3, Spartan-3A FPGAs
Framer, SONET OC-3 to OC-768 (Xelic) | Xelic, Inc. | Candidate Core | X | Spartan-3E, Spartan-3 FPGAs
HDLC, Single-Channel (MC-XIL-HDLC) | Avnet Design Services | AllianceCORE | X | Spartan-3 FPGAs
MD5 Message Digest Algorithm | CAST, Inc. | AllianceCORE | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
OPB 10/100 Ethernet Media Access Controller (EMAC) | Xilinx, Inc. | LogiCORE | X | Spartan-3, Spartan-3A, Spartan-IIE, Spartan-II FPGAs
OPB Single Channel HDLC Controller | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-II FPGAs
Packet Queue | Xilinx, Inc. | LogiCORE | X | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3 FPGAs
PCI Express (PExCore-254) | NitAl Consulting Services, Inc. | Candidate Core | X | Spartan-3 FPGAs
Reed-Solomon Decoder | Xilinx, Inc. | LogiCORE | X | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3 FPGAs
Reed-Solomon Encoder | Xilinx, Inc. | LogiCORE | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
SDLC Controller (SDLC) | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
SPI-3 Physical Layer Interface, Multi-channel | Xilinx, Inc. | LogiCORE | X | Spartan-3A, Spartan-3A DSP, Spartan-3E, Spartan-3 FPGAs
SPI-4 Phase 2 Interface Solutions (DO-DI-POSL4MC) | Xilinx, Inc. | LogiCORE | X | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3 FPGAs
SPI-Master/Slave (DSPI) | Digital Core Design | Candidate Core | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
SPI-Slave (DSPIS) | Digital Core Design | Candidate Core | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
TDES AMBA Platform | SoC Solutions, LLC | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
TDM Switch (TDM_H110) | Calyptech Pty Ltd. | AllianceCORE | X | Spartan-3, Spartan-II FPGAs
Tri-Mode Ethernet Media Access Controller (TEMAC) | Xilinx, Inc. | LogiCORE | X | Spartan-3A DSP, Spartan-3E, Spartan-3, Spartan-3A FPGAs
Turbo Convolutional Code Decoder, CDMA2000/3GPP2 | Xilinx, Inc. | LogiCORE | | Spartan-3A, Spartan-3A DSP, Spartan-3E, Spartan-3, Spartan-II FPGAs
Turbo Convolutional Code Encoder, CDMA2000/3GPP2 | Xilinx, Inc. | LogiCORE | | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-II FPGAs
Turbo Product Code (TPC) Decoder | Xilinx, Inc. | LogiCORE | | Spartan-3 FPGAs
Turbo Product Code (TPC) Encoder | Xilinx, Inc. | LogiCORE | | Spartan-3 FPGAs
USB 1.1 Function Controller (CUSB) | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
USB 2.0 Function Controller (CUSB2) | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
UTOPIA, Multi-port Serial Bridge | Phystream Ltd. | Candidate Core | X | Spartan-3 FPGAs
Viterbi Decoder | Xilinx, Inc. | LogiCORE | X | Spartan-3E, Spartan-3, Spartan-3A, Spartan-IIE FPGAs
Z80 Serial I/O Controller (CZ80SIO) | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
Convolutional Encoder | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
CORDIC | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
Direct Digital Synthesizer | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
Divider | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
DVB-S.2 FEC Encoder | Xilinx, Inc. | LogiCORE | X | Spartan-3A DSP, Spartan-3E, Spartan-3 FPGAs
Fast Fourier Transform | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-3A
FFT/IFFT (FC100) | Sundance Multiprocessor Technology Ltd | Candidate Core | | Spartan-3 FPGAs
FIR Compiler | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-3A, Spartan-3A DSP
IEEE 802.16E CTC Decoder | Xilinx, Inc. | LogiCORE | X | Spartan-3E, Spartan-II FPGAs
IEEE 802.16e CTC Encoder | Xilinx, Inc. | LogiCORE | | Spartan-3A DSP, Spartan-3E, Spartan-3, Spartan-3A FPGAs
IEEE 802.16e LDPC Encoder | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3 FPGAs
J.83 Universal Modulator Annex A/C | Xilinx, Inc. | LogiCORE | X | Spartan-3 FPGAs
Linear Feedback Shift Register (LFSR) | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE FPGAs
Reed-Solomon Decoder | Xilinx, Inc. | LogiCORE | X | Spartan-3E, Spartan-3, Spartan-3A FPGAs
Reed-Solomon Encoder | Xilinx, Inc. | LogiCORE | X | Spartan-3A DSP, Spartan-3E, Spartan-3, Spartan-IIE FPGAs
Sine Cosine Look Up Table | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
TMS32025 DSP Processor (C32025) | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
Turbo Product Code (TPC) Decoder | Xilinx, Inc. | LogiCORE | | Spartan-3 FPGAs
Turbo Product Code (TPC) Encoder | Xilinx, Inc. | LogiCORE | | Spartan-3 FPGAs
Viterbi Decoder | Xilinx, Inc. | LogiCORE | X | Spartan-3E, Spartan-3, Spartan-3A, Spartan-IIE FPGAs
Viterbi Decoder, (IEEE 802-Compatible) | Xilinx, Inc. | LogiCORE | X | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE FPGAs
WiMAX FEC Pack | Xilinx, Inc. | LogiCORE | X | Spartan-3E, Spartan-3 FPGAs
Math
Accumulator | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
Adder/Subtracter | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
Comparator | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3 FPGAs
Complex Multiplier | Xilinx, Inc. | LogiCORE | | Spartan-3A, Spartan-3A DSP, Spartan-3E, Spartan-3 FPGAs
CORDIC | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
FFT/IFFT (FC100) | Sundance Multiprocessor Technology Ltd | Candidate Core | | Spartan-3 FPGAs
Floating Point Divider (DFPDIV) | Digital Core Design | AllianceCORE | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
Floating Point Multiplier (DFPMUL) | Digital Core Design | AllianceCORE | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
Multiplier Accumulator | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
Multiplier | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
Sine Cosine Look Up Table | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
Asynchronous FIFO | Xilinx, Inc. | LogiCORE | | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
Block Memory Generator | Xilinx, Inc. | LogiCORE | X | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
BUFT-based Multiplexer Slice | Xilinx, Inc. | LogiCORE | | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
Content Addressable Memory (CAM) | Xilinx, Inc. | LogiCORE | | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
Distributed Memory Generator | Xilinx, Inc. | LogiCORE | X | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
Distributed Memory | Xilinx, Inc. | LogiCORE | | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
Dual-Port Block Memory | Xilinx, Inc. | LogiCORE | | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
FD-based Parallel Register | Xilinx, Inc. | LogiCORE | | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
FD-based Shift Register | Xilinx, Inc. | LogiCORE | | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
FIFO Generator | Xilinx, Inc. | LogiCORE | | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
LD-based Parallel Latch | Xilinx, Inc. | LogiCORE | | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
RAM-based Shift Register | Xilinx, Inc. | LogiCORE | | Spartan-3A DSP, Spartan-3A, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
SDRAM Controller, DDR (NWL) | Northwest Logic | AllianceCORE | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
SDRAM Controller, DDR2 (NWL) | Northwest Logic | AllianceCORE | X | Spartan-XL, Spartan-3 FPGAs
SDRAM Controller, SDR (NWL) | Northwest Logic | AllianceCORE | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
Single-Port Block Memory | Xilinx, Inc. | LogiCORE | | Spartan-3A, Spartan-3A DSP, Spartan-3E, Spartan-3, Spartan-IIE, Spartan-II FPGAs
Synchronous FIFO | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
Embedded Processing
16550 UART w/ FIFOs (D16550) | Digital Core Design | AllianceCORE | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
16750 UART w/ FIFOs (D16750) | Digital Core Design | Candidate Core | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
68681 DUART (ML68681) | Millogic Ltd | Candidate Core | | Spartan-3E, Spartan-3 FPGAs
82C51 USART (ML82C51) | Millogic Ltd | Candidate Core | | Spartan-3E, Spartan-3 FPGAs
Block RAM (BRAM) Block (v1.00a) | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
ChipScope OPB IBA | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
ChipScope PLB IBA | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
Compact Video Controller (logiCVC) | Xylon d.o.o. | AllianceCORE | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
Digital Clock Manager (DCM) Module | Xilinx, Inc. | LogiCORE | | Spartan-3 FPGAs
Fixed Interval Timer (FIT) | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE, Spartan-II FPGAs
FSL_V20 | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE, Spartan-II FPGAs
LMB BRAM Interface Controller | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE, Spartan-II FPGAs
MicroBlaze Parameterized Netlist | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
Multi Channel OPB DDR Controller | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A FPGAs
Multi Channel OPB DDR2 Controller | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A FPGAs
Multi Channel OPB EMC Controller | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A FPGAs
Multi-Channel-OPB (MCH_OPB) SDRAM Controller | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE, Spartan-II FPGAs
OPB 16550 UART Controller | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE FPGAs
OPB Arbiter | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE FPGAs
OPB BRAM Controller | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE FPGAs
OPB Bus Structure | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-3A, Spartan-IIE FPGAs
OPB Central DMA Controller | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE, Spartan-II FPGAs
OPB DDR SDRAM Controller | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-3A, Spartan-IIE FPGAs
OPB Delta-Sigma Analog to Digital Converter (ADC) | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE, Spartan-II FPGAs
OPB Delta-Sigma Digital to Analog Converter (DAC) | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE, Spartan-II FPGAs
OPB EMC (DO-EDK) | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE, Spartan-II FPGAs
OPB GPIO | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE FPGAs
OPB IPIF Architecture | Xilinx, Inc. | LogiCORE | | Spartan-3E, Spartan-3, Spartan-3A, Spartan-IIE FPGAs
OPB PCI Arbiter | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-II FPGAs
OPB SDRAM Controller | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-II FPGAs
OPB Single Channel HDLC Controller | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-II FPGAs
OPB Timer/Counter | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE FPGAs
OPB to OPB Bridge (Lite Version) | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE FPGAs
OPB UART 16450 Controller | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-II FPGAs
OPB UART Lite | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE FPGAs
OPB ZBT Controller (DO-EDK) | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE FPGAs
OPB/PLB PCI32 Interface | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-3A, Spartan-IIE FPGAs
PLB ATMC | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE FPGAs
PLB to OPB Bridge (DO-EDK) | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE FPGAs
PowerPC Bus Master (EP201) | Eureka Technology | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
PowerPC Bus Slave (EP100) | Eureka Technology | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
SDRAM Controller, DDR (EP525) | Eureka Technology | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
SDRAM Controller, DDR (NWL) | Northwest Logic | AllianceCORE | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
SDRAM Controller, DDR2 (NWL) | Northwest Logic | AllianceCORE | X | Spartan-XL, Spartan-3 FPGAs
SDRAM Controller, SDR (NWL) | Northwest Logic | AllianceCORE | X | Spartan-3E, Spartan-3, Spartan-IIE FPGAs
TMS32025 DSP Processor (C32025) | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
Util Bus Split Operation | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
Util Flop-Flop | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
Util Reduced Logic | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
Util Vector Logic | Xilinx, Inc. | LogiCORE | | Spartan-3, Spartan-IIE, Spartan-II FPGAs
Z80 Serial I/O Controller (CZ80SIO) | CAST, Inc. | AllianceCORE | X | Spartan-3, Spartan-IIE FPGAs
Chapter 15
[Figure: PicoBlaze processor block diagram (UG331_c17_01_082406). Diagram labels: 1Kx18, (PC), 31x10 CALL/RETURN Stack, Instruction Decoder, Constants, 64-Byte, Flags (Z Zero, C Carry), INTERRUPT, Enable, IE, 16 Byte-Wide Registers (s0 to sF), Operand 1, Operand 2, ALU, IN_PORT, PORT_ID.]
Introduction
Table 15-1: Embedded Processing/Control Solutions for Spartan-3 Generation FPGAs (Cont’d)
Function/Feature | PicoBlaze Processor | MicroBlaze Processor
Clocks per Instruction | 2 | 1 to 3, 34 for integer divide
Call/Return/Interrupt Stack | 31 locations (internal) | Variable size, in data memory
Interrupts | 1, Expandable | 1, Expandable
Maximum Interrupt Latency | 4 clock cycles (46 ns at maximum clock rate) | 7 to 40 clock cycles (application dependent)
Instruction Cache | N/A | 64 to 64K
Data Cache | N/A | 64 to 64K
Floating-Point Unit | N/A | Optional, up to 120X performance improvement
Hardware Multiplier | N/A | 32x32 = 32 in 3 cycles
Hardware Divider | N/A | Optional, up to 20% performance improvement
Hardware Barrel Shifter | N/A | Optional, up to 15X performance improvement
Hardware Debugger Support | N/A | XMD
LocalLink Direct Processor Interface | N/A | 200 MB/sec communication
The PicoBlaze processor is always fully embedded within a Spartan-3 generation FPGA
using on-chip block RAM and distributed RAM for code and data storage. The MicroBlaze
processor optionally uses internal FPGA memory resources or interfaces to external
memory to support larger code or data storage requirements. The Embedded
Development Kit (EDK) for the MicroBlaze processor includes hardware IP cores to
support external Flash, SRAM, SDRAM, DDR DRAM, and ZBT SRAM memory. Similarly,
the MicroBlaze processor supports both instruction and data caches, each up to 64 Kbytes,
to increase performance when connected to external memory.
Table 15-2: PicoBlaze and MicroBlaze Resource Requirements and Performance (Cont’d)
Function/Feature | PicoBlaze Processor | MicroBlaze Processor
Percent of XC3S200A/AN | 5%-6% | 29%+
Percent of XC3S400A/AN | 3%-5% | 15%+
Percent of XC3S700A/AN | 3%-5% | 10%+
Percent of XC3S1400A/AN | 2%-3% | 6%+
Spartan-3E FPGAs
Percent of XC3S100E | 10%-25% | 55%+
Percent of XC3S250E | 4%-8% | 21%+
Percent of XC3S500E | 3%-5% | 11%+
Percent of XC3S1200E | 2%-4% | 7%+
Percent of XC3S1600E | 2%-3% | 6%+
Spartan-3 FPGAs
Percent of XC3S50 | 13%-25% | 68%+
Percent of XC3S200 | 5%-8% | 27%+
Percent of XC3S400 | 3%-6% | 15%+
Percent of XC3S1000 | 2%-4% | 8%+
Percent of XC3S1500 | 2%-3% | 6%+
Percent of XC3S2000 | 1.3%-3% | 5%+
Percent of XC3S4000 | 0.5%-1% | 2%+
Percent of XC3S5000 | 0.5%-1% | 2%+
Performance (Spartan-3 FPGA –5 speed grade)
Maximum clock frequency | 87 MHz | 100 MHz
Instructions per second | 43.5M | 92M
Dhrystone MIPS (D-MIPS) | N/A | 92
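The PicoBlaze figures in Tables 15-1 and 15-2 are related by simple arithmetic: at 2 clocks per instruction, an 87 MHz clock gives 43.5 million instructions per second, and a 4-cycle interrupt latency works out to about 46 ns. The short Python sketch below reproduces that calculation. The function names are illustrative only and are not part of any Xilinx tool.

```python
# Illustrative helpers (hypothetical names) that reproduce the PicoBlaze
# numbers quoted in Tables 15-1 and 15-2.

def instructions_per_second(clock_hz, clocks_per_instruction):
    """Instruction rate for a processor with a fixed cycle count per instruction."""
    return clock_hz / clocks_per_instruction

def latency_ns(cycles, clock_hz):
    """Time taken by a given number of clock cycles, in nanoseconds."""
    return cycles / clock_hz * 1e9

PICOBLAZE_CLOCK_HZ = 87e6   # maximum clock frequency from Table 15-2
PICOBLAZE_CPI = 2           # clocks per instruction from Table 15-1
PICOBLAZE_IRQ_CYCLES = 4    # maximum interrupt latency from Table 15-1

mips = instructions_per_second(PICOBLAZE_CLOCK_HZ, PICOBLAZE_CPI) / 1e6
irq_ns = latency_ns(PICOBLAZE_IRQ_CYCLES, PICOBLAZE_CLOCK_HZ)

print(f"{mips} million instructions per second")
print(f"{irq_ns:.0f} ns maximum interrupt latency")
```

The same helpers do not apply directly to the MicroBlaze processor, whose cycle count per instruction varies from 1 to 3 (34 for an integer divide).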
Using Spartan-3 generation FPGAs, both MicroBlaze and PicoBlaze processors consume
minimal FPGA resources and are highly cost effective, as shown in Table 15-2. Complete
PicoBlaze solutions cost as little as $0.40 in high-volume applications. MicroBlaze solutions
start from $1.40 in volume.
Both the MicroBlaze and PicoBlaze processors provide significant numbers of flexible I/O
at much lower cost than off-the-shelf controllers. Similarly, the peripheral set for both
processors can be customized to meet the specific feature, function, and cost requirements
of the target application. Because both processors are delivered in synthesizable HDL, both
cores are future-proof, safe from any possible product obsolescence. Being integrated into
the FPGA, both processors reduce board space, design cost, and inventory.
Operating Systems
Many embedded processing applications require operating system capabilities. The
following operating systems and real-time operating systems (RTOS) have ports to the
MicroBlaze processor.
• Micriμm μC/OS-II Real-Time Operating System
http://www.micrium.com/products/rtos/kernel/rtos.html
Processor Peripherals
• Timer/Counter
• Timebase/Watchdog Timer
• UART-Lite
• Interrupt Controller
• General-Purpose I/O port (GPIO)
Serial I/O
• SPI Master and Slave
• JTAG UART
• 16450 UART*
• 16550 UART*
• I2C two-wire serial Master and Slave*
Memory Interfaces
• SDRAM controller and interface
• DDR SDRAM controller and interface
• Flash memory interface
• SRAM memory interface
• Block RAM interface
Networking Interfaces
• Single-channel HDLC controller*
• ATM Utopia L2 master and slave controller*
• 10/100 Ethernet Media Access Controller (MAC)* (Full and Lite versions)
* IP core available as a separate product. Plugs into EDK. Evaluation versions available.
Chapter 16
Notes:
1. # = I/O bank number, an integer between 0 and 3 (7 for the Spartan-3 family).
Pin Types
Most pins on a Spartan-3 generation FPGA are general-purpose, user-defined I/O pins.
There are, however, up to 12 different functional types of pins on Spartan-3 generation
packages, as outlined in Table 16-2. The color coding is used in the package footprint
drawings that are found in Module 4 of each data sheet.
Pin Labeling
The pin label is abbreviated but descriptive for each pin. All I/O pins begin with IO, while
the input-only pins begin with IP. If a pin can be used as a differential signal, the name
includes an L followed by the pair number and the bank number (see “Differential Pair
Labeling”). Bank numbers are also indicated on single-ended pins and on the voltage
inputs that are bank-specific, VCCO and VREF. Dual-purpose pins have a forward slash
separating the two functions. _B is used as the active-Low designator, as in CSI_B.
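[Figure: Differential pair labeling on a Spartan-3E FPGA (DS312-4_00_111105). In the example pair IO_L39P_1 and IO_L39N_1, 39 is the pair number and 1 is the bank. The P pin is the positive-polarity, true driver; the N pin is the negative-polarity, inverted driver.]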
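The labeling rules above are regular enough to parse mechanically. The following Python sketch splits a label into its pin kind (IO or IP), differential pair number, polarity, bank, and any alternate function after the forward slash. It is a minimal illustration that assumes labels follow only the patterns described in this section; the function and field names are hypothetical, and the dual-purpose label in the usage lines is made up for illustration rather than taken from a pinout table.

```python
import re
from typing import NamedTuple, Optional

class PinLabel(NamedTuple):
    kind: str                  # "IO" (I/O pin) or "IP" (input-only pin)
    pair: Optional[int]        # differential pair number, if any
    polarity: Optional[str]    # "P" (true) or "N" (inverted), if differential
    bank: Optional[int]        # I/O bank number, if present
    alternate: Optional[str]   # second function of a dual-purpose pin

# Assumed pattern: IO or IP, optional _L<pair><P|N>, optional _<bank>.
_LABEL = re.compile(r"^(IO|IP)(?:_L(\d+)([PN]))?(?:_(\d+))?$")

def parse_pin_label(label):
    """Parse labels such as IO_L39P_1; raise ValueError for other pin names."""
    primary, _, alternate = label.partition("/")
    match = _LABEL.match(primary)
    if match is None:
        raise ValueError(f"not an IO/IP pin label: {label!r}")
    kind, pair, polarity, bank = match.groups()
    return PinLabel(
        kind,
        int(pair) if pair else None,
        polarity,
        int(bank) if bank else None,
        alternate or None,
    )

def is_active_low(name):
    """_B is the active-Low designator, as in CSI_B."""
    return name.endswith("_B")

print(parse_pin_label("IO_L39P_1"))
print(parse_pin_label("IO_L39N_1/CSI_B"))  # made-up dual-purpose label
```

Dedicated pins such as supply or configuration pins do not match the pattern and raise ValueError, which keeps the sketch from silently misreading them.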
Pinout Files
The pinouts are found in the data sheets for each family. Comma-delimited text files and
Excel graphical footprints for the pinouts specific to each Spartan-3 FPGA family are
available from the data sheets on xilinx.com. Using a spreadsheet program with the
comma-delimited CSV files, the data can be sorted and reformatted according to any
specific needs. Similarly, the ASCII text file is easily parsed by most scripting programs.
For information on how to use these files to create OrCAD symbols, see Answer Record
10078 at http://www.xilinx.com/support/answers/10078.htm.
The following subsections provide additional details on using the downloadable pinout
files.
Pinout Tables
The comma-delimited ASCII text files located in the /tables directory list pinout
information for a specific package type. Each line represents one pin on the package.
Pinout information for all devices available in the package for the family appears on the
line. This subsection provides brief descriptions of the fields available on each line.
PIN_NUMBER
PIN_NUMBER is the pin identifier for each pin on the package.
For a particular package and family, there can be multiple FPGAs available in that package.
For each pin, all possible FPGAs are listed. Each device is represented by two fields on each
line, XC3S**_PIN and XC3S**_TYPE, described below.
XC3S**_PIN
The XC3S**_PIN field indicates the name for a particular package pin and for a particular
Spartan-3 generation FPGA in that package. The ** characters here indicate a wildcard
character. In the pinout table file, the ** characters are replaced by an actual part number,
such as XC3S250E.
XC3S**_TYPE
The XC3S**_TYPE field indicates the pin type for a particular package pin and for a
particular Spartan-3 generation FPGA in that package. The listed type matches those
described in Module 4 of the data sheet. The ** characters here indicate a wildcard
character. In the pinout table file, the ** characters are replaced by an actual part number,
such as XC3S250E.
BANK
Sorting by BANK orders the pins by their associated I/O bank. The possible values for
BANK include integers between 0 and 3 (0 and 7 for the Spartan-3 family), VCCAUX, and
N/A. N/A indicates that the pin is not associated with a specific bank.
DIFFERENCE
Sorting by DIFFERENCE in descending order highlights any pinout differences between
Spartan-3E FPGAs in the same package. A period (.) indicates that the pins match
identically. DIFF indicates that the pins differ between devices in that package.
To locate unconnected pins in a package type, sort by the TYPE of the smallest device
offered in the package footprint. Any unconnected pins on larger devices are a subset of
those on the smallest device.
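As a rough illustration of the scripting use mentioned above, the Python sketch below reads a pinout table with the standard csv module, groups pins by BANK, lists the pins flagged DIFF, and finds unconnected pins for one device. The sample rows are invented for the example and do not come from a real pinout file. The exact column order and the type string used for unconnected pins ("N.C." here) are assumptions that should be checked against the downloaded file.

```python
import csv
import io
from collections import defaultdict

# Invented sample in the field layout described above. Real files list every
# pin on the package and may order or spell the columns differently.
SAMPLE_CSV = """\
PIN_NUMBER,XC3S250E_PIN,XC3S250E_TYPE,XC3S500E_PIN,XC3S500E_TYPE,BANK,DIFFERENCE
P1,IO_L01P_3,I/O,IO_L01P_3,I/O,3,.
P2,N.C.,N.C.,IO_L02P_3,I/O,3,DIFF
P3,VCCAUX,VCCAUX,VCCAUX,VCCAUX,VCCAUX,.
P4,GND,GND,GND,GND,N/A,.
"""

def load_pins(text):
    """Return one dict per package pin, keyed by the header field names."""
    return list(csv.DictReader(io.StringIO(text)))

def pins_by_bank(rows):
    """Group pin numbers by the BANK field (a bank number, VCCAUX, or N/A)."""
    banks = defaultdict(list)
    for row in rows:
        banks[row["BANK"]].append(row["PIN_NUMBER"])
    return dict(banks)

def differing_pins(rows):
    """Pins whose DIFFERENCE field is DIFF rather than a period."""
    return [row["PIN_NUMBER"] for row in rows if row["DIFFERENCE"] == "DIFF"]

def unconnected_pins(rows, device, nc_type="N.C."):
    """Pins whose <device>_TYPE field equals the assumed no-connect marker."""
    return [row["PIN_NUMBER"] for row in rows if row[f"{device}_TYPE"] == nc_type]

rows = load_pins(SAMPLE_CSV)
print(pins_by_bank(rows))
print(differing_pins(rows))
print(unconnected_pins(rows, "XC3S250E"))
```

To process a downloaded file instead of the inline sample, the same functions can be fed the file contents read with open(...).read().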
Footprint Diagrams
The files in the \footprints directory are all Microsoft Excel spreadsheet files. These
files present a common footprint for each package type and show the pins on the package
as viewed from the top (QFP packages) or through the top of the package (BGA packages).
Note the location of the pin 1 indicator on QFP packages.
Each pin is labeled and color-coded according to Module 4 of the data sheet. No Connect
(N.C.) pins are also indicated with special symbols.
Most footprints were saved as 50% to 75% of normal size so that the entire footprint is
visible on the screen. To change the magnification, select View --> Zoom from the Excel top
menu, then select the desired magnification factor.
Excel might issue a warning when you open the file, indicating that the file might contain
macros. Select either Disable Macros or Enable Macros. There are no active macros in the
Excel files.
PartGen
The pinout files can also be generated from the Xilinx ISE® development system by using
the PartGen program. To create pinout files in PartGen, go to a command prompt and type,
for example, partgen -p xc3s50atq144. That command writes a text file called
xc3s50atq144.pkg to the current directory, which contains a list of the pin names and
pin numbers. Using the -v option (verbose) generates a more detailed .pkg file that
includes the bank number and nearest CLB, among other information. Details on using
PartGen are found in the PartGen chapter of the Development System Reference Guide, found
at: http://www.xilinx.com/support/documentation/dt_ise.htm.
Packages
Table 16-3 shows the low-cost, space-saving production package styles for the
Spartan-3 generation families.
X X | PQ208/PQG208 | 208 | Quad Flat Pack (QFP) | 158 | 0.5 | 30.6 x 30.6 | 4.10 | 5.3
Notes:
1. Package mass is ±10%.
Pb-Free Packages
Each package style is available as a standard and an environmentally friendly lead-free
(Pb-free) option. The Pb-free packages include an extra G in the package style name. For
example, the standard TQ144 package becomes TQG144 when ordered as the Pb-free
option. The mechanical dimensions of the standard and Pb-free packages are similar, as
shown in the mechanical drawings provided in Table 17-1, page 471. The materials listed
in the Material Declaration Data Sheet and the thermal characteristics will be different. The
pinouts are always identical between the standard and Pb-free packages. For more
information on Pb-free packages, see Xilinx Pb-Free and RoHS-compliant Products.
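The naming rule is mechanical: the extra G goes between the letter prefix and the pin count. A small Python sketch of that rule follows. The function name is hypothetical, and it expects a standard (non-Pb-free) package name as input, because a prefix such as FG already ends in G and the two forms cannot be told apart from the string alone.

```python
import re

def pb_free_name(standard_package):
    """Return the Pb-free style name for a standard package style name.

    Follows the rule in the text: TQ144 becomes TQG144. The input is assumed
    to be a standard name made of a letter prefix followed by the pin count.
    """
    match = re.fullmatch(r"([A-Z]+)(\d+)", standard_package)
    if match is None:
        raise ValueError(f"unexpected package name: {standard_package!r}")
    prefix, pin_count = match.groups()
    return f"{prefix}G{pin_count}"

for name in ("TQ144", "PQ208", "CP132", "FG1156"):
    print(name, "->", pb_free_name(name))
```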
Description and specifications for packages, pack and ship, thermal characteristics,
electrical characteristics, PCB design rules, moisture sensitivity, and reflow soldering
guidelines.
• Device Packaging Application Notes
Application notes on board routability, solder reflow, and related topics.
Chapter 17
Package Drawings
Summary
This chapter provides mechanical drawings of the Spartan®-3 generation packages listed
in Table 17-1. These drawings are also available on the Xilinx website at
http://www.xilinx.com/support/documentation/package_specifications.htm. Also
found on the Xilinx website is the Material Declaration Data Sheet for the standard and Pb-
free versions of each package.
Notes:
1. The CP132 and CPG132 packages are being discontinued and are not recommended for new designs. See
http://www.xilinx.com/support/documentation/customer_notices/xcn08011.pdf for details.
2. The FG1156 and FGG1156 packages are being discontinued and are not recommended for new designs. See
http://www.xilinx.com/support/documentation/customer_notices/xcn07022.pdf for details.
Note: The Spartan-3 generation FPGAs use the 4-layer version of the FG484 package.
Chapter 18
Specific requirements for power supplies, including ramp rates and quiescent current, are
different for each family and are specified in the FPGA data sheets. Dynamic power
consumption also varies by family and can be estimated using the Xilinx power estimator
tools.
Voltage Supplies
Spartan-3 generation FPGAs have multiple voltage supply inputs, as shown in Table 18-2.
There are two supply inputs for internal logic functions, VCCINT and VCCAUX. In the
Spartan-3A/3A DSP platforms, the VCCAUX level is programmable as either 2.5V (default)
or 3.3V. The user specifies the value in the software through the CONFIG VCCAUX=2.5 or
CONFIG VCCAUX=3.3 constraint. In the Spartan-3AN platform, the user must set CONFIG
VCCAUX=3.3 (the default) to use the In-System Flash.
Notes:
1. The VCCO designations apply to Spartan-3E and Extended Spartan-3A family FPGAs. The Spartan-3 family has eight VCCO supplies
numbered 0 to 7 clockwise, starting from the top left half-edge. The Spartan-3 devices in the TQ144 and CP132 packages have four
VCCO supplies as shown, but are connected together to make the equivalent of VCCO_TOP, VCCO_RIGHT, VCCO_BOTTOM, and
VCCO_LEFT.
Each of the I/O banks has a separate VCCO supply input that powers the output buffers
within the associated I/O bank. All of the VCCO connections to a specific I/O bank must be
connected to the same voltage. The VCCO voltage can be 1.2V to 3.3V, depending on the
output standard specified for a given bank.
Most devices have four I/O banks. The Spartan-3 family offers eight I/O banks in most
packages, one for each half-edge, except the TQ144 and CP132 packages, which have one
VCCO level per side. In those packages, the VCCO signals are connected together to form the
equivalent of VCCO_TOP, VCCO_RIGHT, VCCO_BOTTOM, and VCCO_LEFT.
In a 3.3V-only application, all VCCO supplies, and VCCAUX in the Extended Spartan-3A
family, connect to 3.3V. However, Spartan-3 generation FPGAs allow bridging between
different I/O voltages and standards by applying different voltages to the VCCO inputs of
different banks. Refer to “I/O Banking Rules,” page 363 for which I/O standards can be
intermixed within a single I/O bank.
In the Extended Spartan-3A family, the 3.3V supplies support the full ±10% range from
3.0V to 3.6V, simplifying the selection of the 3.3V power supply. The Spartan-3/3E families
support –10% to +5%, or 3.0V to 3.45V.
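The two tolerance windows can be captured in a small lookup. The following Python sketch is illustrative only: the dictionary and function names are hypothetical, and the limits are the ones quoted above. Always confirm the recommended operating conditions in the FPGA data sheets.

```python
# Allowed range (volts) for a nominal 3.3V supply, per the text above.
RANGE_3V3 = {
    "extended_spartan_3a": (3.0, 3.6),   # full +/-10% range
    "spartan_3": (3.0, 3.45),            # -10% to +5%
    "spartan_3e": (3.0, 3.45),           # -10% to +5%
}

def supply_3v3_ok(family: str, volts: float) -> bool:
    """Return True if a 3.3V rail measured at `volts` is inside the family's range."""
    low, high = RANGE_3V3[family]
    return low <= volts <= high
```

For example, a regulator that can drift up to 3.55V passes the check for the Extended Spartan-3A family but fails it for the Spartan-3/3E families.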
The Extended Spartan-3A family also allows input voltages (VIN) of up to 4.1V,
independent of the VCCO level. The Spartan-3/3E families restrict VIN to no more than
0.3V/0.5V above VCCO (or VCCAUX for dedicated pins). For applications requiring higher
voltages, see XAPP459, “Eliminating I/O Coupling Effects when Interfacing Large-Swing
Single-Ended Signals to User I/O Pins on Spartan-3 Generation FPGAs”.
VREF
Each I/O bank also has a separate, optional input voltage reference supply, called VREF. If
the I/O bank includes an I/O standard that requires a voltage reference such as HSTL or
SSTL, then all VREF pins within the I/O bank must be connected to the same voltage. The
VREF pins are available as I/O pins if no standards within a bank require them.
Xilinx recommends always separating VREF from VTT because the VTT supply is very noisy. A
stable VREF using a small LDO is the desirable implementation. A voltage divider
implementation is also possible. Knowledge of the PCB environment, such as frequency of
coupled noise, is required to correctly calculate the resistance and capacitance values of the
divider circuit. As a result, an isolated reference supply is usually a more robust and
simpler approach.
Power Estimation
Xilinx provides a number of spreadsheet and Internet-based power estimation tools,
power analyzers, and power-related documentation to meet all power solutions needs.
The Power Solutions page on xilinx.com provides access to these tools, documentation,
news, and supply solutions.
There are two recommended ways to estimate the total power consumption (quiescent
plus dynamic) for a specific design:
• The XPower Power Estimator spreadsheet provides quick, approximate, typical
estimates, and does not require a netlist of the design. (The Spartan-3 family uses the
“Web Power Tool”.)
• The XPower Analyzer is delivered with ISE® software and uses a netlist as input to
provide maximum estimates as well as more accurate typical estimates.
Voltage Regulators
The choice of a voltage regulator depends on system requirements and the estimated
power consumption requirements for the FPGA. Use the XPower tools to calculate the
requirements for a specific device and design. If the design is not complete or the XPower
tool does not support the target device, use the closest match in the Spartan-3 generation
families that is supported. Then choose a regulator from a pin-compatible family so the
current capability can be adjusted up or down. External power FETs are easy to upgrade. A
softstart feature that controls output ramp time is useful.
With care, use of overcurrent protection is possible, such as foldback or fuses. In this case,
apply VCCAUX no later than VCCINT to avoid the surplus ICCINT current (see “Surplus
ICCINT if VCCINT is Applied before VCCAUX,” page 491). Also be aware that capacitors will
be charging at power-on and might draw a significant amount of current for a short time.
If necessary, slow the supply voltage ramp to control the charge current. If foldback is not
a design requirement, it is best to avoid it, keeping the power supply design simple.
Various power supply manufacturers offer complete power solutions for Xilinx FPGAs
including some with integrated three-rail regulators specifically designed for
Spartan-3 generation FPGAs. The Xilinx Power Solutions website provides links to vendor
solution guides and Xilinx power estimation and analysis tools.
Power-On Behavior
Spartan-3 generation FPGAs have a built-in Power-On Reset (POR) circuit that monitors
the three power rails required to successfully configure the FPGA (see Figure 18-1). At
power-up, the POR circuit holds the FPGA in a reset state until the VCCINT, VCCAUX, and
VCCO Bank 2 supplies reach their respective input threshold levels (see the respective
FPGA data sheets). In the Spartan-3 family, the POR input is VCCO Bank 4, the equivalent
half-edge to VCCO Bank 2 in the Spartan-3E and Spartan-3A/3AN/3A DSP devices.
References to POR and VCCO_2 apply to VCCO_4 in the Spartan-3 family. After all three
supplies reach their respective thresholds, the POR reset is released and the FPGA begins
its configuration process.
[Figure 18-1: POR circuit block diagram. Recoverable labels: glitch filters on VCCAUX, VCCO_2, and PROG_B; Reset; Configuration Logic.]
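The POR release condition described above can be modeled as a simple check that every monitored rail has reached its threshold. This is a minimal sketch: the function name is hypothetical, and the threshold values in the example dictionary are placeholders, not data sheet numbers.

```python
def por_released(vccint, vccaux, vcco_por, thresholds):
    """Return True once all three monitored rails are at or above threshold.

    `vcco_por` is VCCO Bank 2 (VCCO Bank 4 in the Spartan-3 family).
    `thresholds` maps "vccint", "vccaux", and "vcco" to threshold voltages,
    which must be taken from the FPGA data sheet.
    """
    return (vccint >= thresholds["vccint"]
            and vccaux >= thresholds["vccaux"]
            and vcco_por >= thresholds["vcco"])

# Placeholder thresholds for illustration only (NOT data sheet values).
EXAMPLE_THRESHOLDS = {"vccint": 1.0, "vccaux": 2.0, "vcco": 1.0}
```

Because the check is a plain logical AND, the order in which the rails come up does not matter; the reset is released only when the last rail crosses its threshold.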
Supply Sequencing
Because the three FPGA supply inputs must be valid to release the POR and can be
supplied in any order, there are no FPGA-specific voltage sequencing requirements.
Applying the FPGA’s VCCINT supply last uses the least ICCINT current.
Although the FPGA has no specific voltage sequence requirements, be sure to consider any
potential sequencing requirement of the configuration device attached to the FPGA, such
as an SPI serial Flash PROM, a parallel NOR Flash PROM, or a microcontroller. For
example, Flash PROMs have a minimum time requirement before the PROM can be
selected, and this time must be considered if the 3.3V supply is the last in the sequence. See
the Spartan-3 Generation Configuration User Guide (UG332) for more details.
For the Spartan-3AN devices, when configuring from the In-System Flash, VCCAUX must
be in the recommended operating range. On power-up make sure VCCAUX reaches at least
3.0V before INIT_B goes High to indicate the start of configuration. VCCINT, VCCAUX, and
VCCO supplies to the FPGA can be applied in any order if this requirement is met.
When all three supplies are valid, the minimum current required to power on the FPGA
equals the worst-case quiescent current, specified in the FPGA data sheets.
Spartan-3 generation FPGAs do not require Power-On Surge (POS) current to successfully
configure.
Independent of the hot-swap pin, if VCCO is applied after VCCINT and VCCAUX, the internal
pull-ups are enabled for all I/Os from the time VCCO reaches approximately 0.4V until
VCCO exceeds VCCINT. If this pull-up is not desired, the user should avoid this power
sequence or place pull-down resistors to hold the pin at a Low logic level. Selection of a
pull-down value should be based on the minimum resistor value of the FPGA data sheet
(RPU, VCCO = 1.14V) and the VIL maximum specification of the downstream device.
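While the internal pull-up is enabled, it forms a resistive divider with the external pull-down. The sketch below assumes the pull-up behaves as a simple resistor of value RPU to VCCO, which is a simplification. Under that assumption the pin voltage is VCCO x Rpd / (Rpu + Rpd), and keeping it at or below VIL requires Rpd <= Rpu x VIL / (VCCO - VIL). The function name and the example numbers are hypothetical; use the minimum RPU from the FPGA data sheet and the VIL maximum of the downstream device.

```python
def max_pulldown_ohms(r_pu_min: float, vcco: float, vil_max: float) -> float:
    """Largest pull-down (ohms) that keeps the pin at or below `vil_max`.

    r_pu_min: minimum internal pull-up resistance (ohms), from the data sheet.
    vcco:     VCCO voltage while the pull-up is active (volts).
    vil_max:  VIL maximum of the downstream device (volts).
    """
    if not 0 < vil_max < vcco:
        raise ValueError("expected 0 < vil_max < vcco")
    return r_pu_min * vil_max / (vcco - vil_max)

# Placeholder numbers for illustration: 2 kilohm pull-up, 1.2V, VIL = 0.4V.
example_limit = max_pulldown_ohms(2000.0, 1.2, 0.4)  # about 1000 ohms
```

Choosing the minimum pull-up resistance is the conservative case, because a stronger pull-up needs a stronger (smaller) pull-down to hold the pin Low.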
Ramp Rate
To ensure successful power-on, VCCINT, VCCO Bank 2, and VCCAUX supplies must rise
through their respective threshold-voltage ranges with no dips. The FPGA data sheets
specify any ramp rate requirements. The Spartan-3 family has no ramp rate requirements
for the current revision; refer to the Spartan-3 FPGA Family Data Sheet (DS099) for
specifications for earlier versions. The Spartan-3E family has ramp rate requirements from
0.2 to 50 ms. The Extended Spartan-3A family device ramp rate requirements are from 0.2
to 100 ms.
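The ramp-time windows quoted above lend themselves to a simple design-rule check. This sketch is not a Xilinx tool; the names are hypothetical, and the windows should be confirmed against the data sheet for the specific device and revision.

```python
# (min_ms, max_ms) ramp-time window per family, from the text above.
# None means no ramp-rate requirement for the current device revision.
RAMP_WINDOW_MS = {
    "spartan_3": None,
    "spartan_3e": (0.2, 50.0),
    "extended_spartan_3a": (0.2, 100.0),
}

def ramp_time_ok(family: str, ramp_ms: float) -> bool:
    """Return True if a supply ramp time (ms) meets the family's requirement."""
    window = RAMP_WINDOW_MS[family]
    if window is None:
        return True
    low, high = window
    return low <= ramp_ms <= high
```

A 75 ms ramp, for instance, is acceptable for the Extended Spartan-3A family but exceeds the Spartan-3E window.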
Hot Swap
Hot swap, also known as hot plug or hot insertion, refers to plugging an unpowered board
into a powered system. To support hot swap, an unpowered board or device must be able
to be plugged directly into a powered system or backplane without affecting or damaging
the system or the board/device. Devices that support hot swap include the following I/O
features:
• Signals can be applied to I/O pins before powering the device
• I/O pins are high-impedance (that is, 3-stated) before and during the power-up and
configuration processes
• There is no current path from the I/O pin back to the voltage supplies
While all Spartan-3 generation families can be used in hot-swap applications, they do not
offer the same levels of support. The Extended Spartan-3A family is fully hot-swap
compliant to the definition provided above. The Spartan-3/3E families require sequenced
connectors to make sure power is applied to the FPGA before the I/Os receive signals.
During the power-down sequence, to cleanly transition from valid signals to the disabled
state, remove VCCO first (VCCO < 0.5V) in the Extended Spartan-3A family, and remove
VCCAUX before VCCINT in the Spartan-3 and Spartan-3E families. See UG332, the Spartan-3
Generation Configuration User Guide for more details.
Saving Power
Lower power consumption not only reduces power supply requirements but also reduces
heat, which increases reliability and might allow for smaller form factor packaging and
eliminate heat sinks and fans. Xilinx FPGAs are designed to minimize power consumption
without sacrificing high performance and low cost.
Dynamic power consumption can be reduced by reducing the number or frequency of
nodes and I/O toggling in a design. The lowest power state is the quiescent state with no
inputs toggling, all outputs disabled, and no pull-up or pull-down resistors in use. In this
state, the power consumption equals the sum of the power required for each power supply.
Consider the following techniques to eliminate any unnecessary switching in a design and
reduce dynamic power:
• Bring all incoming signals to a static state
• Apply rail-to-rail levels to inputs wherever possible
• Use signals that swing from GND to VCCO
• Turn off as many outputs as possible
[Figure: BUFGCE global clock buffer with Clock and Enable inputs. UG331_c18_02_081906]
Avoid using logic to generate gated or multiple clocks. Using CLB logic on a clock signal
introduces route-dependent skew and makes the design sensitive to the timing hazards of
lot-to-lot variations.
Minimizing the amount of routing a clock net uses is helpful. The Xilinx software
automatically disables clock nets in unused columns of CLBs, so reduce the number of
clock columns in use by concentrating the clocked logic in the fewest possible columns of
CLBs. Also reduce the number of rows that the clock is driving.
Floorplanning can be helpful to minimize clock power. Partition logic driven by global
clocks into clock regions and reduce the number of clock regions to which each global clock
is routed. Organize the design into independent clock domains, and clock each domain at
the lowest possible frequency.
Even if the clock cannot be manipulated, the activity on the loads can be controlled
through the use of clock enables to reduce switching activity on the outputs of flip-flops.
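The effect of these techniques can be reasoned about with the usual first-order CMOS approximation, P = activity x C x V^2 x f. This relation is not stated in this guide and is offered only as a rough mental model; the function name and the example values are hypothetical. Use the XPower tools for real estimates.

```python
def dynamic_power_w(activity: float, capacitance_f: float,
                    voltage_v: float, freq_hz: float) -> float:
    """First-order dynamic power estimate in watts.

    activity:      fraction of clock cycles in which the node toggles (0..1).
    capacitance_f: switched capacitance in farads.
    voltage_v:     supply voltage in volts.
    freq_hz:       clock frequency in hertz.
    """
    return activity * capacitance_f * voltage_v ** 2 * freq_hz

# Placeholder numbers: the same logic clocked at 100 MHz and at 50 MHz.
full_rate = dynamic_power_w(0.25, 1e-9, 1.2, 100e6)
half_rate = dynamic_power_w(0.25, 1e-9, 1.2, 50e6)
```

In this model, halving the clock frequency or halving the activity (for example, with a clock enable that is active every other cycle) each halves the dynamic power of the affected nodes.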
Power-Off Mode
In some cases, the device can simply be powered off to save power. This is useful for
designs where the FPGA has lengthy periods of non-operation and the power
consumption must be as low as possible. For the Extended Spartan-3A family, all three
supplies can be removed even if signals are still toggling on the inputs (see Figure 18-3).
For the Spartan-3E and Spartan-3 families, VCCO must be kept at a valid level to keep the
power diodes off (see Figure 18-4). VCCINT and VCCAUX can still be removed, reducing the
total static current to the typical ICCO quiescent value. External FETs can be used to switch
power. Configuration and memory data will be lost, so the FPGA must be reconfigured
after powering on again.
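[Figure 18-3: Power-Off mode, Extended Spartan-3A family. Recoverable labels: VCCO, 2.5V/3.3V and 1.2V supplies, power switch, PROG_B.]
[Figure 18-4: Power-Off mode, Spartan-3/3E FPGA. Recoverable labels: VCCO, 2.5V and 1.2V supplies, power switch, VCCAUX, VCCINT, VCCO, I/O bank, external device (VCC), PROG_B.]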
To enter the Power-Off state, first pull PROG_B Low to turn the outputs off and initialize
the configuration memory to all zeros. After INIT_B and DONE go Low, switch off VCCINT
and VCCAUX. To restore power, reapply VCCINT and VCCAUX and then pull PROG_B back
High. After INIT_B goes High, reconfigure the FPGA and return to user mode.
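The entry and exit sequence above can be sketched as two short routines. The `FakeBoard` class is a hypothetical stand-in that only records actions and mimics the pin behavior described in the text; real control code would drive actual power switches and poll INIT_B and DONE with a timeout.

```python
class FakeBoard:
    """Hypothetical stand-in for board control; records each action."""

    def __init__(self):
        self.log = []
        self.pins = {"INIT_B": 1, "DONE": 1}

    def set_prog_b(self, level):
        self.log.append(f"PROG_B={level}")
        # Model the FPGA response: PROG_B Low clears the device, so INIT_B
        # and DONE go Low; PROG_B High lets INIT_B return High.
        self.pins["INIT_B"] = level
        self.pins["DONE"] = 0

    def switch_core_supplies(self, on):
        self.log.append("VCCINT/VCCAUX " + ("on" if on else "off"))

    def configure(self):
        self.log.append("reconfigure")
        self.pins["DONE"] = 1


def enter_power_off(board):
    board.set_prog_b(0)                      # outputs off, memory cleared
    if board.pins["INIT_B"] or board.pins["DONE"]:
        raise RuntimeError("INIT_B and DONE must be Low before removing power")
    board.switch_core_supplies(on=False)     # remove VCCINT and VCCAUX


def exit_power_off(board):
    board.switch_core_supplies(on=True)      # reapply VCCINT and VCCAUX
    board.set_prog_b(1)
    if not board.pins["INIT_B"]:
        raise RuntimeError("INIT_B must be High before reconfiguring")
    board.configure()                        # return to user mode
```

The checks on INIT_B and DONE mirror the ordering in the text: power is removed only after the FPGA acknowledges PROG_B, and configuration starts only after INIT_B returns High.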
Suspend Mode
The Extended Spartan-3A family FPGA Suspend Mode reduces power to below quiescent
current levels while saving the state of the device, including all configuration and user
data. For details on the Suspend Mode, see Chapter 19, “Power Management Solutions.”
including stray inductance on the PCB as well as capacitive loading at receivers. Any SSO-
induced voltage consequently affects internal switching noise margins and ultimately
signal quality.
The number of SSOs allowed for quad-flat packages (VQ, TQ, PQ) is lower than for ball
grid array packages (FG) due to the larger lead inductance of the quad-flat packages. The
results for chip-scale packaging are better than quad-flat packaging but not as good as for
ball grid array packaging. Ball grid array packages are recommended for applications with
a large number of simultaneously switching outputs.
Each FPGA data sheet provides guidelines for the recommended maximum allowable
number of Simultaneous Switching Outputs (SSOs). These guidelines describe the
maximum number of user I/O pins of a given output signal standard that should
simultaneously switch in the same direction, while maintaining a safe level of switching
noise. Meeting these guidelines for the stated test conditions ensures that the FPGA
operates free from the adverse effects of ground and power bounce.
Large-Swing Signals
Under recommended operating conditions, the User I/O and Dual-Purpose pins of
Spartan-3 generation FPGAs handle signals that swing anywhere from 1.2V to 3.3V. The
Dedicated pins of these FPGAs normally use the LVCMOS25 standard. To handle signals
with a larger swing than is ordinarily recommended, see the guidelines in XAPP459,
Eliminating I/O Coupling Effects when Interfacing Large-Swing Single-Ended Signals to User I/O
Pins on Spartan-3 Generation FPGAs.
Related Documents
• Power Solutions (http://www.xilinx.com/power)
• Signal Integrity
(http://www.xilinx.com/products/design_resources/signal_integrity)
• XAPP453, The 3.3V Configuration of Spartan-3 FPGAs
• XAPP623, Power Distribution System (PDS) Design - Using Bypass/Decoupling Capacitors
• UG332, Spartan-3 Generation Configuration User Guide
• Spartan-3 Generation Data Sheets
Chapter 19
[Figure 19-1 plot. Bars (left axis): quiescent current in mA for ICCINT, ICCAUX, and ICCINT (Suspend). Lines (right axis): quiescent power for VCCAUX = 3.3V, VCCAUX = 2.5V, and Suspend. Example designs: Blank, 32 LVDS, 32 LVDS + 8 DCM.]
Figure 19-1: Effects of Suspend Mode on Example Designs Measured on Typical XC3S1400A
Figure 19-1 also shows four bars, indicating the typical quiescent current on the VCCINT
and VCCAUX supplies under normal quiescent conditions with all clocks stopped and
during Suspend mode. The associated current measurement, in mA, appears along the
left-side vertical axis. Note that the current on VCCAUX during Suspend mode is near the
base of the chart, highlighted in burgundy.
Furthermore, Figure 19-1 shows the quiescent power (current multiplied by the voltage
applied to each power rail). The associated resulting power measurement, in mW, appears
on the right-side vertical axis. On Spartan-3A/3A DSP FPGAs, VCCAUX can be either 2.5V
or 3.3V nominally. Because power is current multiplied by voltage, the quiescent power is lower when VCCAUX = 2.5V. Note the
significant reduction in total power consumption when the Spartan-3A FPGA is in
Suspend mode. Although the total power savings is design dependent, Suspend mode
typically reduces power consumption by 40% or more, with a minimum power savings of
about 20%.
During Suspend mode, some of the circuitry powered by the VCCAUX supply is switched
over to the VCCINT supply. Note the Blank design example in Figure 19-1. The current on
the VCCINT supply actually increases while the current on the VCCAUX supply drops
significantly! Fortunately, the total VCCINT current during Suspend remains below that
used in an active FPGA application. Furthermore, despite the increased VCCINT current,
the overall system power is reduced because current is being switched from the 2.5V or
3.3V VCCAUX supply to the 1.2V VCCINT supply.
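The arithmetic behind this observation is simply power = current x voltage, summed over the two rails. In the sketch below the function name is hypothetical and the currents are invented for illustration; they are not read from Figure 19-1.

```python
def quiescent_power_mw(iccint_ma: float, iccaux_ma: float,
                       vccint: float = 1.2, vccaux: float = 2.5) -> float:
    """Total quiescent power in mW on the VCCINT and VCCAUX rails."""
    return iccint_ma * vccint + iccaux_ma * vccaux

# Invented currents: move 10 mA of load from VCCAUX to VCCINT.
before = quiescent_power_mw(iccint_ma=20.0, iccaux_ma=30.0)
after = quiescent_power_mw(iccint_ma=30.0, iccaux_ma=20.0)
```

Even with the total current unchanged, `after` is lower than `before` because each milliamp moved to the 1.2V rail costs less power than it did on the 2.5V or 3.3V rail. In actual Suspend mode the VCCAUX current also drops, so the savings are larger.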
The power savings are more pronounced in the 32LVDS and 32LVDS+8DCM examples
because both designs use circuitry that consumes current on the VCCAUX supply.
[Figure: block diagram of entering Suspend mode. Recoverable labels: FPGA inputs (blocked); writable clocked elements (flip-flops, SRL16, block RAM, latches, LUT RAM; write-protected); FPGA outputs (SUSPEND constraint); SUSPEND pin with glitch filter (suspend_filter) and Suspend enable (en_suspend); AWAKE. UG331_c19_01_113006]
The FPGA can only enter Suspend mode if enabled in the configuration bitstream (see
“Enable the Suspend Feature”). Once power is applied to the system, the FPGA always
powers up and configures regardless of the value applied to the SUSPEND pin. Once
enabled via the bitstream, the FPGA unconditionally and quickly enters Suspend mode if
the SUSPEND pin is asserted. If Suspend is not enabled in the bitstream, the SUSPEND
input will have no effect and the AWAKE pin will be usable as a general-purpose I/O.
When the FPGA enters Suspend mode, all nonessential FPGA functions are shut down to
minimize power dissipation. The FPGA retains all application state and configuration data
while in Suspend mode. All writable clocked elements are write-protected against spurious
write operations. All FPGA inputs and interconnects are shut down.
Each FPGA output pin or bidirectional I/O pin assumes its defined Suspend mode
behavior, which is described as part of the FPGA design using a “SUSPEND Constraint”.
The AWAKE pin goes Low, indicating that the FPGA is in Suspend mode. The DONE pin
remains High while the FPGA is in Suspend mode because the FPGA does not lose its
configuration data.
[Figure 19-3: Suspend mode entry and exit waveforms with numbered markers. AWAKE output: tSUSPENDHIGH_AWAKE (3), tSUSPENDLOW_AWAKE (8). Flip-flops, block RAM, and distributed RAM write protected: tSUSPEND_GWE (2), tAWAKE_GWE (10). FPGA outputs defined by SUSPEND constraint: tSUSPEND_GTS (4), tAWAKE_GTS (9). FPGA inputs and interconnect blocked: tSUSPEND_DISABLE (5), tSUSPEND_ENABLE (7).]
[Figure 19-4: block diagram of exiting Suspend mode. Recoverable labels: re-enable FPGA inputs; set/reset flip-flops enable (en_sw_gsr); unlock clocked elements (sw_gwe_cycle: 1, 5, 1,024); activate outputs (sw_gts_cycle: 1, 4, 1,024); wake-up timing clock source (sw_clk); SUSPEND pin with glitch filter (suspend_filter) and Suspend enable (en_suspend); AWAKE (drive_awake).]
Figure 19-4 is a block diagram showing how to exit Suspend mode using the SUSPEND
pin.
When SUSPEND goes Low, the FPGA automatically re-enables all inputs and
interconnects.
If enabled in the FPGA bitstream, all flip-flops are optionally, globally set or reset
according to the FPGA design description. By default, the flip-flops are not globally set or
reset, which preserves the state of the FPGA application before entering Suspend mode.
The remaining wake-up process depends on the logic value applied to the AWAKE Pin.
Once AWAKE goes High, two user-programmable timers define when FPGA outputs are
re-enabled and when the write-protect lock is released from all writable clocked elements.
The wake-up timing clock source is also programmable.
Items 6 through 10 correspond to the markers in Figure 19-3, page 502:
6. The system drives the FPGA’s SUSPEND input Low, causing the FPGA to exit
Suspend mode.
7. The FPGA releases the inputs and interconnect, allowing signals to propagate
internally. There is no danger of corrupting the internal state because all clocked
elements are still write protected.
8. The FPGA asserts the AWAKE signal with the bitstream option drive_awake:yes. If the
option is drive_awake:no, then the FPGA releases AWAKE to become an open-drain
output. In this case, an external pull-up resistor is required or an external signal must
drive AWAKE High before the FPGA continues to awaken. All subsequent timing is
measured from when the AWAKE output goes High.
9. The FPGA switches output behavior from the specified SUSPEND Constraint to the
function specified in the FPGA application. The timing of this switch-over is controlled
by the Suspend/Wake sw_gts_cycle bitstream generation setting, which defines when
the FPGA’s internal Global Three-State (GTS) control is released. After the specified
number of clock cycles, the outputs are active according to normal FPGA application.
By default, the outputs switch over four clock cycles after AWAKE goes High. The
outputs are generally released before the clocked elements to allow signals to
propagate out of the FPGA.
10. The writable, clocked elements are released according to the Suspend/Wake
sw_gwe_cycle bitstream generator setting, which defines when the FPGA’s internal
Global Write Enable (GWE) control is asserted. After the specified cycle, it is again
possible to write to flip-flops, block RAM, distributed RAM (LUT RAM), shift registers
(SRL16), and I/O latches. By default, the clocked elements are released five clock
cycles after AWAKE goes High. Generally, the write-protect lock should be held until
after outputs are enabled.
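The delays in items 9 and 10 follow directly from the cycle settings and the wake-up clock period. The sketch below uses the default settings from the text (four and five cycles). The 1 to 1,024 range is an assumption read from the labels in the Figure 19-4 block diagram, the 50 MHz value in the usage example is an assumed clock frequency, and the function name is hypothetical.

```python
def wake_delays_ns(clk_hz: float, sw_gts_cycle: int = 4, sw_gwe_cycle: int = 5):
    """Return (outputs_enabled_ns, write_protect_released_ns) after AWAKE goes High."""
    for name, cycles in (("sw_gts_cycle", sw_gts_cycle),
                         ("sw_gwe_cycle", sw_gwe_cycle)):
        # Assumed legal range, taken from the block diagram labels.
        if not 1 <= cycles <= 1024:
            raise ValueError(f"{name} must be between 1 and 1,024")
    period_ns = 1e9 / clk_hz
    return sw_gts_cycle * period_ns, sw_gwe_cycle * period_ns

# Assumed 50 MHz wake-up clock with the default cycle settings.
gts_ns, gwe_ns = wake_delays_ns(50e6)
```

With these assumptions the outputs switch over 80 ns after AWAKE goes High and the clocked elements are released at 100 ns, which keeps the write-protect lock in place until after the outputs are enabled.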
It is good design practice to apply a Reset to any design DCMs after exiting the Suspend
mode so that the DCM will re-acquire lock. See “Momentarily Stopping CLKIN,” page 152.
Figure 19-5: UCF Constraint Defining Suspend Mode Behavior for an I/O pin
Via BitGen
Note: Setting the en_suspend bitstream option is an alternate way to enable the Suspend mode.
However, this method is not recommended because it does not automatically reserve the AWAKE pin
in the application.
bitgen -g en_suspend:Yes
The output drivers for the “true” differential I/O standards (LVDS, RSDS, mini-LVDS,
PPDS, TMDS) are high impedance, using one of the 3STATE attributes described in
Table 19-4. The DRIVE_LAST_VALUE attribute is not supported for differential output
drivers.
Treat the pseudo-differential I/O standards, such as BLVDS, LVPECL, DIFF_HSTL, and
DIFF_SSTL, as two single-ended I/O pins. All the attributes apply as for “Single-Ended
I/O Standards” although the settings must be set appropriately for the complementary
pair.
When in the high-impedance state, the differential driver pair does not conduct current to
the power or ground rails, or between adjacent pins.
Differential input receivers are disabled in Suspend mode.
Differential input termination (DIFF_TERM) is disabled when in Suspend mode.
SUSPEND Constraint
The SUSPEND constraint allows each pin to have an individually defined behavior during
Suspend mode. The available options are in Table 19-4, page 506.
UCF Example
Figure 19-6 shows an example UCF constraint that defines the Suspend mode behavior for
a specific pin. The SUSPEND constraint can be included on the same UCF line as other
constraints for a pin.
Figure 19-6: UCF Constraint Defining Suspend Mode Behavior for an I/O pin
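As an illustration of the kind of line such a constraint file contains, the following Python helper formats a SUSPEND constraint for a net. Several things here are assumptions: the `NET "..." SUSPEND = "...";` form follows general UCF syntax, the list of values should be checked against Table 19-4 and the Constraints Guide, and the net name `led_out` and the helper itself are hypothetical. The restriction on DRIVE_LAST_VALUE for "true" differential outputs comes from the text above.

```python
# Assumed set of SUSPEND values; verify against Table 19-4.
SUSPEND_VALUES = {
    "DRIVE_LAST_VALUE",
    "3STATE",
    "3STATE_PULLUP",
    "3STATE_PULLDOWN",
    "3STATE_KEEPER",
}

def suspend_ucf_line(net: str, value: str, true_differential: bool = False) -> str:
    """Format one UCF line that sets the Suspend-mode behavior of `net`."""
    if value not in SUSPEND_VALUES:
        raise ValueError(f"unknown SUSPEND value: {value}")
    if true_differential and value == "DRIVE_LAST_VALUE":
        # Not supported for LVDS, RSDS, mini-LVDS, PPDS, or TMDS outputs.
        raise ValueError("DRIVE_LAST_VALUE is not supported for differential outputs")
    return f'NET "{net}" SUSPEND = "{value}";'

example_line = suspend_ucf_line("led_out", "3STATE_PULLUP")
```

Generating the lines from a table of nets makes it easier to review the Suspend behavior of every output in one place before running the implementation tools.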
More Information
For additional information on the SUSPEND constraint, see the Constraints Guide for the
latest software version (http://www.xilinx.com/support/documentation/dt_ise.htm).
• Constraints Guide for ISE® Software
http://www.xilinx.com/support/documentation/dt_ise.htm
en_sw_gsr:No, which means that clocked elements are not set or reset when the FPGA
awakens and all states are preserved.
[Figure: wake-up timing clock source (sw_clk). Recoverable labels: STARTUP_SPARTAN3A CLK, StartupClk, ~50 MHz internal oscillator. UG331_c19_06_092606]
SUSPEND Pin
When the Suspend feature is enabled (see “Enable the Suspend Feature,” page 505), the
SUSPEND pin controls when the FPGA enters Suspend mode. During normal FPGA
operation, the SUSPEND pin must be Low. When High, the SUSPEND pin forces the FPGA
into the low-power Suspend mode. Table 19-6 describes the functionality of the SUSPEND
pin.
If the Suspend feature is not enabled for an application (the application never enters low-
power mode), then connect the SUSPEND pin to GND.
Characteristics
The SUSPEND pin is an LVCMOS/LVTTL receiver, and power to the input buffer is
supplied by the VCCAUX power rail. The SUSPEND pin has no pull-up resistors during
configuration, and the PUDC_B control has no effect on the SUSPEND pin.
AWAKE Pin
The AWAKE pin optionally provides status on the Suspend power-savings mode.
The AWAKE pin can further be configured as an open-drain output (the default) or a full-
swing output driver, as shown in Figure 19-8. This behavior is controlled by a bitstream
generator (BitGen) option:
bitgen -g drive_awake:no
[Figure 19-8: AWAKE pin output options. Recoverable labels: LVCMOS, 12 mA, FAST output drivers.]
The AWAKE output pin is supplied by the VCCO power rail on bank 2 when Suspend
mode is enabled.
When drive_awake:yes, the AWAKE pin is an active output driver, equivalent to a user
I/O configured as LVCMOS, with 12 mA output drive and a Fast slew rate.
Do not use any other JTAG instructions when in Suspend mode or while transitioning into
and out of Suspend Mode. Furthermore, do not enter Suspend mode when performing a
Readback operation.
VCCO can be reduced to 1.0V during SUSPEND mode, but this also affects the voltage
levels for any output pin with a SUSPEND="DRIVE_LAST_VALUE" constraint.
The FPGA’s power-on reset (POR) circuit continues to monitor the VCCINT and VCCAUX
supplies. The POR circuit does not monitor the VCCO supplies after configuration. By
default, if the VCCINT or VCCAUX supply dips below the minimum specified data sheet
voltage limit, then the FPGA restarts configuration.
Hibernate
Hibernate provides the maximum possible power savings for applications that can be
turned off for long periods of time and that can afford to lose the present application state.
Power is supplied to VCCO lines throughout the hibernation period. Figure 19-9 shows
how to put Spartan-3 generation FPGAs into Hibernate.
During the Hibernation period, the VCCINT and VCCAUX rails are turned off. Power FETs
with low “on” resistance are recommended to perform the switching action. Configuration
data is lost upon entering Hibernate; therefore, the device will reconfigure after exiting the
state.
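[Figure 19-9: Hibernate power control for a Spartan-3E FPGA. Recoverable labels: VCCO, 2.5V and 1.2V supplies, power control, VCCAUX, VCCINT; one I/O bank connected to an inactive device (Low or Hi-Z); one I/O bank connected to an active, powered device with the VCCO supply maintained; HSWAP; PROG_B.]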
Holding the PROG_B input Low during the transition into the Hibernation period keeps all
FPGA output drivers in a high-impedance state. Release PROG_B after re-applying power
to the VCCINT and VCCAUX rails. See “Design Considerations,” page 518 for recommended
levels on Dedicated and Dual-Purpose pins.
1. Turn off power to any external devices connected to a particular FPGA I/O bank. If
both the FPGA I/O bank and the external device are unpowered, there is no current
flow.
2. If the external device is powered but the FPGA I/O bank is not, then ensure that all
signals driving into the FPGA are either high-impedance (Hi-Z) or that they are under
0.5V. Both cases ensure that there is no current flow through the FPGA’s power diodes.
Voltages higher than 0.5V can turn on the power diodes. Keep the diodes off to prevent
"reverse current" from flowing into the VCCO rail.
[Figure 19-10: waveforms for entering and exiting Hibernate. Signals: PROG_B, I/Os, VCCINT, VCCAUX, VCCO (per I/O bank), INIT_B (undefined while unpowered), DONE (undefined while unpowered), CCLK in Master mode, STARTUP cycles.]
Figure 19-10 shows the waveforms for entering and exiting Hibernate. The steps for
entering Hibernate are as follows:
1. Pull the PROG_B pin Low to force all user-I/O pins and Input-only pins into a high-
impedance state.
2. The FPGA drives the INIT_B and DONE pins Low.
3. External switches turn off the VCCINT and VCCAUX supply rails to the FPGA.
Depending on the FPGA product family and the application, it might also be possible
to turn off power to the VCCO supply rail.
• See “Extended Spartan-3A Family FPGA: Turn Off VCCO,” page 516
• See “Spartan-3E and Spartan-3 FPGAs: Maintain VCCO on I/O Banks Connected
to Powered External Devices,” page 516
4. The FPGA is now in Hibernate. While the FPGA is kept in this state, power
consumption rests at the lowest possible level.
Exiting Hibernate
The steps for exiting Hibernate are as follows.
1. Reapply power to all rails that were switched off. Apply power in any sequence.
2. Before FPGA initialization can begin, deassert PROG_B to a High logic level. The
rising transition on PROG_B must occur after turning all three power supplies back on.
3. After logic initialization, the FPGA releases the open-drain INIT_B signal. With
INIT_B High, the FPGA starts its configuration process.
4. When configuration is complete, the FPGA enters the Startup phase, asserts DONE,
and enables the I/Os, according to how the BitGen options are set.
5. The FPGA is now ready for user operation.
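The key ordering rule in the exit steps is that PROG_B may rise only after every rail that was switched off is back on, while the rails themselves may return in any order. The sketch below checks a recorded event trace against that rule. The event names and the function are hypothetical and do not correspond to any Xilinx tool output.

```python
def exit_sequence_valid(events, rails_switched_off):
    """Check a Hibernate exit trace.

    events:             ordered list of strings such as "VCCINT_ON" or "PROG_B_HIGH".
    rails_switched_off: rails that were removed during Hibernate,
                        for example {"VCCINT", "VCCAUX"}.
    """
    if "PROG_B_HIGH" not in events:
        return False
    # Everything that happened before the rising edge on PROG_B.
    before_prog_b = set(events[:events.index("PROG_B_HIGH")])
    required = {f"{rail}_ON" for rail in rails_switched_off}
    return required <= before_prog_b
```

A trace such as `["VCCAUX_ON", "VCCINT_ON", "PROG_B_HIGH"]` passes for rails `{"VCCINT", "VCCAUX"}`, while a trace that raises PROG_B before VCCAUX returns does not.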
Design Considerations
Be aware of how various pins are powered in the application. Most user-I/O pins,
including the Dual-Purpose configuration pins, are powered by a specific VCCO supply
input. The Dedicated configuration pins are powered by the VCCAUX supply. If
disconnecting power to any of these supplies, consider how that will affect FPGA
configuration when power is re-applied.
For specific information on configuration pins and their associated power rails, refer to the
“Configuration Pins and Behavior during Configuration” chapter in UG332: Spartan-3
Generation Configuration User Guide.
If disconnecting power to VCCO or VCCAUX supplies on Spartan-3 or Spartan-3E FPGAs
during Hibernate, do not apply voltages on the pins in excess of 0.5 V to ensure that the
power diodes are kept off. This restriction does not apply to Extended Spartan-3A family
FPGAs, which have a floating N-well structure for improved hot-swap performance. For
more information, see XAPP459: Eliminating I/O Coupling Effects when Interfacing Large-
Swing Single-Ended Signals to User I/O Pins on Spartan-3 Generation FPGAs.
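The power-diode rule above can be expressed as a small check for each pin that is driven while its supply is off. This sketch covers only the unpowered-supply case described in this section and does not replace the VIN limits in the data sheets; the names are hypothetical.

```python
POWER_DIODE_LIMIT_V = 0.5  # from the text: do not exceed 0.5V on an unpowered bank

def pin_drive_allowed(family: str, bank_powered: bool, pin_volts=None) -> bool:
    """Return True if driving the pin respects the Hibernate power-diode rule.

    pin_volts is the voltage applied by the external device, or None if the
    external driver is high-impedance (Hi-Z).
    """
    if bank_powered:
        return True  # rule applies only when the supply is removed
    if family == "extended_spartan_3a":
        return True  # floating N-well structure: restriction does not apply
    return pin_volts is None or pin_volts <= POWER_DIODE_LIMIT_V
```

For a Spartan-3E bank with VCCO removed, a Hi-Z driver or a signal held at 0.3V is acceptable, while a 3.3V signal is not.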
Chapter 20
Introduction
As I/O switching frequencies have increased and voltage levels have decreased, accurate
analog simulation of I/Os has become an essential part of modern high-speed digital
system design. By accurately simulating the I/O buffers, termination, and circuit board
traces, designers can significantly shorten their time-to-market of new designs. Identifying
signal integrity related issues at the beginning of the design cycle decreases the required
number of board fixes and increases quality.
The device data sheets provide basic information about guaranteed DC and switching
characteristics of the I/Os. However, the data sheet does not include all the information
required to determine the best board implementation for a particular application, such as
slew rates and drive strength, which are included in the IBIS model. Designers can use IBIS
models for system-level analysis of signal integrity issues, such as ringing, crosstalk, and
RFI/EMI. Complete designs can be simulated and evaluated before going through the
expensive and time consuming process of producing prototype PCBs. This type of pre-
layout simulation can reduce considerably the development cost and time to market, while
increasing the reliability of the I/O operation.
performance. SPICE simulation has a further disadvantage in that not all SPICE simulators
are fully compatible. Often, default simulator options are not the same in different SPICE
simulators. As there are some very powerful options that control accuracy, convergence
and the algorithm type, any options that are not consistent might give rise to poor
correlation in simulation results across different simulators. Also, because of the different
variants of SPICE, these models are often incompatible between simulators, thus models
must be extracted for a specific simulator.
IBIS Background
IBIS, originally developed by Intel, is an alternative to SPICE simulation. The IBIS
specification now is maintained by the EIA/IBIS Open Forum, which has members from a
large number of IC and EDA vendors. IBIS is the ANSI/EIA-656 and IEC 62014-1 standard.
For more information about the IBIS specification, see
http://www.eigroup.org/ibis/ibis.htm.
The core of the IBIS model consists of a table of current versus voltage and timing
information. This is very attractive to the IC vendor as the I/O internal circuit is treated as
a black box. This way, transistor-level information about the circuit and process details is
not revealed.
IBIS models can be used to model best-case and worst-case conditions (best-case = strong
transistors, low temperature, high voltage; worst-case = weak transistors, high
temperature, low voltage). The “fast/strong” model represents best-case conditions, while
the "slow/weak" model represents worst-case conditions. The "typical" model represents
typical behavior.
IBIS cannot be used for internal timing information (propagation delays and skew); the
timing models instead provide that information. IBIS also does not model power and
ground structures or pin-to-pin coupling. The implications are that ground bounce, power
supply droop, and simultaneous switching output (SSO) noise cannot be simulated with
IBIS models. Instead, Xilinx provides device/package-dependent SSO guidelines once
extensive lab measurements are completed. IBIS models also do not provide detailed
package parasitic information. Package parasitics usually are provided in the form of
lumped RLC data, which loses its accuracy at higher speeds. To model the package
parasitics accurately, include a transmission line with a delay of 25 ps to 100 ps and an
impedance of 65Ω.
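As a rough worked example, a lossless transmission line with characteristic impedance Z0 and one-way delay td has a total inductance of L = Z0 × td and a total capacitance of C = td / Z0. The sketch below evaluates these relations at the 25 ps and 100 ps endpoints for a 65Ω line. The function name is illustrative, and the lossless-line relations are a standard approximation, not values taken from Xilinx package data.

```python
def line_totals(z0_ohms, delay_s):
    """Return (total inductance in H, total capacitance in F) of a lossless line.

    Uses Z0 = sqrt(L/C) and td = sqrt(L*C), so L = Z0 * td and C = td / Z0.
    """
    return z0_ohms * delay_s, delay_s / z0_ohms


if __name__ == "__main__":
    for delay_ps in (25, 100):
        l_h, c_f = line_totals(65.0, delay_ps * 1e-12)
        print(f"{delay_ps} ps: L = {l_h * 1e9:.3f} nH, C = {c_f * 1e12:.3f} pF")
```

These totals give a feel for how much lumped inductance and capacitance the suggested transmission line stands in for.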
Using IBIS models has a great advantage to the user in that simulation speed is
significantly increased over SPICE, while accuracy is only slightly decreased. Non-
convergence, which can be a problem with SPICE models and simulators, is eliminated in
IBIS simulation. Virtually all EDA vendors presently support IBIS models, and ease of use
of these IBIS simulators is generally very good. IBIS models for most devices are freely
available over the Internet making it easy to simulate several different manufacturers’
devices on the same board. Several different IBIS simulators are available today, and each
simulator provides different results. An overshoot or undershoot of ±10% of the measured
result is tolerable. Differences between the model and measurements occur because not all
parameters are modeled. Simulators for IBIS models are provided by Cadence, Hyperlynx,
Mentor, and Intusoft.
IBISWriter
A Xilinx IBIS file downloaded from the Web contains a collection of IBIS models for all I/O
standards available in the targeted device. The ISE® tool can generate IBIS models specific
to your design via the IBISWriter tool, simplifying design export into signal integrity
analysis tools. IBISWriter associates IBIS buffer models to each pin of the customer design
according to the design specification for each I/O buffer. IBISWriter outputs an IBS file that
can be used directly as an input file to your signal integrity analysis tool.
Generating design-specific IBIS files requires only three easy steps:
1. Implement your design in Project Navigator.
2. In the Processes window, under Implement Design/Place & Route, select Generate
IBIS Model and click Run. A design-specific file is generated where all input/output
pins are associated with an IBIS model.
3. Import this file into your preferred signal integrity analysis tool to perform the
desired simulations.
References
IBIS files are available at the Xilinx Download Center:
http://www.xilinx.com/support/download/index.htm
Chapter 21
Boundary-Scan Overview
Boundary-Scan testing is used to identify faulty board-level connections, such as
unconnected or shorted pins. Boundary-Scan tests allow designers to quickly identify
manufacturing or layout problems, which otherwise could be nearly impossible to isolate,
especially with high-count ball-grid packages. More recently, PLD vendors such as Xilinx
have made use of Boundary-Scan as a convenient way of configuring devices, including
the Spartan-3 generation FPGA families. For details on configuration through Boundary-
Scan, see UG332, Spartan-3 Generation Configuration User Guide.
IEEE Standards
Joint Test Action Group (JTAG) is the commonly used name for IEEE standard 1149.1,
which defines a method for Boundary-Scan. JTAG compliant devices have dedicated
hardware that comprises a state machine and several registers to allow Boundary-Scan
operations. This dedicated hardware interprets instructions and data provided by four
dedicated signals: TDI (Test Data In), TDO (Test Data Out), TMS (Test Mode Select), and
TCK (Test Clock). The JTAG hardware interprets instructions and data on the TDI and TMS
signals, and drives data out on the TDO signal. The TCK signal is used to clock the process
(see Figure 21-1).
[Figure 21-1: diagram not reproduced. Recoverable labels: TMS and TCK inputs; TAP
controller states including Exit1-DR/IR, Pause-DR/IR, Exit2-DR/IR, and Update-DR/IR;
Instruction Decoder; Select Data Register; Bypass[1] Register; IDCODE[32] Register;
Boundary-Scan[N] Register; TDO output.]
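The way TMS steers the TAP controller on each TCK cycle can be sketched as a lookup table. The table below follows the standard 16-state IEEE 1149.1 TAP controller. The names `TAP_NEXT` and `walk` are illustrative and do not come from any Xilinx tool.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state, tms_bits):
    """Apply one TMS value per TCK rising edge and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


if __name__ == "__main__":
    # Five TCK cycles with TMS held High reach Test-Logic-Reset from any state.
    print(all(walk(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT))
    print(walk("Test-Logic-Reset", [0, 1, 1, 0, 0]))
```

Holding TMS High for five clocks is the usual way to bring a chain into a known state without a dedicated reset pin.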
In the Spartan-3 generation FPGAs, the four JTAG signals TDI, TDO, TMS, and TCK are on
dedicated pins powered by VCCAUX. Each can be configured with a pullup (default),
pulldown, or neither, through bitstream options. IEEE 1532 is a superset of the IEEE 1149.1
JTAG standard. IEEE 1532 provides additional flexibility for configuring programmable
logic devices. IEEE Std 1532 enables designers to concurrently program multiple devices,
minimize programming times with enhanced silicon features, and produce robust systems
that are more easily maintained. This standard defines the three additional items required
to configure in-system programmable logic devices:
• Device architectural components for configuration
• Algorithm description framework
• Configuration data file
General information on the IEEE 1532 JTAG standard is available at:
http://www.xilinx.com/products/design_resources/config_sol/isp_standards_specs.htm
Boundary-Scan Functions
Spartan-3 generation devices support the mandatory IEEE 1149.1 commands, as well as
several Xilinx vendor-specific commands. The EXTEST, INTEST, SAMPLE/PRELOAD,
BYPASS, IDCODE, USERCODE, and HIGHZ instructions are all included. The TAP also
supports internal user-defined registers (USER1, USER2) and configuration/readback of
the device.
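As an illustration of what the IDCODE instruction returns, the sketch below splits a 32-bit value into the fields listed in the IDCODE_REGISTER attribute of the xc3s50_pq208.bsd excerpt in the BSDL Files section of this chapter: 4 version bits, 7 family bits, 9 array-size bits, 11 manufacturer bits, and a final 1. The function and constant names are illustrative, and the version bits, shown as XXXX in the BSDL file, are assumed to be 0000 here.

```python
def decode_idcode(idcode):
    """Split a 32-bit JTAG IDCODE into the fields used in the BSDL excerpt."""
    return {
        "version": (idcode >> 28) & 0xF,        # 4 bits
        "family": (idcode >> 21) & 0x7F,        # 7 bits
        "array_size": (idcode >> 12) & 0x1FF,   # 9 bits
        "manufacturer": (idcode >> 1) & 0x7FF,  # 11 bits
        "marker": idcode & 0x1,                 # always 1, required by 1149.1
    }


# Bit pattern copied from the XC3S50 BSDL excerpt; version assumed to be 0000.
XC3S50_IDCODE = int("0000" "0001010" "000001101" "00001001001" "1", 2)

if __name__ == "__main__":
    print(hex(XC3S50_IDCODE))
    print(decode_idcode(XC3S50_IDCODE))
```

When comparing a value read from hardware against the BSDL pattern, mask off the version field, because it can differ between silicon revisions.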
Boundary-Scan Tools
Boundary-Scan testing requires specialized test equipment and software. The Boundary-
Scan test software is used to generate test vectors, which are typically delivered to the
Boundary-Scan chain using a test pod connected to a PC.
To develop vectors for Boundary-Scan testing, the test software must be provided with
information about the scan chain:
1. The composition of the scan chain - how many devices, what type, and so forth.
The chain composition can be either specified by the user or automatically detected by
the Boundary-Scan software.
2. The Boundary-Scan architecture of each device - the Instruction Register length,
opcodes, number of I/Os, and how each of those I/Os behaves.
The Boundary-Scan architecture of each device is defined in a Boundary Scan
Description Language (BSDL) file.
3. How the device I/Os are connected to each other.
This information typically is extracted from a board-level netlist.
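The first two items above can be sketched as a small data structure. The example below describes a chain in TDI-to-TDO order and works out how many Instruction Register bits and BYPASS bits surround one target device. The XC3S50 Instruction Register length of 6 comes from its BSDL file; the other two devices, their lengths, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChainDevice:
    name: str
    ir_length: int  # Instruction Register length from the device's BSDL file


def bypass_context(chain, target):
    """Count the bits contributed by the other devices around chain[target].

    chain is listed from TDI to TDO. With every other device in BYPASS, each
    one adds its full IR length to an instruction scan and 1 bit to a data scan.
    """
    before, after = chain[:target], chain[target + 1:]
    return {
        "total_ir_bits": sum(d.ir_length for d in chain),
        "ir_bits_before": sum(d.ir_length for d in before),
        "ir_bits_after": sum(d.ir_length for d in after),
        "bypass_bits_before": len(before),
        "bypass_bits_after": len(after),
    }


if __name__ == "__main__":
    chain = [
        ChainDevice("hypothetical_prom", 8),  # hypothetical device and length
        ChainDevice("XC3S50", 6),             # length from xc3s50_pq208.bsd
        ChainDevice("hypothetical_cpld", 4),  # hypothetical device and length
    ]
    print(bypass_context(chain, target=1))
```

Boundary-Scan software performs this bookkeeping automatically once it knows the chain composition and has a BSDL file for each device.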
BSDL Files
Any manufacturer of a JTAG-compliant device must provide a BSDL file for that device.
The BSDL file contains information on the function of each of the pins on the device -
which are used as I/Os, which are power or ground, and so forth. All Xilinx BSDL files
have file extensions of .bsd.
BSDL files for Xilinx devices are available in the development system and on the Xilinx
website at http://www.xilinx.com/support/download/index.htm. BSDL files for other
manufacturers typically can be found on the manufacturer's website.
Files for the IEEE 1532 extension to the BSDL files are also available for Xilinx products.
They are included with the other BSDL files.
IEEE 1149.1 BSDL files appear as: <device_name>.bsd
For example: xc3s50.bsd
These BSDL files are the only ones needed for programming. For JTAG testing, the
package-specific files are used.
For example: xc3s50_pq208.bsd
"CCLK_P104:P104," &
"DONE_P103:P103," &
"HSWAP_EN_P206:P206," &
"M0_P55:P55," &
"M1_P54:P54," &
"M2_P56:P56," &
"PROG_B:P207," &
"TCK:P159," &
"TDI:P208," &
"TDO:P158," &
"TMS:P160," &
"VCCAUX:(P17,P38,P69,P89,P121,P142,P173,P193)," &
"VCCINT:(P70,P88,P174,P192)," &
"VCCO0:(P188,P201)," &
"IO_P2:P2," &
"IO_P3:P3," &
5. use statements
The use statement calls VHDL packages that contain the attributes, types, constants, and
other definitions referenced in the BSDL file.
Example (from the xc3s50_pq208.bsd file):
use STD_1149_1_1994.all;
6. Scan Port Identification
The Scan Port Identification identifies the JTAG pins: TDI, TDO, TMS, TCK, and TRST
(if used). TRST is an optional JTAG pin that is not used by Xilinx devices.
Example (from the xc3s50_pq208.bsd file):
attribute TAP_SCAN_IN of TDI : signal is true;
attribute TAP_SCAN_MODE of TMS : signal is true;
attribute TAP_SCAN_OUT of TDO : signal is true;
attribute TAP_SCAN_CLOCK of TCK : signal is (33.0e6, BOTH);
7. TAP description
The TAP description provides additional information on the device’s JTAG logic. Some
of this information includes the Instruction Register length, Instruction Opcodes, and
device IDCODE. These characteristics are device specific, and can vary widely from
device to device.
Examples (from the xc3s50_pq208.bsd file):
attribute COMPLIANCE_PATTERNS of XC3S50_PQ208 : entity is
"(PROG_B) (1)";
attribute INSTRUCTION_LENGTH of XC3S50_PQ208 : entity is 6;
attribute INSTRUCTION_OPCODE of XC3S50_PQ208 : entity is
"EXTEST (000000)," &
attribute INSTRUCTION_CAPTURE of XC3S50_PQ208 : entity is
"XXXX01";
attribute IDCODE_REGISTER of XC3S50_PQ208 : entity is
"XXXX" & -- version
"0001010" & -- family
"000001101" & -- array size
"00001001001" & -- manufacturer
"1"; -- required by 1149.1
8. Boundary Register description
The Boundary Register description gives the structure of the Boundary-Scan cells on
the device. Each pin on a device can have up to three Boundary-Scan cells, each cell
consisting of a register and a latch. Boundary-Scan test vectors are loaded into or
scanned from these registers.
Example (from the xc3s50_pq208.bsd file):
attribute BOUNDARY_REGISTER of XC3S50_PQ208 : entity is
" 0 (BC_2, *, controlr, 1)," &
" 1 (BC_2, IO_P161, output3, X, 0, 1, PULL0)," & -- PAD30
" 2 (BC_2, IO_P161, input, X)," & -- PAD30
Because certain connections between the Boundary-Scan registers and pad might change,
the Boundary-Scan architecture is effectively changed when the device is configured.
These changes often need to be communicated to the Boundary-Scan tester through a post-
configuration BSDL file. If the changes to the Boundary-Scan architecture are not reflected
in the BSDL file, Boundary-Scan tests might fail.
Xilinx offers the BSDLAnno utility to automatically modify the BSDL file for post-
configuration testing. BSDLAnno obtains the necessary design information from the
routed .ncd file and generates a BSDL file that reflects the post-configuration Boundary-
Scan architecture of the device.
Use the following syntax to generate a post-configuration BSDL file with BSDLAnno:
bsdlanno [options] infile[.ncd] outfile[.bsd]
The infile is the routed (post-PAR) NCD design source file for the specified design. The
outfile[.bsd] is the destination for the design-specific BSDL file. The .bsd extension
is optional. For more details on BSDLAnno, including suggested user modifications to the
output file, see the Development System Reference Guide at
http://www.xilinx.com/support/documentation/dt_ise.htm
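For scripted flows, the BSDLAnno syntax above can be wrapped in a small helper. This is a minimal sketch that assumes the bsdlanno executable is on the system path. The helper names and the file names are hypothetical, and no options are passed because none are listed here.

```python
import subprocess


def bsdlanno_command(infile, outfile, options=()):
    """Build the argument list: bsdlanno [options] infile[.ncd] outfile[.bsd]."""
    return ["bsdlanno", *options, infile, outfile]


def run_bsdlanno(infile, outfile):
    """Run BSDLAnno and raise CalledProcessError if it exits with an error."""
    return subprocess.run(bsdlanno_command(infile, outfile), check=True)


if __name__ == "__main__":
    # Hypothetical file names; only prints the command that would be run.
    print(" ".join(bsdlanno_command("top_routed.ncd", "top_post_config.bsd")))
```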
Software Support
Xilinx offers several tools for generating device files and for device programming.
Boundary-Scan test functionality is available from several third-party vendors, as noted at
http://www.xilinx.com/products/design_resources/config_sol/resource/isp_ate.htm.
iMPACT
iMPACT is a full featured software tool used for configuration and programming of all
Xilinx FPGAs, CPLDs, and PROMs. It features a series of "wizard" dialogs that easily guide
the user through every step of the configuration process. iMPACT supports a host of
output file types including SVF. iMPACT configuration software enables users to easily
configure Xilinx FPGAs using different modes: slave serial, SPI, SelectMAP (Slave
Parallel), and JTAG IEEE 1149.1. iMPACT supports the Parallel Cable IV and Platform
Cable USB.
iMPACT features a special function in the JTAG mode to test both the operation of the cable
and the robustness of the JTAG chain. The user can test chain operation by instructing
iMPACT to write to and read back from the user code location multiple thousands of times.
It then counts the number of errors that occur in this operation. This gives the user the
opportunity to evaluate the relative robustness of the JTAG chain and the susceptibility to
noise and other influences like board layout.
For more information on iMPACT, see the iMPACT help at
http://www.xilinx.com/support/documentation/dt_ise.htm
SVF Files
Serial Vector Format (SVF) is an industry-standard file format that is used to describe JTAG
chain operations in a compact, portable fashion. SVF files capture all of the device specific
programming information within the SVF instructions. SVF files are useful because
intricate knowledge of the device is not needed. The capability to create SVF files is
included in the iMPACT tool. For more information on SVF files, see XAPP503 at
http://www.xilinx.com/support/documentation/application_notes/xapp503.pdf.
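To give a feel for the format, the sketch below writes a very small SVF sequence that checks the IDCODE of a single-device chain. It relies on the IEEE 1149.1 rule that a device with an ID register selects it after Test-Logic-Reset, so no instruction opcode is needed. The function name is illustrative, the expected value is the XC3S50 pattern from its BSDL file with the version bits masked out, and files produced by iMPACT contain considerably more than this.

```python
def idcode_check_svf(expected_idcode, mask=0x0FFFFFFF):
    """Return SVF text that scans 32 bits from the DR and compares to IDCODE.

    Assumes a chain containing exactly one device. The default mask ignores
    the four version bits in the most significant nibble.
    """
    lines = [
        "! Illustrative IDCODE check for a single-device chain",
        "STATE RESET;",
        "STATE IDLE;",
        f"SDR 32 TDI (00000000) TDO ({expected_idcode:08X}) MASK ({mask:08X});",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(idcode_check_svf(0x0140D093))
```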
[Figure: BSCAN_SPARTAN3A component symbol (UG331_c22_02_082306). Ports shown: TDO1, TDO2,
TCK, TMS, TDI, CAPTURE, DRCK1, DRCK2, RESET, SEL1, SEL2, SHIFT, UPDATE.]
Spartan-3 generation FPGAs provide hooks for two user-definable scan chains through the
USER1 and USER2 instructions. These instructions can be used to provide access to the
user design through the JTAG interface. To take advantage of the optional USER1 and
USER2 instructions, the designer must instantiate the BSCAN_SPARTAN3A macro in the
source code, and wire it to the user-defined scan chain. Only one BSCAN_SPARTAN3A
component can be used in any single design.
The BSCAN_SPARTAN3A component is generally used with IP, such as the ChipScope™
PRO analyzer, for communications via the JTAG pins of the FPGA to the internal device
logic. When used with this IP, this component is generally instantiated as a part of the IP
and nothing more is needed by the user to ensure it is properly used. If, however, custom
access is desired, the BSCAN_SPARTAN3A component can be instantiated and connected
to the design to get this functionality. All appropriate pins should be connected to the
internal logic. For details on using boundary scan in Spartan-3 generation FPGAs, see
Chapter 9, JTAG Configuration Mode and Boundary-Scan, in UG332.
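The behavior expected of a user-defined scan chain can be sketched in software before writing HDL. The model below is an assumption based on standard JTAG data register behavior and on the port names: `capture` stands for the CAPTURE phase, `shift` for one DRCK1 clock while SHIFT is active with USER1 selected (SEL1), and `update` for the UPDATE phase. The value returned by `shift` is what the user logic would drive onto TDO1. The class is illustrative Python, not a description of the primitive's exact timing.

```python
class UserScanRegister:
    """Behavioral sketch of a user data register behind the USER1 instruction."""

    def __init__(self, width):
        self.width = width
        self.shift_reg = 0  # bits currently in the scan path
        self.hold_reg = 0   # value presented to user logic after update

    def capture(self, parallel_in):
        """CAPTURE phase: load status bits from the design into the scan path."""
        self.shift_reg = parallel_in & ((1 << self.width) - 1)

    def shift(self, tdi):
        """SHIFT phase, one clock: return the TDO1 bit and shift TDI in at the top."""
        tdo = self.shift_reg & 1
        self.shift_reg = (self.shift_reg >> 1) | ((tdi & 1) << (self.width - 1))
        return tdo

    def update(self):
        """UPDATE phase: transfer the shifted-in bits to the holding register."""
        self.hold_reg = self.shift_reg


if __name__ == "__main__":
    reg = UserScanRegister(4)
    reg.capture(0b1010)
    print([reg.shift(bit) for bit in (1, 1, 0, 0)])
    reg.update()
    print(bin(reg.hold_reg))
```

Separating the shift register from the holding register keeps user logic from seeing partially shifted data, which is the same reason Boundary-Scan cells pair a register with a latch.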