A Time-Multiplexed FPGA

Steve Trimberger, Dean Carberry, Anders Johnson, Jennifer Wong
Xilinx, Inc.
2100 Logic Drive
San Jose, CA 95124
408-559-7778
steve.trimberger@xilinx.com

Abstract
This paper describes the architecture of a time-multiplexed
FPGA. Eight configurations of the FPGA are stored in on-chip
memory. This inactive on-chip memory is distributed
around the chip, and accessible so that the entire configuration of the FPGA can be changed in a single cycle of the
memory. The entire configuration of the FPGA can be loaded
from this on-chip memory in 30ns. Inactive memory is accessible as block RAM for applications. The FPGA is based on
the Xilinx XC4000E FPGA, and includes extensions for
dealing with state saving and forwarding and for increased
routing demand due to time-multiplexing the hardware.

Background

This paper reports on an architecture development project that
began in 1991. The architecture builds on the work of Ong
[1995], who proposed rapidly reconfiguring an FPGA to
increase logic capacity. Bhat [1993], DeHon [1995] and Tau
[1995] also described rapidly-reconfigured FPGAs. These
devices were deficient in at least one of the following critical
areas:

Device Capacity. Fundamentally, logic is shared to
increase capacity. A base FPGA of low capacity is impractical, since a larger non-time-multiplexed FPGA is faster
and easier to use.
State Storage. Although combinational logic can be shared,
state values cannot. They must be stored or forwarded until
they are used. Simplistic solutions to this problem result in
FPGA logic resources consumed with saving state. Very
little remains for implementing logic.
Memory. Time-multiplexed systems consume memory for
configuration data and for staged design data. Facilities
must be provided for this data.
Static Logic. All systems include some logic that must
always be active and cannot be multiplexed. A usable
time-multiplexed device must also be able to supply this
non-time-multiplexed logic.

This paper is divided into three sections. First is an introduction to the method of time multiplexing used in this architecture. The second section discusses modes of operation of a
time-multiplexed device. The third section describes the
device itself and highlights the device features and how they
support the modes of operation.

The Basic Concept

The time-multiplexed FPGA is an extension of the Xilinx
XC4000E product family. We gain logic capacity by dynamically re-using hardware. We add SRAM bits rather than
CLBs.
The FPGA holds one active configuration and eight inactive
configurations. The configuration memory is distributed
throughout the die, with each configuration memory cell
backed by eight bits of inactive storage in the configuration
SRAM. This distributed inactive memory can be viewed as
eight configuration memory planes (figure 1). Each plane is
a very large word of memory (100,000 bits in a 20x20
device). When the device is flash reconfigured, all bits in the
logic and interconnect array are updated simultaneously
from one memory plane. This process takes about 5ns. After
flash reconfiguration, about 25ns is required for signals in
the design to settle.

Configuration memory planes can be loaded from off-chip
while the FPGA is operating. They can also be read and
written by on-chip logic, giving applications access to a single large block of RAM.
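
As a rough illustration of the configuration model described above, the following Python sketch treats each memory plane as a list of bits and models flash reconfiguration as a bulk copy into the active configuration. The class and method names (TimeMultiplexedConfig, load_plane, flash_reconfigure) and the 16-bit plane width are assumptions made for illustration; only the plane count and the approximate 5ns and 25ns figures are taken from the text.

```python
PLANES = 8        # inactive configuration planes (from the text)
FLASH_NS = 5      # approximate flash reconfiguration time (from the text)
SETTLE_NS = 25    # approximate settling time afterwards (from the text)


class TimeMultiplexedConfig:
    """Toy model: one active configuration backed by eight inactive planes."""

    def __init__(self, bits_per_plane):
        self.bits_per_plane = bits_per_plane
        self.planes = [[0] * bits_per_plane for _ in range(PLANES)]
        self.active = [0] * bits_per_plane

    def load_plane(self, index, bits):
        """Write an inactive plane; the active configuration is untouched."""
        if len(bits) != self.bits_per_plane:
            raise ValueError("plane width mismatch")
        self.planes[index] = list(bits)

    def flash_reconfigure(self, index):
        """Copy every bit of one plane into the active configuration at once.

        Returns the modelled time in ns until the new logic has settled.
        """
        self.active = list(self.planes[index])
        return FLASH_NS + SETTLE_NS


fpga = TimeMultiplexedConfig(bits_per_plane=16)  # tiny plane, illustration only
fpga.load_plane(3, [1, 0] * 8)                   # load while "operating"
active_before = list(fpga.active)
elapsed_ns = fpga.flash_reconfigure(3)
```

In this model, loading plane 3 leaves the active configuration unchanged, and the switch itself accounts for the 5ns flash step plus the 25ns settling time.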

0-8186-8159-4/97 $10.00 © 1997 IEEE

Figure 1. Time-Multiplexed FPGA Configuration Model

Modes of Operation
A rapidly-reconfigurable FPGA is a mere curiosity without a
model of use that can be used as a design target and automated. We envision three modes of operation of the device:
logic engine mode, time-share mode and static mode.

Logic Engine Mode

In logic engine (LE) mode, the time-multiplexing capability
of the FPGA is used to emulate a single large design. Designs
are modelled as Mealy state machines (figure 2). Combinational logic receives inputs from the device inputs and from
flip flop outputs; and device outputs come from combinational logic and from flip flops. The combinational logic can
be split into pieces and LUTs in the FPGA can be time-multiplexed during the calculation of those results.

Figure 2. Logic Model

When operating in logic engine mode, the FPGA sequences
through multiple configurations called microcycles. The
sequencing of microcycles is synchronized with the user's
clock. One pass through all microcycles is called a user
cycle. All combinational logic is evaluated and all flip-flop
values are updated in one user cycle.

In this mode, the FPGA is reconfigured several times per
user clock cycle -- the reconfiguration clock is faster than
the user clock. Partial results from one configuration of the
device must be saved and passed to subsequent configurations. Storage must be provided for flip flops, which cannot
be time shared, since all values are required at future times.

Time Share Mode

In time share (TS) mode, the FPGA emulates several independent, communicating FPGAs in a virtual hardware environment. The FPGA remains in a single configuration for
multiple user clock cycles before switching to another configuration -- the reconfiguration clock is slower than the user
clock rate. The FPGA may reconfigure at irregular intervals
or upon interrupt. To support virtual logic in time share
mode, values computed in one configuration must be stored
and shared with logic in other configurations. When a configuration is re-loaded into the FPGA, its state must be
restored so it can resume operation as if it had never been
removed.

Static Mode
In static mode, the FPGA, or part of it, does not appear to be
reconfiguring at all. Static mode is used to build logic that
must always be resident and active -- for example the logic
that controls the time share or logic engine sequencing, or
asynchronous logic. We implement static logic by programming memory cells of multiple configuration bits to be identical. When the new configuration is loaded, it operates
identically. We ensure that reconfiguration does not glitch
control points if they are unchanged, so the emulated logic
operates without interruption.

Partial reconfiguration is not sufficient to support static
logic, since dynamic logic interconnect may pass through
static logic regions on the chip. A design where a region of
the device is not reconfigured would allow static logic to
exist, but such a design is too restrictive because it prevents
device resources in the static region from being accessed by
dynamic logic.

Figure 3. Time Share Model
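
The way static logic is built, by programming the same value into every plane for a given control point, can be sketched in a few lines of Python. This is only an illustration under assumed names (make_static, is_static) and a hypothetical 3-bit plane width; it is not the device's programming interface.

```python
PLANES = 8  # configuration planes, as in the text


def make_static(planes, bit_index, value):
    """Program the same value for one control point in every plane."""
    for plane in planes:
        plane[bit_index] = value


def is_static(planes, bit_index):
    """A control point is static if all planes agree on its value."""
    return len({plane[bit_index] for plane in planes}) == 1


# Hypothetical 3-bit planes: plane p holds the binary digits of p, so every
# control point changes value somewhere in the sequence of planes.
planes = [[(p >> k) & 1 for k in range(3)] for p in range(PLANES)]
static_before = [i for i in range(3) if is_static(planes, i)]

make_static(planes, bit_index=0, value=1)
static_after = [i for i in range(3) if is_static(planes, i)]

# Value seen at control point 0 while stepping through every plane in turn.
observed = [planes[p][0] for p in range(PLANES)]
```

After the call, control point 0 reads the same value in every plane, so stepping through the planes never changes it, while control points 1 and 2 remain dynamic.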
Memory Access
In addition to these operation modes, a configuration memory plane can be used as a block RAM of approximately
100,000 bits. The RAM mode allows user designs to read and
write the memory directly, leading to the ability to create
self-modifying hardware. Although this mode of operation is
intriguing, we can see only one application: building a clever
loader (perhaps decrypting data as it configures the plane).
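
To make the RAM mode more concrete, here is a minimal Python sketch of user logic reading and writing the inactive planes through a single port. The 8-, 16- and 32-bit access widths are those given for the memory access port in the User Memory section. Everything else is an assumption for illustration: the class name UserMemoryPort, byte addressing, the flat address layout (plane number in the upper part of the address), little-endian byte order, and a plane size rounded to 12,500 bytes.

```python
PLANES = 8
PLANE_BYTES = 12_500   # roughly 100,000 bits per plane (from the text)


class UserMemoryPort:
    """Toy model of a memory port over the inactive configuration planes."""

    def __init__(self):
        self.planes = [bytearray(PLANE_BYTES) for _ in range(PLANES)]

    @staticmethod
    def decode(address):
        """Split a flat byte address into (plane, offset). Assumed layout."""
        plane, offset = divmod(address, PLANE_BYTES)
        if not 0 <= plane < PLANES:
            raise IndexError("address outside configuration memory")
        return plane, offset

    def _locate(self, address, width_bits):
        if width_bits not in (8, 16, 32):
            raise ValueError("port supports 8-, 16- or 32-bit access")
        plane, offset = self.decode(address)
        nbytes = width_bits // 8
        if offset + nbytes > PLANE_BYTES:
            raise IndexError("access crosses a plane boundary")
        return plane, offset, nbytes

    def write(self, address, value, width_bits):
        plane, offset, nbytes = self._locate(address, width_bits)
        self.planes[plane][offset:offset + nbytes] = value.to_bytes(nbytes, "little")

    def read(self, address, width_bits):
        plane, offset, nbytes = self._locate(address, width_bits)
        return int.from_bytes(self.planes[plane][offset:offset + nbytes], "little")


port = UserMemoryPort()
addr = 2 * PLANE_BYTES + 4          # byte 4 of plane 2 under the assumed layout
port.write(addr, 0xDEADBEEF, 32)
word = port.read(addr, 32)
low_half = port.read(addr, 16)
low_byte = port.read(addr, 8)
```

A loader of the kind mentioned above would use the same write path to fill a plane before it is activated.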
Mixed Modes
We expect that these operation modes will be mixed on the
chip, as shown in figure 4. A single application may have a
few memory planes used as memory, and the logic part of the
array split between static logic and time-shared logic or logic
engine logic. A common occurrence of mixed modes is on
the chip outputs in Logic Engine mode: the output side of the
IOs must be static to properly emulate the outputs of the
design. However, inputs may be dynamic, cycling in logic
engine mode.

Figure 4. Mixed Modes

Figure 5. Time Multiplexed FPGA Architecture

Configurable Logic Block

As stated previously, the reconfigurable FPGA is based on
the XC4000E device. Figure 5 shows a block diagram of the
CLB. All configuration points in the device are backed by
eight memory cells, as shown by the shadowed boxes. All
interconnect points are similarly backed by memory cells. A
significant difference between this CLB and the XC4000E is
the addition of the micro registers near the CLB outputs on
the right side.

Micro Registers
A micro register stores the CLB output (either the combinational or sequential output) when the FPGA changes configuration. This state saving is automatic, though each micro
register can use the CLB's clock signal as a clock enable,
which can be used to control state saving. The state-save-enable feature is used to support multiple clocks in logic
engine mode.

Logic can access all values stored in micro registers from
any configuration. The micro registers are multiplexed with
the CLB's combinational and sequential results onto the
CLB outputs. Like all other configurable features of the
chip, those multiplexers are controlled by multiple configuration memory cells. Micro register signals are routed
through normal programmable interconnect to their destinations.

In logic engine mode, micro registers store intermediate values and flip flop values. This is a critically important capability. Although combinational logic can be multiplexed
among several functions, state storage cannot. As a result,
although eight LUTs in a design may share a single physical
LUT, eight flip flops must all be provided for subsequent
logic to access the results. The storage for these flip flops is
in the micro registers.

In time share mode, the micro registers serve to pass data
from one configuration to another, and to save the state of a
swapped-out configuration for restoration when it is re-activated.

In figure 5, access outside the CLB is limited to three of the
eight micro registers during any one micro cycle. This
restriction was derived empirically. We built an optimizing
scheduler to partition many designs onto a time-multiplexed
FPGA model. The results indicated that three outputs were
sufficient to allow required access to micro registers without
over-constraining placement.

Interconnect
Like CLB configuration cells, all configuration cells that control interconnect are backed by eight inactive memory cells.

Fundamentally, the signals routed from the micro registers to
their destinations represent additional nets that must be
routed, increasing wiring demand. Therefore, additional
interconnect is required on the device. The interconnect
capacity in the time-multiplexed FPGA is shown in table 1.
This capacity was chosen to provide a 97% probability of place
and route success for a full 20x20 array of CLBs with additional interconnect demand due to logic engine wiring. A
high success rate is required for a single configuration
because a single design may be composed of up to eight configurations, all of which must route successfully for the
design to operate.

[Table 1 labels: Vertical, Horizontal, Quads, Octals]

Configuration Controller

Several options on reconfiguration are controlled by the
reconfiguration controller. The controller supports time
share and logic engine modes.

Time Share Mode

To support time share mode, flash reconfiguration can be initiated by an external or internal signal. The address of the
new configuration plane can come from an internal or external source. Therefore, there is no restriction on which configuration is the next to be run. The reconfiguration
operation proceeds as follows:

1. Save all CLB flip flop values in micro registers
2. Load the new configuration
3. Restore CLB flip flop values from micro registers

Flip flops are restored to allow a swapped-out configuration
to resume operation where it ended, allowing us to swap
virtual logic in and out of the device.
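
The three-step time share reconfiguration (save, load, restore) can be sketched as follows. This is a simplified model, not the controller's actual implementation: the class name TimeShareFPGA is hypothetical, and it assumes one saved set of flip flop values per configuration, which is a simplification of the per-CLB micro registers described in the text.

```python
class TimeShareFPGA:
    """Toy time share model: each configuration keeps its own saved state."""

    def __init__(self, num_configs, num_flip_flops):
        # micro_registers[c] holds the saved flip flop values of configuration c
        self.micro_registers = [[0] * num_flip_flops for _ in range(num_configs)]
        self.flip_flops = [0] * num_flip_flops
        self.active = 0

    def reconfigure(self, next_config):
        # 1. Save all CLB flip flop values in micro registers.
        self.micro_registers[self.active] = list(self.flip_flops)
        # 2. Load the new configuration.
        self.active = next_config
        # 3. Restore CLB flip flop values from micro registers.
        self.flip_flops = list(self.micro_registers[next_config])


fpga = TimeShareFPGA(num_configs=2, num_flip_flops=1)
fpga.flip_flops[0] = 5         # configuration 0 reaches some state
fpga.reconfigure(1)            # swap in configuration 1
fresh_state = fpga.flip_flops[0]
fpga.flip_flops[0] = 9         # configuration 1 reaches its own state
fpga.reconfigure(0)            # swap configuration 0 back in
resumed_state = fpga.flip_flops[0]
fpga.reconfigure(1)
other_state = fpga.flip_flops[0]
```

Because the next configuration is passed in as an argument, the model places no restriction on which configuration runs next, matching the description above.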
Table 1. Interconnect Summary.

User Memory

A single memory access port located in the center of the chip
uses the configuration address and data lines to access the
configuration memory without interfering with FPGA logic.
The address is decoded into a memory plane and a reference
within the plane. The memory access port allows 8-bit, 16-bit
or 32-bit access to the configuration memory.

Logic Engine Mode
In logic engine mode, the sequence of reconfiguration is
known in advance. The speed of reconfiguration is critical,
since multiple configurations are required to complete a single cycle in the emulated design. The controller includes a
next-address calculator. The controller reads the memory
plane at that address and holds it, awaiting the completion of
the current micro cycle.
The reconfiguration operation (microcycle) proceeds as follows:
1. Save all CLB outputs in micro registers.
2. Activate the new configuration.
3. Perform a user memory access (if any).
4. Pre-fetch the next configuration.

User memory access (#3) and pre-fetching the next configuration (#4) can be pipelined with the operation of the logic
in the configuration (#2). Steps #3 and #4 are both memory
operations, and both use the configuration memory bus.
Therefore they must be serialized. However, since each
memory operation takes about 5ns, the delay for these operations can be completely hidden.

Notice the CLB flip flop values are not restored. The CLB
flip flop is not used in LE mode -- flip flop values are stored
in micro registers.
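
To illustrate how logic engine mode keeps state in micro registers while one physical LUT is reused, the sketch below emulates a small design (a 2-bit counter with enable and a carry output) over four microcycles. The design, the schedule, and all names (lut, MICROCYCLES, user_cycle) are assumptions chosen for illustration, and the model ignores timing, routing and the limit on micro register outputs.

```python
def lut(truth_table, *inputs):
    """Evaluate a LUT: the inputs form an index into the truth table."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return truth_table[index]


AND2 = [0, 0, 0, 1]
XOR2 = [0, 1, 1, 0]

# One entry per microcycle: (LUT contents, input names, micro register written).
MICROCYCLES = [
    (AND2, ("q0", "en"), "t"),      # t = q0 AND en
    (XOR2, ("q0", "en"), "d0"),     # next value of q0
    (XOR2, ("q1", "t"), "d1"),      # next value of q1
    (AND2, ("q1", "t"), "carry"),   # design output
]


def user_cycle(micro_regs, en):
    """Run every microcycle once, then update the emulated flip flops."""
    micro_regs["en"] = en
    for truth_table, sources, dest in MICROCYCLES:
        # Reconfigure the single physical LUT, evaluate it, and save its
        # output in a micro register for use by later microcycles.
        micro_regs[dest] = lut(truth_table, *(micro_regs[s] for s in sources))
    # Emulated flip flops live in micro registers and change once per user cycle.
    micro_regs["q0"], micro_regs["q1"] = micro_regs["d0"], micro_regs["d1"]
    return micro_regs["carry"]


regs = {"q0": 0, "q1": 0}
trace = []
for _ in range(5):
    carry = user_cycle(regs, en=1)
    trace.append((regs["q1"], regs["q0"], carry))
```

The next-state values d0 and d1 are held in their own micro registers until the end of the user cycle, so every microcycle sees the flip flop values from the start of that cycle.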

Conceptually, all micro cycles are of the same duration, but
in practice, the length of a micro cycle is the settling time --
the amount of time required for values to propagate from
micro registers through combinational logic and set up the
next-stage micro registers. This delay includes routing delay,
which is dependent on the placement and routing of the logic
plane. Therefore, each microcycle includes a duration
counter, which determines how long the FPGA remains in
the current micro cycle before proceeding to the next one.
The duration is set by software after placement and routing
is complete. The software timing verifier determines the
length of the longest path after routing and sets the microcycle duration accordingly.
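
A minimal sketch of how software might derive the duration counter values from post-route timing is shown below. The 5ns figures for flash reconfiguration and for one memory operation come from the text; the counter resolution (TICK_NS), the rule that a microcycle lasts the flash time plus the counted settling time, the function names, and the example path delays are all assumptions for illustration.

```python
import math

FLASH_NS = 5.0       # flash reconfiguration time (from the text)
MEMORY_OP_NS = 5.0   # one configuration-memory operation (from the text)
TICK_NS = 5.0        # ASSUMED resolution of the duration counter


def duration_count(longest_path_ns, tick_ns=TICK_NS):
    """Smallest whole number of counter ticks covering the settling time."""
    return math.ceil(longest_path_ns / tick_ns)


def schedule(longest_paths_ns):
    """Return (per-microcycle counts, user cycle time in ns)."""
    counts = [duration_count(p) for p in longest_paths_ns]
    user_cycle_ns = sum(FLASH_NS + c * TICK_NS for c in counts)
    return counts, user_cycle_ns


def memory_ops_hidden(count, tick_ns=TICK_NS):
    """A user access plus a pre-fetch, serialized, must fit in the settling time."""
    return 2 * MEMORY_OP_NS <= count * tick_ns


paths = [23.0, 12.5, 25.0]   # hypothetical longest paths per microcycle, in ns
counts, user_cycle_ns = schedule(paths)
all_hidden = all(memory_ops_hidden(c) for c in counts)
```

With these example delays the two serialized memory operations (10ns in total) fit inside every microcycle, which is the sense in which their delay is hidden.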

Static Mode
Since static mode is supported by programming some bits
identically, the controller has no special sequencing to support it. The control over restoring flip flops is actually stored
as bits in each CLB, so state restoration is permitted on a
per-CLB basis. This flexibility is required to allow static flip
flops to be distributed among time-shared logic. The CLBs
containing static flip flops are not overwritten during reconfiguration, while those in time-shared logic are overwritten.

Configuration Memory Design

Figure 6 shows the circuitry for the configuration memory.
Eight SRAM cells are connected to a single bit line for the
cell. The current control value is held in the latch. To write a
memory cell, a word line (Wn) is brought high allowing the
data value on the bit line to overwrite the corresponding
memory cell (MCn). Typically, only one word line on the
chip is high during a memory write.

Figure 6. Memory Circuit (labels: configuration/user memory data, bit line, latch, to CLB/interconnect control)

For flash reconfiguration, a single word line (Wn) is brought
high, enabling the corresponding memory cell onto the bit
line. When the bit line is stable, the latch is clocked to store
the new configuration value. Typically, the bit lines for the
same memory plane are read simultaneously for every control point.
The timing of pClock ensures that the bit line is stable
before the latch becomes transparent -- this allows glitch-free transitions from one configuration to the next when the
two values are the same -- a requirement for static logic.
The storage latch lets memory operations on the inactive
memory proceed without affecting the active configuration.
These operations include loading a memory plane with configuration data or design data from an outside source, and
accessing the memory through the user memory port on the
chip.

The latch also allows us to pipeline configuration fetch with
operation of the device. We can pre-fetch the next configuration of the device, then instantly switch to the next configuration by clocking the latch. This is most useful in logic
engine mode, where the configuration proceeds through a
pre-defined sequence of steps, and the next configuration
address is always known.

Power Consumption

Power consumption during reconfiguration can be very
high. Although each bit line has a very small capacitance,
there are 100,000 bit lines in a 20x20 array. Further, the signals on time-multiplexed interconnect on the FPGA are not
expected to auto-correlate from cycle to cycle, as is the case
from cycle to cycle in a traditional FPGA. A logic engine
design operating at 40MHz can consume tens of watts. We
addressed power consumption two ways. First, we lowered
the voltage swing on the bit lines. Secondly, we reduced the
number of memory cells required to configure the device.
However, power consumption remains a concern, particularly with larger devices.

Architectural and circuit innovations include:

Micro registers for state storage and forwarding partial
results. Micro registers also hold state for restoration of
logic in a virtual hardware environment.
A configuration controller that sequences configurations
intelligently for logic engine and timeshare mode.
Storage for the active configuration to allow pipelining of
configuration memory fetch with FPGA operation and for
allowing the configuration storage to be used as a block
memory efficiently.
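
The last item, storage for the active configuration, can be illustrated with a behavioural sketch of one control point: eight SRAM cells sharing a bit line, plus a latch that holds the active value. The class and method names (ControlPoint, write_cell, prefetch, clock_latch) are hypothetical, and the model is purely logical; it says nothing about circuit timing or voltage levels.

```python
class ControlPoint:
    """One control point: eight SRAM cells on a shared bit line plus a latch."""

    CELLS = 8

    def __init__(self):
        self.cells = [0] * self.CELLS   # inactive memory cells, one per plane
        self.bit_line = 0
        self.latch = 0                  # value driving the CLB or interconnect

    def write_cell(self, word_line, value):
        """Memory write: with the word line high, the bit line overwrites one cell."""
        self.bit_line = value
        self.cells[word_line] = value

    def prefetch(self, word_line):
        """Read a cell onto the bit line; the latch is not clocked."""
        self.bit_line = self.cells[word_line]

    def clock_latch(self):
        """Store the (stable) bit line value as the active configuration."""
        self.latch = self.bit_line


point = ControlPoint()
point.write_cell(2, 1)              # load plane 2 while another plane is active
active_during_write = point.latch
point.prefetch(2)                   # fetch overlaps operation of the device
active_after_prefetch = point.latch
point.clock_latch()                 # switch to the pre-fetched configuration
active_after_switch = point.latch
```

In this model the active value changes only when the latch is clocked, so writes and pre-fetches on the inactive cells never disturb the running logic.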

Layout
As shown in figure 7, the design consists of columns of memory blocks, interleaved with logic. Each memory block contains 16 bits of memory, eight of which control field
programmable logic on the left, eight on the right. The area
between the memory cell columns is used to build the field-programmable logic: lookup tables, micro registers and programmable interconnect.

Figure 7. Chip Layout Floorplan, Upper Right Corner (labels: memory columns, CLB, IOB)

The FPGA area is dominated by the memory and memory
overhead circuitry. To reduce area, we reduced the memory
cell count wherever possible. For example, the control bits
for all multiplexers are fully encoded, reducing the memory
cell count, which reduced the area and power consumption.

The layout was done in 0.5µm CMOS. The chip has not yet
been fabricated.

Summary
This paper describes an architecture and modes of operation
of a time-multiplexed FPGA. Modes of operation include:
Logic engine mode, where the device emulates a single
large FPGA.
Time share mode, where the device emulates several communicating FPGAs.
Static mode, where the logic remains active and unchanged
during configuration.

References
N. Bhat, K. Chaudhary, E.S. Kuh, Performance-Oriented
Fully Routable Dynamic Architecture for a Field Programmable Logic Device, M93/42, U.C. Berkeley, 1993.

A. DeHon, DPGA-Coupled Microprocessors: Commodity
ICs for the 21st Century, IEEE Workshop on FPGAs for
Custom Computing Machines, 1995.

D. Gajski, N. Dutt, A. Wu, S. Lin, High Level Synthesis:
Introduction to Chip and System Design, Kluwer Academic
Publishers, 1994.

R. Ong, Programmable Logic Device Which Stores More
Than One Configuration and Means for Switching Configurations, U.S. Patent 5,426,378, 1995.

E. Tau, D. Chen, I. Eslick, J. Brown, A. DeHon, A First
Generation DPGA Implementation, FPD 94 - Third Canadian Workshop on Field-Programmable Devices, 1995.

S. Trimberger, Mapping Large Designs into a Time-Multiplexed FPGA, private communication, 1997.

Xilinx, The Programmable Logic Data Book, 1996.
