Plasma: An FPGA for Million Gate Systems

by R. Amerson, R. Carter, W. Culbertson, P. Kuekes, G. Snider


Hewlett-Packard Laboratories
Lyle Albertson
Hewlett-Packard California Design Center
Palo Alto, California 94304

1. Abstract

Prototypes are invaluable for studying special purpose parallel architectures and custom computing. This paper describes a new FPGA, called Plasma, the heart of a configurable custom computing engine (Teramac) that can execute synchronous logic designs up to one million gates at rates up to one megahertz. Plasma FPGAs using 0.8 micron CMOS are packaged in large multichip modules (MCMs). A large custom circuit may be mapped onto the hardware in approximately two hours, without user intervention. Plasma introduces some innovative architecture concepts including hardware support for large multiported register files.

Keywords: FPGA, custom computing, register files

2. Introduction

Plasma, an acronym for “Programmable Logic And Switch Matrix,” is the key component of a large custom computing machine (CCM), Teramac[1]. The Plasma chip, designed specifically to address issues important to CCMs, completes a 100% fully automatic route in approximately three seconds. Its unique marriage of architecture and compiler gives it this remarkable performance. Commercial FPGAs, such as those in the Xilinx 3000 and 4000 series, require two to four orders of magnitude greater place and route time, often failing to achieve 100% route. Connecting them into a network of hundreds or thousands of chips to create a single programmable system is difficult[2], with design changes taking days or months. The custom Plasma chip introduces innovations in wiring capacity, compiling speed, functionality and robustness that overcome many of the limitations of other approaches.

2.1 Teramac

Teramac (Figure 1) is a test vehicle for computer architects at Hewlett-Packard’s Research Laboratories. An engineer can quickly synthesize a custom architecture for a specialized problem and test the design at high speed using Teramac. The name derives from tera (10^12) and “Multiple Architecture Computer.” With this tool the architect can create a machine with a million boolean functions of two variables being simultaneously evaluated at one megahertz: a trillion very small operations per second. Look-up tables perform the actual logic.

Figure 1: Four Board Teramac System

Engineers contemplating building special purpose hardware for solving unique problems must first create and test an appropriate architecture for solving the problem. Developing a custom architecture can require considerable experimentation. The flexibility of configurable computers makes them ideal for performing such architectural exploration. When accompanied by good software tools, configurable computers can be quickly configured to implement a proposed architecture and the resulting machine can be used for benchmarking and customer trials. Parameterized designs can be created with the aid of good design synthesis software, enabling a multitude of designs to be generated and tested on Teramac for an easier search of the design space.

The flexibility of configurable computers makes them cost effective when the construction of special purpose hardware is
not. It is possible to build a virtual machine that is larger than the physical machine. Configurable computers make such solutions feasible since they can be reconfigured dynamically while they solve a single problem[3]. Thus, several phases of the solution of a problem can be accelerated, each by a separate configuration. Very efficient use is made of the hardware since it is reused during each phase.

3. Design Considerations

Teramac is a system for architecture exploration. The most important issues are compile time, flexibility, debug, and robustness. The dominant driving factor to achieve these is system interconnect since no level of sophistication in the remaining system elements can compensate for an inadequate interconnect capacity. The time to map a design to the system is critical. Teramac goals were two hours to map a design from scratch, a few minutes for simple changes. Flexibility to adapt to multiple architectures is important too. Custom hardware for specialized functions is anathema. If a number of architects share the system, each with a different design, custom boards cannot be changed each time another uses it. Designs must be easy to debug, allowing probing and modifying of any signal. The tool chain must handle designs with ease and no user intervention.

Further, users should not worry about whether the system accurately represents a design. Bugs in designs are expected and should not be able to cause physical damage to the system.

3.1 Rent’s Rule

E. F. Rent observed that the number of signals crossing the boundaries of a partition of a logic design has a power law dependence on the number of gates in the partition[4]. The exponent was found to range from 0.5 to 0.7 for a variety of designs. Our experiments with a number of custom designs confirmed this. Teramac follows Rent’s rule with an exponent of approximately two-thirds.

3.2 Commercial Chips

Commercially available chips do not meet the stringent Rent’s rule requirement. The target market for these vendors values lower cost above high pinout. Customers are willing to trade long compile times during design for lower cost, higher density chips. Modifying a design by hand is an acceptable process. No commercial parts met our need, so we created a custom FPGA to ensure that we had enough wires on the chip and enough I/Os off the chip. Bare die are also needed to build a system as large as Teramac.

The tremendous connectivity of Teramac places special requirements on the chips. At the system level, 300,000 wires between 1728 chips are balanced among MCM[5], board, and inter-board wiring. State of the art CAD and manufacturing technology were used to develop an MCM containing twenty-seven Plasma chips connecting over 9000 user signal pins. Building the MCM requires tested, bare die; HP’s internal fab capability provides this. External vendors were unwilling to sell raw die that could meet our requirements.

The vendors we considered for supplying the FPGAs had proprietary formats for internal data they were unwilling to disclose. Additionally, our users need access to the same tools the vendors provide their customers, which we would need to license or develop.

3.3 Compiler

The desire for fast compilation of large designs inspired separating the problem of mapping a user circuit onto programmable hardware into smaller independent subproblems that could be attacked with O(n) algorithms. This separation guided the hardware in directions that allowed for fast compilation. As implementation problems developed or the need for choosing parameters arose, the compiler then became a tool for exploring alternatives and refinement.

3.3.1 Compiler-First Design

Early in the project we recognized fully automatic place and route of a million gates is key to success. Before any hardware was designed, we created a compiler to place and route a million gates using a variety of hypothetical interconnection schemes. Not surprisingly, the interconnect topologies that the compiler found easy to use had a very large number of wires available. Consequently, this need drove the architecture of the chip.

One early experiment with the compiler found about half of the routing resources were used across the system and about three-quarters of the PALEs were used. One might think this indicates an abundance of routing resources compared to PALEs, but it does not take into account the heavy use of routing resources in congested areas.

3.3.2 Compilation Time

Placement and routing time for standard FPGAs is much longer than desired for a custom computing machine (CCM). Although the place and route times for a single chip have decreased from overnight to minutes, this is still much greater than the three seconds we achieve using Plasma. Even at ten minutes of compilation time per chip, placing and routing 1728 off-the-shelf FPGAs will consume twelve days. This can be distributed across many workstations (if they are available), reaching a limit of ten minutes when 1728 workstations are used. System level partitioning, placement and routing cannot be distributed. For the Teramac, this takes about one hour on an HP 735 workstation.

4. Chip Description

Plasma is a routing-rich FPGA consisting of lookup tables (LUT) with configurable latches and registers on their outputs, interconnected by crossbar switches. It has many features making it ideal for reconfigurable computing applications: each chip has 336 programmable signal pins; placement and routing are insensitive to signal pin assignments; compilation time is measured in seconds; multiported register files are efficiently supported; raw die are available.

4.1 Programmable Atomic Logic Elements (PALE)

The basic primitive for implementing logic is the PALE. It consists of a six-input, two-output universal gate, each output of
which is followed by a pair of latches that can be configured as an edge triggered flip-flop. The LUT can be configured to implement any six-input boolean function. The PALE structure is shown in Figure 3. Each LUT output drives a PALE output block, which drives the PALE output. The output block can be configured to pass its input directly to its output or pass the input to the output via a level- or edge-triggered register.

Plasma is hierarchically structured. At the bottom level are PALEs. Sixteen PALEs connect into a crossbar. Each PALE has eight connections into the crossbar: six input and two output. The sixteen PALEs make a total of 128 lines. For space reasons, the PALE outputs are partially populated, driving half of the crossbar lines.

Tri-state logic is not directly supported since errors in user designs could unintentionally damage the chip. Instead, a straightforward mapping to an AND-OR structure gives the equivalent for most cases. It also allows detecting the case of multiple outputs enabled simultaneously, alerting users to design bugs.

4.1.1 Look Up Table sizing

The choice of a 6-input, 2-output LUT was made after evaluating several designs from the standpoint of their chip area used versus size of PALE. An example (low pass filter) design in Figure 2 shows a small area increase for the PALEs going from two to four input PALEs, some increase to six inputs, and noticeable gain beyond that. Delay through a 6-input LUT is only slightly longer than through a 4-input LUT. Since more levels of logic are packed in the 6-input version, this can reduce circuit delay. This is consistent with results from other researchers[6].

Figure 2: PALE count vs. PALE inputs (low pass filter design; x-axis: # Inputs/PALE, 2 to 10; y-axis: PALEs as % Total Chip Area, 0% to 180%; curves for 1, 2, and 3 outputs)

One and two output PALEs occupy about the same areas for the LUTs, but the two output PALE requires approximately half as large a hextant crossbar. Three output PALEs required somewhat more area to implement the same design, so we chose six-input, two-output PALEs. We did not examine PALEs with LUTs having a combination of shared inputs and unique inputs. Other research has shown this may be a more optimum configuration[7]. In this configuration PALE area is approximately 10% of the total chip area; the impact of the decision on the size of the crossbars was more important than on the size of PALEs.

4.1.2 Crossbar sizing

Rent’s rule projects between 32 and 56 wires needed to connect to the central crossbar from each hextant for exponents of 0.5 to 0.7. Additional wires are needed to connect between PALEs within the hextant that do not connect outside. Empirical studies found few designs requiring over 100 lines, so 100 was felt to be adequate for most designs.

Figure 3: 1/2 PALE Logic (blocks: LUT, two latches, MUX)

4.2 Interconnect

Plasma contains 256 PALEs, organized into sixteen groups, called hextants, of sixteen PALEs each. Eight PALEs bound each side of the hextant, and eight hextants bound each side of a central crossbar (Figure 4). These two levels of crossbars allow (1) interconnections of PALEs within the same set; (2) interconnections of PALEs in different sets; and (3) interconnections of PALEs and I/O signal pins. I/O signal pins are connected to the top and bottom of the central crossbar.

The sixteen hextant crossbars connect into a central crossbar that is 400 lines wide and 1600 lines deep. This crossbar is one-quarter populated because of space considerations. Each of the 100 wires from a hextant connects to 100 of the 400 central crossbar lines. The central crossbar is divided into four sections with buffers between to speed signals. Each line segment connects 100 hextant crossbar lines. Alternatively, one could view the central crossbar switch array as sixteen, interleaved, fully populated, 100x100 crossbar arrays.

The dense population of switches in the crossbars slows the chip. It speeds the compiler significantly, since a routing path is nearly always available. Knowing the difficulty of routing commercial chips having significantly less connectivity, we considered a trade of compile speed for execution speed a good choice.
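
As a rough check on the sizing figures quoted in sections 4.1.2 and 4.2, the short Python sketch below evaluates Rent's rule for one hextant and counts the central crossbar switches. It assumes the common form T = k * G^p with k = 8 terminals per PALE (six inputs plus two outputs) and G = 16 PALEs per hextant. The paper does not state which constants were used, so the function name `rent_terminals` and these parameter choices are illustrative assumptions that happen to reproduce the quoted 32 to 56 range.

```python
# Illustrative arithmetic only. The function name and the choice k = 8 are
# assumptions; the paper gives only the resulting ranges.

def rent_terminals(k, gates, p):
    """Rent's rule estimate T = k * G**p of signals crossing a partition."""
    return k * gates ** p

PALES_PER_HEXTANT = 16
PINS_PER_PALE = 8  # six inputs + two outputs (section 4.1)

low = rent_terminals(PINS_PER_PALE, PALES_PER_HEXTANT, 0.5)
high = rent_terminals(PINS_PER_PALE, PALES_PER_HEXTANT, 0.7)
print(f"hextant wires to central crossbar: {low:.0f} to {high:.0f}")

# Central crossbar: 400 lines wide, 1600 lines deep, one-quarter populated.
WIDTH, DEPTH = 400, 1600
populated = WIDTH * DEPTH // 4

# Alternative view from the text: sixteen fully populated 100x100 arrays.
interleaved = 16 * 100 * 100
print(populated, interleaved)
```

Both views of the central crossbar give the same switch count, which is why the text can describe it either as a quarter-populated 400 by 1600 array or as sixteen interleaved 100x100 arrays.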
4.3 Configuration Memory

The configuration memory is integrated with switches in the silicon layout. A 5-T memory cell is used for configuration bits to save area taken up by the extra bit line of 6-T cells. We use an n-channel device for crossbar switches. No signal ever goes through more than two switches without being buffered. The 5-T memory cells are designed so that it is difficult to read 0 and write 1. This noise margin is improved by having a slow word line turn-on to avoid flipping the cell on read. Extensive SPICE simulation proved design robustness.

Figure 4: Plasma Block Diagram (labels: Signal Pins, Crossbar, Hextant, PALE, Partial Crossbar, Register Files)

The number of bits of configuration memory could be greatly reduced from the 1000 per PALE Plasma has. Each of the PALE inputs connects to only a single line of the 100 in the hextant crossbar at any one time, but 100 bits of configuration memory are used to express this connection. Obviously, only seven bits are required, a 93% reduction. But using only seven configuration bits would require a 100-1 multiplexor to select the correct line. Such large multiplexors are quite slow and require significant chip area. The number of transistors used would not decrease.

4.4 Register Files

Custom computing designs frequently have need for large numbers of registers. Building these from scratch using gates (viz. LUTs) is expensive. The large decoders used to select the register accessed by each port require a minimum of one output per register per port. Building a 64-deep register file with two read and one write port would require 256 LUTs to drive the row lines. A 64 wide OR gate is required for each bit (column) of each read port. The chip does not support wire-OR or tri-state logic. Using the six input PALEs of Plasma, a 64 deep, 32-bit wide, register file requires approximately 1200 PALEs.

Plasma takes a novel approach to reducing this count. Because the PALE LUTs are large trees of pass transistors, they can be connected “backwards” to build large decoders. The six input PALE conveniently converts into a 6-64 decoder. By taking a slice of PALEs from each of eight hextants, a register file primitive is constructed (Figure 5). Each Plasma contains four, 2-bit wide, 64-deep, 16 port register files[8]. Eight ports are configured as read ports and eight as write ports. A write port has both an address and write enable; a read port has only an address.

Figure 5: Plasma Register File (labels: 8 Write Ports with 6-bit Address, 1-bit Enable-in, 1-bit Data-in; 64-bit Register File columns; 8 Read Ports with 6-bit Address, 1-bit Data-out)

4.4.1 Average gate count

For designs using this feature, extremely high gate efficiency can be achieved. One test design, a sort engine for 64 elements, achieved an equivalent gate count of 60,000 for a single Plasma, nearly 240 gates per PALE. A more typical number for random logic is eight to ten gates per PALE, approximately 2000 gates per chip. These numbers are achieved by 100% automatic place and route.

4.5 Pinouts

A CCM requires greater connectivity than is provided by commercially available FPGAs. Plasma’s 336 signal I/O pins are roughly double that which can be obtained off-the-shelf. In addition, placement and routing within Plasma is insensitive to signal pin assignments because all I/O pins have equal connectivity into the chip’s central crossbar.

4.6 Scan

The Plasma chip is designed to make the debugging of a user’s custom design easy and natural. All internal state is available on a user scan chain. This includes all flip-flops and register files as well as all the I/O pins of the chip. Both the D and the Q of every flip-flop are simultaneously observable. The outputs and inputs of all LUTs are observable and the user interface recreates signals that exist on the original schematic but were subsumed into lookup tables.

The clock is stopped before examining the scan chain and the user may change the state of the machine before resuming clocking. The Plasma architecture allows the user full peek and poke capability while debugging. This scan capability uses different commands to the chip than the read and write configuration commands, allowing fast debugging.

4.7 Programming

The chip has a second scan chain used for configuring. The scan chains are separated for fast reading and writing of the user state, allowing simple algorithms for examining and setting the state of all user visible signals in the chip to provide excellent debug capability.

The configuration scan chain writes and reads the underlying configuration bits. The read allows verification that a configuration was received correctly. Configuration requires first halting the clocks and placing the chip in configure mode. A separate high frequency clock is used for configuration. A complete configuration consists of almost 400,000 bits. The configuration bits are stored in a large SRAM structure underlying the crossbars and PALEs. Each row may be read and written independently, allowing minor modifications to a design without completely downloading a new configuration.

Plasma uses a parent-child scheme on the scan chain signals. A header record tells the chip whether this configuration data was intended for it. If not, the remaining data is passed to the child pin. This allows the chips to be connected serially for system configuration, reducing the signals that must be connected to the master system. A broadcast command can address all chips on a given chain.

4.8 Simultaneous Switching Outputs (SSO)

FPGAs used for custom computing have a high probability of simultaneous switching problems. A user may design a massive data path with most of the signals switching from 1 to 0 (or 0 to 1) at the same clock edge. The resulting collapse of the internal power supply can cause configuration bits to change, leaving a random design in the chip. Not only would the resulting design be incorrect, but the chip outputs could short, damaging it.

We took significant precautions to reduce the danger of simultaneous switching problems. Great care was taken in the design of the buffer arrays: central crossbar buffers, peripheral crossbar buffers, and PALE buffers. The buffers are designed with small devices to keep switching currents small. Break-before-make design minimizes crossover currents in the buffers during logic state switches. Peripheral crossbar buffers are inverting so alternate buffers in a path switch to the opposite state. Logic states are positive true in the peripheral crossbars and negative true in the central crossbar.

The power busing is designed to meet the instantaneous current requirements when switching all flip-flops using the lowest resistivity metal layer. Eight large VDD/GND connections, each with three bonding wires, allow better current carrying capability and lower inductance. Dirty VDD/GND power pins separately supply all the off-chip pad drivers. Plasma is unusual because the on-chip core SSO is much more severe than I/O pad SSO. Wide internal VDD/GND power rings between the core logic and the pads provide power connections from core to power pads. Power buses are routed directly over high current buffers to minimize voltage drop.

To keep the configuration bits from changing logic states, the configuration cell bit lines are precharged high at all times except when accessing. This is done to take advantage of the difficulty of writing a “1” into the cells. Many conventional designs precharge the bit lines only immediately before accessing the cells.

To prevent all 512 flip-flops from switching at the same clock edge, the logic clocks are deliberately skewed up to seven nanoseconds by the clock distribution tree. The inner eight sets of hextants are skewed to clock later than the outermost eight hextants to deal with the worst case hold time as seen from outside the Plasma chip. All hold times are kept negative because the best case buffer switching times are much less than the clock skew. To account for skewing, additional time is added to the non-overlap time of the clock. SPICE shows that all hold times can be met. Because the Plasma compiler computes the clock period required by each individual design, the skewing of clocks in the chip causes no setup time problems.

A potential problem arises with “Global Drive Enable,” the signal that enables all buffers out of tristate after a configuration has been loaded. When all buffers on the chip are enabled at once, a significant current surge can occur. Our solution created five individual “Global Drive Enable” signals each separated by one hundred nanoseconds, routed so that four tied to one hextant in each quadrant and the fifth enabled the pad drivers. The timing is controlled by the chip controller.
Figure 6: Plasma Chip Photograph
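
The staggered “Global Drive Enable” scheme from section 4.8 can be pictured as a small schedule. The Python sketch below is a hedged illustration, not the chip controller's actual logic: the hextant numbering, the group names, and the assumption that the pad drivers are enabled last are all hypothetical. Only the five enables, the 100 ns spacing, and the "one hextant per quadrant" routing come from the text.

```python
# Sketch of a staggered enable schedule. Hextant numbering, group names, and
# the ordering of the pad-driver enable are assumptions for illustration.

STEP_NS = 100              # spacing between successive enables (from the text)
QUADRANTS = 4
HEXTANTS_PER_QUADRANT = 4  # sixteen hextants in total


def enable_schedule():
    """Return a dict mapping each buffer group to its enable time in ns."""
    schedule = {}
    for phase in range(HEXTANTS_PER_QUADRANT):  # enable signals 0..3
        for quadrant in range(QUADRANTS):
            # Assumed numbering: hextant index = quadrant * 4 + phase, so each
            # enable signal reaches exactly one hextant in every quadrant.
            hextant = quadrant * HEXTANTS_PER_QUADRANT + phase
            schedule[f"hextant{hextant}"] = phase * STEP_NS
    # Fifth enable signal: off-chip pad drivers (assumed to come last).
    schedule["pad_drivers"] = HEXTANTS_PER_QUADRANT * STEP_NS
    return schedule


sched = enable_schedule()
for group, t_ns in sorted(sched.items(), key=lambda item: item[1]):
    print(f"{t_ns:4d} ns  {group}")
```

Under these assumptions no more than four hextants leave tristate in any one step, which spreads the current surge across both time and the four quadrants of the die.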

5. Results

FIRST SILICON WORKED! The 16.2 mm by 16.2 mm three million transistor Plasma chip (Figure 6) required no modification. Test designs performed exactly as predicted due to extensive peer review and conservative design.

The chip performs at its projected speed. Simultaneous switching has not been an issue. Most designs have compiled with no user intervention, though quite a bit more compiler work was required to bring the compiler to accomplish this.

We have worked with several students who want to implement algorithms on Teramac. They have learned the system quickly, spending most of their time on algorithms and design, not learning custom computing hardware. In four to eight weeks they have completed working designs on Teramac, achieving sixty to one-hundred times workstation performance.

5.1 Performance

Designs contained in a single Plasma have run at speeds in excess of 10 MHz. But Plasma is only a component of a much larger system. Connecting together up to 1728 Plasma chips results in numerous chip crossings per clock cycle, and consequently slower execution speed. A typical design has a worst case path spread across 10-20 Plasma chips. The Teramac compiler calculates the correct maximum clock speed based on an individual design. Designs have ranged from 300 kHz to 2 MHz. The latter designs are linear systolic arrays with very short critical paths.

6. Conclusions

Configurable custom computing has a promising future. Several researchers have used tens of FPGAs to create a variety of highly parallel custom machines [9,10,11]. Teramac allows experiments using hundreds of FPGAs. The Teramac architecture provides a routing-rich environment for implementing user designs, made possible by an investment in custom FPGAs, MCMs and PC boards. The Plasma architecture simplifies the compiler such that the resulting speed of compilation, tens of minutes rather than tens of hours, allows another layer of abstraction in defining highly parallel custom computers. The speed of compilation and execution along with easy access to many large memory banks and multiported register files provided by Teramac will allow fruitful methods for the simple specification and synthesis of custom computers.

Acknowledgments

Greg Snider was the compiler writer and chief architect. Arnie Berger designed the Logic Board and managed the system hardware design. Andy Blasciak designed and implemented the control board. Martin Guth perfected the MCM wire-bonding process. Dan Kary developed the control and user interface software. Wulf Rehder developed test algorithms. Bruce Culbertson created the system test code. Dick Carter defined the user interface, created numerous custom computing designs, and improved the compiler. Lyle Albertson, Sue Blockstein, Brian Jung, and Tom Meyers did the detailed design and verification of Plasma. Peter Maxwell developed the chip yield model for Plasma. Rick Amerson provided management and contributed to the Plasma architecture. Phil Kuekes was project manager and has contributed to both the MCM design and the architecture. Barry Shackleford and Bob Rau originally conceived Teramac.

References

[1] R. Amerson, R. Carter, W. Culbertson, P. Kuekes, G. Snider, "Teramac -- Configurable Custom Computing," Proceedings of the 1995 IEEE Symposium on FPGA's for Custom Computing Machines.
[2] Azam Barkatullah, Wern-Yan Koe, Harish Nayak, Nazar Zaidi, "Pre-Silicon Validation of Pentium CPU," 1993 Hot Chips Symposium.
[3] J. Hadley, B. Hutchings, "Design Methodologies for Partially Reconfigured Systems," Proceedings of the 1995 IEEE Symposium on FPGA's for Custom Computing Machines.
[4] B. Landman and R. Russo, "On a Pin vs. Block Relationship for Partitions of Logic Graphs," IEEE Transactions on Computers, December 1971.
[5] R. Amerson and P. Kuekes, "The Design of an Extremely Large MCM-C -- A Case Study," International Journal of Microcircuits and Electronic Packaging, Vol. 17, No. 4.
[6] Jonathan Rose, Robert J. Francis, David Lewis, Paul Chow, "Architecture of Field Programmable Gate Arrays: The Effect of Logic Block Functionality on Area Efficiency," IEEE Journal of Solid-State Circuits, Vol. 25, No. 5, October 1990.
[7] Dwight Hill, Nam-Sung Woo, "The Benefits of Flexibility in Lookup Table-Based FPGA's," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 12, No. 2, February 1993.
[8] Greg Snider, Philip Kuekes, W. Bruce Culbertson, Richard J. Carter, Arnold S. Berger, Rick Amerson, "The Teramac Configurable Compute Engine," Field Programmable Logic and Applications, Will Moore and Wayne Luk (Eds.), Springer, 1995.
[9] J. M. Arnold, D. A. Buell, and E. G. Davis, "Splash 2," Proceedings of the 4th Annual ACM Symposium on Parallel Algorithms and Architectures, 1992, pages 316-322.
[10] Patrice Bertin, Didier Roncin, and Jean Vuillemin, "Introduction to programmable active memories," in Systolic Array Processors, Prentice-Hall, 1989.
[11] S. Casselman, "Virtual Computing," Proceedings of the IEEE Workshop on FPGA's for Custom Computing Machines, Napa, Ca, April 1993.
