
Chortle-crf: Fast Technology Mapping for Lookup Table-Based FPGAs

Robert Francis, Jonathan Rose, Zvonko Vranesic

Department of Electrical Engineering, University of Toronto, Canada

Abstract

A new technology mapping algorithm for lookup table-based Field Programmable Gate Arrays (FPGAs) is presented. The major innovation is a method for choosing gate-level decompositions based on bin packing. This approach is up to 28 times faster than a previous exhaustive approach. The algorithm also exploits reconvergent paths and replication of logic at fanout nodes to reduce the number of lookup tables in the circuit. The new algorithm is implemented in the Chortle-crf program. In an experimental comparison Chortle-crf requires 14% fewer lookup tables than Chortle [Fran90] and 10% fewer lookup tables than mis-pga [Murg90a] to implement a set of benchmark networks.

Chortle-crf can also implement a network as a circuit of Xilinx 3000 series Configurable Logic Blocks (CLBs). To implement the benchmark networks as circuits of CLBs Chortle-crf requires 12% fewer CLBs than mis-pga and 22% fewer CLBs than XNFOPT [Xili89]. In these experiments Chortle-crf was an average of 68 times faster than mis-pga and 30 times faster than XNFOPT.¹

¹ This work was supported by NSERC Operating Grants #URF0043298 and #OGP0005280, a research grant from Bell-Northern Research, and a research grant from the ITRC of Ontario.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

28th ACM/IEEE Design Automation Conference®
Paper 15.1
© 1991 ACM 0-89791-395-7/91/0006/0227 $1.50

1 Introduction

Field Programmable Gate Arrays (FPGAs) are a recent innovation in Application Specific Integrated Circuits (ASICs) that provide both large scale integration and user-programmability [Hsie88] [Ahre90]. The user-programmability of FPGAs can dramatically reduce ASIC turn-around time and manufacturing costs. An FPGA consists of an array of programmable logic blocks and a programmable routing network. An important class of FPGAs consists of those that use logic blocks containing lookup tables, such as the first commercial FPGA [Cart86]. Moreover, recent studies in FPGA architectures have suggested that lookup tables are an area-efficient method of implementing combinational functions [Rose90]. A K-input lookup table is a digital memory with K address lines and a one-bit output. This memory contains 2^K bits and is capable of implementing any Boolean function of K input variables.

This paper presents a new algorithm for lookup table technology mapping which is implemented by the Chortle-crf program. Chortle-crf converts a combinational network of ANDs, ORs, and NOTs into a circuit of lookup tables where every lookup table has K or fewer inputs. The goal is to minimize the total number of K-input lookup tables in this circuit. For example, the network in Figure 1a can be implemented by the circuit of three 5-input lookup tables shown in Figure 1b. The dotted boundaries indicate the functions implemented by each lookup table. Note that one of the lookup tables uses only 4 of the available 5 inputs. All examples in the remainder of this paper will assume that K is equal to 5.

2 Background

Technology mapping produces a circuit that implements a combinational network using a restricted set of circuit elements. Early work in technology mapping, such as SOCRATES [Greg86] and the work by Kahrs [Kahr86], focused on circuits created from standard cell libraries. An important advance in library-based technology mapping was the introduction of dynamic programming by Keutzer [Keut87]. Other library-based technology mappers include misII [Detj87] and McMAP [Lisa87].

A lookup table of K inputs can implement 2^(2^K) different Boolean functions of K variables. For values of K greater than 3 the library required to describe a K-input lookup table becomes impractically large and therefore technology mapping algorithms that deal specifically with lookup tables are required [Fran90]. Two previously reported lookup table technology mappers are Chortle [Fran90] and mis-pga [Murg90a].

The Chortle technology mapper presented in [Fran90] uses an exhaustive search to find the optimal gate-level decomposition of every node in a fanout-free tree. However, the partitioning of the original network into
fanout-free trees precludes optimizations that exploit reconvergent paths and replication of logic at fanout nodes.

The mis-pga technology mapper produces a circuit of lookup tables as an intermediate result [Murg90a]. It initially performs a non-optimal decomposition of the combinational network and then focuses on a covering problem to reduce the number of lookup tables in the circuit. The covering problem does allow optimizations that exploit reconvergent paths and replication of logic at fanout nodes.

Figure 1. a) combinational network; b) circuit of 5-input lookup tables

3 The Chortle-crf Algorithm

A major innovation in Chortle-crf is the application of bin packing to choosing gate-level decompositions. Two other important features are the exploitation of reconvergent paths and replication of logic at fanout nodes to reduce the number of lookup tables in the circuit.

The principal technique used by Chortle-crf is dynamic programming. The combinational network is traversed beginning at the primary inputs and proceeding toward the primary outputs. At each node a circuit implementing the cone extending from the node to the primary inputs of the network is constructed. This circuit is referred to as the Best Circuit implementing the node.

Chortle-crf has two goals when constructing the Best Circuit. The first is to minimize the number of lookup tables in the circuit and the second is to maximize the number of unused inputs at the output lookup table. These unused inputs are important because they may allow subsequent nodes to be implemented without the addition of extra lookup tables.

3.1 Bin Packing Approach to Gate Decomposition

The key to constructing the Best Circuit implementing a node is finding the decomposition of the node that reduces the number of lookup tables in the final circuit. For example, five lookup tables are required to implement the tree shown in Figure 2a. In Figure 2b, the single OR node of Figure 2a has been decomposed into two OR nodes, which allows the tree to be implemented with just two lookup tables.

Figure 2. a) without gate decomposition; b) with gate decomposition

The construction of the Best Circuit for a node depends upon the Best Circuits that implement the node’s immediate fanin nodes. The order of the network traversal ensures that these immediate fanin circuits have been previously constructed. The output lookup tables of the fanin Best Circuits will be referred to as the fanin lookup tables. Figure 3a shows an OR node and its five fanin lookup tables.

The goal of finding the best decomposition is attained by constructing a tree of lookup tables that implements both the functions of the fanin lookup tables and a decomposition of the node. This tree must contain the minimum number of lookup tables and the output (root) lookup table must have the maximum number of unused inputs possible without increasing the number of lookup tables in the tree.

The tree of lookup tables is constructed in two steps. First, a two-level decomposition is constructed and then this decomposition is converted into a multi-level decomposition. Figures 3b and 3c illustrate the two-level and multi-level decompositions constructed from the
fanin lookup tables of Figure 3a.

Figure 3. a) fanin lookup tables; b) two-level decomposition; c) multi-level decomposition

3.1.1 Two-Level Decomposition

The two-level decomposition consists of a single first-level node and several second-level nodes. In Figure 3b the 3-input OR node is the first-level node and its three inputs are the second-level nodes. Each second-level node implements the operation of the node being decomposed over a subset of one, some, or all of the fanin lookup tables. In Figure 3b there are three second-level nodes each of which is implemented by a lookup table. The first-level node is not yet implemented by any lookup tables; however, it will be implemented when the two-level decomposition is converted into a multi-level decomposition.

The two-level decomposition is constructed using a bin packing algorithm. In general, the goal of bin packing is to find the minimum number of bins into which a set of boxes can be packed [Gare79]. In this case, the bins are the second-level lookup tables and the boxes are the fanin lookup tables. The capacity of each bin is K, and the size of each box (fanin lookup table) is its number of used inputs. In Figure 3a the boxes have sizes 3, 2, 2, 2, and 2. In Figure 3b the final contents of the packed bins are 5, 4, and 2. The bin packing algorithm used is First Fit Decreasing as outlined in Figure 4 [Gare79].

    FirstFitDecreasing
    {
        start with an empty bin list
        while there are unpacked boxes
        {
            if the largest unpacked box will not fit
            within any bin in the bin list
            {
                create an empty bin and
                add it to the end of the bin list
            }
            pack the largest unpacked box into the
            first bin it will fit within
        }
    }

Figure 4: Pseudo code for First Fit Decreasing

3.1.2 Multi-Level Decomposition

The decomposition tree is completed by implementing the first-level node with a tree of lookup tables. The inputs to the leaf lookup tables of this first-level tree are the outputs of the second-level lookup tables of the two-level decomposition. Any second-level lookup table with unused inputs can be used to implement a portion of the first-level tree, thereby reducing the total number of lookup tables in the decomposition tree. Figure 3c illustrates the multi-level decomposition constructed from the two-level decomposition of Figure 3b. The detailed procedure for converting the two-level decomposition into a multi-level decomposition is outlined in Figure 5.

The final multi-level decomposition can be shown to be optimal if the network is a fanout-free tree and the value of K is less than or equal to 5 [Fran91]. For networks partitioned into fanout-free trees the bin packing approach is up to 28 times faster than the previous exhaustive search approach [Fran90], yet it produces circuits with the same number of lookup tables. This improvement in speed makes it practical to consider optimizations exploiting reconvergent paths and replication of logic at fanout nodes, as discussed in the following sections.
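
The two-level decomposition of Section 3.1.1 can be illustrated with a short Python sketch of First Fit Decreasing. This is only a sketch under stated assumptions, not code from the Chortle-crf program: the function name first_fit_decreasing and the list-of-lists representation of bins are hypothetical, each box is reduced to its size (the number of used inputs of a fanin lookup table), and shared inputs between boxes are ignored.

```python
def first_fit_decreasing(box_sizes, capacity):
    """Pack boxes (fanin LUT input counts) into bins of the given capacity.

    Hypothetical helper for illustration; returns a list of bins, each bin
    being the list of box sizes packed into it.
    """
    if any(size > capacity for size in box_sizes):
        raise ValueError("a box is larger than the bin capacity")
    bins = []
    # Always consider the largest unpacked box next.
    for size in sorted(box_sizes, reverse=True):
        for contents in bins:
            # Pack into the first bin that still has room for this box.
            if sum(contents) + size <= capacity:
                contents.append(size)
                break
        else:
            # The box fits in no existing bin: add a new bin at the end.
            bins.append([size])
    return bins


bins = first_fit_decreasing([3, 2, 2, 2, 2], capacity=5)
print([sum(b) for b in bins])  # [5, 4, 2]
```

With the box sizes 3, 2, 2, 2, and 2 quoted for Figure 3a and K = 5, the sketch yields bins filled to 5, 4, and 2, which agrees with the bin contents stated for Figure 3b.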

    MultiLevel
    {
        while there is more than one unconnected bin
        {
            if there are no free inputs among the
            remaining unconnected bins
            {
                create an empty bin and
                add it to the end of the bin list
            }
            connect the most filled unconnected bin to
            the next unconnected bin with a free input
        }
    }

Figure 5: Pseudo code for multi-level conversion

Figure 6. a) fanin lookup tables with shared input; b) realized reconvergent paths
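
The conversion outlined in Figure 5 can be sketched in Python as follows. This is one possible reading of the pseudo code rather than the Chortle-crf implementation: the function name multi_level is hypothetical, each bin is represented only by its number of used inputs, "remaining unconnected bins" is assumed to mean the unconnected bins other than the most filled one, and ties for the most filled bin are assumed to go to the earliest bin in the list.

```python
def multi_level(bin_fill, capacity):
    """Connect second-level bins into a tree of lookup tables.

    Hypothetical helper for illustration. Returns (fill, edges, root):
    used inputs per bin, (source, destination) connections, and the
    index of the root (output) bin.
    """
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    if not bin_fill:
        raise ValueError("need at least one bin")
    fill = list(bin_fill)
    unconnected = list(range(len(fill)))  # bins whose output is not yet used
    edges = []
    while len(unconnected) > 1:
        # Most filled unconnected bin (assumption: first one wins a tie).
        src = max(unconnected, key=lambda i: fill[i])
        others = [i for i in unconnected if i != src]
        if all(fill[i] >= capacity for i in others):
            # No free input among the other unconnected bins: add an empty bin.
            fill.append(0)
            unconnected.append(len(fill) - 1)
            others.append(len(fill) - 1)
        # Next unconnected bin, in list order, that has a free input.
        dst = next(i for i in others if fill[i] < capacity)
        fill[dst] += 1  # the output of src occupies one input of dst
        edges.append((src, dst))
        unconnected.remove(src)
    return fill, edges, unconnected[0]


fill, edges, root = multi_level([5, 4, 2], capacity=5)
print(fill, edges, root)  # [5, 5, 3] [(0, 1), (1, 2)] 2
```

Starting from the bins 5, 4, and 2 of the two-level example, this sketch needs no additional bin: the tree has three lookup tables and the root keeps two unused inputs. When every bin is full, as with three bins of size 5, the sketch appends one empty bin that becomes the root.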
3.2 Exploiting Reconvergent Paths

It is possible to exploit local reconvergent paths to find a better circuit implementing a node. The following discussion uses the terminology of the previous section, where the fanin lookup tables are referred to as boxes and the second-level lookup tables are referred to as bins.

If two boxes share the same input, then there exists a pair of reconvergent paths. If the total number of distinct inputs to these two boxes is less than or equal to K, then it is possible to pack the two boxes into one bin. When these two boxes are packed into the same bin, the volume occupied is the total number of distinct inputs, which is less than the sum of the boxes’ individual sizes. Figure 6a shows a pair of boxes that share an input and Figure 6b shows the pair of reconvergent paths realized within a bin.

By merging the two boxes and realizing the pair of reconvergent paths within a single lookup table, a smaller portion of the bin is occupied. This may lead to a superior bin packing, which in turn may lead to a superior Best Circuit.

However, two boxes can only be merged if they are packed into the same bin. The two boxes can be forced into the same bin by merging them before the bins are packed. Forcing these two boxes into one bin may interfere with the bin packing algorithm and actually result in an inferior packing. To find the Best Circuit, both the packing with the forced merge and the packing without the forced merge need to be considered.

A further complication is that more than one pair of reconvergent paths may terminate at the node. To find the Best Circuit, Chortle-crf begins by finding all pairs of local reconvergent paths. For every possible combination of these pairs, including none, a circuit is constructed by first merging the respective boxes of the chosen pairs and then proceeding with the bin packing. The circuit with the fewest lookup tables (and the greatest number of unused inputs at the output lookup table) is retained as the Best Circuit. This realization of reconvergent paths is a greedy local optimization that is considered at every node as the network is traversed.

In our experiments with the MCNC benchmark networks the largest number of reconvergent pairs at any one node has been found to be six pairs. The bin packing approach is fast enough to make the search of all possible combinations of these pairs practical.

3.3 Replication of Logic at Fanout Nodes

The previous version of Chortle partitions the combinational network into a set of fanout-free trees [Fran90]. This forces every fanout node to be explicitly implemented as the output of a lookup table, and allows these nodes to be treated as primary inputs to the rest of the network.

It is possible to implement the fanout nodes implicitly inside lookup tables, which requires the replication of some logic at a fanout node. This replication may decrease the total number of lookup tables in the circuit implementing the network. For example, in Figure 7a, three lookup tables are required to implement the network when the fanout node is explicitly implemented. In Figure 7b, the AND gate implementing the fanout node is replicated and only two lookup tables are required to implement the network.

When the dynamic programming traversal of the network encounters a fanout node the Best Circuit implementing the fanout node is constructed. At this point
two options are considered. The fanout node can be either explicitly implemented, or implicitly implemented. If the fanout node is explicitly implemented it is treated as a primary input to the rest of the network. If it is implicitly implemented, a replica of the function of the output lookup table is made for each fanout edge. This replica replaces the fanout node as the source of the edge.

Every path starting with an edge from a fanout node will eventually reach another fanout node or a primary output of the network. These subsequent fanout nodes and primary outputs will be referred to as the visible nodes.

To determine if the replication is worthwhile Chortle-crf solves a series of subproblems. For each visible node the Best Circuit implementing the visible node is constructed twice: once with the replication and once without the replication. Each subproblem is itself solved using Chortle-crf with the assumption that any remaining fanout nodes encountered in these subproblems are explicitly implemented and can therefore be treated like primary inputs. The bin packing approach is fast enough to make solving these subproblems practical.

After the subproblems have been solved the total number of lookup tables required to implement the visible nodes both with and without the replication is known. If the total number of lookup tables is reduced by the replication, then the replication is retained. The replication of logic is considered at every fanout node as it is encountered by the dynamic programming traversal of the network.

Figure 7. a) no replicated logic; b) with replicated logic

4 Results

To evaluate Chortle-crf a series of experiments were performed on networks from the MCNC logic synthesis benchmark suite. Four experiments were performed on each network:

-c    using only the constructive bin packing approach
-cr   using the reconvergent optimization
-cf   using the replication optimization
-crf  using both reconvergent and replication

The first step in the experimental procedure was technology independent logic optimization using the misII logic optimizer with the standard script [Bray86]. Chortle-crf was then used to implement the networks as circuits of 5-input lookup tables. Note that Chortle-crf is capable of implementing networks as circuits of K-input lookup tables for values of K from 2 to 10.

Table 1 records the number of 5-input lookup tables required to implement the networks in each of the four experiments. The reconvergent optimization reduced the total number of lookup tables required to implement the networks by 2.7%, and the replication optimization reduced the total number of lookup tables by 3.7%. Combining both optimizations reduced the total number of lookup tables by 14%.

Network   Chortle-crf                           mis-pga
          -c       -cr      -cf      -crf
          lookups  lookups  lookups  lookups    lookups
z4ml          9        9        9        6          8
misex1       20       20       19       19         11
vg2          24       24       23       21         30
5xp1         34       31       34       27         31
count        47       45       40       31         31
9symml       63       59       62       55         56
9sym         69       65       67       59         72
apex7        72       71       71       64         64
rd84         76       76       74       73         40
e64          95       95       80       80         82
C880        115      110      112       86        103
apex2       123      123      121      120         80
alu2        131      121      127      116        129
duke2       138      136      126      120        128
C499        166      164      158       74         66
rot         219      207      208      189        200
apex6       232      219      230      212        243
alu4        238      219      227      195        235
apex4       603      600      579      558        765
des        1073     1060     1050      952       1016

total      3547     3454     3417     3057       3390

Table 1: Results for K = 5

The reduction achieved when using both optimizations together often exceeds the sum of the individual reductions. This occurs when reconvergent paths that cross fanout nodes are found and realized within a single
lookup table. A dramatic example is the network C499, where using both optimizations reduces the number of lookup tables by 55%.

As an intermediate result the mis-pga technology mapper produces a circuit of 5-input lookup tables [Murg90a]. The sixth column of Table 1 records the number of 5-input lookup tables in the circuits produced by mis-pga [Murg90b]. In total, Chortle-crf required 10% fewer lookup tables than mis-pga to implement the benchmark networks.

4.1 Xilinx CLBs

The Xilinx 3000 series of FPGAs uses lookup tables to implement combinational logic [Hsie88]. These devices contain an array of Configurable Logic Blocks (CLBs). Each CLB can implement one 5-input lookup table or two 4-input lookup tables as long as the total number of distinct inputs to the CLB is less than or equal to 5.

A circuit of CLBs can be derived from each circuit of 5-input lookup tables by using one CLB to implement each lookup table. The number of CLBs can be reduced by finding pairs of lookup tables that fit inside a single CLB. Finding the maximum number of such pairs can be restated as a Maximum Cardinality Matching problem [Murg90a] [Gibb85]. Table 2 records the number of CLBs in the circuits derived from the previous Chortle-crf experiments.

Network   Chortle-crf
          -c     -cr    -cf    -crf
          CLBs   CLBs   CLBs   CLBs
z4ml         5      5      7      3
misex1      14     14     14     14
vg2         20     19     21     18
5xp1        23     20     23     20
count       32     31     32     27
9symml      50     42     50     41
9sym        52     44     56     42
apex7       48     45     49     42
rd84        52     52     53     53
e64         48     48     54     54
C880        75     70     94     69
apex2       94     90     97     93
alu2        94     86     98     83
duke2       88     87     91     89
C499        84     84     96     50
rot        134    129    144    131
apex6      169    161    169    161
alu4       165    144    174    138
apex4      457    451    463    448
des        714    695    797    743

total     2418   2317   2582   2319

Table 2: CLB Results

Note that using only the replication optimization can increase the number of CLBs in the derived circuit, even when the optimization reduces the number of lookup tables. The replication of logic at a fanout node may increase the number of inputs used at some lookup tables thereby precluding some pairings of lookup tables into CLBs and reducing the maximum number of pairs that can be found. If the reduction in the number of pairs exceeds the reduction in the number of lookup tables then the replication will result in a net increase in the number of CLBs.

Two other logic synthesis systems capable of implementing networks as circuits of CLBs are mis-pga [Murg90a] and the Xilinx proprietary design system [Xili89]. Chortle-crf can be compared to these systems on the basis of the number of CLBs in the final circuits and execution time. Table 3 records the number of CLBs required to implement the benchmark networks using Chortle-crf, mis-pga and Xilinx software. In total, Chortle-crf required 12% fewer CLBs than mis-pga and 22% fewer CLBs than XNFOPT to implement the benchmark networks.

Network    Chortle-crf       mis-pga           XNFOPT
           CLBs   sec.¹      CLBs   sec.²      CLBs   sec.¹
z4ml          3     0.8         7      -          6    296.5
misex1       14     0.7        10      *         12    298.2
vg2          18     0.6        21      *         20    299.7
5xp1         20     3.2        23      *         19    301.1
count        27     2.0        28      *         32    301.9
9symml       41    59.1        43      *         56    901.2
9sym         42    62.9        59      -         52    305.1
apex7        42     2.9        50    117.3       51    304.6
rd84         53    15.4        32     65.1       38    303.2
e64          54     1.9        61      -         65    901.5
C880         69    12.6        82      -        101   1809.4
apex2        93    34.9        70      -        102    909.7
alu2         83    56.3       102      -         91    907.8
duke2        89     9.1       105    357.1       99    903.6
C499         50    15.9        50    137.5      121   1847.0
rot         131    14.0       153    844.8      166   1811.4
apex6       161    25.3       191   1376.8      198   1822.6
alu4        138   178.1       189      -        232   1849.4
subtotal   1128      -          -      -          -      -
apex4       448      -          -      -          -      -
des         743      -          -      -          -      -
total      2319      -          -      -          -      -

¹ execution times on a Sun 3/60
² execution times on a VAX 8800
- no value legible in the source
* two mis-pga execution times, 25.6 and 45.5, belong to two of the rows marked *; the source does not show which

Table 3: CLB Results

The table also records the execution times for Chortle-crf on a Sun 3/60 and mis-pga on a VAX 8800 [Murg90a]. In the Xilinx design system technology mapping is performed by the two programs XNFOPT and XNFMAP [Xili89]. Note that XNFOPT will run indefinitely and in these experiments limits were placed on its execution time. The seventh column of Table 3 records the total execution time of the two programs on a Sun 3/60. It should be noted that by conservative
estimate a VAX 8800 is twice as fast as a Sun 3/60. Taking into account the relative speed of the Sun 3/60 and the VAX 8800, Chortle-crf is an average of 68 times faster than mis-pga and 30 times faster than XNFOPT.

5 Conclusions

The bin packing approach to gate decomposition described in this paper is up to 28 times faster than a previous exhaustive search approach. The improved speed of gate decomposition makes it practical to consider local optimizations that exploit both reconvergent paths and replication of logic at fanout nodes.

Using both of these optimizations, Chortle-crf required 14% fewer 5-input lookup tables than Chortle [Fran90] and 10% fewer lookup tables than mis-pga [Murg90a] to implement a set of benchmark networks.

Chortle-crf is also capable of implementing networks as circuits of Xilinx 3000 series CLBs. To implement the benchmark networks as circuits of CLBs, Chortle-crf required 12% fewer CLBs than mis-pga and 22% fewer CLBs than XNFOPT. On average, Chortle-crf was 68 times faster than mis-pga and 30 times faster than XNFOPT.

6 Future Work

Currently, the optimizations exploiting reconvergent fanout and replication of logic are evaluated locally. There are, however, global interactions among these optimizations. The search for reconvergent paths should be extended to include those paths not found by the local search. As well, realizing a pair of reconvergent paths within a single lookup table may depend upon the replication of logic at multiple fanout nodes.

There are cases where the optimizations requiring replication of logic at different fanout nodes may be mutually exclusive. A computationally tractable method of determining which set of replications at fanout nodes will result in the minimum number of lookup tables for the entire network is needed.

References

[Ahre90] M. Ahrens, et al., “An FPGA Family Optimized for High Densities and Reduced Routing Delay,” Proc. 1990 CICC, May 1990, pp. 31.5.1-31.5.4.

[Bray86] R. Brayton, et al., “Multiple-Level Logic Optimization System,” Proc. ICCAD, Nov. 1986, pp. 356-359.

[Cart86] W. Carter, et al., “A user programmable reconfigurable gate array,” Proc. CICC, May 1986, pp. 233-235.

[Detj87] E. Detjens, et al., “Technology Mapping in MIS,” Proc. ICCAD 87, Nov. 1987, pp. 116-119.

[Fran90] R. J. Francis, J. Rose, K. Chung, “Chortle: A Technology Mapping Program for Lookup Table-Based Field Programmable Gate Arrays,” Proc. 27th DAC, June 1990, pp. 613-619.

[Fran91] R. J. Francis, “Technology Mapping for Lookup Table-Based FPGAs,” Ph.D. Thesis in preparation, University of Toronto, Department of Electrical Engineering.

[Gare79] M. R. Garey, D. S. Johnson, “Computers and Intractability, A Guide to the Theory of NP-Completeness,” W. H. Freeman and Co., 1979, pp. 124-129.

[Gibb85] A. Gibbons, “Algorithmic Graph Theory,” Cambridge University Press, 1985, pp. 125-133.

[Greg86] D. Gregory, et al., “Socrates: a system for automatically synthesizing and optimizing combinational logic,” Proc. 23rd DAC, June 1986, pp. 79-85.

[Hsie88] H. Hsieh, et al., “A 9000-Gate User-Programmable Gate Array,” Proc. 1988 CICC, May 1988, pp. 15.3.1-15.3.7.

[Kahr86] M. Kahrs, “Matching a parts library in a silicon compiler,” IEEE ICCAD, 1986, pp. 169-172.

[Keut87] K. Keutzer, “DAGON: Technology Binding and Local Optimization by DAG Matching,” Proc. 24th DAC, June 1987, pp. 341-347.

[Lisa87] R. Lisanke, F. Brglez, G. Kedem, “McMAP: A Fast Technology Mapping Procedure for Multi-Level Logic Synthesis,” Proc. ICCD, Oct. 1988, pp. 252-256.

[Murg90a] R. Murgai, et al., “Logic Synthesis for Programmable Gate Arrays,” Proc. 27th DAC, June 1990, pp. 620-625.

[Murg90b] R. Murgai, private correspondence.

[Rose90] J. Rose, R. J. Francis, D. Lewis, P. Chow, “Architectures of Field-Programmable Gate Arrays: The Effect of Logic Block Functionality on Area Efficiency,” IEEE Journal of Solid-State Circuits, Vol. 25, No. 5, Oct. 1990, pp. 1217-1225.

[Xili89] XACT LCA Development System, Vol. II, Xilinx Inc., 1989.
