A Theoretical Investigation On CMOL FPGA Cell Assignment Problem
... significantly increase the number of gates and the total length of wires, as well as the length of the critical path. A practical fanout reduction algorithm should be designed so that it does not extend the critical path too much.

The aforesaid lemma says that any circuit can be transformed to a circuit of degree 2. Though the small fanout of circuits of degree 2 makes them easier to place, they may still be unplaceable, even if the placement area is infinitely large.

Theorem 5 (Not placeable circuit of degree 2): For any positive number d, there is a circuit of degree 2 that is not placeable in a placement region with radius d.

Proof: We will define a sequence of circuits C(n), n = 1, 2, 3, ..., so that for any d, there exists an n such that C(n) is not placeable for the placement region with radius d. To this end, we begin with a recursive definition of a series of partial circuits P(n), n = 1, 2, 3, .... The partial circuit P(n) is the same as C(n) except that all output ports and the wires connected to the output ports are removed.

First, P(0) (see Fig. 7) is a circuit with a single gate, one input, and a wire. Formally, $P(0) = (G_0, W_0, I_0)$ where

$$G_0 = \{g_0^1\}$$
$$I_0 = \{i\}$$
$$W_0 = \{(i, g_0^1)\}.$$

Second, assume $P(n) = (G_n, W_n, I_n)$ is defined; then P(n + 1) is constructed as follows:

$$G_{n+1} = G_n \cup \{g_{n+1}^1, \ldots, g_{n+1}^{2^{n+1}}\}$$
$$I_{n+1} = I_0$$
$$W_{n+1} = W_n \cup \bigcup_{i=1}^{2^n} \{(g_n^i, g_{n+1}^{2i-1}), (g_n^i, g_{n+1}^{2i})\}.$$

Given a partial circuit P(n), a circuit $C(n) = (G_n, W_n, I_n, O_n)$ can be constructed by adding the set of output

Informally, C(n) has as many output ports as there are gates in the last row, and each output port is connected to the corresponding gate in the last row.

Note that the circuit is a complete binary tree, as shown in Fig. 8.

Now we are in a position to show that, for any d, there exists n such that C(n) is not placeable. Instead of proving this result directly, we prove a stronger result: there exists a partial circuit P(n) that is not placeable.

For the given connection domain radius d, all gates in C(n) must be placed within a distance of $n \cdot d$ from i. This area forms a rectangle containing a total of $R(n, d) = (n \cdot d)(2 \cdot n \cdot d + 1) \in O(n^2)$ cells. However, the total number of gates in the last row of C(n) is $2^n$. As $2^n$ increases faster than $n^2$, there exists a positive number n such that $2^n > R(n, d)$. This circuit is not placeable for the placement region with radius d.

The idea of this proof is simple. Given the critical path length n of a circuit and the radius d of a placement region, all gates in the circuit have to be placed within the distance $n \cdot d$. If the total number of gates in the circuit is greater than the number of cells in this area, then the circuit is not placeable. Since it is possible to construct a series of circuits whose sizes grow exponentially while the length of the critical path grows linearly, eventually there will be an unplaceable circuit in this series.

The proof of this theorem suggests a practical method to identify certain unplaceable situations by estimating the upper bound of the size of the placeable area for the circuit under consideration. For example, we synthesized a 64-bit multiplier into a NOR gate netlist that has 93 023 gates and a critical path of length 87. If the radius of the placement region is 2, then the cells available for this multiplier will be fewer than $R(87, 2) = (87 \cdot 2)(2 \cdot 87 \cdot 2 + 1) = 60\,726$, which is much less than the total of 93 023 gates in the circuit. Thus, the circuit is not placeable with a placement region of radius 2.

This calculation can detect certain unplaceable circuits under a given placement radius. It does not guarantee that a circuit is placeable even when the total number of cells in the estimated
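The counting argument behind Theorem 5 can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the function names and the encoding of gate $g_n^i$ as the tuple `(n, i)` are assumptions made here. It builds the partial circuit P(n) from the recursive definition, evaluates the bound R(n, d), and searches for the first depth at which the last row of the tree no longer fits.

```python
def build_partial_circuit(n):
    """Build P(n) = (G_n, W_n, I_n); gate g_level^index is encoded as (level, index)."""
    gates = {(0, 1)}                 # G_0 = {g_0^1}
    wires = {("i", (0, 1))}          # W_0 = {(i, g_0^1)}
    inputs = {"i"}                   # I_n = I_0 = {i}
    for level in range(n):
        # Each of the 2^level gates in the current last row gets two children.
        for i in range(1, 2 ** level + 1):
            left = (level + 1, 2 * i - 1)
            right = (level + 1, 2 * i)
            gates.update((left, right))
            wires.update((((level, i), left), ((level, i), right)))
    return gates, wires, inputs


def cell_bound(n, d):
    """R(n, d) = (n*d) * (2*n*d + 1): cells within distance n*d of the input."""
    return (n * d) * (2 * n * d + 1)


def last_row_size(gates, n):
    """Number of gates in row n of the tree."""
    return sum(1 for level, _ in gates if level == n)


def first_unplaceable_depth(d):
    """Smallest n >= 1 with 2^n > R(n, d), i.e. the last row cannot fit."""
    n = 1
    while 2 ** n <= cell_bound(n, d):
        n += 1
    return n


if __name__ == "__main__":
    gates, wires, inputs = build_partial_circuit(3)
    print(len(gates), last_row_size(gates, 3))   # 15 gates in total, 8 in the last row
    print(cell_bound(87, 2))                     # 60726, below the 93 023 gates of the multiplier
    print(first_unplaceable_depth(2))            # 10
```

As in the text, the check is one-sided: a gate count above `cell_bound` proves the circuit unplaceable, while a count below it proves nothing.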
326 IEEE TRANSACTIONS ON NANOTECHNOLOGY, VOL. 8, NO. 3, MAY 2009
Fig. 10. Illustration of connection placement: 1) place the subcircuit with inputs (i1, ..., in−1) in AREA2; 2) move AREA2 three cells up (becoming AREA2′) and leave the input ports unchanged, adding buffers to connect i1, ..., in−1 to AREA2′; 3) move the output ports three cells up and connect AREA2 to the output ports; 4) connect in to its destination output ports by wiring outside AREA2.

Fig. 11. Second base case of Theorem 9. 1) Place C1 and move it 5 rows up; 2) place g1, g2 and fix the location for i1; 3) connect g1, g2 to j1, j2, j3, j4. The two gates g1, g2 and the wiring (including buffers) occupy five rows of cells.

Second base case (single input port). Consider the circuit C1 of C, which is the same as C except that the input i1 and the gates directly connected to i1 are removed (Fig. 11). As the number of gates of C1 is less than that of C, by the induction assumption C1 is placeable. Move the placed C1 five rows up to make space for the wiring of i1.

Assume that C1 = (G1, W1, I1, O1). Note that the connection between C1 and i1 has a number of variants depending on the fanout of the input port i1, the number of gates connected to i1, and, when G1 is not empty, the fanout of the gate(s) connected to i1. Fig. 11 illustrates a placement for the case where i1 is connected to two gates g1, g2 and each gi has two connections. The two gates g1 and g2 are placed in the second row from the bottom, adjacent to i1, so that they can be connected to i1 directly without using buffers. After the placement of C1, the horizontal locations of the input ports of C1 are fixed, so buffers may be needed to wire g1, g2 to the input ports of C1. Without loss of generality, assume the leftmost and the rightmost input ports are connected to g1 and g2, respectively. If the required lengths of these two connections are longer than d, buffers can be inserted along the third row and the two columns leading to these input ports. Row 4 is for placing buffers to make the connection for the other line coming from g1. Similarly, row 5 is for wiring the other line from g2.

Second induction case (multiple input ports). Let I = {i1, ..., in} where n > 1. C can be split into two disjoint subcircuits C1 and C2. The first subcircuit C1 is the subcircuit of C accessible from in. The second subcircuit C2 contains the input ports Ib = {i2, ..., in} and all gates and output ports that are accessible from Ib except those contained in C1 (Fig. 12). Assume that C1 = (G1, W1, I1, O1) and C2 = (G2, W2, I2, O2).

Subcase 1 ($G_1 \neq \emptyset \wedge G_2 \neq \emptyset$). Since both C1 and C2 have fewer gates than C, they are placeable by the first induction assumption. Move C1 above C2 and place the connection C3, which is placeable by Lemma 7. Finally, make the connection C4 from C1 and C2 to the output ports, which is again guaranteed placeable by Lemma 7.

Subcase 2 ($G_1 = \emptyset$). In this case, G2 is the same as G, but the number of input ports of C2 is less than that of C. By the second induction assumption, C2 is placeable. By Lemma 7, the connection from {in} ∪ O2 toward O is placeable.

Subcase 3 ($G_2 = \emptyset$). In this case, G1 is the same as G. Note that the number of input ports of C1 might not be less than that of C. Therefore, neither of the induction assumptions applies. However, as G1 is not empty, we can proceed as in the second base case by first placing the subcircuit not directly connected to in, then making the connection using Lemma 7.

V. RELATED WORKS

Traditionally, the task of a placement tool is to assign circuit components to locations in a given area in such a way that certain global parameters, such as total wire length, are optimized [18]. In the simplest form of the placement problem, no limitation is imposed on the length of an individual gate-connecting wire. Though optimal placement is hard to achieve, feasible solutions can always be found if there is a sufficient amount of space.

CMOL FPGA cell assignment is also a circuit mapping problem, but with a rather different constraint: the wire length between pairs of nodes has a short upper bound. Besides, cell assignment involves both placement and routing. Due to these features, existing placement tools are not directly applicable, and new automated cell assignment methods need to be developed. A first step in this direction is the study by Hung et al., who encoded the CMOL FPGA cell assignment as a satisfiability problem so that it can be solved automatically by SAT solvers [10]. This approach makes no attempt to modify circuit netlists.

In [21], Strukov and Likharev reported a successful placement and routing of the Toronto 20 FPGA benchmark circuits. After synthesizing a circuit into a NOR gate netlist with SIS, they use T-VPack [2], [3] to partition the netlist into logic clusters. Each logic cluster is then mapped to a tile² in CMOL FPGA using the Versatile Place and Route (VPR) placement tool [2], [3]. The number of cells (N) in a logic cluster is set to be less than the number of cells (T) in a tile. The remaining T − N cells in a tile serve as routing cells, which are used to hold buffers (pairs of inverters) that build long "wires" in a custom routing procedure. If the routing is not successful, the number N is decreased and the whole process of clustering, placement, and routing is restarted.

² A tile is an area containing T regular CMOS cells and one latch cell. The area taken by a latch cell is four times that of a regular cell. Hence, the total area is T + 4 CMOS cells. In [21], T is set to 12. The intention is to support the implementation of a logic element with a 4-input Lookup Table (LUT) and a latch.

The placement and routing results of Strukov and Likharev are consistent with our theoretical analysis in three aspects. First, on the negative side, our analysis predicts that, without structural modification, large and complicated circuits might not be placeable in CMOL FPGA. The fact that they have prepared routing cells in each tile shows that their algorithm needs extra gates for circuit modification. Second, on the positive side, we have shown that CMOL placement and routing can be realized by adding buffers. One of the essential steps in their procedure is to add buffers to connect gates located in distant logic clusters. As it is hard to predict how many buffers are needed, their algorithm will sometimes increase the ratio of routing space in tiles and restart the placement and routing process. Third, we conclude that placement and routing have to be integrated for cell assignment. Their algorithm is actually an iteration of placement and routing until a feasible solution is found. This can be viewed as a form of placement and routing integration.

In summary, both their experimental study and our theoretical analysis confirm that buffer insertion is necessary and sufficient for the placement and routing of CMOL circuits. We presented the problem in a general setting and have obtained theoretical conclusions. Their study demonstrates one feasible strategy to solve the problem. It is expected that more strategies will be developed in the future.

The CMOL FPGA circuits investigated in this paper contain only the most basic type of CMOL cell, which has a single CMOS inverter. A few more variants of this CMOL cell have recently been proposed in the literature. Strukov and Likharev [20] introduced a latch cell into CMOL FPGA. Dong et al. [5] propose
CHEN et al.: A THEORETICAL INVESTIGATION ON CMOL FPGA CELL ASSIGNMENT PROBLEM 329
two more CMOL cells, namely T-Cell (transmission cell) and D-Cell (D-flipflop cell), which can support the implementation of a tristate buffer and a clocked flipflop. Though these CMOL cells have different internal structures and functionalities, they share the common feature that the connection domain is limited. Our results are independent of the internal structures of CMOL cells. They remain valid as long as cells have limited connection domains.

VI. SUMMARY

This paper lays out the theoretical foundation for the development of cell assignment algorithms. The main contributions are two theoretical results about combinatorial circuit placement in the CMOL architecture. First, given the maximum length of a gate-connecting wire, there always exists a circuit that is not placeable. Second, any circuit can be transformed to a functionally equivalent placeable circuit. Although this paper deals only with the kind of CMOL cell implementing a NOR gate, the results are actually applicable to circuits with different types of CMOL cells, as long as they have a limited connection domain.

These results imply that a CMOL FPGA cell assignment tool designed for mapping arbitrarily complex circuits should have a tight integration of placement and routing, in which buffer insertion is an indispensable step.

This paper is a theoretical investigation of the placement and routing problem in CMOL FPGA in the spirit of computation theory for computer science. Like the Turing Machine model, which helps to explain what functions are computable but is not intended to be a practical computer architecture, the algorithm presented in this paper is designed to illustrate the theoretical limitation and the routability in CMOL FPGA, but is not intended to be a practical placement and routing solution. We think the area resulting from this placement would be larger than necessary. Our next goal is to work on the design and implementation of more realistic cell assignment algorithms.

ACKNOWLEDGMENT

We are grateful to the anonymous referees who gave valuable feedback on an earlier version of this paper.

REFERENCES

[1] V. Betz, "Placement for general purpose FPGAs," in Reconfigurable Computing, A. DeHon and S. Hauck, Eds. San Mateo, CA: Morgan Kaufmann, 2007.
[2] V. Betz and J. Rose, "VPR: A new packing, placement, routing tool for FPGA research," in Proc. 7th Int. Workshop Field-Programmable Logic Appl. (FPL 1997), London, U.K.: Springer-Verlag, pp. 213–222.
[3] V. Betz, J. Rose, and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs. Norwell, MA: Kluwer, 1999.
[4] A. DeHon and K. Likharev, "Hybrid CMOS/nanoelectronic digital circuits: Devices, architectures, and design automation," in Proc. ICCAD, Washington, DC: IEEE Computer Society, 2005, pp. 375–382.
[5] C. Dong, W. Wang, and S. Haruehanroengra, "Efficient logic architectures for CMOL nanoelectronic circuits," IET Micro. Nano. Lett., vol. 1, no. 2, pp. 74–78, Dec. 2006.
[6] C. Gao and D. Hammerstrom, "Cortical models onto CMOL and CMOS—architectures and performance/price," IEEE Trans. Circuits Syst., vol. 54, no. 11, pp. 2502–2515, Nov. 2007.
[7] K. Golshan, Physical Design Essentials: An ASIC Design Implementation Perspective. Secaucus, NJ: Springer-Verlag, 2007.
[8] K. Greene, "Nanotube circuits made practical," Technol. Rev., Jun. 2007.
[9] J. R. Heath, P. J. Kuekes, G. S. Snider, and R. S. Williams, "A defect-tolerant computer architecture: Opportunities for nanotechnology," Science, vol. 280, no. 12, pp. 1716–1721, Jun. 1998.
[10] W. Hung, C. Gao, X. Song, and D. Hammerstrom, "Defect tolerant CMOL cell assignment via satisfiability," presented at the Nanoelectronic Devices Defense Security (NANO-DDS) Conf., Crystal City, VA, 2007.
[11] M. Hutton and V. Betz, "FPGA synthesis and physical design," in Electronic Design Automation for Integrated Circuits Handbook, vol. 1, L. Scheffer, L. Lavagno, and G. Martin, Eds. New York/Boca Raton, FL: Taylor & Francis/CRC, 2006.
[12] ITRS. (2004). International Technology Roadmap for Semiconductors emerging research devices. [Online]. Available: http://public.itrs.net
[13] K. K. Likharev, "Hybrid semiconductor/nanoelectronic circuits," invited talk, IBM Almaden Research Center, Jun. 2007.
[14] K. K. Likharev and D. B. Strukov, "CMOL technology development roadmap," in Proc. 4th Workshop Non-Silicon Comput., Jun. 2007, pp. 9–16.
[15] W. Rao, A. Orailoglu, and R. Karri, "Topology aware mapping of logic functions onto nanowire-based crossbar architectures," in DAC, E. Sentovich, Ed. New York: ACM, 2006, pp. 723–726.
[16] S. S. Sapatnekar, P. Saxena, and R. S. Shelar, Routing Congestion in VLSI Circuits: Estimation and Optimization (Series on Integrated Circuits and Systems). Secaucus, NJ: Springer-Verlag, 2007.
[17] L. Scheffer, L. Lavagno, and G. Martin, Eds., Electronic Design Automation for Integrated Circuits Handbook. New York/Boca Raton, FL: Taylor & Francis/CRC, 2006.
[18] N. Sherwani, Algorithms for VLSI Physical Design Automation, 3rd ed. Norwell, MA: Kluwer, 1999.
[19] D. B. Strukov and K. Likharev, "CMOL FPGA circuits," in CDES, H. R. Arabnia and M. M. Eshaghian-Wilner, Eds. Las Vegas, NV: CSREA, 2006, pp. 213–219.
[20] D. B. Strukov and K. K. Likharev, "CMOL FPGA: A reconfigurable architecture for hybrid digital circuits with two-terminal nanodevices," Nanotechnology, vol. 16, pp. 888–900, 2005.
[21] D. B. Strukov and K. K. Likharev, "A reconfigurable architecture for hybrid CMOS/nanodevice circuits," in Proc. 2006 ACM/SIGDA 14th Int. Symp. Field Programmable Gate Arrays (FPGA 2006), New York: ACM, pp. 131–140.
[22] D. B. Strukov and K. K. Likharev, "Reconfigurable hybrid CMOS/nanodevice circuits for image processing," IEEE Trans. Nanotechnol., vol. 6, no. 6, pp. 696–710, Nov./Dec. 2007.
[23] K. Wang and A. Balandin, Eds., The Handbook of Semiconductor Nanostructures and Nanodevices. Valencia, CA: American Scientific, Oct. 2005.

Authors' photographs and biographies not available at the time of publication.