322 IEEE TRANSACTIONS ON NANOTECHNOLOGY, VOL. 8, NO. 3, MAY 2009

A Theoretical Investigation on CMOL FPGA Cell Assignment Problem
Gang Chen, Xiaoyu Song, and Ping Hu

Abstract—The hybrid CMOS/nano circuits (CMOL) field-programmable gate array (FPGA) is a promising nanotechnology that has the potential to be accepted by industry in the future. However, a primary question to be addressed is whether or not all circuits can be mapped on CMOL architecture. In contrast to traditional placement and routing problems, CMOL cell assignment has the constraint that each gate can only be wired to a limited number of gates in its neighborhood. Under such a restriction, not all circuits are directly placeable. This paper presents two theoretical results concerning whether a combinatorial circuit is placeable in CMOL FPGA. For any finite connection domain, we prove the existence of a few nonplaceable circuits under certain conditions. Given a reasonable connection domain size, we show that any combinatorial circuit can be transformed to an equivalent circuit which is placeable. These results conclude that the CMOL cell assignment problem is solvable but circuit modification has to be part of the placement procedure.

Index Terms—Cell assignment, the hybrid CMOS/nano circuits (CMOL), nanotechnology, placement and routing.

Manuscript received July 14, 2008; revised September 30, 2008 and December 5, 2008. First published December 22, 2008; current version published May 6, 2009. The review of this paper was arranged by Associate Editor K. K. Likharev.
G. Chen is with the Lingcore Laboratory, Portland, OR 97224 USA (e-mail: [email protected]).
X. Song is with the Electrical and Computer Engineering Department, Portland State University, Portland, OR 97207 USA (e-mail: [email protected]).
P. Hu is with the Portland Group, ST Microelectronics, Portland, OR 97035 USA (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNANO.2008.2011732
1536-125X/$25.00 © 2009 IEEE

I. INTRODUCTION

THE NANOTECHNOLOGY with feature size below 10 nm has the potential to provide huge density improvement over the current CMOS technology [4], [6], [12], [13]. However, at such a small scale, fabricated chips will exhibit a high percentage of defects, probably as much as 20%–50%. To address this problem, a number of defect-tolerant architectures have been invented [8], [9], [23]. Some of them use duplicated devices to increase the reliability of the circuit. Others adopt programmable architectures to bypass defective elements through reconfiguration [15], [20]–[22]. Among them, the hybrid CMOS/nano circuits (CMOL) field-programmable gate array (FPGA) approach proposed by Strukov and Likharev [20] seems particularly interesting. In their report, it is calculated that a 32-bit Kogge–Stone adder can be mapped in CMOL within an area of about 110 µm² with 1.3 ns delay. In comparison, it is estimated that the same circuit will take 39 000 µm² with 1.7 ns delay in a future Xilinx FPGA with 32 nm CMOS technology. As a result, in the area needed for a single adder in a 32 nm CMOS FPGA, more than 300 adders can be placed in a CMOL FPGA, and they run faster.

The CMOL FPGA of Strukov and Likharev has several variants. The one adopted in this paper is from [20], which is designed for the implementation of combinatorial logic. It consists of two structures stacked one upon another (Fig. 1). The first level is a matrix of CMOS inverter cells (shown as the solid line grid), while the second level is a grid of nanowires (shown as the dashed line grid). Each inverter is connected to two orthogonal nanowires, from which the inverter can be connected to other inverters via configurable nanoswitches at intersections of the nanowire grid. For example, the inverter A in Fig. 1 connects to nanowires x6 and y4. The nanoswitches at intersections B and C are preset to "ON." The inverter A along with nanoswitches B and C forms a NOR gate, where x4 and x12 are the inputs and x6 is the output. A physical constraint of this architecture is the limited length of nanowires (about 10–20 cells long). This means that a gate can access only the gates in its neighborhood, called the connection domain. A combinatorial circuit can be mapped to this architecture by setting the configurable nanoswitches; this process is called cell assignment [20].

Fig. 1. CMOL architecture illustration. A NOR gate is formed by an inverter A and two nanoswitches B and C.

From an abstract point of view, CMOL architecture can be viewed as a 2-D cell array. Each cell contains a NOR gate that can be routed to the gates in its connection domain at the configuration stage.

The exact shape and the size of connection domain will be eventually determined by future nanotechnology. Nevertheless, the larger the region, the higher the defect rate. In [20], the size of a connection domain is determined by a parameter r such that the
total cells in the connection domain is equal to 2r(r − 1) − 1. In their demonstration example of the implementation of a Kogge–Stone adder, r is taken to be 12. Thus, there are altogether 263 gates in the connection domain of a gate. Due to the high defect rate of nanowires (estimated at 50%), not all of these gates are usable.

The cell assignment process as described in [20] has two phases. In the first phase, gates are assigned to cells and connections are wired as if there were no defects. In the second phase, the mapping is updated so that defective connections are bypassed. Strukov and Likharev have developed an algorithm to automatically perform the second task, while the first one was carried out manually [20]. For this reason, only the first phase of the cell assignment is taken into consideration in this paper. Since the initial mapping will be adjusted at the second phase for defect correction, the size of the connection domain in the first phase has to be even smaller. In the demonstration example of [20], the initial connection domain parameter r is set to 10. Therefore, there are only 179 cells available in a connection domain.

The limitation on the size of the connection domain brings up the question whether an arbitrarily complex circuit can be placed or not. This paper explores this problem from a theoretical perspective.

Suppose we map a 180-bit multiplier netlist; then each of its input ports will have a fanout value of 180, larger than the total number of cells in the connection domain just described. Since each wire must connect to a cell within the connection domain, this circuit is obviously not placeable if no modification is made. As a matter of fact, much smaller circuits might still not be placeable. In our initial experiment, the placer failed in placing a 16-bit multiplier. A closer look into the problem reveals a high congestion near the input side where each port has a fanout value of 16.

The high fanout problem, however, can be resolved by inserting fanout splitting buffers as illustrated in Fig. 2. Using this method, we can prove that any circuit can be converted to a circuit with a maximum fanout value equal to or less than 2. Will this kind of circuit always be placeable? Unfortunately not: we will show that, for any finite size connection domain, there exists a circuit whose maximum fanout value is two but which is still not placeable. The reason is that such a circuit may still have high congestion areas. Fortunately, the main theorem of this paper ensures that any combinatorial circuit can be transformed to a functionally equivalent placeable circuit.

Fig. 2. Illustration of the fanout splitting transformation. The point i initially has a fanout value 7 as shown on the left side of the graph. We take away four wires d, e, f, g and insert two NOT gates to form a buffer, shown as the line i − x in the picture. The wires d, e, f, g are connected to x, which is the end of the inserted buffer. Therefore, the fanout at node i is reduced to 4.

It is worthwhile to compare the cell assignment problem to the traditional placement problem. A traditional placement algorithm typically works in two stages. First, an initial placement is constructed; second, the initial placement is optimized. The former is easy, while the latter is difficult. However, as demonstrated in this paper, the construction of an initial placement in CMOL cell assignment is far from trivial. As such, we leave the placement optimization problem to future study, and focus exclusively on the finding of initial placements. The current research should be viewed as the theoretical foundation for cell assignment algorithms.

In the next section, we formalize the CMOL FPGA cell assignment (or placement) problem. Section III shows that for any connection domain of finite size, there always exists a nonplaceable circuit, even if the maximum fanout is restricted to two. Section IV presents the main theorem whose proof contains an algorithmic procedure transforming a circuit to a placeable one. The last two sections discuss related research and summarize the results of this study.

II. FORMULATION OF THE CELL ASSIGNMENT PROBLEM

The central problem of this study is whether or not a combinatorial NOR circuit can be placed in an infinitely large area in which the distance between each pair of connected gates should be less than a predefined number. For the formal investigation of this problem, we assume the placement region is bounded only by horizontal and vertical coordinate lines as shown in Fig. 3. More precisely, the area for cell assignment is the infinite set of coordinate pairs

A = { (x, y) | x ≥ 0 ∧ y ≥ 0 }.

Fig. 3. Cell assignment area.

A combinatorial NOR gate circuit, hereinafter referred to in short as "circuit," is a tuple of four components: C = (G, W, I, O), where G is the set of NOR gates (or simply called gates); I and O are sets of input and output ports, respectively; W is the set of wires. A gate here is an abstract object that can be connected to one or two input wires and one or more output wires. If a NOR gate is used as a NOT gate, it connects to one input wire, otherwise it connects to two input wires. The functional and physical properties of the NOR gate are of no concern here. A wire is a connection from an input port or a gate to another gate or an output port. Formally, G, I, O are disjoint sets of symbols, also called nodes. The set of nodes is denoted by N. A wire is a pair of nodes. Thus, a circuit can be viewed as a directed graph of nodes and wires. Fig. 4 is a sample circuit defined by G = {g1, g2, g3, g4}, I = {i1}, O = {o1, o2}, and W = {(i1, g1), (i1, g2), (i1, g3), (g1, g2), (g1, g3), (g1, g4), (g2, g4), (g3, o2), (g4, o1)}.
Fig. 4. Simple circuit of NOR gates.

The number of wires connected to the output of a node n is called the fanout value of n, denoted by F(n). An input port node or a gate node can have a fanout value from one to many; an output port node has fanout value zero.

A connection domain D of a cell C in the placement region is a finite set of cells around C. In the paper [20], the shape of the connection domain is close to a rectangle. For the simplicity of this theoretical study, we define the connection domain of cell C at location (x, y) with radius d as the set of cells (u, v) such that v is larger than or equal to y and that (u, v) is within the distance of d cells from (x, y). This set forms a rectangle above C (see Fig. 5). The total number of cells in the connection domain with radius d, denoted by T(d), is equal to (d + 1) ∗ (2d + 1) − 1, as long as it is not cut by the boundaries. The radius of a placement area is defined as the radius of connection domain of the placement area.

Fig. 5. Connection domain of a cell.

A cell assignment (or placement) with radius d (or simply "cell assignment" when d is known) for a circuit C = (G, W, I, O) is a mapping from the set of nodes (N) to the set of cells (A), $\mathcal{A} : N \to A$, such that for any pair of nodes n1 and n2 which are connected by a wire w, $\mathcal{A}(n_2)$ should be located within the connection domain of $\mathcal{A}(n_1)$. By this definition, the distance between $\mathcal{A}(n_1)$ and $\mathcal{A}(n_2)$ will be equal to or less than d. Fig. 6 is a cell assignment of the circuit in Fig. 4 for a placement region with radius equal to or larger than 2. In this paper, we consider only those cell assignments with radius at least 2. Given a radius d, we say a circuit is placeable if there is a cell assignment with radius d for the circuit; otherwise the circuit is said to be not placeable.

Fig. 6. Sample cell assignment.

Two NOT gates can be connected together to form a buffer that can be used to extend the length of a wire. If the radius of the connection domain is d, then the insertion of a buffer lets a gate reach another node 3d cells away. A buffer can also be used to reduce the fanout of a gate. The next section will show details of these constructions.

III. UNPLACEABLE CIRCUITS

With unlimited fanout size, it is easy to show that unplaceable circuits exist.

Lemma 1 (Trivial unplaceable circuit): If the radius of connection domain is d, then there is a circuit that is not placeable.

Proof: Let a circuit C0 = (G, W, I, O) be such that G contains a gate g with fanout value larger than T(d). Then, no connection domain has enough cells to be assigned to all nodes connected to g, so C0 is not placeable. □

A big fanout value is an obstacle to solving the cell assignment problem. Nevertheless, any circuit can be converted to an equivalent circuit so that the maximum fanout value of all nodes will be less than or equal to two. The following is a conversion algorithm.

Algorithm 2 (Fanout reduction): Assume that the input circuit is C = (G, W, I, O) and N = G ∪ I ∪ O.
Loop:
    if F(n) ≤ 2 for every node n ∈ N
    then return the current circuit C
    else
        let n be a node such that F(n) = k > 2.
        let W′ = {(n, a_i) | a_i ∈ G, i = 1, ..., k}
        let m = ⌊k/2⌋
        create two NOT gates g1, g2 and replace W′ with the set of wires
            {(n, g1), (n, a_1), ..., (n, a_m), (g1, g2), (g2, a_{m+1}), ..., (g2, a_k)}
    goto Loop

For each node n with fanout value F(n) = k > 2, the algorithm adds two connected NOT gates g1, g2 as a buffer and updates wire connections in such a way that F(n) = ⌊k/2⌋ + 1, F(g1) = 1, F(g2) = k − ⌊k/2⌋. The fanout values of g1, g2 and that of the updated n are strictly less than the initial fanout value k.

Definition 3 (Circuit of degree n): A circuit is of degree n if the maximum fanout value of its nodes is equal to n.

The following lemma shows that Algorithm 2 transforms a circuit to a circuit of degree 2.
Lemma 4 (Degree 2 fanout reduction): Any circuit C = (G, W, I, O) can be transformed to a logically equivalent circuit C′ = (G′, W′, I, O) such that F(n) ≤ 2 for all n ∈ G′ ∪ I ∪ O.

Proof: Apply Algorithm 2 to C. The algorithm terminates since each node n with fanout value F(n) > 2 is replaced by three new nodes whose fanout values are strictly smaller than F(n). The insertion of buffers at each loop step does not change the functionality of the circuit. Therefore, the returned circuit C′ is equivalent to the input circuit and the maximum fanout value of C′ is less than or equal to 2. □

The fanout reduction as described in Algorithm 2 may significantly increase the number of gates and the total length of wires, as well as the length of the critical path. A practical fanout reduction algorithm should be designed so that it does not extend the critical path too much.

The aforesaid lemma says that any circuit can be transformed to a circuit of degree 2. Though the small fanout size of circuits of degree 2 makes them easier to place, they may still be unplaceable, even if the placement area is infinitely large.

Theorem 5 (Not placeable circuit of degree 2): For any positive number d, there is a circuit of degree 2 that is not placeable in a placement region with radius d.

Proof: We will define a sequence of circuits C(n), n = 1, 2, 3, ..., so that for any d, there exists an n such that C(n) is not placeable for the placement region with radius d. To this end, we begin by a recursive definition of a series of partial circuits P(n), n = 1, 2, 3, .... The partial circuit P(n) is the same as C(n) except that all output ports and wires connected to the output ports are removed.

First, P(0) (see Fig. 7) is a circuit with a single gate, one input and a wire. Formally, $P(0) = (G_0, W_0, I_0)$ where

$$G_0 = \{g_0^1\}, \qquad I_0 = \{i\}, \qquad W_0 = \{(i, g_0^1)\}.$$

Fig. 7. Partial circuit P(0).

Second, assume $P(n) = (G_n, W_n, I_n)$ is defined, then P(n + 1) is constructed as follows:

$$G_{n+1} = G_n \cup \{g_{n+1}^1, \ldots, g_{n+1}^{2^{n+1}}\}$$
$$I_{n+1} = I_0$$
$$W_{n+1} = W_n \cup \bigcup_{i=1}^{2^n} \{(g_n^i, g_{n+1}^{2i-1}), (g_n^i, g_{n+1}^{2i})\}.$$

Given a partial circuit P(n), a circuit $C(n) = (G_n, W'_n, I_n, O_n)$ can be constructed by adding the set of output ports ($O_n$) and the set of wires connecting to output ports ($W'_n$):

$$O_n = \{o_1, \ldots, o_{2^n}\}$$
$$W'_n = W_n \cup \bigcup_{i=1}^{2^n} \{(g_n^i, o_i)\}.$$

Informally, C(n) has as many output ports as there are gates in the last row, and each output port is connected to the corresponding gate in the last row.

Note that the circuit is a complete binary tree as shown in Fig. 8.

Fig. 8. Circuit C(n).

Now we are in a position to show that, for any d, there exists n such that C(n) is not placeable. Instead of proving this result directly, we can prove a stronger result that there exists a partial circuit P(n) that is not placeable.

For the given connection domain radius d, all gates in C(n) must be placed within the distance of n ∗ d from i. This area forms a rectangle containing a total of $R(n, d) = (n \cdot d)(2 \cdot n \cdot d + 1) \in O(n^2)$ cells. However, the total number of gates at the last row of C(n) is $2^n$. As $2^n$ increases faster than $n^2$, there exists a positive number n such that $2^n > R(n, d)$. This circuit is not placeable for the placement region with radius d. □

The idea of this proof is simple. Given the critical path n of a circuit and the radius d of a placement region, all gates in the circuit have to be placed within the distance n ∗ d. If the total number of gates in the circuit is greater than the number of cells in this area, then the circuit is not placeable. Since it is possible to construct a series of circuits whose sizes grow exponentially while the length of the critical path grows linearly, eventually there will be an unplaceable circuit in this series.

The proof of this theorem suggests a practical method to identify certain unplaceable situations through the estimation of the upper bound of the size of the placeable area for the circuit under consideration. For example, we synthesized a 64-bit multiplier in the NOR gate netlist that has 93 023 gates and a critical path of length 87. If the radius of the placement region is 2, then the cells available for this multiplier will be less than R(87, 2) = (87 ∗ 2) ∗ (2 ∗ 87 ∗ 2 + 1) = 60 726, which is much less than the total of 93 023 gates of the circuit. Thus, the circuit is not placeable with a placement region of radius 2.

This calculation can detect certain unplaceable circuits under a given placement radius. It does not guarantee that a circuit is placeable even when the total number of cells in the estimated placeable region is greater than the number of gates in the circuit.
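The counting bound is easy to check numerically. The sketch below, with function names of our own choosing, evaluates R(n, d) as defined in the proof of Theorem 5, reproduces the 64-bit multiplier figures quoted above, and searches for the first tree depth at which the last row alone outgrows the region. A negative answer from `certainly_unplaceable` carries no information, in line with the caveat just stated.

```python
def region_cells(n, d):
    """R(n, d) = (n*d) * (2*n*d + 1): cells reachable within n hops of radius d."""
    return (n * d) * (2 * n * d + 1)


def certainly_unplaceable(num_gates, critical_path, d):
    """True when the gate count alone exceeds the reachable region.

    False does NOT mean the circuit is placeable; the bound is only necessary.
    """
    return num_gates > region_cells(critical_path, d)


def smallest_unplaceable_tree(d):
    """Smallest n with 2**n > R(n, d), i.e. the last row of C(n) cannot fit."""
    n = 1
    while 2 ** n <= region_cells(n, d):
        n += 1
    return n


# Figures from the 64-bit multiplier example: 93 023 gates, critical path 87.
print(region_cells(87, 2))                      # 60726
print(certainly_unplaceable(93023, 87, 2))      # True
```

For radius 2 the search stops at depth 10, since 2^10 = 1024 exceeds R(10, 2) = 820 while 2^9 = 512 does not exceed R(9, 2) = 666.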


Several factors keep this estimate from being a sufficient condition. First, most gates have to be placed within a distance less than the length of the critical path times the radius. Second, a big fanout may prevent a gate from being placed, as discussed earlier. For the 64-bit multiplier in the previous paragraph, every input of this circuit has a fanout of 64; therefore, the total number of cells in a connection domain has to be greater than 64. The consequence is that the radius of the connection domain should be at least 6 (R(1, 6) = 78). Third, there is the congestion problem. Since connected nodes have to be placed close to each other, their connection domains overlap, shrinking the placeable space for each node. The 64-bit multiplier has 128 input ports, each with a fanout value of 64. Assume the connection domain radius is 7 and the placement distance between input ports is 7. Then, the total number of cells available for the placement of gates that connect directly to input ports is approximately 128 ∗ 7 ∗ 7 = 6272. However, the total number of gates connected to these input ports is 64 ∗ 128 = 8192,¹ which is greater than the number of available cells. By this calculation, the radius 7 is still too small for the placement. Fourth, gate insertion demands more space. Buffers have to be inserted for a number of reasons: 1) if the distance between two placed gates is greater than the radius, one or more buffers have to be added if they require a connection (in the 64-bit multiplier example, each input port has to connect to a gate that also connects to one of the other 63 input ports, and the radius value 7 is obviously not long enough for this purpose); 2) buffers have to be added to reduce node fanout, etc. Each buffer consists of two NOT gates, so the addition of buffers increases the total number of gates to be placed. For all these reasons, the radius for the 64-bit multiplier has to be bigger than 7.

¹ A pair of adjacent input ports may share a connecting gate, which should be excluded from the aforesaid gate count. But the maximum number of the sharing gates is less than 128; therefore, it can be safely ignored.

IV. CONVERSION TO PLACEABLE CIRCUIT

The next question is whether an unplaceable circuit can be transformed to a functionally equivalent placeable circuit. The main theorem in this section gives a positive answer to this question. The basic idea is to add buffers to move subcomponents of the circuit far away from each other so that each component can have sufficient placement area. A key problem in this approach is how to place the connections between the components.

Therefore, our first step is to prove that the connections between two modules can be placed. This problem can be formulated as how to place a circuit that has n input ports and m output ports, but whose set of gates is empty. We show that this circuit can be converted to an equivalent circuit that is placeable.

While the original placement problem has no restriction on where to place input/output ports, the connection placement problem assumes that the horizontal locations of input/output ports are fixed. This is because connections are wired after the placement of modules. A placed module can be moved around, but the locations of the input/output ports relative to its placement boundary cannot be changed. When making connections between two modules, we assume that modules can be moved vertically closer to or away from each other, but the relative horizontal locations of the input/output ports are fixed.

The study of the connection placement problem has two steps. First, we will show that a circuit with a single input port can be placed. Recall that circuits with high fanout value often become obstacles to the placement. Lemma 4 told us that any circuit can be converted to a circuit with fanout value at most 2. But Theorem 5 says that such a circuit might still be not placeable. Hence, we need further modifications to the circuit.

The constructive proofs of the following lemmas and the next theorem contain an algorithm for making a transformation so that a circuit can be successfully placed. Though it might use an area much larger than necessary, it should be remembered that this is for the proof of the theoretical results, which clears the doubt about whether a placement can be made or not.

Lemma 6 (Single input multiple output): Let d be the radius of the placement region. Assume that d > 2. Given a circuit C = (∅, W, {i}, O) such that the horizontal locations of input and output ports of C are fixed. Then, C can be converted to an equivalent placeable circuit.

Proof: We construct an equivalent circuit by adding NOT gates to the original circuit. The placement is illustrated in Fig. 9. The NOT gates in the first row from bottom have fanout values from 1 to 3 depending on the location. Since d > 2, the connection domain has enough cells for fanout 3 connections. These newly inserted gates serve to reduce the fanout of i1. Whenever necessary, NOT gates on the second row are added so that each path from the input port i1 to an output port ok has an even number of NOT gates. If the horizontal distance between two adjacent output ports is greater than d, then more buffers are inserted in the first row to ensure a legal connection. □

Fig. 9. Placement strategy for the connection from a single input port to multiple output ports. Each square box is a NOT gate with one or two fanouts.

The following lemma shows how to make a placement for connections between input ports and output ports.

Lemma 7 (Connection placement): Let d be the radius of the placement region. Assume that d > 2. Given a circuit C = (∅, W, I, O). Assume that the horizontal placement locations for all i ∈ I and all o ∈ O are fixed. Then, the circuit C can be converted to an equivalent placeable circuit.

Proof: The assertion is proved by induction on the total number n of input ports.

Base case n = 1. This case is covered by Lemma 6.

Induction case n > 1. Assume that the set of input ports I = {i1, ..., in} and that the set of output ports O = {o1, ..., om}. Note that each input port may be connected to one or more output ports. Let W1, W2 be two subsets of W such that W1 = {(i, o) | i = in} and W2 = {(i, o) | i ≠ in}. Set I1 = {in}, I2 = {i1, ..., in−1}, O1 = {o | (i, o) ∈ W1}, and O2 = {o | (i, o) ∈ W2}. By induction assumption, the circuit C2 = (∅, W2, I2, O2) is placeable. Assume C2 is placed within the area AREA2 = {(x, y) | x ∈ [0, p] ∧ y ∈ [0, q]}. Move AREA2 three cells up to AREA2′ = {(x′, y′) | x′ = x, y′ = y + 3, (x, y) ∈ AREA2}. Note that this changes the locations of all gates, while keeping the input ports at their original places. Among the three cells allocated in each column, two of them will be used for making a vertical buffer; the other one is left for a possible horizontal buffer (not shown in the picture). Then, continue to move output ports three cells up. After making spaces below and above AREA2′, insert vertical buffers to reconnect new input/output ports to the old input/output ports. Finally, add buffers and wire in to its destination output ports as illustrated in Fig. 10. □

Fig. 10. Illustration of connection placement: 1) place subcircuit with input (i1, ..., in−1) in AREA2; 2) move AREA2 three cells up (become AREA2′) and leave input ports unchanged, add buffers to connect i1, ..., in−1 to AREA2′; 3) move output ports three cells up and connect AREA2′ to output ports; 4) connect in to its destination output ports by wiring outside AREA2′.

Fig. 11. Second base case of Theorem 9. 1) place C1 and move it 5 rows up; 2) place g1, g2 and fix the location for i1; 3) connect g1, g2 to j1, j2, j3, j4. The two gates g1, g2 and the wiring (including buffers) occupy five rows of cells.

Fig. 12. Second induction case of Theorem 9. 1) Place all nodes connected to in as C1; 2) place all nodes connected to i2, ..., in−1 as C2; 3) move C1 above C2; 4) place the connection between C1 and C2 as C3; 4) make connections for in and the output ports o1, ..., om; 5) make connections C4 from C1, C2 to the gates previously connected with output ports.
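The proofs of Lemmas 6 and 7 rely on two facts about inserted NOT gates: a chain of them stretches a connection beyond one radius, and each path must contain an even number of them to preserve the logic value. The sketch below is a simplified one-dimensional model of our own, not a construction from the paper: it assumes the connection runs in a straight line, that every hop may span up to d cells, and that free cells are always available. The function name `not_gates_for_span` is hypothetical.

```python
def not_gates_for_span(distance, d):
    """Smallest even number of NOT gates so that hops of at most d cells
    cover `distance` cells (straight-line model, free cells assumed)."""
    if d <= 0:
        raise ValueError("radius d must be positive")
    hops = max(1, -(-distance // d))     # ceiling division: wire segments needed
    gates = hops - 1                     # one NOT gate between consecutive segments
    return gates + (gates % 2)           # round up to an even count to keep polarity
```

Under this model a distance of 3d needs exactly two NOT gates, that is, one buffer, which agrees with the remark in Section II that a buffer lets a gate reach a node 3d cells away.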
Now we are in the position to prove the main theorem of this section. The idea of the proof is illustrated in Figs. 11 and 12 (details are given in the following proof). If a circuit has only one input port (refer to Fig. 11), then we take out the input port as well as the gates directly connected to the input port. The remaining circuit C1 is "smaller" and it should be placeable. After that, we place the input port and the gates directly connected to the input port. If there is more than one input port, we begin by placing the subcircuits connected to the first n − 1 input ports (C1 in Fig. 12), then place the last input port (C2), and finally place the connection between the two subcircuits (C3) and the connection toward the output ports (C4).

We need a precise definition for accessible subcircuits with respect to a subset of input ports.

Notation 8 (Accessible subcircuit): Given a circuit C = (G, W, I, O) and a subset of input ports I′ ⊂ I. If g ∈ G is a gate such that there is a path from i ∈ I′ to g, then it is said that g is accessible from I′. The set of all gates accessible from I′ is denoted by AccG(I′). The notion of the set of output ports AccO(I′) accessible from I′ is defined similarly. Let W′ = {(n1, n2) | (n1, n2) ∈ W ∧ n1, n2 ∈ I′ ∪ AccG(I′) ∪ AccO(I′)}. The circuit C′ = (AccG(I′), W′, I′, AccO(I′)) is called the subcircuit of C accessible from I′.

Theorem 9 (Conversion for placement): Given a circuit C and a connection domain radius d, there is a circuit C′ such that C′ is logically equivalent to C and that C′ is placeable in a placement region with radius d > 2.

Proof: By the fanout reduction described in Lemma 4, we need only to prove that a circuit of degree 2 can be transformed to a placeable circuit. The assertion is proved by induction on the total number of gates. The induction case is proved by a second induction on the number of input ports.

Base case. Let CirBase be the set of circuits that do not contain any gate. Then, each circuit in this set is a network of connections between input ports and output ports. By Lemma 7, these circuits can be converted to functionally equivalent placeable circuits.

Induction case. Suppose that the circuit C does not belong to the set CirBase. Assume that C = (G, W, I, O). The proof proceeds by induction on the number of input ports.
328 IEEE TRANSACTIONS ON NANOTECHNOLOGY, VOL. 8, NO. 3, MAY 2009

Second base case (single input port). Consider the circuit C_1 of C, which is the same as C except that the input i_1 and the gates directly connected to i_1 are removed (Fig. 11). As the number of gates of C_1 is less than that of C, by the induction assumption C_1 is placeable. Move the placed C_1 five rows up to make space for the wiring of i_1.

Assume that C_1 = (G_1, W_1, I_1, O_1). Note that the connection between C_1 and i_1 has a number of variants depending on the fanout of the input port i_1, the number of gates connected to i_1, and, when G_1 is not empty, the fanout of the gate(s) connected to i_1. Fig. 11 illustrates a placement for the case where i_1 is connected to two gates g_1, g_2 and each g_i has two connections. The two gates g_1 and g_2 are placed in the second row from the bottom, adjacent to i_1, so that they can be connected to i_1 directly without using buffers. After the placement of C_1, the horizontal locations of the input ports of C_1 are fixed, so buffers may be needed to wire g_1, g_2 to the input ports of C_1. Without loss of generality, assume the leftmost and the rightmost input ports are connected to g_1 and g_2, respectively. If the required lengths of these two connections are longer than d, buffers can be inserted along the third row and the two columns leading to these input ports. Row 4 is for placing buffers to make the connection from another line coming from g_1. Similarly, row 5 is for wiring another line from g_2.

Second induction case (multiple input ports). Let I = {i_1, ..., i_n} where n > 1. C can be split into two disjoint subcircuits C_1 and C_2. The first circuit C_1 is the subcircuit of C accessible from i_n. The second subcircuit C_2 contains input ports I_b = {i_2, ..., i_n} and all gates and output ports that are accessible from I_b except those contained in circuit C_1 (Fig. 12). Assume that C_1 = (G_1, W_1, I_1, O_1) and C_2 = (G_2, W_2, I_2, O_2).

Subcase 1 (G_1 ≠ ∅ ∧ G_2 ≠ ∅). Since both C_1 and C_2 have fewer gates than C, they are placeable by the first induction assumption. Move C_1 above C_2 and make the placement of the connection C_3, which is placeable by Lemma 7. Finally, make the connection C_4 from C_1 and C_2 to output ports, which is again guaranteed placeable by Lemma 7.

Subcase 2 (G_1 = ∅). In this case, G_2 is the same as G, but the number of input ports of C_2 is less than that of C. By the second induction assumption, C_2 is placeable. By Lemma 7, the connection from {i_n} ∪ O_2 toward O is placeable.

Subcase 3 (G_2 = ∅). In this case, G_1 is the same as G. Note that the number of input ports for C_1 might not be less than that of C. Therefore, neither of the induction assumptions applies. However, as G_1 is not empty, we can proceed as in the second base case by placing the subcircuit not directly connected to i_n first, then making the connection using Lemma 7. ∎

V. RELATED WORKS

Traditionally, the task of a placement tool is to assign circuit components to locations in a given area in a way that certain global parameters such as total wire length are optimized [18]. In the simplest form of the placement problem, no limitation on the length of individual gate-connecting wire is imposed. Though optimal placement is hard to achieve, feasible solutions can always be found if there is a sufficient amount of space.

CMOL FPGA cell assignment is also a circuit mapping problem, but with a rather different constraint: the wire length between pairs of nodes has a short upper bound. Besides, cell assignment involves both placement and routing. Due to these features, existing placement tools are not directly applicable, and new automated cell assignment methods need to be developed. A first step in this direction is the study by Hung et al., who have encoded the CMOL FPGA cell assignment as a satisfiability problem so that it can be solved automatically by SAT solvers [10]. This approach has made no attempt to modify circuit netlists.

In [21], Strukov and Likharev reported a successful placement and routing of the Toronto 20 FPGA benchmark circuits. After synthesizing a circuit into a NOR gate netlist by SIS, they use T-VPack [2], [3] to partition the netlist into logic clusters. Each logic cluster is then mapped to a tile² in CMOL FPGA using the Versatile Place and Route (VPR) placement tool [2], [3]. The number of cells (N) in a logic cluster is set to be less than the number of cells (T) in a tile. The remaining T − N cells in a tile serve as routing cells, which will be utilized to store buffers (pairs of inverters) to build long "wires" in a custom routing procedure. If the routing is not successful, the number N will be decreased and the whole process of clustering, placement, and routing will be restarted.

² A tile is an area containing T regular CMOS cells and one latch cell. The area taken by a latch cell is four times larger than that of a regular cell. Hence, the total area is T + 4 CMOS cells. In [21], T is set to 12. The intention is to support the implementation of a logic element with a 4-input Lookup Table (LUT) and a latch.

The placement and routing results of Strukov and Likharev are consistent with our theoretical analysis in three aspects. First, on the negative side, our analysis predicts that, without structural modification, large and complicated circuits might not be placeable in CMOL FPGA. The fact that they have prepared routing cells in each tile shows that their algorithm needs extra gates for circuit modification. Second, on the positive side, we have shown that CMOL placement and routing can be realized by adding buffers. One of the essential steps in their procedure is to add buffers to connect gates located in distant logic clusters. As it is hard to predict how many buffers are needed, their algorithm will sometimes increment the ratio of routing space in tiles and restart the placement and routing process. Third, we conclude that placement and routing have to be integrated for cell assignment. Their algorithm is actually an iteration of placement-and-routing until a feasible solution is found. This can be viewed as a form of placement and routing integration. In summary, both their experimental study and our theoretical analysis confirm that buffer insertion is necessary and sufficient for the placement and routing of CMOL circuits. We presented the problem in a general setting and have obtained theoretical conclusions. Their study demonstrates one feasible strategy to solve the problem. It is expected that more strategies will be developed in the future.

The CMOL FPGA circuits investigated in this paper contain only the most basic type of CMOL cell that has a single CMOS inverter. A few more variants of this CMOL cell have recently been proposed in the literature. Strukov and Likharev [20] introduced a latch cell into CMOL FPGA. Dong et al. [5] propose

two more CMOL cells, namely T-Cell (transmission cell) and D-Cell (D-flipflop cell), which can support the implementation of tristate buffer and clocked flipflop. Though these CMOL cells have different internal structures and functionalities, they share the common feature that the connection domain is limited. Our results do not depend on the internal structures of CMOL cells. They remain valid as long as cells have limited connection domains.

VI. SUMMARY

This paper lays out the theoretical foundation for the development of cell assignment algorithms. The main contributions are two theoretical results about combinatorial circuit placement in CMOL architecture. First, given the maximum length of gate-connecting wire, there always exists a circuit that is not placeable. Second, any circuit can be transformed to a functionally equivalent placeable circuit. Although this paper deals only with the kind of CMOL cell implementing a NOR gate, the results are actually applicable to circuits with different types of CMOL cells, as long as they are of limited connection domain.

These results imply that a CMOL FPGA cell assignment tool designed for mapping arbitrary complex circuits should have a tight integration of placement and routing, in which buffer insertion is an indispensable step.

This paper is a theoretical investigation on the placement and routing problem in CMOL FPGA in the spirit of computation theory for computer science. Like the Turing Machine model, which helps to explain what functions are computable but is not intended to be a practical computer architecture, the algorithm presented in this paper is designed to illustrate the theoretical limitation and the routability in CMOL FPGA, but is not intended to be a practical placement and routing solution. We think the area resulting from this placement would be larger than necessary. Our next goal is to work on the design and implementation of more realistic cell assignment algorithms.

ACKNOWLEDGMENT

We are grateful to anonymous referees who have given valuable feedback on an earlier version of this paper.

REFERENCES

[1] V. Betz, "Placement for general purpose FPGAs," in Reconfigurable Computing, A. DeHon and S. Hauck, Eds. San Mateo, CA: Morgan Kaufmann, 2007.
[2] V. Betz and J. Rose, "VPR: A new packing, placement, routing tool for FPGA research," in Proc. 7th Int. Workshop Field-Programmable Logic Appl. (FPL 1997), London, U.K.: Springer-Verlag, pp. 213–222.
[3] V. Betz, J. Rose, and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs. Norwell, MA: Kluwer, 1999.
[4] A. DeHon and K. Likharev, "Hybrid CMOS/nanoelectronic digital circuits: Devices, architectures, and design automation," in Proc. ICCAD, Washington, DC: IEEE Computer Society, 2005, pp. 375–382.
[5] C. Dong, W. Wang, and S. Haruehanroengra, "Efficient logic architectures for CMOL nanoelectronic circuits," IET Micro. Nano. Lett., vol. 1, no. 2, pp. 74–78, Dec. 2006.
[6] C. Gao and D. Hammerstrom, "Cortical models onto CMOL and CMOS—architectures and performance/price," IEEE Trans. Circuits Syst., vol. 54, no. 11, pp. 2502–2515, Nov. 2007.
[7] K. Golshan, Physical Design Essentials: An ASIC Design Implementation Perspective. Secaucus, NJ: Springer-Verlag, 2007.
[8] K. Greene, "Nanotube circuits made practical," Technol. Rev., Jun. 2007.
[9] J. R. Heath, P. J. Kuekes, G. S. Snider, and R. S. Williams, "A defect-tolerant computer architecture: Opportunities for nanotechnology," Science, vol. 280, no. 12, pp. 1716–1721, Jun. 1998.
[10] W. Hung, C. Gao, X. Song, and D. Hammerstrom, "Defect tolerant CMOL cell assignment via satisfiability," presented at the Nanoelectronic Devices Defense Security (NANO-DDS) Conf., Crystal City, VA, 2007.
[11] M. Hutton and V. Betz, "FPGA synthesis and physical design," in Electronic Design Automation for Integrated Circuits Handbook, vol. 1, L. Scheffer, L. Lavagno, and G. Martin, Eds. New York/Boca Raton, FL: Taylor & Francis/CRC, 2006.
[12] ITRS. (2004). International Technology Roadmap for Semiconductors emerging research devices. [Online]. Available: http://public.itrs.net
[13] K. K. Likharev, "Hybrid semiconductor/nanoelectronic circuits. Invited talk in IBM Almaden, Research Center," Jun. 2007.
[14] K. K. Likharev and D. B. Strukov, "CMOL technology development roadmap," in Proc. 4th Workshop Non-Silicon Comput., Jun. 2007, pp. 9–16.
[15] W. Rao, A. Orailoglu, and R. Karri, "Topology aware mapping of logic functions onto nanowire-based crossbar architectures," in DAC, E. Sentovich, Ed. New York: ACM, 2006, pp. 723–726.
[16] S. S. Sapatnekar, P. Saxena, and R. S. Shelar, Routing Congestion in VLSI Circuits: Estimation and Optimization (Series on Integrated Circuits and Systems). Secaucus, NJ: Springer-Verlag, 2007.
[17] L. Scheffer, L. Lavagno, and G. Martin, Eds., Electronic Design Automation for Integrated Circuits Handbook. New York/Boca Raton, FL: Taylor & Francis/CRC, 2006.
[18] N. Sherwani, Algorithms for VLSI Physical Design Automation, 3rd ed. Norwell, MA: Kluwer, 1999.
[19] D. B. Strukov and K. Likharev, "CMOL FPGA circuits," in CDES, H. R. Arabnia and M. M. Eshaghian-Wilner, Eds. Las Vegas, NV: CSREA, 2006, pp. 213–219.
[20] D. B. Strukov and K. K. Likharev, "CMOL FPGA: A reconfigurable architecture for hybrid digital circuits with two-terminal nanodevices," Nanotechnology, vol. 16, pp. 888–900, 2005.
[21] D. B. Strukov and K. K. Likharev, "A reconfigurable architecture for hybrid CMOS/nanodevice circuits," in Proc. 2006 ACM/SIGDA 14th Int. Symp. Field Programmable Gate Arrays (FPGA 2006), New York: ACM, pp. 131–140.
[22] D. B. Strukov and K. K. Likharev, "Reconfigurable hybrid CMOS/nanodevice circuits for image processing," IEEE Trans. Nanotechnol., vol. 6, no. 6, pp. 696–710, Nov./Dec. 2007.
[23] K. Wang and A. Balandin, Eds., The Handbook of Semiconductor Nanostructures and Nanodevices. Valencia, CA: America Scientific, Oct. 2005.

Authors' photographs and biographies not available at the time of publication.
