FPGA
1. INTRODUCTION
FPGAs have been of keen interest to digital circuit designers for the last two decades. The
programmability of FPGAs allows designers to achieve lower non-recurring engineering
(NRE) costs and a faster time-to-market for their designs. The speed and area penalties
of an FPGA can also be limited by careful design, making FPGAs a viable option for
implementing a broader class of circuits at high volume. Leakage power is a major
contributor to the total power consumption of an FPGA, and as a result many power
reduction schemes have been introduced. Power gating has proven to be a very effective method
of reducing leakage power in a circuit.
The aim of this work is to design a transistor-level FPGA architecture in Cadence
Virtuoso Spectre at 45 nm technology. The basic idea behind the architecture is to
allow any transistor-level operation to be implemented in the FPGA. Prior works in this field have used
SPICE simulations to carry out transistor-level realizations in an FPGA. However, implementing
the transistor-level operations at the circuit level provides more insight into the working of the
FPGA and allows a more elaborate analysis. In this work, an SRAM-based FPGA
prototype (FPGA_NITA) has been designed at the transistor level. An FPGA, at its highest level,
consists of programmable logic elements and programmable routing resources. The purpose of
the programmable logic elements is to implement the combinational and sequential logic
functions, and the purpose of the routing resources is to interconnect the logic elements to implement
the desired system. Our architecture is inspired by the work of the authors of Ref [1].
However, their work has some flaws, which have been corrected in our architecture. Our
architecture is easier to implement and simulate than the architecture in Ref [1]. Moreover, the
authors of Ref [1] were unable to simulate the final FPGA architecture they developed.
They used a shift register for programming the FPGA architecture, which makes
the architecture more complicated and difficult to implement. Since our objective is to carry out
transistor-level operations in the FPGA, dynamic programming is not essential, and programming has
been done manually in our architecture. Simulation results have been obtained for all the
blocks in our architecture. We have implemented a basic benchmark circuit in the FPGA
prototype and report the simulation results. Our architecture provides a
platform for carrying out any operation in an FPGA, with provision for in-depth analysis of the
architecture.
2. LITERATURE SURVEY
Power reduction techniques for FPGAs have so far been evaluated using SPICE simulations [19], [20].
Modern FPGAs contain thousands of configurable logic elements, hundreds of high-bandwidth,
distributed on-chip memories and a rich interconnect. FPGAs are now large enough to support
double-precision floating-point computation on a single chip and can be customized to
implement irregular floating-point data paths. As a result, they are an attractive architectural
candidate for accelerating SPICE. However, for detailed analysis of the entire FPGA architecture
and implementation of gating techniques, a transistor-level realization of the FPGA architecture is
preferred [1]. In Ref. [1], an SRAM-based FPGA architecture is proposed at the transistor
level. Their design covers all the tasks necessary to build and verify an FPGA, including
system-level planning, schematic design, cell layout, and final chip layout. However, they were
unable to simulate the final FPGA architecture consisting of 3X3 tiles. Proper analysis and
simulation of such an architecture would allow many transistor-level applications to be realized in
the FPGA.
Field-programmable gate arrays (FPGAs) are widely used to implement special-purpose
processors. FPGAs are more economical for low-volume production because their
function can be directly reprogrammed by end users. FPGAs are a popular choice for digital
circuit implementation because of their growing density and speed, short design cycle and
steadily decreasing cost. Compared with other silicon devices, however, FPGAs consume high dynamic
and standby power [21]. Trends in technology scaling make leakage power an increasingly
dominant component of total power dissipation. Several works have studied FPGA power
consumption [3], [4], [5] and have shown that the power consumed by the largest FPGA devices
is increasing, with such devices now consuming watts of power [4]. The programmability of
FPGAs implies that more transistors are needed to implement a given logic circuit than in
custom ASIC technologies. Leakage power is proportional to total transistor
count and, consequently, leakage optimization will likely be a key design objective in future
FPGA technologies. Reducing the power consumption of FPGAs is beneficial, as it lowers
packaging and cooling costs, improves reliability and enables FPGA usage in low-power
applications such as mobile electronics.
3. PROPOSED FPGA ARCHITECTURE
The FPGA designed in this work is a simple SRAM-based FPGA structure. The basic building
block of an FPGA is the logic element (LE), which consists of a look-up table (LUT), a flip-flop, and
a multiplexer that chooses whether to forward the output of the LUT or of the flip-flop outside the
LE. A k-LUT consists of a set of multiplexers that implement any function of k inputs by
forwarding one of the 2^k configuration bits to the output of the LUT. LEs are interconnected
with routing resources, which are also configurable. Configuration of the routing resources is
achieved using pass gates controlled by routing configuration bits. The desired configuration is
stored in SRAM cells. One SRAM cell is required per pass gate, while a k-LUT requires 2^k
SRAM cells.
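The k-LUT behaviour described above can be illustrated with a short behavioural sketch in Python. This is not the transistor-level design: the function names `make_lut` and `lut_read`, the example function, and the choice of the first input as the most significant select line are assumptions made only for illustration.

```python
from itertools import product

def make_lut(func, k):
    """Return the 2**k configuration bits (SRAM contents) for a k-input function."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]

def lut_read(config, inputs):
    """Model the multiplexer tree: the inputs select one configuration bit."""
    index = 0
    for bit in inputs:              # assumed: first input is the most significant select line
        index = (index << 1) | bit
    return config[index]

# Hypothetical example function: (a AND b) XOR (c OR d)
f = lambda a, b, c, d: (a & b) ^ (c | d)
config = make_lut(f, 4)

print(len(config))                  # number of SRAM cells needed by a 4-LUT
print(lut_read(config, (1, 1, 0, 0)))
```

The list `config` plays the role of the SRAM cells, so its length is always 2^k, and reprogramming the LUT amounts to replacing the list.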
An FPGA, being a regular architecture, is divided into tiles, where each tile
comprises a logic element and routing resources. The routing resources include connection
blocks (C) and switch blocks (S). The connection blocks connect the
logic elements to the routing wires, and the routing wires (running horizontally and vertically) are interconnected inside
the switch blocks. The design developed in this work is based on the work done in Reference [ ].
The top view of the FPGA architecture is shown in Fig 1. The structure of a tile with
the corresponding programming and routing resources used in this work is shown in Fig 2.
3.1.1. SRAM:
The programmable elements in our architecture are SRAM cells. The schematic of an SRAM cell
designed in Cadence Virtuoso Spectre is shown in Figure 3. The SRAM cell basically
consists of two cross-coupled NOT gates and two pass gates for programming.
3.1.2. ROUTING ARCHITECTURE:
Routing architecture consists of two basic components: switch blocks and connection blocks.
A) Switch blocks:
Routing consists of 4 wiring tracks running horizontally and 4 running vertically. They can
be interconnected inside the switch blocks using programmable switches. For this purpose, we
use pass gates built from nMOS transistors only, as shown in Fig 3. The figure shows that any
track can connect in any of the three possible directions (continue straight, or turn left or
right), or in more than one direction, thus providing fan-out. With 4 tracks coming in
from each side and 6 transistors (and corresponding SRAM cells) required to route
each of them in any direction, a full switch block would need 24 transistors, so we use
segmented routing instead. Segmented routing means that not
all of the wiring tracks are of the same length; there are segments of various lengths. If the
segments are distributed properly, only 8 wires (2 from each side) come into
the switch block, which reduces the number of transistors (and SRAM cells) needed to
implement routing to 12, as explained in Ref [21].
The switch block developed in our work, using the 12 transistors and the routing wires, is
shown in Fig 5.
Fig 5: Switch block
The routing tracks are L0, L1, R0, R1, B0, B1, T0 and T1 in Fig 5. Each wire is connected to
another through an nMOS pass gate controlled by an SRAM cell, which is labelled as a pass
transistor in the figure and shown in Fig. 6. Buffers are used wherever required.
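The transistor count of the switch block can be checked with a small behavioural sketch. It assumes that a track with index i on one side can connect only to the track with the same index on the other three sides, which gives 6 pass transistors per index; the exact pairing used in Fig 5 may differ. The helper names `switch_list` and `connected_nets` are hypothetical.

```python
from itertools import combinations

SIDES = ("L", "R", "B", "T")

def switch_list(tracks_per_side=2):
    """One pass transistor per pair of same-index tracks on different sides (assumed topology)."""
    return [(f"{a}{i}", f"{b}{i}")
            for i in range(tracks_per_side)
            for a, b in combinations(SIDES, 2)]

def connected_nets(closed, tracks_per_side=2):
    """Group tracks into connected nets for the switches whose SRAM cell holds 1."""
    fabric = set(switch_list(tracks_per_side))
    parent = {f"{s}{i}": f"{s}{i}" for s in SIDES for i in range(tracks_per_side)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in closed:
        if (a, b) not in fabric and (b, a) not in fabric:
            raise ValueError(f"no pass transistor between {a} and {b}")
        parent[find(a)] = find(b)

    nets = {}
    for track in parent:
        nets.setdefault(find(track), []).append(track)
    return sorted(sorted(n) for n in nets.values() if len(n) > 1)

print(len(switch_list(2)))   # transistors with 2 tracks per side (segmented routing)
print(len(switch_list(4)))   # transistors if all 4 tracks entered the switch block
print(connected_nets([("L0", "R0"), ("L0", "T0")]))   # L0 fans out to R0 and T0
```

Closing more than one switch on the same track models the fan-out described above: the signal on L0 continues straight to R0 and also turns towards T0.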
B) Connection blocks:
There are two types of connection blocks, BOTTOM and RIGHT, named after their position in
the tile. They provide connections between logic blocks and routing tracks. A connection
block establishes two types of connections: from wiring tracks to LE inputs, and from
LE outputs to wiring tracks. Since only one track can connect to any one input, we use
multiplexers for this purpose, as shown in Fig 7.
Outputs, however, can connect to any number of tracks; we therefore use pass transistors to
connect the output to any of the tracks, as shown in Fig 8.
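The difference between the two connection types can be sketched behaviourally as follows. The sketch assumes a 4:1 multiplexer whose two select lines are driven by two SRAM bits, and one SRAM bit per output pass transistor; the function names and the use of `None` for an undriven track are illustrative assumptions.

```python
def input_mux(tracks, sel_bits):
    """4:1 multiplexer: two SRAM bits select the single track that feeds an LE input."""
    s1, s0 = sel_bits
    return tracks[(s1 << 1) | s0]

def drive_outputs(tracks, le_output, enable_bits):
    """Pass transistors: every track whose SRAM bit is 1 is driven by the LE output."""
    return [le_output if en else t for t, en in zip(tracks, enable_bits)]

tracks = [0, 1, 1, 0]
print(input_mux(tracks, (1, 0)))                       # input takes the value on track 2
print(drive_outputs([None] * 4, 1, (1, 0, 0, 1)))      # output drives tracks 0 and 3
```

An input always sees exactly one track, whereas an output may drive several tracks at once, which is why a multiplexer suffices for the former and independent pass transistors are used for the latter.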
The outline of the bottom connection block used in this work is shown in Fig 9.
Fig 9: Connection block bottom
The components used in this architecture are a 4:1 mux, 4:1 SRAM cell and SRAM pass
transistor.
Figure 10 shows the schematic of the RIGHT connection block. Since this connection
block only controls inputs, it contains no pass transistors, only multiplexers.
Fig 10: Connection block right
3.1.3. LOGIC ELEMENT:
The logic element is the computational block of the FPGA. This is where an arbitrary function
can be applied to the input signals to produce a specific result. Figure 11 is the top level view of
the logic element. The logic element consists of three sub blocks: a LUT, a register and a 2 to 1
mux.
Fig 11: Logic Element Schematic (LE)
The look-up table (LUT), shown in Figure 12, produces the result of the desired arbitrary
function. LUTs are chosen because, at the time the FPGA is fabricated, it is unknown which
functions the end user of the FPGA will want, and LUTs are the simplest and smallest way to
provide the functionality for programming an arbitrary function. We use 4-input
LUTs, and hence each LUT needs 16 (2^4) SRAM cells (2^4 because each input can
take two values [0 or 1] and there are 4 inputs). The SRAM cells are programmed with the
16 possible results of the arbitrary function. The inputs to the LUT are applied to the
select lines. Figure 13 shows the LUT configured with the SRAM cells.
Fig 12: Look up Table schematic (LUT)
The register in Fig 11 allows the output of the logic to be used as an input to that logic or to a
previous logic block without causing a combinational loop. A combinational loop occurs when the output
affects the inputs, which in turn causes the output to change, which affects the inputs, and so on,
never reaching a steady state, so that the output is unpredictable. Another use of the
register is to allow pipelined designs, which can increase the throughput of the function
programmed into the FPGA. The final element in the logic block is a multiplexer. This multiplexer
allows the user to select either the LUT result directly or the
register result as the output of the logic element.
Fig 13: Schematic of Look up Table configured with SRAM cells
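The role of the register and the output multiplexer can be illustrated with a behavioural sketch of the logic element. The class name `LogicElement`, the `use_register` configuration bit, and the convention that the first LUT input is the most significant select line are assumptions for illustration. The example programs the LUT as an inverter of its last input and feeds the registered output back to that input, which toggles once per clock edge instead of oscillating as a combinational loop would.

```python
class LogicElement:
    """Behavioural model: 4-input LUT, D flip-flop and 2:1 output multiplexer."""

    def __init__(self, lut_config, use_register):
        if len(lut_config) != 16:
            raise ValueError("a 4-input LUT needs 16 configuration bits")
        self.lut_config = list(lut_config)
        self.use_register = use_register    # SRAM bit steering the 2:1 output mux
        self.q = 0                          # flip-flop state

    def lut(self, inputs):
        index = 0
        for bit in inputs:                  # assumed: first input is the MSB select line
            index = (index << 1) | bit
        return self.lut_config[index]

    def output(self, inputs):
        """2:1 mux: forward either the registered value or the raw LUT result."""
        return self.q if self.use_register else self.lut(inputs)

    def clock(self, inputs):
        """Clock edge: the register captures the current LUT result."""
        self.q = self.lut(inputs)
        return self.q

# LUT programmed as NOT(last input); registered output fed back to that input.
toggle = LogicElement([1 - (i & 1) for i in range(16)], use_register=True)
trace = []
for _ in range(4):
    trace.append(toggle.output([0, 0, 0, toggle.q]))
    toggle.clock([0, 0, 0, toggle.q])
print(trace)
```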
3.1.4. TILE:
Figure 14 shows the outline of one tile with the interfaces of the basic building blocks in our design.
The logic element (LE) evaluates a logic function of its inputs and produces the result on its
output. Both inputs and outputs are connected to the routing tracks through the programmable
switches in the connection blocks. There are two connection blocks in a tile.
Connection_block_bottom connects the inputs of the LEs above and below it, as well as the
output of the LE above it, to the routing tracks. Similarly, connection_block_right connects the
inputs of the LEs to its left and right to the routing tracks. The horizontal and vertical routing
tracks are interconnected inside a switch block.
Fig 14: Outline of a Tile
The schematic of the tile designed in Cadence Virtuoso Spectre is presented in
Fig 15.
Simulation of the tile produces the expected outputs, presented in Figure 17. For this simulation
we considered a simple 4-input circuit with the specification given in
Table 2.
0 0 0 0 0
0 0 0 1 0
0 0 1 0 0
0 0 1 1 0
0 1 0 0 1
0 1 0 1 1
0 1 1 0 1
0 1 1 1 0
1 0 0 0 0
1 0 0 1 0
1 0 1 0 1
1 0 1 1 0
1 1 0 0 1
1 1 0 1 1
1 1 1 0 1
1 1 1 1 1
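As an illustration of how a truth table such as Table 2 maps onto the LUT, the following sketch converts the rows into the 16 SRAM values. It assumes that the first four columns are the inputs (leftmost column as the most significant select line) and the last column is the output; the function name is hypothetical.

```python
TABLE_2 = """
0 0 0 0 0
0 0 0 1 0
0 0 1 0 0
0 0 1 1 0
0 1 0 0 1
0 1 0 1 1
0 1 1 0 1
0 1 1 1 0
1 0 0 0 0
1 0 0 1 0
1 0 1 0 1
1 0 1 1 0
1 1 0 0 1
1 1 0 1 1
1 1 1 0 1
1 1 1 1 1
"""

def lut_config_from_table(text, k=4):
    """Return the 2**k SRAM values, indexed by the input combination."""
    config = [None] * (2 ** k)
    for line in text.strip().splitlines():
        *inputs, out = (int(v) for v in line.split())
        index = int("".join(map(str, inputs)), 2)   # assumed: leftmost column is the MSB
        config[index] = out
    if None in config:
        raise ValueError("truth table does not cover every input combination")
    return config

config_table_2 = lut_config_from_table(TABLE_2)
print(config_table_2)
```

Each entry of the resulting list corresponds to one SRAM cell of the LUT in the tile under test.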
Using an array of rows and columns of tiles, an FPGA structure can now be established. Since
simulating a large FPGA with our design is a complex process, we have designed an FPGA prototype
with a 3X3 tile structure. The FPGA prototype is designed and simulated in
Cadence Virtuoso Spectre at 45 nm technology. The transistor-level implementation of the
FPGA enables us to carry out any transistor-level operation in the FPGA design, including power
reduction techniques such as power gating. The schematic of the 3X3 tile array is presented in Fig. 18.
Fig 18: 3X3 FPGA Prototype
In order to simulate the FPGA prototype, we have implemented a basic benchmark circuit,
“lion.kiss2”, in the architecture. The description of the FSM circuit is presented in Table 3. In the
table, PS and NS stand for present state and next state respectively, and INPUT[1:0] is the primary
input. From the state table, we can implement the circuit in the FPGA. The
total number of tiles required to implement this FSM is 3 (as there are 3 outputs:
OUTPUT, NS(1) and NS(0)). FPGAs have built-in provision for flip-flops in their architecture.
Here, we require 2 D flip-flops to obtain the next states. The D flip-flops
are designed separately in the same technology and used inside the FPGA architecture.
Table 3: State table of lion.kiss2
INPUT[1]  INPUT[0]  PS(1)  PS(0)  OUTPUT  NS(1)  NS(0)
0         0         0      0      0       0      0
0         0         0      1      1       0      1
0         0         1      0      1       0      1
0         0         1      1      1       1      1
0         1         0      0      -       0      1
0         1         0      1      1       0      1
0         1         1      0      1       1      1
0         1         1      1      1       1      1
1         0         0      0      0       0      0
1         0         0      1      1       1      0
1         0         1      0      1       1      0
1         0         1      1      -       -      -
1         1         0      0      0       0      0
1         1         0      1      0       0      0
1         1         1      0      1       1      0
1         1         1      1      1       1      0
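The mapping of the state table onto three tiles can be sketched behaviourally. The sketch assumes the column order INPUT[1], INPUT[0], PS(1), PS(0), OUTPUT, NS(1), NS(0), fills don't-care entries ("-") with 0, and starts from state 00; these choices, the function names and the input sequence are illustrative assumptions, not results from the transistor-level simulation.

```python
TABLE_3 = """
0 0 0 0 0 0 0
0 0 0 1 1 0 1
0 0 1 0 1 0 1
0 0 1 1 1 1 1
0 1 0 0 - 0 1
0 1 0 1 1 0 1
0 1 1 0 1 1 1
0 1 1 1 1 1 1
1 0 0 0 0 0 0
1 0 0 1 1 1 0
1 0 1 0 1 1 0
1 0 1 1 - - -
1 1 0 0 0 0 0
1 1 0 1 0 0 0
1 1 1 0 1 1 0
1 1 1 1 1 1 0
"""

def build_luts(text, fill=0):
    """Return the 16 LUT bits for each of the three tiles; '-' entries become `fill`."""
    luts = {"OUTPUT": [fill] * 16, "NS1": [fill] * 16, "NS0": [fill] * 16}
    for line in text.strip().splitlines():
        cols = line.split()
        index = int("".join(cols[:4]), 2)        # assumed order: IN1 IN0 PS1 PS0
        for name, value in zip(luts, cols[4:]):  # dict order: OUTPUT, NS1, NS0
            luts[name][index] = fill if value == "-" else int(value)
    return luts

def run(luts, input_sequence):
    """Clock the FSM: two D flip-flops hold PS(1) and PS(0) between steps."""
    ps1 = ps0 = 0
    outputs = []
    for in1, in0 in input_sequence:
        index = (in1 << 3) | (in0 << 2) | (ps1 << 1) | ps0
        outputs.append(luts["OUTPUT"][index])
        ps1, ps0 = luts["NS1"][index], luts["NS0"][index]   # clock edge
    return outputs, (ps1, ps0)

luts = build_luts(TABLE_3)
outputs, final_state = run(luts, [(0, 1), (0, 0), (1, 0), (1, 1)])
print(luts["NS0"])
print(outputs, final_state)
```

Each of the three lists in `luts` corresponds to the LUT contents of one tile, and the two state variables correspond to the two D flip-flops.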
The schematic of the basic implementation of the above benchmark circuit in the proposed
architecture is presented in Figure 19. The 3 tiles used are TILE NS(0), TILE
NS(1) and TILE OUT; the remaining tiles in the 3X3 FPGA architecture are unused. Routing
wires are not shown in Figure 19 to avoid clutter. The resulting FPGA is simulated and
the output is reported. The simulation produces the expected results, which demonstrates the
effectiveness of our approach. The simulation result is presented in Figure 20.
Fig 19: 3X3 FPGA implementing lion.kiss2
In this work, a transistor-level SRAM-based FPGA architecture has been designed and simulated
in Cadence Virtuoso Spectre at 45 nm technology. With the increasing demand for
programmable devices, FPGAs have gained huge popularity over the last decade. The power
consumption of FPGAs has consequently been of keen interest to circuit designers. Our
approach presents a platform for carrying out any transistor-level operation in an FPGA
architecture. We have designed and simulated a 3X3 tile FPGA architecture in which power
reduction techniques such as power gating and the dual-threshold-voltage approach can be readily
implemented. Prior works in this field have relied on SPICE simulations for carrying out
transistor-level operations in FPGA architectures. A circuit-level implementation, however, provides more
insight into the architecture and allows an elaborate analysis to be carried out throughout the device.
The FPGA architecture is simulated by implementing the benchmark circuit
“lion.kiss2”, and the simulation results have been obtained and reported.
REFERENCES
[1] Blair Fort, Daniele Paladino, Franjo Plavec, “Full Custom Layout of an SRAM-Based FPGA”, Final Report,
University of Toronto, 2004.
[2] J. Kao, S. Narendra, and A. Chandrakasan, “Subthreshold leakage modeling and reduction techniques,” in
IEEE/ACM International Conference on Computer-Aided Design, 2002, pp. 141–148.
[3] K. Poon, A. Yan, and S. J. E. Wilton, “A flexible power model for FPGAs,” in International Conference on
Field Programmable Logic and Applications, 2002, pp. 312–321.
[4] L. Shang, A. Kaviani, and K. Bathala, “Dynamic power consumption in the Virtex-II FPGA family,” in
ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2002, pp. 157–164.
[5] V. George and J. Rabaey, Low-Energy FPGAs: Architecture and Design. Boston, MA: Kluwer Academic
Publishers, 2001.
[6] J. Anderson and F. Najm, “Active Leakage Power Optimization for FPGAs,” in Proceedings of ACM/SIGDA
International Symposium on FPGA, 2004.
[7] S. Mutoh, “1V Power Supply High Speed Digital Circuit Technology with Multi Threshold Voltage CMOS,”
IJSSC, vol. 30, no. 8, 1995.
[8] S. Mutoh, S. Shigematsu, and Y. Matsuya, “A 1V Multi-Threshold Voltage CMOS DSP with Efficient Power
Management Technique for Mobile Phone Applications,” in ISSC, 1996.
[9] A. Gayasen, Y. Tsai, N. Vijaykrishnan, M. Kandemir, M. Irwin, and T. Tuan, “Reducing Leakage Energy in
FPGAs Using Region-Constrained Placement,” in Proceedings of ACM/SIGDA International Symposium on FPGA,
February 2004.
[10] M. Münch, B. Wurth, R. Mehra, J. Sproch, and N. Wehn, “Automating RT-level Operand Isolation to
Minimize Power Consumption in Datapaths,” in Proceedings of the Conference on Design, Automation and Test
in Europe, 2000, pp. 624–633.
[11] Q. Wang, S. Gupta, and J. H. Anderson, “Clock Power Reduction for Virtex-5 FPGAs,” in Proceeding of the
17th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2009, pp. 13–22.
[12] T. Tuan, S. Kao, A. Rahman, S. Das, and S. Trimberger, “A 90 nm Low-Power FPGA for Battery-Powered
Applications,” in Proceedings of the 14th ACM/SIGDA International Symposium on Field-Programmable Gate
Arrays, 2006, pp. 3–11.
[13] F. Li, Y. Lin, L. He, and J. Cong, “Low-Power FPGA using Pre-Defined Dual-Vdd/Dual-Vt Fabrics,” in
Proceedings of the 12th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2004,
pp. 42–50.
[14] Rajarshee P. Bharadwaj, Rajan Konar, Poras T. Balsara, Dinesh Bhatia, “Exploiting Temporal Idleness to
Reduce Leakage Power in Programmable Architectures”.
[15] S. Henzler, Power Management of Digital Circuits in Deep Sub-Micron CMOS Technologies (Springer Series
in Advanced Microelectronics). Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2007.
[16] Y. Lin, F. Li, and L. He, “Routing Track Duplication with Fine-Grained Power-Gating for FPGA Interconnect
Power Reduction,” in Proceedings of the 2005 Asia and South Pacific Design Automation Conference, 2005,
pp. 645–650.
[17] A. Gayasen, Y. Tsai, N. Vijaykrishnan, M. Kandemir, M. J. Irwin, and T. Tuan, “Reducing Leakage Energy in
FPGAs Using Region-Constrained Placement,” in Proceedings of the 12th ACM/SIGDA International Symposium
on Field-Programmable Gate Arrays, 2004, pp. 51–58.
[18] Assem A. M. Bsoul and Steven J. E. Wilton, “An FPGA Architecture Supporting Dynamically Controlled
Power Gating”.
[19] Alastair M. Smith, George A. Constantinides and Peter Y. K. Cheung, “FPGA Architecture Optimization Using
Geometric Programming”.
[20] Nachiket Kapre and André DeHon, “Accelerating SPICE Model-Evaluation using FPGAs”.
[21] H. Z. V. George and J. Rabaey, “The design of a low energy FPGA,” in Proc. Int. Symp. Low Power Electron.
Des., CA, Aug. 1999, pp. 188–193.