High-Performance, Cost-Effective Heterogeneous 3D FPGA Architectures

Roto Le

Division of Engineering
Brown University
Providence, RI 02912
[email protected]

Sherief Reda

Division of Engineering
Brown University
Providence, RI 02912
[email protected]

R. Iris Bahar

Division of Engineering
Brown University
Providence, RI 02912
[email protected]

ABSTRACT

In this paper, we propose novel architectural and design techniques for three-dimensional field-programmable gate arrays (3D FPGAs) with through-silicon vias (TSVs). We develop a novel design partitioning methodology that maps the heterogeneous computational resources of an FPGA onto a number of die such that the total die area is minimized and the FPGA performance is maximized. Minimizing the total die area leads to direct manufacturing cost savings, which is an important incentive to bring 3D technology to the fab and onto the market. We develop an estimation framework to assess the silicon area used by 3D interconnect resources while taking into account the large area occupied by TSVs, which is crucial to the total die area of 3D FPGAs. To improve the area and performance of 3D FPGAs, we design a novel 3D switch box with bypass TSVs. We also analyze the impact of different partitioning strategies on die area and find the number of die that gives the largest reduction in total die area while maximizing performance. Using a well-developed simulation infrastructure, we show that our methodologies can achieve an average reduction of 27.7% in total die area together with a reduction in interconnect path delay of about 58%.

Categories and Subject Descriptors: B.7.1 [INTEGRATED CIRCUITS]: Types and Design Styles - Advanced technologies.
General Terms: Economics, Performance
Keywords: Heterogeneous FPGA Design, 3D Integrated Circuits

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
GLSVLSI'09, May 10-12, 2009, Boston, Massachusetts, USA.
Copyright 2009 ACM 978-1-60558-522-2/09/05 ...$5.00.

INTRODUCTION

Field Programmable Gate Arrays (FPGAs) have become a viable alternative to custom Integrated Circuits (ICs) by providing flexible computing platforms with improved costs and shorter time-to-market. In an FPGA-based system, a design is mapped onto an array of reconfigurable logic blocks that communicate through re-programmable interconnections composed of wire segments and switch boxes. While the re-programmable capability provides flexibility, it also leads to area and performance overheads in comparison to custom chips. Thus, to benefit from the advantages of both FPGAs and custom chips, heterogeneous FPGAs have emerged as an attractive choice for system-on-a-chip implementations. Besides the traditional reconfigurable fabric, heterogeneous FPGAs include dedicated full-custom design components such as digital signal processors (DSPs), multipliers, on-chip memory blocks, and entire processors. Examples of such heterogeneous FPGAs include the Xilinx Spartan 3 and Virtex 4 and 5, the Altera Cyclone II and Stratix II and III, and the Lattice ECP2 family.

To provide the required reconfigurable functionality, FPGAs provide a large amount of programmable interconnect resources in the form of wire segments, switches, and signal repeaters. These programmable interconnect resources typically consume a large portion of the FPGA silicon die area. A number of recent studies show that the programmable interconnect fabric consumes about 70-80% of the total FPGA area [8, 6]. Since die area is one of the main factors that determine manufacturing costs, reducing the silicon footprint of the programmable fabric can lead to significant improvements in the manufacturing costs of FPGAs. Reducing the length of interconnects will also bring performance improvements to FPGAs, which are typically dominated by interconnect delay.

Three-dimensional (3D) Integrated Circuits (ICs) with through-silicon vias are a new technology that will increase the functionality, scale of integration, and performance of integrated systems [1, 2]. Increasing the scale of integration is particularly attractive considering that optical lithography is approaching its natural limits. In 3D integration, multiple die or layers are integrated and interconnected with through-silicon vias (TSVs). Three-dimensional integration can lead to significant reductions in wire length and interconnect delay through the use of TSVs. A number of recent publications propose novel 3D architectures and physical design techniques that lead to FPGAs with better performance than existing planar FPGAs [3, 8, 9, 10, 6, 11]. For example, Alexander et al. [3] developed 3D island-style FPGAs that extend four-directional 2D switch boxes to six-directional 3D switch boxes. This 3D switch architecture allows logic blocks to have six immediate neighbors: four on the die or plane where the switch box is placed, and two others above and below the die. In another work, Lin et al. propose a 3D FPGA architecture that partitions the components of a homogeneous FPGA such that configuration SRAM memory cells and switch transistors can be moved to other 3D layers [6]. In addition to devising new 3D FPGA architectures, a number of recent studies develop placement and routing models to support and assess 3D FPGA architectures (e.g., [7, 8, 9, 10, 11]).

In this paper our objective is to develop novel 3D FPGA architectures and designs that improve performance at lower cost than planar FPGAs. Our cost savings arise from significant reductions in total die area enabled by our methodology. We summarize the contributions of this paper as follows.

- We formulate the problem of partitioning the resources of a heterogeneous FPGA into a number of die for 3D ICs so as to minimize the total die area and the fabrication costs of 3D FPGAs.
- Using Rent-based statistical wirelength distribution models, we propose novel methods to estimate the area used by TSVs and interconnect resources of heterogeneous 3D FPGAs.
- We propose novel methods to minimize the total number of TSVs used in 3D stacking, and we also propose new techniques to estimate the performance of 3D heterogeneous FPGAs.
- We analyze the impact of different resource partitioning strategies on the performance of FPGAs as well as their total costs as measured by the total die area. We also analyze the relationship between the number of die in the 3D stack and the total die area, and show how to choose the number of die that minimizes the total die area of a 3D FPGA.
- Using a comprehensive experimental setup, we show that our method leads to a 27% reduction in total die area and a 58% improvement in performance. The improvements in total die area lead to immediate cost savings.

The rest of this paper is organized as follows. Section 2 introduces our motivation and formulation for the problem of transforming a planar heterogeneous FPGA design into a 3D FPGA design. In Section 3, we propose how to calculate the die areas allocated for computation, TSVs, and wiring in 3D FPGAs. In Section 4, we discuss how to calculate the improvement in performance attained by using 3D FPGAs. Section 5 presents the results and observations from our experimental evaluation. Finally, Section 6 summarizes the main conclusions of this work.

MOTIVATION AND FORMULATION

One of the crucial properties of FPGAs is that the reconfigurable interconnect resources consume a large portion (up to 80%) of the total silicon die area [8, 6]. By bringing the computational components closer together in three dimensions, 3D ICs have the potential to reduce the size of the programmable interconnect fabric required for routing in FPGAs. This leads to significant reductions in silicon area, thereby directly reducing the cost of fabrication. Cost savings are important incentives for the industry to offset any cost increases required for TSV creation (either using laser drilling or bonding) and 3D bonding.

Designing heterogeneous 3D FPGAs involves a number of challenges. We outline and tackle the following challenges in this paper.

1. Typical TSVs occupy a remarkably large silicon area (4 × 4 μm² to 5 × 15 μm² [13]). Thus, introducing TSVs will lead to increases in total die area. Therefore, in 3D FPGAs, it is important to weigh the die area saved by reductions in interconnect resources against the increase in die area due to TSVs.
2. Introducing TSVs as a part of the programmable wiring fabric requires a new design for the reconfigurable switch boxes. Instead of just achieving connectivity in 2D as in planar ICs, a switch box has to include extra switches to allow incoming lateral wires to connect to vertical TSVs, and even incoming TSVs to connect to outgoing TSVs, since the 3D stack might include more than two die.
3. The number of die in the 3D stack should be chosen to maximize performance and minimize costs. Once the number of die in the 3D stack is determined, the computational resources of the FPGA should be partitioned among the die in a way that minimizes the total demand on the interconnect resources and the required number of TSVs.

In this paper, our objective is to tackle these challenges and develop a realistic design methodology for 3D FPGAs that delivers the expected 3D performance benefits while minimizing any incurred costs. Our overarching goal can be summarized with the following problem formulation.

Given: A planar FPGA that has a total area $A$ and contains a set of heterogeneous computational resources $R = \{r_1, \ldots, r_N\}$.
Output: Find the optimal number of die, $m$, and a partition of $R$ into the $m$ die such that the total die area of the 3D FPGA is minimized compared to $A$ and performance is maximized.

As an example, a set of heterogeneous computational resources $R$ for an FPGA might have 4000 logic blocks, 1000 4K memory blocks, 200 DSPs, and 2 processors, for a total of $N = 5202$ computational components. We seek the optimal number of die $m$ and a partition that maps each computational resource into exactly one die.

We proceed by first proposing a novel approach to estimate the total die area of 3D FPGAs and determine the optimal number of die (Section 3). Our area estimation includes total logic area, total TSV area, and total programmable interconnect area.

ESTIMATING DIE AREA SAVINGS

The total die area of a 3D heterogeneous FPGA is the sum of the areas of the die or layers that constitute the 3D IC. The area of each die comprises (1) the total computational resource area, (2) the total TSV area, and (3) the area needed for the reconfigurable wiring fabric. It is expected that as the number of die in a 3D stack increases, the requirements for the within-die lateral wiring area will decrease while the TSV area required for the inter-die vertical wiring will increase.

3.1 Estimating Total Logic Area

In this section we present area estimation models for computational components in a generic heterogeneous island-style SRAM-based FPGA. Such an FPGA contains soft computational resources such as logic clusters, and hardcore computational resources such as embedded memory blocks, DSP blocks, and processors. We next describe the area estimation approach and assumptions for each of these components.

Logic Cluster: The reconfigurable logic cluster or block executes logic operations and is considered the main component in FPGAs. A generic logic cluster contains a number of Look-Up Tables (LUTs) and associated registers, I/O, multiplexers, clock, and reset units. The total area of a logic cluster may be computed by summing the areas of all these components. In our study we use a cluster architecture and area model from [12]. The cluster consists of eight 4-input LUTs, 20 logic input pins, 8 output pins, 1 clock signal and 1 reset signal.

Embedded Memory and DSP Blocks: In contrast to homogeneous FPGAs, heterogeneous FPGAs contain dedicated hard memory and DSP blocks to obtain higher performance and power savings. A typical heterogeneous FPGA often contains SRAM memory blocks of different sizes to provide high flexibility in configuration and utilization. We use two memory blocks of different sizes, Mem1 and Mem2, sized similarly to memory blocks in a realistic FPGA; i.e., 64 × 16 bits and 128 × 32 bits respectively. We assume that these memory blocks are SRAM blocks and estimate their area by using the CACTI memory models [14, 15]. After obtaining the area estimates for the memory blocks, we estimate the area of DSP blocks based on the relative ratio between DSP and Mem2 blocks from the Altera Stratix II handbook documentation ([16] p. 2-41).
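The bookkeeping behind the total computational-resource area can be sketched in a few lines of Python. This is only an illustration, not the authors' tool: it uses the normalized areas reported in Table 1 and the device makeup used as the running example in Sections 3.4 and 5, and it assumes that the 2000 logic clusters are of size 8 (the cluster described above). The names `NORMALIZED_AREA`, `total_resource_area` and `area_shares` are hypothetical.

```python
# Normalized component areas from Table 1 (unit: area of one 4-input LUT).
NORMALIZED_AREA = {
    "cluster8": 28.5,   # logic cluster of eight 4-input LUTs
    "mem1": 65.3,       # Mem1 block
    "mem2": 365.3,      # Mem2 block
    "dsp": 1461.2,      # DSP block
}


def total_resource_area(counts, areas=NORMALIZED_AREA):
    """Sum count * normalized area over all component types."""
    return sum(n * areas[name] for name, n in counts.items())


def area_shares(counts, areas=NORMALIZED_AREA):
    """Fraction of the computational-resource area taken by each type."""
    total = total_resource_area(counts, areas)
    return {name: n * areas[name] / total for name, n in counts.items()}


# Example makeup from the text; cluster size 8 is an assumption of this sketch.
example = {"cluster8": 2000, "mem1": 144, "mem2": 202, "dsp": 18}
total = total_resource_area(example)
shares = area_shares(example)
```

Multiplying `total` by the area of a single 4-input LUT converts the result into an absolute area; routing and TSV area are not included in this tally.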

Our area estimations for the different computational resources are summarized in Table 1, where the areas of the components are normalized with respect to the area of a LUT. The area of a 4-input LUT is estimated to be equal to 26598 λ² [12].

Table 1: Area estimation of computational components normalized to the area of a 4-input LUT (26598 λ²).

Component        Capacity                       Area
4-input LUT      1 LUT                          1
Cluster size 8   8 LUTs                         28.5
Cluster size 16  16 LUTs                        76.7
Mem1 block       32x16 bits                     65.3
Mem2 block       128x32 bits                    365.3
DSP block        four 16x16-bit multipliers     1461.2

3.2 Estimating Total TSV Area

In this section our objective is to compute the silicon area required by through-silicon vias. A typical TSV can occupy a remarkably large silicon area (e.g., 4 × 4 μm²) with a pitch of 20 μm [13], and thus it is important to calculate the expected area used by the TSVs in 3D FPGA designs. Figure 1 shows a switch box for 3D ICs that is formed by extending a regular 2D switch box to include two vertical channels of TSVs in addition to the traditional lateral wiring channels. One key aspect in the design of a 3D switch box is determining the size of the lateral wiring channel, $W_W$, and the size of the vertical TSV channel, $W_V$. The sizes of these channels play key roles in determining the routability, performance, and die area of 3D FPGAs.

Figure 1: 2D switch vs. 3D switch.

The size of the vertical TSV channel is determined by a number of factors, including: (1) the number of die in the 3D stack; (2) the allocation of computational resources across the different die; and (3) the expected inter-die communication, which depends on the application circuit programmed in the FPGA as well as on the placement and routing tool. The exact size of the vertical TSV channel is determined by a graph-theoretic approach that we describe with the help of Figure 2. The figure shows a possible partitioning of a heterogeneous system into five parts, where each partition is mapped to a different die. An edge that falls entirely within a partition uses the within-die wiring fabric for routing, while an edge that straddles two partitions (or die) requires wiring resources within the two die at its end points as well as a number of TSVs.

To determine the vertical channel size for the example of Figure 2, we create a graph, shown in Figure 3, composed of 5 nodes, where each node corresponds to one partition in Figure 2. The edges between the nodes in Figure 3 correspond to the edges that straddle the partitions in Figure 2.

Figure 3: The maximum cut in the linear arrangement of the partitions determines the size of the TSV vertical channel width. (Cut values shown in the figure: Cut1 = 4, Cut2 = 6, Cut3 = 4, Cut4 = 3.)

The number of TSVs differs from one pair of adjacent die to the next, but every switch box should be able to handle the maximum inter-die communication requirement (the maximum cut) at any location. To estimate the maximum cut, consider the cut $E_l$ between die $l$ and die $l+1$:

\[ E_l = \sum_{i=1}^{l} T_i \;-\; \sum_{i=1}^{l} \sum_{\substack{j=1 \\ j \neq i}}^{l} T_{ij}, \tag{1} \]

where $T_i$ is the number of TSVs connecting to die $i$ and $T_{ij}$, $i \neq j$, is the number of TSVs between die $i$ and die $j$. The double sum runs over ordered pairs, so a connection between two die that both lie below the cut is subtracted once as $T_{ij}$ and once as $T_{ji}$, which cancels its two contributions to $\sum_i T_i$. For example, Cut2 between die 2 and die 3 is calculated as:

\[ E_2 = T_1 + T_2 - (T_{12} + T_{21}) = 4 + 4 - 2 = 6. \tag{2} \]

$T_i$ and $T_{ij}$ can be computed from placed and routed benchmark designs or by statistical estimation based on Rent's rule. In this study, due to the lack of placement and routing tools supporting 3D heterogeneous architectures, we use the heterogeneous Rent's rule [18] for estimation. Consequently the vertical channel width, $W_V$, should be equal to

\[ W_V = \frac{\max\{E_1, E_2, \ldots, E_{m-1}\}}{\text{Number of switch boxes}}. \tag{3} \]

Assuming that the pitch width of the TSVs is $p$, the area allocated for TSVs per switch box is equal to $p^2 W_V$.
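The cut computation of Equations (1) to (3) can be illustrated with a short Python sketch. The connection matrix below is hypothetical: it was chosen only so that it reproduces the cut values shown in Figure 3 (4, 6, 4, 3) with $T_1 = T_2 = 4$, and it is not taken from the paper. Rounding $W_V$ up to an integer is likewise an assumption of this sketch, as are the function names.

```python
import math


def cut_sizes(t):
    """Return [E_1, ..., E_{m-1}] following Equation (1).

    t is a symmetric m x m matrix; t[i][j] is the number of TSV
    connections between die i and die j (0-based, zero diagonal).
    """
    m = len(t)
    # T_i: number of TSV connections incident to die i.
    t_i = [sum(t[i][j] for j in range(m) if j != i) for i in range(m)]
    cuts = []
    for l in range(1, m):
        incident = sum(t_i[:l])
        # Ordered pairs (i, j), i != j, with both die below the cut.
        internal = sum(t[i][j] for i in range(l) for j in range(l) if j != i)
        cuts.append(incident - internal)
    return cuts


def vertical_channel_width(t, num_switch_boxes):
    """Equation (3), rounded up to a whole number of TSVs (assumption)."""
    return math.ceil(max(cut_sizes(t)) / num_switch_boxes)


# Hypothetical five-die connection matrix matching the cuts of Figure 3.
T = [
    [0, 1, 1, 1, 1],
    [1, 0, 2, 1, 0],
    [1, 2, 0, 1, 0],
    [1, 1, 1, 0, 2],
    [1, 0, 0, 2, 0],
]
cuts = cut_sizes(T)
```

With this matrix the largest cut is the second one, so the switch boxes would have to be sized for the traffic between die 2 and die 3.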

Figure 2: A partitioning of a heterogeneous FPGA.

3.3 Estimating Total Routing Area

To estimate the total chip area, in this section we present the estimation model derived from [12, 8] for the reconfigurable routing components, which include the connection blocks and the switch boxes. The area occupied by these routing components depends on their architecture and on the width of the interconnect channels (i.e., both the size of the within-die lateral wiring channels and the size of the vertical TSV channels). The areas of these routing components can be determined as follows.

Connection Blocks: The connection blocks consist of programmable switches that connect the I/O pins of logic clusters to lateral channels, as shown in Figure 4. The size of the I/O connection blocks is determined by the fan-in connection factor, $F_{ci}$, and the fan-out connection factor, $F_{co}$, which give the fraction of wiring tracks in a lateral channel to which each input pin and output pin can connect, respectively. The area of a connection block can be calculated by first counting the number of buffers, pass transistors, and multiplexers required for it, and then summing the areas of these elements as outlined in [8].

Switch Boxes: The interconnect routing switch boxes (Figure 4) consist of switch points, which can be implemented by programmable tri-state buffers or pass transistors, that connect interconnect segments together. Therefore the size of a routing switch box depends on the size and number of switch points required for that switch box. In a switch box that consists of $W_W$ wiring tracks, the number of switch points, $S_{2D}$, can be computed as:

\[ S_{2D} = \frac{W_W F_s (F_s + 1)}{2}, \tag{4} \]

where $F_s$ is the maximum allowable fanout for an incoming wire segment into the switch box [12, 8]. In contrast to a 2D switch box, a 3D switch box accommodates four lateral wiring channels (each consisting of $W_W$ wiring tracks) and two vertical channels (each consisting of $W_V$ TSVs), as shown in Figure 1. Since $W_V$ is not necessarily equal to $W_W$, only $W_V$ of the $W_W$ tracks in each lateral wiring channel may be connectable to the $W_V$ TSVs. Furthermore, the maximum fanout of an incoming TSV, $F_{sv}$, could be different from the maximum allowable fanout of an incoming wire segment, $F_{sw}$. Thus, the number of switch points in a 3D switch box, $S_{3D}$, can be generally computed as

\[ S_{3D} = \frac{(W_W - W_V) F_{sw} (F_{sw} + 1) + W_V F_{sv} (F_{sv} + 1)}{2}. \tag{5} \]

Figure 4: A typical SRAM-based island-style FPGA (logic cluster with inputs in1 to in20 and outputs Out 1 to Out 8, input and output connection blocks, switch boxes, and W wiring tracks per channel).

We determined the vertical TSV channel width, $W_V$, using Equation (3). Thus, we need to estimate $W_W$ in order to determine the number of switches and the area of the 3D switch box. That is,

\[ W_W = \frac{L_{total}}{N_{ch}\, e_t}. \tag{6} \]

This equation is based on the assumption that, for any design, the total length of utilized wiring tracks, $W_W \times N_{ch} \times e_t$, is equal to the required total wire length $L_{total}$. The value $N_{ch}$ is the number of channels and $e_t$ is the utilization factor of the wiring tracks. In island-style FPGAs, $N_{ch}$ is the number of lateral wiring channels, which depends on the number of logic blocks per die, and $e_t$ is a constant (typically around 0.4 to 0.5) [8].

To calculate $W_W$ using Equation (6) we need to compute the total required wirelength $L_{total}$. In this paper we use the estimation model for the global wiring requirement of heterogeneous networks developed by Zarkesh-Ha et al. [18], where the total wire length is computed as

\[ L_{total} = \sum_{d=1}^{q} \bar{L}_d\, I_d, \tag{7} \]

where $q$ is the maximum fanout of the netlist, $\bar{L}_d$ is the average length of a net having fanout $d$, and $I_d$ is the number of nets having fanout $d$. Hence, to estimate the total wirelength of a heterogeneous system, $\bar{L}_d$ and $I_d$ have to be computed for every $d = 1, \ldots, q$. Both can be computed using two Rent's-rule-based approximations (derivations of these formulas can be found in [18]). The number of nets having fanout $d$ is

\[ I_d = k_{eq} N \, \frac{(d-1)^{p_{eq}-1} - d^{\,p_{eq}-1}}{d}, \tag{8} \]

and the average length of a net having fanout $d$, $\bar{L}_d$, is given by Equation (9) of the model in [18]. It is a function of the average component pitch $\sqrt{A/N}$, of $d$ and of $N$, and it involves three empirical coefficients (set to 1.1, 2.0 and 0.5 in our experiments) and a placement efficiency parameter that depends on the placement tool. Here $k_{eq}$ and $p_{eq}$ are the equivalent Rent's parameters of the system, and $A$ is the total die area [18].

To compute the total interconnect area using Equation (9), one needs to know the total die area $A$. However, to calculate the total area $A$ one first needs to calculate the interconnect area. To break this circular dependency, we use a numerical search to find the smallest $A$ that, when plugged into Equation (9), gives an interconnect area that leads back to the same total die area $A$.

3.4 Interconnect Channel Width vs. Number of 3D Dies

In the previous subsections we discussed how to calculate the lateral wiring channel width $W_W$ and the vertical TSV channel width $W_V$. We next explore the relationship between these channel parameters and the number of 3D dies, and the impact of this relationship on total die area and performance.

We explain our methodology by means of an example. Consider a heterogeneous FPGA that consists of 2000 logic clusters, 18 DSP blocks, 144 Mem1 memory blocks, and 202 Mem2 memory blocks. Assume a partitioning configuration that allocates the components equally among the different dies. For now, we do not discuss the details of our experimental setup (e.g., the technology assumed); we give this information in Section 5.

Figure 5: Interconnect Channel Width vs. Number of Dies (x-axis: number of dies; y-axis: channel width; curves: lateral channel width $W_W$ and vertical channel width $W_V$).

The result in Figure 5 shows that the lateral channel width decreases as the number of die increases, while the vertical channel width increases. As expected, increasing the number of 3D dies leads to a larger portion of lateral wiring segments being replaced by TSVs; thus, the number of TSVs increases with the number of dies. The increase in the number of TSVs initially yields benefits in terms of performance and die area savings; however, when the number of TSVs grows too large, these benefits might not be realized any further due to the TSV area overhead. This issue is discussed further in Section 5.
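The relations of Section 3.3 are simple enough to sketch in Python. The block below is a hedged illustration, not the authors' implementation: the first three functions transcribe Equations (4) to (6), and `smallest_consistent_area` shows one possible form of the numerical search for a self-consistent die area. The search strategy (an upward scan followed by bisection) and the callable `area_model`, which stands for "total die area obtained when the wirelength model is evaluated at a guessed area", are assumptions of this sketch.

```python
def switch_points_2d(w_w, f_s):
    """Equation (4): switch points in a planar switch box."""
    return w_w * f_s * (f_s + 1) / 2


def switch_points_3d(w_w, w_v, f_sw, f_sv):
    """Equation (5): switch points in a 3D switch box."""
    if w_v > w_w:
        raise ValueError("expected W_V <= W_W")
    return ((w_w - w_v) * f_sw * (f_sw + 1) + w_v * f_sv * (f_sv + 1)) / 2


def lateral_channel_width(l_total, n_ch, e_t):
    """Equation (6): W_W = L_total / (N_ch * e_t)."""
    return l_total / (n_ch * e_t)


def smallest_consistent_area(area_model, a_lo, a_hi, steps=1000, tol=1e-9):
    """Find the smallest A in [a_lo, a_hi] with area_model(A) == A.

    Scans upward for the first sign change of area_model(A) - A and
    then refines it by bisection.
    """
    def gap(a):
        return area_model(a) - a

    step = (a_hi - a_lo) / steps
    left = a_lo
    for k in range(1, steps + 1):
        right = a_lo + k * step
        if gap(left) * gap(right) <= 0:
            while right - left > tol:
                mid = 0.5 * (left + right)
                if gap(left) * gap(mid) <= 0:
                    right = mid
                else:
                    left = mid
            return 0.5 * (left + right)
        left = right
    raise ValueError("no self-consistent area in the search range")


# Toy stand-in for the area model: fixed logic area plus a routing term
# that grows with sqrt(A). The numbers are illustrative only.
def toy_model(a):
    return 0.2 + 0.5 * a ** 0.5


area = smallest_consistent_area(toy_model, 0.01, 2.0)
```

In a real flow `area_model` would combine the logic area, the TSV area from Equation (3) and the routing area derived from Equations (5) to (9); the toy model only demonstrates that the search converges to a fixed point.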

Figure 6: Interconnect path in 2D FPGAs vs. 3D FPGAs. (a) 2D interconnect path from source to sink through switch boxes (SB); (b) 3D interconnect path from source to sink through a TSV.

Table 2: Potential 3D partitioning configuration.

# of Dies  Config.  Die  Resource           Die Area (cm²)  Total Area (cm²)
1          -        1    logic + hardcores  0.700           0.70
2          A        1    logic              0.450           0.63
                    2    hardcores          0.180
2          B        1    logic + hardcores  0.290           0.59
                    2    logic + hardcores  0.290
3          A        1    logic              0.170           0.52
                    2    hardcores          0.180
                    3    logic              0.170
3          B        1    logic + hardcores  0.176           0.53
                    2    logic + hardcores  0.176
                    3    logic + hardcores  0.176

ESTIMATING IMPROVEMENTS IN PERFORMANCE

One of the important advantages of 3D technology is the general reduction in the average distance between the components of the computational system. Three-dimensional technology can replace long interconnect paths with short ones that are stitched together using TSVs. This reduction in interconnect length shortens the signal propagation delay between the computational resources, which improves the overall FPGA performance. The reductions in wire capacitance and resistance achieved by replacing long wires with TSVs are significant. The objective of this section is to estimate the improvement in signal propagation delay using our 3D FPGA design model.

To estimate the average interconnect path delay in 3D ICs, we consider every pair of locations across all die, calculate the delay between the two locations, and then take the average of these point-to-point delays. For every pair of locations, we calculate the distance between them and then estimate the number of L4 and L16 wire segments that would be used to create an interconnect path between the two locations. If the two locations end up on the same die, the delay of the path between them is calculated using a distributed RC delay model of its path constituents (i.e., the L4/L16 wire segments and the pass transistors in the intermediate switch boxes (SB), as shown in Figure 6(a)). If the two locations end up on different die, the delay is computed for the path shown in Figure 6(b), with the TSV delay taken into account. The estimation results are shown in Section 5.

To further improve the performance of 3D FPGAs, we propose incorporating bypass TSVs into the switch boxes. Bypass TSVs are used to connect non-adjacent dies directly by passing through a switch box without any interaction with the intermediate switches. A bypass TSV does not eliminate the silicon area required for the in-series TSVs in the intermediate die, but it does eliminate the delay and area that would have been introduced by the intermediate switches. For 3D FPGAs, our experiments in Section 5 show that using bypass TSVs can reduce the average interconnect path delay and the die area by significant amounts.
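A point-to-point delay of the kind described above can be sketched with a first-order Elmore model, which is one common way to evaluate a distributed RC path; the paper does not state which exact formulation it uses. The wire and TSV values are those quoted in Section 5, read here as ohm per µm, farad per µm, ohm and farad. The switch resistance and capacitance, the tile pitch, and the two example paths are hypothetical placeholders chosen for illustration only.

```python
# Values quoted in Section 5 (units as read in this sketch).
WIRE_R_PER_UM = 0.244       # ohm per um
WIRE_C_PER_UM = 0.208e-15   # farad per um
TSV_R = 43e-3               # ohm
TSV_C = 40e-15              # farad

# Hypothetical values, NOT from the paper.
SWITCH_R = 1000.0           # pass-transistor on-resistance, ohm
SWITCH_C = 5e-15            # switch parasitic capacitance, farad
TILE_PITCH_UM = 100.0       # distance spanned per logic block, um


def wire_segment(tiles):
    """(R, C, distributed) for a wire spanning `tiles` logic blocks."""
    length = tiles * TILE_PITCH_UM
    return (WIRE_R_PER_UM * length, WIRE_C_PER_UM * length, True)


SWITCH = (SWITCH_R, SWITCH_C, False)
TSV = (TSV_R, TSV_C, False)


def elmore_delay(elements):
    """Elmore delay of a series path of (R, C, distributed) elements.

    A lumped element charges its capacitance through all upstream
    resistance plus its own; a distributed wire sees only half of its
    own resistance on average.
    """
    upstream_r = 0.0
    delay = 0.0
    for r, c, distributed in elements:
        own = 0.5 * r if distributed else r
        delay += (upstream_r + own) * c
        upstream_r += r
    return delay


# Illustrative paths: a long planar route vs. a short route through a TSV.
path_2d = [wire_segment(16), SWITCH, wire_segment(16), SWITCH, wire_segment(4)]
path_3d = [wire_segment(4), SWITCH, TSV, SWITCH, wire_segment(4)]
delay_2d = elmore_delay(path_2d)
delay_3d = elmore_delay(path_3d)
```

Averaging `elmore_delay` over all location pairs would give the kind of average interconnect delay discussed above; buffers and driver resistance are omitted here to keep the sketch short.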

EXPERIMENTAL RESULTS

In this section we empirically assess the impact of our proposed 3D FPGA architecture on total die area and performance compared to a 2D FPGA architecture. For all experiments, we estimate the area of the computational resources according to Table 1, and estimate the routing area and TSV area according to the approach outlined in Sections 3.2 and 3.3. We use the TSMC 90nm library and assume 10× pass-transistor switches, 4× wire-segment buffers, and a wire resistance and capacitance of 0.244 Ω/μm and 0.208 fF/μm respectively. We also assume that the I/O connection factors of the connection blocks are $F_{ci} = 0.5$ and $F_{co} = 0.125$. Based on data reported in [13] we choose typical 5 × 5 μm² TSVs with resistance and capacitance values of 43 mΩ and 40 fF respectively.

We consider two different experiments:
Experiment 1: Impact of the choice of partitioning configuration on the total die area.
Experiment 2: Impact of the number of die in the 3D stack on the total area savings and performance.

Experiment 1: Impact of Partitioning Configuration. In this experiment we consider an FPGA with a computational resource makeup based on an Altera Stratix II EP2S30 FPGA device that consists of 2000 logic clusters, 18 DSP blocks, 144 Mem1 and 202 Mem2 memory blocks. We consider two partitioning configurations: in Configuration A, each die contains only one type of computational resource (either reconfigurable logic or hardcore units), and in Configuration B, every die contains both reconfigurable and hardcore units. In all configurations, all die have reconfigurable interconnects and switch boxes. In Table 2, we consider the application of these two partitioning configurations using either 1, 2 or 3 die in the 3D FPGA stack. The results show that configurations that lead to more balanced die areas are the ones that lead to the largest savings in total die area. This result is not surprising; a non-balanced area distribution would lead to some die with relatively large areas. The interconnects in these larger die tend to be relatively long, which implies more die area allocated to the reconfigurable switch boxes, triggering an increase in the total die area.

Experiment 2: Impact of Number of Die. In this second experiment, we evaluate how both total die area and performance change as the number of die in the 3D FPGA stack increases. We assume the same FPGA makeup as in Experiment 1 with Configuration B. Earlier, in Figure 5, we showed the tradeoff between the width of the vertical TSV channel (dashed line) and the lateral interconnect channel (solid line) as the number of die increases. In this experiment, we show in Figure 7 that the total die area initially decreases as the number of die increases, reaching its minimum value when four die are used. However, if the number of die is increased further, the total die area does not continue to decrease, but rather increases. This increase happens because when many die are used, a larger number of TSVs is required for intercommunication, and the area of these TSVs ends up dominating the total die area. The plot of Figure 7 also shows that the average interconnection delay initially decreases as the number of die increases, reaching a minimum at 7 die per stack. When the design is mapped to a larger number of die, the average delay increases (or equivalently performance decreases) as the vertical TSV interconnects tend to replace the local and medium wires. Another reason not to increase the number of die in the stack beyond a certain point is to avoid potential thermal problems. Thus, for this FPGA makeup a 3D stack of four die would achieve silicon area savings of 26% (from 0.71 cm² to 0.52 cm²) compared to the planar design, together with an improved performance of 61% (from 11.8 ns to 4.6 ns) as measured by the average interconnection delay.
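The percentages quoted above follow directly from the reported before/after values, and the configuration with the smallest total area can be read off Table 2. A minimal Python sketch of that arithmetic, using only numbers stated in the text and in Table 2 (the variable and function names are hypothetical):

```python
def relative_reduction(before, after):
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


# Total die area in cm^2 from Table 2, keyed by (number of die, configuration).
TABLE2_TOTAL_AREA = {
    (1, "-"): 0.70,
    (2, "A"): 0.63,
    (2, "B"): 0.59,
    (3, "A"): 0.52,
    (3, "B"): 0.53,
}

# Entry of Table 2 with the smallest total die area.
best_config = min(TABLE2_TOTAL_AREA, key=TABLE2_TOTAL_AREA.get)

# Experiment 2: planar design vs. a stack of four die.
area_saving = relative_reduction(0.71, 0.52)        # total die area, cm^2
delay_improvement = relative_reduction(11.8, 4.6)   # average delay, ns
```

Evaluated this way, the area saving comes out slightly below 27% and the delay improvement at about 61%.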

as the number of die in a 3D stack increases, the total interconnect area reduces and the total TSV area increases. We have investigated the optimal number of die that gives the greatest savings
in die area. We have estimated the improvement in performance
that will be attained by switching to 3D technology, and we have
analyzed the performance benefits of using heterogeneous FPGAs
with regular TSVs and bypass TSVs. Using Rent-based statistical
analysis, we have shown that 3D FPGAs can reduce die area by
about 27% while simultaneously improving performance by up to
58%. Though statistical-based estimation might cause variations
compared with realistic benchmark designs, the experimental results are consistent with theoretical analyses.
Finally, for future work, we would like to develop a 3D heterogeneous placement and routing tool to conduct experiments on benchmark designs to evaluate our statistical estimation model. Analyzing the impact of 3D stacking on thermal distribution of 3D heterogeneous FPGAs also would be considered.

total die area

0.7
Total Die Area (Cm2)

10
0.65

Region I

Region II

Region III

8
0.6
6
0.55

0.5
1

Interconnect Average Delay (ns)

average delay

4
10

Number of Dies

Figure 7: 3D total die area and average connection delay vs.

number of die. The left y-axis gives the total area in cm2 and
the right y-axis gives the average connection delay in ns.

Region I (fewer than ma die): For a small number of die in the stack, TSVs eliminate the long interconnections, which significantly reduces both delay and the reconfigurable switching logic overhead, leading to substantial area savings despite the silicon area overhead required for implementing the TSVs.
Region II (from ma die to mp die): If the number of die is further increased, the TSVs start replacing the medium to long wires, bringing only modest improvements in performance while slightly increasing the total die area.
Region III (more than mp die): If the number of die is increased unreasonably, the TSVs end up replacing the short to medium wires, which increases the average wire delay and degrades performance. Furthermore, the area required for TSVs starts to dominate over the component and reconfigurable routing area, leading to an increase in the total die area.
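The three regions can be sketched as a simple classification over an area curve and a delay curve. The example below is a minimal illustration, assuming ma is taken as the die count with the smallest total area and mp as the die count with the smallest average delay. The function names and the sample data are hypothetical and are not the values plotted in Figure 7.

```python
# Hypothetical sample curves (NOT data from the paper): total die area in cm^2
# and average connection delay in ns for stacks of 1 to 8 die.
DIE_COUNTS = [1, 2, 3, 4, 5, 6, 7, 8]
AREAS = [0.75, 0.62, 0.56, 0.55, 0.57, 0.60, 0.64, 0.70]
DELAYS = [10.0, 7.5, 6.0, 5.2, 4.8, 4.6, 4.9, 5.5]


def optimal_die_counts(die_counts, areas, delays):
    """Return (m_a, m_p): the die counts that minimize area and delay."""
    m_a = min(zip(areas, die_counts))[1]
    m_p = min(zip(delays, die_counts))[1]
    return m_a, m_p


def classify_region(m, m_a, m_p):
    """Map a die count m to Region I, II, or III, assuming m_a <= m_p."""
    if m_a > m_p:
        raise ValueError("this sketch assumes m_a <= m_p")
    if m < m_a:
        return "I"    # both area and delay still improve with more die
    if m <= m_p:
        return "II"   # delay improves modestly, area starts to grow
    return "III"      # both area and delay get worse


if __name__ == "__main__":
    m_a, m_p = optimal_die_counts(DIE_COUNTS, AREAS, DELAYS)
    for m in DIE_COUNTS:
        print(m, classify_region(m, m_a, m_p))
```

With real data, the two curves would come from the area and delay estimation models rather than from hard-coded lists.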
The experimental results show that transforming existing planar FPGA designs to 3D technology can yield significant reductions in total die area while delivering the expected performance benefits of 3D technology. The reduction in total die area
leads to direct cost savings that provide an important incentive for
the industry to bring 3D IC technology into the market.

REFERENCES

[1] K. Banerjee et al., "3-D ICs: A Novel Chip Design for Deep-Submicrometer Interconnect Performance and Systems-on-Chip Integration," Proc. of the IEEE, vol. 89(5), pp. 602-633, 2001.
[2] A. W. Topol et al., "Three-dimensional Integrated Circuits," IBM Journal of Res. and Dev., vol. 50(4-5), pp. 491-506, 2006.
[3] M. Alexander et al., "Three-dimensional field-programmable gate arrays," in Proc. of the Eighth Annual IEEE International ASIC Conference and Exhibit, pp. 253-256, Sep. 1995.
[4] W. Meleis et al., "Architectural design of a three dimensional FPGA," in Proc. of the Seventeenth Conference on Advanced Research in VLSI, pp. 256-268, Sep. 1997.
[5] G. Borriello et al., "The triptych FPGA architecture," IEEE Transactions on VLSI Systems, vol. 3, no. 4, pp. 491-501, Dec. 1995.
[6] M. Lin et al., "Performance benefits of monolithically stacked 3D-FPGA," in Proc. of the ACM/SIGDA 14th ISFPGA. New York, NY, USA: ACM, 2006, pp. 113-122.
[7] A. J. Alexander et al., "Placement and routing for three-dimensional FPGAs," in Fourth Canadian Workshop on Field-Programmable Devices, 1996, pp. 11-18.
[8] A. Rahman et al., "Wiring requirement and three-dimensional integration technology for field programmable gate arrays," IEEE Transactions on VLSI Systems, vol. 11, no. 1, pp. 44-54, Feb. 2003.
[9] C. Ababei et al., "Three-dimensional place and route for FPGAs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 6, pp. 1132-1140, June 2006.
[10] Y.-S. Kwon et al., "A 3-D FPGA wire resource prediction model validated using a 3-D placement and routing tool," in Proc. of SLIP '05. New York, NY, USA: ACM, 2005, pp. 65-72.
[11] M. Lin et al., "A routing fabric for monolithically stacked 3D-FPGA," in Proc. of the ACM/SIGDA 15th ISFPGA. New York, NY, USA: ACM, 2007, pp. 3-12.
[12] V. Betz et al., Architecture and CAD for Deep-Submicron FPGAs. Norwell, MA, USA: Kluwer Academic Publishers, 1999.
[13] V. F. Pavlidis et al., Three Dimensional Integrated Circuit Design. Morgan Kaufmann Publishers, 2008.
[14] S. Wilton and N. Jouppi, "Cacti: an enhanced cache access and cycle time model," IEEE Journal of Solid-State Circuits, vol. 31, no. 5, pp. 677-688, May 1996.
[15] Cacti 5.3, online, available at: http://quid.hpl.hp.com:9081/cacti/index.y?new.
[16] Altera Stratix II Device Handbook, Volume 1, http://www.altera.com/literature/hb/stx2/stratix2_handbook.pdf.
[17] B. Landman and R. Russo, "On a pin versus block relationship for partitions of logic graphs," IEEE Transactions on Computers, vol. C-20, no. 12, pp. 1469-1479, Dec. 1971.
[18] P. Zarkesh-Ha et al., "Prediction of net-length distribution for global interconnects in a heterogeneous system-on-a-chip," IEEE Transactions on VLSI Systems, vol. 8, no. 6, pp. 649-659, 2000.

We also estimate the impact of using bypass TSVs between nonadjacent dies, as presented in Section 4. The results show that bypass TSVs improve the reductions in die area and average delay by a further 4.63% and 9.78%, respectively.
The common trends among the tested designs lead to an intuitive explanation for the impact of transforming planar FPGA designs to use 3D technology. If we denote the optimal number of die
from a pure area savings perspective as ma and the optimal number
of die from a pure delay (or performance) perspective as mp, then
from our results we can identify three regions for 3D FPGA design.

CONCLUSION AND FUTURE WORK

In this paper we have proposed new architectures and design methodologies for heterogeneous 3D FPGAs with TSVs. The performance benefits as well as the cost savings achieved by using
such 3D systems have been analyzed. We have also estimated the
impact of 3D integration on both the total TSV area and the total area of the reconfigurable routing resources. We showed that
