
Computing (2020) 102:2361–2383

https://doi.org/10.1007/s00607-020-00834-5

REGULAR PAPER

Exploring and optimizing partitioning of large designs for multi-FPGA based prototyping platforms

Umer Farooq1 · Bander A. Alzahrani2

Received: 18 January 2020 / Accepted: 10 July 2020 / Published online: 21 July 2020
© Springer-Verlag GmbH Austria, part of Springer Nature 2020

Abstract
Recently, multi-FPGA platforms have become a popular choice for prototyping complex digital systems because, compared with other pre-silicon testing techniques, they offer unique advantages such as high frequency and a real-world testing experience. However, one of the challenges faced by multi-FPGA prototyping is the requirement for an efficient back end flow. Partitioning is a key part of the back end flow of multi-FPGA systems, and it directly affects the quality of the final prototyped design. In this work, we explore two different partitioning approaches: a multilevel approach and a hierarchical approach. For experimentation, we use a suite of fourteen large benchmarks. Experimental results reveal that the multilevel approach gives 12.5% better frequency results for mono-cluster benchmarks, while the hierarchical approach gives 13% better results for multi-cluster benchmarks. Furthermore, the hierarchical approach requires, on average, 60% less execution time than the multilevel partitioning approach.

Keywords Partitioning · Multi-FPGA systems · Prototyping

Mathematics Subject Classification 05C70 · 68U07

1 Introduction

Modern System on Chip (SoC) designs have huge computational capability and are enormously complex to design. Moreover, shrinking product life cycles and time-to-market pressures increase the need for an efficient, fault-free design process [1,2], because a faulty and inefficient design can be extremely costly [3,4]. In this regard, FPGA-based prototyping offers a good option for complete design-to-silicon system verification. FPGA-based prototyping is a pre-silicon verification technique that offers higher speed than simulation-based verification [5]. Simulation-based solutions are cost-effective, but they are very slow and offer only an abstract view of the system. Although emulation-based pre-silicon verification also gives good speed, the unique feature of FPGA-based prototyping is that it gives the user a real-world testing and troubleshooting experience.

Umer Farooq
[email protected]
Bander A. Alzahrani
[email protected]

1 Electrical and Computer Engineering Department, Dhofar University, Salalah, Oman
2 Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia

Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Less complex Application Specific Integrated Circuits (ASICs) can be prototyped on a single FPGA, as modern FPGAs are quite capable and have a huge logic capacity. However, as the complexity of the system under consideration grows, even the most modern FPGAs cannot meet the resource and I/O requirements of the ASIC. Such scenarios require multi-FPGA platforms, because the gap between FPGA capability and ASIC requirements is huge [6] and it becomes increasingly difficult to bridge with every new process technology. The number of FPGAs required to prototype a design depends on the complexity of the design and may vary from a few FPGAs to a couple of dozen [7,8]. Prototyping complex ASIC designs on multi-FPGA platforms usually follows a complex back end flow that involves several optimization steps, whose core objective is to optimize the frequency and execution speed of the design under consideration. The back end flow starts with the RTL description of the design. The design is first synthesized and then partitioned using a partitioning algorithm. After partitioning, the inter-FPGA routing of the design is performed. Finally, the flow culminates in the intra-FPGA placement and routing of the design.
Partitioning is one of the most critical steps of the multi-FPGA prototyping flow. In this step, the design under consideration is divided into multiple parts based on the number of FPGAs on the multi-FPGA board. Because of the several optimization constraints involved, finding an optimal partitioning solution is an NP-hard problem [9]. From the multi-FPGA prototyping perspective, two of the principal objectives of a partitioning tool are to respect the logic capacity of the target FPGA architecture while keeping the communication between different partitions as small as possible. Thanks to improved design processes and better process technology, both the logic capacity and the number of I/Os of modern FPGA generations have increased. However, logic capacity has grown much faster than the number of I/Os. This trend has led to a higher logic-to-I/O ratio in newer FPGA generations, which makes it particularly difficult for a partitioning tool to minimize inter-partition communication. As a result, the number of signals traversing different partitions (also termed cut-nets) exceeds the number of I/Os available between the FPGAs of a multi-FPGA board. These signals are routed between FPGAs in the next step of the prototyping flow, called inter-FPGA routing.
Inter-FPGA routing follows partitioning and also plays an important role in the overall optimization of the design. In this step, the cut-nets of the partitioned design are routed on the tracks of the multi-FPGA board in a time division multiplexed (TDM) manner. A larger number of cut-nets therefore leads to a higher multiplexing ratio, which in turn reduces the execution speed of the final prototyped design. The results produced by the routing tool are directly linked to the quality of the preceding partitioning process: even a highly efficient routing tool cannot overturn the poor results of a partitioning tool. An in-depth discussion of partitioning quality and its impact on the frequency of the final prototyped design is presented in the subsequent sections of the paper.
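To make the link between cut-nets and multiplexing concrete, the following Python sketch computes the TDM ratio for each pair of FPGAs as the ceiling of the cut-nets between them divided by the physical wires between them, and takes the worst pair as the system-wide ratio. It assumes every cut-net is routed on a direct wire between its two FPGAs (no multi-hop routes); the function names and board figures are hypothetical illustrations, not data from this paper.

```python
import math


def mux_ratio(cut_nets: int, wires: int) -> int:
    """Smallest TDM ratio that fits `cut_nets` signals onto `wires` physical wires."""
    if wires <= 0:
        raise ValueError("at least one physical wire is required")
    # Each wire carries up to `ratio` signals per system clock cycle.
    return max(1, math.ceil(cut_nets / wires))


def system_mux_ratio(pairs: dict) -> int:
    """Worst-case ratio over all FPGA pairs; pairs maps (a, b) -> (cut_nets, wires)."""
    return max(mux_ratio(n, w) for n, w in pairs.values())


# Hypothetical board: cut-nets and physical wires per FPGA pair.
board = {
    ("F0", "F1"): (900, 100),
    ("F1", "F2"): (150, 100),
    ("F0", "F2"): (40, 100),
}
print(system_mux_ratio(board))  # 9
```

With these hypothetical numbers, halving the cut-nets between F0 and F1 to 450 would lower the system ratio from 9 to 5, which is the kind of effect a better partitioner aims for.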
It is evident from the discussion above that partitioning plays a very important role in the multi-FPGA prototyping flow. In this work, we propose and explore two partitioning approaches, namely the hierarchical and the multilevel partitioning approach. For this purpose, we use an open source flow that covers the complete prototyping process for multi-FPGA systems; however, our focus in this work remains on the partitioning aspect of the flow. The proposed flow starts with the generation of large and complex benchmarks using a generic academic tool that can generate both flat and hierarchical benchmarks. The generated benchmarks are then logically synthesized using an open source tool by VERIFIC [10]. This tool not only performs standard cell synthesis but also gives complete information about the interconnect of the design under consideration. After synthesis, we partition the design. To produce the best partitioning results, we strive to exploit the inherent interconnect patterns of the design, and for this purpose we explore two different partitioning approaches. The first approach exploits the hierarchical interconnect that is inherent in certain designs; we call it the hierarchical partitioning approach. The second approach performs partitioning using multilevel clustering and refinement; we call it the multilevel partitioning approach. Both proposed approaches are novel in the sense that they have been specifically customized for prototyping on multi-FPGA systems. An in-depth discussion of both approaches is given in Sect. 4. After partitioning, the inter-FPGA routing of the design is performed. After routing, system frequency results are obtained for the two partitioning approaches, and a thorough analysis of those results is presented. Here, only a brief overview of the steps of the proposed back end flow is given; a detailed discussion follows in the subsequent sections of the paper. The main contributions of this paper are summarized as follows:
– Development of an open source, generic back end flow for prototyping of multi-FPGA systems. All the steps of the proposed flow use either open source tools or tools that are free for academia.
– Development and implementation of two partitioning approaches for exploration
and optimization of different designs in multi-FPGA prototyping.
– Extensive experimentation and thorough analysis of results obtained through the
proposed back end flow.
In the rest of the paper, Sect. 2 discusses the background and related work and elaborates the contribution of this work. Section 3 gives a detailed discussion of the proposed flow, including comprehensive details of all the steps of the back end flow. Section 4 presents an in-depth discussion of the two partitioning approaches that we propose and explore in this work. Section 5 gives a thorough analysis of the results obtained through experimentation, and Sect. 6 concludes the paper with a discussion of future work.

2 Background and related work

The discussion presented in Sect. 1 shows that partitioning plays a very important role in determining the system frequency of the final prototyped design. Partitioning is a well-formulated research problem, and researchers have been active in this area since the 1970s. Many techniques have been proposed in the past to find efficient solutions to partitioning problems. There are mainly three types of techniques used to solve a partitioning problem.
1. Analytical partitioning techniques [11,12] commonly use an objective function that optimizes the quadratic length of the critical path. Although minimizing the quadratic length of the critical path is only an indirect measure of partitioning quality, its main advantage is that the objective function can be optimized in a very short time. This kind of approach is particularly suitable for very large problems. A quadratic function, however, does not give the best possible solution, and it is often followed by several local tweaks.
2. Simulated annealing based placement [13,14] is another technique; it borrows the annealing concept from metallurgy, where molten metal is cooled down gradually to produce a high-quality result. The objective function of this approach is to minimize the overall Manhattan distance between all connected instances. This approach is quite effective in finding a reasonably good solution in a small amount of time and is commonly used for island-style architectures. However, simulated annealing is classified more as a placement technique than as a partitioning technique.
3. Min-cut based partitioning approaches [15,16] are generally suitable for partitioning complex designs. A min-cut partitioner recursively partitions the design under consideration, with the aim of minimizing the cut-nets of the design by merging connected instances into a single cluster. Because of their ability to find a good solution in a short time, in this work we mainly consider min-cut based partitioning algorithms. Different min-cut based partitioning algorithms are discussed next.
In the min-cut based partitioning approach, the design is represented as a hypergraph and the connections between different instances of the design are represented as hyperedges. The main objective of the partitioner is to minimize the number of cut hyperedges (connections that traverse more than one partition) in the graph. In this regard, the authors in [17] present the Kernighan-Lin bi-partitioning algorithm. The authors in [18] present the FM partitioning algorithm, which uses a recursive bi-partitioning approach to find a solution to a partitioning problem. Similarly, the authors in [19] present another bi-partitioning algorithm that promises optimal results for small graphs; however, this algorithm gives either sub-optimal results or no results for large to very large hypergraphs. These three algorithms are the main partitioning algorithms used for digital systems, and later research is mainly an extension of one of them. Among these algorithms, the Fiduccia-Mattheyses (FM) heuristic [18] is known to produce the best results. It is an iterative partitioning algorithm that minimizes the cut-net count over multiple iterations. In each iteration, the cut-net cost is reduced by maximizing the move gain associated with each move of an instance from one cluster to another. In the FM algorithm, each move has a gain that can be positive, zero or negative. After each move, the gains of all the affected instances are updated. This keeps the complexity of each pass linear in the size of the design and allows a good (though not necessarily optimal) solution to be found in minimal time.
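The FM move-and-gain mechanism described above can be sketched in Python as a single refinement pass on a 2-way partition. This is a simplified illustration, not the implementation used in [18] or in this work: every vertex has unit area, balance is enforced by a hypothetical `max_imbalance` bound on the difference between partition sizes, and gains are recomputed directly instead of being kept in FM's bucket structure, so the sketch does not achieve the linear per-pass complexity mentioned above.

```python
from collections import defaultdict


def cut_size(nets, side):
    """Number of nets whose pins lie on both sides of the bipartition."""
    return sum(1 for net in nets if len({side[v] for v in net}) > 1)


def fm_pass(nets, side, max_imbalance=2):
    """One simplified Fiduccia-Mattheyses pass; side maps vertex -> 0 or 1.

    Returns the improved assignment and the total cut reduction achieved.
    """
    side = dict(side)  # work on a copy so the caller's assignment is untouched
    nets_of = defaultdict(list)  # vertex -> indices of nets it belongs to
    for i, net in enumerate(nets):
        for v in net:
            nets_of[v].append(i)

    def gain(v):
        """Reduction in cut-nets if v moves to the other side."""
        s, g = side[v], 0
        for i in nets_of[v]:
            from_cnt = sum(1 for u in nets[i] if side[u] == s)
            to_cnt = len(nets[i]) - from_cnt
            if from_cnt == 1:  # v is the net's last pin on its side: net becomes uncut
                g += 1
            if to_cnt == 0:    # net lay entirely on v's side: net becomes cut
                g -= 1
        return g

    sizes = [sum(1 for v in side if side[v] == 0), 0]
    sizes[1] = len(side) - sizes[0]
    locked, moves = set(), []
    running, best, best_len = 0, 0, 0
    while len(locked) < len(side):
        candidates = []
        for v in side:
            s = side[v]
            # Only unlocked vertices whose move keeps the sizes within the bound.
            if v not in locked and abs((sizes[s] - 1) - (sizes[1 - s] + 1)) <= max_imbalance:
                candidates.append((gain(v), v))
        if not candidates:
            break
        g, v = max(candidates, key=lambda c: c[0])  # best move, even if its gain is negative
        s = side[v]
        side[v] = 1 - s
        sizes[s] -= 1
        sizes[1 - s] += 1
        locked.add(v)
        moves.append(v)
        running += g
        if running > best:
            best, best_len = running, len(moves)
    for v in moves[best_len:]:  # undo the moves made after the best prefix
        side[v] = 1 - side[v]
    return side, best


# Hypothetical netlist: two tightly connected groups joined by net (c, d).
nets = [["a", "b"], ["b", "c"], ["a", "c"], ["d", "e"], ["e", "f"], ["d", "f"], ["c", "d"]]
start = {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1, "f": 1}
result, improvement = fm_pass(nets, start)
print(cut_size(nets, start), cut_size(nets, result), improvement)  # 5 1 4
```

In this small example the groups {a, b, c} and {d, e, f} start out mixed with 5 cut-nets; the pass moves d and then c, keeps that best prefix of moves, rolls back the later (negative-gain) moves, and ends with only the net between c and d cut.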
The min-cut based partitioning approach can be applied either in a flat manner, which finds a quick solution, or in a multilevel manner. However, the computation time of the flat partitioning approach increases exponentially with the complexity of the design. For designs of moderate to high complexity, the multilevel hypergraph partitioning approach is known to produce the best results [20–22]. The multilevel partitioning approach comprises three phases, namely clustering (coarsening), top-level partitioning and uncoarsening. The main advantage of multilevel partitioning over flat partitioners is its ability to search the solution space more effectively by spending comparatively more effort on smaller, coarsened hypergraphs. Good coarsening algorithms ensure a high correlation between good partitions of the coarsened hypergraphs and good partitions of the initial hypergraph after refinement. A thorough search at the top of the multilevel hierarchy is therefore worthwhile: it is relatively inexpensive compared with flat partitioning of the original hypergraph, yet it can still preserve most of the possible improvement. The result is an algorithmic framework with both better run time and better solution quality than a completely flat approach. The multilevel partitioning approach was successfully demonstrated by the hMetis program [22]. This tool mainly uses the FM algorithm for partitioning, and it also introduced several new heuristics that reportedly improved result quality. However, this tool partitions designs with homogeneous instances only and cannot partition heterogeneous instances.
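The coarsening and uncoarsening phases can be illustrated with a minimal Python sketch of a single level. It greedily pairs each unmatched vertex with the unmatched neighbour that shares the most nets with it, builds the coarse hypergraph by mapping pins to clusters (dropping nets that fall entirely inside one cluster), and projects a coarse partition back onto the original vertices. This simple matching rule is an assumption made for illustration; hMetis [22] uses its own, more elaborate coarsening schemes.

```python
def coarsen(nets, vertices):
    """One coarsening level: pair each unmatched vertex with the unmatched
    neighbour sharing the most nets (ties go to the lexicographically largest name)."""
    shared = {v: {} for v in vertices}  # vertex -> {neighbour: number of shared nets}
    for net in nets:
        for u in net:
            for w in net:
                if u != w:
                    shared[u][w] = shared[u].get(w, 0) + 1
    cluster, next_id = {}, 0
    for v in vertices:
        if v in cluster:
            continue
        free = [(count, w) for w, count in shared[v].items() if w not in cluster]
        cluster[v] = next_id
        if free:
            _, mate = max(free)
            cluster[mate] = next_id
        next_id += 1
    # Coarse hypergraph: map pins to clusters, drop nets absorbed inside one cluster.
    coarse_nets = []
    for net in nets:
        pins = sorted({cluster[v] for v in net})
        if len(pins) > 1:
            coarse_nets.append(pins)
    return cluster, coarse_nets


def project(cluster, coarse_side):
    """Uncoarsening: every original vertex inherits the side of its cluster."""
    return {v: coarse_side[c] for v, c in cluster.items()}


nets = [["a", "b"], ["b", "c"], ["a", "c"], ["d", "e"], ["e", "f"], ["d", "f"], ["c", "d"]]
cluster, coarse_nets = coarsen(nets, ["a", "b", "c", "d", "e", "f"])
print(cluster)      # {'a': 0, 'c': 0, 'b': 1, 'd': 2, 'f': 2, 'e': 3}
print(coarse_nets)  # [[0, 1], [0, 1], [2, 3], [2, 3], [0, 2]]
# A 2-way split of the 4 coarse vertices, projected back to the 6 original ones.
print(project(cluster, {0: 0, 1: 0, 2: 1, 3: 1}))
```

A complete multilevel partitioner would repeat `coarsen` until the hypergraph is small, partition the top level, and then project and refine (for example with an FM pass) level by level on the way back down.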
Above, we have discussed the partitioning problem from a generic perspective. In the context of multi-FPGA prototyping, different tools and works exist, both commercially and in academia. Commercially, different tools provide either a partial or a complete prototyping flow for multi-FPGA systems. For example, Synopsys' Protocompiler [23] gives a complete back end prototyping flow for multi-FPGA systems. However, this tool is accompanied by the HAPS [24] hardware platform of Synopsys and works only for Synopsys-specific platforms. Then, there are the AUSPY and WASGA [25] partitioning tools. These tools are platform independent and are not accompanied by a specific hardware platform; however, they give only a partial partitioning solution and do not provide a complete prototyping flow. Another Synopsys partitioning tool, CERTIFY [26], was available for partitioning multi-FPGA systems until recently. It was generic in nature and could be used with any hardware platform, but it was recently discontinued and replaced by Protocompiler. Apart from the aforementioned tools, commercial vendors provide several other solutions [27–29]. Like multi-FPGA prototyping, these solutions are mainly used for pre-silicon verification; however, they are either simulation or emulation based. Moreover, they are very costly and do not fall within the scope of this paper. The discussion of the commercially available partitioning solutions above indicates that they are either platform dependent or offer only a partial solution. Moreover, all of them are proprietary tools with annual subscription fees of thousands of dollars.
On the other hand, little state-of-the-art academic work addresses partitioning from the multi-FPGA prototyping perspective. The authors in [30,31] propose a new multilevel hierarchical FPGA architecture and use a multilevel partitioning tool to partition the design. However, their proposed solution can handle only homogeneous blocks and gives a partitioning solution for a single FPGA only. Similarly, the authors in [32] explore the partitioning problem for multi-FPGA systems, but they only compare solutions obtained through the commercial WASGA and CERTIFY partitioning tools and do not give any academic solution. The authors in [33,34] also explore prototyping of multi-FPGA systems; however, for partitioning they use the commercial CERTIFY tool [26] by Synopsys, and the main focus of their flow remains the inter-FPGA routing step of the back end flow. Furthermore, the authors in [35,36] also address the back end flow for multi-FPGA systems, but their focus, too, remains mainly inter-FPGA routing.
In this work, we not only address the routing issue but also focus on the partitioning problem, because even a highly efficient routing tool cannot improve the frequency of the final prototyped design if it is preceded by an inefficient partitioning process. To make the partitioning process efficient, we put particular emphasis on knowledge of the interconnect of the design under consideration, which we extract through the open source tool VERIFIC [10]. Different types of designs exhibit different interconnect patterns: some are hierarchical in nature, while others have a rather flat interconnect. Partitioning all designs with a single approach is therefore not justified and may eventually lead to poor frequency results. For this reason, we propose and explore two different partitioning approaches in our back end flow. The first, called the hierarchical partitioning approach, uses a hierarchical partitioning algorithm and is more useful for designs exhibiting hierarchical interconnect. The second is based on a multilevel partitioning algorithm and is more suitable for rather flat designs. Details of the two proposed approaches are given in Sect. 4. Coupled with an efficient inter-FPGA routing tool, the two partitioning approaches give the best frequency results for the partitioned design.
To the best of our knowledge, little academic work on multi-FPGA prototyping systems exists from the partitioning perspective. As discussed before, the existing work either uses commercial tools or compares the partitioning results of commercial tools. The unique contribution of this work is that we extract information on the interconnect of the design through the VERIFIC tool, which is free for academia. Next, we apply whichever of the two partitioning approaches best exploits the interconnect in terms of minimizing the cut-nets of the partitioned design. Both proposed partitioning approaches are based either on academic tools or on customized versions of those tools. Through this work, we therefore strive to provide a platform for academia in multi-FPGA prototyping and to advance research in the important domain of pre-silicon verification through multi-FPGA prototyping.


3 Prototyping flow

In this paper, we propose a prototyping flow for multi-FPGA based systems. In this
flow, we explore two different partitioning approaches and analyze their effect on
the system frequency of final prototyped design. An overview of the complete flow
is shown in Fig. 1. It can be seen from this figure that the flow starts with the logic
synthesis of the benchmark under consideration. After passing through various steps,
the flow terminates at the bitstream generation of the design. Further discussion on
the steps of the flow is given next.

3.1 Benchmark generation

For any exploration flow, benchmarks are a fundamental requirement. For a multi-FPGA prototyping flow, this requirement is even more pertinent, as complex benchmarks mimicking real-life applications are absolutely necessary to test the capability of the tools of a prototyping flow. Researchers in the past [37–39] have used different sets of benchmarks for different types of exploration environments, but these benchmarks are either too small to pose a real challenge to the exploration tools or synthetic in nature and lacking resemblance to real-life applications. In this work, we use benchmarks generated by the DSX [40] academic tool, which can generate both mono-core and multi-core MPSoC architectures. A mono-core MPSoC architecture contains components such as a UART, RAM, multiple FIFOs and co-processors, which are connected to each other through a crossbar architecture. An example of a mono-core MPSoC architecture is shown in Fig. 2. A multi-core MPSoC architecture consists of clusters of mono-core MPSoCs that are connected to each other using a mesh-based NoC interconnect [41]. Compared with multi-core MPSoCs, mono-core MPSoCs have lower complexity and a flat bus-based interconnect. Multi-core MPSoCs, on the other hand, have higher complexity and a hierarchical interconnect. An example of a multi-core MPSoC architecture is shown in Fig. 3.

3.2 Synthesis

It can be seen from Fig. 1 that the benchmarks generated through the DSX tool are first logically synthesized; during synthesis, the design is logically optimized. For logic synthesis, we use the open source tool by VERIFIC [10], which is free for non-commercial academic purposes. When a benchmark is given to this tool, its parser processes the whole design, builds a comprehensive database of all the components of the design and gives complete information about the interconnect between these components. This information is very useful, as it is used later in the flow by the hierarchical partitioner. The tool also transforms the design into a standard logic gate format. We use this tool to keep our flow open source and generic in nature.


Fig. 1 Multi-FPGA prototyping flow


Fig. 2 An example of mono-core MPSoC architecture

Fig. 3 An example of multi-core MPSoC architecture

3.3 Partitioning

After synthesis, the design under consideration is partitioned. Since the designs are large and complex, a single FPGA cannot satisfy their logic and I/O resource requirements, so they have to be divided into multiple partitions. As discussed in Sect. 1, partitioning plays a very important role in the final execution speed of the design. Normally, the number of physical connections between different partitions is quite small, while the number of cut-nets spanning these partitions is quite large. In the subsequent routing step, these cut-nets therefore have to share the physical resources between different FPGAs in a time multiplexed manner. Eventually, more cut-nets lead to larger multiplexers (a higher multiplexing ratio), which increases the delay and reduces the overall speed. Thus, the main goal of any partitioner is to keep the number of cut-nets as small as possible. Another constraint that a partitioner has to deal with is the logic capacity of the target FPGA architecture, which must be satisfied while performing partitioning. Together, these two constraints make partitioning an NP-hard problem [9], so for large and complex designs an optimal solution cannot be found in practical time.

Fig. 4 a Partitioning solution with 2 cut-nets; b partitioning solution with 1 cut-net
Figure 4 summarizes the partitioning problem. Figure 4a shows two partitions with 2 cut-nets between them. Figure 4b shows a partitioning solution in which the number of cut-nets is reduced from 2 to 1. To achieve this, however, a large block of combinational logic has to be moved from partition 2 to partition 1, and the enlarged logic may no longer fit within the logic capacity of partition 1. A partitioner therefore always has to find a trade-off between the logic capacity and the cut-net constraint. To find an efficient partitioning solution, the partitioner should know and exploit the interconnect of the design under consideration. For this purpose, we explore two different partitioning approaches in this work, the details of which are given in Sect. 4.
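The trade-off in Fig. 4 can be expressed as a small Python check that scores candidate partitions by their cut-nets and rejects those that exceed the logic capacity. The netlist, block areas and capacity below are hypothetical values chosen to mirror the figure: moving the large combinational block C into partition 1 reduces the cut-nets from 2 to 1 but overflows that partition.

```python
def evaluate(nets, part, area, capacity):
    """Return (cut_nets, fits) for an assignment part: instance -> partition."""
    cut = sum(1 for net in nets if len({part[v] for v in net}) > 1)
    load = {}
    for v, p in part.items():
        load[p] = load.get(p, 0) + area[v]
    fits = all(used <= capacity for used in load.values())
    return cut, fits


def best_feasible(nets, candidates, area, capacity):
    """Index of the candidate with the fewest cut-nets that fits, or None."""
    feasible = []
    for i, part in enumerate(candidates):
        cut, fits = evaluate(nets, part, area, capacity)
        if fits:
            feasible.append((cut, i))
    return min(feasible)[1] if feasible else None


# Hypothetical netlist: two signals run between B and the large combinational block C.
nets = [["A", "B"], ["B", "C"], ["B", "C"], ["C", "D"]]
area = {"A": 40, "B": 30, "C": 60, "D": 20}
capacity = 100
sol_a = {"A": 1, "B": 1, "C": 2, "D": 2}  # like Fig. 4a
sol_b = {"A": 1, "B": 1, "C": 1, "D": 2}  # like Fig. 4b: C moved to partition 1
print(evaluate(nets, sol_a, area, capacity))  # (2, True)
print(evaluate(nets, sol_b, area, capacity))  # (1, False)
print(best_feasible(nets, [sol_a, sol_b], area, capacity))  # 0
```

Although the second candidate has fewer cut-nets, only the first respects the capacity, so it is the one a capacity-aware partitioner would keep.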


Fig. 5 An overview of inter-FPGA routing flow

3.4 Routing

Once partitioning is complete, the design under consideration is routed on the multi-FPGA board. The partitioning approaches discussed in Sect. 3.3 aim to minimize the number of cut-nets. However, as discussed in Sect. 1, the number of cut-nets always exceeds the I/O resources available on the FPGAs, because newer FPGA generations have higher logic capacity and comparatively fewer I/Os. Therefore, the cut-nets have to be routed in a time division multiplexed manner. A simplified overview of the inter-FPGA routing flow is shown in Fig. 5. The routing flow starts with the routing constraints, the board description (user generated) and the trace assignment file (generated by partitioning). Once these files are given to the routing tool, the routing graph is generated. The I/Os of this routing graph are represented as a set of vertices V and the connections between these vertices as edges E; together, they form a directed graph G(V, E). The routing algorithm later uses this graph to route the cut-nets on the physical resources of the FPGA board. Once the routing graph is generated, the initial mux ratio is computed as the ratio of the maximum number of cut-nets to the number of physical wires between two partitions. Next, the cut-nets are grouped according to the mux ratio value and routing is performed. For inter-FPGA routing, we use the Pathfinder [42] routing algorithm, a congestion-driven, negotiation-based routing algorithm. Pathfinder routes the cut-nets one by one and searches for a conflict-free solution through negotiation: it works iteratively, gradually increasing the cost of congested nodes so that congestion is avoided in later iterations. In this way, the algorithm routes all the cut-nets of the design without conflicts. Next, the mux ratio is optimized using a binary search: each time routing succeeds, the mux ratio is adjusted according to the binary search, which continues until the best mux ratio is found. This process is also depicted in Fig. 5. While searching for the minimum mux ratio, the routing algorithm also tries to keep the number of hops as small as possible, because both the mux ratio and the number of hops affect the final system frequency, which is computed at the end of the routing process.
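The mux-ratio search described above can be sketched in Python as a binary search over a monotone feasibility test. The sketch assumes that if routing succeeds at a given ratio, it also succeeds at any larger ratio; `route_succeeds` stands in for a full Pathfinder run, and the `toy_router` used in the example is a hypothetical stand-in that only checks direct-wire capacity per FPGA pair and ignores hops.

```python
import math


def min_mux_ratio(route_succeeds, lo, hi):
    """Smallest ratio in [lo, hi] (lo >= 1) for which route_succeeds(ratio) is True.

    Assumes monotonicity: routing that succeeds at ratio r also succeeds at any
    larger ratio. Returns None if routing fails even at `hi`.
    """
    if not route_succeeds(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if route_succeeds(mid):
            hi = mid      # feasible: try to multiplex less
        else:
            lo = mid + 1  # infeasible: more signals must share each wire
    return lo


# Hypothetical stand-in for a Pathfinder run: (cut-nets, wires) per FPGA pair,
# direct wires only, no multi-hop routes.
channels = {("F0", "F1"): (900, 100), ("F1", "F2"): (150, 100)}


def toy_router(ratio):
    return all(math.ceil(nets / ratio) <= wires for nets, wires in channels.values())


print(min_mux_ratio(toy_router, 1, 64))  # 9
```

In the real flow each probe is a complete routing run, which is why a binary search, needing only a handful of probes rather than one per candidate ratio, is attractive.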


3.5 Intra-FPGA synthesis, placement and routing

Once routing is complete, the netlists are generated, as shown in Fig. 1. These netlists contain all the information related to the partitioned design and its routing. The netlists are then passed to the vendor-specific tool to perform intra-FPGA synthesis, placement and routing of all the partitions. After successful completion of this step, the bitstreams of the partitions are generated, which can finally be loaded into the respective FPGAs to complete the prototyping flow. Loading the bitstreams enables in-circuit verification and debugging of the partitioned design. Moreover, it also gives real-world, cycle-accurate and bit-accurate execution information for the partitioned design.

This section has given a comprehensive overview of all the steps of the prototyping flow. The next section provides a more detailed discussion of the two partitioning approaches proposed and explored in this work.

4 Multi-FPGA partitioning approaches

As discussed in Sects. 1 and 3.3, partitioning plays a fundamental role in the quality of the final prototyped design. It is at this step that the number of cut-nets of the partitioned design is determined. A cut-net connects either a single source to a single destination (a bi-terminal cut-net) or a single source to multiple destinations (a multi-terminal cut-net). The cut-nets later determine the mux ratio, which in turn decides the execution speed of the design: the more cut-nets a partitioning produces, the more signals must share the physical inter-FPGA connections, and the lower the resulting execution frequency of the final design. The primary objective of any partitioner is therefore to minimize the number of cut-nets while satisfying the logic resource constraints of the target architecture. To best satisfy these constraints, this work explores two different partitioning approaches: a hierarchical partitioning approach and a multilevel partitioning approach. Both are discussed in detail next.
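The bi-terminal/multi-terminal distinction can be made concrete with a short sketch that classifies the nets of a partitioned design. The representation (each net as a source instance plus a list of destination instances, and a dictionary mapping instances to partitions) is an assumption made for illustration, as is the choice to classify a cut-net by the number of distinct partitions, other than the source's own, in which its destinations lie.

```python
from collections import Counter


def classify_cut_nets(nets, part):
    """Count bi-terminal and multi-terminal cut-nets.

    nets: iterable of (source, [destinations]) instance names.
    part: dict mapping each instance name to its partition (FPGA) index.
    """
    counts = Counter()
    for src, dests in nets:
        # Partitions that need this signal, other than the source's own.
        remote = {part[d] for d in dests} - {part[src]}
        if len(remote) == 1:
            counts["bi-terminal"] += 1      # point-to-point inter-FPGA net
        elif len(remote) > 1:
            counts["multi-terminal"] += 1   # point-to-multipoint inter-FPGA net
        # len(remote) == 0: the net stays inside one FPGA and is not cut
    return counts
```

With `part = {"a": 0, "b": 0, "c": 1, "d": 2}`, the net `("a", ["b"])` is not cut, `("a", ["c"])` is bi-terminal and `("a", ["c", "d"])` is multi-terminal.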

4.1 Hierarchical partitioning approach

As discussed in Sect. 3.2, we use VERIFIC to perform logic synthesis of the design under consideration. While performing logic synthesis, VERIFIC parses the whole design and provides complete information about its interconnect. In the hierarchical partitioning approach, we extract information about the hierarchy of the design from the VERIFIC parser. Next, based on the required number of partitions and the specified capacity of each partition, a hierarchical partitioning algorithm is applied to the design. The flow of this algorithm is given in Fig. 6. Initially, all the instances of the synthesized design are marked as unassigned. Then, on the basis of connectivity, these instances are iteratively assigned to different partitions. The algorithm adopts a top-down approach. In each iteration, N unassigned instances are chosen and, based on the hierarchical information, assigned to the partition to which they are most connected. At this point, the algorithm checks that the logic capacity constraint of the target architecture is not violated. In case of a violation, it moves further down the hierarchy, breaking the instances into smaller sub-instances, and again tries to assign them based on connectivity. This process continues until all the instances are assigned to partitions. By combining the top-down approach with the hierarchical interconnect information of the design, the algorithm minimizes the cut-net count and produces its result in a very short time. This kind of approach is particularly useful for designs that are inherently hierarchical.

Fig. 6 Hierarchical partitioning flow

The pseudo code of this algorithm is given in Algorithm 1, and its steps are summarized as follows:

1. Take the parsed instance list, the number of partitions M, and the partition capacity as input.
2. Take N unassigned instances and assign them to the M partitions based on their connectivity, ensuring that the partition capacity is not violated and that the cut-net count is minimized.
3. Mark the N instances as assigned and return to step 2.
4. Terminate when all the instances are assigned.

Get_hierarchy;
Get_partitions;
Get_capacity;
while unassigned_instances do
    instances = N;
    find(max_connection);
    if capacity > N then
        assign_instances(N, M);
        assigned = N;
    else if instance_breakable then
        level = level - 1;
    else
        Partitioning_impossible;
    end
end
Algorithm 1: Pseudo-code for the Hierarchical Algorithm
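A runnable sketch of this greedy top-down assignment is given below. It illustrates the idea under stated assumptions rather than reproducing the authors' tool: the hierarchy is modelled by a hypothetical `Instance` class whose leaves carry a logic size, connectivity is a dictionary of weighted leaf-to-leaf connections, instances are taken one at a time (N = 1) in decreasing size order, and an instance that fits in no partition is replaced by its children, mirroring the `instance_breakable` branch of Algorithm 1.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A node of the design hierarchy; leaves carry the logic size."""
    name: str
    size: int = 0
    children: list["Instance"] = field(default_factory=list)

    def total_size(self) -> int:
        if not self.children:
            return self.size
        return sum(c.total_size() for c in self.children)

    def leaves(self) -> list:
        if not self.children:
            return [self.name]
        return [leaf for c in self.children for leaf in c.leaves()]


def hierarchical_partition(top, edges, num_parts, capacity):
    """Greedy top-down assignment of hierarchy instances to partitions.

    edges maps a pair of leaf names to the number of nets connecting them.
    Returns a dict mapping each leaf name to a partition index.
    """
    # Symmetric adjacency between leaves for connectivity lookups.
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, {}).setdefault(b, 0)
        adj[a][b] += w
        adj.setdefault(b, {}).setdefault(a, 0)
        adj[b][a] += w

    used = [0] * num_parts   # logic already placed in each partition
    where = {}               # leaf name -> partition index
    # Start at the top of the hierarchy, largest instances first.
    unassigned = sorted(top.children or [top], key=Instance.total_size, reverse=True)

    while unassigned:
        inst = unassigned.pop(0)
        size = inst.total_size()
        leaves = inst.leaves()
        # Connections from this instance to logic already in each partition.
        conn = [0] * num_parts
        for leaf in leaves:
            for nbr, w in adj.get(leaf, {}).items():
                if nbr in where:
                    conn[where[nbr]] += w
        fitting = [p for p in range(num_parts) if used[p] + size <= capacity]
        if fitting:
            # Most connected partition with room; ties go to the emptier one.
            best = max(fitting, key=lambda p: (conn[p], -used[p]))
            used[best] += size
            for leaf in leaves:
                where[leaf] = best
        elif inst.children:
            # Too large for any partition: move one level down the hierarchy.
            unassigned.extend(inst.children)
            unassigned.sort(key=Instance.total_size, reverse=True)
        else:
            raise ValueError(f"partitioning impossible: {inst.name} does not fit")
    return where
```

With two partitions of capacity 6 and a single top-level instance containing three leaves of size 3, the top-level instance fits nowhere, so the sketch descends one level and places the two most strongly connected leaves together.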

These steps are performed iteratively: connectivity among the instances is given top priority, and the partition size is always respected. As shown in Sect. 5, this approach is best suited to designs with an inherently hierarchical interconnect architecture. Flat designs require a more sophisticated approach, which is described next.

4.2 Multilevel partitioning approach

Unlike the hierarchical approach, which exploits the hierarchy of the design, the multilevel approach applies clustering and refinement over multiple levels. In this approach, the instances of the benchmark are first represented as a hypergraph. Initially, the graph is complex because it contains many instances, which makes it difficult to partition. The graph is therefore reduced by merging smaller instances together. This process, called clustering, is repeated over multiple levels until the number of clusters is reduced to a few dozen, at which point the graph is small enough that refinement becomes easy. An example of this multilevel clustering process is given in Fig. 7, where a large hypergraph is reduced to a smaller one after multiple iterations of clustering.
Fig. 7 An overview of multi-level clustering

Once the clustering process is complete, the graph is refined and expanded level by level in reverse order. During refinement, instances are moved between different clusters with the objective of minimizing the overall cut-net count of the design. Each time a block (i.e. an instance) is moved from one cluster to another, the change in the total cut-net count is computed. If the change is negative (i.e. the total cut-net count decreases), the move is accepted; otherwise, it is rejected. This greedy approach may get trapped in a local minimum. To avoid this, moves with a positive change are also accepted, depending on the level of refinement: such moves are accepted at higher (coarser) levels but not when refinement is performed at lower levels. Refinement continues until the bottom level of the graph is reached, at which point the partitioning process is complete and the final partitioned result is obtained. An overview of the refinement process is shown in Fig. 8, where only 2-way refinement is shown; however, the proposed multilevel partitioning tool is generic and can perform N-way partitioning. The tool uses the same approach as presented in [43], where clustering is performed first and is followed by initial partitioning and refinement phases. However, the work presented in [43] partitions homogeneous instances only, whereas the proposed tool can handle heterogeneous instances and also takes into account the maximum partition size.
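The accept/reject rule described above can be sketched as a single greedy refinement pass. This is a simplified illustration, not the proposed tool: it evaluates each vertex's best single move (a plain greedy pass rather than a full FM pass as invoked in Algorithm 2), and the thresholds `uphill_level` and `max_uphill`, which decide at which levels and by how much a cut-increasing move is tolerated, are assumed values because the text does not specify them. Vertex weights and the capacity check stand in for heterogeneous instances and the maximum partition size.

```python
def net_is_cut(net, part):
    """A net is cut when its pins lie in more than one partition."""
    return len({part[v] for v in net}) > 1


def cut_size(nets, part):
    return sum(net_is_cut(net, part) for net in nets)


def refine_pass(nets, weights, part, num_parts, capacity, level,
                uphill_level=2, max_uphill=1):
    """One greedy refinement pass over all vertices (mutates and returns part).

    A move is kept if it lowers the cut. At coarse levels (level >= uphill_level)
    a move that raises the cut by at most max_uphill is also kept, to escape
    local minima; at finer levels only improving moves are kept.
    """
    incident = {v: [] for v in part}
    for i, net in enumerate(nets):
        for v in net:
            incident[v].append(i)
    load = [0] * num_parts
    for v, p in part.items():
        load[p] += weights[v]

    for v in list(part):
        src = part[v]
        before = sum(net_is_cut(nets[i], part) for i in incident[v])
        best_p, best_delta = None, None
        for p in range(num_parts):
            if p == src or load[p] + weights[v] > capacity:
                continue                  # respect the maximum partition size
            part[v] = p                   # tentatively move v
            delta = sum(net_is_cut(nets[i], part) for i in incident[v]) - before
            part[v] = src                 # undo the tentative move
            if best_delta is None or delta < best_delta:
                best_p, best_delta = p, delta
        if best_p is None:
            continue
        if best_delta < 0 or (level >= uphill_level and best_delta <= max_uphill):
            part[v] = best_p
            load[src] -= weights[v]
            load[best_p] += weights[v]
    return part
```

At the finest level (`level=0`), only cut-reducing moves survive, so a pass can only keep or lower `cut_size`; the uphill allowance applies solely at coarser levels.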
Multilevel partitioning is a highly sophisticated technique, and for flat designs it offers better results than the hierarchical approach. However, it requires significantly more time to produce a partitioning result. Furthermore, the hierarchical approach gives equal or better results for designs that are purely hierarchical. The pseudo code of the multilevel algorithm used in this work is shown in Algorithm 2.

Fig. 8 An overview of multilevel refinement

level = 0;
hierarchy[level] = hypergraph;
min_vertices = 200;
while hierarchy[level].vertex_count() > min_vertices do
    next_level = cluster(hierarchy[level]);
    level = level + 1;
    hierarchy[level] = next_level;
end
partitioning[level] = a random initial solution for top-level hypergraph;
FM(hierarchy[level], partitioning[level]);
while level > 0 do
    level = level - 1;
    partitioning[level] = project(partitioning[level+1], hierarchy[level]);
    FM(hierarchy[level], partitioning[level]);
end
Algorithm 2: Pseudo-code for the Multilevel Partitioning Algorithm
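Algorithm 2 leaves `cluster` and `project` abstract. The sketch below shows one common way to realize them; it is not the authors' implementation. Clustering here pairs each vertex with its most strongly connected unmatched neighbour (nets weighted by 1/(|net| − 1), so small nets bind more tightly), subject to an assumed maximum cluster weight standing in for heterogeneous instance sizes. `coarsen` mirrors the first loop of Algorithm 2 and stops early if nothing can be merged. Nets absorbed entirely inside a cluster are dropped and parallel nets are kept, so the cut of a coarse partition equals the cut of its projection. All names are illustrative.

```python
from collections import defaultdict


def cluster(nets, weights, max_weight):
    """One coarsening step: merge each vertex with its best unmatched neighbour.

    nets: list of nets, each a list of vertex ids.
    weights: dict vertex -> logic size (heterogeneous instances).
    Returns (coarse_nets, coarse_weights, fine_to_coarse).
    """
    # Connectivity score: vertices sharing small nets score higher.
    score = defaultdict(lambda: defaultdict(float))
    for net in nets:
        if len(net) < 2:
            continue
        w = 1.0 / (len(net) - 1)
        for u in net:
            for v in net:
                if u != v:
                    score[u][v] += w

    fine_to_coarse, coarse_weights = {}, {}
    for u in weights:
        if u in fine_to_coarse:
            continue
        candidates = [v for v in score[u]
                      if v not in fine_to_coarse
                      and weights[u] + weights[v] <= max_weight]
        group = [u]
        if candidates:
            group.append(max(candidates, key=lambda v: score[u][v]))
        cid = len(coarse_weights)             # next coarse vertex id
        for x in group:
            fine_to_coarse[x] = cid
        coarse_weights[cid] = sum(weights[x] for x in group)

    # Map nets onto clusters; drop nets absorbed inside a single cluster.
    coarse_nets = []
    for net in nets:
        c = sorted({fine_to_coarse[v] for v in net})
        if len(c) > 1:
            coarse_nets.append(c)
    return coarse_nets, coarse_weights, fine_to_coarse


def coarsen(nets, weights, min_vertices, max_weight):
    """First loop of Algorithm 2: cluster until the hypergraph is small enough."""
    levels = [(nets, weights, None)]
    while len(weights) > min_vertices:
        c_nets, c_weights, f2c = cluster(nets, weights, max_weight)
        if len(c_weights) == len(weights):    # nothing could be merged: stop
            break
        levels.append((c_nets, c_weights, f2c))
        nets, weights = c_nets, c_weights
    return levels


def project(coarse_part, fine_to_coarse):
    """Each fine vertex inherits the partition of the cluster that contains it."""
    return {v: coarse_part[c] for v, c in fine_to_coarse.items()}
```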

5 Experimentation and analysis

In this section, we present the experimental results obtained with the exploration flow described in Sect. 3. First, an overview of the benchmarks used in this work is presented; then the results obtained for those benchmarks are discussed.

Table 1 Benchmark description

Sr. no  Benchmark name  Benchmark type  No. of components
1       CPU20           Mono-cluster    50,460
2       CPU30           Mono-cluster    65,620
3       CPU50           Mono-cluster    85,260
4       CPU125          Mono-cluster    120,526
5       AES             Multi-cluster   90,484
6       CPU2X2X1        Multi-cluster   93,654
7       CPU2X2X2        Multi-cluster   105,426
8       CPU2X2X3        Multi-cluster   119,256
9       CPU2X2X4        Multi-cluster   133,459
10      CPU2X2X5        Multi-cluster   368,125
11      CPU2X2X6        Multi-cluster   380,783
12      CPU2X2X7        Multi-cluster   395,487
13      CPU2X2X8        Multi-cluster   1,296,458
14      CPU4X4X2        Multi-cluster   1,319,258

5.1 Benchmarks

For experimentation, a set of fourteen complex benchmarks is used. These benchmarks are generated with the DSX tool described in Sect. 1. Using DSX, we generate both mono- and multi-cluster benchmarks of varying complexity. The mono-cluster benchmarks mainly exhibit a non-hierarchical interconnect pattern. Multi-cluster benchmarks, on the other hand, are hierarchical in nature: different clusters are connected to each other in a hierarchy. The connection patterns of the two types of benchmarks used in this work are further verified with the VERIFIC parsing tool. The core objective of including two types of benchmarks with different interconnect patterns is to test the capabilities of the two partitioning approaches used in this work. The details of these benchmarks are given in Table 1, which shows that we use four mono-cluster and ten multi-cluster benchmarks. The internal structure of each mono-cluster benchmark has already been discussed in Sect. 3. The number of coprocessors in each mono-cluster benchmark is indicated at the end of the benchmark's name, as can be seen in Table 1. The multi-cluster benchmarks are named CPUXxYxZ, where XxY indicates the dimensions of the cluster array (X × Y clusters) and Z indicates the number of processors in each cluster. For example, the name CPU2X2X6 indicates that the benchmark has four clusters, each containing six processors. As Table 1 shows, the benchmarks used in this work vary widely in their number of components.
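The naming convention can be captured in a few lines. The helper below is a hypothetical illustration of the scheme just described; it assumes names use the upper-case form of Table 1 and rejects names, such as AES, that follow neither pattern.

```python
import re


def parse_benchmark(name):
    """Decode a benchmark name following the convention of Sect. 5.1."""
    m = re.fullmatch(r"CPU(\d+)X(\d+)X(\d+)", name)
    if m:
        x, y, z = map(int, m.groups())
        # X x Y cluster array, Z processors in each cluster.
        return {"type": "multi-cluster", "clusters": x * y,
                "processors_per_cluster": z}
    m = re.fullmatch(r"CPU(\d+)", name)
    if m:
        # Trailing number gives the coprocessor count.
        return {"type": "mono-cluster", "coprocessors": int(m.group(1))}
    raise ValueError(f"unrecognised benchmark name: {name}")
```

For instance, `parse_benchmark("CPU2X2X6")` reports four clusters with six processors each, matching the example in the text.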

Fig. 9 Bi-terminal Cut-Net comparison between hierarchical and multilevel partitioning approach

5.2 Experimental results

Extensive experimentation is performed using the benchmarks described in Table 1. These benchmarks are passed through the flow of Sect. 3, and the hierarchical and multilevel partitioning approaches of Sect. 4 are compared. The objective of these experiments is to demonstrate that the proposed flow works and to compare the results obtained with the two partitioning approaches.
When the benchmarks are passed through the flow using the two partitioning approaches, the first important metric we obtain is the cut-net count. Since we are considering multi-FPGA partitioning, the cut-nets obtained after partitioning can be of two types: bi-terminal and multi-terminal. Bi-terminal cut-nets have a single source and a single destination and represent point-to-point interconnect. Multi-terminal cut-nets, in contrast, have a single source and multiple destinations and represent point-to-multipoint interconnect. The two partitioning approaches give different bi-terminal and multi-terminal cut-net counts for each benchmark. The bi-terminal cut-net comparison between the two approaches is given in Fig. 9. It shows that the multilevel partitioning approach gives better bi-terminal cut-net results for all the mono-cluster benchmarks. This is because the mono-cluster benchmarks used in this work are flat and do not possess hierarchy. This, coupled with the better clustering and refinement technique of the multilevel approach, leads on average to 8% fewer bi-terminal cut-nets for mono-cluster benchmarks than the hierarchical approach. However, the bi-terminal cut-net results for multi-cluster benchmarks reveal that the multilevel approach performs poorly compared to the purely hierarchical approach. This trend arises because the hierarchical approach better exploits the inherent hierarchy of the multi-cluster benchmarks. As a result, for multi-cluster benchmarks, the hierarchical approach gives on average 12% fewer bi-terminal cut-nets.

Fig. 10 Multi-terminal Cut-Net comparison between hierarchical and multilevel partitioning approach

Fig. 11 Cut-Net comparison between hierarchical and multilevel partitioning approach

A similar trend is observed for the two partitioning approaches on the multi-terminal cut-net metric, as shown in Fig. 10. On average, for mono-cluster benchmarks, the multilevel approach gives 7.5% fewer multi-terminal cut-nets, whereas for multi-cluster benchmarks, the hierarchical approach gives 12.5% fewer. The bi-terminal and multi-terminal cut-nets are combined to give the total cut-net count for each benchmark under consideration. The results for the total cut-net count are shown in Fig. 11, and the trend observed in Figs. 9 and 10 applies here as well.
The effect of the cut-nets carries forward to the routing of the benchmarks, where it appears in the mux ratio obtained after routing each benchmark. The mux ratio results are given in Fig. 12. The multilevel partitioning approach gives better mux ratio results for mono-cluster benchmarks, whereas the hierarchical approach gives better results for multi-cluster benchmarks. For mono-cluster benchmarks, the multilevel approach gives on average a 12.5% smaller mux ratio, while for multi-cluster benchmarks, the hierarchical approach gives on average a 13% smaller mux ratio.

Fig. 12 Multiplexing ratio comparison between hierarchical and multilevel partitioning approach

Fig. 13 System frequency comparison between hierarchical and multilevel partitioning approach
Once the routing of a benchmark is completed, its system frequency is estimated according to [44] using Eq. (1). It follows from this equation that a partitioning approach yielding smaller mux ratio values results in a higher system frequency.

$$\text{sys\_freq} = \frac{125}{\text{mux\_ratio}}\ \text{MHz} \qquad (1)$$
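Eq. (1) is simple enough to state directly in code. The sketch below takes the 125 MHz numerator of Eq. (1) as a default parameter (the function name is illustrative) and makes explicit that the estimated frequency is inversely proportional to the mux ratio, so halving the mux ratio doubles the estimate.

```python
def system_frequency_mhz(mux_ratio, base_mhz=125.0):
    """Estimated system frequency from Eq. (1): base_mhz / mux_ratio."""
    if mux_ratio <= 0:
        raise ValueError("mux ratio must be positive")
    return base_mhz / mux_ratio
```

For instance, a mux ratio of 8 gives 125 / 8 = 15.625 MHz, while a mux ratio of 16 gives 7.8125 MHz.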

The system frequency results obtained using the two partitioning approaches are shown
in Fig. 13. It can be seen from this figure that, for mono-cluster benchmarks, the
multilevel partitioning approach gives better system frequency results whereas for
multi-cluster benchmarks, the hierarchical partitioning approach gives better results.

Fig. 14 Execution time comparison between hierarchical and multilevel partitioning approach

Finally, to further consolidate the comparison between the two partitioning approaches, we also measure the execution time each approach takes to partition the individual benchmarks. The execution time results are given in Fig. 14. For all the benchmarks under consideration, the multilevel partitioning approach is slower than the hierarchical approach. For mono-cluster benchmarks, the gap is not large: the multilevel approach is 24% slower than the hierarchical approach. For multi-cluster benchmarks, however, the gap is significant: the multilevel approach is 68% slower than the hierarchical approach.
The results presented in Figs. 11, 12, 13 and 14 show that the multilevel approach gives better results for mono-cluster benchmarks, which are flat and lack hierarchy. The hierarchical partitioning approach, on the other hand, gives better results for multi-cluster benchmarks, which are inherently hierarchical. It is also important to note that the hierarchical approach outperforms the multilevel approach in terms of execution time: whether a benchmark is mono-cluster or multi-cluster, the hierarchical approach is always faster. Its advantage is significant for mono-cluster benchmarks and becomes even larger for multi-cluster benchmarks.

6 Conclusion

For multi-FPGA systems, partitioning plays a very important role in determining the quality of the final prototyped design. This work explores two partitioning approaches for multi-FPGA prototyping systems. The first exploits the inherent hierarchy of the benchmarks, while the second uses multilevel clustering and refinement to partition the design under consideration. For exploration, we use a set of fourteen large, complex and realistic benchmarks. Experimental results obtained with the exploration environment of this work demonstrate that the multilevel partitioning approach gives better overall results for mono-cluster benchmarks, while the hierarchical approach gives better results for multi-cluster benchmarks. On average, the multilevel approach gives 12.5% better frequency results for mono-cluster benchmarks, whereas the hierarchical approach gives 13% better frequency results for multi-cluster benchmarks. An execution time comparison further reveals that the hierarchical approach is faster irrespective of the nature of the benchmarks under consideration: on average, it gives 60% better execution time results than the multilevel approach.
In this work, our emphasis has mainly been on exploring partitioning approaches. In the future, we will make the proposed multi-FPGA prototyping flow more comprehensive by introducing novel in-circuit verification techniques, which can be used for the functional verification of a design once its prototyping is finished.

References
1. Santarini M (2005) ASIC prototyping: make versus buy. EDN 11
2. Sigenics: Custom ASIC calculator (2017). http://www.sigenics.com/page/custom-asic-cost-calculator
3. AMD (2007). http://techreport.com/news/13721/chip-problem-limits-supply-of-quad-core-opterons
4. Pentium (1994). https://en.wikipedia.org/wiki/pentium_fdiv_bug
5. Mentor Graphics (2017). https://www.mentor.com/products/fv/modelsim/
6. Kuon I, Rose J (2010) Quantifying and exploring the gap between FPGAs and ASICs. Springer, Berlin
7. Krupnova H (2004) Mapping multi-million gate SoCs on FPGAs: industrial methodology and experience. In: Proceedings of design, automation and test in Europe conference and exhibition, vol 2, pp 1236–1241
8. Asaad S, Bellofatto R, Brezzo B, Haymes C, Kapur M, Parker B, Roewer T, Saha P, Takken T, Tierno J (2012) A cycle-accurate, cycle-reproducible multi-FPGA system for accelerating multi-core processor simulation. In: Proceedings of the ACM/SIGDA international symposium on field programmable gate arrays, ser. FPGA ’12. ACM, New York, NY, USA, pp 153–162. https://doi.org/10.1145/2145694.2145720
9. Garey MR, Johnson DS (1990) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman & Co., New York
10. VERIFIC (2019). https://www.verific.com/
11. Sigl G, Doll K, Johannes F (1991) Analytical placement: a linear or a quadratic objective function? In: Design automation conference, pp 427–432
12. Alpert CJ, Chan T, Huang D, Kahng A, Markov I, Mulet P, Yan K (1997) Faster minimization of linear wirelength for global placement. In: ACM symposium on physical design, pp 4–11
13. Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220:671–680
14. Sechen C, Sangiovanni-Vincentelli A (1985) The TimberWolf placement and routing package. JSSC, pp 510–522
15. Dunlop A, Kernighan B (1985) A procedure for placement of standard-cell VLSI circuits. In: IEEE transactions on CAD, pp 92–98
16. Huang D, Kahng A (1997) Partitioning-based standard-cell global placement with an exact objective. In: ACM symposium on physical design, pp 18–25
17. Kernighan B, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49:291–307
18. Fiduccia CM, Mattheyses RM (1982) A linear-time heuristic for improving network partitions. In: Design automation conference, pp 175–181
19. Bui T, Chaudhuri S, Leighton T, Sipser M (1987) Graph bisection algorithms with good average behavior. Combinatorica 7(2):171–191
20. Alpert CJ, Hagen LW, Kahng AB (1997) Multilevel circuit partitioning. In: Design automation conference, pp 530–533
21. Karypis G, Aggarwal R, Kumar V, Shekhar S (1997) Multilevel hypergraph partitioning: application in VLSI design. In: Design automation conference, pp 526–529
22. Karypis G, Kumar V (1999) Multilevel k-way hypergraph partitioning. In: Design automation conference
23. HAPS ProtoCompiler by Synopsys (2017). http://www.synopsys.com/Prototyping/FPGABasedPrototyping/Pages/protocompiler.aspx
24. HAPS multi-FPGA board by Synopsys (2017). http://www.synopsys.com/Prototyping/FPGABasedPrototyping/Pages/HAPS.aspx
25. Auspy (2017). https://www.mentor.com/products/fv/aupsy
26. Certify partitioning tool by Synopsys (2017). http://www.synopsys.com/Prototyping/FPGABasedPrototyping/Pages/Certify.aspx
27. Series CP (2017). http://www.cadence.com/products/sd/palladium_xp_series/pages/default.aspx
28. Veloce MG (2017). https://www.mentor.com/products/fv/emulation-systems/
29. ZeBu Server ASIC emulator by Synopsys (2017). http://www.synopsys.com/tools/verification/hardware-verification/emulation/Pages/default.aspx
30. Marrakchi Z, Mrabet H, Mehrez H (2005) Hierarchical FPGA clustering to improve routability. In: Conference on Ph.D research in microelectronics and electronics, PRIME
31. Marrakchi Z, Mrabet H, Mehrez H (2006) A new multilevel hierarchical MFPGA and its suitable configuration tools. In: Proceedings of ISVLSI, Karlsruhe, Germany
32. Turki M, Mehrez H, Marrakchi Z, Abid M (2013) Partitioning constraints and signal routing approach for multi-FPGA prototyping platform. In: 2013 International symposium on system on chip (SoC), pp 1–4
33. Tang Q, Mehrez H, Tuna M (2013) Routing algorithm for multi-FPGA based systems using multi-point physical tracks. In: 2013 International symposium on rapid system prototyping (RSP), pp 2–8
34. Farooq U, Baig I, Alzahrani BA (2018) An efficient inter-FPGA routing exploration environment for multi-FPGA systems. IEEE Access 6:56301–56310
35. Inagi M, Takashima Y, Nakamura Y (2009) Globally optimal time-multiplexing in inter-FPGA connections for accelerating multi-FPGA systems. In: International conference on field programmable logic and applications, pp 212–217
36. Hauck S, DeHon A (2007) Reconfigurable computing: the theory and practice of FPGA-based computation. Morgan Kaufmann Publishers Inc., San Francisco
37. Stroobandt D, Verplaetse P, Van Campenhout J (2000) Generating synthetic benchmark circuits for evaluating CAD tools. IEEE Trans Comput Aided Des Integr Circuits Syst 19(9):1011–1022
38. Farooq U, Parvez H, Mehrez H, Marrakchi Z (2012) A new heterogeneous tree-based application specific FPGA and its comparison with mesh-based application specific FPGA. Microprocess Microsyst 36(8):588–605. https://doi.org/10.1016/j.micpro.2012.06.012
39. Yang S (1991) Logic synthesis and optimization benchmarks user guide, version 3.0
40. Pouillon N, Greiner A (2010) SoCLib project. https://www.asim.lip6.fr/trac/dsx/
41. Miro Panades I, Greiner A, Sheibanyrad A (2006) A low cost network-on-chip with guaranteed service well suited to the GALS approach. In: 1st International conference on nano-networks and workshops, NanoNet ’06, pp 1–5
42. McMurchie L, Ebeling C (1995) PathFinder: a negotiation-based performance-driven router for FPGAs. In: ACM international symposium on field-programmable gate arrays, ACM Press, New York, pp 111–117
43. Karypis G, Kumar V (1999) Multilevel k-way hypergraph partitioning. In: Proceedings of the 36th annual ACM/IEEE design automation conference, ser. DAC ’99, ACM, New York, NY, pp 343–348. https://doi.org/10.1145/309847.309954
44. Synopsys (2017). http://www.synopsys.com/prototyping/fpgabasedprototyping/. http://www.synopsys.com/Prototyping/FPGABasedPrototyping/FPMM/Pages/default.aspx

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
