Multiple FPGAs Based Prototyping and Debugging With Complete Design Flow
All content following this page was uploaded by Umer Farooq on 11 November 2020.
978-1-5090-4900-4/16/$31.00 ©2016 IEEE

Abstract—Multiple FPGA-based prototyping plays an important role in the design and verification process due to its low cost and high execution speed. However, there is a need to optimize the configuration flow of multiple FPGA-based prototyping. In this paper, we address the partitioning of large designs and propose a debugging methodology for these partitioned designs using the SignalTap II embedded logic analyzer from the Quartus tool of Altera. SignalTap II is usually used to debug a design implemented on a single FPGA; it debugs the FPGA device by probing the states of internal signals without external debug equipment. We instead use the SignalTap II logic analyzer for large designs on multiple FPGAs and facilitate the debugging methodology for thousands of signals under consideration. We propose debugging large designs after partitioning by developing techniques to trace the required signals under test through multiple FPGAs without using FPGA internal memory. We have also generated various large benchmarks and tested them for multiple FPGA-based prototyping.

I. INTRODUCTION

The advancement in processing technology has tremendously increased the computation capability of digital circuits, which in turn has made their design process costly in terms of both time and money. The design of a recent digital system takes about two to three years to roll out the first prototype and requires millions of dollars [1]. Furthermore, disparaging processing technology metrics are aggravating reliability issues in the final rolled-out design. The design process of a digital circuit normally comprises multiple steps, among which design verification is critical, as it takes around 70% of total design time and 80% of total design cost. To minimize the effect of verification time and cost, different pre-silicon design verification techniques are exercised. These techniques primarily include simulation, emulation, and FPGA-based prototyping. Among them, FPGA-based prototyping has become popular, as it is an economical method to validate the functionality of an ASIC design while providing rapid time to market [2]. In the near future, FPGA-based prototyping will become important for many applications, including IoT products, wireless sensor networks, and cloud computing. Despite all its benefits, designers are still reluctant to adopt FPGA-based physical prototyping, as they believe that it does not support very large scale designs [3]. In fact, the capacity of FPGA-based designs is limited to a few million ASIC gates, and it also takes months to have a working FPGA-based prototype. As there is a large gap between ASICs and FPGAs and design complexity is also increasing, large complex designs can be prototyped using multiple FPGAs. To do so, the large design is partitioned across multiple FPGAs to meet the desired logic capacity [4]. Therefore, multiple FPGA-based prototyping is increasingly in demand for industrial needs. Synopsys HAPS is an example of an FPGA-based prototyping system that has been designed to fulfill industrial needs [5].

Multiple FPGA-based prototyping follows a configuration flow whose steps are partitioning, synthesis, inter-FPGA routing, intra-FPGA P&R, and debugging. We generated very large scale benchmarks using the Design Space eXploration (DSX) tool developed at LIP6 laboratory [6]. We used the Synopsys Certify tool for partitioning large designs. In multiple FPGA-based prototyping, partitioning is the process of dividing the large complex design into multiple parts depending on the number of available FPGAs. In this way, each part can fit into the available logic capacity of the target device, i.e. the FPGA. Certify provides two types of partitioning: manual, where the user assigns logic elements to physical devices, and automatic, where the assignment is done by the tool. An optimized partitioning process must reduce the number of signals between FPGAs to increase the frequency of the whole design. Partitioning is followed by synthesis and inter-FPGA routing, and finally placement and routing are done using Quartus tools, as we target Altera FPGAs. Once placement and routing are done, the next step is in-circuit verification [7]. We will use the SignalTap II Embedded Logic Analyzer from Altera for debugging multiple FPGAs.

Moreover, for multiple FPGA-based prototyping, FPGAs are either mounted on a single board with physical connections among them [8], or multiple FPGA boards are connected together and communicate through cabling or through HSTC connectors. The communication among multiple FPGAs can be either point to point or point to multipoint, depending on the available physical tracks between FPGAs. The number of FPGAs depends on the complexity of the design and may vary from a few FPGAs to many [9], [10]. We used four DE3 boards by Altera that have built-in HSTC connectors for inter-FPGA communication. We will use Altera tools for synthesis, P&R, and debugging.

Furthermore, there are also limitations on debugging visibility for a multiple FPGA-based prototype. When a specified condition or set of conditions, called a trigger, is reached, SignalTap II stops and displays the data. Normally,
by defining the trigger conditions in the logic analyzer, design accuracy is achieved and the ability to isolate errors in the design is improved. SignalTap II does not require changes to the internal design files or external probes to capture the state of internal nodes, and no extra I/O pins for the design under test are required. The signal data are stored in the memory of the FPGA until they are analyzed. SignalTap II from Altera is an integrated logic analyzer that is normally used for debugging a single FPGA. However, using SignalTap II to debug a multiple FPGA-based prototype is a challenging scientific problem: the large design is partitioned, and the required signals must be recorded and traced after partitioning for debugging. In this paper, we address how to perform debugging for large designs after partitioning for a multi-FPGA prototyping system. First, we propose to use an external memory to save the information of thousands of signals and thus avoid the limited FPGA memory. We can save multiple states of thousands of signals of multiple partitions running on the multiple FPGAs of an existing prototype. We have taken this memory limitation into account and propose to use external memory for multiple FPGA-based prototyping of very large scale designs [11]. Second, as the large design is partitioned and each partition contains thousands of signals, there is a need to keep a record of all signals. For example, if there are 60 FPGAs and the large design is partitioned into multiple parts, we do not know in which partition the specific signals to debug are placed. As debugging is performed at run time just after the bitstreams are loaded to multiple FPGAs, we need to trace the required signals under debugging to observe their states according to the trigger conditions. We have developed techniques to trace the signals under debugging through multiple partitions running on multiple FPGAs. Third, we have tested the developed prototype for multiple FPGAs after generating various large benchmarks using the DSX tool and facilitated the debugging of large and complex designs after partitioning.

The rest of the paper is organized as follows. Section II presents the generation of large benchmarks for multi-FPGA prototyping. Section III gives a brief overview of the design flow, including partitioning, synthesis, inter-FPGA routing, intra-FPGA P&R, and debugging. The debugging methodology is described in Section IV. The experimental setup is presented in Section V, which includes the system builder for multi-FPGA prototyping. Various benchmarks are tested on the multiple FPGA-based prototype in that section, and the debugging of large designs is also presented. Section VI gives the conclusion and future work.

II. BENCHMARK GENERATION

Complex, large scale benchmarks are an elementary requirement for multiple FPGA-based prototyping. We generated various benchmarks using the DSX tool, which uses SoCLiB developed at LIP6 [6]. This section describes the architecture of a benchmark, which is in fact a 2D mesh NoC generated using SoCLiB [12]. Many NoCs of various sizes can be generated based on this pattern, depending on the required design size for multiple FPGA-based prototyping. The architecture of the large generated multiprocessor benchmark is shown in Fig. 1. It consists of clusters, which are the basic building blocks of the 2D mesh. Each cluster consists of RAM, DMA, and four processors including a data and instruction cache. The 2D mesh is a grid of 𝑀 × 𝑀 clusters, and each cluster is connected to its four nearest clusters. The size of the cluster depends on the grid, which may suffer from latency that increases with the size of the grid. It is thus possible to integrate more cores on a single chip while controlling the increase of latency. Furthermore, to obtain more complex and larger benchmarks, we can move from mono-cluster to multi-cluster benchmarks, where intra-cluster communication is done through a VCI network and inter-cluster communication is done through the DSPIN (NoC) architecture. We have generated realistic multi-cluster benchmarks and, based on this architecture, large 2D mesh NoCs that we will use in Section V.

Fig. 1. 2D mesh NoC

III. DESIGN FLOW

In this section we describe the design flow that we use for our multiple FPGA-based prototyping. Fig. 2 describes the design flow, where partitioning at RTL level is done before synthesis, which is then followed by inter-FPGA routing and multiplexing. The bitstreams for multiple FPGAs are generated after intra-FPGA placement and routing. Once the bitstreams are downloaded to the FPGAs, in-circuit verification is the next step to validate the design functionality. We briefly detail each step in the following subsections.

A. Partitioning

The benchmarks generated by the DSX tool are large and have to be partitioned before synthesis, both to avoid long synthesis times and to map them onto multiple FPGAs. We used the Synopsys Certify partitioning tool, which takes the description of the FPGA and performs a fast synthesis of the partitioned design, as shown in Fig. 2. Certify generates the required partitions for multi-FPGA prototyping with minimum cut nets among different partitions. Certify also generates the trace assignment file, which provides the information for communication between FPGAs. This trace assignment file is finally given to the inter-FPGA routing tool.
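To illustrate why the partitioning step aims at minimum cut nets, the following minimal Python sketch counts the mesh links cut by two different assignments of an 𝑀 × 𝑀 cluster grid to four FPGAs. It is not the algorithm used by Certify: both partitioning strategies and all function names are hypothetical, and each mesh link is counted as a single cut net, whereas a real inter-cluster link carries many signals.

```python
from itertools import product


def mesh_links(m):
    """Return the links of an m x m 2D mesh as pairs of (row, col) clusters."""
    links = []
    for r, c in product(range(m), repeat=2):
        if c + 1 < m:  # link to the right-hand neighbour
            links.append(((r, c), (r, c + 1)))
        if r + 1 < m:  # link to the neighbour below
            links.append(((r, c), (r + 1, c)))
    return links


def quadrant_partition(m):
    """Assign clusters to 4 FPGAs by mesh quadrant (hypothetical strategy)."""
    half = m // 2
    return {
        (r, c): 2 * int(r >= half) + int(c >= half)
        for r, c in product(range(m), repeat=2)
    }


def column_round_robin_partition(m, n_fpgas=4):
    """Assign clusters to FPGAs by column index modulo n_fpgas (hypothetical strategy)."""
    return {(r, c): c % n_fpgas for r, c in product(range(m), repeat=2)}


def count_cut_links(links, assignment):
    """Count mesh links whose two end clusters sit on different FPGAs."""
    return sum(1 for a, b in links if assignment[a] != assignment[b])


links = mesh_links(4)
cut_quadrant = count_cut_links(links, quadrant_partition(4))
cut_columns = count_cut_links(links, column_round_robin_partition(4))
print(len(links), cut_quadrant, cut_columns)  # 24 8 12
```

For a 4 × 4 mesh, keeping neighbouring clusters on the same FPGA (quadrants) cuts 8 of the 24 links, while spreading columns across FPGAs cuts 12, so the first assignment needs fewer inter-FPGA signals.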
which are obtained after partitioning. These cut nets far outnumber the FPGA I/Os, and the inter-FPGA tracks are very limited [13]. The cut nets are therefore routed using time division multiplexing [14]. The main objective of the inter-FPGA routing tool is to provide the shortest path between the source and destination FPGAs while minimizing the multiplexing ratio. The multiplexing ratio has a significant impact on the global frequency of the whole multi-FPGA system. Another objective of the routing tool is to minimize the routing hops to avoid extra delays. This routing tool was developed internally at LIP6 laboratory [15], [16].
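The two routing objectives can be made concrete with a small Python sketch. It is not the LIP6 routing tool: it assumes, for illustration only, that the multiplexing ratio of an FPGA pair is the number of cut nets divided by the available physical tracks, rounded up, and that hops are counted with a breadth-first search over a hypothetical ring of four FPGAs. The net and track counts are invented.

```python
import math
from collections import deque


def multiplexing_ratio(cut_nets, physical_tracks):
    """Number of cut nets that must share one physical track (assumed model)."""
    if physical_tracks <= 0:
        raise ValueError("at least one physical track is required")
    return max(1, math.ceil(cut_nets / physical_tracks))


def hop_count(topology, src, dst):
    """Fewest inter-FPGA hops from src to dst, found by breadth-first search."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for neighbour in topology.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None  # dst is not reachable from src


# Hypothetical ring of four FPGAs; the real board connectivity may differ.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

print(multiplexing_ratio(1000, 120))  # 9 nets share each track
print(hop_count(ring, 0, 2))          # 2 hops across the ring
```

A higher ratio means each physical track is time-shared by more signals, which is why the text notes its impact on the global system frequency.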
after the bitstreams are loaded to multiple FPGAs, we need to

Fig. 4. Task Flow of Logic Analyzer

We briefly explain the classical SignalTap II debugging flow of Altera for a single FPGA; we will then use this flow for debugging multiple FPGAs. We start with a new project, and the input to the Quartus tool can be in various formats, including VHDL, Verilog, or VQM. To debug the ASIC design, several tasks are performed; Fig. 4 shows the complete task flow of SignalTap II by Altera, including the configuration and debug analysis [18]. The first step is to add a .stp file to the design. To view multiple clock domains simultaneously, additional instances of the logic analyzer are added to the design. After the .stp file is added to the design, the logic analyzer is configured to monitor the required signals. The signals can be added manually to the design, or a plug-in, the Nios II embedded processor plug-in for example, can be used to add all required

design to debug flow is provided in Fig. 5 from Altera that starts from synthesis, continues to programming the device, and finally ends with capturing and sampling the data. We will use the same flow for multiple FPGAs. As each FPGA is programmed independently, we will use our tool to trace the specific signals after partitioning. Once the signals are traced and we know the FPGA where the signals are placed, we can open the SignalTap II window for that specific device to start debugging. This reduces the debugging complexity for multiple FPGAs.

V. EXPERIMENTAL SETUP AND RESULTS

The DE3 System Builder helps to quickly create multiple projects for a large and complex design on multiple FPGAs, as shown in Fig. 6. It also provides error-checking rules to avoid common mistakes, such as wrong pin assignments, which may damage the board. It also helps avoid board malfunction due to wrong device connections. We used four Altera Stratix III DE3 FPGA boards by Terasic, as shown in Fig. 7. These boards can
to generate .sof file. In the .stp file, signals and trigger conditions are defined, and the target device is selected through the JTAG chain for configuration using each .stp file, as shown in Fig. 9. Each FPGA is then programmed independently, and each SignalTap II logic analyzer also runs independently through the JTAG chain. SignalTap II captures data at the defined trigger points through JTAG, and the data are then analyzed.
[Figure: JTAG chain. Communication cable, Stratix FPGA1, Stratix FPGA2, Stratix FPGA3, Stratix FPGA4]
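The idea of tracing signals to the FPGA that hosts them, so that the right SignalTap II instance can be opened, can be sketched as a simple lookup. This is only an illustration under stated assumptions, not the authors' tracing tool: the per-FPGA signal lists, the signal names, and both function names are hypothetical, and a real flow would extract this mapping from the partitioning output.

```python
from collections import defaultdict


def build_signal_index(partitions):
    """Map every signal name to the FPGA whose partition contains it."""
    index = {}
    for fpga, signals in partitions.items():
        for name in signals:
            index[name] = fpga
    return index


def group_by_fpga(index, signals_under_debug):
    """Group the signals to debug by hosting FPGA and collect unknown names."""
    grouped = defaultdict(list)
    missing = []
    for name in signals_under_debug:
        if name in index:
            grouped[index[name]].append(name)
        else:
            missing.append(name)
    return dict(grouped), missing


# Hypothetical partitioning result: FPGA name -> signals placed on it.
partitions = {
    "FPGA1": ["cluster0/dma_req", "cluster0/ram_we"],
    "FPGA2": ["cluster1/dma_req"],
    "FPGA3": ["cluster2/cache_hit"],
    "FPGA4": ["cluster3/cache_hit"],
}

index = build_signal_index(partitions)
grouped, missing = group_by_fpga(
    index, ["cluster0/ram_we", "cluster2/cache_hit", "cluster9/unknown"]
)
print(grouped)  # {'FPGA1': ['cluster0/ram_we'], 'FPGA3': ['cluster2/cache_hit']}
print(missing)  # ['cluster9/unknown']
```

With such a mapping, only the logic analyzer instances of the FPGAs that actually host the signals under debug need to be opened, here FPGA1 and FPGA3.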