
Received 24 May 2023, accepted 15 June 2023, date of publication 21 June 2023, date of current version 10 July 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3288431

A Survey on FPGA-Based Heterogeneous Clusters Architectures

WERNER FLORIAN SAMAYOA 1,2, MARIA LIZ CRESPO 1, ANDRES CICUTTIN 1, AND SERGIO CARRATO 2


1 Multidisciplinary Laboratory (MLab), The Abdus Salam International Centre for Theoretical Physics, 34151 Trieste, Italy
2 Dipartimento di Ingegneria e Architettura (DIA), Università degli Studi di Trieste, 34127 Trieste, Italy
Corresponding author: Werner Florian Samayoa ([email protected])
This work was supported by the University of Trieste and The Abdus Salam International Centre for Theoretical Physics.

ABSTRACT In recent years, the most powerful supercomputers have already reached megawatt power consumption levels, an important issue that challenges sustainability and shows the impossibility of maintaining this trend. To this date, the prevalent approach to supercomputing is dominated by CPUs and GPUs. Given their fixed architectures with generic instruction sets, they have been favored with lots of tools and mature workflows, which led to mass adoption and further growth. However, reconfigurable hardware such as FPGAs has repeatedly proven that it offers substantial advantages over this supercomputing approach concerning performance and power consumption. In this survey, we review the most relevant works that advanced the field of heterogeneous supercomputing using FPGAs, focusing on their architectural characteristics. Each work was divided into three main parts: network, hardware, and software tools. All implementations face challenges that involve all three parts. These dependencies result in compromises that designers must take into account. The advantages and limitations of each approach are discussed and compared in detail. The classification and study of the architectures illustrate the trade-offs of the solutions and help identify open problems and research lines.

INDEX TERMS FPGA, SoC, heterogeneous computing, supercomputing, reconfigurable computing.

The notion of reconfigurable hardware has been present since 1984, when Altera delivered the first programmable logic device (PLD) to the industry [1]. Then, in 1985, Ross Freeman and Bernard Vonderschmitt patented the first commercially viable field-programmable gate array (FPGA) [2]. Owing to production costs, when compared to application-specific integrated circuits (ASICs), FPGAs are traditionally used in applications with low production volumes that require high throughput and low latency.

FPGAs are electronic devices that consist of many configurable logic blocks composed of look-up tables, flip-flops, I/O blocks, and interconnection fabric. FPGAs are used to create custom hardware solutions, which makes the implementation of algorithms quite different from targeting a CPU. The initial step typically consists of describing the algorithm using a Hardware Description Language (HDL), such as VHDL or Verilog. The HDL description is then synthesized into a netlist that is mapped onto the FPGA's logic elements and interconnections required to implement the desired digital design. The final implementation in the FPGA is performed using vendor-specific tools such as Vivado [3], Vitis [4], Quartus [5], and Libero [6]. Once the mapping and routing process is completed, the design is compiled into a bitstream file loaded onto the FPGA to configure its logic elements and interconnections to create a circuit corresponding to the algorithm. It has to be added, however, that although proprietary FPGA vendor tools have dominated the field, there are now some open-source FPGA tools, such as Yosys [7], F4PGA [8], and RapidSilicon [9], that provide alternative options for developers seeking open-source solutions.


FPGAs have evolved into more complex devices [10] by integrating components such as embedded memory resources, clock management units, digital signal processing blocks (DSP), network-on-chip (NoC), and CPUs [11]. These hybrid devices are known as system-on-chip (SoC-FPGA) or adaptive SoCs, depending on the vendor. Their increased capabilities have raised interest in both specific applications and general-purpose use [12], [13].

As a reconfigurable device, the FPGA offers the advantage of continuous improvement in hardware and software. In fact, being able to change the architecture offers great freedom when developing complex systems. Furthermore, FPGAs have been shown to consume considerably less power than CPUs and GPUs [14], leading to reduced cooling and energy costs.

By studying computing problems, a classification based on repeating algorithmic patterns was proposed in 2004 [15]. In 2006, six new algorithmic encapsulations were defined in [16], expanding the classification to 13 dwarfs, as shown in Table 1. Theoretically, each dwarf can be mapped onto a specific computing architecture [17], [18]. This has inspired the creation of benchmarks for heterogeneous systems such as DwarfBench [19], Rodinia [20], and OpenDwarfs [21].

TABLE 1. The 13 dwarfs of Berkeley [16], where each one represents an algorithmic method encapsulating patterns of communication and/or computation with example problems.

Several implementations of heterogeneous high-performance computing (HPC) systems housing FPGAs can be named, such as Project Catapult at Microsoft [22], Alibaba FaaS (FPGA as a Service) [23], Amazon EC2 F1 instances [24], and the ARUZ cluster at Lodz University [25]. At CERN, the massive adoption of FPGAs for online data processing has motivated the development and adoption of specific tools to aid the development of applications based on FPGAs, such as hls4ml [26] (high-level synthesis for machine learning). This tool, along with many others [27], [28], [29], [30], [31], allows for a higher level of abstraction, thereby significantly reducing implementation errors and development time. The preference for FPGAs is due to their reconfigurability, which allows extreme hardware specialization when needed. In addition, the fact that FPGAs offer a wide array of input-output ports makes them ideal for stream computation and for creating pipelined systems that can maintain high throughput with low latency.
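As an illustration of the abstraction level such tools provide, the sketch below follows the publicly documented hls4ml Keras flow. It is a minimal, hedged example: the toy model, the output directory, and the FPGA part number are placeholders, and the exact API can vary between hls4ml releases.

```python
# Minimal sketch of the hls4ml flow: a small Keras model is translated into an
# HLS project that vendor tools can later synthesize into FPGA firmware.
# The model, part number, and directory are placeholders, not recommendations.
import hls4ml
from tensorflow import keras

# Toy fully connected network standing in for a real trigger/classification model.
model = keras.Sequential([
    keras.layers.Input(shape=(16,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(5, activation="softmax"),
])

# Derive a per-model HLS configuration (precision, reuse factors, ...).
config = hls4ml.utils.config_from_keras_model(model, granularity="model")

# Translate the network into an HLS project targeting a hypothetical device.
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir="hls4ml_prj",        # placeholder project directory
    part="xcvu9p-flga2104-2-e",     # placeholder FPGA part
)

hls_model.compile()                 # builds a C/C++ emulation library for validation
# hls_model.build(csim=False)       # would invoke the vendor HLS synthesis step
```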
The purpose of this survey is to demonstrate and analyze the challenges of heterogeneous supercomputing by studying the most relevant implementations of FPGA-based cluster architectures from different application fields. Each studied platform provides valuable insight into the decisions and tradeoffs developers have made to reach their specific goals. By leveraging their experience, it will be possible to visualize the evolution and present trends in FPGA-based clusters and target the main open challenges. We propose dividing the architectural components of each cluster into network, hardware, and software tools. This division helps identify and discuss the pros and cons of each component in its corresponding domain.

The main contributions of this study are as follows:
1) The comprehensive study of the state-of-the-art of FPGA-based clusters.
2) A three-way segmentation of the clusters' architecture.
3) A critical discussion of the components that build up the studied clusters.

In the context of this paper, we describe a cluster by its computational units (CU), which correspond to its smallest independent part and sometimes coincide with a single network node. Each CU can be composed of several computational elements (CE), namely CPUs, GPUs, and FPGAs.

The remainder of this paper is organized as follows. Section I elaborates on the implementations and explores relevant advancements in their application fields. A table at the end of each application field discussion summarizes the main contributions of each study, along with the reported performance and energy improvements, when available. Significant differences can be understood by studying the evolution of heterogeneous clusters within each niche. Section II presents the classification of systems from an architectural perspective. The three main aspects described in each implementation were used as comparison points. Subsection II-A presents a comparison of the network infrastructure in the studies. The hardware available in each studied cluster is discussed in Subsection II-B. To complete the classification discussion, the developer tools are compared in Subsection II-C. In Section III we present the remaining open problems and the identified trends. To close this paper, Section IV draws conclusions.

I. CLUSTER IMPLEMENTATIONS
Different FPGA-based cluster implementations were studied, and their specific characteristics highlight the purpose for which they were planned. Technological advances offer greater flexibility, and cost reduction opens the door to increasing complexity.


It can be appreciated that there is a growing interest in developing research-capable platforms to explore diverse areas of heterogeneous supercomputing. Tables 2, 3, 4, 5, and 6 provide a summary of the contributions of each work and the reported energy and performance, if available.

A. MANYCORE EMULATION
The development of manycore platforms is a long and expensive process that involves several stages of experimentation, validation, and integration. There are software tools that help simulate architectures for easy parameter tuning, with the major drawback of speed. In this particular aspect, FPGA prototyping allows faster execution times and benefits from insights from real hardware. It is not rare for a complete platform to exceed the logic available in a single FPGA, pushing for a cluster of FPGAs.

This has been the case since 1997, when one of the first FPGA clusters was used to emulate the RAW architecture [32]. The RAW cluster consisted of 5 boards or CUs, each with 64 FPGAs, totaling 320 FPGAs. Its results showed orders of magnitude speed-up compared to contemporaneous scalable processors, with the disadvantages of reduced flexibility, high cost, and high implementation complexity, which hindered its adoption in other research applications.

In 2006, the FAST [33] cluster was presented to bring hardware back into the research cycle and to address the disadvantages of RAW. FAST combined dedicated microprocessor chips and static random access memories (SRAM) with FPGAs into a heterogeneous hybrid solution to simulate chip multiprocessor architectures. The vision was to reduce hardware costs and ease development, both for programming and portability. Each FAST CU consisted of 8 processors, 10 Xilinx Virtex FPGAs, and 4 memory-interconnected tiles. The 2 processors in each tile acted as the CPU and floating-point processing unit, respectively, and 2 FPGAs acted as the level-one memory controller and coprocessor.

A central hub, made up of 2 FPGAs, was used to manage shared resources and orchestrate communication between tiles, allowing access to off-the-board devices through external IOs. Additionally, the expansion connector available to the FPGA hub allows multiple FAST CUs to be connected. The CU implementation is illustrated in Figure 1.

FIGURE 1. FAST [33] computational unit (CU) with the computing tiles in orange and the FPGA hub in purple.

A custom software stack was developed specifically for FAST. It included several modules and predefined interfaces for functionality and benchmarking. An operating system was developed to manage control tasks such as programming and configuration. Portability was demonstrated by implementing several architectures; however, scalability and costs remained open to discussion.

Similar to FAST, the RAPTOR cluster was presented as a baseboard hosting up to 4 daughter cards based on complex programmable logic devices (CPLD) [34]. In 2010, a second version was presented using FPGAs and a renewed architecture [35]. This new version consisted of a RAPTOR-Xpress baseboard (CU) that provides two buses for Gigabit Ethernet, universal serial bus (USB) 2.0, and peripheral component interconnect express (PCIe) 2.0 × 8 for the host connection to configure and manage up to 4 DB-V5 (daughter board version 5).

Figure 2 shows the RAPTOR-Xpress baseboard with 4 DBs interfaced directly with their neighbors in a ring topology. Each has a Xilinx Virtex-5 FPGA with up to 4 GB of DDR3 memory and a dedicated FPGA as a PCIe interface. Multiple baseboards can be connected together via 4 high-speed connectors, each consisting of 21 full-duplex serial lanes, enabling scaling resources beyond the 4 DBs on board. The baseboards can also be interfaced with the host via dedicated FPGAs on Nallatech front-side bus acceleration modules [36], which provide an extra 8.5 Gb/s for writing and 5.6 Gb/s for reading.

The RAPTOR project also comprises a custom software development environment that includes the RAPTORLIB, RAPTORAPI, and RAPTORGUI tools, which aid developers by providing hardware-supported protocols, remote access, and a graphical user interface to facilitate testing. The design flow includes aids for design partitioning, which is a manual process assisted by a graphical integrated development environment (IDE) and standard synthesis tools developed in vMAGIC [37].

Convinced by the need for cheaper and smaller hardware, the Formic cluster [38], based on the Formic board [39], was presented in 2014. The Formic board acts as the building block for a larger system, with a maximum size of 4096 boards. Each board consists of an FPGA, SRAM, 1 GB of double data rate (DDR) RAM, a power supply, buffered joint test action group (JTAG) connectors, and configuration memory, making it independent and perfectly symmetric. Eight multi-gigabit transceivers (MGT) at a maximum speed of 3 Gb/s are available for interconnection on 8 serial advanced technology attachment (SATA) connectors. Inside each board, a full NoC with a 22-port crossbar switch interfaces the configured blocks with the MGT links and allows developers to scale the designs.


FIGURE 2. Simplified diagram of the RAPTOR-Xpress board [35] or computational unit (CU) with the
daughter boards in orange.

Access to local and remote memories is done using the Remote Direct Memory Access (RDMA) protocol [40]. As the first application, a multicore system based on 8 custom MicroBlaze [41] processors per module, forming a 512-core cluster [42], was implemented.
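The RDMA access model can be pictured as messages that name a target node and a target address, which the receiving network logic applies to its local memory without involving a processor. The sketch below is purely illustrative: the actual Formic packet format is not described here, so every field name and width is an assumption.

```python
# Illustrative model of RDMA-style remote writes between boards: the message
# itself names the destination memory, so the receiver needs no CPU involvement.
# Field widths and names are hypothetical, not the actual Formic format.
import struct

HEADER = struct.Struct(">BHI")   # assumed header: opcode, destination node, destination address
OP_WRITE = 0x01

def encode_remote_write(dest_node: int, dest_addr: int, payload: bytes) -> bytes:
    """Pack a remote write request destined for another board's DDR/SRAM."""
    return HEADER.pack(OP_WRITE, dest_node, dest_addr) + payload

def apply_remote_write(local_memory: bytearray, packet: bytes) -> None:
    """What the receiving NoC endpoint would do: copy the payload straight to memory."""
    opcode, _node, addr = HEADER.unpack_from(packet)
    assert opcode == OP_WRITE
    payload = packet[HEADER.size:]
    local_memory[addr:addr + len(payload)] = payload

# Example: node 3 writes 8 bytes into node 7's memory at offset 0x100.
memory_of_node_7 = bytearray(1 << 12)
pkt = encode_remote_write(dest_node=7, dest_addr=0x100, payload=b"\x11" * 8)
apply_remote_write(memory_of_node_7, pkt)
```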
Simultaneously, the industry has produced exciting developments in manycore emulation. In an attempt to reduce the time to market for new ICs, Cadence [43] and Siemens [44], together with others, developed solutions for the prototyping of ICs. Unfortunately, there is little accessible information regarding the architecture of most implementations, and the high costs make them uncommon in academia, with some exceptions, such as the Pico Computing board (now Micron) used for image processing [45] and the DINI (now Synopsys) FPGA board used for online video processing [46].

From the described works, it can be seen that there is a trend toward reducing the complexity of CUs, as shown in Figure 3. In this field, costs tend to be the leading factor, making granularity a desirable characteristic. With smaller CUs, it is possible to reduce the implementation costs, depending on the requirements of the chip to emulate. Smaller CUs also make it easier for clusters to scale, maintain, and upgrade.

B. SCIENTIFIC COMPUTING
The complexity of scientific computing problems has always pushed technology to its limit, making computer clusters a basic requirement. Regardless of whether complex algorithms process huge amounts of data or massive systems are simulated, reconfigurable computing provides the level of customization required by these problems. This did not go unnoticed: as early as 1991, programmable hardware was already part of custom supercomputers for specific problems, as in RTN [47], RASA in 1994 [48], and later SUE in 2001 [49].

The first massive cluster was created in 2006. Janus [50] was a massively parallel modular cluster for the simulation of specific theoretical problems in physics, developed by a large collaboration of European institutions [51]. The core of Janus comprised an array of 4 by 4 FPGA-based simulation processors (SP), which were connected with their nearest neighbors. Another processing unit, called the Input/Output processor (IOP), acted as a crossbar and was in charge of managing communications between the FPGAs and the host.

A two-layer software stack was created to help developers build applications. The firmware layer consisted of a fixed part targeting the IOPs, which included a stream router and dedicated devices to communicate with, manage, and program the SPs. The second layer, the Janus Operating System (JOS), consisted of the programs running on the host PCs, including a set of libraries (JOSlib) to manage the IOP devices, Unix socket application program interfaces (APIs) to integrate high-level applications and new SP modules, and an interactive shell (JOSH) for debugging and testing.

In the worst case, Janus performed just 2.4 times faster than conventional PCs. Nonetheless, Janus was limited by its performance and scarce memory for some applications [52].

In parallel, great interest has been shown in the cryptanalysis field with the development of the COPACOBANA FPGA cluster [53] in 2006. Figure 4 shows the COPACOBANA cluster, which was built over a CU holding up to 20 dual in-line memory modules, each with 6 Xilinx Spartan-3 FPGAs directly connected to a 64-bit data bus and a 16-bit control bus. A controller module allowed the host PC to interact via USB or Ethernet through a software library that provided the necessary functions for the PC to program, store, and read the status of the cluster as a whole or of individual FPGAs. This made it possible to scale resources by attaching another CU to the host PC. Its capabilities were demonstrated by testing several encryption algorithms, which resulted in it outperforming conventional computers by orders of magnitude [54].

The positive outcome of this project motivated the creation of a hybrid FPGA-GPU cluster [55] based on commercial off-the-shelf (COTS) components in 2010.


FIGURE 3. Clusters targeting manycore emulation have shown a trend of reducing the
complexity and increasing the granularity of CUs to favor production costs and
scalability.

TABLE 2. Manycore emulation clusters’ contributions and reported performance improvement.

The Cuteforce [56] system implemented 15 CUs, 14 with Xilinx Virtex FPGAs and the last with an NVIDIA GPU, interconnected through a CPU on a CU via InfiniBand. The results were not as expected, partly because of complications in the FPGA implementation.

The same approach was later used in 2010 by Tse et al. [57], who focused on Monte Carlo simulations. However, instead of using one CE per CU, a single CU was used to host 2 CPUs, an NVIDIA GPU, and a Xilinx FPGA, which was further supported by a comprehensive analysis of performance and energy. The network remained practically unchanged from Cuteforce, where the CPUs are the main communication CEs and relegate GPUs and FPGAs to an accelerator position. To further demonstrate the scalability of this strategy, Superdragon [58] was created to accelerate single-particle cryo-electron 3D microscopy.

Bluehive [59] also sought to distance itself from custom PCBs by embracing commodity boards to build a custom FPGA cluster for scientific simulations and manycore emulation [60] requiring high-bandwidth and low-latency communication. These challenges were overcome with the development of a 64-node FPGA cluster based on Terasic DE4 boards, which host an Altera Stratix IV FPGA, an 8x PCIe connector, and a DIMM with 4 GB of RAM, interfaced through a custom interconnect called BlueLink [61], with four 8U rack boxes, each holding 16 boards. The boards in the boxes were interconnected through a PCIe-to-eSATA board. A small Linux computer allowed remote programming using a USB-to-JTAG converter and a DE2 board as a JTAG fan-out to parallelize the configuration.

The Bluehive development environment was supported by Quartus, and mandatory blocks were provided to developers: routers for inter-FPGA communication, FBs, and high-speed serial link controllers [61], all developed in Bluespec SystemVerilog [62].

In 2014, Janus received an important upgrade [63], which significantly improved its performance. The architecture remained mostly the same, with the largest change being the adoption of newer FPGAs with 8 GB of RAM and MGTs instead of ordinary I/Os for interconnection.

Janus II and Bluehive were successful in tackling the memory issue, but as problems scale, larger clusters were needed. This was the case for ARUZ [25], an application-specific cluster formed by approximately 26,000 FPGAs distributed over 20 panels, each consisting of 12 rows, which in turn contained 12 CUs. The CUs are composed of eight slave FPGAs that constitute the resources and a central master SoC-FPGA that manages operations.


The addition of the Zynq SoC is motivated by the higher abstraction level provided by the ARM processor for slow-control tasks. In addition, each CU is interfaced with a concentrator board (CB) that feeds the state of the simulations to a host that controls the entire process.

Global communication is based on Gigabit Ethernet and allows data exchange between the SoC-FPGAs to configure their 8 FPGAs. All nodes are connected in a daisy chain, and only one board is connected to an external switch. A custom protocol for data transfer was developed, consisting of a small packet of no more than 256 bytes, with a constant overhead of 11 bytes.
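To make the framing concrete, the sketch below packs a message that respects the two constraints reported for the ARUZ protocol: a constant 11-byte overhead and a total packet size of at most 256 bytes. The individual header fields are invented for illustration only; the actual layout is not given here.

```python
# Toy packer for an ARUZ-like transfer: 11 bytes of fixed overhead and a total
# packet never exceeding 256 bytes, leaving at most 245 bytes of payload.
# The header fields (destination, source, type, length, sequence, CRC) are
# assumptions made only for this example.
import struct
import zlib

HEADER_FMT = struct.Struct(">HHBBBI")          # 2+2+1+1+1+4 = 11 bytes of overhead
MAX_PACKET = 256
MAX_PAYLOAD = MAX_PACKET - HEADER_FMT.size     # 245 bytes

def build_packet(dst: int, src: int, msg_type: int, seq: int, payload: bytes) -> bytes:
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds the 256-byte packet budget")
    crc = zlib.crc32(payload)
    header = HEADER_FMT.pack(dst, src, msg_type, len(payload), seq & 0xFF, crc)
    return header + payload

packet = build_packet(dst=0x0A12, src=0x0A11, msg_type=1, seq=7, payload=b"state" * 40)
assert len(packet) <= MAX_PACKET
```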
The ARUZ designers developed their own methodology [64], as there are no standard solutions available. Considering the multitude of mechanism combinations for programming and controlling ARUZ, a high level of flexibility is required. VPP [65] was selected for code pre-processing and parameterization. DLLDesigner was developed to generate VHDL code for interconnecting as many FBs as required. All of these tools allow the implementation of highly optimized architectures for molecular simulations.

FPGAs have also found a place in neuromorphic computing, as demonstrated by Bluehive. Spiking neural networks (SNN) require many densely interconnected elements. Their substantial level of parallelism is suitable for hardware acceleration; however, the challenge is scalability. This was specifically addressed by Astrobyte [66] using a fully scalable NoC-based FPGA cluster with functional verification and real-time monitoring.

However, more specialized platforms presented better results at higher costs. This is the case for BiCoSS [67], a cluster of 35 system-on-modules, each with a Cyclone IV FPGA and 2 SDRAMs, capable of simulating 4 million spiking neurons in real time.

Another relevant application in the scientific context is real-time control (RTC) systems for adaptive optics (AO) instruments. This is the main focus of the Green Flash project [68], which aims to develop energy-efficient real-time HPC accelerators and smart interconnects based on GPUs and FPGAs [69]. The RTC modules have a standard CPU server that hosts an NVIDIA GPU, an Intel CPU, and an Intel Arria 10 FPGA. The FPGAs are hosted on a custom mainboard called µXComp, which includes 2 GB of onboard RAM, PCIe 3, an FMC connector, Ethernet, 4 QSFP ports, and other valuable resources.

In this heterogeneous system, communication between GPUs is performed by a Smart Interconnect (SI) system implemented on FPGAs. The SI uses the UDP protocol, which is implemented in the FPGA fabric alongside the device protocol handlers and dedicated direct memory access (DMA) engines. This is configured with the QuickPlay FPGA framework, which extends its capabilities by using abstraction models and board support packages (BSPs) for portability. This architecture allows pipelining several GPUs and FPGAs. A similar approach can be seen in the SpiNNaker [70] and BrainScaleS [71] supercomputers, which implement dedicated ASICs interconnected by FPGAs for neuromorphic computing, and in MDGRAPE [72] for molecular dynamics simulations.

FIGURE 4. COPACOBANA [53] computational unit (CU) with the dual in-line memory module (DIMM) modules in orange, each with 6 FPGAs, and the controller module in purple.

C. FPGAS IN DATA CENTERS
The positive results obtained by FPGAs attracted great interest outside of the scientific community, specifically in the data center (DC) context, where computing tasks can quickly overwhelm CPUs. DC workloads demand reduced power consumption, latency, and cost while maximizing computing power and flexibility.

Catapult [22] is a successful example of the inclusion of FPGAs in a high-reliability commodity DC. FPGAs were specifically selected given that the flexibility of reconfigurable hardware helps tackle the 2 main requests in DCs. First, the desire for homogeneity, which greatly facilitates the installation, maintenance, and deployment of services. Second, the need for flexibility, considering that such services evolve rapidly, making fixed hardware impractical.

A custom half-width unit motherboard was developed to host 2 high-end CPUs and the daughter FPGA card, which consisted of a Stratix V D5 FPGA with 8 GB of DRAM and acted as the CU. Two 12-core Sandy Bridge processors with 64 GB of RAM, 2 SSDs, and 4 HDDs complete the resources present on the motherboard. The FPGA and host CPUs communicate via PCIe, and high-speed transceivers are used in the inter-FPGA network. A two-dimensional 6 × 8 node torus was selected for the network configuration in each rack. For the final system, 34 of these racks were used for a total of 1632 nodes.

TABLE 3. Scientific computing clusters’ contributions, reported power and performance gains.

To evaluate the performance of Catapult, a significant portion of the ranking stack of Bing was offloaded to each rack. To guarantee the reliability of the system, the following services were implemented: 2-bit error detection and 1-bit error correction on top of the CRC in the DRAM and high-speed network. For user productivity and reusability, the FPGA space was split into 2 parts: a shell, which hosts hardware controllers, an inter-FPGA network stack, a status notifier, and single-event upset logic to reduce system errors, all of which consume 23% of the FPGA resources; and a role part, where the computing logic lies. Additionally, a mapping manager and health monitor continually scanned each node in the network. In case of failure, the faulty node is immediately reconfigured. If the issue persists, the node is flagged for manual intervention, and the mapping manager automatically relocates the services to the available resources.
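The recovery policy described above (reconfigure once, then flag the node for manual service and move its role elsewhere) can be summarized in a few lines of control logic. The sketch below is a simplified illustration of that policy, not the actual Catapult implementation; all class and field names are ours.

```python
# Simplified illustration of the reported recovery policy: reconfigure a faulty
# node once; if the fault persists, flag it for manual service and relocate its
# role to a healthy spare. Not the actual Microsoft implementation.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    role: str
    healthy: bool = True
    reconfig_attempted: bool = False
    flagged: bool = False

@dataclass
class MappingManager:
    nodes: list = field(default_factory=list)

    def spare(self) -> Node:
        return next(n for n in self.nodes if n.healthy and n.role == "idle")

    def handle_fault(self, node: Node) -> None:
        if not node.reconfig_attempted:
            node.reconfig_attempted = True      # first response: reload the bitstream
            node.healthy = True                 # optimistic until the monitor disagrees
        else:
            node.flagged = True                 # persistent fault: manual intervention
            replacement = self.spare()
            replacement.role, node.role = node.role, "offline"

mgr = MappingManager([Node("fpga-00", "ranking"), Node("fpga-01", "idle")])
faulty = mgr.nodes[0]
faulty.healthy = False
mgr.handle_fault(faulty)    # first failure: reconfigure in place
faulty.healthy = False
mgr.handle_fault(faulty)    # second failure: relocate the ranking role
print([(n.name, n.role, n.flagged) for n in mgr.nodes])
```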
With custom hardware and a custom communication protocol, Catapult achieved a 95% improvement in throughput in a production search infrastructure when compared to a software-only solution. In addition, the inclusion of the FPGA increased power consumption by only 10%, and the added cost of ownership did not exceed the limit of 30%. These results show the significant advantage that FPGAs can offer in terms of throughput and power consumption.

With the success of Catapult [22], it was only a matter of time before FPGAs were made available for cloud computing tasks, which is exactly what the IBM cloudFPGA [73] did. Virtualizing the user space makes FPGAs in an Infrastructure-as-a-Service (IaaS) environment feasible for education, research, and testing.


In the architecture presented, the FPGAs are standalone nodes in the cluster, directly interfaced to the DC via PCIe, unlike the approach of Amazon [24], Alibaba [23], and IBM Supervessel [74], which tie the FPGAs to host CPUs. Under this approach, a daughter card consisting of an FPGA and abundant RAM was developed. By creating a custom carrier board, 64 daughter cards can be accommodated in a single 2U rack chassis [75]. To achieve the desired homogeneity within the DC, the FPGAs have been provided with a soft network interface chip, with the advantage of loading only the required services.

The multi-FPGA fabric formed by multiple prototypes of a network-attached FPGA was evaluated with a text-analytics application. The results, compared to a software implementation and an implementation accelerated with PCIe-attached FPGAs, show that the network-attached FPGAs improved in latency and throughput. Additionally, network performance was compared with bare-metal servers, virtual machines, and containers [76], with results orders of magnitude better for the FPGA prototype. To further improve the usability of the platform, continuous developments have been made to integrate MPI into the system [77], [78]. An in-depth study of FPGA cloud computing architectures is available in [79] and [80].

D. GENERAL-PURPOSE CLUSTERS
Overspecialized systems tend to constrain the potential of reconfigurable hardware in favor of optimizing performance or costs. Nevertheless, general-purpose clusters are addressed by a larger group of projects seeking to change the programming paradigm. These clusters, rather than being general purpose in the broad sense of the word, serve as experimental platforms to test solutions to all heterogeneous supercomputing challenges, ranging from the network to the user experience.

One of the first projects was the Reconfigurable Computing Cluster (RCC) [81] in the early 2000s. It was a multi-institution investigation project that explored the use of FPGAs to build cost-effective petascale computers, with its main contribution being the introduction of microbenchmarks for software, network performance, memory bandwidth, and power consumption. To evaluate each test, Spirit, a cluster consisting of 64 FPGA nodes, was built. Each node had a Virtex 4 FPGA with 2 Gigabit Ethernet ports, 8 DIMM slots for onboard RAM, and 8 MGTs for the board-to-board interconnection [82] using the Aurora protocol [83].

For internode communication, a configurable network layer core was developed as part of the Adaptable Computing Cluster project [84]. It consists of a network switch implemented in the FPGA acting as a concentrator for the router. Considering that the head node is a workstation, a message-passing interface (MPI) approach offered the flexibility that the cross-development environment required. A custom compiler based on GNU GCC was built to support OpenMPI and its Modular Component Architecture (MCA) [85], which was adapted to support the high-speed network. A software infrastructure based on a Linux system allowed users to access, manage, and configure all nodes of the cluster via SSH [86].
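Because these stacks expose the cluster through OpenMPI, host-side coordination can be expressed with ordinary MPI programs. The sketch below uses mpi4py as a stand-in for the MCA-extended MPI described above; the node-local "accelerator" call is a placeholder, not part of any cited toolchain.

```python
# Host-side coordination over MPI, in the spirit of the OpenMPI-based stacks used
# by RCC/Spirit and Axel: rank 0 scatters work descriptors and gathers results.
# The local accelerator call is a placeholder for an FPGA/GPU offload.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    # One work descriptor per node; a real run would reference bitstreams,
    # buffers, or kernel arguments rather than plain integers.
    work = [{"job_id": i, "payload": list(range(i, i + 4))} for i in range(size)]
else:
    work = None

my_job = comm.scatter(work, root=0)

def run_on_local_accelerator(job):
    # Placeholder for the node-local offload (e.g., DMA to an FPGA and read back).
    return sum(job["payload"])

result = run_on_local_accelerator(my_job)
results = comm.gather({"job_id": my_job["job_id"], "value": result}, root=0)

if rank == 0:
    print(sorted(results, key=lambda r: r["job_id"]))
```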
Similar to the RCC project, the FPGA High-Performance Computing Alliance (FHPCA [87]) was established in 2005 with the Maxwell supercomputer [88]. The Maxwell CUs were built on a standard IBM BladeCenter chassis, in which an Intel Xeon and 2 FPGAs were interfaced via PCI-X. Additionally, an FPGA-dedicated network is available via MGTs without routing logic, given the nearest-neighbor scheme. By supporting standard parallel computing software, structures, and interfaces, it sought to disrupt the HPC space without causing significant friction.

To facilitate the development of applications targeting Maxwell, the Parallel Toolkit (PTK) [89] was developed. It included a set of practices and infrastructure to solve issues such as associating tasks with FPGA resources, segmenting the application into bitstreams, and managing code dependency. PTK provided a set of libraries where common standard interfaces, data structures, and components were defined.

Similarly, Cube was created to explore the scalability of a cost-effective massive FPGA experimentation cluster for real-world applications. It consisted of 8 boards that host a matrix of 8 by 8 Xilinx FPGAs [90], forming a cluster of 512 FPGAs, as shown in Figure 5. It features a single-configuration, multiple-data programming paradigm that allowed all FPGAs to be configured with the same bitstream in a matter of seconds. The FPGAs were interconnected in a systolic array that reached up to 3.2 Tb/s of inter-FPGA bandwidth, offering significant advantages as it simplified the programming model and greatly relaxed the requirements of the PCB layout.

Simultaneously, Quadro Plex (QP) [91], a hybrid cluster, was introduced. It was composed of 16 nodes, each consisting of one AMD CPU, 8 GB of RAM, 4 NVIDIA Quadro GPUs, and one Xilinx Virtex 4 Nallatech FPGA accelerator. The nodes were interconnected using Ethernet and InfiniBand. Cluster communication was managed using the OpenFabrics Enterprise Distribution software stack. The complete system occupied four 42U racks, consumed 18 kW, and had a theoretical performance of 23 TFLOPS. CUDA was used for GPU development, and the FPGA workflow completely relied on the Xilinx ISE design suite [92].

Several applications were developed, showing that there were substantial difficulties in taking advantage of the entire system. Applications would only use a combination of CPUs and GPUs or CPUs and FPGAs. A framework for easing the porting of applications and providing a compatibility layer for different accelerator workflows, called Phoenix [93], was developed.

In the same spirit, Axel [94] was built, consisting of 16 nodes. Each node had an AMD CPU, an NVIDIA Tesla GPU, and a Xilinx Virtex 5 FPGA, occupying a 4U full-scale rack. All CEs were connected to a common PCIe bus for intranodal communication and between nodes in a Gigabit Ethernet network. Considering the high latency and nondeterministic nature of Ethernet, a parallel network using the 4 MGTs of the FPGA was also available.

TABLE 4. Data center FPGA clusters’ contributions reported power and performance improvement.

FIGURE 5. Cube [90] computational unit (CU) showing the configuration controllers in purple. Dotted lines
show the control and configuration bus and solid lines show the data path.

The cluster was managed remotely from the central node using the Torque [95] resource manager and the Maui [96] scheduler. A custom resource manager (RM) was responsible for managing GPUs and FPGAs.


For this to be feasible, all Axel programs needed to allocate part of the resources in the CEs to interface with the RM runtime API. Using an IPC message queue framework, the CEs communicated their state to the head node. The central node collected information from all nodes with the help of the RM and prepared a script to submit the jobs to Torque. Communication between tasks in different nodes was performed via OpenMPI using Gigabit Ethernet.

To implement an application in Axel, users would provide a data flow graph and a hardware abstraction model. A MapReduce framework then rewrites the application, partitioning the analysis into tasks. These tasks are assigned to the corresponding CEs based on the targeted attributes.
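The attribute-driven assignment can be pictured as a simple matching step between the requirements declared by each task in the data flow graph and the capabilities advertised by each CE. The sketch below only illustrates that idea; it is not the Axel MapReduce framework, and the attribute names are invented.

```python
# Illustrative attribute matching in the style described for Axel: each task from
# the data flow graph declares required attributes, and the runtime assigns it to a
# computational element (CE) whose advertised attributes satisfy them.
CES = {
    "cpu0":  {"type": "CPU",  "attributes": {"control", "io"}},
    "gpu0":  {"type": "GPU",  "attributes": {"simd", "float"}},
    "fpga0": {"type": "FPGA", "attributes": {"stream", "bit-level", "low-latency"}},
}

TASKS = [
    {"name": "pre-process", "requires": {"io"}},
    {"name": "correlate",   "requires": {"stream", "low-latency"}},
    {"name": "score",       "requires": {"simd", "float"}},
]

def assign(tasks, ces):
    plan = {}
    for task in tasks:
        for name, ce in ces.items():
            if task["requires"] <= ce["attributes"]:
                plan[task["name"]] = name
                break
        else:
            raise RuntimeError(f"no CE satisfies {task['name']}")
    return plan

print(assign(TASKS, CES))   # {'pre-process': 'cpu0', 'correlate': 'fpga0', 'score': 'gpu0'}
```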
Axel also introduced an architecture classification for heterogeneous systems based on uniformity, shown in Figure 6. Following this classification, Axel is a Non-Uniform Node Uniform System (NNUS) architecture. This means that all nodes are equal but are built with different CEs. The advantage of this architecture is that the single-program multiple-data (SPMD) programming paradigm can be implemented easily. Axel also brought to light the need to reduce the FPGA design and implementation time, possibly by parallelizing the process so that heterogeneous clusters optimize their own executables. Furthermore, it showed that design exploration tools were also lacking and essential for automating performance estimation and code generation for multiple accelerators.

FIGURE 6. Axel [94] node or computational unit (CU) classification showing possible uniform and non-uniform node and system configurations for heterogeneous clusters of CPUs, GPUs, and FPGAs.

In 2010, Novo-G was presented as an experimental research cluster [97] consisting of 68 compute nodes built with COTS components. Its purpose was to help understand and advance the performance, productivity, and sustainability of future HPC systems and applications, focusing on the sustainability problem of current HPC systems using three different PCIe Intel FPGA boards: 24 nodes with 192 Stratix III FPGA boards, 12 nodes with 192 Stratix IV FPGA boards, and 32 nodes with 128 Stratix V FPGAs.

Novo-G has been used for several acceleration projects, ranging from biology to finance. One aspect all applications have in common is being embarrassingly parallel and, therefore, naturally scalable. All of these applications were developed using the software offered as part of the Novo-G platform, and the results showed an enormous speed-up compared to CPU clusters.

Chimera was the first work to focus on implementing an algorithmic FPGA and GPU pipeline. The Chimera cluster [17] was built using commercial components to explore alternative solutions to the computational constraints found in astronomy and to provide access to high-performance computing hardware for inexperienced users. The system is formed by CUs equipped with one CPU for management tasks, which is interfaced with 3 NVIDIA Tesla GPUs and 3 Altera Stratix IV FPGAs through PCIe via a backplane. Communication could always be considered a bottleneck, but in this case it is clear that this limitation is directly related to the algorithms implemented in the entire system and the way each CE interacts with the others.

The success of Novo-G and the advancement of technology have allowed Novo-G to be upgraded to Novo-G# [98]. The cluster is made up of Gidel ProceV accelerators that house Stratix V FPGAs, two 8 GB DDR3 modules, and 32 Mbits of SRAM memory. The boards were interconnected by grouping 24 transceivers into six groups to support a torus topology with a total bandwidth of 300 Gb/s. The physical connection is done with fiber optics using QSFP+ modules. The data are transmitted via packets through a configurable single-level router network. This allows one to instantiate as many routers as necessary to service the ports and increase the internal bandwidth at the expense of hardware resources. The network flexibility enables users to experiment with a variety of routing modalities, depending on the requirements of the application. Novo-G# nodes support three communication blocks: a Low Latency block, a Custom block, and Interlaken [99], to allow the optimization of the physical layer depending on the application.

A common problem in custom computing is the lack of software development tools to help users build applications. To solve this problem, the Novo-G# team developed a modified Altera OpenCL to provide extended support for the 3D torus network present in the cluster.
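The routing flexibility rests on a regular 3D torus address space in which every node has six direct neighbors, obtained by wrapping each coordinate. The short sketch below computes those neighbors; the 4 × 4 × 4 shape is only an example and is not claimed to be the actual Novo-G# machine size.

```python
# Neighbor computation on a 3D torus, the topology used by the Novo-G# FPGA
# network: each node (x, y, z) has six neighbors, with coordinates wrapping at
# the edges. The 4x4x4 shape below is only an example.
from itertools import product

def torus_neighbors(node, shape):
    x, y, z = node
    sx, sy, sz = shape
    return [
        ((x + dx) % sx, (y + dy) % sy, (z + dz) % sz)
        for dx, dy, dz in [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    ]

shape = (4, 4, 4)
assert all(len(set(torus_neighbors(n, shape))) == 6 for n in product(range(4), repeat=3))
print(torus_neighbors((0, 0, 3), shape))   # includes wrap-around links such as (0, 0, 0)
```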


An important aspect that most clusters left out, besides those focused on communication, was the interface with the physical world. This is the empty space that the Axiom platform [100] seeks to fill with a custom scalable cluster based on a board with a Xilinx MPSoC (Multiprocessor System-on-Chip) supporting the Arduino interface.

The MPSoC has an FPGA fabric, four 64-bit ARM cores for general-purpose applications, and two 32-bit ARM cores for real-time applications on the same die. Four USB-C ports managed by the FPGA MGTs are available for interconnecting the boards. A custom network interface (NI) in the FPGA provides support for all communications, allowing users to focus on their applications, written in an OpenMP extension called OmpSs. The NI is divided into six main groups: a data mover that deals with DMA transfers, RX and TX controllers, and FIFOs to cache packets. A router is interfaced with each NI and is responsible for handling the USB-C channels, monitoring the network, and establishing virtual circuits.

As part of the Axiom project, a custom software stack [101] consisting of multiple layers was also developed. Its foundation is a distributed shared-memory (DSM) architecture. The main advantage of this approach is that it allows applications to directly address physical memory by transparently relying on an OS network. Several tests [102] and benchmarks have validated the effectiveness of the platform, pushing the project forward into IoT and edge computing [103].

Progress in this field has led to the creation of the Xilinx Adaptive Compute Clusters (XACC) [104] group under the Xilinx Heterogeneous Accelerated Compute Clusters (HACC) [105] initiative. This industry and academic collaboration focuses on the development of new architectures, tools, and applications for next-generation computers.

As part of this initiative, several clusters were built at some of the world's most prestigious universities in Switzerland, the USA, Germany, and Singapore. At Paderborn University's National High-Performance Computing Center (PC2), the high-performance clusters Noctua [106] and Noctua 2 [107] were built to provide hardware to accelerate research on computing systems with high energy efficiency.

The Noctua 2 cluster was designed to fit common server racks and to be compatible with network industry standards. It has 36 nodes, each with 2 AMD Milan CPUs. A combination of 48 Xilinx Alveo and 32 Intel Stratix 10 GX FPGAs comprises the reconfigurable computing part of the cluster. Each Stratix node has 4 pluggable QSFP+ links at 40 Gb/s, and each Alveo has 2 QSFP+ links at 100 Gb/s; the Intel FPGAs depend on Intel tools such as oneAPI [108], OpenCL, and DSP Builder. A specific optical switch is used to build a configurable point-to-point network between all FPGAs.

More recently, Enzian [109] was developed as a scalable platform to fill the void left by industry-specific hybrid platforms. The reasoning behind Enzian is to provide a general, open, and affordable platform for research on hybrid CPU-FPGA computing, escaping the niche of special-purpose hybrid platforms by providing a lot of flexibility. Explicit access to coherence messages, thermal and power monitoring, and an open baseboard management controller (BMC) allows for research that is not possible in any current commercial system.

Likewise, UNILOGIC [110] presented a new approach, this time from the management side of the cluster, by introducing a Partitioned Global Address Space (PGAS) parallel model to heterogeneous computing. This allows hardware accelerators to directly access any memory location in the system, and locality makes coherency techniques unnecessary, greatly simplifying communication. By integrating Dynamic Partial Reconfiguration (DPR) into the framework, accelerators can be installed on the go. The UNILOGIC architecture was evaluated on a custom prototype consisting of 8 interconnected daughter boards, each with four Xilinx Zynq UltraScale+ MPSoCs and 64 Gigabytes of DDR4 memory, yielding better energy and computing efficiency than conventional GPU or CPU parallel platforms.
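A partitioned global address space of this kind boils down to a fixed mapping between one flat address and a (home node, local offset) pair, which is what lets an accelerator issue loads and stores to remote memory directly. The sketch below shows that arithmetic under an assumed partition size; it is an illustration, not the UNILOGIC implementation.

```python
# Minimal PGAS-style address translation: a single global address space is split
# into fixed-size partitions, one per node, so any CE can turn a global address
# into (home node, local offset) without coherence traffic. The 64 GiB partition
# size echoes the per-board DDR4 mentioned above but is otherwise arbitrary.
PARTITION_BYTES = 64 * 2**30          # assumed size of each node's window
NUM_NODES = 8

def to_local(global_addr: int) -> tuple[int, int]:
    node = global_addr // PARTITION_BYTES
    if node >= NUM_NODES:
        raise ValueError("address outside the global space")
    return node, global_addr % PARTITION_BYTES

def to_global(node: int, offset: int) -> int:
    return node * PARTITION_BYTES + offset

addr = to_global(node=5, offset=0x1000)
assert to_local(addr) == (5, 0x1000)
```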
In 2022, the supercomputer Cygnus [111] was updated [112] to follow a multi-hybrid accelerator approach based on GPUs and FPGAs. 32 Albireo nodes were added to Cygnus, each consisting of 4 NVIDIA V100 GPUs and two Intel Stratix 10 FPGAs. Similar to previous systems, a dedicated FPGA network was created with a 2D torus topology and improved stream capabilities, called CIRCUS [113]. Collaboration between the FPGAs and GPUs is achieved by using a DMA engine in the FPGA that accesses the GPU directly, bypassing the CPU and offering almost double the throughput.

Finally, Fugaku [114], the first supercomputer to win all four categories in the Top500, presented a prototype FPGA cluster, ESSPER [115]. Motivated by the impressive continuous improvements in FPGAs regarding energy and performance, a cluster of 8 nodes, each with two Intel Stratix 10 FPGAs, was built and tested. This cluster was interfaced with Fugaku using a novel approach called loosely coupled, where a host-FPGA bridging network provides interoperability and flexibility to all nodes in Fugaku.

E. COMMUNICATION SYSTEMS INFRASTRUCTURE
Another field of application where clusters of FPGAs are relevant is the emulation of communication system infrastructure. The most important difference with manycore emulation is the need to interface with analog systems. This requirement implies providing additional external ports to interface with radio front-ends.

One of the first implementations was the Berkeley Emulation Engine (BEE) [117] in 2003. Its main purpose was to support design space exploration for real-time algorithms, focusing mainly on data-flow architectures for digital signal processing.

BEE was designed to emulate the digital part of telecommunication systems and to provide a flexible interface for radio front-ends. Computations are performed inside BEE Processing Units (BPU).


Each BPU has a main processing board (MPB) and 8 riser I/O cards for 2400 external signals. The MPBs are the main computing boards, hosting 20 Xilinx Virtex FPGAs, 16 zero-bus-turnaround (ZBT) SRAMs, and 8 high-speed connectors. FPGAs on the periphery of the board have off-board connectors to link other MPBs. A hybrid network consisting of a combination of a mesh network and a partial crossbar, called a hybrid complete-graph and partial crossbar (HCGP) [118], was implemented. A single-board computer (SBC) running Apache web services over Linux allows users to deploy their applications and perform configuration and slow-control tasks.

To take full advantage of the platform, an automated high-level workflow was used [119] that relied on MATLAB and Simulink to develop the main hardware blocks. The BEE compiler then processes the output and generates the required VHDL files for the simulation and configuration of the system. A time-division multiple access (TDMA) receiver was fully implemented to satisfy real-time requirements and validate the workflow.

Following the BEE success, the BEE2 [120] was conceived as a universal, standard reconfigurable computing system consisting of 5 Virtex 2 FPGAs, each with 4 DIMM connectors for up to 4 GB of RAM. Four FPGAs are available for computing, and one was reserved for control tasks. Pivoting away from the HCGP, an onboard mesh was implemented between the 4 computing FPGAs. Using high-speed links, it was possible to aggregate the 5 FPGAs and use them as a single, larger FPGA. The workflow remained almost the same for BEE2, with the main change being the use of a computational model of synchronous data flow for both the microprocessor and the FPGA.

To overcome the shortcomings of BEE2 and take advantage of the already validated Spirit architecture [121], a digital wireless channel emulator (DWCE) [122], [123] was developed. It consisted of 64 nodes in the same way as Spirit, but with valuable upgrades to demonstrate the capabilities of FPGA clusters with military radios. Its capabilities improved with an upgraded FPGA, 2 additional FMC connectors, and the adoption of a standard MicroTCA.4 form factor.

Considering the possible improvements to BEE, and because it was being developed as part of the research accelerator for multiple processors (RAMP) community [124], a fast response was presented in the form of BEE3 [116]. The development of BEE3 differed from previous iterations and successfully demonstrated a new collaboration methodology between industry and academia [125].

The architecture of BEE3 changed substantially from that of its predecessor by removing the control FPGA and introducing a control module on a smaller PCB. Another important aspect worth highlighting is that, for the first time, a PCB was intentionally developed to support different FPGA parts, all interconnected using a DDR2 interface in a ring topology. The BEE3 prototype had approximately 30 collaborators, most of whom were professionals with extensive knowledge of CAD. Relying on industry specialists for PCB design resulted in simpler and more reliable PCBs within a shorter project time horizon. In addition, it was possible to parallelize the design process, allowing the academic community to focus on firmware development.

The BEE collaboration presented its final iteration in 2010, consisting of BEE4 and miniBEE [126], [127]. BEE4 was updated to support Virtex 6 FPGAs and up to 128 GB of DDR3 RAM per module. The QSHs were removed in favor of FMC connectors to support a wider range of mezzanine boards. BEE4 was built around the Honeycomb architecture using the Sting I/O intermodule communication protocol. The design tools were further refined to include Nectar OS and the BEEcube Platform Studio in MATLAB/Simulink, which are unfortunately proprietary. However, being a proprietary system did not discourage its use in academia [128]. The success of BEEcube attracted further interest from industry, and it was bought by National Instruments in 2015 [129]. Today, it is part of the FlexRIO [130] line-up, and software development is supported by NI tools. From this point onward, almost all implementations depend on commercially available emulation platforms.

To demonstrate the scalability of such implementations, the world's largest wireless network emulator was built, Colosseum [131], which can compute workloads of 820 Gb/s and perform 210 tera-operations per second. It was formed by CUs that consisted of three FPGAs in a chain. The outer FPGAs were used to interface with the radios and provide some processing. The central FPGA is dedicated to digital signal processing. Commercially available solutions were selected to avoid the complications of designing a custom board. For the radio-attached FPGAs, 128 USRP-X312 [132] software-defined radios were used. Each provides the analog interfaces required for the antennas, along with a Kintex 7 FPGA. As dedicated processing FPGAs, 16 NI ATCA-3671 [133] modules were used, each hosting 4 FPGAs. The 64 processing FPGAs were interconnected in a 4 × 4 × 4 HyperX topology [134], which allowed the data to be efficiently distributed for processing.

The NI modules are based on the BEE architecture and support the same development tools. Given the complexity of the system, a Python data-flow emulator [135] was built to confirm the topology and architecture of the system. It is possible to confirm the latency of the system by providing models of the implemented components and topology.
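A data-flow emulator of this kind can be kept very small: each component is modeled by a latency, and the end-to-end latency of a path through the topology is the sum along the chain. The toy sketch below only illustrates that idea; it is not the Colosseum emulator, and all latency figures are made up.

```python
# Toy data-flow latency model in the spirit of the Python emulator described for
# Colosseum: components and links are annotated with latencies, and a path through
# the topology is checked against a real-time budget. All numbers are illustrative.
COMPONENT_LATENCY_US = {
    "radio_frontend": 5.0,
    "edge_fpga":      8.0,     # radio-attached FPGA doing partial processing
    "hyperx_hop":     1.5,     # one hop in the processing-FPGA interconnect
    "dsp_fpga":      12.0,     # central FPGA dedicated to channel emulation
}

def path_latency(path):
    return sum(COMPONENT_LATENCY_US[stage] for stage in path)

path = ["radio_frontend", "edge_fpga", "hyperx_hop", "hyperx_hop", "dsp_fpga",
        "hyperx_hop", "edge_fpga", "radio_frontend"]

budget_us = 50.0
latency = path_latency(path)
print(f"round-trip latency {latency:.1f} us, within budget: {latency <= budget_us}")
```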
Another notable contribution of this study is the proposal of a data flow methodology [136]. It comprises three guiding principles that highlight the issues present in other implementations. The first principle is the use of a unified interface for modular components to favor portability. Second, when dealing with heterogeneous systems, the suggested approach is asynchronous processing, to decouple operations from time and favor parallelization. Finally, based on design best practices, solutions are urged to be vendor-independent.


TABLE 5. General-purpose clusters’ contributions, reported power and performance gains.


TABLE 6. Communication systems emulation clusters’ contributions, reported power and performance gains.

II. CLASSIFICATION
After studying each of the works described above, it was possible to identify common elements. These elements reflect the decisions made by the designers when conceiving each cluster. Given that heterogeneous computing is broad and complex, until now no universal methodology has been developed to design a cluster. A classification system was proposed in [94] based on the uniformity of the system and its nodes.

We proposed segmenting the cluster infrastructure into three main components, as shown in Figure 7. The first aspect is the network. This covers the physical interfaces chosen to connect the nodes, the logic protocols, and the topologies. Another important aspect to consider is the hardware available in the CU. Each CU can have more than one CE type. Finally, software tools that allow the cluster to be securely available to users for development were considered. They encompass isolation tools that protect hardware from misbehavior, which are discussed in [137], [138], but go beyond programming languages, APIs, libraries, etc. All the cited works provide an overview of the tools available to the user and the intended workflow. Depending on the target application, these tools vary in scope, flexibility, and complexity. With a study of all previous contributions, it is possible to build a wide base that helps understand the greatest challenges, future trends, and real capabilities of heterogeneous supercomputing.

FIGURE 7. Main concepts for the proposed classification of clusters.
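In code terms, the proposed segmentation amounts to recording three groups of attributes per cluster. A minimal illustrative schema is sketched below; the field choices are ours and do not constitute a formal taxonomy or the layout of the comparison tables.

```python
# Illustrative record for the three-way segmentation used in this survey: each
# cluster is described by its network, its hardware (CEs per CU), and the software
# tools exposed to developers. Field choices are illustrative only.
from dataclasses import dataclass, field

@dataclass
class NetworkInfo:
    interconnection: str          # "direct" or "indirect"
    topology: str                 # e.g. "2D torus", "mesh", "switched"
    link: str                     # e.g. "MGT/SATA", "QSFP+", "Gigabit Ethernet"

@dataclass
class ClusterRecord:
    name: str
    network: NetworkInfo
    ces_per_cu: dict = field(default_factory=dict)     # e.g. {"CPU": 1, "FPGA": 2}
    software_tools: list = field(default_factory=list)

example = ClusterRecord(
    name="example-cluster",
    network=NetworkInfo("direct", "2D torus", "QSFP+ optical links"),
    ces_per_cu={"CPU": 1, "FPGA": 2},
    software_tools=["OpenCL", "MPI"],
)
print(example)
```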

A. NETWORK
A cluster is no more than a set of computational elements (CE) that collaborate toward a common goal. The collaboration method and its means are crucial for ensuring implementation efficiency. The means of collaboration branch out from the hardware interfaces to the communication protocols and, ultimately, the schedulers or other methods of synchronization. With this consideration, we can draw a line between systems that delegate communication tasks to an external entity and those that incorporate the stack. Another important aspect of the interconnection is how it is handled. In high-speed stream computing, it is desirable that the communication be established as direct data channels with back-pressure, and this particular aspect is difficult to replicate with purely routed networks.

Table 7 shows several aspects that distinguish the implementations concerning network infrastructure for all the works presented in Section I. The manner in which nodes are connected is discriminated according to the existence of any additional hardware that processes, redirects, or interprets a stream of data or packets between adjacent nodes, which counts as an indirect interconnection.


TABLE 7. Network infrastructure.

Table 7 shows several aspects that distinguish the implementations concerning network infrastructure for all the works presented in Section I. The manner in which nodes are connected is discriminated according to the existence of any additional hardware that processes, redirects, or interprets a stream of data or packages between adjacent nodes as an indirect interconnection. This implies that a direct interconnection is one in which a node can interact directly with its nearest neighbor without the need for additional networking hardware, excluding physical interfaces. In these implementations, network services are provided by in-fabric routers and switches, which allow users to experiment with different protocols at the expense of resources. This is particularly crucial in implementations targeting heavy communication problems that require low latency. By no means does the dependence on external hardware impose a disadvantage: recent implementations show that it is capable of extending scaling capabilities without affecting performance, as in the case of [25], which effectively interconnects thousands of CEs. As shown in [141], adding dedicated network hardware increases the latency by a constant factor. To determine the impact on performance, a ring topology was implemented using the E40G protocol. The experiments showed that the performance depended on the size of the packet. For smaller packets, the latency was dominant, tipping the scale in favor of direct interconnection, but for larger packets (> 1 MB), the switched implementation offered an improvement of approximately 5%.

To compare the impact of both network connections, a leaf-spine topology was implemented for the switched network, and a ring topology for direct interconnection. The switched network was modeled for 2048 FPGAs using 64 radix switches. The ring topology was simulated for a direct network. These simulations showed that for a small message size (≈ < 1 MB), a direct network offers a shorter transmission time than a switched network, regardless of the number of nodes.


In contrast, larger payloads (> 227 MB) benefit from a switched network, but only up to 1024 nodes, when the direct network transmission time catches up. These results show that one approach is not necessarily better, but that it comes down to the specificity of each cluster.
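To make the packet-size dependence concrete, the following minimal sketch models the end-to-end time of a single message as a fixed per-hop latency plus a serialization delay. The hop counts, per-hop latencies, and the 40 Gb/s link bandwidth are illustrative assumptions rather than values taken from the cited experiments; the sketch only shows why latency dominates small transfers while the serialization term levels the field for large ones.

```python
# First-order model of point-to-point message time in a cluster.
# All numeric parameters below are illustrative assumptions, not values
# measured in the works cited above.

def message_time(payload_bytes, hops, per_hop_latency_s, link_bw_bytes_s,
                 store_and_forward=False):
    """End-to-end time of one message.

    Cut-through routing pays the serialization delay once; store-and-forward
    pays it again at every hop.
    """
    serialization = payload_bytes / link_bw_bytes_s
    if store_and_forward:
        return hops * (per_hop_latency_s + serialization)
    return hops * per_hop_latency_s + serialization

LINK_BW = 40e9 / 8  # assumed 40 Gb/s links, expressed in bytes per second

for size in (4 * 1024, 1 * 2**20, 128 * 2**20):  # 4 KiB, 1 MiB, 128 MiB
    # Neighbour-to-neighbour transfer over one direct link.
    direct = message_time(size, hops=1, per_hop_latency_s=200e-9,
                          link_bw_bytes_s=LINK_BW)
    # Leaf-spine path: modelled as four hops with a higher (1 us) per-hop latency
    # to account for the switch traversals.
    switched = message_time(size, hops=4, per_hop_latency_s=1e-6,
                            link_bw_bytes_s=LINK_BW)
    print(f"{size / 2**20:9.3f} MiB: direct {direct * 1e6:10.2f} us, "
          f"switched {switched * 1e6:10.2f} us")
```

Under these assumed numbers, the constant per-hop term dominates kilobyte-sized messages, whereas for payloads of hundreds of megabytes the serialization delay makes the two paths nearly indistinguishable, which is consistent with the packet-size dependence reported above.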
Similarly, the topology of the clusters is a decisive design factor. Given that nodes are desired to have the highest throughput with the lowest latency, most researchers have opted for a tightly interconnected topology such as a mesh or two-level mesh (TLM). These are great at providing a consistent distance between the nodes in the system, but they strongly affect larger systems. Another popular topology is the torus, either 2D or 3D. It has the advantage of limiting the longest distance between nodes; however, as mentioned before, this distance continues to grow as more nodes are added to the system. The system should also keep a uniform shape, and nodes should be added to fill columns or rows to avoid introducing inconsistencies in latency. Later works, such as Noctua and Enzian, are not bound by a fixed topology. In particular, Noctua's infrastructure provides an optical switch capable of implementing different topologies at runtime based on user requests. Naturally, given that this is an external device, it can be implemented in any indirectly connected cluster. For some of the directly interconnected clusters, it may be possible to add an external switch, but only if a standard protocol and interface are used, as is the case for Novo-G and Novo-G#.
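The growth of the worst-case distance mentioned above can be illustrated with a short back-of-the-envelope sketch. The formulas assume ideal square or cubic layouts with shortest-path routing; they are not derived from any of the surveyed clusters and are only meant to show how the network diameter scales as nodes are added.

```python
# Back-of-the-envelope network diameter (worst-case hop count) for a few
# regular topologies, assuming ideal square/cubic layouts with shortest-path
# routing. This is an illustration of scaling behaviour, not a model of any
# specific cluster discussed in the text.

def ring_diameter(nodes: int) -> int:
    return nodes // 2

def mesh_diameter(nodes: int, dims: int = 2) -> int:
    side = round(nodes ** (1 / dims))  # nodes per dimension
    return dims * (side - 1)           # corner-to-corner path

def torus_diameter(nodes: int, dims: int = 2) -> int:
    side = round(nodes ** (1 / dims))
    return dims * (side // 2)          # wrap-around links halve each dimension

for n in (64, 729, 4096):              # chosen to be perfect squares and cubes
    print(f"{n:5d} nodes: ring {ring_diameter(n):4d}, "
          f"2D mesh {mesh_diameter(n, 2):3d}, 2D torus {torus_diameter(n, 2):3d}, "
          f"3D torus {torus_diameter(n, 3):2d}")
```

The numbers reflect the qualitative point above: wrap-around links roughly halve the mesh distance and a third dimension helps further, yet the diameter of any fixed direct topology keeps growing with the node count, whereas an external or optically reconfigurable switch can keep the hop count roughly constant.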
The strong relationship between the interface and the protocol is hard to break, and it is rare to find a reason good enough to do so. For the same reason, most MGT implementations are based on the Aurora protocol or similar for the physical layer, while the data link relies on Ethernet. However, Bluehive challenges this reasoning by implementing eSATA over PCIe connectors, because PCIe was the only high-speed connector available on the FPGA nodes. Other particular design choices include the implementation of DDR over GPIOs for same-board communications; this is the case for BEE, BEE2, and BEE3. It can be appreciated that, considering the complexity of bringing up one of these systems, standard protocols and interfaces have been favored. This stems from the fact that proven technologies shorten design times, allowing developers to focus on other issues.

B. HARDWARE
When studying a cluster's hardware, it is helpful to divide it into its computational units (CU). A CU is any entity that is available for computing and is the smallest independent functional part of the cluster. According to this definition, devices that act as pure network appliances, routers, or switches are not considered. A CU can be composed of multiple CEs. Over time, smaller CUs have been preferred when dealing with general-purpose clusters, whereas specific problems can benefit from larger CUs with an ad hoc network topology.

In the context of Axel [94], a classification structure was proposed. It focuses on identifying the nodes, which in this paper we refer to as CUs to avoid confusion with network nodes, by their CEs and the way in which they are distributed in the system. Four different types were considered, as shown in Figure 6. The Uniform-Node Uniform-System (UNUS) corresponds to a homogeneous cluster and is typically formed by CPUs. However, we can also find FPGA implementations such as Formic [39], Janus I [51], and Janus II [63]. This approach has the advantage that a single programming model can be applied to the entire system, but it is not restricted by it: given that an FPGA is by nature a heterogeneous device, this is not the case. Uniformity significantly simplifies the management and maintenance of clusters. Interestingly, these advantages have not critically outweighed the disadvantages, given that several other studies have explored other, more complex approaches. Performance-wise, there is a lot to win when dealing with non-uniform nodes or systems. As shown in Table 1, all the computing problems can be classified into 13 categories. By carefully studying the affinity between each category of problems and the resources encapsulated in each CE, ideal candidates to solve each problem can be found. This means that, by mixing and matching, a heterogeneous cluster may offer performance advantages over a homogeneous cluster.

FIGURE 8. 13 dwarfs (Table 1) mapped out according to the highest-affinity computational element (CE); the superscript ˆ refers to floating point and ∗ to fixed point [17].

Figure 8 shows how each of the problems is mapped to CEs depending on their characteristics [17]. This map shows that there are clear advantages in using one type of CE over another, mainly between GPU and FPGA. This is further supported by the implementations of Chimera [17] and Green Flash [140]. In both cases, GPUs and FPGAs are intended to be used as collaborative CEs. However, this new paradigm makes development much more complicated, not only because of the lack of tools for FPGA and GPU co-processing, but also because of the required radical change of mindset when leaving the traditional CPU-plus-accelerator context. This is reflected in the reported applications of QP [91], in which users used only a combination of CPU plus GPU or FPGA.


TABLE 8. Hardware architecture of computation units (CU) with respect to their computational elements (CE).

Table 8 shows the most relevant studies classified according to the characteristics of their CU. The Axel classification system was used to identify the uniformity of the node (CU) and system. The total number of CUs is also presented. Some studies have presented the architecture of a single node as a building block for a future cluster. These are considered relevant for their contribution to the study of heterogeneous workloads. The form factor of each work is shown in the respective column. Given that these are heterogeneous systems, different types of CE may be present at CUs, and sometimes even among CUs. Finally, Table 8 shows the total number of CEs implemented in the system.

As previously described, classification based on node (CU) and system uniformity is useful for understanding the programming paradigm. According to the previous definition of nodes, multiple CEs can be hosted on a single node. The balance between a crowded and a simple node resides in the diversity of the CEs and the network infrastructure. Diverse CEs in a single node allow for the highest resource availability per CU for developers, but it remains a challenge to interface all devices considering all the different ports. This leads to different form factors that directly impact the way the cluster scales and, more importantly, the availability of physical structures to hold the nodes in place and provide efficient cooling. Custom CU form factors usually host several CEs and can compromise not only scalability but also fault tolerance. This is the case for ARUZ [25], where a single node hosts up to 11 FPGAs. In the unfortunate case where one or more CEs break down, the OS must be notified to circumvent these nodes or completely ignore them until they are fixed. In this regard, COTS clusters have a great advantage: the bring-up cost is mostly absorbed by the industry by providing tested and validated nodes for quick installation, which is the case for Noctua 2 [107] and Catapult [22], among others. Some rare cases of industry and academia collaboration greatly benefit from COTS advantages with specific research-motivated modifications, as in Novo-G [97] and BEE3 [125].

C. SOFTWARE TOOLS
Finally, each work discussed would be incomplete if there were no tools available to help users develop their applications. These tools provide different layers of isolation, ranging from templates that encapsulate internode communication to complete operating systems that manage multiple-user access. Each of the tools offers a degree of abstraction encapsulating all underlying details to offer services to the user or to a higher layer. The depth of the layer stack depends on several factors:
• Purpose of the cluster
• Degree of freedom intended for the user (isolation)
• Cluster flexibility

A stack of tools can be structured according to the services provided and required, as shown in Figure 9. First, we have the interface with the external world at a physical level. Naturally, we rely on electric signals controlled by internal gates, GPIOs, or MGTs. Typically, in HPC, this is not up for discussion, given that CPUs and GPUs have fixed interfaces, but FPGAs are not bounded by this. Thus, the communication layer can be either available for users to freely customize and test or fixed by the developers and provided as a service. In addition, we have the actual CEs; these can be CPU, GPU, or FPGA custom cores. It is in this part where the actual computing is performed, and users may be able to define the entities, or developers may provide programmable blocks. To interact safely with these block drivers, a file system and a scheduler may be provided as an operating system. This creates a safe space for users to build applications based on the hardware and communication services. At this level, users must rely on a programming language that describes how the underlying parts cooperate for the intended computation. Some studies have presented new programming languages that aim to capture the different programming paradigms in heterogeneous clusters. Tools that take the abstract description of the computation and transform it into instructions may be provided as a contained solution or along with libraries and APIs to facilitate development. Another level of abstraction may be introduced, in which users interact with prebuilt blocks in a fixed context inside a GUI.

TABLE 9. Target application and development tools.

Table 9 shows a series of works with the development tools provided and the intended application. Depending on the scope of the application, user needs vary and may require deeper access to the system or more abstract tools. Most general-purpose clusters are intended to be used as research platforms. This requirement relaxes many management applications and abstraction layers that, in turn, must be provided to the user in other cases. As research could take place at the lowest level of communication, users may need the freedom to change the electrical standard of the GPIOs or the encoding of the MGTs. These properties are only available if the user sees the platform as a bare-metal solution, or if the development environment has a standardized way of defining communication devices. In any case, most systems avoid this by providing the user with a template that abstracts the communication layer. Specific application clusters seek to encapsulate most of the details such that the user faces only the challenges related to the application.

The flexibility of the software stack also depends on the platform's openness. In this regard, FPGA development frameworks have lagged significantly behind those for CPUs and GPUs. Currently, one can use complete open-source frameworks to develop applications for CPUs and GPUs, but FPGAs are radically different. One reason for this is that the stack of tools is fundamentally different. Instead of targeting fixed hardware through a well-known and well-defined instruction set architecture (ISA), FPGA tools target configuration memory with architecture-specific information. These architectural details tend to be industry secrets that force developers to rely on vendor tools with all their benefits and limitations. One of the most important limitations is the proprietary nature of some vendor tools.


Efforts have been made to create completely open-source workflows, such as F4PGA [8], in which experienced users can actively collaborate to improve the platform. Finally, an important aspect directly tied to the application is the level of flexibility provided by the cluster. Some applications can implement external hardware for optional data streams; this is the case for all the BEE implementations. Other domains of flexibility include the network topology and the communication protocol of the cluster. The portability of the framework was described by the flexibility of the CEs. This means that the cluster CEs could potentially be updated or changed without requiring important modifications of the development tools, future-proofing them and providing customization depending on the user's needs.

III. OPEN PROBLEMS AND TRENDS
Supercomputing is a complex and fast-evolving field in which CPUs and GPUs have traditionally dominated the market. Several successful attempts have been made to introduce FPGAs in this context, such as the F1 instances in Amazon and the IBM cloud FPGA service. The flexibility and energy efficiency of FPGAs strongly challenge CPUs and GPUs for the same computing tasks, further motivating research in this area.


The opportunities that FPGAs offer to heterogeneous computing are huge. As already shown in several studies [14], [142], [143], [144], [145], FPGAs can surpass CPU energy efficiency by orders of magnitude by relying on hardware-level customization. FPGAs also offer the highest degree of control to developers, allowing optimization at a logic level that is impossible with CPUs and GPUs. In addition, modern SoC-FPGAs offer internal high-speed connections that allow CPUs, GPUs, and FPGAs to interact on the same die, thereby reducing communication latency and power consumption.

FIGURE 9. Tool stack divided into different levels depending on the user isolation from the hardware.

However, as has been shown in this paper, having hardware does not mean that the problems are solved. One of the biggest obstacles to mass adoption is the lack of hardware abstraction and efficient synthesis tools, which increases development time [14], [144] compared to CPUs and GPUs. To identify the specific challenges, we divided the implementation of a cluster into three areas: network, hardware, and software tools. For each area, we performed a study to recognize the trends and obstacles.

The network aspect of clusters has rapidly evolved and is mainly driven by telecommunications. Faster and more efficient communication platforms are always positive, but their implementation in heterogeneous computing has not been obvious, given the existing trade-offs. This part is often fixed in the design process, and its drawbacks are widespread throughout the service stack. From the studied implementations, the following trends and trade-offs were identified:
• Interface: A standard interface such as MGTs greatly reduces development effort at the cost of reduced flexibility. Alternatively, SoC-FPGAs offer numerous GPIOs that can be used at hundreds of megahertz, catching up with the throughput of MGTs. The flexibility offered by GPIOs allows the development of custom protocols, such as the time-division multiplexed (TDM) scheme proposed in [146]. Development time has pushed designers to embrace well-defined interfaces, which are usually constrained by the selected vendor, complicating porting. However, the need for portable interfaces and systems is addressed in [147] by introducing Kyokko, a vendor-independent MGT controller.
• Topology: This aspect directly impacts the maximum transmission rate and data throughput in the cluster. In addition, it offers flexibility. Depending on the interface, the topology can be modified at runtime, as in [107], or fixed, as in [64]. For a fixed topology, the 3D torus allows the best physical interconnection at the expense of non-uniform latency if the scaling is not symmetric, as shown in [146]. The alternative of a virtual circuit network over indirect connections shows promise, as shown by VCSN [148], which allows a flexible virtual topology with a performance similar to or better than that of directly connected networks.
• Protocol: As shown in Table 7, custom protocols remain relevant, suggesting that no industry standard completely satisfies the requirements of heterogeneous computing. This is partially owing to the flexibility of FPGAs, which allows developers to optimize the protocol for latency, as shown in [141]. The drawback is that a more flexible protocol will have a greater complexity, impacting routers, decoders, encoders, and ultimately the network's throughput and latency.

The hardware that constitutes a cluster is another point of discussion. This is closely related to the network, given that the possible interfaces, topologies, and protocols are constrained by the selected hardware. The contribution of Chimera [17], which defined the ideal hardware to defeat each of the 13 dwarfs [16], confirms the advantage of heterogeneous nodes, particularly FPGAs, with the inclusion of CPUs or GPUs. This points directly to the SoC-FPGA, which in most cases includes all CEs in a single chip. The main trends identified correspond to an increasing preference for:
• CUs with fewer CEs
• SoC-FPGAs as the heterogeneous part of the system
• A standard CU form factor
Having fewer CEs reduces the CU cost, which is important for the scalability and maintenance of the cluster. A standard form factor facilitates integration in current supercomputing centers by relying on common structural and thermal solutions.

‘‘A tool is only as good as the hands that wield it,’’ is a common saying. In this case, quite often the tool lacks a handle from which to wield it. Software tools are crucial for the usability of any computer, particularly for heterogeneous systems that present a new paradigm that makes development much more complicated. This has been the Achilles heel of most implementations, and it is one of the reasons that general adoption has yet to occur. We recognize that there are some missing pieces that represent open challenges:
• Hardware abstraction models for different heterogeneous platforms.
• Standard interfaces for portability.
• Flexible design tools to optimize implementations targeting heterogeneous clusters.
• Open-source tools for community-driven development to further accelerate adoption.
• Operating systems for cluster management.


• Runtime performance analysis tools to identify bottlenecks.

In 2008, the Strategic Infrastructure for Reconfigurable Computing Applications (SIRCA) [149] provided a comprehensive study of the tools required for the adoption of mainstream reconfigurable computing. This study separated the tools based on four relevant phases: formulation, design, translation, and execution.

The initial phase, in which the algorithms are elaborated and optimized for parallel computing, is referred to as formulation. This is the highest level of abstraction, mostly dealing with pseudo-code and verbal language for reasoning. SIRCA highlighted the need for tools that aid developers in making strategic decisions that favor the parallel model embedded in heterogeneous computing rather than leaving the decisions to the later phases. Formulation is the most critical step, in which researchers can benefit the most from insight into the paradigm present in the targeted heterogeneous system. Tools that provide strategic exploration, high-level prediction, and numerical analysis have a strongly positive impact on the other phases.

The design phase consists of the languages used to translate an algorithm into a behavioral implementation. This field has been broadened by the creation and adaptation of modern HDL languages, such as Chisel [150], [151], based on Scala, and Clash [152], [153], based on Haskell, and by high-level synthesis tools, such as BondMachine, based on Go [154], among several others. New developments have solved, to some degree, the issues of portability and interoperability by raising abstraction. However, the method for scaling designs to heterogeneous clusters remains platform-specific. Without these facilities, users are expected to be responsible for porting and partitioning the design. Furthermore, users are tasked with specifying the concurrency model at the system level, which is a difficult task. An in-depth study of design tools, frameworks, and strategies for design space exploration can be found in [155].

Once a PC-compatible description of the algorithm is available, the next phase maps it to the actual physical resources of the system. This phase is known as translation or place-and-route (PAR). Several improvements were made in recent years [156], [157]. Most focus on speeding up the process by implementing parallelism, with good results when compared with vendor tools. However, these improvements are not easily integrated into proprietary workflows and require a high level of expertise for effective usage. Likewise, existing PARs targeting clusters are platform-specific, and this will not change until a standard way of describing a heterogeneous system is adopted.

In the final phase, execution, developers must be able to verify and analyze the performance of the implementation. Critical runtime services must be included, such as task management, checkpoints, heartbeats, and debugging services. The effective implementation of such services depends on their consideration in previous design phases. The works studied in detail in [158] provided definitions of abstraction layers for user interaction and management, showing great improvement in the execution phase. Likewise, several FPGA operating systems have been developed [159], [160], [161], implementing abstractions such as threads. Nevertheless, some challenges remain, notably:
• Translation tools capable of targeting scalable heterogeneous platforms
• High-level prediction tools for performance, energy consumption, and resource utilization, among others
• Universal debugging and verification tools for distributed reconfigurable computing

Even if there are platform-specific solutions to some of the previously mentioned challenges, the real challenge is to develop standard and generic solutions suitable for any heterogeneous cluster implementation in a community-driven development approach that would greatly accelerate adoption and growth, as shown in [162], [163], and [164]. In high-level synthesis, novel frameworks provide a convenient approach by including off-chip synchronization and communication APIs, such as Auto-Pipe [165] in 2010 and, more recently, OpenFC [166] and SMAPPIC [167].

IV. CONCLUSION
Supercomputers have been growing in recent years to occupy large areas and consume as much energy as small towns. This trend is impossible to sustain and highlights the major issues of the current approach based on CPUs and GPUs. Meanwhile, FPGA-based heterogeneous platforms have shown great improvements in performance and energy consumption when compared to their CPU or GPU counterparts. Nonetheless, adoption has remained low, primarily owing to the complexity of hardware design and the lack of standards for interconnection, structure, and program description, to mention some of the issues that affect most development tools by forcing over-specification.

By studying the most relevant implementations of FPGA heterogeneous clusters, we propose three main domains in each cluster, namely network, hardware, and software tools, that help recognize the contributions and challenges of each work. Furthermore, studying a specific cluster architecture under this division aids in identifying the origin of some issues and understanding the compromises of design decisions taken in the different domains. By understanding the trade-offs related to each decision, developers can better anticipate the critical issues in each domain and plan contingency measures in the most convenient manner. This survey sheds light on the open challenges that future clusters will have to overcome, but it also offers an overview of the already available and tested approaches.

FPGA-based heterogeneous computing is a challenging field, with enormous potential to change the dominant computing paradigm. In recent years, great interest has brought important contributions to the development of tools and, more importantly, experimental platforms.
With standard platform descriptions and interfaces, an open collaborative development approach will allow the creation of communities to accelerate adoption. New technologies, such as SoC-FPGAs, will certainly be at the center of future cluster architectures, considering the advantages of having CPUs, GPUs, and FPGAs in the same device.

ACKNOWLEDGMENT
The authors would like to thank Romina Soledad Molina and Charn Loong Ng for their valuable insight in the process of writing this article.

REFERENCES
[20] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, ‘‘Rodinia: A benchmark suite for heterogeneous computing,’’ in Proc. IEEE Int. Symp. Workload Characterization (IISWC), Oct. 2009, pp. 44–54, doi: 10.1109/IISWC.2009.5306797.
[21] Virginia Tech Synergy. (2019). GitHub—VTSynergy/OpenDwarfs: A Benchmark Suite. [Online]. Available: https://github.com/vtsynergy/OpenDwarfs
[22] A. Putnam et al., ‘‘A reconfigurable fabric for accelerating large-scale datacenter services,’’ in Proc. ACM/IEEE 41st Int. Symp. Comput. Archit. (ISCA), Jun. 2014, pp. 13–24, doi: 10.1109/ISCA.2014.6853195.
[23] Alibaba. (2018). Deep Dive Into Alibaba Cloud F3 FPGA as a Service Instances—Alibaba Cloud Community. [Online]. Available: https://www.alibabacloud.com/blog/deep-dive-into-alibaba-cloud-f3-fpga-as-a-service-instances_594057
[24] Amazon. (2017). Amazon EC2 F1 Instances. [Online]. Available: https://aws.amazon.com/ec2/instance-types/f1/
[1] C. Maxfield. (Sep. 2011). Who Made the First PLD?—EETimes. [Online]. [25] R. Kiełbik, K. Hałagan, W. Zatorski, J. Jung, J. Ulański, A. Napieralski,
Available: https://www.eetimes.com/who-made-the-first-pld/ K. Rudnicki, P. Amrozik, G. Jabłoński, D. Stożek, P. Polanowski,
[2] (2017). Xilinx Co-Founder Ross Freeman Honored—EETimes. [Online]. Z. Mudza, J. Kupis, and P. Panek, ‘‘ARUZ—Large-scale, massively
Available: https://www.eetimes.com/xilinx-co-founder-ross-freeman- parallel FPGA-based analyzer of real complex systems,’’ Comput.
honored/ Phys. Commun., vol. 232, pp. 22–34, Nov. 2018. [Online]. Available:
[3] Xilinx. (2021). Vivado Design Suite User Guide, Version 2021.1. https://linkinghub.elsevier.com/retrieve/pii/S0010465518302182, doi:
[Online]. Available: https://www.xilinx.com/support/documentation/sw_ 10.1016/j.cpc.2018.06.010.
manuals/xilinx2021_1/ug973-vivado-release-notes-install-license.pdf [26] F. Fahim et al., ‘‘hls4ml: An open-source codesign workflow to
[4] Xilinx. (2021). Vitis Unified Software Platform User Guide, Version empower scientific low-power machine learning devices,’’ 2021,
2021.1. [Online]. Available: https://www.xilinx.com/support/document arXiv:2103.05579.
ation/sw_manuals/xilinx2021_1/ug1416-vitis-unified-platform.pdf [27] J. Villarreal, A. Park, W. Najjar, and R. Halstead, ‘‘Designing modu-
[5] Intel Corporation. (2021). Quartus Prime User Guide, Version lar hardware accelerators in C with ROCCC 2.0,’’ in Proc. 18th IEEE
21.1. [Online]. Available: https://www.intel.com/content/dam/www/ Annu. Int. Symp. Field-Program. Custom Comput. Mach., May 2010,
programmable/us/en/pdfs/literature/ug/ug-qps.pdf pp. 127–134, doi: 10.1109/FCCM.2010.28.
[6] Microsemi Libero. (2021). Libero SoC Design Suite User Guide, Version [28] R. Nane, V. Sima, B. Olivier, R. Meeuws, Y. Yankova, and K. Bertels,
12.0, 2021. [Online]. Available: https://www.microsemi.com/document- ‘‘DWARV 2.0: A CoSy-based C-to-VHDL hardware compiler,’’ in
portal/doc_view/131953-libero-soc-design-suite-v12-0-user-guide Proc. 22nd Int. Conf. Field Program. Log. Appl. (FPL), Aug. 2012,
[7] (2021). Yosys Open SYnthesis Suite. Accessed: May 9, 2023. [Online]. pp. 619–622, doi: 10.1109/FPL.2012.6339221.
Available: https://github.com/YosysHQ/yosys [29] A. Papakonstantinou, K. Gururaj, J. A. Stratton, D. Chen, J. Cong,
and W.-M.-W. Hwu, ‘‘Efficient compilation of CUDA kernels for
[8] CHIPS Alliance. (2017). FOSS Flows for FPGA—F4PGA
high-performance computing on FPGAs,’’ ACM Trans. Embedded Com-
Documentation. [Online]. Available: https://f4pga.readthedocs.
put. Syst., vol. 13, no. 2, pp. 1–26, Sep. 2013, doi: 10.1145/2514641.
io/en/latest/index.html
2514652.
[9] Agile Analog. (2021). RapidSilicon: Accelerating Silicon Development.
[30] A. Canis, J. Choi, M. Aldham, V. Zhang, A. Kammoona, T. Czajkowski,
Accessed: May 9, 2023. [Online]. Available: https://www.agileanalog.
S. D. Brown, and J. H. Anderson, ‘‘LegUp: An open-source high-level
com/products/rapidsilicon
synthesis tool for FPGA-based processor/accelerator systems,’’ ACM
[10] W. A. Najjar and P. Ienne, ‘‘Reconfigurable computing,’’ IEEE Micro, Trans. Embedded Comput. Syst., vol. 13, no. 2, pp. 1–27, Sep. 2013.
vol. 34, no. 1, pp. 4–6, Jan. 2014. [Online]. Available: https://dl.acm.org/ [Online]. Available: https://doi-org.ezproxy.cern.ch/10.1145/2514740,
doi/10.1145/508352.508353, doi: 10.1109/MM.2014.25. doi: 10.1145/2514740.
[11] Altera Corporation. (Jul. 2014). What is an SoC FPGA? Architecture [31] S. Lee, J. Kim, and J. S. Vetter, ‘‘OpenACC to FPGA: A frame-
Brief. [Online]. Available: http://www.altera.com/socarchitecture work for directive-based high-performance reconfigurable computing,’’
[12] W. Vanderbauwhede et al., High-Performance Computing Using FPGAs. in Proc. IEEE Int. Parallel Distrib. Process. Symp. (IPDPS), May 2016,
New York, NY, USA: Springer, 2013. pp. 544–554, doi: 10.1109/IPDPS.2016.28.
[13] M. Awad, ‘‘FPGA supercomputing platforms: A survey,’’ in Proc. [32] E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee,
Int. Conf. Field Program. Log. Appl., Aug. 2009, pp. 564–568. J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe, and
[Online]. Available: http://ieeexplore.ieee.org/document/5272406/, doi: A. Agarwal, ‘‘Baring it all to software: Raw machines,’’ Computer,
10.1109/FPL.2009.5272406. vol. 30, no. 9, pp. 86–93, 1997. [Online]. Available: http://ieeexplore.
[14] K. O’Neal and P. Brisk, ‘‘Predictive modeling for CPU, GPU, and ieee.org/document/612254/, doi: 10.1109/2.612254.
FPGA performance and power consumption: A survey,’’ in Proc. IEEE [33] J. D. Davis, ‘‘FAST: A flexible architecture for simulation and testing
Comput. Soc. Annu. Symp. VLSI (ISVLSI), Jul. 2018, pp. 763–768, doi: of multiprocessor and CMP systems,’’ Dept. Elect. Eng., Stanford Univ.,
10.1109/ISVLSI.2018.00143. Stanford, CA, USA, Dec. 2006.
[15] P. Colella. (2004). Defining Software Requirements for Scientific [34] H. Kalte, M. Porrmann, and U. Rückert, ‘‘A prototyping platform for
Computing. DARPA HPCS. [Online]. Available: https://www.krellinst. dynamically reconfigurable system on chip designs,’’ in Proc. IEEE
org/doecsgf/conf/2013/pres/pcolella.pdf Workshop Heterogeneous Reconfigurable Syst. Chip (SoC), Hamburg,
[16] K. Asanovic et al., ‘‘The landscape of parallel computing research: Germany, Apr. 2002, pp. 57–75.
A view from Berkeley,’’ 2006. [Online]. Available: http://www.eecs. [35] M. Porrmann et al., ‘‘RAPTOR—A scalable platform for rapid prototyp-
berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html ing and FPGA-based cluster computing,’’ in Parallel Computing: From
[17] R. Inta, D. J. Bowman, and S. M. Scott, ‘‘The ‘Chimera’: An off-the-shelf Multicores and GPU’s to Petascale (Advances in Parallel Computing),
CPU/GPGPU/FPGA hybrid computing platform,’’ Int. J. Reconfigurable vol. 19. Amsterdam, The Netherlands: IOS Press, 2010, doi: 10.3233/978-
Comput., vol. 2012, pp. 1–10, Jan. 2012. [Online]. Available: http://www. 1-60750-530-3-592.
hindawi.com/journals/ijrc/2012/241439, doi: 10.1155/2012/241439. [36] C. Steffen and G. Genest, ‘‘Nallatech in-socket FPGA front-side bus
[18] R. D. Chamberlain, ‘‘Architecturally truly diverse systems: A accelerator,’’ Comput. Sci. Eng., vol. 12, no. 2, pp. 78–83, Mar. 2010, doi:
review,’’ Future Gener. Comput. Syst., vol. 110, pp. 33–44, 10.1109/MCSE.2010.45.
Sep. 2020. [Online]. Available: https://linkinghub.elsevier.com/retrieve/ [37] C. Pohl, C. Paiz, and M. Porrmann, ‘‘vMAGIC—Automatic code
pii/S0167739X19313184, doi: 10.1016/j.future.2020.03.061. generation for VHDL,’’ Int. J. Reconfigurable Comput., vol. 2009,
[19] R. Palmer. (2011). Parallel Dwarfs (Inaccessible). [Online]. Available: pp. 1–9, Jan. 2009. [Online]. Available: http://vmagic.sourceforge.net/,
http://paralleldwarfs.codeplex.com/ doi: 10.1155/2009/205149.


[38] S. Lyberis, G. Kalokerinos, M. Lygerakis, V. Papaefstathiou, [55] W. Kastl and T. Loimayr, ‘‘A parallel computing system with special-
I. Mavroidis, M. Katevenis, D. Pnevmatikatos, and D. S. Nikolopoulos, ized coprocessors for cryptanalytic algorithms,’’ in P170—Sicherheit
‘‘FPGA prototyping of emerging manycore architectures for parallel 2010—Sicherheit, Schutz und Zuverlässigkeit, F. C. Freiling, Ed. Bonn,
programming research using formic boards,’’ J. Syst. Archit., vol. 60, Germany: Gesellschaft für Informatik, 2010, pp. 78–83. [Online]. Avail-
no. 6, pp. 481–493, Jun. 2014. [Online]. Available: https://linkinghub. able: https://dl.gi.de/handle/20.500.12116/19801
elsevier.com/retrieve/pii/S138376211400054X, doi: 10.1016/j.sysarc. [56] B. Danczul, J. Fuß, S. Gradinger, B. Greslehner, W. Kastl, and F. Wex,
2014.03.002. ‘‘Cuteforce analyzer: A distributed bruteforce attack on PDF encryption
[39] S. Lyberis, G. Kalokerinos, M. Lygerakis, V. Papaefstathiou, with GPUs and FPGAs,’’ in Proc. Int. Conf. Availability, Rel. Secur.,
D. Tsaliagkos, M. Katevenis, D. Pnevmatikatos, and D. Nikolopoulos, Sep. 2013, pp. 720–725. [Online]. Available: http://ieeexplore.ieee.
‘‘Formic: Cost-efficient and scalable prototyping of manycore org/document/6657310/, doi: 10.1109/ARES.2013.94.
architectures,’’ in Proc. IEEE 20th Int. Symp. Field-Program. Custom [57] A. H. T. Tse, D. B. Thomas, K. H. Tsoi, and W. Luk,
Comput. Mach., Apr. 2012, pp. 61–64, doi: 10.1109/FCCM.2012.20. ‘‘Dynamic scheduling monte-carlo framework for multi-accelerator
[40] H. Shah et al., ‘‘Remote direct memory access (RDMA) protocol exten- heterogeneous clusters,’’ in Proc. Int. Conf. Field-Program. Technol.,
sions,’’ Tech. Rep. 7306, Jun. 2014. Dec. 2010, pp. 233–240. [Online]. Available: http://ieeexplore.ieee.
[41] V. Kale, ‘‘Using the MicroBlaze processor to accelerate cost-sensitive org/document/5681495/, doi: 10.1109/FPT.2010.5681495.
embedded system development,’’ Xilinx, Jun. 2016. [Online]. Available: [58] G. Tan, C. Zhang, W. Wang, and P. Zhang, ‘‘SuperDragon,’’ ACM
https://docs.xilinx.com/v/u/en-US/wp469-microblaze-for-cost-sensitive- Trans. Reconfigurable Technol. Syst., vol. 8, no. 4, pp. 1–22, Oct. 2015.
apps [Online]. Available: https://dl.acm.org/doi/10.1145/2740966, doi:
[42] S. G. Kavadias, M. G. H. Katevenis, M. Zampetakis, and 10.1145/2740966.
D. S. Nikolopoulos, ‘‘On-chip communication and synchronization [59] S. W. Moore, P. J. Fox, S. J. T. Marsh, A. T. Markettos, and A. Mujumdar,
mechanisms with cache-integrated network interfaces,’’ in Proc. 7th ‘‘Bluehive–A field-programable custom computing machine for extreme-
ACM Int. Conf. Comput. Frontiers, May 2010, pp. 217–226, doi: scale real-time neural network simulation,’’ in Proc. IEEE 20th Int.
10.1145/1787275.1787328. Symp. Field-Program. Custom Comput. Mach., Apr. 2012, pp. 133–140.
[43] Cadence. (2019). Palladium Emulation | Cadence. [Online]. Available: [Online]. Available: https://ieeexplore.ieee.org/document/6239804/, doi:
https://www.cadence.com/en_US/home/tools/system-design-and- 10.1109/FCCM.2012.32.
verification/emulation-and-prototyping/palladium.html [60] P. J. Fox, A. T. Markettos, and S. W. Moore, ‘‘Reliably prototyping
[44] Siemens. (2022). Veloce Prototyping—FPGA | Siemens Software. large SoCs using FPGA clusters,’’ in Proc. 9th Int. Symp. Reconfig-
[Online]. Available: https://eda.sw.siemens.com/en-US/ic/veloce/fpga- urable Commun.-Centric Syst.-on-Chip (ReCoSoC), May 2014, pp. 1–8.
prototyping/ [Online]. Available: http://ieeexplore.ieee.org/document/6861350/, doi:
[45] B. da Silva, A. Braeken, E. H. D’Hollander, A. Touhafi, J. G. Cornelis, 10.1109/ReCoSoC.2014.6861350.
and J. Lemeire, ‘‘Comparing and combining GPU and FPGA accelerators [61] A. Theodore Markettos, P. J. Fox, S. W. Moore, and A. W. Moore,
in an image processing context,’’ in Proc. 23rd Int. Conf. Field Program. ‘‘Interconnect for commodity FPGA clusters: Standardized or cus-
Log. Appl., Sep. 2013, pp. 1–4. [Online]. Available: http://ieeexplore. tomized?’’ in Proc. 24th Int. Conf. Field Program. Log. Appl. (FPL),
ieee.org/document/6645552/, doi: 10.1109/FPL.2013.6645552. Sep. 2014, pp. 1–8. http://ieeexplore.ieee.org/document/6927472/, doi:
[46] T. Otsuka, T. Aoki, E. Hosoya, and A. Onozawa, ‘‘An image recognition 10.1109/FPL.2014.6927472.
system for multiple video inputs over a multi-FPGA system,’’ in Proc. [62] R. S. Nikhil et al., BSV by Example, 10th ed. 2010. [Online]. Available:
IEEE 6th Int. Symp. Embedded Multicore SoCs, Sep. 2012, pp. 1–7. http://www.bluespec.com/support/
[Online]. Available: http://ieeexplore.ieee.org/document/6354671/, doi: [63] M. Baity-Jesi et al., ‘‘Janus II: A new generation application-driven com-
10.1109/MCSoC.2012.33. puter for spin-system simulations,’’ Comput. Phys. Commun., vol. 185,
[47] The RTN Collaboration, ‘‘64-transputer machine,’’ in Proc. CHEP, no. 2, pp. 550–559, Feb. 2014. [Online]. Available: https://linkinghub.
Geneva, Switzerland, 1992, pp. 353–360. elsevier.com/retrieve/pii/S0010465513003470, doi: 10.1016/j.cpc.2013.
[48] H. Schmit et al., ‘‘Behavioral synthesis for FPGA-based comput- 10.019.
ing,’’ in Proc. IEEE Workshop FPGA’s Custom Comput. Mach., 1994, [64] R. Kiełbik, K. Rudnicki, Z. Mudza, and J. Jung, ‘‘Methodology of
pp. 125–132, doi: 10.1109/FPGA.1994.315591. firmware development for ARUZ—An FPGA-based HPC system,’’ Elec-
[49] A. Cruz, J. Pech, A. Tarancón, P. Téllez, C. L. Ullod, and C. Ungil, tronics, vol. 9, no. 9, p. 1482, Sep. 2020. [Online]. Available: https://
‘‘SUE: A special purpose computer for spin glass models,’’ Com- www.mdpi.com/journal/electronics, doi: 10.3390/electronics9091482.
put. Phys. Commun., vol. 133, nos. 2–3, pp. 165–176, Jan. 2001, doi: [65] (2006). VHDL Preprocessor Home Page. [Online]. Available: https://
10.1016/S0010-4655(00)00170-3. sourceforge.net/projects/vhdlpp/
[50] F. Belletti, I. Campos, A. Maiorano, S. P. Gavir, D. Sciretti, A. Tarancon, [66] S. Karim, J. Harkin, L. McDaid, B. Gardiner, and J. Liu, ‘‘AstroByte:
J. L. Velasco, A. C. Flor, D. Navarro, P. Tellez, L. A. Fernandez, Multi-FPGA architecture for accelerated simulations of spiking astrocyte
V. Martin-Mayor, A. M. Sudupe, S. Jimenez, E. Marinari, F. Mantovani, neural networks,’’ in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE),
G. Poll, S. F. Schifano, L. Tripiccione, and J. J. Ruiz-Lorenzo, ‘‘Ianus: An Mar. 2020, pp. 1568–1573, doi: 10.23919/DATE48585.2020.9116312.
adaptive FPGA computer,’’ Comput. Sci. Eng., vol. 8, no. 1, pp. 41–49, [67] S. Yang, J. Wang, X. Hao, H. Li, X. Wei, B. Deng, and K. A. Loparo,
Jan. 2006. [Online]. Available: http://ieeexplore.ieee.org/document/ ‘‘BiCoSS: Toward large-scale cognition brain with multigranular neuro-
1563961/, doi: 10.1109/MCSE.2006.9. morphic architecture,’’ IEEE Trans. Neural Netw. Learn. Syst., vol. 33,
[51] F. Belletti et al., ‘‘Janus: An FPGA-based system for high- no. 7, pp. 2801–2815, Jul. 2022, doi: 10.1109/TNNLS.2020.3045492.
performance scientific computing,’’ Comput. Sci. Eng., vol. 11, no. 1, [68] D. Gratadour. (2021). Microgate—Green Flash. [Online]. Available:
pp. 48–58, Jan. 2009. [Online]. Available: https://ieeexplore.ieee.org/ http://green-flash.lesia.obspm.fr/microgate.html
document/4720223/, doi: 10.1109/MCSE.2009.11. [69] Y. Clénet et al. (2019). MICADO-MAORY SCAO Preliminary Design,
[52] M. Baity-Jesi et al., ‘‘An FPGA-based supercomputer for statistical Development Plan & Calibration Strategies. [Online]. Available:
physics: The weird case of Janus,’’ in High-Performance Computing https://hal.archives-ouvertes.fr/hal-03078430
Using FPGAs. New York, NY, USA: Springer, Mar. 2013, pp. 481–506. [70] A. Brown, D. Thomas, J. Reeve, G. Tarawneh, A. De Gennaro,
[Online]. Available: https://link-springer-com.ezproxy.cern.ch/chapter/ A. Mokhov, M. Naylor, and T. Kazmierski, ‘‘Distributed event-based
10.1007/978-1-4614-1791-0_16, doi: 10.1007/978-1-4614-1791-0_16. computing,’’ in Parallel Computing is Everywhere (Advances in Parallel
[53] S. Kumar, C. Paar, J. Pelzl, G. Pfeiffer, and M. Schimmler, ‘‘Breaking Computing), vol. 32. 2018, pp. 583–592. [Online]. Available: https://
ciphers with COPACOBANA—A cost-optimized parallel code breaker,’’ ebooks.iospress.nl/doi/10.3233/978-1-61499-843-3-583, doi: 10.3233/
in Proc. Int. Workshop Cryptograph. Hardw. Embedded Syst., in Lecture 978-1-61499-843-3-583.
Notes in Computer Science: Including Subseries Lecture Notes in Arti- [71] M. A. Petrovici, B. Vogginger, P. Müller, O. Breitwieser, M. Lundqvist,
ficial Intelligence and Lecture Notes in Bioinformatics, vol. 4249, 2006, L. Müller, M. Ehrlich, A. Destexhe, A. Lansner, R. Schüffny,
pp. 101–118, doi: 10.1007/11894063_9. J. Schemmel, and K. Meier, ‘‘Characterization and compensation of
[54] T. Güneysu, T. Kasper, M. Novotný, C. Paar, and A. Rupp, ‘‘Crypt- network-level anomalies in mixed-signal neuromorphic modeling plat-
analysis with COPACOBANA,’’ IEEE Trans. Comput., vol. 57, no. 11, forms,’’ PLoS ONE, vol. 9, no. 10, Oct. 2014, Art. no. e108590. [Online].
pp. 1498–1513, Nov. 2008. [Online]. Available: http://ieeexplore.ieee. Available: https://journals.plos.org/plosone/article?id=10.1371/journal.
org/document/4515858/, doi: 10.1109/TC.2008.80. pone.0108590, doi: 10.1371/journal.pone.0108590.


[72] I. Ohmura, G. Morimoto, Y. Ohno, A. Hasegawa, and M. Taiji, [88] R. Baxter, S. Booth, M. Bull, G. Cawood, J. Perry, M. Parsons,
‘‘MDGRAPE-4: A special-purpose computer system for molecular A. Simpson, A. Trew, A. McCormick, G. Smart, R. Smart, A. Cantle,
dynamics simulations,’’ Philos. Trans. Roy. Soc. A, Math., Phys. Eng. Sci., R. Chamberlain, and G. Genest, ‘‘Maxwell—A 64 FPGA supercom-
vol. 372, Aug. 2014, Art. no. 20130387. [Online]. Available: https://pmc/ puter,’’ in Proc. 2nd NASA/ESA Conf. Adapt. Hardw. Syst. (AHS),
articles/PMC4084528/ and https://pmc/articles/PMC4084528/?report= Aug. 2007, pp. 287–294. http://ieeexplore.ieee.org/document/4291933/,
abstract and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4084528/, doi: 10.1109/AHS.2007.71.
doi: 10.1098/RSTA.2013.0387. [89] R. Baxter, S. Booth, M. Bull, G. Cawood, J. Perry, M. Parsons,
[73] J. Weerasinghe, F. Abel, C. Hagleitner, and A. Herkersdorf, ‘‘Enabling A. Simpson, A. Trew, A. McCormick, G. Smart, R. Smart, A. Cantle,
FPGAs in hyperscale data centers,’’ in Proc. IEEE 12th Int. Conf. Ubiq- R. Chamberlain, and G. Genest, ‘‘The FPGA high-performance comput-
uitous Intell. Comput., IEEE 12th Int. Conf. Autonomic Trusted Comput. ing alliance parallel toolkit,’’ in Proc. 2nd NASA/ESA Conf. Adapt. Hardw.
IEEE 15th Int. Conf. Scalable Comput. Commun. Associated Workshops Syst. (AHS), Aug. 2007, pp. 301–307, doi: 10.1109/AHS.2007.104.
(UIC-ATC-ScalCom), Aug. 2015, pp. 1078–1086. [Online]. Available: [90] O. Mencer, K. H. Tsoi, S. Craimer, T. Todman, W. Luk, M. Y. Wong,
https://ieeexplore.ieee.org/document/7518378/, doi: 10.1109/UIC-ATC- and P. H. W. Leong, ‘‘Cube: A 512-FPGA cluster,’’ in Proc.
ScalCom-CBDCom-IoP.2015.199. 5th Southern Conf. Program. Log. (SPL), Apr. 2009, pp. 51–57.
[74] Xilinx. (2016). Xilinx and IBM to Enable FPGA-Based Acceleration [Online]. Available: http://ieeexplore.ieee.org/document/4914907/, doi:
Within SuperVessel OpenPOWER Development Cloud. [Online]. 10.1109/SPL.2009.4914907.
Available: https://www.xilinx.com/news/press/2016/xilinx-and-ibm-to- [91] M. Showerman, J. Enos, A. Pant, V. Kindratenko, C. Steffen,
enable-fpga-based-acceleration-within-supervessel-openpower- R. Pennington, and W.-M. Hwu, ‘‘QP: A heterogeneous multi-accelerator
development-cloud.html cluster,’’ in Proc. 10th LCI Int. Conf. High-Perform. Clustered Comput.,
[75] F. Abel, J. Weerasinghe, C. Hagleitner, B. Weiss, and S. Paredes, Boulder, CO, USA, Mar. 2009, pp. 1–8.
‘‘An FPGA platform for hyperscalers,’’ in Proc. IEEE 25th Annu. [92] Xilinx. (2013). ISE Design Suite. [Online]. Available:
Symp. High-Perform. Interconnects (HOTI), Aug. 2017, pp. 29–32. https://www.xilinx.com/products/design-tools/ise-design-suite.html
[Online]. Available: http://ieeexplore.ieee.org/document/8071053/, doi: [93] A. Pant, H. Jafri, and V. Kindratenko, ‘‘Phoenix: A runtime environment
10.1109/HOTI.2017.13. for high performance computing on chip multiprocessors,’’ in Proc.
[76] J. Weerasinghe, F. Abel, C. Hagleitner, and A. Herkersdorf, ‘‘Disag- 17th Euromicro Int. Conf. Parallel, Distrib. Netw.-Based Process., 2009,
gregated FPGAs: Network performance comparison against bare-metal pp. 119–126, doi: 10.1109/PDP.2009.41.
servers, virtual machines and Linux containers,’’ in Proc. Int. Conf.
[94] K. H. Tsoi and W. Luk, ‘‘Axel,’’ in Proc. 18th Annu. ACM/SIGDA Int.
Cloud Comput. Technol. Sci. (CloudCom), Dec. 2016, pp. 9–17, doi:
Symp. Field Program. Gate Arrays, New York, NY, USA, Feb. 2010,
10.1109/CLOUDCOM.2016.0018.
p. 115. http://portal.acm.org/citation.cfm?doid=1723112.1723134, doi:
[77] B. Ringlein, F. Abel, A. Ditter, B. Weiss, C. Hagleitner, and D. Fey, ‘‘Pro- 10.1145/1723112.1723134.
gramming reconfigurable heterogeneous computing clusters using MPI
[95] Adaptive Computing Enterprises. (2015). TORQUE Resource Man-
with transpilation,’’ in Proc. IEEE/ACM Int. Workshop Heterogeneous
ager Administrator Guide 4.2.10. [Online]. Available: http://www.
High-Perform. Reconfigurable Comput. (H2RC), Nov. 2020, pp. 1–9, doi:
adaptivecomputing.com
10.1109/H2RC51942.2020.00006.
[96] (2014). Maui Scheduler Administrator’s Guide. [Online]. Available:
[78] B. Ringlein, F. Abel, A. Ditter, B. Weiss, C. Hagleitner, and D. Fey,
http://docs.adaptivecomputing.com/maui/
‘‘ZRLMPI: A unified programming model for reconfigurable hetero-
geneous computing clusters,’’ in Proc. IEEE 28th Annu. Int. Symp. [97] A. George, H. Lam, and G. Stitt, ‘‘Novo-G: At the forefront of scal-
Field-Program. Custom Comput. Mach. (FCCM), May 2020, p. 220, doi: able reconfigurable supercomputing,’’ Comput. Sci. Eng., vol. 13, no. 1,
10.1109/FCCM48280.2020.00051. pp. 82–86, Jan. 2011. [Online]. Available: http://ieeexplore.ieee.org/
[79] H. Shahzad, A. Sanaullah, and M. Herbordt, ‘‘Survey and future document/5678570/, doi: 10.1109/MCSE.2011.11.
trends for FPGA cloud architectures,’’ in Proc. IEEE High Per- [98] A. D. George, M. C. Herbordt, H. Lam, A. G. Lawande, J. Sheng,
form. Extreme Comput. Conf. (HPEC), Sep. 2021, pp. 1–10, doi: and C. Yang, ‘‘Novo-G#: Large-scale reconfigurable computing with
10.1109/HPEC49654.2021.9622807. direct and programmable interconnects,’’ in Proc. IEEE High Per-
[80] C. Bobda et al., ‘‘The future of FPGA acceleration in datacenters and form. Extreme Comput. Conf. (HPEC), Sep. 2016, pp. 1–7. [Online].
the cloud,’’ ACM Trans. Reconfigurable Technol. Syst., vol. 15, no. 3, Available: http://ieeexplore.ieee.org/document/7761639/, doi: 10.1109/
Sep. 2022, Art. no. 34, doi: 10.1145/3506713. HPEC.2016.7761639.
[81] R. Sass, W. V. Kritikos, A. G. Schmidt, S. Beeravolu, and [99] Xilinx. (Oct. 2017). Interlaken 150G. [Online]. Available:
P. Beeraka, ‘‘Reconfigurable computing cluster (RCC) project: https://docs.xilinx.com/v/u/en-US/pg212-interlaken-150g
Investigating the feasibility of FPGA-based petascale computing,’’ [100] R. Giorgi, ‘‘AXIOM: A 64-bit reconfigurable hardware/software plat-
in Proc. 15th Annu. IEEE Symp. Field-Program. Custom Comput. form for scalable embedded computing,’’ in Proc. 6th Medit. Conf.
Mach. (FCCM), Apr. 2007, pp. 127–140. [Online]. Available: Embedded Comput. (MECO), Jun. 2017, pp. 1–4. http://ieeexplore.ieee.
https://ieeexplore.ieee.org/document/4297250, doi: 10.1109/FCCM. org/document/7977173/, doi: 10.1109/MECO.2017.7977117.
2007.62. [101] C. Álvarez et al., ‘‘The AXIOM software layers,’’ Microprocessors
[82] A. G. Schmidt, W. V. Kritikos, S. Datta, and R. Sass, ‘‘Reconfigurable Microsyst., vol. 47, pp. 262–277, Nov. 2016, doi: 10.1016/J.MICPRO.
computing cluster project: Phase I brief,’’ in Proc. 16th Int. Symp. 2016.07.002.
Field-Program. Custom Comput. Mach., Apr. 2008, pp. 300–301, doi: [102] R. Giorgi, M. Procaccini, and F. Khalili, ‘‘AXIOM: A scalable,
10.1109/FCCM.2008.12. efficient and reconfigurable embedded platform,’’ in Proc. Design,
[83] AMD Xilinx. (Oct. 2022). Aurora 64B/66B LogiCORE IP Prod- Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2019, pp. 480–485, doi:
uct Guide. [Online]. Available: https://docs.xilinx.com/r/en-US/pg074- 10.23919/DATE.2019.8715168.
aurora-64b66b [103] A. Filgueras, M. Vidal, M. Mateu, D. Jiménez-González, C. Alvarez,
[84] R. G. Jaganathan, K. D. Underwood, and R. Sass, ‘‘A configurable X. Martorell, E. Ayguadé, D. Theodoropoulos, D. Pnevmatikatos, P. Gai,
network protocol for cluster based communications using modular S. Garzarella, D. Oro, J. Hernando, N. Bettin, A. Pomella, M. Procaccini,
hardware primitives on an intelligent NIC,’’ in Proc. ACM/IEEE and R. Giorgi, ‘‘The AXIOM project: IoT on heterogeneous embedded
Conf. Supercomput., Nov. 2003, p. 22, doi: 10.1145/1048935. platforms,’’ IEEE Design Test, vol. 38, no. 5, pp. 74–81, Oct. 2021, doi:
1050173. 10.1109/MDAT.2019.2952335.
[85] HPC Open. (2022). Open MPI: Open Source High Performance Comput- [104] AMD-Xilinx. (2021). Xilinx Adaptive Compute Clusters (XACC)
ing. [Online]. Available: https://www.open-mpi.org/ Academia-Industry Research Ecosystem | HACC Resources. [Online].
[86] K. Datta and R. Sass, ‘‘RBoot: Software infrastructure for a remote Available: https://www.amd-haccs.io/adapt_2021.html
FPGA laboratory,’’ in Proc. 15th Annu. IEEE Symp. Field-Program. [105] (2016). Heterogeneous Accelerated Compute Clusters | HACC
Custom Comput. Mach. (FCCM ), Apr. 2007, pp. 343–344, doi: Resources. [Online]. Available: https://www.amd-haccs.io/index.html
10.1109/FCCM.2007.53. [106] T. Prickett. (2018). Forging a Hybrid CPU-FPGA Supercomputer.
[87] Staff. (Jul. 2005). FPGA High-Performance Computing Alliance [Online]. Available: https://www.nextplatform.com/2018/09/25/forging-
(FHPCA). [Online]. Available: http://www.fhpca.org a-hybrid-cpu-fpga-supercomputer/


[107] Paderborn Center for Parallel Computing (PC2). (2022). PC2— [124] D. A. Patterson, ‘‘RAMP: Research accelerator for multiple
Noctua 2 (Universität Paderborn). [Online]. Available: https://pc2.uni- processors—A community vision for a shared experimental
paderborn.de/hpc-services/available-systems/noctua2 parallel HW/SW platform,’’ in Proc. IEEE Int. Symp. Perform.
[108] Intel. (2022). OneAPI: A New Era of Heterogeneous Computing. Anal. Syst. Softw., Mar. 2006, p. 1, doi: 10.1109/ISPASS.2006.
[Online]. Available: https://www.intel.com/content/www/us/en/ 1620784.
developer/tools/oneapi/overview.html [125] Wirbel Loring. (May 2010). Berkeley Emulation Engine Update—EDN.
[109] D. Cock, A. Ramdas, D. Schwyn, M. Giardino, A. Turowski, Z. He, [Online]. Available: https://www.edn.com/berkeley-emulation-engine-
N. Hossle, D. Korolija, M. Licciardello, K. Martsenko, R. Achermann, update/
G. Alonso, and T. Roscoe, ‘‘Enzian: An open, general, CPU/FPGA [126] J. Rothman and C. Chang, ‘‘BEE technology overview,’’ in Proc. Int.
platform for systems software research,’’ in Proc. 27th ACM Int. Conf. Conf. Embedded Comput. Syst. (SAMOS). Samos, Greece: Institute
Architectural Support Program. Lang. Operating Syst., Feb. 2022, p. 18, of Electrical and Electronics Engineers, Jan. 2013, p. 277, doi:
doi: 10.1145/3503222.3507742. 10.1109/SAMOS.2012.6404186.
[110] A. D. Ioannou, K. Georgopoulos, P. Malakonakis, D. N. Pnevmatikatos, [127] EDN. (Jun. 2010). DESIGN TOOLS—BEEcube Launches BEE4, a Full-
V. D. Papaefstathiou, I. Papaefstathiou, and I. Mavroidis, ‘‘UNILOGIC: Speed FPGA Prototyping Platform—EDN. [Online]. Available: https://
A novel architecture for highly parallel reconfigurable systems,’’ ACM www.edn.com/design-tools-beecube-launches-bee4-a-full-speed-fpga-
Trans. Reconfigurable Technol. Syst., vol. 13, no. 4, pp. 1–32, Dec. 2020, prototyping-platform/
doi: 10.1145/3409115. [128] M. Lin, ‘‘Hardware-assisted large-scale neuroevolution for multiagent
[111] Cygnus Consortium. (2018). About Cygnus. [Online]. Available: https:// learning,’’ Dept. Elect. Comput. Eng., Univ. Central Florida, Orlando,
www.ccs.tsukuba.ac.jp/wp-content/uploads/sites/14/2018/12/About- FL, USA, Dec. 2014. [Online]. Available: https://apps.dtic.mil/sti/
Cygnus.pdf citations/ADA621804
[112] T. Boku, N. Fujita, R. Kobayashi, and O. Tatebe, ‘‘Cygnus—World first [129] I. Sokol. (Apr. 2015). NIs BEEcube Acquisition Drives 5G Communi-
multihybrid accelerated cluster with GPU and FPGA coupling,’’ in Proc. cations | Microwaves & RF. [Online]. Available: https://www.mwrf.
ICPP Workshops. New York, NY, USA: Association for Computing com/technologies/systems/article/21846169/nis-beecube-acquisition-
Machinery, Aug. 2022, p. 1, doi: 10.1145/3547276.3548629. drives-5g-communications
[113] K. Kikuchi, N. Fujita, R. Kobayashi, and T. Boku, ‘‘Implementation [130] National Instruments. (2022). What is FlexRIO?—NI. [Online].
and performance evaluation of collective communications using CIRCUS Available: https://www.ni.com/it-it/shop/electronic-test-instrumentation/
on multiple FPGAs,’’ in Proc. HPC Asia Workshops. New York, NY, flexrio/what-is-flexrio.html
USA: Association for Computing Machinery, Feb. 2023, p. 1523, doi: [131] L. Bonati, P. Johari, M. Polese, S. D’Oro, S. Mohanti,
10.1145/3581576.3581602. M. Tehrani-Moayyed, D. Villa, S. Shrivastava, C. Tassie, K. Yoder,
[114] RIKEN Center for Computational Science. (2020). Fugaku: Riken’s A. Bagga, P. Patel, V. Petkov, M. Seltser, F. Restuccia, A. Gosain,
Flagship Supercomputer. [Online]. Available: https://www.fugaku- K. R. Chowdhury, S. Basagni, and T. Melodia, ‘‘Colosseum: Large-
riken.jp/ scale wireless experimentation through hardware-in-the-loop network
[115] K. Sano, A. Koshiba, T. Miyajima, and T. Ueno, ‘‘ESSPER: Elastic emulation,’’ in Proc. IEEE Int. Symp. Dyn. Spectr. Access Netw.
and scalable FPGA-cluster system for high-performance reconfigurable (DySPAN), Dec. 2021, pp. 105–113, doi: 10.1109/DYSPAN53946.2021.
computing with supercomputer Fugaku,’’ in Proc. Int. Conf. High 9677430.
Perform. Comput. Asia–Pacific Region (HPC Asia). New York, NY, [132] Ettus. (2014). USRP Hardware Driver and USRP Manual: USRP
USA: Association for Computing Machinery, 2023, pp. 140–150, doi: X3x0 Series. [Online]. Available: https://files.ettus.com/manual/page_
10.1145/3578178.3579341. usrp_x3x0.html
[116] J. Davis et al., ‘‘BEE3: Revitalizing computer architecture research,’’ [133] NI. (2022). ATCA Overview—NI. [Online]. Available: https://www.
Microsoft, Apr. 2009. [Online]. Available: https://www.microsoft.com/ ni.com/docs/en-US/bundle/atca-3671-getting-started/page/overview.
en-us/research/publication/bee3-revitalizing-computer-architecture- html
research/ [134] J. H. Ahn, N. Binkert, A. Davis, M. McLaren, and R. S. Schreiber,
[117] K. Kuusilinna, C. Chang, M. J. Ammer, B. C. Richards, and ‘‘HyperX: Topology, routing, and packaging of efficient large-scale net-
R. W. Brodersen, ‘‘Designing BEE: A hardware emulation engine for works,’’ in Proc. Conf. High Perform. Comput. Netw., Storage Anal.
signal processing in low-power wireless applications,’’ EURASIP J. Adv. (SC), New York, NY, USA, 2009, p. 1. [Online]. Available: http://
Signal Process., vol. 2003, no. 6, pp. 502–513, Dec. 2003. [Online]. dl.acm.org/citation.cfm?doid=1654059.1654101, doi: 10.1145/1654059.
Available: https://www.mathworks.com 1654101.
[118] S. C. Jain, S. Kumar, and A. Kumar, ‘‘Evaluation of various rout- [135] S. Gupta et al. (2022). Getting Started With RFNoC in UHD 4.0—
ing architectures for multi-FPGA boards,’’ in Proc. VLSI Design Ettus Knowledge Base. [Online]. Available: https://kb.ettus.com/Getting_
Wireless Digit. Imag. Millennium 13th Int. Conf. VLSI Design. Started_with_RFNoC_in_UHD_4.0
Washington, DC, USA: IEEE Computer Society, 2000, pp. 262–267. [136] A. Chaudhari and M. Braun, ‘‘A scalable FPGA architecture for flexible,
[Online]. Available: http://ieeexplore.ieee.org/document/812619/, doi: large-scale, real-time RF channel emulation,’’ in Proc. 13th Int. Symp.
10.1109/ICVD.2000.812619. Reconfigurable Commun.-Centric Syst.-on-Chip (ReCoSoC), Jul. 2018,
[119] C. Chang, K. Kuusilinna, B. Richards, and R. W. Brodersen, ‘‘Imple- pp. 1–8. [Online]. Available: https://ieeexplore.ieee.org/document/
mentation of BEE: A real-time large-scale hardware emulation engine,’’ 8449390/, doi: 10.1109/ReCoSoC.2018.8449390.
in Proc. ACM/SIGDA 11th Int. Symp. Field Program. Gate Arrays, [137] J. J. Dongarra and A. J. van der Steen, ‘‘High-performance computing
Feb. 2003, pp. 91–99, doi: 10.1145/611817.611832. systems: Status and outlook,’’ Acta Numerica, vol. 21, pp. 379–474,
[120] C. Chang, J. Wawrzynek, and R. W. Brodersen, ‘‘BEE2: A high-end May 2012, doi: 10.1017/S0962492912000050.
reconfigurable computing system,’’ IEEE Design Test Comput., vol. 22, [138] L. M. Al Qassem, T. Stouraitis, E. Damiani, and I. M. Elfadel,
no. 2, pp. 114–125, Feb. 2005, doi: 10.1109/MDT.2005.30. ‘‘FPGAaaS: A survey of infrastructures and systems,’’ IEEE Trans. Ser-
[121] A. G. Schmidt, B. Huang, R. Sass, and M. French, ‘‘Check- vices Comput., vol. 15, no. 2, pp. 1143–1156, Mar. 2022, doi: 10.1109/
point/restart and beyond: Resilient high performance computing with TSC.2020.2976012.
FPGAs,’’ in Proc. IEEE 19th Annu. Int. Symp. Field-Program. Cus- [139] A. George, H. Lam, A. Lawande, C. Pascoe, and G. Stitt, ‘‘Novo-
tom Comput. Mach., May 2011, pp. 162–169, doi: 10.1109/FCCM. G: A view at the HPC crossroads for scientific computing,’’ in Proc.
2011.22. ERSA, 2010, pp. 21–30. [Online]. Available: http://plaza.ufl.edu/poppyc/
[122] S. Buscemi and R. Sass, ‘‘Design and utilization of an FPGA cluster ERS5029.pdf
to implement a digital wireless channel emulator,’’ in Proc. 22nd Int. [140] D. Gratadour et al., ‘‘Prototyping AO RTC using emerging high
Conf. Field Program. Log. Appl. (FPL), Aug. 2012, pp. 635–638, doi: performance computing technologies with the green flash project,’’ Proc.
10.1109/FPL.2012.6339253. SPIE, vol. 10703, pp. 404–418, Jul. 2018. [Online]. Available: https://
[123] S. Buscemi and R. Sass, ‘‘Design of a scalable digital wireless chan- www.spiedigitallibrary.org/conference-proceedings-of-spie/10703/1070
nel emulator for networking radios,’’ in Proc. Mil. Commun. Conf., 318/Prototyping-AO-RTC-using-emerging-high-performance-computin
Nov. 2011, pp. 1858–1863. [Online]. Available: http://ieeexplore.ieee. g-technologies-with/10.1117/12.2312686.full%20, doi: 10.1117/12.
org/document/6127583/, doi: 10.1109/MILCOM.2011.6127583. 2312686.

67704 VOLUME 11, 2023


W. F. Samayoa et al.: Survey on FPGA-Based Heterogeneous Clusters Architectures

[141] A. Mondigo, T. Ueno, K. Sano, and H. Takizawa, ‘‘Comparison [157] M. A. Zapletina and D. A. Zheleznikov, ‘‘The acceleration tech-
of direct and indirect networks for high-performance FPGA clus- niques for the modified pathfinder routing algorithm on an island-
ters,’’ in Applied Reconfigurable Computing. Architectures, Tools, and style FPGA,’’ in Proc. Conf. Russian Young Res. Electr. Electron.
Applications (Lecture Notes in Computer Science: Including Sub- Eng. (ElConRus), Jan. 2022, pp. 920–923. [Online]. Available: https://
series Lecture Notes in Artificial Intelligence and Lecture Notes in ieeexplore.ieee.org/document/9755536/, doi: 10.1109/ElConRus54750.
Bioinformatics), vol. 12083. Springer, 2020, pp. 314–329. [Online]. 2022.9755536.
Available: http://link.springer.com/10.1007/978-3-030-44534-8_24, doi: [158] A. Vaishnav, K. D. Pham, and D. Koch, ‘‘A survey on FPGA
10.1007/978-3-030-44534-8_24. virtualization,’’ in Proc. 28th Int. Conf. Field Program. Log. Appl.
[142] J. D. D. Gazzano, M. L. Crespo, A. Cicuttin, and F. R. Calle, (FPL). Piscataway, NJ, USA: Institute of Electrical and Electronics
Field-Programmable Gate Array (FPGA) Technologies for High Perfor- Engineers, Aug. 2018, pp. 131–138, doi: 10.1109/FPL.2018.
mance Instrumentation. Hershey, PA, USA: IGI Global, Jul. 2016, doi: 00031.
10.4018/978-1-5225-0299-9. [159] K. Fleming, H. Yang, M. Adler, and J. Emer, ‘‘The LEAP FPGA
[143] J. P. Orellana, M. B. Caminero, and C. Carrión, ‘‘Diseño de una arqui- operating system,’’ in Proc. 24th Int. Conf. Field Program.
tectura heterogénea para la gestión eficiente de recursos FPGA en un Log. Appl. (FPL), Sep. 2014, pp. 1–8, doi: 10.1109/FPL.2014.
cloud privado,’’ in Aplicaciones e Innovación de la Ingeniería en Cien- 6927488.
cia y Tecnología. Quito, Ecuador: Abya-Yala, 2019, pp. 165–199, doi: [160] L. Clausing and M. Platzner, ‘‘ReconOS64: A hardware oper-
10.7476/9789978104910.0007. ating system for modern platform FPGAs with 64-bit support,’’
[144] M. Southworth. (Oct. 2021). Choosing the best processor for the job. in Proc. IEEE Int. Parallel Distrib. Process. Symp. Workshops
Curtis-Wright. [Online]. Available: https://www.curtisswrightds.com/ (IPDPSW), May 2022, pp. 120–127, doi: 10.1109/IPDPSW55747.2022.
sites/default/files/2021-10/Choosing-the-Best-Processor-for-the-Job- 00029.
white-paper.pdf [161] D. Korolija, T. Roscoe, and G. Alonso, ‘‘Do OS abstractions make
[145] M. Qasaimeh, K. Denolf, J. Lo, K. Vissers, J. Zambreno, and P. H. Jones, sense on FPGAs?’’ in Proc. 14th USENIX Symp. Operating Syst.
‘‘Comparing energy efficiency of CPU, GPU and FPGA implementations Design Implement., 2020, pp. 991–1010. [Online]. Available: https://
for vision kernels,’’ in Proc. IEEE Int. Conf. Embedded Softw. Syst. www.usenix.org/conference/osdi20/presentation/roscoe, doi: 10.5555/
(ICESS), Jun. 2019, pp. 1–8, doi: 10.1109/ICESS.2019.8782524. 3488766.3488822.
[146] A. Cicuttin, M. L. Crespo, K. S. Mannatunga, J. G. Samarawickrama, [162] S. Möller et al., ‘‘Community-driven development for computational
N. Abdallah, and P. B. Sabet, ‘‘HyperFPGA: A possible general purpose biology at sprints, hackathons and codefests,’’ BMC Bioinf., vol. 15,
reconfigurable hardware for custom supercomputing,’’ in Proc. Int. Conf. Dec. 2014, Art. no. S7, doi: 10.1186/1471-2105-15-S14-S7.
Adv. Electr., Electron. Syst. Eng. (ICAEES), Nov. 2016, pp. 21–26, doi: [163] M. Pathan et al., ‘‘A novel community driven software for func-
10.1109/ICAEES.2016.7888002. tional enrichment analysis of extracellular vesicles data,’’ J. Extra-
[147] A. Tomori and Y. Osana, ‘‘Kyokko: A vendor-independent high- cellular Vesicles, vol. 6, no. 1, Dec. 2017, Art. no. 1321455, doi:
speed serial communication controller,’’ in Proc. 11th Int. Symp. 10.1080/20013078.2017.1321455.
Highly Efficient Accel. Reconfigurable Technol. New York, NY, USA: [164] M. Kühbach, A. J. London, J. Wang, D. K. Schreiber, F. M. Martin,
Association for Computing Machinery, Jun. 2021, pp. 1–6. [Online]. I. Ghamarian, H. Bilal, and A. V. Ceguerra, ‘‘Community-driven
Available: https://doi-org.ezproxy.cern.ch/10.1145/3468044.3468051, methods for open and reproducible software tools for analyzing datasets
doi: 10.1145/3468044.3468051. from atom probe microscopy,’’ Microsc. Microanal., vol. 28, no. 4,
[148] T. Ueno and K. Sano, ‘‘VCSN: Virtual circuit-switching network for pp. 1038–1053, Aug. 2022. [Online]. Available: https://www.cambridge.
flexible and simple-to-operate communication in HPC FPGA cluster,’’ org/core/product/identifier/S1431927621012241/type/journal_article,
ACM Trans. Reconfigurable Technol. Syst., vol. 16, no. 2, pp. 1–32, doi: 10.1017/S1431927621012241.
Jun. 2023, doi: 10.1145/3579848. [165] R. D. Chamberlain, M. A. Franklin, E. J. Tyson, J. H. Buckley, J. Buh-
[149] T. El-Ghazawi et al., ‘‘Exploration of a research roadmap for ler, G. Galloway, S. Gayen, M. Hall, E. F. B. Shands, and N. Singla,
application development and execution on field-programmable gate ‘‘Auto-pipe: Streaming applications on architecturally diverse systems,’’
array (FPGA)-based systems,’’ George Washington Univ., Washington, Computer, vol. 43, no. 3, pp. 42–49, Mar. 2010, doi: 10.1109/MC.
DC, USA, Tech. Rep. ADA494473, Oct. 2008. [Online]. Available: 2010.62.
https://apps.dtic.mil/sti/citations/ADA494473 [166] Y. Osana, T. Imahigashi, and A. Tomori, ‘‘OpenFC: A portable toolkit
[150] J. Bachrach, H. Vo, B. Richards, Y. Lee, A. Waterman, R. Avižienis, for custom FPGA accelerators and clusters,’’ in Proc. 8th Int. Symp.
J. Wawrzynek, and K. Asanovic, ‘‘Chisel: Constructing hardware in Comput. Netw. Workshops (CANDARW), Nov. 2020, pp. 185–190, doi:
a scala embedded language,’’ in Proc. Design Autom. Conf., 2012, 10.1109/CANDARW51189.2020.00045.
pp. 1216–1225, doi: 10.1145/2228360.2228584. [167] G. Chirkov and D. Wentzlaff, ‘‘SMAPPIC: Scalable multi-FPGA archi-
[151] A. Izraelevitz, J. Koenig, P. Li, R. Lin, A. Wang, A. Magyar, D. Kim, tecture prototype platform in the cloud,’’ in Proc. 28th ACM Int. Conf.
C. Schmidt, C. Markley, J. Lawson, and J. Bachrach, ‘‘Reusabil- Architectural Support Program. Lang. Operating Syst. New York, NY,
ity is FIRRTL ground: Hardware construction languages, compiler USA: Association for Computing Machinery, Jan. 2023, pp. 733–746,
frameworks, and transformations,’’ in IEEE/ACM Int. Conf. Comput.- doi: 10.1145/3575693.3575753.
Aided Design Dig. Tech. Papers. Piscataway, NJ, USA: Institute of
Electrical and Electronics Engineers, Nov. 2017, pp. 209–216, doi:
10.1109/ICCAD.2017.8203780.
[152] C. Baaij, ‘‘CλasH: From Haskell to hardware,’’ Fac. EEMCS. Com-
put. Archit. Embedded Syst., Univ. Twente, Enschede, The Netherlands,
Dec. 2009.
[153] M. Kooijman, ‘‘Haskell as a higher order structural hardware descrip-
tion language,’’ Fac. EEMCS, Comput. Archit. Embedded Syst., Univ.
Twente, Enschede, The Netherlands, Dec. 2009. [Online]. Available:
http://essay.utwente.nl/59381/
[154] M. Mariotti, D. Magalotti, D. Spiga, and L. Storchi, ‘‘The bondmachine, a
WERNER FLORIAN SAMAYOA received the B.S. degree in electronics engineering from the University of San Carlos, Guatemala, in 2018. He is currently pursuing the Ph.D. degree in industrial and information engineering with the Multidisciplinary Laboratory (MLab), The Abdus Salam International Center for Theoretical Physics, Università degli Studi di Trieste, under the Joint-Supervision Program. His research interest includes scalable reconfigurable supercomputing.
MARIA LIZ CRESPO is currently a Research Officer with The Abdus Salam International Centre for Theoretical Physics (ICTP) and an Associate Researcher with the Italian National Institute of Nuclear Physics (INFN), Trieste, Italy. She also coordinates the Research and Training Program of the Multidisciplinary Laboratory (MLab), ICTP. She has organized several international schools and workshops on fully programmable systems on chip for nuclear and scientific instrumentation. She is the coauthor of more than 100 scientific publications in prestigious peer-reviewed journals. Her research interests include advanced scientific instrumentation for particle physics experiments and experimental multidisciplinary research.

ANDRES CICUTTIN received the degree in physics from the National University of La Plata, Argentina, in 1992, and the Laurea degree in physics from the University of Trieste, Italy, in 1993. He is currently a Technical Assistant with the Multidisciplinary Laboratory, The Abdus Salam International Centre for Theoretical Physics, and an Associate Researcher with the Italian National Institute for Nuclear Physics (INFN). He has organized and directed numerous international workshops on programmable logic devices for scientific instrumentation and higher education.

SERGIO CARRATO received the master's degree in electronic engineering and the Ph.D. degree in signal processing from the University of Trieste, Trieste, Italy. He was then with Ansaldo Componenti and Sincrotrone Trieste in the field of electronic instrumentation for applied physics. He joined the Department of Electronics, University of Trieste, where he is currently an Associate Professor in electronic devices.

Open Access funding provided by ‘Università degli Studi di Trieste’ within the CRUI CARE Agreement
