2019 - Scalable Hardware-Based On-Board Processing for Run-Time Adaptive Lossless Hyperspectral Compression
ABSTRACT Hyperspectral data processing is a computationally intensive task that is usually performed in high-performance computing clusters. However, in remote sensing scenarios, where communications are expensive, a compression stage is required at the edge of data acquisition before transmitting information to ground stations for further processing. Moreover, hyperspectral image compressors need to meet minimum performance and energy-efficiency levels to cope with the real-time requirements imposed by the sensors and the available power budget. Hence, they are usually implemented as dedicated hardware accelerators in expensive space-grade electronic devices. In recent years, though, these devices have started to coexist with low-cost commercial alternatives in which unconventional techniques, such as run-time hardware reconfiguration, are evaluated within research-oriented space missions (e.g., CubeSats). In this paper, a run-time reconfigurable implementation of a low-complexity lossless hyperspectral compressor (i.e., CCSDS 123) on a commercial off-the-shelf device is presented. The proposed approach leverages an FPGA-based on-board processing architecture with a data-parallel execution model to transparently manage a configurable number of resource-efficient hardware cores, dynamically adapting both throughput and energy efficiency. The experimental results show that this solution is competitive with current state-of-the-art hyperspectral compressors and that the impact of the parallelization scheme on the compression rate is acceptable when considering the improvements in terms of performance and energy consumption. Moreover, scalability tests prove that run-time adaptation of the compression throughput and energy efficiency can be achieved by modifying the number of hardware accelerators, a feature that can be useful in space scenarios, where requirements change over time (e.g., communication bandwidth or power budget).

INDEX TERMS Data compression, dynamic and partial reconfiguration, FPGAs, high-performance embedded computing, hyperspectral images, on-board processing.
once deployed, since it is impossible to make modifications in the implemented circuit; on the other hand, they rely on expensive space-qualified devices to ensure correct behavior during mission time.

Recently, the appearance of CubeSats has enabled the use of low-cost technology, usually based upon commercial off-the-shelf components, in small satellite deployments [2]. One of the main benefits of this is that CubeSats can be used as testbeds for non-conventional developments targeting space applications. For instance, they allow the evaluation of dynamically and partially reconfigurable FPGAs as a possible replacement for traditional radiation-hardened alternatives [3]. This evaluation includes the use of Dynamic and Partial Reconfiguration (DPR) of SRAM-based fabrics with a twofold objective: on the one hand, to support functional adaptation with software-like flexibility but hardware-like performance; on the other hand, to implement run-time fault mitigation mechanisms such as module relocation or configuration memory scrubbing [4].

In this paper, a run-time scalable hardware-based implementation of a lossless hyperspectral data compression algorithm is presented. The proposed approach relies on a configurable number of low-complexity compressor cores (HyLoC) managed by a DPR-enabled hardware processing architecture (ARTICo3) operating in the reconfigurable part of a commercial System on Programmable Chip (SoPC).

HyLoC [5] is a hardware-based implementation of the Consultative Committee for Space Data Systems (CCSDS) 123 standard that targets resource-constrained systems, where limiting area overhead is more important than achieving high compression throughput. As such, it has a reduced footprint and features low-power execution at the expense of limited performance. ARTICo3 [6], on the other hand, is a hardware-based processing architecture that benefits from a multi-accelerator computing scheme to create a run-time adaptive solution space for data-parallel algorithms. This solution space is defined by a tradeoff between computing performance, energy consumption and fault tolerance.

The HyLoC compressor is sequential in nature. In order to make it compatible with the ARTICo3 execution model, input images are split in fixed-size subimages that are compressed independently. Hence, the implementation is provided with data-level parallelism and can fully benefit from a configurable number of accelerators to achieve better performance (one order of magnitude) at a very low cost in terms of power consumption, logic resources and design/verification times.

To the best of the authors' knowledge, this is the first implementation of a lossless hyperspectral compressor with multiple SIMD-like hardware accelerators that renders run-time adaptive throughput and energy consumption, while still being competitive with alternative state-of-the-art solutions. This novel approach allows the system to compress faster or slower depending on the availability of the downlink bandwidth or the urgency to receive, in the ground station, the information contained in the image. In addition, the proposed image partitioning scheme can complement the inherent redundant operation available in ARTICo3 to increase the fault tolerance of the compressor (i.e., a single error in one of the compressed bitstreams only prevents that subimage from decoding from that point onwards, instead of compromising the whole input image).

The ARTICo3-based HyLoC compressor differs from all hardware-based alternatives available in the literature, which typically rely on the use of a single and highly optimized accelerator. These implementations can achieve high throughput (ideally, 1 compressed sample per clock cycle) in BIP mode with an optimized pipeline, although data dependencies force the compressor to run slower in BSQ or BIL modes. However, resource utilization grows significantly when optimizing the cores, due to the amount of internal storage required. Moreover, the throughput may be severely affected if internal resources are not enough and storage needs to be outsourced (e.g., to an external memory attached to the FPGA). Although this last problem cannot be avoided, the parallel execution of several resource-constrained hardware accelerators mitigates the impact on execution performance without increasing area overhead drastically.

The rest of this paper is organized as follows. Section II presents the building blocks used in this work. A discussion on the parallelization approach used to deploy the hyperspectral image compressor on the hardware-based processing architecture is presented in Section III. The proposed implementation is assessed in Section IV, and the conclusions drawn from the results are presented in Section V.

II. TECHNOLOGY BACKGROUND

A. HYLOC: A LOW-COMPLEXITY IMPLEMENTATION OF THE CCSDS 123 STANDARD

The CCSDS 123 standard [7] describes a compression algorithm divided in two stages: a predictor and an entropy coder. The algorithm takes a multispectral or hyperspectral image (i.e., a three-dimensional array of integer samples) as input, and exploits redundancies in both spatial and spectral domains to reduce data volume without compromising its integrity (i.e., the compression is lossless). As a result, the original input samples can be completely recovered from the encoded bitstream that is produced as output.

A simplified flowchart of the algorithm is presented in Figure 1. The predicted sample ŝz,y,x, which corresponds to the actual sample sz,y,x, is computed using the previously processed samples in a three-dimensional neighborhood that spans the current spectral band as well as the P previous bands (see Figure 2). P is a user-defined parameter that can range between 0 (i.e., no information from previous bands is used for prediction) and 15.

The steps of the algorithm are as follows. First, a local sum (i.e., σz,y,x) of the neighboring samples of sz,y,x in the current band is computed. Users can select how this local sum is computed from two alternatives: neighbor-oriented and column-oriented (see Table 1). Then, the local sums are
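The two local-sum alternatives can be sketched in a few lines. The following is a minimal, illustrative Python helper (the function name `local_sum` and the plain-list `band` layout are assumptions, not the paper's VHDL implementation), following the neighbor-oriented and column-oriented definitions of the CCSDS 123 standard, including its boundary cases; in lossless operation the representative samples used by the standard equal the original ones.

```python
def local_sum(band, y, x, mode="neighbor"):
    """Return the local sum sigma_{z,y,x} for one sample of the current band.

    band: 2-D list of previously processed samples, indexed [line][sample].
    Defined for every sample except the first of the band (y == 0, x == 0),
    which the standard predicts without a local sum.
    """
    nx = len(band[0])  # number of samples per line
    if mode == "column":
        # Column-oriented: 4x the north neighbor (west neighbor on the first line)
        return 4 * band[y - 1][x] if y > 0 else 4 * band[y][x - 1]
    # Neighbor-oriented: W + NW + N + NE, with the standard's boundary cases
    if y == 0:
        return 4 * band[y][x - 1]                          # first line: only W exists
    if x == 0:
        return 2 * (band[y - 1][x] + band[y - 1][x + 1])   # first column: N and NE
    if x == nx - 1:
        return band[y][x - 1] + band[y - 1][x - 1] + 2 * band[y - 1][x]  # last column
    return (band[y][x - 1] + band[y - 1][x - 1]
            + band[y - 1][x] + band[y - 1][x + 1])         # interior: W + NW + N + NE
```

The column-oriented variant needs only the previous line, which is why it is the cheaper option for storage-constrained hardware such as HyLoC.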
required FPGA configuration files from the kernel specifications provided by the developer. These descriptions can be done either in low-level RTL code (e.g., VHDL, Verilog) or in high-level C/C++ code. Hence, hardware engineers can work with performance-oriented and resource-efficient accelerators, whereas software engineers can also work with ARTICo3 by leveraging High-Level Synthesis (HLS) engines. At run time, the ARTICo3 runtime library makes both reconfiguration and execution management transparent for developers, relying on a user-friendly API called from the code running on the host processor. The application executable is also built by the ARTICo3 toolchain from a C/C++ specification, making it mandatory to provide an already partitioned hardware/software application as input.

III. PARALLELIZATION APPROACH

The original HyLoC compressor was conceived to have a small footprint in terms of logic resources and, as a result, its normal operation is eminently sequential. Moreover, due to the adaptive nature of the compression algorithm itself, it was meant to compress relatively large hyperspectral images. This approach, although reasonable in some contexts (e.g., small rad-hard FPGAs or expensive ASICs), is no longer valid in an ARTICo3-based implementation, because no data-level parallelism can be extracted and exploited.

According to Section 2.1 of the CCSDS 123.0 Recommended Standard [7]: ``A user may choose to partition the output of an imaging instrument into smaller images that are separately compressed (. . .). This Recommended Standard does not address such partitioning or the tradeoffs associated with selecting the size of images produced under such partitioning.'' This is further described in Section 5.3.2 of the CCSDS 120.2 Informational Report [10]: the Recommended document mentions certain tradeoffs that need to be taken into account when applying partitioning. These tradeoffs involve sacrificing the compression rate to, for instance, achieve fault tolerance (e.g., minimize the effects of data corruption in communication links), or to limit the maximum size of each compressed image [7]. By using ARTICo3, computing performance (algorithm speedup) and energy efficiency become part of these tradeoffs, since they are also considered when deciding the size and number of segments that are to be compressed independently.

In any case, hyperspectral image partitioning usually leads to a reduction in the compression rate for several reasons:
• Each segment requires a header, since it is handled as an independent image.
• Prediction is limited at the boundaries between segments (fewer neighbors).
• Small segments mean less data to ``learn'' from and, therefore, less adaptivity in the compressor.

Although the limits might change depending on the application scenario, variations in the compression rate need to be bounded. It is important to analyze the whole constraint set in order to assess the feasibility of any given partitioning. This is even more important in the proposed approach, where low-level, physical restrictions derived from the DPR-based design flow are present. For this reason, an in-depth analysis on the impact of different partitioning strategies on the compression rate is presented below. It is important to highlight that, although this analysis is particularized throughout the discussion to the data-parallel implementation of HyLoC in ARTICo3, the proposed partitioning strategies are platform-agnostic and hold valid for CPUs, GPUs and hardware accelerators on FPGAs.

TABLE 3. CCSDS 123 compressor configuration.
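The subimage partitioning that gives the compressor its data-level parallelism can be sketched as follows: each block keeps every spectral band but only a fixed window of lines and samples (e.g., the 8 x 8 spatial blocks used in the experiments of Table 7). The function name `partition` and the `image[band][line][sample]` list layout are illustrative assumptions, not the ARTICo3 runtime API.

```python
def partition(image, block_lines, block_samples):
    """Split image[band][line][sample] into independently compressible subimages.

    Each block spans all bands and a block_lines x block_samples spatial window;
    edge blocks are simply smaller when the image size is not a multiple of the
    block size (one possible policy among several).
    """
    ny, nx = len(image[0]), len(image[0][0])
    blocks = []
    for y0 in range(0, ny, block_lines):
        for x0 in range(0, nx, block_samples):
            # Crop the same spatial window out of every band
            block = [[row[x0:x0 + block_samples]
                      for row in band[y0:y0 + block_lines]]
                     for band in image]
            blocks.append(block)
    return blocks
```

Each returned block can then be dispatched to a different hardware accelerator, at the cost of the per-segment header and boundary-prediction overheads discussed above.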
TABLE 7. Compression throughput and energy efficiency when compressing a calibrated AVIRIS image with 224 bands, 512 lines and 512 samples using ARTICo3 and a variable number of HyLoC accelerators (configured to compress subimage blocks of 224 bands, 8 lines and 8 samples; FPGA fabric @ 100 MHz).

TABLE 8. Comparison of CCSDS 123 implementations according to the encoding order, bands used for prediction (P), target device, maximum operating frequency, throughput and energy efficiency.