Scalable QKD Postprocessing System With Reconfigurable Hardware Accelerator
ABSTRACT Key distillation is an essential component of every quantum key distribution (QKD) system because it compensates for the inherent transmission errors of a quantum channel. However, the interoperability and throughput aspects of the postprocessing components are often neglected. In this article, we propose a high-throughput key distillation framework that supports multiple QKD protocols, implemented in a field-programmable gate array (FPGA). The proposed design adapts a MapReduce programming model to efficiently process large chunks of raw data across the limited computing resources of an FPGA. We present a novel hardware-efficient integrated postprocessing architecture that offers dynamic error correction, mutual authentication with a physically unclonable function, and an inbuilt high-speed encryption application that utilizes the key for secure communication. In addition, we have developed a semiautomated high-level synthesis framework that is compatible with any discrete variable QKD system, showing promising speedup. Overall, the experimental results demonstrate a noteworthy enhancement in scalability achieved through the utilization of a single FPGA platform.
INDEX TERMS High-level synthesis (HLS), key distillation engine (KDE), MapReduce framework, physical unclonable function (PUF), quantum key distribution (QKD).
© 2023 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
IEEE Transactions on Quantum Engineering, VOLUME 4, 2023, 4100914
Venkatachalam et al.: SCALABLE QKD POSTPROCESSING SYSTEM
terms of the key rate in a real-time environment is challenging. Achieving the speed of quantum communication necessitates a fast key distillation layer with efficient control hardware.

The QKD postprocessing can be broken down into submodules, namely: 1) synchronization; 2) sifting for erasure; 3) sifting for basis reconciliation; 4) random sampling; 5) parameter estimation (PE); 6) information reconciliation (IR) and verification; 7) privacy amplification (PA); and 8) key management. These classical components require massive computing and memory resources, which is why they were earlier implemented on server systems. However, owing to complex infrastructure and the security assumption of device isolation (trusted node architecture) [30], there is a need for a solution that is stand-alone, compact, and reconfigurable at low power consumption. Hence, the research has now progressed toward field-programmable gate array (FPGA) accelerators. A hardware description language (HDL) is a language used to program FPGAs [13]. Any HDL design is directly correlated with resource consumption in the FPGA. As the design becomes more complex, the need to streamline the design-flow process has led FPGA developers to explore software-based productivity tools that automate the register-transfer level (RTL) design flow; high-level synthesis (HLS) is one such tool, and its adaptation to the QKD key distillation engine (KDE) is further described in Section III.

One of the major engineering challenges in real-time implementations of high-performance QKD networks is the continuous storage and processing of large amounts of data, resulting in memory and computational overhead on the targeted systems. Given that quantum encoding takes place at frequencies in the hundreds of gigahertz range, it is essential that classical postprocessing techniques operate in real time to facilitate the extraction of a secure secret key at a level consistent with the quantum process. The IR phase in postprocessing involves excessive dynamic computations and memory accesses. Therefore, the overall processing time is dominated by complex computations and unstructured memory management.

Recently, there has been an increased interest in accelerating postprocessing using FPGAs [8], [9], [16], [25], [31], [33] with limited memory storage and management capabilities. However, extended memory units, such as static random-access memory (SRAMs) and dynamic random-access memory (DRAMs), can be used along with the FPGA to overcome this drawback. A technological gap still exists in the practical implementation of a high-throughput and efficient hardware–software codesign for large-scale quantum key distillation. In this article, we propose an FPGA design for large-volume data processing based on the MapReduce programming model. This design aims to improve processing throughput while ensuring security. We report a comprehensive experimental study to evaluate the performance and efficiency of different QKD protocols.

We summarize the main contributions of our work as follows.
1) A hardware-based KDE is designed with the capability to support multiprotocol discrete variable (DV) QKD systems.
2) A hardware-based MapReduce accelerator is designed to achieve a significant speedup of the computationally intensive tasks related to IR and PA.
3) A reconfigurable architecture attributed to a framework is developed through HLS technology.
4) Mutual authentication for QKD systems using a physically unclonable function (PUF) is implemented.
5) Rate-adaptive error reconciliation codes are utilized to optimize classical channel throughput, with enhanced error correction capacity.
6) An on-device high-speed encryptor with a throughput of up to 10 Gb/s is included.
7) A detailed experimental field trial for the following three different protocols is presented:
   a) Coherent one way (COW) [26];
   b) BB84, developed by Charles H. Bennett and Gilles Brassard in 1984. It is named after the duo’s surnames (Bennett and Brassard, BB84) [37];
   c) BBM92, developed by Charles H. Bennett, Gilles Brassard, and N. David Mermin in 1992. It is named after the trio’s surnames (Bennett, Brassard, and Mermin, BBM92) [36].

The rest of this article is organized as follows. Section II covers related work and literature review. Section III covers the proposed system design, architecture, and implementation of KDE in hardware. Section IV defines the experimental setup of the QKD protocols, and Section V provides the implementation results and the performance analysis of the design. Section VI describes the future work and open challenges. Finally, Section VII concludes this article.

II. RELATED WORK
One of the early attempts in 2012 to design a complete compact QKD system that integrated optics and control hardware into a single chassis was made by Zhang et al. [33]. They implemented the decoy-state BB84 protocol, but the key distillation software stack faced challenges due to inefficient computing devices, resulting in lower key rates. Around the same time, Tanaka et al. [27] achieved a high-speed phase-encoded BB84 QKD system that covered a distance of 50 km. They transmitted with a repetition frequency of tens of gigahertz using parallel transmission of photons and wavelength division multiplexing. However, their system required massive computing and memory resources, making it unsuitable for portable applications.
FIGURE 1. Overview of the FPGA-based support system for multi-QKD protocols with corresponding postprocessing stages; M ∈ {1, 2, 3, 4} represents the number of parallel mapper instances.
Efforts to accelerate data processing of measured qubits have led to research on individual modules within the QKD postprocessing. Cui et al. [9] focused on an efficient implementation of the error reconciliation module by exploiting FPGA parallelism and pipelining the execution between read, write, and compute operations. Walenta et al. [28] made significant strides toward a field-deployable QKD system by integrating control hardware and data processing units. Constantin et al. [8] provided a detailed description of key distillation modules, all implemented on a single Xilinx Virtex 6 FPGA. However, the system size remained substantial. An alternative to FPGA-accelerated postprocessing is a pipelined software implementation. Zhou et al. [34] proposed a multithreaded pipelined approach that utilized multiple CPU cores to optimize performance parameters. Yuan et al. [32] configured a host server system with FPGA-based accelerators for various QKD processing tasks. Yang et al. [31] focused on parallelizing the IR phase for continuous-variable QKD.

A recent review, by Li and Pang [16], has emphasized the almost mandatory choice of the FPGA for QKD applications due to its advantages in power consumption [23] and design productivity, which are key features for critical applications, such as for satellite quantum communication (CubeSat missions) [21]. It also highlighted the benefits of using HLS to design and configure accelerators. Moving from the FPGA to a system-on-chip (SoC) architecture, a recent work [25] presented a hardware- and software-integrated architecture suitable for practical QKD and quantum random number generation schemes. This architecture optimally distributes time-related and management tasks between the FPGA and the CPU.

While previous designs focused on classical postprocessing engines for specific QKD protocols, our proposal aims to create a complete suite of algorithms compatible with multiple DV QKD protocols, implemented as a stand-alone and isolated system with a parallel data storage and processing framework. In the following sections, we provide a detailed design implementation of this proposed architecture.

III. SYSTEM ARCHITECTURE AND DESIGN METHODOLOGY
We propose an FPGA-based flexible high-throughput key distillation framework that supports multiple QKD protocols. This generalized key distillation framework is designed to adapt to various QKD protocols. The KDE performs all the postprocessing tasks and provides the final key to the encryption application. To ensure effective implementation, the entire postprocessing flow, as illustrated in Fig. 1, can be executed in three distinct phases: the preparatory phase, the data acquisition phase, and the reconciliation phase.

The preparatory phase, as described in more detail in Section III-C, involves alignment and measurement processes to establish a low-loss and low-error quantum channel. The software components required for the preparatory phase are specific to the particular QKD protocol being used. In this work, we have adopted the HLS framework to perform tasks that require modifications based on the QKD protocol and to create a unified integration platform for QKD postprocessing. Specific methods, such as clock synchronization and measurement alignment, are closely tied to the protocol implementation specifications. We provide further details about this platform in Section III-B.

The data acquisition phase (see Section III-D) involves the gathering and transformation of the detected raw key vector before proceeding to the reconciliation phase. The gathering and time-stamping techniques used are generic to any QKD protocol. On the other hand, data-sifting methods depend on the specific QKD protocol and its implementation.

The reconciliation phase, as described in Section III-E, encompasses error estimation, correction, verification, and PA modules, all of which are independent of the chosen QKD protocol. It is worth noting that error correction and PA techniques are computationally intensive. To address this, our hardware design efficiently employs the MapReduce programming model to further optimize throughput. This approach strikes a balance between hardware utilization and
bulky data processing. The MapReduce data storage and processing framework also assists in distributing complex data processing tasks across the limited computing resources of the FPGA.

FIGURE 2. FPGA block design for the classical postprocessing engine describing all the modules and their interconnects.

A. FPGA-BASED PROCESSOR DESIGN
In this article, a single FPGA is used to perform all the control and data processing for QKD. This FPGA fabric is designed with a soft-core-processor-based SoC architecture. Process control, task scheduling, and interface for data processing modules are defined as a software development toolkit (SDK) application program interface (API)-based software application. Makni et al. [17] previously highlighted the advantages of an FPGA-based SoC architecture. Fig. 2 illustrates the high-level architecture, consisting of the control unit, data processing unit, and memory management unit. The soft-core processor, along with the data processing and control modules, is implemented on the FPGA’s programmable logic. The board is equipped with 2 GB of DDR3 SODIMM memory, which is used for storing data required during processing. MicroBlaze, a soft processor core designed for Xilinx FPGAs, is used in this architecture. It is integrated with an AXI interconnect peripheral bus for system-memory-mapped transactions, offering server–client capability. Each data processing unit module interfaces with the AXI data port of MicroBlaze for communication, while DDR3 is interfaced with MicroBlaze over AXI instruction cache and data cache ports. Further details regarding the implementation of individual data processing, synchronization, and authentication modules are provided in Sections III-C–III-F. The control and data processing algorithms are implemented as custom-developed intellectual property (IP) cores, each serving a specific function. The software application defines a flow of execution of each custom hardware accelerator.

B. MULTIPROTOCOL QKD SUPPORT
Our design approach aims to create an efficient and configurable control and processing framework that can be seamlessly integrated into the design solution for any QKD protocol implementation, regardless of the underlying technology. This generic framework, tailored for programmable hardware, leverages HLS technology. HLS facilitates reconfigurability and the transformation of high-level algorithmic descriptions into RTL models, making it accessible to individuals without extensive HDL development experience [13].

In our framework, we embrace modularity by incorporating independent IP cores for each control and data processing task. Control modules can be developed by integrating software drivers for optoelectronic components commonly used in QKD experiments. These software drivers are often available as open-source libraries in high-level programming languages such as C/C++, making them directly compatible with our design through HLS. These drivers play a vital role in configuring parameters like variable attenuation, interference visibility, modulator bias, state of polarization, and more, all of which contribute to the establishment of a secure quantum channel. In addition, our design accommodates libraries for mathematical and computational utility functions [24].

The Phase I modules, or preparatory modules, as described in Section III-C, are designed and tested using a QKD simulator. Subsequently, they are refined to comply with the synthesizable subset, and a test bench is developed to ensure that their functionality remains intact. Optimization directives and data types are added to enhance performance. During synthesis, C functions translate into RTL blocks, function arguments become RTL I/O, and arrays translate into memory elements (e.g., RAM, ROM, or FIFO). To optimize hardware resource utilization, HLS offers user directives. Pipelining directives are integrated to meet timing constraints [35]. In
addition, we employ arbitrary precision data types in our design [1].

C. PREPARATORY PHASE
The operation of the control unit primarily involves configuring the QKD protocol and relies heavily on software drivers of optoelectronic components. The integration of this functionality has been given in detail in [24]. Furthermore, the remaining components of the current KDE can be reconfigured to accommodate various QKD protocols with minimal effort. Each submodule within the system is designed as a separate proprietary library (IP core), making it readily reusable for different QKD protocol configurations. Consequently, the switching time between different protocols is reduced, simplifying the process of building interoperable systems. These design considerations enhance the flexibility and adaptability of the KDE to support a range of QKD protocols.

1) SYNCHRONIZATION AND CLASSICAL CHANNEL COMMUNICATION
Precise timing information recording the launch of a quantum state from Alice and its arrival at Bob is crucial for effective key sifting. To achieve time synchronization, we have adopted the commercial White Rabbit Lite Embedded node (WR-LEN) as part of our system architecture. WR-LEN is a versatile synchronization solution capable of accommodating various classical communication protocols [19]. WR-LEN leverages two essential technologies: Synchronous Ethernet (SyncE) and the enhanced precision time protocol (PTP). SyncE ensures frequency matching, while PTP enables precise offset adjustments of the clock. This synchronization is facilitated over a single channel, which serves a dual purpose: providing both synchronization and a means for exchanging information required for classical postprocessing.

For high-speed communication, which occurs at gigabit-per-second rates, we have utilized the Aurora 8B/10B protocol. Developed by Xilinx, this protocol is designed for point-to-point serial links and serves as a robust link layer communication protocol. An important feature of the Aurora 8B/10B protocol is that it is an open standard, making it available for implementation by anyone. In our implementation, we have employed this protocol for data transmission over the classical channel. However, it is worth noting that our proposed system is flexible and supports integration with any standard communication protocol.

D. QKD DATA ACQUISITION
In a QKD system, data are collected by Alice and Bob using dedicated hardware components. The key information is derived from the measurement device by recording the time stamps. This time-stamping capability is essential for each side to implement a full-stack QKD postprocessing system. In a real-world scenario, a fully integrated hardware KDE with an embedded operating system would handle these tasks. However, in our implementation, two desktop computers are used on each side to record the experimental data and also serve as command-and-control systems for the FPGAs. The hardware and firmware used for time stamping and measurement are based on open-source components that have been developed and utilized in other QKD experiments. To record the time-of-flight measurement of the single photons detected by a single photon detector (SPD), time-to-digital converters (TDCs) are employed. The SPD outputs a detected photon as a nuclear instrumentation module pulse, often referred to as a “click,” which is then read by the FPGA. A multichannel tapped delay line TDC was developed based on a reference design provided by Adamic and Trost [38]. This TDC logic is integrated into an IP core for the FPGA interface. Furthermore, the design framework includes interfaces to and from the optical setup using digital-to-analog converters (DACs) and analog-to-digital converters (ADCs). ADCs and DACs are crucial for controlling the optical components, e.g., ADC is used to drive the coherent laser, and the DAC is used for amplitude and phase modulation. The specific details of the ADC and DAC implementations are beyond the scope of this article.

1) SIFTING
After the quantum transmission and measurement, Alice and Bob utilize the classical channel to derive a secret key. The alignment step involves procedures to synchronize the detection times (time stamps) in Bob’s reference frame with Alice’s key bits in her reference frame. This process can be extensive in terms of effort and transmitted information. The sliding window protocol is used for alignment, and the autocorrelation coefficient is used to confirm the alignment. The basis sifting procedure filters out inconclusive or incompatible measurement results at Bob’s end compared to Alice’s preparation. Depending on the implemented QKD protocol, the required information exchange (e.g., measurement basis) might need to be bidirectional, as in standard BB84, or unidirectional, as in the distributed-phase-reference (DPR) protocol. As a result of the sifting procedure, both Alice and Bob have a set of $n_{\mathrm{sift}}$ elements.

In the BB84 protocol, Alice stores two bits to describe the prepared state: the key bit value and the basis choice. Bob also stores two-bit information about the measurement choice and measurement outcome, along with the time stamp of the detection. After the alignment step, Bob announces one bit describing the measurement choice for each detection, and Alice responds with one bit containing the XOR value between Bob’s and her measurement choice. This method is chosen to reduce communication overhead in the classical channel. Therefore, the communication rate for basis sifting in the BB84 protocol is calculated as $m_{\mathrm{BS}}^{\mathrm{BB84}} = 2 \times n_Q$ bits, where $n_Q$ is the number of measurement outcomes Bob has recorded.

In the COW protocol, Alice encodes the key information in two consecutive time bins. Alice’s $n_{\mathrm{sift}}$ elements contain two bits each, describing the prepared state and indicating
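The BB84 basis-sifting exchange described in this section (Bob announces one basis bit per detection, Alice replies with the XOR of the two basis choices) can be illustrated with a minimal Python sketch. It is not the authors' hardware implementation: the function name `bb84_basis_sift`, the 0/1 encoding of bases, the assumption that both sides' records are already aligned per detection, and the synthetic error-free channel are all assumptions made for illustration.

```python
import random


def bb84_basis_sift(alice_bits, alice_bases, bob_bases, bob_bits):
    """Return (alice_key, bob_key, classical_bits_exchanged).

    All four lists are assumed to be aligned per detection already.
    """
    # Bob -> Alice: one bit per detection describing his measurement basis.
    bob_announcement = list(bob_bases)
    # Alice -> Bob: XOR of the two basis bits (0 means the bases agree).
    alice_reply = [a ^ b for a, b in zip(alice_bases, bob_announcement)]
    kept = [i for i, flag in enumerate(alice_reply) if flag == 0]
    alice_key = [alice_bits[i] for i in kept]
    bob_key = [bob_bits[i] for i in kept]
    # One announced bit plus one reply bit per detection: 2 * n_Q in total.
    classical_bits = len(bob_announcement) + len(alice_reply)
    return alice_key, bob_key, classical_bits


# Toy run on an ideal (error-free) channel; every value here is synthetic.
rng = random.Random(7)
n_q = 32  # hypothetical number of detections recorded by Bob
alice_bits = [rng.getrandbits(1) for _ in range(n_q)]
alice_bases = [rng.getrandbits(1) for _ in range(n_q)]
bob_bases = [rng.getrandbits(1) for _ in range(n_q)]
# Matching basis: Bob reads Alice's bit. Otherwise his outcome is random.
bob_bits = [a if ab == bb else rng.getrandbits(1)
            for a, ab, bb in zip(alice_bits, alice_bases, bob_bases)]

alice_key, bob_key, classical_bits = bb84_basis_sift(
    alice_bits, alice_bases, bob_bases, bob_bits)
```

In this toy run the two sifted keys are identical because no channel errors are simulated, and the classical traffic equals 2 × n_Q bits, matching the communication rate stated above.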
The number of mapper instances declared in the design depends on the input block size and the required output block size. In our case, the design is tested with two, three, and four instances of the mapper. The ECC runs a single iteration over a block size of 8192 bits. The input size for the PA reducer module is 1 007 616 bits, so for three instances of the mapper, 41 iterations (1 007 616 = 8192 × 3 × 41) of each instance are required to generate the desired output size.

Increasing the number of mapper instances can reduce the number of iterations, thereby decreasing the time complexity of the implementation, up to a point. This scalability potential is demonstrated in Fig. 7. By incorporating a MapReduce framework into the KDE, a high throughput of 400 kb/s is achieved for a block size of $10^6$ compared to 40 kb/s without MapReduce. Further optimization and more powerful hardware could potentially scale the design to seven instances of the mapper, enabling PA on larger block sizes with a substantial increase in throughput. Our design model draws inspiration from the implementation of Neshatpour et al. [20].

2) PARAMETER ESTIMATION
After the sifting process, Alice needs to estimate the approximate lower limit of the error introduced during the transmission of quantum states. This estimation is determined by comparing a randomly sampled subset of the sifted key vector, which is shared by Bob. The inequality described in (1) is proved using Chernoff–Hoeffding-type bounds and is dependent on the sample size. Based on the deduced error, the key distillation process is either aborted or continued

Error rate of sampled subset ≈ Error rate of remaining bits + . (1)

The approximate error rate is estimated as $\sum_{n=1}^{r} \frac{1}{N} = r/N$, where r represents the number of errored bits received, and N is the total number of exposed bits. This value is captured as the quantum bit error rate (QBER) of the quantum channel.

In the proposed design, the random sampling of exposed bits and the QBER calculation are implemented on the MicroBlaze processing system. This implementation involves modulo 2 additions and a single 32-bit division operation.

The QBER estimate plays a crucial role in determining whether a secure secret key can be extracted through further processing. If the QBER exceeds a predefined threshold (defined for each protocol, considering errors due to device and measurement imperfections as well as the amount of information that can potentially be leaked to an all-powerful quantum adversary through the devices or the channel), the iteration is aborted, and the derived key is discarded.

3) LOW-DENSITY PARITY-CHECK (LDPC) ERROR CORRECTION
LDPC codes have been extensively researched as forward ECCs for QKD systems [10], [11], [18]. In the proposed system, LDPC codes are implemented as an IP core using the HLS design flow described in [24]. The technique used to construct the LDPC parity check matrix is known as protograph code construction. Protograph codes are created by expanding a base protograph. The resulting LDPC parity check matrix is a combination of submatrices. In the proposed system, an irregular parity check matrix is constructed and populated with values from the Galois field of two elements [GF(2)]. Index positions of the elements of the matrix containing a value of one are stored in the local memory of MicroBlaze. The row and column indexes are used to construct a Tanner graph at the decoder. A soft-decision
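The parameter-estimation step described in this section (randomly sample exposed bits, count disagreements with modulo-2 additions, divide once to obtain the QBER, then abort or continue against a threshold) can be sketched in Python as follows. This is an illustrative sketch only: the function names, the synthetic keys, the sample size, and the 11% threshold are assumptions and are not taken from the paper, which defines the threshold per protocol.

```python
import random


def estimate_qber(alice_key, bob_key, sample_size, rng):
    """Estimate the QBER as r / N on a random sample and drop the exposed bits."""
    if len(alice_key) != len(bob_key):
        raise ValueError("sifted keys must have equal length")
    if not 0 < sample_size <= len(alice_key):
        raise ValueError("sample_size must be in 1..len(key)")
    exposed = set(rng.sample(range(len(alice_key)), sample_size))
    # r: number of disagreeing exposed bits (one modulo-2 addition per bit).
    errors = sum(alice_key[i] ^ bob_key[i] for i in exposed)
    qber = errors / sample_size  # r / N, a single division
    # Exposed bits are public after comparison, so they leave the key.
    alice_rest = [b for i, b in enumerate(alice_key) if i not in exposed]
    bob_rest = [b for i, b in enumerate(bob_key) if i not in exposed]
    return qber, alice_rest, bob_rest


def continue_distillation(qber, threshold):
    """True if the round may proceed to error correction, False to abort."""
    return qber <= threshold


# Synthetic example: Bob's key differs from Alice's in every 20th position.
rng = random.Random(2023)
alice_key = [rng.getrandbits(1) for _ in range(1000)]
bob_key = [b ^ (1 if i % 20 == 0 else 0) for i, b in enumerate(alice_key)]

qber, alice_rest, bob_rest = estimate_qber(alice_key, bob_key, 200, rng)
HYPOTHETICAL_THRESHOLD = 0.11  # illustrative value only
proceed = continue_distillation(qber, HYPOTHETICAL_THRESHOLD)
```

The estimate is only as good as the sample: a larger `sample_size` tightens the deviation term in (1) at the cost of discarding more key bits.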
FIGURE 6. Experimental scheme of (a) BB84 protocol and (b) BBM92 protocol, which includes both optical and electronic arrangements.
EPS: entangled photon source, FM: flip mirror, PM: prism mirror, M: mirror, F: filter, FC: fiber coupler, BS: balanced beam splitter, PBS: polarization beam
splitter, DPBS: dual-wavelength PBS, HWP: half-wave plate, SMF: single-mode fiber, MMF: multimode fiber, SPCM: single-photon counting modules,
PPKTP: Periodically poled potassium titanyl phosphate. LD: laser driver, BD: beam dumper.
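The mapper-iteration arithmetic quoted in the text for the MapReduce stage (8192-bit ECC blocks, a 1 007 616-bit PA input, 41 iterations for three mappers) can be reproduced with a short Python sketch built on `map` and `functools.reduce`. The mapper below is an identity stand-in for per-block error correction and the reducer simply concatenates blocks into the buffer handed to PA; both are assumptions for illustration, not the authors' IP cores, and Python's `map` runs sequentially whereas the hardware mappers run in parallel.

```python
from functools import reduce
from math import ceil

BLOCK_BITS = 8192  # ECC block size stated in the text


def iterations_per_mapper(total_bits, mappers, block_bits=BLOCK_BITS):
    """Iterations each mapper instance needs to cover total_bits."""
    return ceil(total_bits / (block_bits * mappers))


def split_blocks(bits, block_bits=BLOCK_BITS):
    """Partition the sifted key into ECC-sized blocks."""
    return [bits[i:i + block_bits] for i in range(0, len(bits), block_bits)]


def mapper(block):
    """Stand-in for per-block error correction (identity in this sketch)."""
    return list(block)


def reducer(acc, corrected_block):
    """Collect corrected blocks into the buffer that feeds PA."""
    acc.extend(corrected_block)
    return acc


pa_input_bits = 1_007_616  # PA reducer input size stated in the text
schedule = {m: iterations_per_mapper(pa_input_bits, m) for m in (1, 2, 3, 4)}

# Small functional run: six blocks of synthetic alternating bits.
demo_bits = [i % 2 for i in range(BLOCK_BITS * 6)]
pa_buffer = reduce(reducer, map(mapper, split_blocks(demo_bits)), [])
```

With these numbers, `schedule[3]` evaluates to 41, in agreement with 1 007 616 = 8192 × 3 × 41; for mapper counts that do not divide the block count evenly, the ceiling means the last iteration is only partly filled.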
in Ahmedabad, India, where the polarization-based BB84 QKD protocol [6] and the BBM92 QKD protocol were implemented. Details of the experimental setups are depicted in Fig. 6. In the BB84 setup shown in Fig. 6(a), weak coherent pulses are generated by using a variable optical attenuator at the output of a pulsed laser with a repetition rate of 80 MHz. The encoded state is then propagated in a free-space lossy medium with channel transmissivity estimated at 70%. At Bob, there is a polarization-based detection setup consisting of a balanced beam splitter (BS; a passive random basis selector) with a polarizing beam splitter (PBS) on the reflected arm (measurement in H, V), and a combination of the half-wave plate with PBS (measurement in D, A) on the transmitted arm. At the output ports of the PBS, the photons are detected by fiber-coupled avalanche photodiodes (Excelitas SPCM AQRH-14-FC). The BBM92 protocol is just the entangled version of the BB84 protocol. The BBM92 protocol involves pairs of entangled photons. In this protocol, a common sender prepares the entangled photon source and sends it to Alice and Bob through the quantum channel. In Fig. 6(b), the polarization Sagnac interferometer is used to prepare entangled photons. In this interferometry, a diagonally polarized 405-nm continuous-wave laser with an output power of ∼5 mW is used to pump a 30-mm-long Type-0 periodically poled potassium titanyl phosphate (PPKTP) crystal of period 3.425 μm. A lens L1 of focal length 400 mm is used to focus the pump beam on the crystal to generate entangled photons. The horizontally polarized pump beam is transmitted through the dual-wavelength PBS (DPBS) in a clockwise direction, and vertically polarized light is reflected through the DPBS in a counterclockwise direction. Since both the clockwise and counterclockwise pump beams follow the same path but in opposite directions inside the Sagnac interferometer and the Type-0 PPKTP crystal is placed symmetric to the DPBS, the implemented scheme is robust against any optical path changes to produce spontaneous parametric downconversion photons in orthogonal polarizations with ultrastable phase. At the output of the Sagnac interferometer, a filter is used to block the pump beam while transmitting the entangled photons. A prism mirror is used to separate the entangled photon pairs. One photon is sent to Alice, and another photon is sent to Bob (each has a detection setup) through launching optics. The detection setup is the same as BB84. The output from the SPD is fed into electronics for recording the counts per integration time, and these data are then used to derive the sifted key vector. The sifted key vector is the input to the KDE.

TABLE 1 Performance Metrics of Each KDE Module Implemented on Hardware Including Both Alice’s and Bob’s Design

TABLE 2 Utilization Report of the Alice (Transmitter) and Bob (Receiver) KDE Designs Without MapReduce Framework (WOMF), and With MapReduce Framework (WMF; Three Parallel Instances)

V. IMPLEMENTATION AND ANALYSIS
The developed KDE hardware design is tested and verified using quantum bits (raw key vector) collected from the
quantum experiments implementing the BB84 protocol [see Fig. 6(a)], the BBM92 protocol [see Fig. 6(b)], and the COW protocol. The QKD control and postprocessing hardware designs are implemented on a Virtex-7 VX485T Xilinx FPGA. Table 2 provides a record of the area and utilization parameters for the hardware implementation of the KDE design, both with MapReduce framework (WMF) and without MapReduce framework (WOMF) in the parallel architecture. Utilization summaries of the transmitter (Alice) and receiver (Bob) designs with three parallel instances are captured in WMF.

Table 1 records the implementation efficiency and performance of each module in KDE design in terms of latency and key rate. The execution time is determined by considering both the number of clock cycles required to process the module and the clock period, determined by the board’s clock frequency. The key rate is derived from the total input block size divided by the time and the maximum delay path.

In terms of resource utilization, an increase in the utilization of logic cells and block RAM units is observed in Table 2, only on the receiver side with the MapReduce framework. This is primarily due to the computationally intensive LDPC decoder. This can be reduced by adopting optimized decoder implementations. The experiment is conducted for each protocol, collecting and processing 10 MB of raw key bits. The variations in QBER recorded in the experiments are used to validate the effectiveness of rate-adaptive LDPC codes with a threshold error correction capacity of 25% at 90% efficiency. Table 3 summarizes the average performance of both the FPGA implementation and CPU core implementation. Table 4 presents the performance of our implemented design across various QKD experimental settings and compares performance at different QBER values. It is worth noting that while increasing the number of parallel instances, there is a significant increase in key rate and subsequently a decrease in utilization time due to the parallel postprocessing system implementation. Performancewise, Fig. 7 illustrates that as QBER increases, execution time or latency also increases, inversely affecting the secret key extraction rate. However, leveraging more parallel instances in the design can reduce latency, thereby increasing the key rate.

TABLE 3 MapReduce-Based IR; Implementation Validation on Intel CPU Core for {1,3,4} Number of Parallel Instances Using Python’s Inbuilt Functional Programming Feature, Running the Same Algorithms

TABLE 4 Implementation Results for Multiple Parallel Instances of the Mapper

VI. FUTURE WORK AND OPEN CHALLENGES
Our experimental studies provide valuable insights into complexity estimation, computing resource requirements, and bandwidth needs for establishing a fully integrated QKD system. These findings lay the groundwork for the future development of large-scale commercial QKD systems. The proposed work utilizes a generic MapReduce design framework, which can be extended to build interconnected FPGA-based systems for large-scale applications. Future work may explore novel approaches, such as viewing FPGAs as individual coprocessors within computing clusters with server–client architecture, offering scalability for QKD systems. Nonetheless, several challenges remain. First, the complexity of the FPGA-based framework for quantum key reconciliation on large datasets requires further optimization for enhanced efficiency and performance. Second, the integration of QKD systems with reconfigurable FPGA
ACKNOWLEDGMENT
The authors would like to acknowledge and thank Dr. Jothi
Ramalingam and Sarika K Menon for their invaluable as-
sistance and insightful discussions. The authors would also
like to acknowledge M. Swathi Mithran for his support and
assistance.
REFERENCES
[1] A. Alhamali et al., “FPGA-accelerated Hadoop cluster for deep learning
computations,” in Proc. IEEE Int. Conf. Data Mining Workshop, 2015,
pp. 565–574, doi: 10.1109/ICDMW.2015.148.
[2] N. N. Anandakumar, M. S. Hashmi, and M. A. Chaudhary, “Implemen-
tation of efficient XOR arbiter PUF on FPGA with enhanced unique-
ness and security,” IEEE Access, vol. 10, pp. 129832–129842, 2022,
doi: 10.1109/ACCESS.2022.3228635.
[3] D. Bacco, M. Canale, N. Laurenti, G. Vallone, and P. Villoresi, “Ex-
perimental quantum key distribution with finite-key security analysis
for noisy channels,” Nature Commun., vol. 4, no. 1, pp. 1–8, 2013,
doi: 10.1038/ncomms3363.
[4] C. H. Bennett, F. Bessette, G. Brassard, L. Salvail, and J. Smolin, “Exper-
FIGURE 7. Plot of (a) number of parallel instances of mapper in the imental quantum cryptography,” J. Cryptol., vol. 5, no. 1, pp. 3–28, 1992,
Hadoop framework versus execution time in seconds and (b) execution doi: 10.1007/BF00191318.
time, in seconds, versus key rate in kb/s, for COW, BB84, and BBM92
QKD protocols at varying QBER (measured during the quantum
[5] D. J. Bernstein, “The poly1305-AES message-authentication code,”
experiment). in Proc. Int. Workshop Fast Softw. Encryption, 2005, pp. 32–49,
doi: 10.1007/11502760_3.
[6] A. Biswas, A. Banerji, N. Lal, P. Chandravanshi, R. Kumar, and R. P.
Singh, “Quantum key distribution with multiphoton pulses: An advan-
accelerators presents challenges in terms of interoperabil- tage,” Opt. Continuum, vol. 1, no. 1, pp. 68–79, 2022, doi: 10.1364/OPT-
CON.445727.
ity and reconfigurability. Third, QKD systems are sensi-
[7] J. L. Carter and M. N. Wegman, “Universal classes of hash func-
tive to clock synchronization and memory management, re- tions,” J. Comput. Syst. Sci., vol. 18, no. 2, pp. 143–154, 1979,
quiring solutions and optimizations to achieve reasonable doi: 10.1016/0022-0000(79)90044-8.
performance. [8] J. Constantin et al., “An FPGA-based 4 Mbps secret key distillation engine
for quantum key distribution systems,” J. Signal Process. Syst., vol. 86,
no. 1, pp. 1–15, 2017, doi: 10.1007/s11265-015-1086-1.
VII. CONCLUSION [9] K. Cui, J. Wang, H.-F. Zhang, C.-L. Luo, G. Jin, and T.-Y. Chen, “A real-
time design based on FPGA for expeditious error reconciliation in QKD
In this article, we presented a design that enhances scalabil- system,” IEEE Trans. Inf. Forensics Secur., vol. 8, no. 1, pp. 184–190,
ity and enables faster key reconciliation for QKD systems Jan. 2013, doi: 10.1109/TIFS.2012.2228855.
by leveraging the FPGA-based MapReduce architecture. We [10] AR Dixon and H. Sato, “High speed and adaptable error correction for
megabit/s rate quantum key distribution,” Sci. Rep., vol. 4, no. 1, pp. 1–6,
introduced a novel reconfigurable architecture that optimizes 2014, doi: 10.1038/srep07275.
resources through the use of reusable blocks. In addition, our [11] D. Elkouss, J. Martinez-Mateo, and V. Martin, “Information rec-
research introduced a PUF-based authentication protocol to onciliation for quantum key distribution,” 2010, arXiv:1007.1616,
facilitate mutual device authentication and secure message doi: 10.48550/arXiv.1007.1616.
[12] B. Frhlich et al., “Long-distance quantum key distribution secure
exchange for QKD devices. Our experimental results demon- against coherent attacks,” Optica, vol. 4, no. 1, pp. 163–167, 2017,
strated that the hardware design strikes a balance between doi: 10.1364/OPTICA.4.000163.
resource utilization and throughput. Consequently, imple- [13] M. Gurel, A Comparative Study Between RTL and HLS for Image Pro-
cessing Applications With FPGAs. San Diego, CA, USA: Univ. California,
menting MapReduce-based QKD postprocessing functional- 2016.
ity directly in hardware emerges as a preferred technique to [14] K. Inoue, E. Waks, and Y. Yamamoto, “Differential-phase-shift quantum
meet the computational and critical security requirements of key distribution using coherent light,” Phys. Rev. A, vol. 68, no. 2, 2003,
commercial QKD systems. An added advantage of FPGA- Art. no. 022317, doi: 10.1103/PhysRevA.68.022317.
[15] A. Kerckhoffs, “La cryptographie militaire,” J. des Sci.
based key distillation is its capacity to enhance performance Mil., vol. IX, pp. 5–38, Jan. 1883. [Online]. Available:
and compatibility with various QKD experimental setups. https://petitcolas.net/kerckhoffs/crypto_militaire_2.pdf
Typically, a QKD KDE utilizes a combination of individual [16] H. Li and Y. Pang, “FPGA-accelerated quantum computing emulation
and quantum key distillation,” IEEE Micro, vol. 41, no. 4, pp. 49–57,
IP core libraries to construct complete postprocessing pro- Jul./Aug. 2021, doi: 10.1109/MM.2021.3085431.
tocols, including a data encryption module. This approach
allows for the creation of a secure multi-QKD protocol-based 1 https://github.com/SETSQKD
[17] M. Makni, M. Baklouti, S. Niar, and M. Abid, “Hardware resource estimation for heterogeneous FPGA-based SoCs,” in Proc. Symp. Appl. Comput., 2017, pp. 1481–1487, doi: 10.1145/3019612.3019683.
[18] J. Martinez-Mateo, C. Pacher, M. Peev, A. Ciurana, and V. Martin, “Demystifying the information reconciliation protocol cascade,” 2014, arXiv:1407.3257, doi: 10.48550/arXiv.1407.3257.
[19] P. Moreira, J. Serrano, T. Wlostowski, P. Loschmidt, and G. Gaderer, “White rabbit: Sub-nanosecond timing distribution over ethernet,” in Proc. Int. Symp. Precis. Clock Synchronization Meas., Control Commun., 2009, pp. 1–5, doi: 10.1109/ISPCS.2009.5340196.
[20] K. Neshatpour et al., “Energy-efficient acceleration of MapReduce applications using FPGAs,” J. Parallel Distrib. Comput., vol. 119, pp. 1–17, 2018, doi: 10.1016/j.jpdc.2018.02.004.
[21] D. K. L. Oi et al., “CubeSat quantum communications mission,” EPJ Quantum Technol., vol. 4, pp. 1–20, 2017, doi: 10.1140/epjqt/s40507-017-0060-1.
[22] A. K. Pradhan, A. Thangaraj, and A. Subramanian, “Construction of near-capacity protograph LDPC code sequences with block-error thresholds,” IEEE Trans. Commun., vol. 64, no. 1, pp. 27–37, Jan. 2016, doi: 10.1109/TCOMM.2015.2500234.
[23] M. Qasaimeh, K. Denolf, J. Lo, K. Vissers, J. Zambreno, and P. H. Jones, “Comparing energy efficiency of CPU, GPU and FPGA implementations for vision kernels,” in Proc. IEEE Int. Conf. Embedded Softw. Syst., 2019, pp. 1–8, doi: 10.1109/ICESS.2019.8782524.
[24] U. Sisodia, “Using high-level synthesis to migrate open source software algorithms to semiconductor chip designs,” in System Level Flows for SoC Architecture Analysis and Design. Noida, India: CircuitSutra Technologies, 2020.
[25] A. Stanco et al., “Versatile and concurrent FPGA-based architecture for practical quantum communication systems,” IEEE Trans. Quantum Eng., vol. 3, 2022, Art. no. 6000108, doi: 10.1109/TQE.2022.3143997.
[26] D. Stucki, N. Brunner, N. Gisin, V. Scarani, and H. Zbinden, “Fast and simple one-way quantum key distribution,” Appl. Phys. Lett., vol. 87, no. 19, 2005, Art. no. 194108, doi: 10.1063/1.2126792.
[27] A. Tanaka et al., “High-speed quantum key distribution system for 1-Mbps real-time key generation,” IEEE J. Quantum Electron., vol. 48, no. 4, pp. 542–550, Apr. 2012, doi: 10.1109/JQE.2012.2187327.
[28] N. Walenta et al., “A fast and versatile quantum key distribution system with hardware key distillation and wavelength multiplexing,” New J. Phys., vol. 16, no. 1, 2014, Art. no. 013047, doi: 10.1088/1367-2630/16/1/013047.
[29] W. Wang, K. Tamaki, and M. Curty, “Measurement-device-independent quantum key distribution with leaky sources,” Sci. Rep., vol. 11, no. 1, pp. 1–11, 2021, doi: 10.1038/s41598-021-81003-2.
[30] F. Xu, X. Ma, Q. Zhang, H.-K. Lo, and J.-W. Pan, “Secure quantum key distribution with realistic devices,” Rev. Modern Phys., vol. 92, no. 2, 2020, Art. no. 025002, doi: 10.1103/RevModPhys.92.025002.
[31] S.-S. Yang, Z.-G. Lu, and Y.-M. Li, “High-speed post-processing in continuous-variable quantum key distribution based on FPGA implementation,” J. Lightw. Technol., vol. 38, no. 15, pp. 3935–3941, Aug. 2020, doi: 10.1109/JLT.2020.2985408.
[32] Z. Yuan et al., “10-Mb/s quantum key distribution,” J. Lightw. Technol., vol. 36, no. 16, pp. 3427–3433, Aug. 2018, doi: 10.1109/JLT.2018.2843136.
[33] H.-F. Zhang et al., “Real-time QKD system based on FPGA,” J. Lightw. Technol., vol. 30, no. 20, pp. 3226–3234, Oct. 2012, doi: 10.1109/JLT.2012.2217394.
[34] J. Zhou, B. Liu, and B. Zhao, “A pipeline optimization model for QKD post-processing system,” in Proc. Inf. Commun. Technol.-EurAsia Conf., 2014, pp. 472–481, doi: 10.1007/978-3-642-55032-4_48.
[35] M. D. Zwagerman, “High level synthesis, a use case comparison with hardware description language,” M.S. thesis, School of Eng., Grand Valley State Univ., Allendale Charter Township, MI, USA, 2015.
[36] C. H. Bennett, G. Brassard, and N. D. Mermin, “Quantum cryptography without Bell’s theorem,” Phys. Rev. Lett., vol. 68, no. 5, pp. 557–559, 1992, doi: 10.1103/PhysRevLett.68.557.
[37] C. H. Bennett and G. Brassard, “Quantum cryptography: Public key distribution and coin tossing,” in Proc. IEEE Int. Conf. Comput., Syst., Signal Process., 1984, pp. 175–179, doi: 10.1016/j.tcs.2014.05.025.
[38] M. Adamič and A. Trost, “A fast high-resolution time-to-digital converter implemented in a Zynq 7010 SoC,” in Proc. Austrochip Workshop Microelectronics, 2019, pp. 29–34, doi: 10.1109/Austrochip.2019.00017.
[39] C. H. Bennett, G. Brassard, C. Crépeau, and U. M. Maurer, “Generalized privacy amplification,” IEEE Trans. Inf. Theory, vol. 41, no. 6, pp. 1915–1923, Nov. 1995, doi: 10.1109/18.476316.

Natarajan Venkatachalam (Member, IEEE) received the M.Sc. degree in applied mathematics and Ph.D. degree in computational sciences from Anna University, Chennai, India, in 2010 and 2015, respectively.
He is currently a Scientist with the Society for Electronic Transactions and Security (SETS), Chennai, and has worked in the cybersecurity area for nearly 15 years. Prior to joining the SETS, he was a Postdoctoral Research Associate with Quantum Engineering Technology, University of Bristol, Bristol, U.K. His research interests include secure quantum communications and experimental quantum key distribution systems, in particular fundamental, and applied postquantum cryptography.
Dr. Venkatachalam is an Active Member of the Cryptology Research Society of India and the Computer Society of India.

Foram P. Shingala received the M.E. degree in computer science and engineering from Madras Institute of Technology, Chennai, India, in 2017.
She is currently a Scientist with the Society for Electronic Transactions and Security, Chennai. She has worked on the proof-of-concept demonstration of coherent one-way quantum key distribution (QKD) system with field-programmable-gate-array-based classical reconciliation algorithms for QKD. Her current research interests include quantum cryptography and computing.

Selvagangai C received the M.E. degree in very large scale integration design from the PSG College of Technology, Coimbatore, India, in 2018.
In 2019, she joined the Society for Electronic Transactions and Security, Chennai, India, as a Research Fellow. Her current research interests include classical reconciliation and key distillation algorithms for quantum key distribution and field-programmable gate array programming.

Hema Priya S received the B.Tech. degree in electronics and communication and the M.Tech. degree in very large scale integration design from Anna University, Chennai, India, in 2015 and 2017, respectively.
She is currently a Project Scientist with the Society for Electronic Transactions and Security, Chennai. Her research interests include field-programmable gate array design and verification, quantum cryptography, and hardware security.

Dillibabu S received the B.E. degree in electronics and communication engineering from Anna University, Chennai, India, in 2005.
He is currently a Scientist with the Society for Electronic Transactions and Security, Chennai. He is also a Ph.D. Research Scholar with the Worcester Polytechnic Institute, Worcester, MA, USA. He has experience in the field of cryptography and hardware security for 16 years. His research interests include differential power analysis from cryptography, side-channel analysis of hardware cryptographic modules, and postquantum cryptography.

Ravindra P. Singh is currently a Senior Professor and the Chair of Atomic, Molecular and Optical Physics Division, Physical Research Laboratory, Ahmedabad, India. His research interests include light scattering, phase singularities of light, nonlinear optics, quantum optics, and quantum information. He is also engaged in experiments on high dimensional entangled states, free space quantum communication, satellite-based quantum key distribution, and quantum radar.