Electronics 09 01665 v2

This document presents a high-speed implementation of the AES-128 encryption-decryption algorithm designed for a short-range, high-frequency communication system called Wireless Connector, utilizing the Xilinx ZCU102 FPGA platform. The implementation achieves a maximum throughput exceeding 28 Gbit/s by employing a pipelined approach that allows simultaneous processing of multiple plaintext packets, with encryption and decryption times of only 10 clock periods at a frequency of 220 MHz. The architecture is efficient in its use of hardware resources, requiring 1631 Configurable Logic Blocks for encryption and 3464 for decryption.


electronics

Article
10 Clock-Periods Pipelined Implementation of
AES-128 Encryption-Decryption Algorithm up to
28 Gbit/s Real Throughput by Xilinx Zynq
UltraScale+ MPSoC ZCU102 Platform
Paolo Visconti 1, * , Stefano Capoccia 1 , Eugenio Venere 2 , Ramiro Velázquez 3 and
Roberto de Fazio 1
1 Department of Innovation Engineering, University of Salento, 73100 Lecce, Italy;
[email protected] (S.C.); [email protected] (R.d.F.)
2 Medinok s.p.a, Via Palazziello, 80040 Napoli, Italy; [email protected]
3 Facultad de Ingeniería, Universidad Panamericana, 20290 Aguascalientes, Mexico; [email protected]
* Correspondence: [email protected]; Tel.: +39-0832-29-7334

Received: 30 August 2020; Accepted: 10 October 2020; Published: 13 October 2020

Abstract: The security of communication and computer systems is an increasingly important issue,
nowadays pervading all areas of human activity (e.g., credit cards, website encryption, medical
data, etc.). Furthermore, the development of high-speed and light-weight implementations of the
encryption algorithms is fundamental to improve and broaden their application in low-cost,
low-power and portable systems. In this scientific article, a high-speed implementation of the AES-128
algorithm is reported, developed for a short-range and high-frequency communication system, called
Wireless Connector; a Xilinx ZCU102 Field Programmable Gate Array (FPGA) platform represents
the core of this communication system since it manages all the base-band operations, including the
encryption/decryption of the data packets. Specifically, a pipelined implementation of the Advanced
Encryption Standard (AES) algorithm has been developed, allowing simultaneous processing of
distinct rounds on multiple successive plaintext packets for each clock period and thus obtaining
higher data throughput. The proposed encryption system supports 220 MHz maximum operating
frequency, ensuring encryption and decryption times both equal to only 10 clock periods. Thanks to
the pipelined approach and optimized solutions for the Substitute Bytes operation, the proposed
implementation can process and provide the encrypted packets each clock period, thus obtaining a
maximum data throughput higher than 28 Gbit/s. Also, the simulation results demonstrate that the
proposed architecture is very efficient in using hardware resources, requiring only 1631 Configurable
Logic Blocks (CLBs) for the encryption block and 3464 CLBs for the decryption one.

Keywords: field programmable gate array; encryption; advanced encryption standard; wireless
connector; 5G communication; experimental testing

1. Introduction
With the advances of information technology (IT) applications involving sensitive data,
network security is an ever-current topic for business activities; therefore, the development of robust,
computationally light and efficient encryption algorithms is required for supporting the
continuous increase of data volume and throughput in IoT applications, as well as video streaming,
real-time communications, mobile transmissions and so forth [1–4]. The Advanced Encryption
Standard (AES) remains the preferred encryption standard for governments, banks and high-security
systems around the world. It is the most widespread encryption algorithm, being, for instance,

Electronics 2020, 9, 1665; doi:10.3390/electronics9101665 www.mdpi.com/journal/electronics


employed in Gigabit Ethernet, Worldwide Interoperability for Microwave Access (WiMAX) and 5G
systems [5–7]. Also, this algorithm can be efficiently implemented on both hardware and software
platforms; the software implementations of AES require fewer resources but offer lower physical security.
On the other hand, the growing demand for high-speed, high-volume secure data transmissions,
with physical security ensured at the same time, strongly calls for a hardware implementation of the
AES algorithm [8–10].
The Field Programmable Gate Array (FPGA) represents the ideal platform for implementing the
ciphering algorithm to ensure network security; however, the implementation of a high-throughput
encryption/decryption algorithm is a challenge, given the limited resources of the considered platform;
therefore, the development of a resource-efficient AES encryptor/decryptor on the FPGA platform is
crucial. The (Inv)SubBytes (Sbox) phase significantly affects the performance
of the whole encryption algorithm; therefore, several approaches have been developed for reducing its
computational load and resource usage, such as Composite Field Arithmetic (CFA) and look-up table (LUT)
methods [1].
The communication networks are usually arranged into four sections, namely radio access network
(RAN), core network, transport network and interconnection network; concerning the network security,
the supported network planes (signal, user and management ones) are exposed to different threat
typologies. Specifically, the network threats are classified into passive network threats and active network
ones; the former aim to intercept the traffic sent on the network, whereas the latter
include all the attacks in which the execution of commands is involved to disturb the communication.
The introduction of the 5G technology marks the beginning of new concepts also in terms of
network security, such as the encryption of the International Mobile Subscriber Identity (IMSI) while it is
transmitted through the network, protecting it from external attacks. Furthermore, 5G enables
the extension of security mechanisms employed in the cellular networks to other wireless networks,
ensuring the protection of the smart devices and stored data, a concept called “home network control.”
In this context, standardization is fundamental in order to guarantee full compatibility with the
different networks present around the globe.
As is well known, the most widespread encryption algorithms for 5G and, in general, for high
throughput applications are the AES, SNOW 3G and ZUC ones [11,12]; the AES algorithm is very
robust against multiple attacks, such as the brute-force, linear and differential ones. However, the AES
algorithm requires hardware acceleration methods to reduce the execution time in the downlink,
keeping the area and energy requirements low for mobile devices. On the other hand, the SNOW
3G, updated into a faster version called SNOW-V, complies with all the requirements of 5G, ensuring
security, performance and flexibility. It is a word-based synchronous stream cipher, developed by
Thomas Johansson and Patrik Ekdahl at Lund University, and works on 32-bit words and 128-bit
(or 256-bit) keys; the algorithm is based on the combination of a Finite State Machine (FSM) and a Linear
Feedback Shift Register (LFSR), where the latter determines the next state of the FSM. The ZUC is a
stream cipher designed by the Data Assurance and Communication Security Research Center (DACAS)
of the Chinese Academy of Sciences. The cipher forms the core of the 3GPP mobile standards 128-EEA3
(for encryption) and 128-EIA3 (for message integrity). It was proposed for inclusion in the Long Term
Evolution (LTE) or the 4th generation of cellular wireless standards (4G). ZUC is a word-oriented
stream cipher, taking a 128-bit initial key and a 128-bit initial vector (IV) as inputs and providing
as output a keystream of 32-bit words (called keywords). This keystream can be used for both the
encryption and decryption phases. The ZUC algorithm has better resistance compared to the SNOW 3G
one against specific attacks, such as guess-and-determine ones, also showing a good flexibility in
balancing high throughput and consumed area [13].
This research work proposes a high-throughput implementation of the AES-128 algorithm
properly designed for a custom, very short-range and high-frequency communication system, called
Wireless Connector; in particular, this system was conceived for high-throughput data transmission
in the frequency range around 60 GHz between two mobile stations located a short distance
apart (1–10 m). A demonstrator of the Wireless Connector has been realized employing an FPGA
platform for performing the main tasks related to the communication, the base-band elaborations,
coding/decoding and the encryption/decryption phases. For these aims, the potentialities of the
FPGA, in terms of high-performance, low cost and development time, as well as re-configurability,
have been exploited [14,15]. Due to their great flexibility and wide applicability, FPGA platforms
are used on a wide range of application fields, such as video and imaging processing, military
applications, automotive, electronics for specialized processing and more. They are particularly useful
for prototyping Application-Specific Integrated Circuits (ASICs) or processors.
As known, the AES-128 stands out for robustness, efficient occupation of the logical cells for
the execution of the various operations and for being relatively computationally light, making it the
optimal choice to satisfy all requirements of the project [16–18]. For instance, the time required to attack
the AES-128 algorithm and then to recover the key is extremely high ($\propto 2^{126}$ operations); for the
US government, it is considered sufficient for documents classified as secret, whereas, for top-secret
documents, the AES-192 or AES-256 algorithm is required.
The development of the AES-128 algorithm has been carried out employing the Zynq
Ultrascale+MPSoC ZCU102 platform (manufactured by Xilinx, San Jose, CA, USA), based on
Zynq Ultrascale+ XCZU9EG-2FFVB1156E MPSoC (Multiprocessor System-on-Chip), which combines
a powerful Processing System (PS) and Programmable Logic (PL) Ultrascale architecture into a
single device. The proposed implementation of the AES-128 employs multiple elaborations of the
incoming data packets for complying with the 3 Gbit/s data rate constraint (with a maximum value
of 28 Gbit/s) required by the Wireless Connector application, thus imposing an upper limit on the
time interval between successive packets equal to 42.67 ns. The proposed AES-128 implementation
employs a pipelined approach in the round-based elaboration of the AES algorithm, consisting of
"assembly-line"-type processing, in which a new plaintext data packet is acquired as soon as
the simultaneous elaboration of the ten rounds on the previous data packets is completed (i.e., the
corresponding FPGA logic section is available to be used). In this way, the round processing on
successive plaintext packets is carried out simultaneously during each clock period, with better
exploitation of the allocated resources and so reaching higher data throughput.
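
The "assembly-line" behaviour described above can be illustrated with a minimal software model. This is only a sketch under stated assumptions: an ideal, stall-free pipeline with one register stage per AES round is assumed, and the function name `simulate_pipeline` and its bookkeeping are illustrative, not taken from the authors' VHDL design.

```python
def simulate_pipeline(num_packets, stages=10):
    """Model an ideal pipeline with one stage per AES-128 round.

    Returns two dicts mapping packet id -> clock edge at which the packet
    entered the first stage and left the last stage, respectively.
    """
    stage_regs = [None] * stages          # stage_regs[k] holds the packet in round k + 1
    entered, finished = {}, {}
    clock, next_id = 0, 0
    while len(finished) < num_packets:
        clock += 1
        leaving = stage_regs[-1]          # packet that completed the last round
        incoming = next_id if next_id < num_packets else None
        # Every packet advances by one round; a new packet fills the first stage.
        stage_regs = [incoming] + stage_regs[:-1]
        if incoming is not None:
            entered[incoming] = clock
            next_id += 1
        if leaving is not None:
            finished[leaving] = clock
    return entered, finished


entered, finished = simulate_pipeline(5)
for packet in range(5):
    latency = finished[packet] - entered[packet]
    print(f"packet {packet}: in at clock {entered[packet]}, "
          f"out at clock {finished[packet]}, latency {latency} clocks")
```

In this model every packet takes 10 clock periods to traverse the rounds, yet once the pipeline is full one packet leaves at every clock edge, which is the property the throughput figure relies on.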
Also, the developed AES implementation employs a Sbox containing 256 elements of 32 bits
each, instead of 8 bits of the standard implementation, thus obtaining the encrypted data packet in a
shorter time interval (only 10 clock periods) but requiring a greater area occupation on the FPGA.
As demonstrated below, the developed encryption system can support a maximum clock frequency of
220 MHz and features an encryption time of only 10 clock periods; however, it is able to process
and provide the encrypted packets each clock period (namely, $4.54\ \text{ns} = \frac{1}{220\ \text{MHz}}$), thus obtaining a
maximum data throughput higher than 28 Gbit/s (i.e., $\frac{128\ \text{bit/packet}}{4.54\ \text{ns}} = 28.16\ \text{Gbit/s}$).
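
The timing figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below only uses values stated in the text (220 MHz clock, 128-bit packets, 10 clock periods, 3 Gbit/s constraint); the variable names are illustrative, and the non-pipelined comparison at the end is an assumption added here to show why pipelining matters, not a result reported by the authors.

```python
F_CLK_HZ = 220e6          # maximum clock frequency reported in the text
BLOCK_BITS = 128          # AES block (packet) size
LATENCY_CLOCKS = 10       # encryption time in clock periods
REQUIRED_RATE_BPS = 3e9   # data-rate constraint of the Wireless Connector

clock_period_ns = 1e9 / F_CLK_HZ                        # about 4.545 ns (4.54 ns in the text)
pipelined_gbps = BLOCK_BITS * F_CLK_HZ / 1e9            # one packet per clock period
latency_ns = LATENCY_CLOCKS * clock_period_ns           # time to encrypt a single packet
max_packet_interval_ns = BLOCK_BITS / REQUIRED_RATE_BPS * 1e9

# Hypothetical comparison: a design that starts a new packet only after the
# previous one has finished would deliver one packet every 10 clock periods.
iterative_gbps = pipelined_gbps / LATENCY_CLOCKS

print(f"clock period          : {clock_period_ns:.3f} ns")
print(f"pipelined throughput  : {pipelined_gbps:.2f} Gbit/s")
print(f"single-packet latency : {latency_ns:.2f} ns")
print(f"max packet interval   : {max_packet_interval_ns:.2f} ns")
print(f"iterative throughput  : {iterative_gbps:.3f} Gbit/s")
```

Under these assumptions the single-packet latency (about 45.45 ns) exceeds the 42.67 ns packet interval, so a purely iterative design at 220 MHz would fall just short of 3 Gbit/s, whereas the pipelined one exceeds it by a wide margin.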
Besides, a fast algorithm for the key expansion has been implemented, based on the combination,
by GF operations, of the previous sub-key with the current sub-key modified by the Sbox; in this
way, the key expansion operation is completed in only 174.55 ns, obtaining the 44 words from the
main key. The results described above have also been obtained thanks to the latest-generation FPGA
platform (Xilinx ZCU102 board) used to implement the encryption/decryption system, featuring high
performance, large memory capabilities and a wide set of peripherals [19]. Furthermore, the developed
encryption/decryption block implements all the control signals required to synchronize its operation
with the data generator block and the modem, placed upstream and downstream from it, respectively.
Besides, several blocks have been implemented to test the operation of the encryption/decryption
block, by deterministically inserting an error inside a known-plaintext data packet and detecting the
error in the encrypted/decrypted packet, notified by a proper error signal. Also, a proper mechanism
to change the encryption key during the operation of the encryption system has been implemented,
resulting in just three packets lost during the replacement process, as described in Section 3.
The remainder of the paper is organized as follows: Section 2 includes a literature analysis about
high throughput implementation of encryption/decryption algorithms based on FPGA platforms;
furthermore, the demonstrator of the Wireless Connector, based on the Xilinx ZCU102 platform,
is described, demonstrating its proper operation. In Section 3, the VHDL (acronym of
VHSIC-Hardware Description Language, where VHSIC means Very High Speed Integrated Circuits) blocks that
implement and test the encryption and decryption algorithms are carefully described; also, the results
related to the performances of the proposed custom AES-128 encryption/decryption algorithm are
presented, in terms of encryption/decryption time, resource utilization and complexity. In Section 4,
the obtained results are discussed, also dealing with clock routing issues; furthermore,
in this section, the tests of the combined encryption/decryption system, after the loading on the ZCU102
board, are presented. Finally, the proposed AES-128 implementation is compared with other similar
works reported in the scientific literature.

2. Materials and Methods

2.1. Fundamentals of the AES-128 Encryption/Decryption Algorithm


The AES algorithm is a block cipher at the bit level, like the Data Encryption Standard (DES), where
each block length is fixed to 128 bits, whereas the key length can be equal to 128, 192 or 256 bits [20].
Each 128-bit data block is partitioned into 16 bytes, mapped on a 4 × 4 array named state, and each
byte of the state corresponds to an element of the Galois Field (GF) with $2^8$ cardinality. Based on the
key length, the algorithm includes n iterations, called rounds, where n is 10, 12 or 14 when the key
length is 128, 192 or 256 bits, respectively. Each round of the encryption process, except for the last one,
consists of four operations:

• Substitute Bytes
• Shift Rows
• Mix Columns
• Add Round Key

All the operations are carried out sequentially within each round, except for the initial Add Round
Key; in the last round, the Mix Columns operation is not performed (Figure 1).
The Substitute Bytes step is a non-linear transformation, where each byte in the state array is
replaced with the entry of a fixed 8-bit Substitution Box (Sbox) implemented as a lookup table with
$2^8$ words of 8 bits each, used to hide the relationship between the key and the cipher-text. The used
Sbox is derived from the multiplicative inverse over GF($2^8$), combined with an invertible affine
transformation, to avoid attacks based on simple algebraic properties, obtaining a 16 × 16 bytes table
(Figure 2). The permutation is obtained addressing the Sbox locations based on the most significant
nibble and the least significant one of the 8-bit input data.
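
As an illustration of how such a table can be derived, the following sketch computes the Sbox from the multiplicative inverse in GF($2^8$) followed by the affine transformation of the AES standard, and then addresses it by nibbles. The helper names (`gf_mul`, `gf_inverse`, `affine`, `sbox_lookup`) and the brute-force inverse search are assumptions made for readability; a hardware design would normally store the precomputed table.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:        # reduce as soon as the degree reaches 8
            a ^= 0x11B
        b >>= 1
    return result


def gf_inverse(a):
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    return next((x for x in range(1, 256) if gf_mul(a, x) == 1), 0)


def affine(b):
    """AES affine transformation: b XOR its left rotations by 1..4 bits XOR 0x63."""
    out = b
    for shift in range(1, 5):
        out ^= ((b << shift) | (b >> (8 - shift))) & 0xFF
    return out ^ 0x63


SBOX = [affine(gf_inverse(x)) for x in range(256)]


def sbox_lookup(byte):
    """Address the 16 x 16 table: high nibble selects the row, low nibble the column."""
    row, col = byte >> 4, byte & 0x0F
    return SBOX[16 * row + col]


print(f"Sbox[0x53] = {sbox_lookup(0x53):#04x}")
```

The table is a permutation of the 256 byte values, which is what makes the Substitute Bytes step invertible.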
The Shift Rows step operates on the rows of the state array, circularly shifting the bytes in each
row by a given offset. The first row is left unchanged, whereas each byte of the second row is shifted
one position to the left; likewise, the third and fourth rows are shifted respectively by two and three
bytes to the left. The Mix Columns step is a linear transformation mixing the columns of the state
array; each column, treated as a polynomial over GF($2^8$), is multiplied, modulo $z^4 + 1$, with a fixed
polynomial (i.e., $c(z) = \{03\}z^3 + \{01\}z^2 + \{01\}z + \{02\}$). Both the Shift Rows and the Mix Columns operations
are needed to hide the relationship between the cipher-text and the plain text.
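
The two steps just described can be sketched in software as follows. The state is assumed to be stored as 16 bytes in column-major order (index = row + 4 × column), which is the usual convention of the AES standard; the function names are illustrative and the matrix form of Mix Columns used here is equivalent to the multiplication by $c(z)$ modulo $z^4 + 1$.

```python
def xtime(a):
    """Multiply a byte by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a


def shift_rows(state):
    """Rotate row r of the state to the left by r positions."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_single_column(col):
    """Multiply one 4-byte column by the fixed Mix Columns matrix."""
    a0, a1, a2, a3 = col

    def mul3(v):
        return xtime(v) ^ v      # {03} * v = {02} * v XOR v

    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def mix_columns(state):
    """Apply mix_single_column to each of the four columns of the state."""
    out = []
    for c in range(4):
        out += mix_single_column(state[4 * c:4 * c + 4])
    return out


print(shift_rows(list(range(16))))
print([hex(b) for b in mix_single_column([0xDB, 0x13, 0x53, 0x45])])
```

Only multiplications by {02} and {03} are needed in the forward direction, which is one reason the encryption block tends to be cheaper than the decryption block.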

Figure 1. Schematic representation of the Advanced Encryption Standard (AES) encryption algorithm.

Figure 2. Sbox involved in the Substitute Bytes transformation.

In the Add Round Key step, the key is combined with the state array to make the cipher safer;
for each round, a subkey is derived from the expansion of the main key, obtaining, at the end of the
encryption process, an expanded key of 176 bytes, arranged in a linear array of 44 words (using a
key length of 128-bit). After the initialization of the word array ($W[i]$ for $0 \le i \le 43$), by inserting
the key in the first four words, the other ones are obtained using the following relation:

$$W[i] = W[i-1] \oplus W[i-4]. \qquad (1)$$

A particular exception is made for the words with index multiple of four, for which non-linear
relationships, different from bit-to-bit XOR, are used:

$$\mathrm{Subword}(\mathrm{Rotword}(W[i-1])) \oplus \mathrm{Rcon}[i/4], \qquad (2)$$

where the Subword sub-function replaces each byte of the word, provided as argument of the
function, using the Sbox, whereas the Rotword sub-function simply shifts one byte to left; furthermore,
the function Rcon[i] is a round constant, represented by the word array $[x^{i-1}, \{00\}, \{00\}, \{00\}]$, where
$x^{i-1}$ is the (i − 1)-th exponentiation operator in GF($2^8$).
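
A minimal software sketch of this key expansion is given below. It follows the standard AES-128 key schedule, in which the expression of Equation (2) takes the place of $W[i-1]$ in Equation (1) when $i$ is a multiple of four; it is not the fast key-expansion circuit developed by the authors. The function names and the on-the-fly Sbox construction are assumptions made to keep the example self-contained.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sbox_entry(x):
    """Sbox value of x: multiplicative inverse followed by the affine transformation."""
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
    out = inv
    for shift in range(1, 5):
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def expand_key(key):
    """Return the 44 words W[0..43] (lists of 4 bytes) for a 16-byte key."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]   # W[0..3] hold the key itself
    rcon = 0x01                                          # x^0 in GF(2^8)
    for i in range(4, 44):
        temp = w[i - 1]
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]                   # Rotword: rotate one byte left
            temp = [SBOX[b] for b in temp]               # Subword: Sbox on every byte
            temp[0] ^= rcon                              # XOR with Rcon[i/4] = [x^(i/4-1), 00, 00, 00]
            rcon = gf_mul(rcon, 0x02)                    # next power of x
        w.append([a ^ b for a, b in zip(w[i - 4], temp)])
    return w


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")   # example key from the AES standard
words = expand_key(key)
print(len(words), bytes(words[4]).hex(), bytes(words[43]).hex())
```

The 44 words correspond to the 176 bytes of expanded key mentioned above, four words per round key.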
The encryption and decryption procedures employ two different algorithms; nevertheless, each
operation in the encryption process corresponds to an inverse equivalent one in the decryption process.
However, both are arranged in 10 rounds and both perform the Add Round Key step in the same way.
Thus, each round of the decryption process consists of the following operations:

• Inverse Sub Bytes
• Inverse Shift Rows
• Inverse Mix Columns
• Add Round Key

A further difference between the encryption and decryption processes is the order of functions
performed within a single round; in the decryption process, the first step is the Inverse Shift Rows,
followed by Inverse Sub Bytes, Add Round Key and finally Inverse Mix Columns. In particular, the
Inverse Shift Rows step cyclically shifts the rows by the same offsets as the Shift Rows step but in the
opposite direction, i.e., to the right.
To invert the Mix Column operation, the Inverse Mix Columns step employs the corresponding
inverse matrix. The 4-byte columns of the state array are multiplied for the inverse 4 × 4 matrix
featured by constant entries for producing the output bytes; all operations involved in the matrix
multiplication are performed in GF($2^8$) or equivalently by multiplying each column, modulo $z^4 + 1$,
with a fixed polynomial $b(z) = \{0B\}z^3 + \{0D\}z^2 + \{09\}z + \{0E\}$, where $b(z) = c(z)^{-1} \bmod (z^4 + 1)$
and $c(z)$ is the polynomial used in the Mix Columns step of the encryption.
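
The relation between the two polynomials can be checked numerically. The sketch below multiplies polynomials with GF($2^8$) coefficients modulo $z^4 + 1$; coefficient lists are ordered from $z^0$ to $z^3$, and the names `poly_mul_mod`, `C_POLY` and `B_POLY` are illustrative assumptions.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def poly_mul_mod(p, q):
    """Product of two 4-term polynomials over GF(2^8), reduced modulo z^4 + 1."""
    out = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            out[(i + j) % 4] ^= gf_mul(p[i], q[j])   # z^4 wraps around to 1
    return out


C_POLY = [0x02, 0x01, 0x01, 0x03]   # c(z) = {03}z^3 + {01}z^2 + {01}z + {02}
B_POLY = [0x0E, 0x09, 0x0D, 0x0B]   # b(z) = {0B}z^3 + {0D}z^2 + {09}z + {0E}


def mix_column(col):
    """Mix Columns on one column, seen as a polynomial multiplied by c(z)."""
    return poly_mul_mod(C_POLY, col)


def inv_mix_column(col):
    """Inverse Mix Columns on one column, i.e., multiplication by b(z)."""
    return poly_mul_mod(B_POLY, col)


column = [0xDB, 0x13, 0x53, 0x45]
mixed = mix_column(column)
print(poly_mul_mod(B_POLY, C_POLY))             # the constant polynomial 1
print([hex(v) for v in mixed], inv_mix_column(mixed) == column)
```

Note that the inverse step needs multiplications by {09}, {0B}, {0D} and {0E} instead of {02} and {03}, which is consistent with the larger resource usage reported for the decryption block.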
The Inverse Substitute Bytes function is carried out similarly to the Substitute Bytes but using a
different Sbox (Figure 3), obtained applying the inverse affine transformation to the Sbox followed by
the multiplicative inverse in GF($2^8$).
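
A short sketch of this construction follows: the inverse affine transformation is applied first and the multiplicative inverse in GF($2^8$) second. The rotation amounts and the constant 0x05 are those of the standard AES inverse affine map; the helper names and the brute-force inverse search are illustrative assumptions.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(b, n):
    """Rotate a byte left by n bits."""
    return ((b << n) | (b >> (8 - n))) & 0xFF


def inverse_affine(s):
    """Undo the AES affine transformation."""
    return rotl8(s, 1) ^ rotl8(s, 3) ^ rotl8(s, 6) ^ 0x05


def gf_inverse(a):
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    return next((x for x in range(1, 256) if gf_mul(a, x) == 1), 0)


INV_SBOX = [gf_inverse(inverse_affine(s)) for s in range(256)]

print(f"InvSbox[0xed] = {INV_SBOX[0xED]:#04x}")
```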

Figure 3. Sbox involved in Inverse substitute bytes transformation.

Since the encryption and decryption operations are not the same, a significant disadvantage
from the implementation point of view is obtained; however, there is an equivalent version of
the decryption algorithm, which involves the inverse functions in the same order as the encryption
algorithm. In particular, since the Inverse Shift Rows step changes the sequence of the bytes of the
state array, leaving the content unchanged, whereas the Inverse SubBytes step changes the content but not
the sequence of the bytes, their order is no longer relevant, thus the two steps can be exchanged.
Moreover, the Add Round Key and Inverse Mix Columns transformations, considering the key as a sequence
of words, both operate on the state array, column by column; therefore, the Inverse Mix Columns
operation can be applied to the round key $W[i]$, before adding it to the current state array,
so obtaining the data packet decryption with the same sequence of operations of the encryption
algorithm.
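
The two properties used in this argument can be verified numerically. The sketch below checks, on a random state and round key, that Inverse Shift Rows and Inverse SubBytes commute and that Inverse Mix Columns distributes over the XOR of Add Round Key. The column-major state layout, the helper names and the fixed random seed are assumptions of this example.

```python
import random


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sbox_entry(x):
    """Forward Sbox value: multiplicative inverse followed by the affine transformation."""
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
    out = inv
    for shift in range(1, 5):
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out ^ 0x63


INV_SBOX = [0] * 256
for value in range(256):
    INV_SBOX[sbox_entry(value)] = value      # invert the forward table


def inv_sub_bytes(state):
    return [INV_SBOX[b] for b in state]


def inv_shift_rows(state):
    """Rotate row r to the right by r positions (state index = row + 4 * column)."""
    return [state[r + 4 * ((c - r) % 4)] for c in range(4) for r in range(4)]


def inv_mix_columns(state):
    out = []
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        out += [gf_mul(0x0E, a[r]) ^ gf_mul(0x0B, a[(r + 1) % 4])
                ^ gf_mul(0x0D, a[(r + 2) % 4]) ^ gf_mul(0x09, a[(r + 3) % 4])
                for r in range(4)]
    return out


def xor_bytes(a, b):
    return [x ^ y for x, y in zip(a, b)]


rng = random.Random(0)
state = [rng.randrange(256) for _ in range(16)]
round_key = [rng.randrange(256) for _ in range(16)]

# Byte-wise substitution and byte reordering can be applied in either order.
swap_ok = inv_sub_bytes(inv_shift_rows(state)) == inv_shift_rows(inv_sub_bytes(state))
# Inverse Mix Columns is linear, so it can be applied to the round key beforehand.
linear_ok = (inv_mix_columns(xor_bytes(state, round_key))
             == xor_bytes(inv_mix_columns(state), inv_mix_columns(round_key)))
print(swap_ok, linear_ok)
```

Both checks hold for any state and key, since the first relies only on the substitution being byte-wise and the second on the linearity of the matrix multiplication over GF($2^8$).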

2.2. Implementation of Encrypting/Decrypting Algorithms With FPGA Platforms

In Reference [21], the authors proposed an inexpensive, low area and high throughput hardware
implementation of the Advanced Encryption Standard algorithm (AES) for low-cost embedded
applications, using a 128-bit key for both encryption and decryption, employing parallel operation
in the folded architecture. The hardware selected for the implementation is the Virtex-6 XC6VLX75T
FPGA device. In the folded architecture, the 128-bit blocks of input data are divided into four sub-blocks
of 32-bit each and all the operations are performed sequentially. Due to the inefficiency of this method,
along with the folded architecture, parallel computing is required to speed-up the algorithm execution.
The experimental results reveal that the algorithm can achieve a 37.1 Gbit/s data throughput with a
maximum clock frequency of 505.5 MHz.
C. Guzmán et al. proposed a hardware implementation of the AES 128-bit algorithm with a
pipelined architecture, working on two non-feedback modes of operation, namely Electronic
Code-Book (ECB) and Counter (CTR), using a Xilinx Virtex 5 FPGA platform [22]. They compared the two
operation modalities in terms of resource utilization, throughput and robustness. The results revealed
that the CTR mode is more convenient than the ECB one in terms of security level and area efficiency.
The proposed architecture reaches a clock frequency of 272.59 MHz corresponding to a throughput of
34.89 Gbit/s.
A triple key AES algorithm is presented in ref. [23]; such a framework requires a 128-bit plaintext
input and 3 keys for producing the ciphertext. The three keys are combined by a common xor function
and the resulting key is provided along with the plaintext to an "add round key" block, where they
are combined by xor function; afterward, the obtained data block and the combined key are sent as
input to a 128 AES encryption block for obtaining the cipher data (Figure 4). The proposed algorithm
was optimized, thus obtaining 867.34 Mbit/s maximum throughput, with a resource utilization of 3402
Configurable Logic Blocks (CLBs), 27,787 LUTs and 385 Input/Output Blocks (IOBs). By comparing the
proposed algorithm with other Xilinx devices, a 15% increase in throughput and a lower processing
delay were obtained.

Figure 4. AES encryption round operations (a); process flow of the triple-keys AES algorithm
proposed in ref. [23] (b).

In ref. [24], A. Gopalan et al. described the development and implementation of an AES algorithm
on a FPGA platform (Xilinx XC6SLX16), comparing the designed infrastructure with a correspondent
software implementation. The developed encryption block required 10 clock cycles for processing each
data block, thus corresponding to 100 ns processing time since the clock frequency is equal to 100 MHz.
Instead, the decryption block took 11 cycles (i.e., 110 ns), since an initial overhead is required. The two
main figures of merit used to evaluate the algorithm performance are the throughput (in Equation (3))
and latency (in Equation (4)):
$$T = \frac{128 \cdot f_{clk}}{block\_per\_cycle} \qquad (3)$$

$$Lat = \frac{10 \cdot stages\_per\_round}{f_{clk}}, \qquad (4)$$
where block_per_cycle is one for a fully-unrolled architecture, whereas it becomes greater than one if the
round output is re-used to elaborate the single input; stages_per_round is the number of clock cycles to
process a single round.
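As a minimal sketch, Equations (3) and (4) can be evaluated numerically to cross-check the figures quoted in this section. The function names and the `block_bits` and `rounds` parameters are illustrative assumptions and do not come from the cited works.

```python
def throughput_bps(f_clk_hz, block_per_cycle=1, block_bits=128):
    """Equation (3): T = 128 * f_clk / block_per_cycle, in bit/s."""
    return block_bits * f_clk_hz / block_per_cycle


def latency_s(f_clk_hz, stages_per_round=1, rounds=10):
    """Equation (4): Lat = 10 * stages_per_round / f_clk, in seconds."""
    return rounds * stages_per_round / f_clk_hz


# 100 MHz clock, one clock cycle per round: latency of the encryption block of ref. [24]
lat_ns = latency_s(100e6) * 1e9

# 222.2 MHz clock, fully pipelined (block_per_cycle = 1): throughput of ref. [25]
t_gbps = throughput_bps(222.2e6) / 1e9

print(f"latency = {lat_ns:.1f} ns, throughput = {t_gbps:.2f} Gbit/s")
```

With these inputs the latency agrees with the 100 ns reported for ref. [24] and the throughput with the 28.4 Gbit/s reported for the fully pipelined architecture of ref. [25].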
C. P. Fan et al. described a high-speed 128-bit AES encryption module both in sequential and
fully pipelined architectures, including a Content Addressable Memory (CAM)-based architecture
used to realize pipelined high-speed SubBytes and InvSubBytes blocks, a hardware-sharing solution
to carry out a high-speed MixColumns operation and a real-time key generation scheme to realize
the AddRoundKey block [25]. The latter generates 128-bit keys for both encryption and decryption
processes from the encryption key segmented into four 32-bit blocks and stored into different registers.
The last register output (named d register) is dispatched to ROT (shift of bytes), S-box and RCON
(XOR operation) blocks. The SubBytes and InvSubBytes operations were implemented by applying a
CAM-based architecture, providing as output the data lines value obtained from several register arrays
placed both between and inside the AES round computations, according to the matched address line
data. Also, the MixColumns and the InvMixColumns operations were carried out by means of
row mapping permutations, based on two corresponding polynomial matrices. To reduce the resource
utilization, the InvMixColumns polynomial matrix was decomposed into three different matrices to
highlight the hardware sharing of the two operations; in this way, a high-speed shared circuit for
implementing the two transformations was derived. The AES module, implemented on the Xilinx
XC2V3000-6 FPGA platform, reached in the sequential architecture a data throughput value up to
0.876 Gbit/s with a clock frequency of 75.3 MHz, both in the encryption and decryption phases. Instead,
the proposed fully pipelined AES architecture obtained 28.4 Gbit/s throughput with an operating
frequency of 222.2 MHz in the encryption phase.
In ref. [26], the authors proposed high-performance hardware implementation of the Data
Encryption Standard (DES) encryption algorithm, with a 16-stage pipelined architecture, operating
in CTR mode, on a Xilinx Virtex XCV1000-4 BG560 FPGA platform. In the proposed architecture, an
initial delay of 16 clock cycles is required to initialize the functional block, which includes the key
expansion function, the Sbox function and the Pbox function; then, at each clock cycle, fixed-length
clusters of data are loaded into this block along with different keys, so allowing the use of multiple keys,
one for each of the 16 rounds of the DES algorithm. The major contribution was a parameterizable key
scheduling method, where the sub-keys are pre-computed and distributed to the functional blocks of
each round; furthermore, a skew core controls the availability over time of the sub-keys to the different
function blocks, delaying their generation by the needed amount of time. The results showed that the
proposed architecture achieves an encryption rate of 3.87 Gbit/s, guaranteeing a low area utilization
with only 6446 CLB slices used.
P. Chodowiec et al. proposed a compact implementation of the 128-bit AES algorithm on the
inexpensive Xilinx Spartan-II XC2S30 FPGA, using a folded architecture and achieving good performance
and low area utilization [27]. The folded architecture described is the same as that reported in ref. [21];
nevertheless, the authors introduced a new approach for implementing MixColumns and InvMixColumns
functions using shared logic resources. This architecture requires only 222 CLB slices and 3 blocks of
RAM, supporting a maximum throughput of 166 Mbit/s.
In ref. [28], the authors developed and evaluated hardware implementations, based on various
FPGA devices, of the DES encryption algorithm, introducing several pipelined architectures that stand
out for power consumption, resource utilization and throughput; the most significant ones are an
8-stages pipelined architecture and a 37-stages pipelined architecture. In the first one, two rounds
at a time are collapsed into one stage and the output is saved into two intermediate registers of the
next stage, up to a total of 8 stages; instead, in the second proposed solution, the authors developed a
37-stages pipelined DES architecture, previously reported in ref. [29] but optimized by reducing the
utilization of resources by joining the logical operations by means of a processing block with 4 inputs
and 1 output. The second architecture was improved by removing the redundant E (Expansion) and
R (Reduction) boxes from the original design. With such modifications, the authors were able to
increase the throughput by a 1.1 factor compared to the original design, reaching 40 Gbit/s using a
Kintex7 platform. Regarding the proposed 8-stage pipelined implementation, a significant reduction of
resources utilization (of a 0.75 factor) and power consumption (of a 0.65 factor) compared to a similar
16-stages pipelined design was demonstrated.
As described above, a LUT-based solution has been used to implement the Sbox in the proposed
AES algorithm; this is not an optimal solution for area-limited hardware but offers better performance
in terms of data throughput than other solutions, for instance those based on combinatorial logic,
which aim to minimize the resource utilization, as demonstrated in Reference [30]. In this context,
in Reference [31], the authors presented an overview of the different strategies to implement a compact
Sbox function, based on both polynomial and normal bases. Furthermore, they introduced a compact
Sbox implementation based on a multi-level representation of GF operations, obtained properly
selecting a particular basis (isomorphism) and making appropriate improvements to the circuital
solution. The proposed solution has demonstrated improvements of 20% compared to the most compact
Sbox implementation reported in Reference [32]. Besides, T. Good et al. proposed two new FPGA
implementations of the AES algorithm [33]; the first one, implemented on a Xilinx Spartan-III (XC3S2000)
FPGA, relies on a fully parallel loop-unrolled architecture, reaching a 25 Gbit/s data throughput.
The second one, implemented on a Spartan-II (XC2S15) FPGA, is based on state data and LUTs to carry
out the AES operations, such as Substitute Bytes and Mix Columns, combined into a single matrix
called "T-box"; this implementation features low area utilization, achieving 2.2 Mbit/s maximum
throughput. The Sbox implementations proposed in these works could be applied to our solution to
significantly reduce the used hardware resources, but this would probably reduce the maximum
throughput, which is the main requirement of the Wireless Connector communication system.

2.3. Description of the “Wireless Connector” System’s Demonstrator and Relative Communication Tests
In this sub-section, the preliminary demonstrator of the Wireless Connector system is presented,
which includes two PEM003 RF radio modules (manufactured by Pasternack, Irvine, CA, USA, highlighted
by the red box), interfaced with the base-band hardware consisting of a Zynq Ultrascale+ MPSoC ZCU102
platform (manufactured by Xilinx, San Jose, CA, USA, highlighted by the yellow box), an ADFMCDAQ2
acquisition board (manufactured by Analog Devices, Norwood, MA, USA, highlighted by the purple
box), four power splitter combiners (model ZFSCJ-2-4+, manufactured by Mini-Circuits, Brooklyn,
NY, USA, highlighted by the green box), four pre-DAC anti-alias low-pass filters (model VLFX-300,
manufactured by Mini-Circuits, highlighted by the orange box) and a personal computer for the system
management (highlighted by the blue box).
The ADFMCDAQ2 acquisition board includes a dual-ADC (model AD9680, manufactured by
Analog Devices) featuring 14-bit resolution, a 1.0 Gsps sample rate and a JESD204B interface, and also
a 16-bit resolution quad-DAC (model AD9144, manufactured by Analog Devices), featuring a 2.8 Gsps
sample rate and a JESD204B interface; furthermore, an onboard clock generator is provided by a
14-output AD9523-1 low-jitter IC (manufactured by Analog Devices), along with components for the
power management.
The Pasternack’s PEM003 development kit consists of a transmitter (Tx) and a receiver (Rx)
module, operating at a frequency band around 60 GHz, supporting complex modulations through
a pair of modulation signals I and Q. Each module is equipped with a USB interface, for setting the
main parameters by connecting it to a PC but also ensuring its power supply. The baseband I and Q
signals are applied to the input of the Tx module or available at the output of the Rx module, through
the Micro Coaxial Connector (MCX) placed on the back of each board; the obtained signals are in the
differential format (i.e., I+ and I−, Q+ and Q−). The 60 GHz section terminates with two Tx and Rx
antennas connected to the UG-385/U flange, which acts as an interface with the WR-15 waveguide.
A reference design based on an embedded microprocessor system (Xilinx MicroBlaze) has been used to
characterize the ADC/DAC devices. By using the internal logic resources of the FPGA device, the
embedded MicroBlaze processor is generated by employing the Vivado/SDK design tool. The drivers
for the management of the Ethernet protocol, a UART interface for the information exchange and
system management and an external DDR memory for the management of user data are directly
connected to the MicroBlaze processor.
The reduced performance of the acquisition board and the logical resources of the FPGA device
allow the installation of the Quadrature Phase-Shift Keying (QPSK) modulator-demodulator with
low-performance Forward Error Correction (FEC) modules (e.g., Reed-Solomon). In Figure 5, a
functional scheme of the system described above is provided, whereas in Figure 6 the realized
experimental setup to perform the 60 GHz communication tests is shown, supporting a data-rate up to
3 Gbit/s, the constraint previously defined for the whole 5G communication system.

Figure 5. Demonstrator of the “Wireless Connector” system based on the RF-PEM003, ZCU102 and
FMCDAQ2 components.

Figure 6. Picture of the experimental setup using the Zynq Ultrascale+ MPSoC ZCU102 platform
baseband system interconnected to the PEM003 radio system, operating in QPSK modulation.

The QPSK modulation is carried out by generating the I(t) and Q(t) coefficients, to be sent to the
two quadrature mixers, where they are mixed with the carrier signal (for I(t)) and with the same carrier
phase-shifted by 90° (for Q(t)), respectively, both produced by the modem block. A representation of
the modulated signals can be made on the complex plane, obtaining four symbols which constitute the
QPSK constellation (as shown in Figure 7).

Figure 7. Graphic representation of the QPSK constellation related to the received signals, using the
Analog Devices software tool IIO Oscilloscope.
The QPSK demodulation is based on the principle of coherent demodulation, which requires an
appropriate reconstruction of the base-band symbols. The ADC component included on the
ADFMCDAQ2 acquisition board provides the samples of the received I(t) and Q(t) signals, subsequently
processed by First-in First-Out (FIFO) systems for modifying the data-flow on 64-bit registers at a
500 MHz frequency. These data are sent to a threshold decision-maker block for reconstructing
the symbols on the receiver side. The constellation of received symbols (Figure 7), with 500 Ms/s
symbol rate corresponding to 1 Gbit/s (but extendable up to 3 Gbit/s, as previously reported), has been
displayed through a software application (Analog Devices IIO Oscilloscope) for extrapolating the
transmitted symbol, processed by FPGA and sent to the PC via the UART port.
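The mapping of bit pairs onto the four QPSK symbols and the threshold decision on the received I and Q samples can be illustrated with a short behavioural sketch. The Gray-style mapping (bit 0 to +1, bit 1 to −1) and all function names are assumptions made for illustration; the mapping actually used by the modem block is not stated in the text.

```python
def qpsk_modulate(bits):
    """Map consecutive bit pairs to symbols I + jQ (assumed mapping: 0 -> +1, 1 -> -1)."""
    if len(bits) % 2:
        raise ValueError("QPSK needs an even number of bits")
    return [complex(1 - 2 * b_i, 1 - 2 * b_q)
            for b_i, b_q in zip(bits[0::2], bits[1::2])]


def qpsk_decide(symbols):
    """Threshold decision: the sign of I and of Q each recovers one bit."""
    bits = []
    for s in symbols:
        bits.append(0 if s.real >= 0 else 1)
        bits.append(0 if s.imag >= 0 else 1)
    return bits


def bit_rate(symbol_rate_hz, bits_per_symbol=2):
    """Each QPSK symbol carries 2 bits."""
    return symbol_rate_hz * bits_per_symbol


tx_bits = [0, 0, 0, 1, 1, 0, 1, 1]
# Add a small fixed offset to emulate a mildly distorted received constellation
rx_symbols = [s + complex(0.2, -0.15) for s in qpsk_modulate(tx_bits)]
rx_bits = qpsk_decide(rx_symbols)
```

Since each symbol carries two bits, a 500 Ms/s symbol rate gives `bit_rate(500e6)` = 1 Gbit/s, in line with the figure quoted above.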

3. Results

3.1. Description of the VHDL Blocks Implemented for the AES Encryption/Decryption Algorithm
The VHDL block developed for implementing the encryption algorithm is shown in Figure 8
(red box); it accepts the plaintext in input and provides the ciphertext in output, both arranged into
128-bit blocks, via the AXI Stream bus.

Figure 8. Complete scheme containing all the blocks implemented, both for the encryption algorithm
and for testing it.

The source files implementing the encryption block are shown in Figure 9; the first four files,
AES_AXIS_KEY_v1, AES_AXIS_KEY_v1_0_S00_AXIS_inst, AES_AXIS_KEY_v1_0_S01_AXI_inst and
AES_AXIS_KEY_v1_0_M00_AXIS_inst, are related to the implementation of the communication between
blocks, through the AXI Stream and AXI Lite buses; instead, the last two files contain the code for
performing the encryption algorithm, namely aes_encoding_block and cipher_key_expansion_block.
The portion of the firmware contained in cipher_key_expansion_block and dealing with the expansion
of the key is shown in Figure 10; it performs the necessary operations to obtain the 44 words that make
up the 10 sub-keys used during the encryption rounds. Also, the start of the key expansion routine,
whenever a new key is validated by the processor, has been implemented using the expansion_key_start
signal; in order to obtain all the 44 words of the expanded key, only 174.5 ns are required.
As can be seen, the 44 words constituting the sub-keys are obtained by carrying out xor operations
between the 32-bit sections of the subkey at the previous round.
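The generation of the 44 words can be sketched in software with the standard AES-128 key schedule from FIPS-197. This is a behavioural illustration only, not the authors' VHDL shown in Figure 10; the function names are hypothetical, and the S-box is computed from its algebraic definition rather than stored as a LUT, purely to keep the listing short.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a = ((a << 1) ^ (0x11B if a & 0x80 else 0)) & 0xFF
        b >>= 1
    return r


def build_sbox():
    """AES S-box: multiplicative inverse in GF(2^8) followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def sub_word(word):
    """Apply the S-box to each of the four bytes of a 32-bit word."""
    return int.from_bytes(bytes(SBOX[b] for b in word.to_bytes(4, "big")), "big")


def expand_key(key):
    """Return the 44 32-bit words w[0..43]; w[0..3] is the cipher key itself."""
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(4)]
    for i in range(4, 44):
        t = w[i - 1]
        if i % 4 == 0:
            t = ((t << 8) | (t >> 24)) & 0xFFFFFFFF   # rotate left by one byte
            t = sub_word(t) ^ (RCON[i // 4 - 1] << 24)
        w.append(w[i - 4] ^ t)                        # xor with the word 4 positions back
    return w


words = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
```

The key used here is the example key of FIPS-197 Appendix A.1, so the resulting words can be compared directly with the values tabulated there. Words 4 to 43 form the ten round sub-keys, while words 0 to 3 hold the unexpanded key used in the initial xor.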
Figure 9. Source files used to implement the code performing the encryption process.

Figure 10. Code section used to generate the 10 subkeys, for encrypting the 128-bit plaintext data
packets.
An Sbox matrix is employed to expand the key (Figure 11); as mentioned above, each element of the
Sbox consists of 32 bits, instead of 8 bits, so allowing the algorithm to perform the related operations,
and thus to obtain the encrypted data packets, in a shorter time interval but with greater resource
utilization of the FPGA device. This LUT-based solution was preferred over solutions that implement
the Sbox through GF operations, such as those reported in References [31,33], because the main
requirement of the Wireless Connector is the operating speed rather than the hardware resource
utilization, given the wide memory capability of the employed FPGA platform; as known, LUT-based
Sbox solutions offer better performance in terms of processing time to the detriment of area occupation,
as demonstrated in Reference [30], thus affecting the Substitute Bytes step, the most critical operation in
the AES algorithm, but also the key expansion step in the proposed implementation.
Once the 10 sub-keys are obtained, the algorithm carries out the 10 rounds required by the AES-128
and implemented in the aes_encoding_block source file, which receives in input the sub-keys and the
plaintext and carries out the steps required to encrypt the plaintext (Figure 1).
Figure 11. S-Box matrix containing 256 elements, each of 32 bit.
In the first round, the xor operation is carried out between the plaintext and the cipher_key_table,
which contains the unexpanded encryption key (round_0 in Figure 12).

Figure 12. Code section related to the first round (called round_0) of the AES-128 algorithm.
Afterward, the algorithm, using the intermediate data generated by the first round, proceeds with
the following 9 rounds required by the AES-128, performing in each round the Substitute Bytes, Shift
Rows, Mix Columns and Add Round Key operations (Figure 13a). These operations are iteratively
applied to the intermediate data obtained from the previous round, updated until the ninth round
(Figure 13b). The data obtained after this iteration, called intermediate_data (9), is provided to
round 10 for the last Add Round Key operation and the resulting ciphered data is stored into the
128-bit out_cipher_data packet (Figure 14).
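The round sequence described above can be followed in a compact software model. The sketch below implements the standard AES-128 encryption of FIPS-197, in which the final round applies Substitute Bytes and Shift Rows before the last Add Round Key; it is not a transcription of the VHDL in Figures 12 to 14. All function names are hypothetical, the S-box is derived algebraically instead of being stored as a LUT, and a compact key schedule is included so that the block runs on its own.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a = ((a << 1) ^ (0x11B if a & 0x80 else 0)) & 0xFF
        b >>= 1
    return r


def build_sbox():
    """AES S-box: GF(2^8) inverse followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """Return 11 round keys of 16 bytes each (round 0 is the cipher key)."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 1
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]
            t[0] ^= rcon
            rcon = gf_mul(rcon, 2)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]


# The state is a flat list of 16 bytes, column by column (index = row + 4 * column).
def sub_bytes(s):
    return [SBOX[b] for b in s]


def shift_rows(s):
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(s):
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        out += [
            gf_mul(a[0], 2) ^ gf_mul(a[1], 3) ^ a[2] ^ a[3],
            a[0] ^ gf_mul(a[1], 2) ^ gf_mul(a[2], 3) ^ a[3],
            a[0] ^ a[1] ^ gf_mul(a[2], 2) ^ gf_mul(a[3], 3),
            gf_mul(a[0], 3) ^ a[1] ^ a[2] ^ gf_mul(a[3], 2),
        ]
    return out


def add_round_key(s, rk):
    return [a ^ b for a, b in zip(s, rk)]


def encrypt_block(plaintext, key):
    """Encrypt one 128-bit block with AES-128."""
    rk = expand_key(key)
    s = add_round_key(list(plaintext), rk[0])            # round_0: xor with the cipher key
    for r in range(1, 10):                                # rounds 1..9
        s = add_round_key(mix_columns(shift_rows(sub_bytes(s))), rk[r])
    s = add_round_key(shift_rows(sub_bytes(s)), rk[10])   # round 10: no Mix Columns
    return bytes(s)


cipher = encrypt_block(bytes.fromhex("00112233445566778899aabbccddeeff"), bytes(range(16)))
```

The plaintext and key are the example vectors of FIPS-197 Appendix C.1, so `cipher.hex()` can be checked against the ciphertext listed there. In a pipelined hardware design, the value of `s` after each loop iteration corresponds to what the text calls intermediate_data(i).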
Figure 13. Generation of the 9 intermediate rounds implementing the operation provided by the
AES-128 (a); code section implementing the operations required by each round (b).

Figure 14. Code section related to the Add Round Key carried out in the last round (called round_10)
of the AES-128 algorithm.
By saving the results of each round (i.e., intermediate_data(i), i = 1, ..., 9), a pipelined
implementation can be obtained, carrying out simultaneously the 10 rounds on successive data
packets; the processing of a new packet can thus start as soon as the round's processing on the
previous one is completed. Therefore, simultaneous processing on multiple packets is performed,
allowing better exploitation of the used hardware resources and so reaching a higher data throughput.
As reported below, the proposed AES implementation takes only one clock period to complete each
round's processing, which allows an encrypted data packet to be provided at each clock cycle.
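The effect of registering each round's result can be illustrated with a cycle-level toy model of a 10-stage pipeline. The model only tracks which packet sits in which stage; no cryptography is performed, and the function name and data layout are assumptions made for illustration.

```python
def run_pipeline(block_ids, n_stages=10):
    """Simulate a pipeline with one register per round.

    Returns a list of (cycle, block_id, rounds_done) tuples, one for each block
    leaving the last stage. Cycles are counted from 1.
    """
    regs = [None] * n_stages              # regs[i] holds the result of round i + 1
    pending = list(block_ids)
    finished = []
    cycle = 0
    while pending or any(r is not None for r in regs):
        cycle += 1
        new_block = (pending.pop(0), 0) if pending else None
        inputs = [new_block] + regs[:-1]  # every stage reads the previous register
        regs = [None if d is None else (d[0], d[1] + 1) for d in inputs]
        if regs[-1] is not None:
            finished.append((cycle, regs[-1][0], regs[-1][1]))
    return finished


results = run_pipeline(range(5))
for cycle, block_id, rounds_done in results:
    print(f"cycle {cycle}: block {block_id} done after {rounds_done} rounds")
```

In this model the first block leaves the pipeline after 10 clock cycles and each following block one cycle later, so N blocks need N + 9 cycles instead of 10·N. Once the pipeline is full, 128 bits are delivered per clock period, which is the condition under which Equation (3) applies with block_per_cycle equal to one.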
To test the correct behaviour of the implemented algorithm, a word generator, called
Data_Generator (green box in Figure 8), has been included by means of the Vivado IP Integrator tool
to simulate the presence of the Ethernet module; it provides 128-bit data packets at the input
of the encryption block, every 42.67 ns, via the AXI Stream bus. Instead, to insert and store the key,
an external block called Key_generator (purple box in Figure 8) and a memory block with 4 registers
of 32 bits each have been employed, connected via the AXI Lite bus, so allowing the user to update the
key at any time. A Key_to_write block (orange box in Figure 8) writes the 4 words of the key (32 bits
each) in 4 registers, created during the AXI Lite bus implementation phase, asynchronously to the
processor, allowing the substitution of the encryption key during the normal operation of the algorithm.
Therefore, if the key in the registers is not changed, the algorithm performs the data encryption;
otherwise, if it differs from the current key, the expansion_key routine starts and the 10 sub-keys of the
new main key are generated.
The switch to a new key is enabled when the processor deems it appropriate, by setting a bit
of an additional byte transmitted via the AXI Lite bus and stored in an additional 32-bit register, named
key_valid. The algorithm queries this bit every 42.7 ns and, if it detects that its value is set high,
it reads the key stored in the registers and starts the key expansion routine; at the same time, the
value of the enabling bit is reset to indicate the change of the encryption key to the decryption block.
The developed key substitution mechanism represents an important functionality for the Wireless
Connector, since a periodic key change is required to guarantee the security of the data exchanged
between the two mobile stations constituting the communication system.
The correctness of the encrypted data packets is verified by a Pattern_Verificator block (blue box
in Figure 8), connected to the encryption block via the AXI Stream bus; this last simulates the presence
of the modem and contains a table with encrypted data packets corresponding to the plaintext data
packets provided at the input of the AES_TEST_AXI block by the Data_Generator block. It compares the
packets received from the AES_TEST_AXI block with those contained in the table; if the data received
is the same as that in its table, the encryption has been successful, otherwise, an error has occurred.
To verify the correct operation of the algorithm, an Insert_Error block (pink box in Figure 8) has been
implemented to change a bit in the 128-bit plaintext packet, thus verifying that the Pattern_Verificator
detects any errors. When an error is detected, the latter sets the error_sig bit in correspondence
with the encrypted data packet that does not match the ones stored in the Pattern_Verificator table.
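The verification scheme described above can be illustrated with a small Python model. It is only a sketch: `toy_cipher` is a hypothetical invertible 128-bit mapping used as a stand-in for the AES core (it is not AES), and the function names are assumptions chosen to mirror the block names.

```python
# Sketch of table-based verification with single-bit error injection.
# Assumption: toy_cipher is a placeholder bijection, NOT the AES-128 core.

MASK128 = (1 << 128) - 1


def toy_cipher(block):
    """Invertible 128-bit mapping (odd multiplier modulo 2^128)."""
    return (block * 0x9E3779B97F4A7C15F39CC0605CEDC835 + 0x1234) & MASK128


def insert_error(block, bit):
    """Flip one bit of a 128-bit plaintext packet (Insert_Error model)."""
    return block ^ (1 << bit)


def pattern_verificator(received, expected_table):
    """Return one error_sig value per packet: 1 where the packet mismatches."""
    return [int(r != e) for r, e in zip(received, expected_table)]


plaintexts = [0x00112233445566778899AABBCCDDEEFF, 1, 2]
expected_table = [toy_cipher(p) for p in plaintexts]

# Corrupt only the second packet before it reaches the cipher.
corrupted = list(plaintexts)
corrupted[1] = insert_error(corrupted[1], bit=7)
error_sig = pattern_verificator([toy_cipher(p) for p in corrupted],
                                expected_table)
```

Because the stand-in mapping is a bijection, any single-bit change in the plaintext necessarily changes the output, so `error_sig` is raised only for the corrupted packet.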
The CLOCK block (yellow box in Figure 8) provides the clock to all the blocks with a frequency of
350 MHz. To synchronize the Pattern_Verificator with the encryption block, an impulse is generated
to indicate the end of encryption and the availability of a new encrypted data packet at the output
of the AES_AXIS_KEY block; this signal is associated with the m00_axis_tvalid pin of the AXI Stream
bus. Besides, two other signals have been implemented, namely a support flag and a signal indicating
the packet of the Data_Generator table provided to the input of the encryption block, allowing the
Pattern_Verificator to keep track of the packets sent and to associate them with the corresponding entries
in its internal table.
Furthermore, an external signal has been defined, called s00_axis_tvalid, which indicates to the
encryption block, through an impulse, the availability of the packets at the input, enabling the immediate
acceptance of new data packets. To optimize the algorithm and to reduce the execution time, the
plaintext packets are acquired on the falling edge of the clock signal, thus allowing the encryption
process to start in advance and gaining 1.42 ns, corresponding to half of the clock period.
The modem, located downstream of the encryption block and simulated by the Pattern_Verificator
block, works with 64-bit data packets; therefore, the encrypted packets have to be serialized in 64-bit
packets each, determining some latency and timing problems. Therefore, the m00_axis_tready signal,
provided by the AXI Stream bus, has been implemented for indicating to the encryption block when the
Pattern_Verificator is available to accept new packets. When the signal is set, the algorithm accepts the
packets in input and performs the encryption process; otherwise, if it is reset, the algorithm stops and
waits for the signal to return high. Instead, the m00_axis_tvalid signal indicates to the Pattern_Verificator
that a new encrypted data packet is available at the output.
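The serialization and back-pressure behaviour described above can be sketched as follows. This is a cycle-level toy model, not the authors' AXI Stream logic: the function names are hypothetical, the most-significant-half-first ordering of the 64-bit beats is an assumption, and `tready_pattern` simply lists the value of m00_axis_tready in successive cycles.

```python
# Toy model of 128-bit to 64-bit serialization and tready back-pressure.
# Assumptions: MSB-first beat order; one packet emitted per ready cycle.

MASK64 = (1 << 64) - 1


def serialize_128_to_64(packet):
    """Split one 128-bit packet into two 64-bit beats (upper half first)."""
    return [(packet >> 64) & MASK64, packet & MASK64]


def run_stream(packets, tready_pattern):
    """Emit one packet per cycle while tready is 1; stall while it is 0.

    Returns a list of (cycle, packet) pairs for the accepted transfers.
    """
    pending = list(packets)
    transfers = []
    for cycle, tready in enumerate(tready_pattern):
        if not pending:
            break
        if tready:
            transfers.append((cycle, pending.pop(0)))
    return transfers
```

For example, `run_stream(["A", "B", "C"], [1, 0, 0, 1, 1])` delivers the packets in cycles 0, 3 and 4, showing how two cycles with tready low delay the second packet without dropping it.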
In Figure 15, the temporal trends of the signals involved in the developed encryption algorithm
are shown; the s00_axis_tvalid signal is generated on the falling edge of the clock, whereas encryption
of plaintext data packets starts on each rising edge of the clock. Also, to consider the case in which
the s00_axis_tvalid signal is set to zero, the last two packets are made available using two impulses
randomly spaced. Therefore, the algorithm accepts the plaintext data packets to perform encryption
and provides the corresponding encrypted data packets after an interval of 28.560 ns. From the above
considerations, the developed algorithm can supply 128-bit encrypted data packets every 2.856 ns
(equal to the clock period), so obtaining, for a 350 MHz operating frequency, a throughput value of
44.8 Gbit/s.
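The figures quoted above follow from simple arithmetic on the clock frequency, which the short Python sketch below reproduces. The helper names are hypothetical; note that the exact period at 350 MHz is about 2.857 ns, so the computed latency (about 28.57 ns) differs in the last digits from the 28.560 ns obtained with the truncated 2.856 ns period.

```python
# Back-of-the-envelope check of period, latency and throughput.
# Helper names are illustrative, not taken from the paper.

def clock_period_ns(f_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1e3 / f_mhz


def latency_ns(n_clocks, f_mhz):
    """Time taken by n_clocks clock periods, in nanoseconds."""
    return n_clocks * clock_period_ns(f_mhz)


def throughput_gbps(width_bits, f_mhz, clocks_per_packet=1):
    """Throughput in Gbit/s when one packet is delivered every
    clocks_per_packet clock periods."""
    return width_bits * f_mhz / (clocks_per_packet * 1e3)


period = clock_period_ns(350)           # about 2.857 ns
latency = latency_ns(10, 350)           # about 28.57 ns (10 clock periods)
throughput = throughput_gbps(128, 350)  # 44.8 Gbit/s, one packet per clock
```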

Figure 15. Temporal trends of the signals involved in the encryption algorithm, with the plaintext
data packets encrypted on each rising edge of the clock, as indicated by the s00_axis_tvalid signal
set to one and the last two packets made available using two impulses spaced randomly over time
(yellow box); each encrypted data packet is available at the output after 28.560 ns (10 system clock
periods at 350 MHz frequency), as indicated by the m00_axis_tvalid signal (orange box).

The temporal trends related to the expansion of the key, previously stored in the registers at an
instant chosen by the user and validated through a signal provided by the processor, are shown in
Figure 16. During the expansion of the key, which lasts 174.55 ns, the m00_axis_tvalid signal is
reset, indicating that no valid encrypted packets are provided by the encryption block. The error_sig
signal is set in correspondence with the key change because the table related to the new key is considered,
whereas the packets are still obtained with the old key; the signal returns to zero as soon as the
encrypted packets are obtained through the new key.
The implementation of all the control and synchronization signals, above described, is one of the
main contributions provided by the proposed work, fundamental for the correct operation of the whole
encryption/decryption system, ensuring correct interoperability of the developed encryption/decryption
block with the other sections of the Wireless Connector system.

Figure 16. Temporal trends of the key expansion phase; the m00_axis_tvalid signal (orange box)
indicates that during this phase there are no valid packets at the output of the encryption block; the
error_sig signal (yellow box) returns to zero as soon as the encrypted packets are obtained
through the new key.

Afterward, the VHDL block implementing the correspondent decryption algorithm, called
AES_128_DEC (red box in Figure 17), has been developed, along with the blocks employed to test it,
reproducing an operative scenario similar to that present in the final application.

Figure 17. Overall scheme containing the VHDL blocks for both implementing the decryption algorithm
and testing it, reproducing the operative scenario of the final application.

The decryption algorithm has been implemented, as well as the encryption algorithm, parallelizing
many logical instructions on each rising edge of the clock; similarly to the encryption algorithm,
a Sbox matrix was used, consisting of 256 elements each of 32 bits. In Figure 18, the source
files used to implement the code performing the decryption are shown; the first four files,
AES_128_DEC_v1_0, AES_128_DEC_v1_0_S00_AXIS_inst, AES_128_DEC_v1_0_S01_AXI_inst and
AES_128_DEC_v1_0_M00_AXIS_inst are related to the implementation of the communication between
blocks by the means of AXI Bus Stream and AXI Lite. The files containing the code developed to
perform the AES-128 decryption algorithm are the last two shown in Figure 18, named aes_decoding_block
and cipher_key_expansion_block.

Figure 18. Source files used to implement the code performing the decryption process.

The cipher_key_expansion_block file contains the code implementing the key expansion, which
generates the 10 sub-keys employed during the decryption process; to expand the key, the same matrix
used for the encryption process is deployed (called sbox_encoding_4). The 10 rounds for obtaining the
plaintext data packets are implemented within the aes_decoding_block source file, and the rounds are
developed in the same way as done for the encryption operation.
In each round, the InvSubBytes, InvShiftRows and InvMixColumns operations are combined
to obtain the plaintext data packets. These operations are carried out by using the four 16 × 16 32-bit
matrices, called sbox_decoding_0, sbox_decoding_1, sbox_decoding_2 and sbox_decoding_3,
equivalent to operations of the AES decrypting algorithm; in particular, the xor operations between
the intermediate data obtained during the different rounds of decryption algorithm and elements of
these matrices are carried out (Figure 19). These matrices allow obtaining the plaintext data packets in
only 10 clock periods, considerably reducing the necessary time to perform the decryption process;
however, greater resource utilization of the FPGA device is required.
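The idea of merging InvSubBytes, InvShiftRows and InvMixColumns into four 256-entry tables of 32-bit words can be illustrated with the following Python sketch. It is an assumption-laden model, not the content of sbox_decoding_0 to sbox_decoding_3: the table layout follows the common "T-table" construction, the round-key addition is deliberately omitted, and all names (`TD`, `round_with_tables`, `round_reference`) are hypothetical. The state is a 16-byte sequence in column-major order.

```python
# Sketch: four 256 x 32-bit tables replacing InvSubBytes + InvShiftRows +
# InvMixColumns. Assumption: standard T-table layout; AddRoundKey omitted.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def build_inv_sbox():
    """Derive the inverse S-box by inverting the forward S-box definition."""
    def rotl8(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF

    inv_sbox = [0] * 256
    for value in range(256):
        inv = next((c for c in range(1, 256) if gf_mul(value, c) == 1), 0)
        s = inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63
        inv_sbox[s] = value
    return inv_sbox


INV_SBOX = build_inv_sbox()

# COLUMNS[j] is column j of the InvMixColumns matrix, i.e., the contribution
# of input byte j to the four output bytes of a column.
COLUMNS = [(0x0E, 0x09, 0x0D, 0x0B),
           (0x0B, 0x0E, 0x09, 0x0D),
           (0x0D, 0x0B, 0x0E, 0x09),
           (0x09, 0x0D, 0x0B, 0x0E)]


def build_tables():
    """Pre-compute the four tables of 256 32-bit words."""
    tables = []
    for coeffs in COLUMNS:
        table = []
        for x in range(256):
            word = 0
            for c in coeffs:
                word = (word << 8) | gf_mul(c, INV_SBOX[x])
            table.append(word)
        tables.append(table)
    return tables


TD = build_tables()


def round_with_tables(state):
    """One inverse round (without key addition): 4 lookups + xors per column."""
    out = b""
    for col in range(4):
        word = 0
        for row in range(4):
            word ^= TD[row][state[row + 4 * ((col - row) % 4)]]
        out += word.to_bytes(4, "big")
    return out


def round_reference(state):
    """Same round computed step by step, used to cross-check the tables."""
    shifted = [state[row + 4 * ((col - row) % 4)]
               for col in range(4) for row in range(4)]   # InvShiftRows
    subbed = [INV_SBOX[b] for b in shifted]               # InvSubBytes
    out = []
    for col in range(4):                                  # InvMixColumns
        a = subbed[4 * col:4 * col + 4]
        for i in range(4):
            byte = 0
            for j in range(4):
                byte ^= gf_mul(COLUMNS[j][i], a[j])
            out.append(byte)
    return bytes(out)
```

With the tables pre-computed, each output column costs four lookups and three xors, which is the kind of trade-off the text describes: fewer operations per round in exchange for storing 4 × 256 32-bit words.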

Figure 19. Code section related to the operation carried out during each round used to obtain the
intermediate data and finally, the plaintext.

The developed decryption algorithm provides the plaintext data packets, at the output of the
AES_128_DEC block, in just 28.560 ns for 350 MHz clock frequency, thus obtaining the same
maximum data-rate as the encryption algorithm (i.e., 44.8 Gbit/s). Similarly to the encryption algorithm,
the s00_axis_tvalid signal, provided by the Cipher_Data_Generator block (green box in Figure 17),
indicates that the encrypted data packets are available for the decryption; the plaintext data packets
provided at the output of the decryption block are reported by the m00_axis_tvalid signal, as depicted
in Figure 20.

Figure 20. Temporal trends related to the decryption phase; the time interval required to obtain the
plaintext data packet from the encrypted data packet is highlighted by time markers applied to
s00_axis_tvalid (yellow box) and the m00_axis_tvalid (orange box) signals.

The developed decryption block receives the new key and stores it in four 32-bit registers; the key
is validated setting the bit of the key_valid register, used for this purpose. The algorithm checks this
bit every 85.6 ns and if it is set, the new key is acquired and the flag bit is reset, thus communicating to
the processor that the change of the key has been received; afterward, the expansion key routine starts.
During this process, the m00_axis_tvalid signal is reset indicating that no valid decrypted packets are
available. This operation requires 205 ns, 177 ns more than the 28.56 ns needed to provide the first
valid decrypted packet for the new validated key.
To verify the correctness of the decrypted data packets and to check the sensitivity of the
implemented algorithm in detecting errors, the Insert_Error block (pink block in Figure 17) has been
implemented, similar to the one implemented in the encryption algorithm; the sig_error signal triggers
the change of a single bit in the input word, to verify that the Pattern_verificator block (blue block in
Figure 17) detects the error; when it detects the error, the error_sig bit is set in correspondence with the
decrypted data packet that does not match the word set stored in its internal table.
Finally, the s00_axis_tready signal has been configured, for indicating the availability of the
decryption block to accept a new encrypted data packet. As discussed above, the implemented
algorithm can accept encrypted data packets and thus perform the decryption, on every rising edge of
the system clock; therefore, it is always ready to accept new encrypted data packets; consequently, the
s00_axis_tready signal is reset only if the m00_axis_tready is reset, namely if the block downstream of the
decrypting block cannot accept decrypted data packets.
3.2. Post-Synthesis Simulation Results: Resources Utilization of the Encryption/Decryption Systems
In this sub-section, the simulations performed to determine the resource utilization on the
ZCU102 FPGA platform by the developed AES-128 algorithm are reported. At first, the post-synthesis
simulations have been performed on both encryption and decryption blocks, with data packets provided
on each rising edge of the 350 MHz clock signal; afterward, the simulation has been performed by
modifying the Data_Generator block to provide data packets at the input of the encryption/decryption
block every 42.7 ns, thus verifying that the hardware usage remains unchanged.
The resource utilization of FPGA related to the encryption algorithm is reported in Table 1;
considering the complete encryption system, in both cases discussed above, the percentages of
hardware occupation equal to 5.48% for LUTs and 0.78% for FFs have been obtained. Afterward,
the simulation of the encryption system has been performed by removing all the blocks used to verify
the correct behavior of the algorithm, leaving only the blocks involved in the encryption algorithm;
the hardware resources utilization is 4.76% for the LUT and 0.71% for the FF. A reduction in hardware
occupation of 0.72% for LUTs has been obtained compared to the previous case including all the blocks.

Table 1. Hardware resources utilization of the ZCU102 Field Programmable Gate Array (FPGA)
platform related to the encryption algorithm; in particular, the complete encryption scheme, including
all the blocks to test the correct operation of the algorithm and only the encryption block have
been considered.

Simulation                   Resource   Utilization   Utilization [%]
Complete encryption system   LUT        15,029        5.48
Complete encryption system   FF         4296          0.78
Encryption block             LUT        13,043        4.76
Encryption block             FF         3877          0.71

Considering the decryption algorithm, the post-synthesis simulations have been performed both
when the data packets are provided on each rising edge of the 350 MHz clock signal and when
the Data_Generator provides encrypted packets every 42.7 ns (i.e., 23.4 MHz packet rate, Table 2);
in the first case, the hardware utilization of the FPGA device is 10.62% for LUTs, 0.79% for FFs and
0.25% relative to the Global Buffers (BUFG) used. In the latter case, the use of hardware resources is
equal to 10.64% for the LUTs, 0.79% for the FFs and 0.25% for the BUFGs. Finally, the post-synthesis
simulation of the decryption system has been performed by removing all the blocks used to verify
the correct behavior of the algorithm, leaving only the block involved in the decryption algorithm.
This configuration reveals a hardware utilization of 10.11% for the LUTs, 0.71% for the FFs and 0.25%
for the Global Buffer, obtaining a reduction in the hardware occupation of 0.53% for the LUTs and
0.08% for the FFs compared to the complete decryption scheme (Table 2).

Table 2. Hardware resource utilization related to the complete decryption scheme, including all the
blocks to test the decryption algorithm, both when the encrypted packets are received in input on each
rising edge of the 350 MHz clock signal and when they are provided every 42.7 ns (i.e., 23.4 MHz); also,
the resource utilization of only the decryption block are reported.

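Simulation                                          Resource   Utilization   Utilization [%]
Complete decryption system (350 MHz packet rate)    LUT        29,111        10.62
Complete decryption system (350 MHz packet rate)    FF         4339          0.79
Complete decryption system (350 MHz packet rate)    BUFG       1             0.25
Complete decryption system (23.4 MHz packet rate)   LUT        29,156        10.64
Complete decryption system (23.4 MHz packet rate)   FF         4339          0.79
Complete decryption system (23.4 MHz packet rate)   BUFG       1             0.25
Decryption block                                    LUT        27,713        10.11
Decryption block                                    FF         3912          0.71
Decryption block                                    BUFG       1             0.25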

As can be seen from Tables 1 and 2, showing the use of hardware resources for both the encryption
and decryption systems, the LUTs used on the FPGA by the latter are 2.12× those used by the encryption
algorithm when considering only the blocks that perform the decryption (27,713 vs. 13,043), and 1.94×
when also considering the blocks needed to test it (29,111 vs. 15,029); this is due to the implementation
of 4 matrices (sbox_decoding_0, sbox_decoding_1, sbox_decoding_2 and sbox_decoding_3) in the decryption
algorithm, each containing 32-bit elements deriving from the operations of Inverse SubBytes, Inverse
Shift Rows and Inverse Mix Columns. In particular, the greater hardware resources consumption is
attributable to the multiplication of the Inverse Mix Columns operation carried out in the decryption
block, because it involves a large number of values such as 0x09090909, 0x0B0B0B0B, 0x0D0D0D0D and
0x0E0E0E0E; such multiplicative constants require the storing of numerous intermediate values inside
the LUT, occupying more hardware resources and consuming more power [34]. For this reason,
several strategies were proposed in the scientific literature for reducing resource utilization and power
consumption [35,36]. However, since the area occupation requirement is not as stringent as the
encryption/decryption speed for the specifications of the developed project, the implementation choice
fell on obtaining the data packets in the shortest possible amount of time at the expense of a greater
chip area occupation.
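The LUT ratios between the decryption and encryption systems can be recomputed directly from the counts in Tables 1 and 2. The snippet below is a small illustrative check; the dictionary keys are hypothetical labels, while the counts are those reported in the tables.

```python
# Recompute decryption/encryption LUT ratios from Tables 1 and 2.
# Dictionary keys are illustrative labels; counts come from the tables.

LUTS = {
    "encryption_block": 13043,
    "complete_encryption": 15029,
    "decryption_block": 27713,
    "complete_decryption_350MHz": 29111,
    "complete_decryption_23_4MHz": 29156,
}


def lut_ratio(numerator, denominator):
    """Ratio between the LUT counts of two configurations."""
    return LUTS[numerator] / LUTS[denominator]


block_only = lut_ratio("decryption_block", "encryption_block")
with_test_blocks = lut_ratio("complete_decryption_350MHz", "complete_encryption")
```

Rounded to two decimals, `block_only` is 2.12 and `with_test_blocks` is 1.94.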

4. Discussion
In this section, the results of the post-implementation simulations carried out on the combined
system, constituted by the cascade of the encryption system and the decryption one, are reported,
to verify that the resulting performance is acceptable for the correct operation of the algorithm
once the project is loaded on the FPGA-ZCU102 platform.

4.1. Post-Implementation Simulations: Clock Routing Issues and Overall Performances of the Combined
Encryption/Decryption System
The post-implementation simulations represent the closest emulation to downloading a design
to a device, providing useful indications related to the functional and timing requirements of the
developed system.
After setting the appropriate parameters and using synthesizable blocks, such as the Clocking
Wizard for the system clock, and the mappable interface pins on the board for the clock signal and
the error_sig signal provided by the Pattern_Verificator, the post-implementation simulation on the
encryption system has been carried out; the simulation results indicated a timing problem related to the
propagation of the signals within the FPGA-ZCU102 chip. In particular, for a 350 MHz system clock,
a Worst Negative Slack (WNS) parameter equal to −1.014 ns has been obtained, indicating excessive
delays in the propagation of the digital signals inside the FPGA chip, thus resulting in incorrect
scheduling of the performed tasks; therefore, a positive WNS is required, for ensuring the proper
operation of the developed encryption/decryption systems.
To overcome this problem, several post-implementation simulations, with a lower system
clock frequency, have been carried out, obtaining an improvement of the WNS parameter (Table 3);
in particular, by using a system clock frequency of 190 MHz, a WNS value equal to 0 ns was obtained,
as well as for 180 MHz operating frequency, a WNS equal to 0.056 ns resulted. Furthermore, to support
a greater system clock frequency, it is possible to use the implementation strategies provided by
the Vivado tool; therefore, a common strategy suitable for both the encryption and decryption
blocks has been chosen, since the final simulations have been carried out on the combined system.
The post-implementation simulation has been performed by setting 220 MHz system clock frequency
and adopting the Explore strategy, thus obtaining the WNS parameter equal to 0.005 ns for the
encryption system and 0.008 ns for the decryption one.

Table 3. Post-implementation simulation results carried out on the encryption system with different
clock frequencies to establish the maximum operating frequency with a positive Worst Negative Slack
(WNS) parameter.

Clock Frequency [MHz] Worst Negative Slack [ns] Total Negative Slack [ns]
180 0.056 0
190 0 0
200 −0.199 −0.353
250 −0.441 −0.895
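A commonly used first-order estimate relates the WNS reported for a given clock constraint to the highest frequency the implemented design could sustain: f_max ≈ 1 / (T_clk − WNS). The sketch below applies it to the rows of Table 3. It is only a rough guide, since each row comes from a separate implementation run and the tool's effort changes with the constraint; the function name `fmax_mhz` is hypothetical.

```python
# First-order f_max estimate from clock constraint and WNS.
# Assumption: f_max ~ 1 / (T_clk - WNS); each run may place and route differently.

def fmax_mhz(f_clk_mhz, wns_ns):
    """Estimated maximum frequency in MHz for a run constrained at
    f_clk_mhz that reported a worst negative slack of wns_ns."""
    period_ns = 1e3 / f_clk_mhz
    return 1e3 / (period_ns - wns_ns)


# (clock frequency [MHz], WNS [ns]) pairs taken from Table 3.
table3 = [(180, 0.056), (190, 0.0), (200, -0.199), (250, -0.441)]
estimates = {f: fmax_mhz(f, wns) for f, wns in table3}
```

Under this estimate, the 200 MHz run corresponds to roughly 192 MHz and the 250 MHz run to roughly 225 MHz, which is in line with a zero WNS at 190 MHz and with the need for a dedicated implementation strategy above that frequency.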
The area utilization resulting from the post-implementation simulations remains unchanged compared to the results obtained through the post-synthesis simulations: the encryption system uses 5% of LUTs, 1% of FFs, 1% of I/O ports and 1% of BUFGs, while the decryption system uses 10% of LUTs, 1% of FFs, 1% of I/O ports and 1% of BUFGs; finally, for both systems, there is a 25% area utilization relative to the IP Clocking Wizard block used to generate the system clock during the post-synthesis and post-implementation simulations.

Before performing the post-implementation simulations of the whole system, including the encryption and decryption blocks, behavioral simulations with a 220 MHz clock frequency were carried out. Figure 21 shows the temporal trends of the signals obtained by providing the plaintext data packets (red box) to the encryption/decryption system every 40.86 ns; this data rate derives from the clock frequency of 220 MHz, corresponding to a 4.54 ns clock period, chosen to comply with the 3 Gbit/s throughput required by the specifications of the Wireless Connector system, as calculated in Equation (5).

\[ \text{Data-Rate or Throughput} = \frac{128\ \text{bit}}{9 \times \text{clock\_period}} = \frac{128\ \text{bit}}{9 \times 4.54\ \text{ns}} = 3.132\ \text{Gbit/s}. \tag{5} \]

The encryption of the data packets is performed in 9.5 clock periods (white box); in fact, the encrypted packets are provided at the output of the encryption block on the falling edge of the clock, after exactly 9.5 clock periods, and then acquired by the decryption block on the next rising edge. Afterward, the data packet is decrypted in 9.5 clock periods (blue box) and provided at the output of the decryption block after 10 clock periods overall. Therefore, the encryption and decryption operations last 20 clock periods, equal to 90.8 ns at a system clock frequency of 220 MHz (Figure 21).
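The arithmetic behind Equation (5) and the 90.8 ns latency can be checked with a few lines of Python. This is only a sketch: the function and constant names are illustrative and do not come from the article, and the 4.54 ns clock period is taken as quoted in the text (1/220 MHz, truncated) rather than recomputed.

```python
BLOCK_BITS = 128        # size of one AES-128 data packet, in bits
CLOCK_PERIOD_NS = 4.54  # clock period at 220 MHz, as quoted in the text


def packet_interval_ns(cycles_per_packet: float) -> float:
    """Time between two consecutive input packets, in nanoseconds."""
    return cycles_per_packet * CLOCK_PERIOD_NS


def throughput_gbps(cycles_per_packet: float) -> float:
    """Data rate in Gbit/s (bits per nanosecond equals Gbit/s)."""
    return BLOCK_BITS / packet_interval_ns(cycles_per_packet)


def latency_ns(cycles: float) -> float:
    """Duration of a given number of clock periods, in nanoseconds."""
    return cycles * CLOCK_PERIOD_NS


if __name__ == "__main__":
    print(f"packet interval: {packet_interval_ns(9):.2f} ns")
    print(f"throughput:      {throughput_gbps(9):.3f} Gbit/s")
    print(f"enc+dec latency: {latency_ns(20):.1f} ns")
```

With one packet every 9 clock periods the interval is 40.86 ns, which gives the data rate of Equation (5); 20 clock periods give the overall encryption plus decryption latency.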

[Figure 21: simulation waveform with the plaintext, encrypted and decrypted data packets highlighted.]

Figure 21. Temporal trends related to the behavioral simulations where plaintext data packets are provided every 9 system clock periods at 220 MHz, corresponding to a data rate of 3.132 Gbit/s.

Figure 22 shows the temporal trends related to the behavioral simulations with the plaintext data packets provided to the system in each clock period (frequency 220 MHz). The s00_axis_tvalid signal is constantly set, indicating to the receiving block that a new data packet is available at the input; encrypted packets are therefore provided on each rising edge of the clock, allowing a data rate equal to 28.16 Gbit/s (220 MHz × 128 bit = 28.16 Gbit/s).

Finally, the post-implementation simulation of the overall system, constituted by the cascade of the encryption (red box) and decryption (blue box) blocks, has been carried out (Figure 23a). The simulation has been performed by setting the Explore implementation strategy provided by the Vivado tool. The screenshots of the Project Manager, obtained after the post-implementation simulation, are shown in Figure 23; a positive WNS parameter, equal to 0.056 ns, is obtained (Figure 23b), and the hardware utilization of the overall encryption/decryption system is reported in Figure 23c. In particular, the hardware resource utilization was equal to 15% LUTs, 1% FFs, 1% I/O ports and 1% BUFG, together with a 25% area utilization value relative to the IP Clocking Wizard block used to generate the system clock.

[Figure 22: simulation waveform with the plaintext, encrypted and decrypted data packets highlighted.]

Figure 22. Behavioral simulation with the plaintext data packets (red box) acquired by the system on each rising edge of the clock with frequency 220 MHz and a data rate of 28.16 Gbit/s; the encrypted packets (white box) are obtained 45.40 ns after the plaintext packets are received and the decrypted ones (blue box) after 90.80 ns.
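The fully pipelined behavior described for Figure 22 (one new packet accepted per clock period, first result after 10 clock periods) can be illustrated with a small cycle-level model. This is a behavioral sketch only: the 10-stage shift register is an assumed abstraction of the encryption block, not the authors' VHDL design, and all names are hypothetical.

```python
from collections import deque


def simulate_pipeline(num_packets: int, stages: int = 10):
    """Return (packet_id, cycle_in, cycle_out) for each packet.

    One packet enters on every clock cycle; each cycle all packets
    advance by one stage, so a packet leaves `stages` cycles after entry.
    """
    pipe = deque([None] * stages, maxlen=stages)
    results = []
    next_id = 0
    cycle = 0
    while len(results) < num_packets:
        leaving = pipe[-1]  # content of the last stage before the shift
        incoming = (next_id, cycle) if next_id < num_packets else None
        if incoming is not None:
            next_id += 1
        pipe.appendleft(incoming)  # shift by one stage (last entry drops out)
        if leaving is not None:
            results.append((leaving[0], leaving[1], cycle))
        cycle += 1
    return results


def steady_state_gbps(clock_hz: float, block_bits: int = 128) -> float:
    """Throughput when one block leaves the pipeline every clock cycle."""
    return clock_hz * block_bits / 1e9


if __name__ == "__main__":
    for pid, t_in, t_out in simulate_pipeline(4):
        print(f"packet {pid}: in at cycle {t_in}, out at cycle {t_out}")
    print(f"steady-state throughput: {steady_state_gbps(220e6):.2f} Gbit/s")
```

In this model every packet has a latency of 10 cycles while results appear on consecutive cycles, which is why the throughput depends on the clock frequency only and not on the pipeline depth.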

[Figure 23: (a) block design with the encryption and decryption blocks; (b) Project Manager timing summary; (c) hardware utilization report.]

Figure 23. Block Design including the encryption (red box) and decryption (blue box) blocks connected in cascade (a); screenshot of Project Manager, obtained by post-implementation simulation, using a clock frequency of 220 MHz and Explore implementation strategy, showing the positive WNS parameter (green-dashed box) equal to 0.056 ns (b) and the hardware usage of the overall system (c).

Besides, the total on-chip power (sum of the static FPGA power and design power) of the combined encryption/decryption system has been estimated from the post-implementation simulation, providing plaintext data packets each clock period; it is equal to 1.77 W, with a 26.7 °C chip temperature, ensuring a thermal margin equal to 73.3 °C (i.e., temperature limit equal to 90 °C). Furthermore, post-implementation simulations have been carried out on the encryption and decryption systems individually, obtaining a total on-chip power consumption equal to 1.17 W and 0.99 W with the chip temperature equal to 26.5 °C and 26.1 °C, respectively. By providing the plaintext data packets in input to the encryption block every 40.86 ns, the post-implementation simulation on the combined encryption/decryption system indicates a power consumption of only 365 mW, with a 25.5 °C chip temperature.
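As a rough way to relate the two power figures above to the corresponding data rates, the energy spent per processed bit can be derived from them. This metric is not reported in the article; the sketch below is an illustrative calculation that assumes the estimated on-chip power is sustained at the stated throughput, and the function name is hypothetical.

```python
def energy_per_bit_pj(power_w: float, throughput_gbps: float) -> float:
    """Energy per bit in picojoules.

    W / (Gbit/s) is numerically nJ/bit; multiplying by 1000 gives pJ/bit.
    """
    return power_w / throughput_gbps * 1000.0


if __name__ == "__main__":
    # Figures quoted in the text for the combined encryption/decryption system
    full_rate = energy_per_bit_pj(1.77, 28.16)      # one packet per clock period
    reduced_rate = energy_per_bit_pj(0.365, 3.132)  # one packet every 40.86 ns
    print(f"one packet per clock: {full_rate:.1f} pJ/bit")
    print(f"one packet per 9 clocks: {reduced_rate:.1f} pJ/bit")
```

Under these assumptions the values come out at roughly 63 pJ/bit and 117 pJ/bit, respectively; since the total on-chip power includes the static FPGA power, the fully loaded pipeline spends less energy per bit even though its absolute power is higher.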
4.2. Testing of the Developed Encryption/Decryption Algorithm on ZCU102 Evaluation Board
After the generation of the bitstream file related to the developed project, including the cascade of the encryption and decryption blocks, the file has been loaded on the FPGA-ZCU102 evaluation board. To monitor the signals of interest, the Integrated Logic Analyzer (ILA) IP has been added to the Block Design; also, to verify the correctness of the decrypted packets provided by the system constituted by the encryption and decryption blocks connected in cascade, only a single encryption key has been used during the test phase, initially loaded into four 32-bit registers and subsequently automatically validated; therefore, the error_sig signal produced by the Pattern_Verificator block remains low, indicating the absence of errors in the comparison between the packets received from the decryption block and those contained in the Pattern_Verificator table.

The tests carried out on the board confirmed the proper operation of both encryption and decryption algorithms, in agreement with the post-implementation simulations reported in the previous paragraph. Figure 24 shows the temporal trends related to the complete encryption/decryption system, in which the plaintext data packets, provided every 9.5 clock periods, are accepted by the encryption block (red box) and the encrypted packets are delivered to the decryption block (white box), thereby obtaining the decrypted packets downstream (blue box in Figure 24). As expected, the error_sig signal remains low along the observation period, indicating that the processing of the packets is performed correctly, namely the packets leaving the decryption block are equal to those provided at the input to the encryption block.
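The loopback check described above (compare each decrypted packet with the expected pattern and raise an error flag on any mismatch) can be sketched in software as follows. This is an illustration under explicit assumptions: `toy_encrypt` and `toy_decrypt` are XOR stand-ins for the AES-128 cores, not AES, and `run_loopback`, `KEY` and `PATTERN_TABLE` are hypothetical names that do not come from the article.

```python
KEY = bytes(range(16))  # hypothetical 128-bit test key


def toy_encrypt(block: bytes) -> bytes:
    """Stand-in for the encryption core: an invertible XOR, NOT AES-128."""
    return bytes(b ^ k for b, k in zip(block, KEY))


def toy_decrypt(block: bytes) -> bytes:
    """Stand-in for the decryption core: inverse of toy_encrypt."""
    return bytes(b ^ k for b, k in zip(block, KEY))


def run_loopback(pattern_table, encrypt, decrypt) -> bool:
    """Return the final value of a sticky error flag (False means no errors)."""
    error_sig = False
    for expected in pattern_table:
        received = decrypt(encrypt(expected))
        if received != expected:
            error_sig = True  # once set, the flag stays high
    return error_sig


# Hypothetical table of known 16-byte plaintext packets
PATTERN_TABLE = [bytes([i] * 16) for i in range(8)]

if __name__ == "__main__":
    print("error_sig:", run_loopback(PATTERN_TABLE, toy_encrypt, toy_decrypt))
```

Replacing the stand-ins with a real AES-128 encrypt/decrypt pair would keep the checking logic unchanged; only a faulty cipher path makes the flag go high.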

Figure 24. Temporal trends showing the plaintext data packets entering the system (red-dashed box), the encrypted ones delivered by the encryption block to the decryption block (white-dashed box) and finally the decrypted packets provided in output by the decryption block (blue-dashed box); as evident, the plaintext data packets provided in input to the system are equal to those provided by the decryption block (as indicated by the red arrow), as also demonstrated by the error_sig signal, which remains low along the observation period (yellow-dashed box).

Figure 25 shows the temporal trends related to the complete encryption/decryption system when the plaintext data packets are provided in input on each rising edge of the clock signal; as can be noticed, in this case too the error_sig signal remains low, indicating the proper operation of the encryption/decryption system.

Figure 25. Temporal trends showing the plaintext data packets provided to the system on each rising edge of the clock; the packets are obtained from the decryption block after 20 clock periods, 10 of which are needed for the encryption operation and the remaining 10 for the decryption one. The error_sig signal, highlighted in yellow, is low along the whole observation interval, as expected. The packets leaving the decryption block (blue-dashed box) are equal to those entering the encryption block (red-dashed box).

4.3. Comparison of the Proposed AES-128 Implementation with Other Works Reported in the Literature
For the Zynq SoC, just like other FPGAs, the PL section is constituted by CLBs arranged in a matrix structure; each CLB contains two slices, each including four LUTs and eight FFs, and a configurable switch matrix [37]. Therefore, from the results shown in Tables 1 and 2, the numbers of CLBs and slices employed by the developed AES-128 encryption and decryption blocks are 1631/3262 and 3464/6928, respectively.

Table 4 compares the proposed implementation of the AES-128 encryption algorithm with other pipelined implementations previously reported in the scientific literature, similar in operating frequency and supported throughput; the platforms employed to develop the reported implementations are also indicated, since the FPGA technology affects the performance of encryption and decryption. The figure of merit chosen for comparing the different implementations is the efficiency, defined as:

\[ \text{Efficiency} = \frac{\text{Throughput [Mbps]}}{\#\ \text{of used slices}}. \tag{6} \]

Table 4. Comparison of the proposed AES-128 solution with other FPGA implementations.
Design                            Platform     Frequency [MHz]  Throughput [Gbit/s]  Slices  Efficiency [Mbps/slice]
Zambreno J. et al. [38] (Enc)     XC2V4000     184.1            23.57                16,938  1.39
Fan C.P. et al. [25] (Enc)        XC4VLX200    250.0            32.00                86,806  0.36
Bulens P. et al. [39] (Enc)       Virtex-4     250.0            2.90                 1220    2.30
Standaert F. et al. [40] (Enc)    XCV3200E8    145.0            18.56                10,750  1.66
Hodjat A. et al. [41] (Enc)       XC2VP20-7    168.3            21.54                5177    4.16
Kotturi D. et al. [42] (Enc)      XC2VP70-7    232.6            29.77                5408    5.50
Daoud L. et al. [43] (Enc)        XC7Z020      192.0            1.29                 431     2.99
Good T. et al. [33] (Enc/Dec)     XC3S2000-5   196.1            23.65                16,693  1.42
Our solution (Enc)                XCZU9EG      220.0            28.16                3262    8.63
Our solution (Enc/Dec)            XCZU9EG      220.0            28.16                10,278  2.74
In particular, this quantity is representative of how efficiently the FPGA hardware resources are
used to support a given output throughput.
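The efficiency figure of merit of Equation (6), together with the CLB-to-slice conversion (two slices per CLB), can be reproduced with a short script. The helper names are illustrative and not taken from the article; the inputs are the throughput and slice counts listed in Table 4 for the proposed solution.

```python
SLICES_PER_CLB = 2  # each CLB contains two slices, as stated in the text


def slices_from_clbs(clbs: int) -> int:
    """Convert a CLB count into the equivalent number of slices."""
    return clbs * SLICES_PER_CLB


def efficiency_mbps_per_slice(throughput_gbps: float, slices: int) -> float:
    """Equation (6): throughput in Mbps divided by the number of used slices."""
    return throughput_gbps * 1000.0 / slices


if __name__ == "__main__":
    enc_slices = slices_from_clbs(1631)  # encryption block
    dec_slices = slices_from_clbs(3464)  # decryption block
    print("encryption slices:", enc_slices)
    print("decryption slices:", dec_slices)
    print(f"Enc efficiency:     {efficiency_mbps_per_slice(28.16, enc_slices):.2f} Mbps/slice")
    print(f"Enc/Dec efficiency: {efficiency_mbps_per_slice(28.16, 10278):.2f} Mbps/slice")
```

The Enc/Dec row uses the slice count listed in Table 4 (10,278) rather than the sum of the two individual blocks.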
As evident from the results reported in Table 4, the proposed solution reaches high data throughput values (up to 28.16 Gbit/s) with considerably lower utilization of the hardware resources compared to other works, thus allowing higher efficiency. Compared with the best-performing implementation, reported in Reference [42], our solution obtains a slightly lower maximum data throughput (−5.3%) but also employs far fewer FPGA hardware resources (i.e., −39.7%), thus resulting in a higher efficiency value (+56.9%). Also, comparing our solution implementing both encryption and decryption with that reported in Reference [33], a clear superiority of the former is evident, indicated by a higher efficiency value (+92.9%).
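The percentage differences quoted above follow from a simple relative-change calculation on the Table 4 entries. The sketch below uses a hypothetical helper name; because it starts from the rounded table values, the results may differ in the last digit from the percentages given in the text.

```python
def relative_change_pct(ours: float, reference: float) -> float:
    """Relative difference of `ours` with respect to `reference`, in percent."""
    return (ours - reference) / reference * 100.0


if __name__ == "__main__":
    # Proposed encryption block versus Reference [42] (values from Table 4)
    print(f"throughput: {relative_change_pct(28.16, 29.77):+.1f}%")
    print(f"slices:     {relative_change_pct(3262, 5408):+.1f}%")
    print(f"efficiency: {relative_change_pct(8.63, 5.50):+.1f}%")
    # Proposed Enc/Dec system versus Reference [33]
    print(f"efficiency (Enc/Dec): {relative_change_pct(2.74, 1.42):+.1f}%")
```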
As mentioned above, it must be considered that the comparison shown in Table 4 is made between solutions implemented on platforms that differ in technology, architecture and maximum clock frequency; therefore, the enhanced performance of our solution is also attributable to the advanced features and complex architecture of the platform used, but mainly to the implemented solutions aimed at speeding up the encryption/decryption process. Such advanced specifications are required to comply with the constraints imposed by the Wireless Connector system, also related to the other functionalities included in the developed communication system. Finally, the platform typology must be considered as a parameter of the reported analysis to obtain a fair comparison.

5. Conclusions
In this research work, we have proposed a high-speed implementation of the well-known AES-128 algorithm, developed for a custom, very short-range and high-frequency communication system called Wireless Connector; specifically, the latter supports high-throughput data transmission in a frequency range around 60 GHz between two mobile stations located at short range (1–10 m). The core of the communication system is a Xilinx ZCU102 FPGA platform, which manages all the base-band operations, including the encryption and decryption of the data packets; a prototype of the Wireless Connector was realized, demonstrating its proper operation. In particular, a pipelined approach has been applied to the round-based elaboration typical of the AES algorithm, allowing simultaneous processing of multiple successive plaintext packets, with a new packet accepted each clock period, and thus reaching higher data throughput values; furthermore, a 32-bit 16 × 16 Sbox matrix was employed to speed up the Substitute Byte step compared to the classic 8-bit implementation.
Encryption and decryption VHDL blocks have been developed on the Xilinx ZCU102 FPGA platform, carrying out multiple elaborations of the incoming data packets to comply with the 3 Gbit/s data rate constraint required by the Wireless Connector application. The developed encryption system can operate at a 220 MHz maximum clock frequency, supporting an encryption time of just 10 clock periods. Thanks to the pipelined elaboration, the proposed implementation is able to process and provide the encrypted packets each clock period (namely, $4.54\ \text{ns} = \frac{1}{220\ \text{MHz}}$), reaching a maximum data throughput higher than 28 Gbit/s (i.e., $\frac{128\ \text{bit/packet}}{4.54\ \text{ns}} = 28.16\ \text{Gbit/s}$). Similarly, the decrypting system employs just 10 clock periods for obtaining the plaintext data packets.
Furthermore, the developed AES-128 encryption implementation features a higher efficiency (8.63 Mbps/slice) compared to similar solutions operating in the same frequency range, requiring just 1631 CLBs, 13043 LUTs and 3877 FFs. However, the decryption implementation requires higher resource utilization than the encryption one (3464 CLBs, 27713 LUTs, 3912 FFs and 1 BUFG), due to the four matrices derived from the Inverse SubBytes, Inverse Shift Rows and Inverse Mix Columns operations, each containing 32-bit elements; the greater resource utilization is associated with the Inverse Mix Columns operation, given the multiplicative constants involved in its matrix representation and its LUT-based implementation inside the FPGA, as detailed in Section 3.2.

Author Contributions: Relatively to the present scientific article: Conceptualization, P.V. and E.V.; methodology,
P.V. and E.V.; software, R.d.F.; validation, P.V. and R.V.; investigation, R.d.F. and S.C.; resources, R.d.F. and E.V.;
data curation, R.d.F. and S.C.; writing—original draft preparation, P.V., R.d.F. and S.C.; writing—review and
editing, P.V. and R.V.; visualization, R.V.; supervision, R.V. All authors have read and agreed to the published
version of the manuscript.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Li, L.; Li, S. High throughput AES encryption/decryption with efficient reordering and merging techniques.
In Proceedings of the 2017 27th International Conference on Field Programmable Logic and Applications (FPL), Gent,
Belgium, 4–6 September 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–4.
2. Wei, J.; Han, J.; Cao, S. Satellite IoT Edge Intelligent Computing: A Research on Architecture. Electronics
2019, 8, 1247. [CrossRef]
3. De Fazio, R.; Cafagna, D.; Marcuccio, G.; Minerba, A.; Visconti, P. A Multi-Source Harvesting System Applied
to Sensor-Based Smart Garments for Monitoring Workers’ Bio-Physical Parameters in Harsh Environments.
Energies 2020, 13, 2161. [CrossRef]
4. Visconti, P.; de Fazio, R.; Costantini, P.; Miccoli, S.; Cafagna, D. Innovative complete solution for health safety
of children unintentionally forgotten in a car: A smart Arduino-based system with user app for remote
control. IET Sci. Meas. Technol. 2020, 14, 665–675. [CrossRef]
5. Rajasekar, P.; Haridas, M. Efficient FPGA implementation of AES 128 bit for IEEE 802.16e mobile WiMax
standards. Circuits Syst. 2016, 7, 371–380. [CrossRef]
6. Denning, D.; Irvine, J.; Harold, N.; Dunn, P.; Devlin, M. An implementation of a gigabit Ethernet AES
encryption engine for application processing in SDR. In Proceedings of the 2004 IEEE 60th Vehicular Technology
Conference, VTC2004-Fall. 2004, Los Angeles, CA, USA, 26–29 September 2004; IEEE: Piscataway, NJ, USA, 2004;
Volume 3, pp. 1963–1967.
7. Dey, A.; Nandi, S.; Sarkar, M. Security Measures in IOT based 5G Networks. In Proceedings of the 2018 3rd
International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 15–16 November 2018;
IEEE: Piscataway, NJ, USA, 2018; pp. 561–566.
8. Del-Valle-Soto, C.; Velázquez, R.; Valdivia, L.J.; Giannoccaro, N.I.; Visconti, P. An Energy Model Using
Sleeping Algorithms for Wireless Sensor Networks under Proactive and Reactive Protocols: A Performance
Evaluation. Energies 2020, 13, 3024. [CrossRef]
9. Visconti, P.; Sbarro, B.; Primiceri, P.; de Fazio, R.; Ekuakille, A.L. Design and Testing of a Telemetry System
Based on STM X-Nucleo Board for Detection and Wireless Transmission of Sensors Data Applied to a
Single-Seat Formula SAE Car. Int. J. Electron. Telecommun. 2019, 65, 671–678. [CrossRef]
10. Visconti, P.; Sbarro, B.; Primiceri, P. A ST X-Nucleo-based telemetry unit for detection and WiFi transmission
of competition car sensors data: Firmware development, sensors testing and real-time data analysis. Int. J.
Smart Sens. Intell. Syst. 2017, 10, 793–828. [CrossRef]
11. Long, K.; Leung, V.C.M.; Haijun, Z.; Feng, Z.; Li, Y.; Zhang, Z. 5G for Future Wireless Networks, 1st ed.;
Springer: Beijing, China, 2017; pp. 1–653. ISBN 978-3-319-72822-3.
12. Hejazi, A.; Pu, Y.; Lee, K.-Y. A Design of Wide-Range and Low Phase Noise Linear Transconductance VCO
with 193.76 dBc/Hz FoMT for mm-Wave 5G Transceivers. Electronics 2020, 9, 935. [CrossRef]
13. Ghanim, A.; Alshaikhli, I.; Fakhri, T. Comparative study on 4G/LTE cryptographic algorithms based on
different factors. Int. J. Comput. Sci. Telecommun. 2014, 5, 7–10.
14. Park, J.; Park, Y. Symmetric-Key Cryptographic Routine Detection in Anti-Reverse Engineered Binaries
Using Hardware Tracing. Electronics 2020, 9, 957. [CrossRef]
15. Bellemou, A.M.; García, A.; Castillo, E.; Benblidia, N.; Anane, M.; Álvarez-Bermejo, J.A.; Parrilla, L. Efficient
Implementation on Low-Cost SoC-FPGAs of TLSv1.2 Protocol with ECC_AES Support for Secure IoT
Coordinators. Electronics 2019, 8, 1238. [CrossRef]
16. Baldoni, W.M.; Ciliberto, C.; Cattaneo, G.M.P. Aritmetica, Crittografia e Codici; Springer: Berlin/Heidelberg,
Germany, 2007; ISBN 978-88-470-0456-6.
17. Saggese, G.P.; Mazzeo, A.; Mazzocca, N.; Strollo, A.G.M. An FPGA-Based Performance Analysis of the
Unrolling, Tiling and Pipelining of the AES Algorithm. In Field Programmable Logic and Application;
Cheung, P.Y.K., Constantinides, G.A., Eds.; Springer: Berlin/Heidelberg, Germany, 2003; pp. 292–302.
18. Farooq, U.; Aslam, M.F. Comparative analysis of different AES implementation techniques for efficient
resource usage and better performance of an FPGA. J. King Saud Univ. Comput. Inf. Sci. 2017, 29, 295–302.
[CrossRef]
19. Xilinx ZCU102 Evaluation Board User Guide. Available online: https://www.xilinx.com/support/documentation/boards_and_kits/zcu102/ug1182-zcu102-eval-bd.pdf (accessed on 16 September 2020).
20. Paar, C.; Pelzl, J. The Advanced Encryption Standard (AES). In Understanding Cryptography: A Textbook for
Students and Practitioners; Paar, C., Pelzl, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 87–121.
ISBN 978-3-642-04101-3.
21. Rahimunnisa, K.; Karthigaikumar, P.; Rasheed, S.; Jayakumar, J.; SureshKumar, S. FPGA implementation of
AES algorithm for high throughput using folded parallel architecture. Secur. Commun. Networks 2014, 7,
2225–2236. [CrossRef]
22. Guzmán, I.; Nieto, R.; Noreña, Á. FPGA implementation of the AES-128 algorithm in non-feedback modes of
operation. DYNA 2016, 83, 37–43. [CrossRef]
23. Noorbasha, F.; Divya, Y.; Poojitha, M.; Navya, K.; Bhavishya, A.; Rao, K.; Kishore, K. FPGA design and
implementation of modified AES based encryption and decryption algorithm. Int. J. Innov. Technol. Explor.
Eng. 2019, 8, 132–136.
24. Gopalan, A.; Ganesh, J.; Swathi, M. FPGA-based Message Encryption and Decryption. Int. J. Innov. Technol.
Explor. Eng. 2015, 4, 1225–1232.
25. Fan, C.-P.; Hwang, J.-K. Implementations of high throughput sequential and fully pipelined AES processors
on FPGA. In Proceedings of the 2007 International Symposium on Intelligent Signal Processing and
Communication Systems, Xiamen, China, 28 November–1 December 2007; pp. 353–356.
26. McLoone, M.; McCanny, J.V. High-performance FPGA implementation of DES using a novel method for
implementing the key schedule. IEE Proc. Circuits Devices Syst. 2003, 150, 373–378. [CrossRef]
27. Chodowiec, P.; Gaj, K. Very compact FPGA implementation of the AES algorithm. In Proceedings of the
CHES 2003, Cologne, Germany, 8–10 September 2003; Volume 2779, pp. 319–333.
28. Bani-Hani, R.; Harb, S.; Mhaidat, K.; Taqieddin, E. High-Throughput and Area-Efficient FPGA
Implementations of Data Encryption Standard (DES). Circuits Syst. 2014, 5, 45–56. [CrossRef]
29. Rouvroy, G.; Standaert, F.-X.; Quisquater, J.-J.; Legat, J.-D. Efficient uses of FPGAs for implementations of
DES and its experimental linear cryptanalysis. IEEE Trans. Comput. 2003, 52, 473–482. [CrossRef]
30. Ahmad, N.; Hasan, R.; Jubadi, W.M. Design of AES S-box using combinational logic optimization. In
Proceedings of the 2010 IEEE Symposium on Industrial Electronics and Applications (ISIEA), Penang, Malaysia, 3–6
October 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 696–699.
31. Canright, D. A Very Compact S-Box for AES. In Cryptographic Hardware and Embedded Systems–CHES 2005;
Rao, J.R., Sunar, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; pp. 441–455.
32. Satoh, A.; Morioka, S.; Takano, K.; Munetoh, S. A Compact Rijndael Hardware Architecture with S-Box
Optimization. In International Conference on the Theory and Application of Cryptology and Information Security:
Advances in Cryptology; Springer: Berlin/Heidelberg, Germany, 2001; pp. 239–254.
33. Good, T.; Benaissa, M. AES on FPGA from the Fastest to the Smallest. In Cryptographic Hardware and Embedded
Systems–CHES 2005; Rao, J.R., Sunar, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; pp. 427–440.
34. Murugeswari, S.; Sridevi, P.; Vanaja, D.; Vanaja, G. Area optimization for reducing circuit complexity in
masked AES based on FPGA. Int. J. Innov. Emerg. Technol. 2015, 1, 1–4.
35. Sutharsan, H.; Thomas, A. Area & Power optimization of AES algorithm using modified mixcolumn with
composite S-BOX. IJRSET 2016, 3, 12–24.
36. Hua, L.; Friggstad, Z. An efficient architecture for the AES mix columns operation. In Proceedings of the 2005
IEEE International Symposium on Circuits and Systems, Kobe, Japan, 23–26 May 2005; IEEE: Piscataway, NJ, USA,
2005; Volume 5, pp. 4637–4640.
37. Xilinx UltraScale Architecture Configurable Logic Block User Guide (UG574). Available online:
https://www.xilinx.com/support/documentation/user_guides/ug574-ultrascale-clb.pdf (accessed on 26 August 2020).
38. Zambreno, J.; Nguyen, D.; Choudhary, A. Exploring Area/Delay Tradeoffs in an AES FPGA Implementation.
In Field Programmable Logic and Application; Becker, J., Platzner, M., Vernalde, S., Eds.; Springer:
Berlin/Heidelberg, Germany, 2004; pp. 575–585.
39. Bulens, P.; Standaert, F.-X.; Quisquater, J.-J.; Pellegrin, P.; Rouvroy, G. Implementation of the AES-128 on
Virtex-5 FPGAs. In Progress in Cryptology–AFRICACRYPT 2008; Vaudenay, S., Ed.; Springer: Berlin/Heidelberg,
Germany, 2008; pp. 16–26.
40. Standaert, F.-X.; Rouvroy, G.; Quisquater, J.-J.; Legat, J.-D. Efficient Implementation of Rijndael Encryption in
Reconfigurable Hardware: Improvements and Design Tradeoffs. In Cryptographic Hardware and Embedded
Systems—CHES 2003; Walter, C.D., Koç, Ç.K., Paar, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2003;
pp. 334–350.
41. Hodjat, A.; Verbauwhede, I. A 21.54 Gbits/s fully pipelined AES processor on FPGA. In Proceedings of the 12th
Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Napa, CA, USA, 20–23 April 2004;
IEEE: Piscataway, NJ, USA, 2004; pp. 308–309.
42. Kotturi, D.; Seong-Moo, Y.; Blizzard, J. AES crypto chip utilizing high-speed parallel pipelined architecture.
In Proceedings of the 2005 IEEE International Symposium on Circuits and Systems, Kobe, Japan, 23–26 May 2005;
IEEE: Piscataway, NJ, USA, 2005; Volume 5, pp. 4653–4656.
43. Daoud, L.; Hussein, F.; Rafla, N. Optimization of Advanced Encryption Standard (AES) Using Vivado
High Level Synthesis (HLS). In Proceedings of the 34th International Conference on Computers and Their
Applications, Honolulu, HI, USA, 18–20 March 2019; Volume 58, pp. 36–44.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
