Hardware Implementation of AES Algorithm With Logic S-Box: Soufiane Oukili and Seddik Bri
Cryptography plays an important role in securing data against known attacks and limits the
risk of information being hacked, especially given the rapid growth of communication
techniques. In recent years, there has been an increasing need to implement cryptographic
algorithms in fast-growing high-speed network applications. In this paper, we present
high-throughput, efficient hardware implementations of the Advanced Encryption Standard (AES)
cryptographic algorithm. We adopt the pipelining technique to increase the speed and the
maximum operating frequency; to this end, registers are inserted at optimal placements.
Furthermore, we propose a 5-stage pipelined S-box design using combinational logic to reach
further speed. In addition, an efficient key expansion architecture suited to the proposed
design is presented. To secure the hardware implementation against side-channel attacks, a
masked S-box is introduced. The designs were implemented on a Virtex-6 (xc6vlx240t)
Field-Programmable Gate Array (FPGA) device using Xilinx ISE 14.7. The proposed unmasked and
masked architectures achieve throughputs of 93.73 Gbps and 58.57 Gbps, respectively. These
results are competitive with the implementations reported in the literature.
1. Introduction
The astounding growth of the Internet and computer systems in the last century
has meant that the need for effective security and reliability of data communication,
processing and storage is greater than ever. In this context, cryptographic
development has been a high-priority and challenging research area in the fields
of both mathematics and engineering. The Advanced Encryption Standard (AES),
also known as Rijndael, is a symmetric-key cryptographic algorithm developed by
Dr. Joan Daemen and Dr. Vincent Rijmen [1]. In 2001, it was adopted as a Federal
Information Processing Standard by the National Institute of Standards and
Technology [2]. The algorithm takes input and output data blocks of 128 bits, and the
key can be 128, 192 or 256 bits long.
There are software and hardware approaches to implementing the AES cryptographic
algorithm. Compared with a software implementation, a hardware implementation
provides greater physical security and higher speed [3]. Because of the increasing
requirements for high-speed, high-volume secure communications combined with
physical security, hardware implementation becomes essential. Low power, high
J CIRCUIT SYST COMP Downloaded from www.worldscientific.com
throughput and compactness have always been topics of interest for hardware
by FUDAN UNIVERSITY on 02/21/17. For personal use only.
design and implementation. In this paper, the main goal of our hardware implementation
is a high-throughput design that uses as little hardware as possible. Besides
performing a cipher algorithm, a cryptographic device is also required to physically
protect the secret data manipulated during execution. Traditionally, cryptanalysis
has focused on algorithms but, in recent years, the hardware implementation has come
to be considered a fundamental part of the security evaluation of a
cryptographic design. New challenges in this field are the so-called side-channel
attacks [4,5], which exploit information leakage from the cryptographic device due to
physical phenomena such as power consumption, electromagnetic radiation and
execution timing. These attacks are based on monitoring a physical quantity and
applying statistical analysis to extract confidential information from extremely noisy
signals. Numerous countermeasures have been proposed to defend against these
side-channel attacks. Masking is one of the most widely used methods and has the
advantages of low cost and easy implementation [6–10].
The AES algorithm has been implemented using different methods, and the contributions
can be categorized as follows. In the first category, loop unrolling and
iterative techniques are used to increase throughput, increase the throughput-to-area
ratio and decrease area cost; see Refs. 11, 14, 16, 17 and 22 for more details. The
second category includes designs that use pipelining and sub-pipelining techniques
to increase the operating frequency and throughput; see Refs. 12–16 and 18–28.
In this paper, we present an efficient high-throughput hardware design and
implementation of AES with a 128-bit key. The pipelining technique is introduced in order
to increase the speed and the maximum operating frequency. The pipelining strategy
consists of overlapping data input and output with the processing. Accordingly, the
algorithm is divided into stages separated by registers. Increasing the
number of stages shortens the critical path and, as a result, increases the
throughput. The S-box substitution is at the core of any AES implementation; it is the
only complex step in each round of the encryption algorithm. It is implemented using
combinational logic to avoid the unbreakable delay of LUTs and to achieve any
2. AES Algorithm
The AES algorithm is a symmetric block cipher in which the sender and the
receiver use the same key for encryption and decryption. The data block length is fixed
at 128 bits, and the key length can be 128, 192 or 256 bits. AES is an iterative
algorithm: each iteration is called a round, and the total number of rounds
depends on the key length. The output of each round serves as the input of the next round.
Each round requires a 128-bit data input and a 128-bit round key derived from the
128/192/256-bit cipher key.
The key length is represented by Nk = 4, 6 or 8, which reflects the number of 32-bit
words (number of columns) in the cipher key. The number of rounds performed during
the execution of the algorithm depends on the key size and is represented by Nr,
where Nr = 10 when Nk = 4, Nr = 12 when Nk = 6 and Nr = 14 when Nk = 8. Because the
same secret key is used for both encryption and decryption, the design is simpler. For this
work, a 128-bit key is chosen, which requires 10 rounds of encryption.
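The relation between key length and round count stated above can be captured in a few lines. The following Python sketch is illustrative only: the function name `aes_parameters` is an assumption of this example, and the rule Nr = Nk + 6 is simply the relation that reproduces the three (Nk, Nr) pairs listed above.

```python
def aes_parameters(key_bits):
    """Return (Nk, Nr) for an AES key length given in bits."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192 or 256 bits long")
    nk = key_bits // 32   # number of 32-bit words in the cipher key
    nr = nk + 6           # 10, 12 or 14 rounds
    return nk, nr


for bits in (128, 192, 256):
    print(bits, aes_parameters(bits))
```

For the 128-bit key used in this work, the sketch yields Nk = 4 and Nr = 10.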
The 128-bit data block is arranged in a 4 × 4 array of bytes called the State, with
four rows and four columns, for 16 bytes in total. Each round is composed of
four different byte-oriented transformations: SubByte, ShiftRow, MixColumn and
AddRoundKey, except for the last round, in which the MixColumn transformation is not
performed. In addition, there is an initial round at the start that consists only of the
AddRoundKey transformation [1,2]. Figure 1 shows the AES algorithm with a 128-bit key.
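The round structure described above can be written down as a simple schedule. The Python sketch below only lists the transformation names per round and does not perform any encryption; the function name `round_schedule` is an assumption of this example.

```python
def round_schedule(nr=10):
    """List the transformations applied in each round of AES encryption."""
    schedule = [["AddRoundKey"]]  # initial round: AddRoundKey only
    for rnd in range(1, nr + 1):
        steps = ["SubByte", "ShiftRow", "MixColumn", "AddRoundKey"]
        if rnd == nr:
            steps.remove("MixColumn")  # the last round skips MixColumn
        schedule.append(steps)
    return schedule


for index, steps in enumerate(round_schedule()):
    print(index, steps)
```

With Nr = 10, the schedule contains 11 entries, which matches the 11 round keys needed for a 128-bit key.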
• SubByte: operates on each byte of the State independently. Each byte is substituted
by the corresponding byte in the substitution box (S-box). The S-box is one of
the basic components of any symmetric-key algorithm and provides the property
of confusion, which makes it more difficult to find the key
from known ciphertext. In general, an S-box takes M input bits and transforms them
into N output bits. AES uses a fixed S-box, designed by taking the
multiplicative inverse over GF(2^8) and combining it with
an invertible affine transformation. The resulting nonlinearity makes the cipher
resistant to cryptanalysis.
• ShiftRow: takes the data in the State matrix and cyclically shifts each row to the
left by its row index.
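As an illustration of the ShiftRow description above, the following Python sketch rotates row r of the State to the left by r positions. Representing the State as a list of four rows is an assumption made for readability; it is not the hardware representation used in this work.

```python
def shift_rows(state):
    """Cyclically shift row r of a 4x4 State to the left by r positions.

    `state` is a list of four rows, each a list of four bytes.
    """
    return [row[r:] + row[:r] for r, row in enumerate(state)]


# Example State whose entry in row r, column c is r + 4 * c.
state = [[r + 4 * c for c in range(4)] for r in range(4)]
for row in shift_rows(state):
    print(row)
```

Row 0 is left unchanged, while rows 1, 2 and 3 are rotated by one, two and three bytes, respectively.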
The decryption structure can be derived by inverting the encryption one directly and
its rounds require four inverse operations: InvSubByte, InvShiftRow, InvMixColumn
and AddRoundKey.
The AES algorithm takes the original cipher key and performs a key expansion
routine to generate the round keys. For a 128-bit key, it generates a total of 11
round keys (KeyRound) of 16 bytes each, one for each AddRoundKey step of AES, where
the first KeyRound is the initial key itself. Key expansion is also an
iterative algorithm with the same number of rounds as the AES encryption, and the output of
each round is the input of the next one. In each round, the first four bytes of the input
KeyRound constitute the word w0, the next four bytes the word w1, and so on. The
bytes of the final word are rotated left by one position, and each byte then passes
through the substitution transformation SubWord (S-box). The result is XORed with a
round constant RCon(i). Finally, the columns are added together (addition in GF(2),
i.e., XOR) to generate a new 128-bit round key. Figure 2 shows one round of the key
expansion module. The key expansion is designed to be resistant to known cryptanalytic
attacks. The inclusion
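The key expansion round described in this section can be sketched in software as follows. This is a behavioural Python model, not the hardware architecture of this work. The function names are assumptions of this example, the S-box is computed from the field inverse and the affine map rather than taken from the proposed logic design, and the reduction polynomial x^8 + x^4 + x^3 + x + 1 and the round constants (successive powers of 2 in GF(2^8)) are taken from the AES standard (Ref. 2).

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sbox(a):
    """AES S-box: multiplicative inverse (a^254) followed by the affine map."""
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, a)
    out = 0x63
    for shift in range(5):  # XOR of the byte with its left rotations by 0..4
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out


SBOX = [sbox(a) for a in range(256)]


def next_round_key(key, rcon):
    """Derive the next 16-byte round key from the previous one."""
    words = [key[4 * i:4 * i + 4] for i in range(4)]   # w0..w3
    temp = words[3][1:] + words[3][:1]                 # rotate left by one byte
    temp = bytes(SBOX[b] for b in temp)                # SubWord
    temp = bytes([temp[0] ^ rcon]) + temp[1:]          # XOR with RCon(i)
    out = []
    for word in words:                                 # chain the XORs column by column
        temp = bytes(x ^ y for x, y in zip(word, temp))
        out.append(temp)
    return b"".join(out)


def expand_key(key):
    """Return the 11 round keys for a 128-bit cipher key."""
    keys = [key]
    rcon = 0x01
    for _ in range(10):
        keys.append(next_round_key(keys[-1], rcon))
        rcon = gf_mul(rcon, 0x02)
    return keys


round_keys = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(round_keys[1].hex())
```

The example key is the 128-bit test key from the key expansion example of the AES standard, so the printed round key can be compared against that document.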
3. Related Works
Many different hardware implementations of the AES algorithm have been presented
in the literature. They aim to improve throughput and efficiency while using
as little area as possible. In this section, we review a few of them. The authors of Ref. 11
addressed the design, hardware implementation and performance testing of the AES
algorithm. An optimized code for the Rijndael algorithm with 128-bit keys was
developed. Area and throughput were carefully traded off to make it suitable for
wireless military communication and mobile telephony, where the emphasis is on the
architecture and pipelined architecture. The first design was optimized for area and
the second one for speed. A fully pipelined crypto processor was presented by the authors
of Ref. 23, where AES was integrated with a 32-bit general-purpose 5-stage pipelined
MIPS processor. A Parallel Sub-Pipelined (PSP) architecture was proposed in Ref. 24
in order to obtain high throughput. The proposed architecture was also compared
with loop-unrolled, pipelined, sub-pipelined, parallel and parallel pipelined
architectures in terms of throughput. An extension of a general-purpose processor with a
crypto coprocessor for encryption and decryption of information was described by the
authors of Ref. 25. In Ref. 26, the authors presented a high-throughput digital design of
the 128-bit AES algorithm based on the 2-slow retiming technique on FPGA.
The authors of Ref. 27 presented an equivalent pipelined AES architecture working in
CTR mode; it provides high throughput by inserting registers at appropriate
points so as to minimize the delay when implementing the byte transformation
in one clock period. In Ref. 28, the authors presented three high-throughput
AES implementations in ECB mode and one ultra-high-throughput AES implementation
in CTR mode. They designed an area-delay efficient multiplier and
multiplicative inverter over GF(2^8). Moreover, loop-unrolling, full pipelining and
full sub-pipelining techniques were also used, and the registers were placed at
optimal positions.
required gates of this transformation, several different methods have been presented
in the literature to deal with the critical path of the S-box and the occupied memory:
S-box using composite field arithmetic, direct mapping from LUTs, or using
combinational logic only. Implementing the S-box with LUTs requires a high number
of gates and suffers from an inherent, unbreakable delay. This delay is
longer than the total delay of the rest of the transformations in each round unit and
prevents the round from being divided into more than two sub-stages to achieve any
further speedup. Composite field arithmetic can reduce the area of the design; however, it
increases the critical path delay. An implementation using combinational logic has the
advantage of a small area occupancy and, in addition, can be pipelined
to achieve further speedup. Our proposed S-box implementation aims to increase
the throughput while using as few hardware resources as possible. Therefore, the S-box
using the combinational logic technique is adopted.
As stated above, the S-box transformation is computed by taking the multiplicative
inverse in GF(2^8) followed by an affine transformation. The affine transformation
can be represented in matrix form as shown below [2]. Note that the bits a7, a6, a5, a4,
a3, a2, a1 and a0 represent the input byte.
\[
AT(a) =
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1
\end{pmatrix}
\times
\begin{pmatrix} a_7 \\ a_6 \\ a_5 \\ a_4 \\ a_3 \\ a_2 \\ a_1 \\ a_0 \end{pmatrix}
\oplus
\begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}.
\tag{1}
\]
The a±ne transformation can be translated to logical XOR operation. The logical
form of the matrix is shown below:
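\[
AT(a) =
\begin{pmatrix}
a_7 \oplus a_6 \oplus a_5 \oplus a_4 \oplus a_3 \oplus 0 \\
a_6 \oplus a_5 \oplus a_4 \oplus a_3 \oplus a_2 \oplus 1 \\
a_5 \oplus a_4 \oplus a_3 \oplus a_2 \oplus a_1 \oplus 1 \\
a_4 \oplus a_3 \oplus a_2 \oplus a_1 \oplus a_0 \oplus 0 \\
a_7 \oplus a_3 \oplus a_2 \oplus a_1 \oplus a_0 \oplus 0 \\
a_7 \oplus a_6 \oplus a_2 \oplus a_1 \oplus a_0 \oplus 0 \\
a_7 \oplus a_6 \oplus a_5 \oplus a_1 \oplus a_0 \oplus 1 \\
a_7 \oplus a_6 \oplus a_5 \oplus a_4 \oplus a_0 \oplus 1
\end{pmatrix}.
\tag{2}
\]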
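To make the two-step construction of the S-box concrete, the Python sketch below computes the multiplicative inverse in GF(2^8) and then applies the XOR form of the affine transformation given in Eq. (2). It is a behavioural reference model, not the pipelined circuit proposed here: the inverse is computed as a^254 instead of through composite fields, the function names are assumptions of this example, and the reduction polynomial x^8 + x^4 + x^3 + x + 1 is taken from the AES standard (Ref. 2).

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inverse(a):
    """Multiplicative inverse as a^254; maps 0 to 0 as required by AES."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def affine(a):
    """Affine transformation written as the XOR equations of Eq. (2)."""
    a0, a1, a2, a3, a4, a5, a6, a7 = [(a >> i) & 1 for i in range(8)]
    rows = [  # most significant output bit first
        a7 ^ a6 ^ a5 ^ a4 ^ a3 ^ 0,
        a6 ^ a5 ^ a4 ^ a3 ^ a2 ^ 1,
        a5 ^ a4 ^ a3 ^ a2 ^ a1 ^ 1,
        a4 ^ a3 ^ a2 ^ a1 ^ a0 ^ 0,
        a7 ^ a3 ^ a2 ^ a1 ^ a0 ^ 0,
        a7 ^ a6 ^ a2 ^ a1 ^ a0 ^ 0,
        a7 ^ a6 ^ a5 ^ a1 ^ a0 ^ 1,
        a7 ^ a6 ^ a5 ^ a4 ^ a0 ^ 1,
    ]
    value = 0
    for bit in rows:
        value = (value << 1) | bit
    return value


def sbox(a):
    """S-box output for one byte: inverse in GF(2^8), then affine map."""
    return affine(gf_inverse(a))


print(hex(sbox(0x00)), hex(sbox(0x53)))
```

Because the all-zero byte has no inverse and is mapped to itself, its S-box output is just the constant vector of Eq. (1), that is, 0x63.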
This equation indicates that multiplication, addition, squaring and multiplicative
inversion operations in the Galois field are required. Each of these operators can be
implemented as an individual block when constructing the circuit that computes the
multiplicative inverse. From this simplified equation, the S-box transformation can
be produced as shown in Fig. 5.
The computation of the multiplicative inverse in composite fields cannot be applied
directly to an element of GF(2^8). That element first has to be mapped to
its composite field representation via an isomorphic function, δ. Likewise, after
the multiplicative inversion has been performed, the result has to be mapped back
from its composite field representation to its equivalent in GF(2^8) via the inverse
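\[
\delta \times a =
\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 1 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 1 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}
\times
\begin{pmatrix} a_7 \\ a_6 \\ a_5 \\ a_4 \\ a_3 \\ a_2 \\ a_1 \\ a_0 \end{pmatrix},
\tag{5}
\]
\[
\delta^{-1} \times a =
\begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 0 & 1
\end{pmatrix}
\times
\begin{pmatrix} a_7 \\ a_6 \\ a_5 \\ a_4 \\ a_3 \\ a_2 \\ a_1 \\ a_0 \end{pmatrix}.
\tag{6}
\]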
The matrix multiplication can be translated to logical XOR operation. The logical
form of the matrices above is shown below:
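\[
\delta \times a =
\begin{pmatrix}
a_7 \oplus a_5 \\
a_7 \oplus a_6 \oplus a_4 \oplus a_3 \oplus a_2 \oplus a_1 \\
a_7 \oplus a_5 \oplus a_3 \oplus a_2 \\
a_7 \oplus a_5 \oplus a_3 \oplus a_2 \oplus a_1 \\
a_7 \oplus a_6 \oplus a_2 \oplus a_1 \\
a_7 \oplus a_4 \oplus a_3 \oplus a_2 \oplus a_1 \\
a_6 \oplus a_4 \oplus a_1 \\
a_6 \oplus a_1 \oplus a_0
\end{pmatrix},
\tag{7}
\]
\[
\delta^{-1} \times a =
\begin{pmatrix}
a_7 \oplus a_6 \oplus a_5 \oplus a_1 \\
a_6 \oplus a_2 \\
a_6 \oplus a_5 \oplus a_1 \\
a_6 \oplus a_5 \oplus a_4 \oplus a_2 \oplus a_1 \\
a_5 \oplus a_4 \oplus a_3 \oplus a_2 \oplus a_1 \\
a_7 \oplus a_4 \oplus a_3 \oplus a_2 \oplus a_1 \\
a_5 \oplus a_4 \\
a_6 \oplus a_5 \oplus a_4 \oplus a_2 \oplus a_0
\end{pmatrix}.
\tag{8}
\]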
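The XOR forms of the mapping and of its inverse lend themselves to a quick software check. The Python sketch below transcribes the two sets of equations bit by bit and can be used to confirm that applying one after the other returns the original byte. The function names `delta` and `delta_inv` are assumptions of this example, and the sketch checks only the mutual consistency of the two mappings, not the field isomorphism itself.

```python
def _bits(a):
    """Return the bits of byte `a`, least significant first (a0..a7)."""
    return [(a >> i) & 1 for i in range(8)]


def _pack(rows):
    """Pack eight bits, given most significant first, into a byte."""
    value = 0
    for bit in rows:
        value = (value << 1) | bit
    return value


def delta(a):
    """Map a GF(2^8) element to its composite field representation."""
    a0, a1, a2, a3, a4, a5, a6, a7 = _bits(a)
    return _pack([
        a7 ^ a5,
        a7 ^ a6 ^ a4 ^ a3 ^ a2 ^ a1,
        a7 ^ a5 ^ a3 ^ a2,
        a7 ^ a5 ^ a3 ^ a2 ^ a1,
        a7 ^ a6 ^ a2 ^ a1,
        a7 ^ a4 ^ a3 ^ a2 ^ a1,
        a6 ^ a4 ^ a1,
        a6 ^ a1 ^ a0,
    ])


def delta_inv(a):
    """Map a composite field element back to GF(2^8)."""
    a0, a1, a2, a3, a4, a5, a6, a7 = _bits(a)
    return _pack([
        a7 ^ a6 ^ a5 ^ a1,
        a6 ^ a2,
        a6 ^ a5 ^ a1,
        a6 ^ a5 ^ a4 ^ a2 ^ a1,
        a5 ^ a4 ^ a3 ^ a2 ^ a1,
        a7 ^ a4 ^ a3 ^ a2 ^ a1,
        a5 ^ a4,
        a6 ^ a5 ^ a4 ^ a2 ^ a0,
    ])


print(all(delta_inv(delta(a)) == a for a in range(256)))
```

Since both mappings consist of XOR gates only, they are linear over GF(2), which is why they can be merged with neighbouring XOR logic in a combinational design.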
\[
\begin{aligned}
GF(2^2) &\rightarrow GF(2) &&: x^2 + x + 1, \\
GF((2^2)^2) &\rightarrow GF(2^2) &&: x^2 + x + \varphi, \\
GF(((2^2)^2)^2) &\rightarrow GF((2^2)^2) &&: x^2 + x + \lambda.
\end{aligned}
\tag{9}
\]
The addition of two elements in a Galois field translates to a simple bitwise XOR
of the two elements. Based on Ref. 31, the logical equations for the
squaring, multiplication and multiplicative inversion blocks are as follows, where
• Squarer in GF(2^4): the formula for computing the squaring operation is shown
below:
\[
q =
\begin{pmatrix} q_3 \\ q_2 \\ q_1 \\ q_0 \end{pmatrix}
=
\begin{pmatrix}
a_3 \\
a_3 \oplus a_2 \\
a_2 \oplus a_1 \\
a_3 \oplus a_1 \oplus a_0
\end{pmatrix}.
\tag{10}
\]
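The squarer equations can be cross-checked against a generic multiplication in the tower field. The Python sketch below builds GF(2^2) with x^2 + x + 1 and GF((2^2)^2) with x^2 + x + φ, and compares a · a with the XOR network of Eq. (10). The value φ = {10}_2 is an assumption of this example (a common choice in the composite-field literature); the function names are likewise hypothetical.

```python
PHI = 0b10  # assumed constant of the GF((2^2)^2) polynomial x^2 + x + PHI


def gf4_mul(a, b):
    """Multiply two GF(2^2) elements (2-bit integers) modulo x^2 + x + 1."""
    a1, a0 = a >> 1, a & 1
    b1, b0 = b >> 1, b & 1
    hi = (a1 & b1) ^ (a1 & b0) ^ (a0 & b1)
    lo = (a1 & b1) ^ (a0 & b0)
    return (hi << 1) | lo


def gf16_mul(a, b):
    """Multiply two GF((2^2)^2) elements (4-bit integers) modulo x^2 + x + PHI."""
    ah, al = a >> 2, a & 3
    bh, bl = b >> 2, b & 3
    hh = gf4_mul(ah, bh)
    hi = hh ^ gf4_mul(ah, bl) ^ gf4_mul(al, bh)
    lo = gf4_mul(hh, PHI) ^ gf4_mul(al, bl)
    return (hi << 2) | lo


def squarer(a):
    """Squaring in GF(2^4) using the XOR equations of Eq. (10)."""
    a3, a2, a1, a0 = (a >> 3) & 1, (a >> 2) & 1, (a >> 1) & 1, a & 1
    q3 = a3
    q2 = a3 ^ a2
    q1 = a2 ^ a1
    q0 = a3 ^ a1 ^ a0
    return (q3 << 3) | (q2 << 2) | (q1 << 1) | q0


print(all(squarer(a) == gf16_mul(a, a) for a in range(16)))
```

Under the stated assumption on φ, the four XOR equations agree with the generic product for all 16 field elements, which illustrates why squaring is much cheaper than a full multiplier in this representation.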
the ciphertext resulting from the encryption process [4]. Power analysis attacks are
among the most powerful of them. Power analysis attacks include simple power analysis
(SPA), differential power analysis (DPA), higher-order differential power analysis
(HODPA) and glitch attacks. An SPA attack is a technique that directly
interprets power consumption measurements taken during cryptographic operations.
A DPA attack is based on statistical analysis, in which the attacker can judge the
correctness of the keys by comparing the differences between a sample power trace
and the correct-key power trace. An HODPA attack is a powerful technique that exploits
the joint leakage information of several intermediate values to "crack" the secret key.
At the gate level, input signals propagating through the circuit arrive at different times,
which leads to the possibility of glitch attacks [5]. Many research works have focused
on studying side-channel attacks and have proposed multiple countermeasure tech-
measures can be restricted to the masking schemes [6,7]. The basic idea of masking is to
randomize the intermediate results that are produced during the computation of a
cryptographic algorithm. Masking can break the dependence between the power
consumption and the intermediate values in the cryptographic algorithm.
A traditional XOR operation is used as a masking countermeasure; however, the
mask is arithmetic on GF(2^8) [7]. The operation is compatible with the linear parts of
the AES structure (ShiftRow, MixColumn and AddRoundKey) but not with SubByte, which is
the only nonlinear transformation since it uses an inversion in the field. In other words,
it is easy to compute the mask correction for all transformations in a round apart from the
inversion step of the S-box. The problem of masked inversion can be reduced to
computing a binary AND on masked data bits without revealing the actual data bits [8].
In order to mask our proposed 5-stage S-box design, which is constructed using XOR
and AND gates, we adopted an improved masked AND gate proposed in Refs. 9
and 10. The scheme carefully masks each input of the AND gates by constructing a
nonlinear function that efficiently resists side-channel attacks. In addition, it has
two more input parameters than the existing methods. Its main circuit
needs only five XOR and four AND operations, which takes up a rather small area
when ported to FPGA and ASIC platforms [9,10]. We denote the masked input bits as
follows:
x′ = x  rx  x″,   (15)
y′ = y  ry  y″,   (16)
x″ = x  rx,   (17)
y″ = y  ry,   (18)
where x′, y′, x″ and y″ are masked data, and rx and ry are the corresponding masks. All
operations over the binary extension field are operations over GF(2), namely bitwise
XOR and AND.
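To illustrate the general idea of computing an AND on masked bits, the Python sketch below models the classic masked AND gate of Ref. 8. It is not the improved gate of Refs. 9 and 10 adopted in this work, which uses two additional inputs and five XOR operations and is not reproduced here. The function name and the fresh output mask `rz` are assumptions of this example.

```python
from itertools import product


def masked_and(xm, ym, rx, ry, rz):
    """Masked AND in the style of Ref. 8.

    Inputs are the masked bits xm = x ^ rx and ym = y ^ ry, the masks rx
    and ry, and a fresh random bit rz. The result equals (x & y) ^ rz and
    is accumulated starting from rz, so the unmasked product x & y never
    appears as an intermediate value.
    """
    z = rz ^ (rx & ry)
    z ^= rx & ym
    z ^= ry & xm
    z ^= xm & ym
    return z


# Exhaustive check over all data and mask bits.
ok = all(
    masked_and(x ^ rx, y ^ ry, rx, ry, rz) == (x & y) ^ rz
    for x, y, rx, ry, rz in product((0, 1), repeat=5)
)
print(ok)
```

This gate uses four AND and four XOR operations. In hardware, the order in which the terms are accumulated matters, because a different evaluation order could expose unmasked intermediate values.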
Several implementations of the AES algorithm aim to achieve the most
efficient architecture by improving throughput and area efficiency.
Table 2 shows the performance figures for some reported architectures up to our best
                             Utilization
Resources                    Unmasked AES             Masked AES
Number of slices             5,759/37,680 (15%)       9,531/37,680 (25%)
Number of slice LUTs         15,330/150,720 (10%)     25,740/150,720 (17%)
Number of slice registers    17,680/301,440 (5%)      28,960/301,440 (9%)
Number of bonded IOBs        386/400 (96%)            386/400 (96%)
Minimum period               1.36 ns                  2.18 ns
Maximum frequency            732.279 MHz              457.582 MHz
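The reported throughput figures follow from the maximum frequencies above if one 128-bit block is accepted per clock cycle, as is the case for a fully pipelined design. The Python sketch below reproduces this arithmetic together with the throughput per slice; the function names are assumptions of this example.

```python
BLOCK_BITS = 128  # AES block size in bits


def throughput_gbps(f_max_mhz, block_bits=BLOCK_BITS):
    """Throughput of a fully pipelined core: one block per clock cycle."""
    return block_bits * f_max_mhz * 1e6 / 1e9


def efficiency_mbps_per_slice(f_max_mhz, slices):
    """Throughput divided by the number of occupied slices."""
    return throughput_gbps(f_max_mhz) * 1000.0 / slices


designs = {"unmasked": (732.279, 5759), "masked": (457.582, 9531)}
for name, (f_max, slices) in designs.items():
    print(name,
          round(throughput_gbps(f_max), 2), "Gbps,",
          round(efficiency_mbps_per_slice(f_max, slices), 2), "Mbps/slice")
```

The computed throughputs agree with the reported 93.73 Gbps and 58.57 Gbps, and the per-slice values agree with the reported efficiencies to within rounding in the last digit.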
Authors                        Device                    Slices                 Critical     Maximum      Throughput   Efficiency
                                                                                delay (ns)   frequency    (Gbps)       (Mbps/slice)
                                                                                             (MHz)
Jyrwa and Paily [11]           Virtex-2 Pro XC2VP30      6211 + 1 BRAM          -            142.5        18.2         0.23
Gielata et al. [12]            Virtex-4 XC4VLX200        1209                   -            165          21.2         -
Granado-Criado et al. [13]     Virtex-2 XC2V6000         3576 + 80 BRAMs        5.1          194.7        24.92        6.97
Fan and Hwang [14]             Virtex-2 XC2V3000         139357                 4.5          222.2        28.40        0.20
Rizk and Morsy [15]            Virtex-4 4VLX60FF668      18,855 + 200 BRAMs     -            -            28.510       1.51
                               XC5VLX110T
Kaur et al. [17]               Virtex-2 XC2VP30          1127                   4            247.365      31.66        -
Rahimunnisa et al. [18]        Virtex-6 XC6VLX75T        2056 + 48 BRAMs        1.9          505.5        37.1         15.56
Hammad et al. [19]             XC2V6000                  10662                  -            305.1        39.05        3.6
Wang and Ha [20]               Virtex-6 XC6VLX240T       9071 + 400 BRAMs       3.1          319.29       40.86        4.51
Samiee et al. [21]             Virtex-2 Pro XC2VP20      7,865                  2.9          341.53       43.71        5.55
Iyer et al. [22]               Virtex-2 Pro XC2VP30      12,556 + 100 BRAMs     2.6          373          47.74        3.8
Anwar et al. [23]              Virtex-6 ML605            2,547 + 204 BRAMs      1.8          553          58           -
Rahimunnisa et al. [24]        Virtex-6 XC6VLX75T        2597                   2.2          450.045      59.59        22.94
Soliman and Abozaid [25]       Virtex-5 XC5VLX50T        1,656                  1.7          557          70           -
Farashahi et al. [26]          Virtex-4 XC4VLX200        3,425                  1.73         576.037      73.73        21.53
Qu et al. [27]                 Virtex-5 XC5VLX85         22,994                 1.7          576.07       73.73        3.21
Soltani and Sharifian [28]     Virtex-6 XC6VLX240T       28,520                 1.2          803.988      102.91       3.6
This work, unmasked AES        Virtex-6 XC6VLX240T       5,759                  1.3          732.279      93.73        16.27
This work, masked AES          Virtex-6 XC6VLX240T       9,531                  2.1          457.582      58.57        6.14
an increase of 9.79%. In terms of area, however, our implementation reduces the area used in
Ref. 28 by 79.8%. Comparing throughput per slice, ours is 4.52 times
more efficient than that of Ref. 28.
Comparing our unmasked and masked AES implementations, we note that
masking decreases the throughput by 37%, increases the number of slices used by 65% and
decreases the efficiency by 62%. This is due to the masked gates, which need
additional operations.
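The cost of masking quoted above can be recomputed from the figures reported for the two designs. The short Python sketch below does so; the helper name `relative_change` is an assumption of this example, and the input values are the throughput, slice count and efficiency reported for the unmasked and masked implementations.

```python
def relative_change(new, old):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


unmasked = {"throughput": 93.73, "slices": 5759, "efficiency": 16.27}
masked = {"throughput": 58.57, "slices": 9531, "efficiency": 6.14}

for metric in unmasked:
    change = relative_change(masked[metric], unmasked[metric])
    print(f"{metric}: {change:+.1f}%")
```

The throughput drops by roughly 37.5%, the slice count grows by roughly 65.5% and the efficiency drops by roughly 62.3%, in line with the rounded percentages given in the text.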
The results clearly show that our proposed implementations achieve a good bal-
ance between hardware area and design performance.
7. Conclusion
In this paper, we presented high-throughput, efficient AES architectures. They use the
pipelining technique to break the critical path delay and increase speed. Moreover,
unmasked and masked S-boxes were implemented using combinational logic, with
registers inserted at optimal placements. An input block can be loaded every clock cycle
and, after an initial delay of 80 clock cycles, the encrypted data appear
consecutively. The designs were implemented on a Virtex-6 FPGA device. The results show
that the proposed AES architectures are competitive with previous implementations
in terms of throughput, area and efficiency.
Acknowledgment
This work is supported by the Moulay Ismail University, Meknes-Morocco.
References
1. J. Daemen and V. Rijmen, AES Proposal: Rijndael (1999), Available at
http://csrc.nist.gov/archive/aes/rijndael/Rijndael-ammended.pdf.
2. National Institute of Standards and Technology (NIST), Advanced Encryption
Standard, Federal Information Processing Standards Publication 197 (2001), Available at
http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf.
3. S. M. Yoo, D. Kotturi, D. W. Pan and J. Blizzard, An AES crypto chip using a high-speed
parallel pipelined architecture, Microprocess. Microsyst. 29 (2005) 317–326.
4. P. Kocher, J. Jaffe and B. Jun, Differential power analysis, Lecture Notes in Computer
Science, Advances in Cryptology - CRYPTO'99, Vol. 1666 (Springer-Verlag Berlin
Heidelberg, 1999), pp. 398–412.
5. S. Mangard, E. Oswald and T. Popp, Power Analysis Attacks: Revealing the Secrets of
Smart Cards (Springer-Verlag, US, 2007).
6. T. Popp, S. Mangard and E. Oswald, Power analysis attacks and countermeasures, IEEE
Design Test Comput. 24 (2007) 535–543.
7. M. L. Akkar and C. Giraud, An implementation of DES and AES, secure against some
attacks, Lecture Notes in Computer Science, Cryptographic Hardware and Embedded
Systems - CHES 2001, Vol. 2162 (Springer-Verlag Berlin Heidelberg, 2001), pp. 309–318.
8. E. Trichina, Combinational logic design for AES subbyte transformation on masked data,
Cryptology ePrint Archive, Report 2003/236 (2003), Available at http://eprint.iacr.org.
15. M. R. M. Rizk and M. Morsy, Optimized area and optimized speed hardware imple-
mentations of AES on FPGA, 2nd Int. Design Test Workshop, Cairo, Egypt (2007), pp.
207–217.
16. J. S. Banu, M. Vanitha, J. Vaideeswaran and S. Subha, Loop parallelization and
pipelining implementation of AES algorithm using OpenMP and FPGA, Int. Conf. Emerging
Trends in Computing, Communication and Nanotechnology, Tirunelveli, India (2013),
pp. 481–485.
17. A. Kaur, P. Bhardwaj and N. Kumar, FPGA implementation of efficient hardware for the
advanced encryption standard, Int. J. Innovat. Technol. Exploring Eng. 2 (2013) 187–190.
18. K. Rahimunnisa, P. Karthigaikumar, S. Rasheed, J. Jayakumar and S. Suresh Kumar,
FPGA implementation of AES algorithm for high throughput using folded parallel
architecture, Security Commun. Netw. 7 (2014) 2225–2236.
19. I. Hammad, K. El-Sankary and E. El-Masry, High-speed AES encryptor with efficient
merging techniques, IEEE Embed. Syst. Lett. 2 (2010) 67–71.
20. Y. Wang and Y. Ha, FPGA-based 40.9-Gbits/s masked AES with area optimization for
storage area network, IEEE Trans. Circ. Syst. II: Express Briefs 60 (2013) 36–40.
21. H. Samiee, R. E. Atani and H. Amindavar, A novel area-throughput optimized archi-
tecture for the AES algorithm, Int. Conf. Electronic Devices, Systems and Applications,
Kuala Lumpur, Malaysia (2011), pp. 29–32.
22. N. Iyer, P. Anandmohan, D. Poornaiah and V. Kulkarni, Efficient hardware architectures
for AES on FPGA, Computational Intelligence and Information Technology, eds. V. V.
Das and N. Thankachan (Springer-Verlag Berlin Heidelberg, 2011), pp. 249–257.
23. H. Anwar, M. Daneshtalab, M. Ebrahimi, J. Plosila and H. Tenhunen, FPGA
implementation of AES-based crypto processor, IEEE 20th Int. Conf. Electronics, Circuits,
and Systems, Abu Dhabi, United Arab Emirates (2013), pp. 369–372.
24. K. Rahimunnisa, P. Karthigaikumar, N. A. Christy, S. S. Kumar and J. Jayakumar, PSP:
Parallel sub-pipelined architecture for high throughput AES on FPGA and ASIC, Central
Eur. J. Comput. Sci. 3 (2013) 173–186.
25. M. I. Soliman and G. Y. Abozaid, FPGA implementation and performance evaluation of
a high throughput crypto coprocessor, J. Parallel Distrib. Comput. 71 (2011) 1075–1084.
26. R. R. Farashahi, B. Rashidi and S. M. Sayedi, FPGA based fast and high-throughput 2-
slow retiming 128-bit AES encryption algorithm, Microelectron. J. 45 (2014) 1014–1025.
27. S. Qu, G. Shou, Y. Hu, Z. Guo and Z. Qian, High throughput, pipelined implementation
of AES on FPGA, Int. Symp. Information Engineering and Electronic Commerce,
Ternopil, Ukraine (2009), pp. 542–545.
28. A. Soltani and S. Sharifian, An ultra-high throughput and fully pipelined implementation
of AES algorithm on FPGA, Microprocess. Microsyst. 39 (2015) 480–493.
29. V. Rijmen, Efficient Implementation of the Rijndael S-Box, Katholieke Universiteit
Leuven, Dept. ESAT, Belgium (2000), Available at http://luca-giuzzi.unibs.it/corsi/
Support/papers-cryptography/rijndael-sbox.pdf.
30. A. Satoh, S. Morioka, K. Takano and S. Munetoh, A compact Rijndael hardware
architecture with S-Box optimization, Lecture Notes in Computer Science, Advances in
Cryptology – ASIACRYPT 2001, Vol. 2248 (Springer-Verlag Berlin Heidelberg, 2001),
pp. 239–254.
31. X. Zhang and K. K. Parhi, High-speed VLSI architectures for the AES algorithm, IEEE
Trans. Very Large Scale Integr. (VLSI) Syst. 12 (2004) 957–967.