Reliable CRC-Based Error Detection Constructions For Finite Field
Abstract: Finite-field multiplication has received prominent attention in the literature with
applications in cryptography and error-detecting codes. For many cryptographic algorithms, this
arithmetic operation is a complex, costly, and time-consuming task that may require millions of
gates. In this work, we propose efficient hardware architectures based on cyclic redundancy
check (CRC) as error-detection schemes for postquantum cryptography (PQC) with case studies
for the Luov cryptographic algorithm. Luov was submitted for the National Institute of Standards
and Technology (NIST) PQC standardization competition and was advanced to the second
round. The CRC polynomials selected are in-line with the required error-detection capabilities
and with the field sizes as well. We have developed verification codes through which software
implementations of the proposed schemes are performed to verify the derivations of the
proposed error-detection schemes are performed over a Xilinx field-programmable gate array
(FPGA), verifying that the proposed schemes achieve high error coverage with acceptable
overhead.
CHAPTER 1
INTRODUCTION
Many modern, sensitive applications and systems use finite-field operations in their schemes,
among which finite-field multiplication has received prominent attention. Finite-field multipliers
perform multiplication modulo an irreducible polynomial used to define the finite field. For
postquantum cryptography (PQC), the inputs can be very large, and the finite-field multipliers
may require millions of logic gates. Therefore, it is a complex task to implement such
architectures resilient to natural and malicious faults; consequently, research has focused on
ways to eliminate errors and obtain more reliability with acceptable overhead [1]–[6]. Moreover,
there has been previous work on countering fault attacks and providing reliability for PQC.
Sarker et al. [7] used error-detection schemes for the number-theoretic transform (NTT) to detect
both permanent and transient faults. Mozaffari-Kermani et al. [8] performed fault detection for
stateless hash-based PQC signatures. Additionally, error-detection hash trees for stateless hash-
based signatures are proposed in [9] to make such schemes more reliable against natural faults
and to help protect them against malicious faults. In [10], algorithm-oblivious constructions are
proposed through recomputing with swapped ciphertext and additional authenticated blocks,
which can be applied to Galois counter mode (GCM) architectures using different finite-field
multipliers. Additionally, error-detection schemes based on spatial/temporal redundancies for the
NTRU encryption algorithm have been presented in [11].
Our proposed error-detection architectures are adapted to the Luov cryptographic algorithm [12];
however, they can be applied to different PQC algorithms that use finite-field multipliers. The
Luov algorithm was submitted to the National Institute of Standards and Technology (NIST)
PQC standardization competition [13] and was advanced to the second round [14]. Cyclic redundancy
check (CRC) error-detection schemes are applied in our proposed hardware constructions to
make sure that they are overhead-aware with high error coverage. Our contributions in this brief
are summarized as follows.
1) Error-detection schemes for the finite-field multipliers over GF(2^m) with m > 1 used in the
Luov cryptographic algorithm are proposed. These error-detection architectures are based on
CRC-5. Additionally, we explore and study both primitive and standardized generator
polynomials for such architectures.
2) We derive new formulations for the error-detection schemes of Luov's algorithm, performing
software implementations for the sake of verification. We note that such derivations cover a
wide range of applications and security levels. Nevertheless, the presented schemes are not
confined to this case study and can be adapted to other algorithms that employ finite-field
multiplication.
3) The proposed error-detection architectures are embedded into the original finite-field
multipliers and implemented on the Xilinx field-programmable gate array (FPGA) family Kintex
UltraScale+ for device xcku5p-ffvd900-1-i to confirm that the schemes achieve high error
coverage with acceptable overhead.
The third round of the competition for the SHA-3 candidates was held to select the winning hash
function in 2012. Although much attention has been devoted to the performance and security of
these candidates, approaches for increasing their reliability have not been presented to date.
In this paper, for the first time, we propose a high-performance scheme for fault detection of the
SHA-3 round-three candidate Grøstl, which is inspired by the Advanced Encryption Standard
(AES). We propose a low-overhead fault detection scheme by presenting closed formulations for
the predicted signatures of different transformations of this SHA-3 third-round finalist. These
signatures are derived to achieve low overhead and include one or multi-bit parities and
byte/word-wide predicted signatures. The proposed reliable hardware architectures for Grøstl are
implemented on Xilinx Virtex-6 FPGA family to benchmark their hardware and timing
characteristics. The results of our evaluations show high error coverage and acceptable overhead
for the proposed schemes.
Prior research has focused on different aspects of tweakable enciphering schemes (TES),
including implementations on hardware [field-programmable gate array (FPGA)/application-specific
integrated circuit (ASIC)] and software platforms under security-constrained usage models. In
this article, we propose efficient approaches for protecting
such schemes against natural and malicious faults. Specifically, noting that intelligent attackers
do not merely get confined to injecting multiple faults, one major benchmark for the proposed
schemes is evaluation toward biased and burst fault models. We evaluate a variant of TES, i.e.,
the Hash-Counter-Hash scheme, which involves polynomial hashing, as other variants are either
similar or do not involve finite-field multiplication, which is, by far, the most involved
operation in TES. In addition, we benchmark the overhead and performance degradation on the
ASIC platform. The results of our error injection simulations and ASIC implementations show
the suitability of the proposed approaches for a wide range of applications including deeply
embedded systems.
LED and HIGHT:
The security of cryptographic algorithms can be undermined through natural or malicious faults.
In this paper, two underlying block ciphers
which can be used in authenticated encryption algorithms are considered, i.e., light encryption
device and high security and lightweight block ciphers. The former is of the Advanced
Encryption Standard type and has been considered area-efficient, while the latter constitutes a
Feistel network structure and is suitable for low-complexity and low-power embedded security
applications. In this paper, we propose efficient error detection architectures including variants of
recomputing with encoded operands and signature-based schemes to detect both transient and
permanent faults. Moreover, we show that the proposed schemes are applicable to the case study of simple
lightweight CFB for providing authenticated encryption with associated data. The error
simulations are performed using the Xilinx Integrated Synthesis Environment tool, and the results
are benchmarked for the Xilinx FPGA family Virtex-7 to assess the reliability capability and
overheads of the proposed architectures.
Among these architectures are those based on stream ciphers for protection against
eavesdropping, especially when these smart and sensitive applications provide life-saving or vital
mechanisms. Nevertheless, natural defects call for protection through design for fault detection
and reliability. In this work, we present error-detection schemes for the stream cipher Pomaranch
(a candidate of eSTREAM, the stream cipher project of the European Network of Excellence in
Cryptology) for smart infrastructures. In addition, we present low-power architectures for its
nine-to-seven uneven substitution box [tower field architectures in GF((2^3)^3)]. Through error
simulations, we assess resiliency against false-alarms which might not be tolerated in sensitive
intelligent infrastructures as one of our contributions. We further benchmark the feasibility of
the proposed architectures through hardware implementations.
To augment the confidentiality property provided by block ciphers with authentication, the
Galois Counter Mode (GCM) has been standardized by the National Institute of Standards
and Technology. The GCM is used as an add-on to 128-bit block ciphers, such as the
Advanced Encryption Standard (AES), SMS4, or Camellia, to verify the integrity of data.
Prior works on the error detection of the GCM either use linear codes to protect the GCM
architectures or are based on AES–GCM architectures, confining the mechanisms to the AES
block cipher. Although such structures are efficient, they are not only confined to specific
architectures of the GCM but might also not fully take advantage of the parallel architectures
of the GCM. Moreover, linear codes have been shown to be potentially ineffective against
intelligent fault attackers. In this work, we present algorithm-oblivious constructions through
recomputing with swapped ciphertext and additional authenticated blocks, which can be
applied to the GCM architectures using different finite-field multipliers in GF(2^128). Such
obliviousness for the proposed constructions used in the GCM gives freedom to the designers in
choosing their own architectures. We perform error simulations and circuit implementations to
demonstrate the utility of the presented schemes. Based on the overhead and degradation to be
tolerated, one can fine-tune and tailor the proposed method to achieve more reliable
architectures for the GCM.
CHAPTER 2
CYCLIC REDUNDANCY CHECK
The cyclic redundancy check, or CRC, is a technique for detecting errors in digital data, but not
for making corrections when errors are detected. It is used primarily in data transmission. In the
CRC method, a certain number of check bits, often called a checksum, are appended to the
message being transmitted. The receiver can determine whether or not the check bits agree with
the data, to ascertain with a certain degree of probability whether or not an error occurred in
transmission. If an error occurred, the receiver sends a “negative acknowledgement” (NAK) back
to the sender, requesting that the message be retransmitted.
The technique is also sometimes applied to data storage devices, such as a disk drive. In this
situation each block on the disk would have check bits, and the hardware might automatically
initiate a reread of the block when an error is detected, or it might report the error to software.
The material that follows speaks in terms of a “sender” and a “receiver” of a “message,” but it
should be understood that it applies to storage writing and reading as well.
Background:
There are several techniques for generating check bits that can be added to a message. Perhaps
the simplest is to append a single bit, called the “parity bit,” which makes the total number of 1-
bits in the code vector (message with parity bit appended) even (or odd). If a single bit gets
altered in transmission, this will change the parity from even to odd (or the reverse). The sender
generates the parity bit by simply summing the message bits modulo 2—that is, by exclusive
or’ing them together. It then appends the parity bit (or its complement) to the message. The
receiver can check the message by summing all the message bits modulo 2 and checking that the
sum agrees with the parity bit. Equivalently, the receiver can sum all the bits (message and
parity) and check that the result is 0 (if even parity is being used).
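The parity computation just described can be sketched in a few lines of Python (an illustrative addition, not part of the original text; the example message and function names are ours):

```python
def parity_bit(message_bits):
    """Even-parity bit: the XOR (sum modulo 2) of all message bits."""
    p = 0
    for b in message_bits:
        p ^= b
    return p

def check_even_parity(code_vector):
    """Receiver check: XOR of all bits (message and parity) must be 0."""
    total = 0
    for b in code_vector:
        total ^= b
    return total == 0

msg = [1, 0, 1, 1, 0, 1, 0]
cv = msg + [parity_bit(msg)]   # code vector: message with parity appended
```

Flipping any single bit of `cv` makes `check_even_parity` return False, while flipping two bits restores even parity and the alteration goes undetected.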
This simple parity technique is often said to detect 1-bit errors. Actually, it detects errors in any
odd number of bits (including the parity bit), but it is a small comfort to know you are detecting
3-bit errors when you are missing 2-bit errors.
For bit-serial sending and receiving, the hardware to generate and check a single parity bit is very
simple. It consists of a single exclusive or gate together with some control circuitry. For bit-
parallel transmission, an exclusive or tree may be used, as illustrated in the figure. Efficient
ways to compute the parity bit in software are known as well.
Other techniques for computing a checksum are to form the exclusive or of all the bytes in the
message, or to compute a sum with end-around carry of all the bytes. In the latter method the
carry from each 8-bit sum is added into the least significant bit of the accumulator. It is believed
that this is more likely to detect errors than the simple exclusive or, or the sum of the bytes with
carry discarded.
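The end-around-carry byte sum mentioned above can be sketched as follows (our illustration, assuming 8-bit accumulation):

```python
def checksum_end_around(data: bytes) -> int:
    """8-bit sum with end-around carry: the carry out of each 8-bit
    addition is folded back into the least significant bit."""
    total = 0
    for byte in data:
        total += byte
        total = (total & 0xFF) + (total >> 8)   # add carry back in
    return total
```

Unlike a plain sum with the carry discarded, the wrapped-around carry lets high-order overflow still influence the checksum.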
A technique that is believed to be quite good in terms of error detection, and which is easy to
implement in hardware, is the cyclic redundancy check. This is another way to compute a
checksum, usually eight, 16, or 32 bits in length, that is appended to the message. We will briefly
review the theory and then give some algorithms for computing in software a commonly used
32-bit CRC checksum.
Theory:
The CRC is based on polynomial arithmetic, in particular, on computing the remainder of
dividing one polynomial in GF(2) (Galois field with two elements) by another. It is a little like
treating the message as a very large binary number and computing the remainder on dividing it
by a fairly large prime. Intuitively, one would expect this to give a reliable checksum.
Addition and subtraction are done modulo 2; that is, they are both the same as the exclusive or
operation. The multiplication of two coefficients is the same as their combination by the logical
and operator, and the partial products are summed using exclusive or. Multiplication is not
needed to compute the CRC checksum.
Division of polynomials over GF(2) can be done in much the same way as long division of
integers. The reader might like to verify that the quotient multiplied by the divisor, plus the
remainder, equals the dividend.
The CRC method treats the message as a polynomial in GF(2). For example, the message
11001001, where the order of transmission is from left to right (110...), is treated as a
representation of the polynomial x^7 + x^6 + x^3 + 1. The sender and receiver agree on a certain
fixed polynomial called the generator polynomial. For example, for a 16-bit CRC, the CCITT has
chosen the polynomial x^16 + x^12 + x^5 + 1, which is now widely used for a 16-bit CRC
checksum. To compute an r-bit CRC checksum, the generator polynomial must be of degree r.
The sender appends r 0-bits to the m-bit message and divides the resulting polynomial of degree
m + r − 1 by the generator polynomial. This produces a remainder polynomial of degree r − 1
(or less). The remainder polynomial has r coefficients, which are the checksum. The quotient
polynomial is discarded. The data transmitted (the code vector) is the original m-bit message
followed by the r-bit
checksum.
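The procedure just described can be sketched in Python (our own illustration; polynomials are represented as integers with bit i holding the coefficient of x^i):

```python
def poly_mod(dividend, divisor):
    """Remainder of polynomial division over GF(2)."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        # subtract (XOR) the divisor aligned under the leading term
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend

def crc_checksum(message, generator, r):
    """Append r 0-bits to the message and keep the r-bit remainder."""
    return poly_mod(message << r, generator)

# CCITT generator x^16 + x^12 + x^5 + 1, r = 16
G = (1 << 16) | (1 << 12) | (1 << 5) | 1
msg = 0b11001001                     # the 8-bit example message
chk = crc_checksum(msg, G, 16)
codeword = (msg << 16) | chk         # transmitted: message then checksum
```

Dividing the entire codeword by G leaves a zero remainder, which is the receiver-side check discussed next.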
There are two ways for the receiver to assess the correctness of the transmission. It can compute
the checksum from the first m bits of the received data, and verify that it agrees with the last r
received bits. Alternatively, and following usual practice, the receiver can divide all the received
bits by the generator polynomial and check that the r-bit remainder is 0. To see that the
remainder must be 0, let M be the polynomial representation of the message, and let R be the
polynomial representation of the remainder that was computed by the sender. Then the
transmitted data is Mx^r + R. By the way R was computed, we know that Mx^r = QG + R, where
G is the generator polynomial and Q is the quotient (that was discarded). Therefore the
transmitted data, Mx^r + R, is equal to QG + R + R = QG, which is clearly a multiple of G. If the
receiver is built as nearly as possible just like the sender, the receiver will append r 0-bits to the
received data as it computes the remainder R. But the received data with 0-bits appended is still
a multiple of G, so the computed remainder is still 0. This description glosses over some details,
such as the fact that the method as described is insensitive to the number of leading and trailing
0-bits in the data transmitted. In particular, if a failure occurred that caused the received data,
including the checksum, to be all 0-bits, it would go undetected.
Choosing a “good” generator polynomial is something of an art, and beyond the scope of this
text. Two simple observations: For an r-bit checksum, G should be of degree r, because
otherwise the first bit of the checksum would always be 0, which wastes a bit of the checksum.
Similarly, the last coefficient should be 1 (that is, G should not be divisible by x), because
otherwise the last bit of the checksum would always be 0 (because if G is divisible by x, then R
must be also). The following facts about generator polynomials are proved in [PeBr] and/or
[Tanen]:
• If G is not divisible by x (that is, if the last term is 1), and e is the least positive integer such
that G evenly divides x^e + 1, then all double errors that are within a frame of e bits are detected.
• An r-bit CRC checksum detects all burst errors of length r or less. (A burst error of length r is a
string of r bits in which the first and last are in error, and the intermediate bits may or may not be
in error.)
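The burst-error property in the second bullet can be checked exhaustively for a small generator; the sketch below (ours, using the CRC-5 polynomial x^5 + x^3 + 1 that appears later in this document) confirms that no burst of length r or less is a multiple of G, so every such burst changes the remainder and is detected:

```python
def poly_mod(a, g):
    """Remainder of polynomial division over GF(2) (ints as bit vectors)."""
    glen = g.bit_length()
    while a.bit_length() >= glen:
        a ^= g << (a.bit_length() - glen)
    return a

G = (1 << 5) | (1 << 3) | 1   # x^5 + x^3 + 1, degree r = 5
R = 5

undetected = 0
for length in range(1, R + 1):
    # enumerate every burst pattern: first and last bits are 1,
    # the bits in between take all possible values
    for mid in range(1 << max(0, length - 2)):
        burst = ((1 << (length - 1)) | (mid << 1) | 1) if length > 1 else 1
        for pos in range(64):              # slide the burst across a frame
            if poly_mod(burst << pos, G) == 0:
                undetected += 1            # burst would escape detection
```

`undetected` stays 0 because G has degree 5 and a nonzero constant term, so it cannot divide b(x)·x^i when deg b(x) < 5.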
It is interesting to note that if a code of any type can detect all double-bit and single-bit errors,
then it can in principle correct single-bit errors. To see this, suppose data containing a single-bit
error is received. Imagine complementing all the bits, one at a time. In all cases but one, this
results in a double-bit error, which is detected. But when the erroneous bit is complemented, the
data is error-free, which is recognized. In spite of this, the CRC method does not seem to be used
for single-bit error correction. Instead, the sender is requested to repeat the whole transmission if
an error is detected.
Practice:
Table 14–1 shows the generator polynomials used by some common CRC standards. The
“Hex” column shows the hexadecimal representation of the generator polynomial; the most
significant bit is omitted, as it is always 1.
The CRC standards differ in ways other than the choice of generating polynomial. Most initialize
by assuming that the message has been preceded by certain nonzero bits, others do no such
initialization. Most transmit the bits within a byte least significant bit first, some most significant
bit first. Most append the checksum least significant byte first, others most significant byte first.
CRC-12 is used for streams of 6-bit characters, and the 16-bit polynomials for 8-bit bytes of
arbitrary data. CRC-16 is used in IBM's BISYNCH communication protocol, and CRC-CCITT in
protocols such as XMODEM, X.25, IBM's SDLC, and ISO's HDLC [Tanen]. CRC-32 is also
known as AUTODIN-II and ITU-TSS (ITU-TSS has defined both 16- and 32-bit polynomials).
It is used in PKZip, Ethernet, AAL5 (ATM Adaptation Layer 5), FDDI (Fiber Distributed Data
Interface), the IEEE-802 LAN/MAN standard, and in some DOD applications. It is the one for
which a software algorithm is given below.
The first three polynomials in Table 14–1 have x + 1 as a factor. The last (CRC-32) does not.
To detect the error of erroneous insertion or deletion of leading 0’s, some protocols prepend one
or more nonzero bits to the message. These don’t actually get transmitted, they are simply used
to initialize the CRC register (described below) used in the CRC calculation. A value of r 1-bits
seems to be universally used. The receiver initializes its register in the same way.
The problem of trailing 0’s is a little more difficult. There would be no problem if the receiver
operated by comparing the remainder based on just the message bits to the checksum received.
But, it seems to be simpler for the receiver to calculate the remainder for all bits received
(message and checksum) plus r appended 0-bits. The remainder should be 0. But, with a 0
remainder, if the message has trailing 0-bits inserted or deleted, the remainder will still be 0, and
the alteration would go undetected.
The usual solution to this problem is for the sender to complement the checksum before
appending it. Because this makes the remainder calculated by the receiver nonzero (usually), the
remainder will change if trailing 0’s are inserted or deleted. How then does the receiver
recognize an error-free transmission? Using the “mod” notation for remainder, we know that the
remainder computed by the receiver is ((Mx^r + R')x^r) mod G = (Lx^r) mod G, where R' is the
complement of the checksum R and L is the polynomial with r 1-coefficients (this follows
because R' = R + L and (Mx^r + R)x^r mod G = 0). Thus the checksum calculated by the
receiver for an error-free transmission should be (Lx^r) mod G. This is a constant (for a given
G). For CRC-32 this polynomial, called the residual or residue, is hexadecimal C704DD7B.
Hardware:
To develop a hardware circuit for computing the CRC checksum, we reduce the polynomial
division process to its essentials. The process employs a shift register, which we denote by
CRC. This is of length r (the degree of G) bits, not r + 1 bits as you might expect. When the
subtractions (exclusive or's) are done, it is not necessary to represent the high-order bit,
because the high-order bits of G and the quantity it is being subtracted from are both 1. The
steps are as follows.
1) Initialize the CRC register to all 0-bits.
2) Get the first/next message bit m.
3) If the high-order bit of CRC is 1, shift CRC and m together left 1 position, and XOR the
result with the low-order r bits of G. Otherwise, just shift CRC and m left 1 position.
4) If there are more message bits, go back to step 2).
It might seem that the subtraction should be done first, and then the shift. It would be done
that way if the CRC register held the entire generator polynomial, which in bit form is r + 1
bits. Instead, the CRC register holds only the low-order r bits of G, so the shift is done first,
and then the exclusive or.
As an example, consider the generator G = x^3 + x + 1. As each message bit arrives, the
register is handled as described above: when the high-order bit of the CRC register is 0, the
message bit is simply shifted in; when it is 1, the register and message bit are shifted and the
result is XORed with 011, the low-order bits of G.
These steps can be implemented with the (simplified) circuit shown in the figure.
The three boxes in the figure represent the three bits of the CRC register. When a message
bit comes in, if the high-order bit (the x^2 box) is 0, simultaneously the message bit is shifted
into the x^0 box, the bit in x^0 is shifted to x^1, the bit in x^1 is shifted to x^2, and the bit in x^2 is
discarded. If the high-order bit of the CRC register is 1, then a 1 is present at the lower input
of each of the two exclusive or gates. When a message bit comes in, the same shifting takes
place but the three bits that wind up in the CRC register have been exclusive or’ed with
binary 011. When all the message bits have been processed, the CRC holds M mod G.
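The register algorithm above is easy to simulate; this Python sketch (our illustration, with the message bits chosen arbitrarily) mirrors the shift-then-XOR steps for G = x^3 + x + 1, whose low-order bits are 011:

```python
def crc_register(message_bits, g_low, r):
    """Bit-serial CRC register: shift CRC and the message bit together
    left one position, then XOR with the low-order r bits of G whenever
    the shifted-out (high-order) bit was 1."""
    mask = (1 << r) - 1
    crc = 0
    for m in message_bits:
        high = (crc >> (r - 1)) & 1
        crc = ((crc << 1) | m) & mask
        if high:
            crc ^= g_low
    return crc

bits = [1, 1, 1, 0, 0, 1, 1, 0]          # message M
reg = crc_register(bits, 0b011, 3)       # register now holds M mod G
```

Feeding three extra 0-bits afterward yields Mx^3 mod G, the checksum, which is exactly the extra step discussed next.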
If the circuit of Figure 14–2 were used for the CRC calculation, then after processing the
message, r (in this case 3) 0-bits would have to be fed in. Then the CRC register would hold
the desired checksum, Mx^r mod G. But there is a way to avoid this step with a simple
rearrangement of the circuit (feeding the incoming message bit into the high-order end). This
has the effect of premultiplying the input message M by x^r. But premultiplying and
postmultiplying are the same for polynomials. Therefore, as each message bit comes in, the
CRC register contents are the remainder of the portion of the message processed so far,
multiplied by x^r.
Software:
The figure shows a basic implementation of CRC-32 in software. The CRC-32 protocol
initializes the CRC register to all 1's, transmits each byte least significant bit first, and
complements the checksum. We assume the message consists of an integral number of bytes.
CHAPTER 3
PRELIMINARIES:
There are five popular PQC algorithm classes: code-based, hash-based, isogeny-based, lattice-
based, and multivariate-based cryptography. Code-based cryptography differs from others in that
its security relies on the hardness of decoding in a linear error-correcting code. Isogeny-based
cryptography is based on the hard problem of finding an isogeny between two given elliptic
curves. Multivariate cryptography is based on multivariate polynomials over a finite field. Such
cryptographic schemes use large field sizes to provide the needed security levels.
Luov is a multivariate public-key cryptosystem and an adaptation of the unbalanced oil and
vinegar (UOV) signature scheme, but with a restriction on the coefficients of the public key.
The scheme uses two finite fields: one is the binary field of two elements, whereas the other is
its extension of degree m; F2 is the binary field and F2^m is its extension.
Fig. 1. Finite-field multiplier with the proposed error-detection schemes based on CRC.
The coefficients of the public key are chosen from the base field F2, and its components
f1, ..., fo are in the form f_k(x) = Σ_{i=1}^{v} Σ_{j=i}^{n} α_{i,j,k} x_i x_j + Σ_{i=1}^{n} β_{i,k} x_i + γ_k.
These finite-field multiplications are very complex and require a large area footprint. Therefore, it
is a complex task to implement such architectures resilient to natural and malicious faults. The
aim of this work is to provide countermeasures against natural faults and fault injections for the
finite-field multipliers used in cryptosystems such as the Luov algorithm as a case study, noting
that the proposed error-detection schemes can be adapted to other applications and cryptographic
algorithms whose building blocks need finite-field multiplications. Readers who are interested in
more details about the Luov cryptographic algorithm are encouraged to refer to [12].
The multiplication of any two elements A and B of GF(2^m), following the approach in [16], can
be presented as A · B mod f(x) = Σ_{i=0}^{m−1} b_i · ((A · α^i) mod f(x)) = Σ_{i=0}^{m−1} b_i · X(i),
where the set of α^i's is the polynomial basis of element A, the set of b_i's is the B coefficients,
f(x) is the field polynomial, X(i) = α · X(i−1) mod f(x), and X(0) = A. To perform finite-field
multiplication, three different modules are needed: sum, α, and pass-thru modules. The sum
module adds two elements in GF(2^m) using m two-input XOR gates, the α module multiplies
an element of GF(2^m) by α and then reduces the result modulo f(x), and lastly, the pass-thru
module multiplies a GF(2^m) element by a GF(2) element. One finite-field multiplication uses a
total of m − 1 sum modules, m − 1 α modules, and m pass-thru modules to get the output. Fault
injection can occur in any of these modules, and formulations for parity signatures in GF(2^m)
are derived in [16]. Parity signatures provide an error flag (EF) on each module. The major
drawback of parity signatures is that their error coverage is approximately 50%; that is, if the
number of faults is even, the approach would not be able to detect the faults. This highly
motivates schemes with broader error coverage.
In this work, our aim is the derivation of error-detection schemes that provide a broader and
higher error coverage than parity signatures and explore the application of such schemes to the
Luov algorithm. Thus, we derive and apply CRC signatures [17] to the finite-field multipliers
used in the Luov algorithm. This is a step forward toward detecting natural and malicious
intelligent faults, especially, as discussed in this brief, by considering both primitive and
standardized CRCs with different fault-multiplicity coverage. CRC was first proposed in 1961
and it is based on the theory of cyclic error-correcting codes. To implement CRC, a generator
polynomial g(x) is required. The message serves as the dividend, the quotient is discarded, and
the remainder is the result. In CRC, a fixed number of check bits are appended to the data,
and these check bits are inspected when the output is received to detect any errors. The entire
finite-field multiplier with our error-detection schemes is shown in Fig. 1, where actual CRC
(ACRC) and predicted CRC (PCRC) stand for ACRC signatures and PCRC signatures,
respectively. In Fig. 1, only one EF is shown for clarity; however, for CRC-5, which is the case
study proposed in this brief, five EFs are computed on each module. In Fig. 2, the α module is
shown in more depth to clarify how the proposed CRC signatures work in each finite-field
multiplier.
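To make the decomposition concrete, here is a small Python model (our illustration; it uses the toy field GF(2^4) with f(x) = x^4 + x + 1 instead of the GF(2^16) field of the case study) of the sum, α, and pass-thru modules, together with a CRC-5 signature using g0(x) = x^5 + x^3 + 1:

```python
M = 4                 # toy field GF(2^M); the paper's case study uses M = 16
F_LOW = 0b0011        # low-order bits of f(x) = x^4 + x + 1

def alpha_module(x):
    """Multiply by alpha (i.e., by x) and reduce modulo f(x)."""
    high = (x >> (M - 1)) & 1
    x = (x << 1) & ((1 << M) - 1)
    return x ^ (F_LOW if high else 0)

def sum_module(x, y):
    """Add two field elements: m two-input XOR gates."""
    return x ^ y

def pass_thru_module(b, x):
    """Multiply a GF(2^m) element by a GF(2) element b."""
    return x if b else 0

def ff_multiply(a, b):
    """A*B mod f(x) = sum_i b_i * X(i), X(i) = alpha*X(i-1), X(0) = A."""
    xi, acc = a, 0
    for i in range(M):
        acc = sum_module(acc, pass_thru_module((b >> i) & 1, xi))
        xi = alpha_module(xi)
    return acc

def crc5(x):
    """Remainder modulo g0(x) = x^5 + x^3 + 1: the CRC-5 signature."""
    g = 0b101001
    while x.bit_length() >= 6:
        x ^= g << (x.bit_length() - 6)
    return x
```

Because the remainder is a linear function, crc5(a ^ b) equals crc5(a) ^ crc5(b), which is how a predicted signature for a sum module can be formed directly from the input signatures.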
For the sum and pass-thru modules, we follow the approach used for parity signatures, described
in [16]. For the sum module in CRC-1, p̂X is equal to the sum of the parity bits of the input
elements A and B in GF(2^m): p̂X = pA + pB. Furthermore, for the pass-thru module in CRC-1,
p̂X = b · pA, where b is an element in GF(2). For any other CRC-n scheme, instead of summing
all the bits, n bits are checked at a time in the sum and pass-thru modules. For the α module, we
have
X(x) = A(x) · x mod f(x), (1)
for which a set of derivations is needed to implement CRC-n. In Table I, the generator
polynomials used to derive the CRC-5 signatures are shown. The generator polynomial g0(x) is
one of the standards used for radio frequency identification [18]. The other three generator
polynomials g1(x), g2(x), and g3(x) are primitive polynomials. The benefit of using a primitive
polynomial as the generator is that the resulting code has full total block length, which means
that all single-bit errors within that block length leave distinct remainders. Moreover, since the
remainder is a linear function of the block, all 2-bit errors within that block length can be
identified. For the α module of the Luov finite-field multipliers, g0(x)
= x^5 + x^3 + 1 is used as the standardized generator polynomial for CRC-5. To find its CRC
signatures, according to (1), we obtain A(x) · x = a15 · x^16 + a14 · x^15 + ··· + a1 · x^2 + a0 · x.
Then, the modular reduction with the field polynomial f(x) is applied.
To calculate the ACRC-5 for GF(2^16) in the α module, we rename the coefficients of the result;
the predicted and actual parity groups are given in (4) and (6), respectively. These parity groups
are XORed with each other to determine if there has been any fault, for example, a bit flip,
during the α module operation. In total, each α module
outputs five EFs. Fig. 2 shows the implementation of the α module with the proposed error-
detection schemes. A(x) is the input, with the form A(x) = a_{m−1}x^{m−1} + ··· + a1x + a0,
which goes to two different modules that run in parallel. In the α module, (1) takes place. The
output
from the α module is divided into five groups in the ACRC module, which are denoted as
x1a–x5a in Fig. 2. Meanwhile, A(x) is also divided into five groups in the PCRC module,
which are denoted as x1p–x5p. Once the two CRC modules are done, each group is XORed
TABLE II (implementation results on device xcku5p-ffvd900-1-i)
with its respective one to produce five EFs, which are represented as EF1–EF5. As an example,
to obtain EF1, x1p (or a15 + a13 + a12 + a10 + a9 + a8 + a6 + a4 for g0(x)) is XORed with x1a
(or γ14 + γ13 + γ11 + γ10 + γ9 + γ7 + γ5 + γ0 for g0(x)), which are calculated in (4) and (6),
respectively. For our case study, the outputs are divided into five groups since we use CRC-5;
however, if any other CRC-n is used, there will be n EFs and the actual and predicted outputs
will be divided into n groups. In Table I, the CRC signatures for the different primitive
polynomials are shown. We note that the choice of the utilized CRC can be tailored based on the
reliability requirements and the overhead to be tolerated. In other words, for applications such as
game consoles, in which performance is critical (and power consumption is not, because these
are plugged in), one can increase the size of the CRC. However, for deeply embedded systems
such as implantable medical devices, a smaller CRC with lower overhead may be preferable.
Finite-field multiplication is a costly operation and requires a large footprint. We implement
Luov's polynomial generation to show that the proposed error-detection schemes provide high
error coverage with acceptable overhead. The polynomial has the form p(x) =
a_{m−1}x^{m−1} + ··· + a1x + a0, which requires m − 1 finite-field multiplications and m − 1 XOR
operations. As pointed out before, each finite-field multiplication uses three different modules
called α, sum, and pass-thru modules. A total of m − 1 α modules, m − 1 sum modules, and m
pass-thru modules are needed to perform each finite-field multiplication. Moreover, a total of
m − 1 sum modules are needed to perform the m − 1 XOR operations. For each architecture, the
error coverage is calculated as 100 · (1 − (1/2)^sign)%, where sign denotes the number of
signatures.
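The coverage formula and the signature count for the case study can be reproduced directly (a numerical sketch of the paper's accounting, not additional results):

```python
def error_coverage(sign):
    """100 * (1 - (1/2)^sign) percent for `sign` signature bits: each
    extra error flag halves the chance a random error goes undetected."""
    return 100.0 * (1.0 - 0.5 ** sign)

one_module = error_coverage(5)            # one CRC-5 module: 5 EFs
# Luov polynomial generation over GF(2^16): 14 multiplications, each with
# 15 alpha + 15 sum + 16 pass-thru signed modules, plus 15 XOR operations
signatures = 14 * (15 + 15 + 16) + 15     # total signature count
```

A single CRC-5 module already detects about 96.9% of random faults, and the 659 signatures of the full design push the coverage to effectively 100%.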
Luov uses the finite field GF(2^16), that is, m = 16. Implementing its polynomial in the form
p(x) = a15x^15 + ··· + a1x + a0 requires 14 finite-field multiplications and 15 XOR operations.
Since each multiplication uses 15 α modules, 15 sum modules, and 16 pass-thru modules,
14 × 15 α modules, 14 × 15 sum modules, and 14 × 16 pass-thru modules are needed. In total,
14 multiplications × (15 α + 15 sum + 16 pass-thru) + 15 XOR, or 659, signatures are
implemented. The error coverage percentage for the generation of Luov's polynomial using the
finite field GF(2^16) is 100 · (1 − (1/2)^659)%. In Table II, we present the delay, power
consumption (at a frequency of 50 MHz), throughput, and efficiency for the original and the
proposed architectures.
We utilize Xilinx FPGA family Kintex Ultrascale+ for device xcku5p-ffvd900-1-i, using Verilog
as the hardware design entry and Vivado as the tool for the implementations. As shown in Table
II, when CRC signatures are applied to the original architecture, the higher error coverage comes
at the cost of overheads in area, delay, and power, and of degradations in throughput and
efficiency. CLBs, which are the main resources for implementing general-purpose combinational
and sequential circuits, are read from Vivado's place-utilization report to
obtain the area. To determine the delay, we use the Timing Constraints Wizard function in
Vivado, setting a primary clock-period constraint of 20 ns, which corresponds to a frequency of 50
MHz. We also report the total on-chip power, which is the power consumed internally within the
FPGA, obtained by adding the device static power and the design power. Throughput is obtained
by dividing the total number of output bits by the delay, and efficiency is obtained by dividing
throughput by area. As seen in this table, acceptable overheads are obtained, with efficiency
degradations of at most 19%. The error-detection architecture that uses the primitive generator
polynomial g2(x) has the least amount of area overhead with 9.17%; however, the error-
detection implementation using g0(x), or the standardized generator polynomial for CRC-5,
performs the fastest, obtaining the least amount of delay overhead with 3.71%.
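The throughput, efficiency, and overhead definitions above are easy to mechanize. The sketch below recomputes the overhead percentages from a baseline and a protected design; the numeric values are hypothetical placeholders, not the Table II measurements:

```python
def overhead_percent(baseline, protected):
    """Relative increase of `protected` over `baseline`, in percent."""
    return 100.0 * (protected - baseline) / baseline

def throughput_bits_per_s(output_bits, delay_ns):
    """Throughput = total number of output bits divided by the delay."""
    return output_bits / (delay_ns * 1e-9)

def efficiency(throughput, area_clbs):
    """Efficiency = throughput divided by area (here, CLB count)."""
    return throughput / area_clbs

# Hypothetical example values (NOT the Table II figures):
base_area, base_delay = 1000, 10.0   # CLBs, ns
prot_area, prot_delay = 1100, 10.5   # CLBs, ns
output_bits = 16

base_eff = efficiency(throughput_bits_per_s(output_bits, base_delay), base_area)
prot_eff = efficiency(throughput_bits_per_s(output_bits, prot_delay), prot_area)

print(round(overhead_percent(base_area, prot_area), 2))    # area overhead: 10.0
print(round(overhead_percent(base_delay, prot_delay), 2))  # delay overhead: 5.0
print(round(-overhead_percent(base_eff, prot_eff), 2))     # efficiency degradation, %
```

Note that efficiency degrades faster than either input overhead alone, since the area and delay penalties compound in the throughput/area ratio.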
To the best of our knowledge, there has been no prior work on this type of error-detection
method for Luov's finite-field multipliers. For a qualitative comparison to verify that the
incurred overheads are acceptable, let us go over some case studies. Subramanian et al. [19]
presented a signature-based fault diagnosis for the cryptographic block ciphers LED and HIGHT,
obtaining combined area and delay overheads of 21.9% and 31.9% for LED and HIGHT,
respectively. Additionally, Mozaffari-Kermani et al. [6] presented a fault diagnosis of the
Pomaranch cipher, obtaining a combined area and throughput overhead of 35.5%. The proposed
schemes in this brief have combined area and delay overheads of less than 32% (worst-case
scenario). In [7], the worst-case area overhead obtained by applying error-detection schemes of
The worst-case area overhead of [8] and [9] is more than 33%, with a performance degradation
of more than 14%, when fault-detection architectures are applied to stateless hash-based
signatures. These and similar prior works on classical cryptography confirm that the proposed
error-detection architectures incur overheads similar to those of other fault-detection works,
i.e., acceptable overhead. These degradations are an acceptable price for providing error
detection to the original architectures, which lack any capability to thwart natural or
malicious faults.
CHAPTER 4
XILINX Software
Xilinx Tools is a suite of software tools used for the design of digital circuits implemented using a Xilinx Field-
Programmable Gate Array (FPGA) or Complex Programmable Logic Device (CPLD). The design
procedure consists of (a) design entry, (b) synthesis and implementation of the design, (c) functional simulation,
and (d) testing and verification. Digital designs can be entered in various ways using the above CAD tools:
using a schematic entry tool, using a hardware description language (HDL) such as Verilog or VHDL, or a
combination of both. In this lab we will only use the design flow that involves Verilog HDL.
The CAD tools enable you to design combinational and sequential circuits starting with Verilog HDL design
specifications. The steps of this design procedure are described below.
A Verilog input file in the Xilinx software environment consists of several segments, the last of
which is the endmodule statement. All your designs for this lab must be specified in this Verilog
input format. Note that the state-diagram segment does not exist for combinational logic designs.
In this lab, digital designs will be implemented on the Basys2 board, which has a Xilinx Spartan-3E
XC3S250E FPGA in a CP132 package. This FPGA part belongs to the Spartan family of FPGAs. These
devices come in a variety of packages; we will be using devices packaged in a 132-pin package with the
following part number: XC3S250E-CP132. This FPGA is a device with about 50K gates. Detailed information
on this device is available at the Xilinx website.
3. Creating a New Project
Xilinx Tools can be started by clicking on the Project Navigator Icon on the Windows desktop. This should
open up the Project Navigator window on your screen. This window shows (see Figure 1) the last accessed
project.
Figure 1: Xilinx Project Navigator window (snapshot from Xilinx ISE software)
Select File->New Project to create a new project. This will bring up a new project window (Figure 2) on the
desktop. Fill up the necessary entries as follows:
Figure 2: New Project Initiation window (snapshot from Xilinx ISE software)
Project Location: The directory where you want to store the new project. (Note: do NOT
specify the project location as a folder on the Desktop or a folder in the Xilinx\bin
directory; your H: drive is the best place to put it. The project location path must NOT
contain any spaces, e.g., C:\Nivash\TA\new lab\sample exercises\o_gate is not to be
used.)
Example: If the project name were “o_gate”, enter “o_gate” as the project name and then click “Next”.
Clicking on Next should bring up the following window:
Figure 3: Device and Design Flow of Project (snapshot from Xilinx ISE software)
For each of the properties given below, click on the ‘value’ area and select from the list of values that
appear.
All project files such as schematics, netlists, Verilog files, VHDL files, etc., will be stored in a subdirectory
with the project name. A project can only have one top level HDL source file (or schematic). Modules can be
added to the project to create a modular, hierarchical design (see Section 9).
In order to open an existing project in Xilinx Tools, select File->Open Project to show the list of projects on
the machine. Choose the project you want and click OK.
Figure 4: Create New source window (snapshot from Xilinx ISE software)
In this lab we will enter a design using a structural or RTL description using the Verilog HDL. You can create a
Verilog HDL input file (.v file) using the HDL Editor available in the Xilinx ISE Tools (or any text editor).
Select Verilog Module and in the “File Name:” area, enter the name of the Verilog source file you are going to
create. Also make sure that the option Add to project is selected so that the source need not be added to the
project again. Then click on Next to accept the entries. This pops up the following window (Figure 5).
Figure 6: Define Verilog Source window (snapshot from Xilinx ISE software)
In the Port Name column, enter the names of all input and output pins and specify the Direction
accordingly. A Vector/Bus can be defined by entering appropriate bit numbers in the MSB/LSB
columns. Then click on Next> to get a window showing all the new source information (Figure
6). If any changes are to be made, just click on <Back to go back and make changes. If
everything is acceptable, click on Finish > Next > Next > Finish to continue.
If a source has to be removed, just right-click on the source file in the Sources in Project
window in the Project Navigator and select Remove. Then select Project -> Delete
Implementation Data from the Project Navigator menu bar to remove any related files.
The source file will now be displayed in the Project Navigator window (Figure 8). The source
file window can be used as a text editor to make any necessary changes to the source file. All
the input/output pins will be displayed. Save your Verilog program periodically by selecting
File->Save from the menu. You can also edit Verilog programs in any text editor and add them
to the project.
Figure 8: Verilog Source code editor window in the Project Navigator (from Xilinx ISE
software)
A brief Verilog tutorial is available in Appendix-A; refer to it for the language syntax and the
construction of logic equations.
The generated Verilog source code template shows the module name, the list of ports, and the
declarations (input/output) for each port. Combinational logic code can be added to the Verilog
code after the declarations and before the endmodule line.
For example, the output z of an OR gate with inputs a and b can be described as:
assign z = a | b;
Remember that the names are case sensitive.
A given logic function can be modeled in many ways in Verilog. Here is another
example in which the logic function is implemented as a truth table using a case statement:

module or_gate(a, b, z);
  input a;
  input b;
  output z;
  reg z;

  always @(a or b)
  begin
    case ({a, b})
      2'b00: z = 1'b0;
      2'b01: z = 1'b1;
      2'b10: z = 1'b1;
      2'b11: z = 1'b1;
    endcase
  end
endmodule
Suppose we want to describe an OR gate. It can be done using the logic equation as shown in Figure 9a or
using the case statement (describing the truth table) as shown in Figure 9b. These are just two example
constructs to design a logic function. Verilog offers numerous such constructs to efficiently model designs. A
brief tutorial of Verilog is available in Appendix-A.
Figure 9: OR gate description using assign statement (snapshot from Xilinx ISE
software)
Figure 10: OR gate description using case statement (from Xilinx ISE software)
The design has to be synthesized and implemented before it can be checked for correctness by running a
functional simulation or by downloading it onto the prototyping board. With the top-level Verilog file open
(this can be done by double-clicking the file) in the HDL editor window in the right half of the Project
Navigator, and with the project in Module view, the Implement Design option can be seen in the Processes
view. The Design Entry Utilities and Generate Programming File options can also be seen there; the former
can be used to include user constraints, if any, and the latter will be discussed later.
To synthesize the design, double-click on the Synthesize Design option in the Processes
window.
To implement the design, double-click the Implement Design option in the Processes window. It will go
through steps like Translate, Map, and Place & Route. If any of these steps fails or completes with
errors, an X mark is placed in front of it; otherwise a tick mark is placed after each step to indicate
successful completion. If everything completes successfully, a tick mark is placed before the Implement
Design option. If there are warnings, a warning mark appears in front of the option.
One can look at the warnings or errors in the Console window present at the bottom of the
Navigator window. Every time the design file is saved, all these marks disappear, calling for a
fresh compilation.
Figure 11: Implementing the Design (snapshot from Xilinx ISE software)
The schematic diagram of the synthesized Verilog code can be viewed by double-clicking View RTL Schematic
under the Synthesize-XST menu in the Processes window. This is a handy way to debug the code if the
output does not meet the specifications on the prototype board.
Double-clicking it opens the top-level module, showing only the input(s) and output(s), as shown below.
Figure 13: Realized logic by the XilinxISE for the verilog code
To check the functionality of a design, we have to apply test vectors and simulate the circuit. In order to
apply test vectors, a test bench file is written. Essentially, it supplies all the inputs to the designed module
and checks its outputs. Example: for the 2-input OR gate, the steps to generate the test
bench are as follows:
In the Sources window (top left corner), right-click on the file that you want to generate
the test bench for and select 'New Source'.
Provide a name for the test bench in the file name text box and select 'Verilog Test Fixture' among the
file types in the list on the right side, as shown in Figure 14.
Figure 14: Adding test vectors to the design (snapshot from Xilinx ISE software)
Click on ‘Next’ to proceed. In the next window select the source file with which you want to associate the
test bench.
Figure 15: Associating a module to a testbench (snapshot from Xilinx ISE software)
Click on Next to proceed. In the next window, click on Finish. You will now be provided with a template
for your test bench. If it does not open automatically, click the radio button next to Simulation.
You should now be able to view your test bench template. The code generated would be something like this:
module o_gate_tb_v;
  // Inputs
  reg a;
  reg b;

  // Outputs
  wire z;

  // Instantiate the Unit Under Test (UUT)
  o_gate uut (
    .a(a),
    .b(b),
    .z(z)
  );

  initial begin
    // Initialize Inputs
    a = 0;
    b = 0;
  end
endmodule
The Xilinx tool detects the inputs and outputs of the module that you are going to test and assigns them initial
values. In order to test the gate completely, we shall provide all the different input combinations. '#100' is the
time delay for which the input has to maintain its current value. After 100 units of time have elapsed, the next
set of values can be assigned to the inputs.
Complete the test bench as shown below:
module o_gate_tb_v;
  // Inputs
  reg a;
  reg b;

  // Outputs
  wire z;

  // Instantiate the Unit Under Test (UUT)
  o_gate uut (
    .a(a),
    .b(b),
    .z(z)
  );

  initial begin
    // Wait 100 ns for global reset to finish
    #100;
    // Initialize Inputs
    a = 0; b = 0;
    #100;
    a = 0; b = 1;
    #100;
    a = 1; b = 0;
    #100;
    a = 1; b = 1;
    #100;
  end
endmodule
Now under the Processes window (making sure that the testbench file in the Sources
window is selected), expand the ModelSim Simulator tab by clicking on the plus sign next
to it. Double-click on Simulate Behavioral Model. You will probably receive a compiler
error. This is nothing to worry about; answer "No" when asked if you wish to abort the simulation.
This should cause ModelSim to open. Wait for it to complete execution. If you wish not to receive
the compiler error, right-click on Simulate Behavioral Model and select Process Properties. Mark
the
To save the simulation results, go to the waveform window of the ModelSim simulator, click on File -> Print
to Postscript, and give the desired filename and location.
Alternatively, a normal print-screen capture of the waveform window can be used and subsequently stored in Paint.
In this work, we have derived error-detection schemes for the finite-field multipliers used in
postquantum cryptographic algorithms such as Luov, noting that the proposed error-detection
schemes can be adapted to other applications and cryptographic algorithms whose building
blocks need finite-field multiplications. The error-detection architectures proposed in this work
are based on CRC-5 signatures, and we have performed software implementations for the sake of
verification. Additionally, we have explored and studied both primitive and standardized
generator polynomials for CRC-5, comparing the complexity of each. We have
embedded the proposed error-detection schemes into the original finite-field multipliers of the