Fast Modular Multipliers For Supersingular Isogeny-Based Post-Quantum Cryptography
Abstract— As one of the post-quantum protocol candidates, the supersingular isogeny key encapsulation (SIKE) protocol offers more promising public and secret key sizes than the other candidates. Nevertheless, its considerable computational cost is the bottleneck that limits its practical applications. Modular multiplications account for a large proportion of the overall computation required by the SIKE protocol, and the VLSI implementation of high-speed modular multipliers remains a major challenge. In this article, we propose three improved modular multiplication algorithms for this protocol based on an unconventional radix, all of which require about 20% fewer computations than the prior art. In addition, a multiprecision scheme is introduced for the proposed algorithms to improve scalability in hardware implementation, resulting in three further algorithms. We then present very efficient high-speed constant-time modular multiplier architectures for all six algorithms and show that these architectures can be extensively pipelined and highly optimized to obtain high throughput and low latency. The field-programmable gate array (FPGA) implementation results show that all the proposed multipliers achieve much higher throughput than previous designs with a relatively small increase in resources. In addition, the multipliers without the multiprecision scheme have very low latency, which makes them well suited to high-speed applications of the SIKE protocol.

Index Terms— Field-programmable gate array (FPGA), modular multiplication, post-quantum cryptography (PQC), supersingular isogeny Diffie–Hellman (SIDH) key exchange, VLSI.

I. INTRODUCTION

post-quantum cryptography (PQC) protocols. For example, the call for proposals for PQC standards hosted by the National Institute of Standards and Technology (NIST) [6] is driven by this demand.

The supersingular isogeny key encapsulation (SIKE) protocol [7] was submitted to NIST in November 2017 and advanced to the third round of the PQC standardization process in July 2020. A likely reason is that it is the only candidate that, like classical ECC, has very small public and secret keys and provides perfect forward secrecy. Recently, [8] and [9] showed that the security estimates in the SIKE proposal were extremely conservative in both the quantum and classical settings. In other words, smaller key sizes can be used for the security levels presented in [7].

SIKE is a variant of the supersingular isogeny Diffie–Hellman (SIDH) key exchange protocol, combined with a key encapsulation mechanism [10] to achieve indistinguishability under chosen-ciphertext attack (IND-CCA) [11]. SIDH was first introduced by Jao and De Feo [12] in 2011 to resist quantum attacks, based on the difficulty of finding isogenies between supersingular elliptic curves. A zero-knowledge identification scheme based on this protocol was proposed in [13]. Jao and Soukharev [14] presented undeniable signatures based on SIDH. Azarderakhsh et al. [15] provided a key compression method that greatly reduces the cost
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY MADRAS. Downloaded on November 14,2024 at 09:57:47 UTC from IEEE Xplore. Restrictions apply.
360 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 29, NO. 2, FEBRUARY 2021
the original ARM design. Jalali et al. implemented optimized field arithmetic on ARM for SIKE [20] and for commutative SIDH (CSIDH) [21]. Indeed, much progress has been made to speed up the SIKE protocol and make it more practical. However, SIKE implementations remain more than an order of magnitude slower than those of most other candidates.

The smooth isogeny modulus for SIDH has the form $p = f \cdot a^x b^y \pm 1$, where $a$ and $b$ are small primes, $x$ and $y$ are positive integers, and $f$ is a small cofactor chosen to make $p$ prime. To simplify the modular operations, especially modular multiplication, $a$ is usually set to 2, so that $p = f \cdot 2^x b^y \pm 1$. The EFFM algorithm of [23] restricts $p$ to the form $2 \cdot 2^x b^y - 1 = 2R^2 - 1$, where $x$ and $y$ must be even and $R = 2^{x/2} b^{y/2}$. The input operands are transformed into quadratic polynomials in the unconventional radix $R$. Compared with the conventional Montgomery modular reduction algorithm [24], this method roughly halves the multiplication complexity at the cost of some extra additions. FFM1 [18] reduces the number of coefficients in EFFM from three to two by applying an extra mapping function to the input and output, which eliminates the precomputed constant without increasing the complexity. FFM2 [18] widens the prime search space to the form $f \cdot 2^x b^y \pm 1$ at the expense of more computation. A larger search space makes it more likely that a good prime can be found, which in turn can improve the efficiency of the algorithm. It is therefore important to develop efficient modular multiplication algorithms with loose constraints on the prime.

In this article, we propose three new modular multiplication algorithms for different forms of prime, based on the unconventional radix adopted in [18], [23], and [25]; all of them have lower computational complexity than previous algorithms. We aim to extend the prime form used in [23] to $f \cdot 2^x b^y \pm 1$, where $f \in \{1, 2\}$ and $x$ and $y$ are even. These primes fall into three forms, $2 \cdot 2^x b^y - 1$, $2 \cdot 2^x b^y + 1$, and $2^x b^y + 1$, with $x$ and $y$ even, and the corresponding new algorithms are named IFFM−, IFFMo+, and IFFMe+, respectively. We use $R = 2^{x/2} b^{y/2}$ as the unconventional radix for the proposed algorithms. For the IFFM− algorithm, the radix is used almost as before, as preliminarily presented in our conference paper [26]. For IFFMo+ and IFFMe+, we use the radix $R = 2^{x/2} b^{y/2}$ to reduce the complexity by expanding, for the first time, the range of the constant coefficient of a quadratic polynomial. A detailed discussion can be found in Section III-B. The reduction and multiplication steps of the three proposed algorithms are optimized, reducing the computational complexity by about 20%. Note that, although the new modular multiplication algorithms are very efficient, they do not apply to all the primes in the SIKE proposal; in fact, only SIKEp610 can use them. Appendix B reports a brute-force search for good replacements for the current SIKE primes.

We also develop multiprecision versions of these algorithms with an interleaving scheme, yielding three further algorithms that improve scalability and reduce resource consumption in hardware implementation.

Moreover, we have devised new constant-time architectures for the proposed algorithms with fully parallel and interleaved schedules that minimize the required clock cycles and allow each submodule to be highly optimized. We have also coded the proposed architectures in Verilog and implemented them on FPGA. The implementation results show that, compared with existing modular multipliers based on the unconventional radix, the designs without the multiprecision scheme achieve about 60 times the throughput of the fastest prior design while requiring relatively little extra hardware. With the multiprecision scheme, the designs substantially reduce total resource consumption at the cost of lower throughput than their original versions, while still outperforming prior art. Compared with the eightfold high-radix Montgomery modular multipliers [11], the redesigned IFFM− multiplier runs at a comparable frequency yet achieves more than an order of magnitude higher throughput.

The rest of this article is organized as follows. Section II briefly reviews general modular multiplication algorithms and efficient modular multiplication algorithms for primes of a specific form. Section III presents the three new algorithms, their complexity comparisons with previous works, and their multiprecision versions. Section IV devises the architectures for all the proposed algorithms. Section V reports the FPGA implementation results, and Section VI concludes this article. The proofs of the parameters used in the proposed algorithms and the prime search are appended after the conclusion.

II. BACKGROUND

Multiplication in elliptic curve cryptography is performed over finite fields; this modular multiplication requires a modular reduction after the integer multiplication. In the following, we first introduce the Montgomery reduction, the Barrett reduction (BR), and the efficient BR for the modulus $2^x b^y$. Then, several efficient modular multiplication algorithms for SIDH are presented.

A. Modular Reduction Algorithms

1) Montgomery Reduction: The main idea of the Montgomery reduction [24] is to replace reduction modulo the ordinary modulus by operations modulo a power of two, which are inexpensive in hardware. The detailed process is shown in Algorithm 1. The modulus $p$ can be any odd number less than $R = 2^N$ (oddness guarantees that $p$ is invertible modulo $R$). The parameter $(-p^{-1}) \bmod R$ is precomputed and stored. Because reducing an integer modulo $R$ only requires keeping its low $N$ bits, such operations are not counted in the following evaluation. The complexity therefore depends only on the bit width of $p$: the algorithm requires two $N \times N$ multiplications, one $2N + 2N$ addition, and one $N + N$ addition. Note that the output remainder is
TIAN et al.: FAST MODULAR MULTIPLIERS FOR SUPERSINGULAR ISOGENY-BASED POST-QUANTUM CRYPTOGRAPHY 361
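The Montgomery reduction of Algorithm 1, described above, can be sketched in Python as textbook REDC. This is an illustration under stated assumptions rather than the hardware algorithm: the names `montgomery_setup`, `redc`, and `mont_mul` are hypothetical, $R = 2^N$ with $p$ odd and $p < R$, and `pow(p, -1, R)` needs Python 3.8 or later. The two products and the final conditional subtraction correspond to the operation count given in the text.

```python
def montgomery_setup(p: int, n_bits: int):
    """Return (R, p_prime) with R = 2**n_bits and p_prime = (-p^{-1}) mod R."""
    if p % 2 == 0 or not 0 < p < (1 << n_bits):
        raise ValueError("p must be odd and satisfy 0 < p < R = 2**n_bits")
    R = 1 << n_bits
    p_prime = (-pow(p, -1, R)) % R   # precomputed once and stored
    return R, p_prime


def redc(T: int, p: int, n_bits: int, p_prime: int) -> int:
    """Montgomery reduction: return T * R^{-1} mod p for 0 <= T < p * R."""
    mask = (1 << n_bits) - 1           # "mod R" only keeps the low N bits
    m = ((T & mask) * p_prime) & mask  # first N x N multiplication (low half kept)
    t = (T + m * p) >> n_bits          # second N x N product, 2N-bit addition, exact shift by N
    if t >= p:                         # t < 2p, so one conditional subtraction suffices
        t -= p
    return t


def mont_mul(a_bar: int, b_bar: int, p: int, n_bits: int, p_prime: int) -> int:
    """Multiply two operands already in Montgomery form (x_bar = x * R mod p)."""
    return redc(a_bar * b_bar, p, n_bits, p_prime)


if __name__ == "__main__":
    # Example modulus with the SIDH shape 2^x * 3^y - 1 (the SIKEp434 prime).
    p = 2**216 * 3**137 - 1
    n = p.bit_length()
    R, p_prime = montgomery_setup(p, n)
    a, b = 1234567891011121314, 9876543210987654321
    a_bar, b_bar = a * R % p, b * R % p   # convert into Montgomery form
    c_bar = mont_mul(a_bar, b_bar, p, n, p_prime)
    c = redc(c_bar, p, n, p_prime)        # convert back out of Montgomery form
    print(c == a * b % p)                 # True
```

Converting into and out of Montgomery form costs extra multiplications, which is why, in practice, operands stay in Montgomery form across a long chain of field operations.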
Algorithm 2: BR [31]
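The Background lists the Barrett reduction (BR) [31] among the reduction methods, and the appendix uses an IBR function to split a coefficient into a quotient and a remainder by $R$. The Python sketch below shows the textbook base-2 Barrett division, assuming a modulus $m$ of $k$ bits and an input $0 \le x < 4^k$. It is not the paper's BR variant or IBR, and the names `barrett_setup` and `barrett_divmod` are hypothetical. At run time it needs only shifts, one multiplication by the precomputed $\mu = \lfloor 4^k / m \rfloor$, one multiplication by $m$, and at most two corrective subtractions.

```python
def barrett_setup(m: int):
    """Precompute k = bit length of m and mu = floor(4**k / m)."""
    if m < 1:
        raise ValueError("modulus must be positive")
    k = m.bit_length()
    mu = (1 << (2 * k)) // m
    return k, mu


def barrett_divmod(x: int, m: int, k: int, mu: int):
    """Return (x // m, x % m) for 0 <= x < 4**k without a run-time division by m."""
    if not 0 <= x < (1 << (2 * k)):
        raise ValueError("x must satisfy 0 <= x < 4**k")
    q = ((x >> (k - 1)) * mu) >> (k + 1)  # quotient estimate, never too large
    r = x - q * m
    while r >= m:                         # the estimate is low by at most 2
        r -= m
        q += 1
    return q, r


if __name__ == "__main__":
    R = 2**4 * 3**3                       # 432, a toy radix of the form 2^x * 3^y
    print(barrett_divmod(100000, R, *barrett_setup(R)))  # (231, 208)
```

The precondition $x < 4^k$ matters: inputs wider than $2k$ bits need a larger precomputed constant or an extra reduction step.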
Algorithm 4: EFFM Modular Multiplication Proposed in [23]
the multiplication and reduction. The key idea is to replace a large modulus $p$ with a relatively small modulus $R$. The two parameters satisfy $p = 2 \cdot 2^{2x} b^{2y} - 1$ and $R = 2^x b^y$, that is, $p = 2R^2 - 1$. An input $A$, which is a field element in $\mathbb{F}_p$, is expressed as the quadratic polynomial
$$A = a_2 R^2 + a_1 R + a_0 \qquad (1)$$
Algorithm 5: IBR for Hardware Efficiency
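To make equation (1) concrete, the Python sketch below splits field elements into radix-$R$ coefficients and reduces their product using only the identity $2R^2 \equiv 1 \pmod p$, which follows from $p = 2R^2 - 1$. Writing the product as $C = \sum_{i=0}^{4} c_i R^i$, that identity gives $4C \equiv (4c_0 + 2c_2 + c_4) + (4c_1 + 2c_3)R \pmod p$, so folding the high-degree terms needs only shifts and additions. This illustrates the idea under these assumptions and is not EFFM or FFM1 themselves: the function names are hypothetical, and the leftover factor $4^{-1} \bmod p$ is simply applied with Python's `pow` (3.8+), whereas the published algorithms handle such constants through precomputation (EFFM) or an input/output mapping (FFM1 [18]).

```python
def to_radix_R(A: int, R: int):
    """Split A into (a2, a1, a0) with A = a2*R**2 + a1*R + a0 and 0 <= a1, a0 < R."""
    a0 = A % R
    a1 = (A // R) % R
    a2 = A // (R * R)   # 0 or 1 whenever A < p = 2R^2 - 1
    return a2, a1, a0


def fold_mul(A: int, B: int, R: int) -> int:
    """Compute A*B mod p for p = 2*R**2 - 1 by folding with 2*R**2 ≡ 1 (mod p)."""
    p = 2 * R * R - 1
    a2, a1, a0 = to_radix_R(A, R)
    b2, b1, b0 = to_radix_R(B, R)
    # Schoolbook product of the two quadratic polynomials in R.
    c0 = a0 * b0
    c1 = a1 * b0 + a0 * b1
    c2 = a2 * b0 + a1 * b1 + a0 * b2
    c3 = a2 * b1 + a1 * b2
    c4 = a2 * b2
    # 4R^2 ≡ 2, 4R^3 ≡ 2R, 4R^4 ≡ 1 (mod p): the fold uses only shifts and additions.
    e0 = (c0 << 2) + (c2 << 1) + c4
    e1 = (c1 << 2) + (c3 << 1)
    inv4 = pow(4, -1, p)  # stand-in for the constant handling of the real algorithms
    return (e0 + e1 * R) * inv4 % p


if __name__ == "__main__":
    R = 2**4 * 3**3                  # toy radix; p = 2*432**2 - 1 = 373247
    A, B = 300000, 123456
    print(fold_mul(A, B, R) == A * B % (2 * R * R - 1))  # True
```

The sketch does not bound the sizes of $e_0$ and $e_1$; keeping such intermediate coefficients within fixed ranges is exactly what the reduction steps of the real algorithms, and the range analysis in the appendix, are concerned with.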
Algorithm 7: Proposed IFFMe+ for $p = R^2 + 1$
Algorithm 8: post_proc Functions for the Proposed Algorithms
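The introduction names three prime shapes, $2 \cdot 2^x b^y - 1$, $2 \cdot 2^x b^y + 1$, and $2^x b^y + 1$ with $x$ and $y$ even, handled by IFFM−, IFFMo+, and IFFMe+. With $R = 2^{x/2} b^{y/2}$ these become $2R^2 - 1$, $2R^2 + 1$, and $R^2 + 1$ (the last matching Algorithm 7), giving the folding identities $2R^2 \equiv 1$, $2R^2 \equiv -1$, and $R^2 \equiv -1 \pmod p$. The Python sketch below enumerates small even exponents with $b = 3$, keeps the values that are prime, and checks each identity. It only illustrates the three shapes; it is not the search of Appendix B, its trial-division test is practical only for toy sizes, and the labels and function names are hypothetical.

```python
# Hypothetical labels: each form maps R to p and gives (k, residue) with k*R^2 ≡ residue (mod p).
FORMS = {
    "IFFM-":  (lambda R: 2 * R * R - 1, 2, 1),    # p = 2*2^x*b^y - 1:  2R^2 ≡  1
    "IFFMo+": (lambda R: 2 * R * R + 1, 2, -1),   # p = 2*2^x*b^y + 1:  2R^2 ≡ -1
    "IFFMe+": (lambda R: R * R + 1, 1, -1),       # p =   2^x*b^y + 1:   R^2 ≡ -1
}


def is_prime(n: int) -> bool:
    """Trial division; adequate only for the small toy values used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def search(b: int = 3, max_exp: int = 10):
    """Return {form: [(x, y, p), ...]} for even 2 <= x, y <= max_exp with p prime."""
    found = {name: [] for name in FORMS}
    for x in range(2, max_exp + 1, 2):
        for y in range(2, max_exp + 1, 2):
            R = 2 ** (x // 2) * b ** (y // 2)
            for name, (make_p, k, residue) in FORMS.items():
                p = make_p(R)
                if is_prime(p):
                    # The folding identity k*R^2 ≡ residue (mod p) holds by construction.
                    assert (k * R * R - residue) % p == 0
                    found[name].append((x, y, p))
    return found


if __name__ == "__main__":
    for name, hits in search().items():
        print(name, hits[:3])
```

For example, $x = y = 2$ gives $R = 6$, and all three values $71$, $73$, and $37$ happen to be prime.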
TABLE I
ESTIMATIONS OF THE NORMALIZED NUMBERS OF N + N ADDITIONS AND N × N MULTIPLICATIONS FOR DIFFERENT MODULAR MULTIPLICATION ALGORITHMS
Fig. 8. Example: a two-level Karatsuba decomposition for a multiplier, where the bit width of the operands is assumed to be N, a multiple of 4.
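Fig. 8 decomposes an N-bit multiplier with two levels of Karatsuba [32]. The recursive Python sketch below shows the same arithmetic idea under simple assumptions: each level splits both operands into halves and replaces four half-size products by three, so two levels need $3^2 = 9$ small products instead of 16. The function name and the `counter` argument are illustrative. Note that the middle product $(a_1 + a_0)(b_1 + b_0)$ has operands one bit wider than the halves, which a hardware decomposition must accommodate.

```python
def karatsuba(a: int, b: int, n: int, levels: int, counter=None) -> int:
    """Multiply nonnegative a, b (nominally n bits) with `levels` rounds of Karatsuba splitting."""
    if levels == 0 or n < 2:
        if counter is not None:
            counter[0] += 1            # one base (e.g., DSP-sized) multiplication
        return a * b
    h = n // 2
    mask = (1 << h) - 1
    a1, a0 = a >> h, a & mask          # a = a1 * 2^h + a0
    b1, b0 = b >> h, b & mask
    z2 = karatsuba(a1, b1, n - h, levels - 1, counter)
    z0 = karatsuba(a0, b0, h, levels - 1, counter)
    # Middle product: its operands are one bit wider than the halves.
    zm = karatsuba(a1 + a0, b1 + b0, n - h + 1, levels - 1, counter)
    z1 = zm - z2 - z0                  # equals a1*b0 + a0*b1
    return (z2 << (2 * h)) + (z1 << h) + z0


if __name__ == "__main__":
    a, b = (1 << 256) - 1, (1 << 255) + 12345
    count = [0]
    print(karatsuba(a, b, 256, 2, count) == a * b, count[0])  # True 9
```

Choosing N as a multiple of 4 keeps both levels of splitting even, so every base multiplication has the same nominal width apart from the widened middle terms.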
$$= m_2 \pm \frac{m_1}{2} + (2c_0 \pm c_2) \cdot 2^{k-1}$$
where the plus sign is for Multi-IFFM− and the minus sign for Multi-IFFMo+. As the bold terms show, the FeedB module for both is made up of one 1-bit left shifter, one adder, one (k − 1)-bit left shifter, and one k-bit left shifter.
TABLE II
CALCULATION OF INSERTED PIPELINE STAGES m, DIGITS n, AND LATENCY FOR THE PROPOSED ALGORITHMS ON FPGA
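The FeedB module above realizes its constant factors with left shifters and an adder, and the implementation results note that DSPs were saved by building constant multipliers from shifters and adders. A common way to do this for an arbitrary constant is to recode it in canonical signed-digit (CSD) form, the canonical representation of Reitwiesner [33] also used by [28], so that each nonzero digit costs one shifted addition or subtraction. The Python sketch below assumes a nonnegative integer constant; the function names are hypothetical and the code is not taken from the paper's design.

```python
def csd_digits(c: int) -> list:
    """Canonical signed-digit (non-adjacent form) digits of c >= 0, least significant first."""
    if c < 0:
        raise ValueError("constant must be nonnegative")
    digits = []
    while c:
        if c & 1:
            d = 2 - (c & 3)   # c ≡ 1 (mod 4) -> +1, c ≡ 3 (mod 4) -> -1
            c -= d            # now c ≡ 0 (mod 4), so the next digit is 0
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def shift_add_mul(x: int, c: int) -> int:
    """Multiply x by the constant c using only shifts, additions, and subtractions."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    c = 3 ** 5                                 # 243 = 0b11110011 (six 1 bits)
    print(csd_digits(c))                       # [-1, 0, 1, 0, -1, 0, 0, 0, 1]
    print(shift_add_mul(1000, c) == 1000 * c)  # True
```

For $243$ the CSD form has four nonzero digits instead of six ones in binary, so the constant multiplier needs three adders or subtractors rather than five.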
TABLE III
COMPARISONS OF MODULAR MULTIPLIERS BASED ON UNCONVENTIONAL RADIX FOR LEVEL-5 NIST SECURITY IMPLEMENTED ON FPGA
TABLE IV
COMPARISONS WITH THE HIGH-RADIX MONTGOMERY-BASED MODULAR MULTIPLIERS ON FPGA
through simulation. For the redesigned IFFM−, we achieved 156.3 MHz by inserting 23 extra CCs, which is about 2.6× faster, with 2.3× more CCs, than the original IFFM− multiplier. Note that the number of DSPs in the new design is reduced by implementing the constant multipliers in the multiplication step with left shifters and adders (as shown in Fig. 8), at the cost of further growth in LUTs and slices. Our design achieves about 11× higher throughput with only 28% of the latency of [11], while the increases in FFs, LUTs, slices, and DSPs are only about 3.3×, 5.4×, 4.5×, and 1.3×, respectively. We therefore consider our design more promising than the previous designs in [11], [16], and [17].

Of the other two efficient high-radix Montgomery-based multipliers, [27] uses a fixed radix of $2^4$, whereas [28] uses a variable radix: its multiplicand is converted from binary to canonical representation [33] so that several (1–3) bits are processed per CC. Since additions are the bottleneck in these small-radix multipliers, both designs optimize their critical paths with carry-save adders, which lets them reach higher frequencies than ours. However, the partitioning strategies adopted in [27] and [28] also lead to long latency, which grows roughly linearly with the bit width of the modulus. The results of the 1024- and 512-bit multipliers in [27] and [28] show that, at a fixed frequency, the throughput is almost unchanged. Based on (16), we conclude that the throughputs of these two works would be essentially unchanged for a modulus equal or close to ours. In addition, their resource consumption also grows almost linearly with the bit width. Therefore, our design achieves more than 100× the throughput of either design while using only about ten times as many slices. If the designs in [27] and [28] were implemented on a Virtex 7 device, their clock frequency would increase roughly in proportion to the CMOS technology scaling factor, which would almost double their throughput. Even so, our design would still greatly outperform these two works in terms of efficiency.

VI. CONCLUSION

In this article, we have proposed three low-complexity modular multiplication algorithms, called IFFM−, IFFMo+, and IFFMe+, and their corresponding multiprecision versions for the SIKE protocol. Six new constant-time architectures were presented based on these algorithms. By combining formula transformations, architectural optimizations, and maximal interleaving, the proposed designs demonstrate significant advantages over conventional ones. The multipliers without the multiprecision scheme are well suited to high-speed applications thanks to their low latency, while the multiprecision versions are good alternatives for embedded applications. We believe that these results will greatly contribute to the practicability of this protocol.

APPENDIX

A. Deduction for the Range of $c_0$ for the Proposed Algorithms

1) For the IFFM−: The deduction assumes that the inputs $A$ and $B$ are elements of $\mathbb{F}_p$, where $p = 2R^2 - 1$. According to Algorithm 6, after mapping, the coefficients of $A$ and $B$ remain in their normal ranges. After the first tentative computation in Step 5, the coefficients of $C$ are bounded as
$$c_2 \in \{0, 1\}, \quad 0 \le c_1 \le 2(R-1)^2, \quad 0 \le c_0 \le \tfrac{3}{2}(R-1)^2.$$
With the first IBR function in Step 6, $c_0$ is decomposed into $q_0$ and $r_0$, whose ranges are $[0, \tfrac{3}{2}R - 3]$ and $[0, R - 1]$,
[7] R. Azarderakhsh et al. (2017). Supersingular Isogeny Key Encapsulation. Submission to the NIST Post-Quantum Standardization Project. [Online]. Available: https://sike.org/
[8] G. Adj, D. Cervantes-Vázquez, J.-J. Chi-Domínguez, A. Menezes, and F. Rodríguez-Henríquez, “On the cost of computing isogenies between supersingular elliptic curves,” in Proc. Int. Conf. Sel. Areas Cryptogr. Cham, Switzerland: Springer, 2018, pp. 322–343.
[9] S. Jaques and J. M. Schanck, “Quantum cryptanalysis in the RAM model: Claw-finding attacks on SIKE,” IACR Cryptol. ePrint Arch., vol. 2019, p. 103, Aug. 2019.
[10] D. Hofheinz, K. Hövelmanns, and E. Kiltz, “A modular analysis of the Fujisaki-Okamoto transformation,” in Proc. Theory Cryptogr. Conf. Cham, Switzerland: Springer, 2017, pp. 341–371.
[11] D. Jao et al. (2020). PQCrypto-SIDH. Submission to the NIST Post-Quantum Standardization Project. [Online]. Available: https://github.com/Microsoft/PQCrypto-SIDH
[12] D. Jao and L. De Feo, “Towards quantum-resistant cryptosystems from supersingular elliptic curve isogenies,” in Proc. Int. Workshop Post-Quantum Cryptogr. Berlin, Germany: Springer, 2011, pp. 19–34.
[13] L. De Feo, D. Jao, and J. Plût, “Towards quantum-resistant cryptosystems from supersingular elliptic curve isogenies,” J. Math. Cryptol., vol. 8, no. 3, pp. 209–247, Jan. 2014.
[14] D. Jao and V. Soukharev, “Isogeny-based quantum-resistant undeniable signatures,” in Proc. Int. Workshop Post-Quantum Cryptogr. Cham, Switzerland: Springer, 2014, pp. 160–179.
[15] R. Azarderakhsh, D. Jao, K. Kalach, B. Koziel, and C. Leonardi, “Key compression for isogeny-based cryptosystems,” in Proc. 3rd ACM Int. Workshop ASIA Public-Key Cryptogr., 2016, pp. 1–10.
[16] B. Koziel, R. Azarderakhsh, M. Mozaffari Kermani, and D. Jao, “Post-quantum cryptography on FPGA based on isogenies on elliptic curves,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 64, no. 1, pp. 86–99, Jan. 2017.
[17] B. Koziel, R. Azarderakhsh, and M. M. Kermani, “A high-performance and scalable hardware architecture for isogeny-based cryptography,” IEEE Trans. Comput., vol. 67, no. 11, pp. 1594–1609, Nov. 2018.
[18] W. Liu, J. Ni, Z. Liu, C. Liu, and M. O’Neill, “Optimized modular multiplication for supersingular isogeny Diffie-Hellman,” IEEE Trans. Comput., vol. 68, no. 8, pp. 1249–1255, Aug. 2019.
[19] H. Seo, Z. Liu, P. Longa, and Z. Hu, “SIDH on ARM: Faster modular multiplications for faster post-quantum supersingular isogeny key exchange,” IACR Trans. Cryptograph. Hardw. Embedded Syst., vol. 2018, pp. 1–20, Aug. 2018.
[20] A. Jalali, R. Azarderakhsh, and M. M. Kermani, “NEON SIKE: Supersingular isogeny key encapsulation on ARMv7,” in Proc. Int. Conf. Secur., Privacy, Appl. Cryptogr. Eng. Cham, Switzerland: Springer, 2018, pp. 37–51.
[21] A. Jalali, R. Azarderakhsh, M. M. Kermani, and D. Jao, “Towards optimized and constant-time CSIDH on embedded devices,” in Proc. Int. Workshop Constructive Side-Channel Anal. Secure Design. Cham, Switzerland: Springer, 2019, pp. 215–231.
[22] T. Blum and C. Paar, “High-radix Montgomery modular exponentiation on reconfigurable hardware,” IEEE Trans. Comput., vol. 50, no. 7, pp. 759–764, Jul. 2001.
[23] A. Karmakar, S. S. Roy, F. Vercauteren, and I. Verbauwhede, “Efficient finite field multiplication for isogeny based post quantum cryptography,” in Proc. Int. Workshop Arithmetic Finite Fields. Cham, Switzerland: Springer, 2016, pp. 193–207.
[24] P. L. Montgomery, “Modular multiplication without trial division,” Math. Comput., vol. 44, no. 170, pp. 519–521, Apr. 1985.
[25] J. W. Bos and S. J. Friedberger, “Arithmetic considerations for isogeny-based cryptography,” IEEE Trans. Comput., vol. 68, no. 7, pp. 979–990, Jul. 2019.
[26] J. Tian, J. Lin, and Z. Wang, “Ultra-fast modular multiplication implementation for isogeny-based post-quantum cryptography,” in Proc. IEEE Int. Workshop Signal Process. Syst. (SiPS), Oct. 2019, pp. 97–102.
[27] G. D. Sutter, J.-P. Deschamps, and J. L. Imana, “Modular multiplication and exponentiation architectures for fast RSA cryptosystem based on digit serial computation,” IEEE Trans. Ind. Electron., vol. 58, no. 7, pp. 3101–3109, Jul. 2011.
[28] A. Rezai and P. Keshavarzi, “High-throughput modular multiplication and exponentiation algorithms using multibit-scan–multibit-shift technique,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 9, pp. 1710–1719, Sep. 2015.
[29] A. Rezai and P. Keshavarzi, “High-performance scalable architecture for modular multiplication using a new digit-serial computation,” Microelectron. J., vol. 55, pp. 169–178, Sep. 2016.
[30] A. Rezai and P. Keshavarzi, “Compact SD: A new encoding algorithm and its application in multiplication,” Int. J. Comput. Math., vol. 94, no. 3, pp. 554–569, Mar. 2017.
[31] P. Barrett, “Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor,” in Proc. Conf. Theory Appl. Cryptograph. Techn. Berlin, Germany: Springer, 1986, pp. 311–323.
[32] A. A. Karatsuba and Y. P. Ofman, “Multiplication of many-digital numbers by automatic computers,” in Doklady Akademii Nauk, vol. 145, no. 2. Moscow, Russia: Russian Academy of Sciences, 1962, pp. 293–294.
[33] G. W. Reitwiesner, “Binary arithmetic,” in Advances in Computers, vol. 1. Amsterdam, The Netherlands: Elsevier, 1960, pp. 231–308.

Jing Tian (Student Member, IEEE) received the B.S. degree in microelectronics and the Ph.D. degree in information and communication engineering from Nanjing University, Nanjing, China, in 2015 and 2020, respectively.
She is currently an Associate Researcher with Nanjing University. Her research interests include VLSI design for digital signal processing and cryptographic engineering.

Jun Lin (Senior Member, IEEE) received the B.S. degree in physics and the M.S. degree in microelectronics from Nanjing University, Nanjing, China, in 2007 and 2010, respectively, and the Ph.D. degree in electrical engineering from Lehigh University, Bethlehem, PA, USA, in 2015.
From 2010 to 2011, he was an ASIC Design Engineer with AMD, Shanghai, China. In summer 2013, he was an Intern with Qualcomm Research, Bridgewater, NJ, USA. In June 2015, he joined the School of Electronic Science and Engineering, Nanjing University, where he is currently an Associate Professor. His current research interests include low-power high-speed VLSI design, specifically VLSI design for digital signal processing and cryptography.
Dr. Lin is also a member of the Design and Implementation of Signal Processing Systems (DISPS) Technical Committee of the IEEE Signal Processing Society. He was a co-recipient of the Merit Student Paper Award at the IEEE Asia Pacific Conference on Circuits and Systems in 2008. He was a recipient of the 2014 IEEE Circuits & Systems Society (CAS) Student Travel Award.

Zhongfeng Wang (Fellow, IEEE) received the B.S. and M.S. degrees from Tsinghua University, Beijing, China, in 1988 and 1990, respectively, and the Ph.D. degree from the University of Minnesota, Minneapolis, MN, USA, in 2000.
He was a Leading VLSI Architect with Broadcom Corporation, Irvine, CA, USA, from 2007 to 2016. He was with Oregon State University, Corvallis, OR, USA, and National Semiconductor Corporation, Santa Clara, CA, USA. He has been a Distinguished Professor with Nanjing University, Nanjing, China, since 2016. He is a world-recognized expert on low-power high-speed VLSI design for signal processing systems. He has published over 200 technical articles with multiple best paper awards received from the IEEE technical societies, among which is the VLSI Transactions Best Paper Award of 2007. He has edited one book VLSI. He holds more than 20 U.S. and China patents. His current research interests are in the areas of optimized VLSI design for digital communications and deep learning.
Dr. Wang was elevated as a Fellow of IEEE for contributions to VLSI design and implementation of forward error correction (FEC) coding in 2015. He has served as a TPC member and various chairs for tens of international conferences. In the current record, he has had many papers ranking among the top-25 most (annually) downloaded manuscripts in the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS. He has served as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS (CAS) I, T-CAS-II, and TVLSI for many terms. Moreover, he has contributed significantly to the industrial standards. So far, his technical proposals have been adopted by more than 15 international networking standards.