Roetteler - 2017 - Tools for Quantum and Reversible Circuit Compilation
Tools for Quantum and Reversible Circuit Compilation
Martin Roetteler
1 Introduction
The compilation of quantum algorithms into sequences of instructions that a
quantum computer can execute requires a multi-stage framework. This framework
needs to be capable of taking higher-level descriptions of quantum programs
and successively breaking them down into lower-level net-lists of circuits until
ultimately pulse sequences are obtained that a physical machine can apply.
Independent of the concrete realization of the compilation method, one of the key
steps is to implement subroutines¹ over the given target instruction set. Because
the underlying problem is often classical, in that its specification involves
classical data (such as finding the period of a function or searching for an
assignment that satisfies a given Boolean predicate), the question arises how
¹ In the quantum computing literature, such subroutines often implement “oracles”.
© Springer International Publishing AG 2017
I. Phillips and H. Rahaman (Eds.): RC 2017, LNCS 10301, pp. 3–16, 2017.
DOI: 10.1007/978-3-319-59936-6_1
Fig. 1. F# program that implements a carry ripple adder using a for-loop and
maintaining a running carry.
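The idea named in the caption, a loop over bit positions that threads a running carry through the addition, can be sketched in Python as follows. This is only an illustration and not the F# program of Fig. 1: the function names, the little-endian bit-list encoding, and the choice to drop the final carry (addition mod 2^n) are assumptions made for the example.

```python
def to_bits(value, n):
    """Little-endian list of the n low bits of value (hypothetical helper)."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits (hypothetical helper)."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits):
    """Add two equal-length little-endian bit lists modulo 2^n."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        # Sum bit uses the carry coming in from the previous position.
        out.append(a ^ b ^ carry)
        # Carry out, written with AND/XOR only, as in a Toffoli/CNOT network.
        carry = (a & b) ^ (carry & (a ^ b))
    return out  # the final carry is dropped (assumption: addition mod 2^n)
```

For example, `from_bits(ripple_carry_add(to_bits(5, 4), to_bits(6, 4)))` evaluates the 4-bit sum of 5 and 6.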
Note that the overall compilation can fail, namely when the given target
strategy cannot be implemented within the given upper bound on the number of
available qubits.
Fig. 2. (a) The mutable data dependency graph (MDD) for the function
h(a, b, c, d) = f(a, b, c) ⊕ f(b, c, d), where f(a, b, c) = a(b&c), before cleanup.
(b) The MDD that results from applying Eager cleanup (as described in [34]) to
the MDD in (a). (c) The final circuit that REVS emits based on the MDD in (b);
distinct symbols in the circuit mark qubits that are initially clean and qubits
that terminate in a clean state. Overall, the circuit uses a total of 7 qubits to
compute the function h, compared with 11 qubits when the Bennett cleanup is
applied.
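\[
\begin{aligned}
\mathrm{Ch}(E, F, G) &:= (E \wedge F) \oplus (\neg E \wedge G) \\
\mathrm{Ma}(A, B, C) &:= (A \wedge B) \oplus (A \wedge C) \oplus (B \wedge C) \\
\Sigma_0(A) &:= (A \gg 2) \oplus (A \gg 13) \oplus (A \gg 22) \\
\Sigma_1(E) &:= (E \gg 6) \oplus (E \gg 11) \oplus (E \gg 25).
\end{aligned}
\]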
For a given round, the values of all these functions are computed and treated
as 32-bit integers. Further, a constant 32-bit integer value K_i is obtained
from a table lookup that depends on the index i of the given round, where
i ∈ {0, ..., 63}, and the next chunk W_i of the message is obtained by
performing a suitable message expansion as specified in the standard. Finally,
H is replaced according to
H ← H + Ch(E, F, G) + Ma(A, B, C) + Σ0(A) + Σ1(E) + K_i + W_i, and then the
cyclic permutation A ← H, B ← A, ..., H ← G is performed. The implementation
of the entire round function for a given number of rounds n was presented
in [34] using the Revs high-level language.
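The round update described above can be sketched in Python as follows. This is a minimal illustration of the simplified update exactly as stated in the text, not the full compression function of the standard and not the Revs source from [34]. The function names and the tuple layout of the state are assumptions made for the example, and ≫ is interpreted as a 32-bit right rotation, as in the SHA-2 standard.

```python
MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def rotr(x, r):
    """32-bit right rotation (the reading of the shift symbol assumed here)."""
    return ((x >> r) | (x << (32 - r))) & MASK


def ch(e, f, g):
    return (e & f) ^ (~e & g & MASK)


def ma(a, b, c):
    return (a & b) ^ (a & c) ^ (b & c)


def sigma0(a):
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)


def sigma1(e):
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)


def round_step(state, k_i, w_i):
    """One round: update H in place (mod 2^32), then permute cyclically."""
    a, b, c, d, e, f, g, h = state
    h = (h + ch(e, f, g) + ma(a, b, c) + sigma0(a) + sigma1(e) + k_i + w_i) & MASK
    # Cyclic permutation A <- H, B <- A, ..., H <- G.
    return (h, a, b, c, d, e, f, g)
```

Each `+` in `round_step` corresponds to one addition mod 2^32 in a reversible implementation, which is why adders dominate the Toffoli cost of a round.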
To test the performance of the Revs compiler, in [34] we hand-optimized an
implementation of SHA-256. This circuit contains 7 adders (mod 2^32). Using the
adder from [17], with a Toffoli cost of 2n − 3, this corresponds to 61 Toffoli
gates per adder, or 427 per round.
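The per-round count follows from a short calculation with word size n = 32 and 7 adders per round:

\[
2n - 3 \,\big|_{n = 32} = 61, \qquad 7 \times 61 = 427 .
\]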
Next, we used Revs to produce Toffoli networks for this cipher for various
numbers n of rounds. The circuits are typically too large to be visualized in
printed form; however, an automatically generated .svg file produced by the
LIQUi|⟩ compiler can be navigated by zooming in down to the level of Toffoli,
CNOT, and NOT gates. The resource estimates are summarized in Table 1. Shown
are the resulting circuit sizes, measured by the total number of Toffoli gates,
the resulting total number of qubits, and the time it took to compile the
circuit for various numbers of rounds. All timing data in the table are
measured in seconds and resulted from running the F# compiler in Visual
Studio 2013 on an Intel i7-3667 @ 2 GHz with 8 GB RAM under Windows 8.1. The
table shows savings of almost 4X in the total number of qubits required to
synthesize the cipher when comparing the simple Bennett cleanup strategy with
the Eager cleanup strategy. The reason is that the Bennett cleanup method
allocates new space essentially for each gate, whereas the Eager cleanup
strategy tries to clean up and reallocate space as soon as possible, which,
given the round-based nature of the function, can be done as soon as a round
is completed.
Besides SHA-256 and other hash functions such as MD5, this technique
has also been applied to SHA-3 [4]. Our findings support the thesis that it is
possible to trade circuit size (time) for total memory (space) in reversible
circuit synthesis. To the best of our knowledge, Revs is the first compiler that
allows one to navigate this trade-off space and that offers strategies for
garbage collection for quantum architectures that go beyond the simple Bennett
strategy, which generally leads to very poor memory utilization because most of
the qubits are idle most of the time.
            Cuccaro et al. [17]   Takahashi et al. [39]   Draper [18]   Häner et al. [24]
Size        Θ(n)                  Θ(n)                    Θ(n^2)        Θ(n log n)
Depth       Θ(n)                  Θ(n)                    Θ(n)          Θ(n)
Ancillas    n+1 (clean)           n (clean)               0             n/2 (dirty)
Mathematically, the underlying idea of how to make use of dirty ancillas can be
illustrated for the addition “+1”, an observation due to Gidney [20]:
using the ancilla-free adder by Takahashi [39], which requires no incoming carry,
and its reverse to perform subtraction, one can perform the following sequence
of operations to achieve an incrementer using n borrowed ancilla qubits in an
unknown initial state |g⟩:
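One sequence consistent with this description can be checked classically on n-bit integers, as sketched below. It rests on the identity ~g = −g − 1 (mod 2^n), where ~g is the bitwise complement of g: subtracting g and then subtracting ~g from x leaves x + 1, and complementing the ancilla register twice restores g. The function name and the integer encoding of the registers are assumptions made for the example; a quantum circuit would apply the same steps to basis states |x⟩|g⟩.

```python
def increment_with_dirty_ancillas(x, g, n):
    """Map (x, g) to (x + 1 mod 2^n, g) using two subtractions and bit flips.

    x is the n-bit target register; g is an n-bit borrowed ("dirty")
    register whose value is unknown and must be returned unchanged.
    """
    mask = (1 << n) - 1
    x = (x - g) & mask   # reversed adder: |x>|g>      -> |x - g>|g>
    g = ~g & mask        # NOT on every ancilla bit:   -> |x - g>|~g>
    x = (x - g) & mask   # reversed adder again:       -> |x - g - ~g>|~g> = |x + 1>|~g>
    g = ~g & mask        # NOT again restores the ancillas: |x + 1>|g>
    return x, g
```

The result does not depend on the value of g, which is what allows the ancillas to be borrowed in an arbitrary state.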
Recently, Paetznick and Svore [33] showed that by using non-deterministic
circuits for decomposition, called Repeat-Until-Success (RUS) circuits, the
number of T gates can be further reduced by a factor of 2.5 on average for axial
rotations, and by a larger factor for non-axial rotations. They emphasized that
synthesis into RUS circuits can lead to a shorter expected circuit length that
surpasses the theoretical lower bound for the length of a purely unitary circuit
design. Leveraging the RUS framework, efficient algorithms were presented
in [12,13] to synthesize non-deterministic RUS circuits for approximating any
given single-qubit unitary. The algorithms run in probabilistically polynomial
classical runtime for any desired precision ε. These methods demonstrate the
power of using ancilla qubits and measurement for quantum circuit compilation.
The general layout of a RUS protocol is shown in Fig. 3. Consider a unitary
operation U acting on n + m qubits, of which n are target qubits and m are
ancillary qubits. Consider a measurement of the ancilla qubits, such that one
measurement outcome is labeled “success” and all other measurement outcomes
are labeled “failure”. Let the unitary applied to the target qubits upon
measurement be V. In the RUS protocol, the circuit in the dashed box is repeated
on the (n + m)-qubit state until the “success” measurement is observed. Each
time a “failure” measurement is observed, an appropriate Clifford operator W_i†
is applied in order to revert the state of the target qubits to their original
input state |ψ⟩. The number of repetitions of the circuit is finite with
probability 1.
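The control flow of such a protocol can be sketched with a purely classical Monte Carlo model. The sketch below assumes, as a simplification not stated in the text, that every attempt succeeds independently with the same probability p; under that assumption the number of repetitions is geometrically distributed with mean 1/p. No quantum state is simulated, and the function name is hypothetical.

```python
import random


def attempts_until_success(p_success, rng):
    """Count repetitions of the boxed circuit until "success" is measured.

    Assumption: each attempt succeeds independently with probability
    p_success; a failure stands for applying the recovery operator and
    retrying on the restored input state.
    """
    attempts = 0
    while True:
        attempts += 1
        if rng.random() < p_success:
            return attempts  # "success": V has been applied to the target
        # "failure": undo with the recovery operator, then loop again


def mean_attempts(p_success, samples, seed=0):
    """Empirical mean number of repetitions over independent runs."""
    rng = random.Random(seed)
    total = sum(attempts_until_success(p_success, rng) for _ in range(samples))
    return total / samples
```

For any fixed p > 0 the loop terminates with probability 1, and `mean_attempts(p, samples)` approaches 1/p as the number of samples grows.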
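Fig. 3. RUS protocol, panels (a) and (b): m ancilla qubits prepared in |0⟩ and
n target qubits in |ψ⟩ enter U; each failure is followed by a recovery operator
W_i†, and success leaves V|ψ⟩ on the target qubits.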
It turns out that some fault-tolerant scalable quantum computing schemes
underline the importance of working with higher-dimensional alphabets to encode
quantum information. In particular, a ternary quantum framework recently emerged
from proposals for a metaplectic topological quantum computer (MTQC), which
offers native topological protection of quantum information. MTQC creates an
inherently ternary quantum computing environment; for example, the common
binary CNOT gate is no longer a Clifford gate in that environment.
In [11], compilation and synthesis methods for ternary circuits were developed
for two different elementary gate sets: the so-called Clifford+R_{|2⟩} basis [10]
and the Clifford+P_9 basis [11], where R_{|2⟩} and P_9 are both non-Clifford
single-qutrit gates, defined as R_{|2⟩} = diag(1, 1, −1) and
P_9 = diag(e^{−2πi/9}, 1, e^{2πi/9}).
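Because both gates are diagonal, their basic properties can be checked numerically on the diagonal entries alone. The sketch below uses only the Python standard library; the variable and function names are illustrative, and the checks cover only unitarity and the order of each gate, not the non-Clifford property.

```python
import cmath

OMEGA9 = cmath.exp(2j * cmath.pi / 9)  # primitive ninth root of unity

# Diagonals of the two single-qutrit gates as defined in the text.
R2_DIAG = (1, 1, -1)
P9_DIAG = (OMEGA9.conjugate(), 1, OMEGA9)


def diag_power(diag, k):
    """Diagonal of the k-th power of a diagonal matrix."""
    return tuple(entry ** k for entry in diag)


def is_identity(diag, tol=1e-9):
    """True if the diagonal matrix equals the identity up to tol."""
    return all(abs(entry - 1) < tol for entry in diag)


def is_unitary(diag, tol=1e-9):
    """A diagonal matrix is unitary iff every entry has modulus 1."""
    return all(abs(abs(entry) - 1) < tol for entry in diag)
```

With these helpers, `is_identity(diag_power(R2_DIAG, 2))` and `is_identity(diag_power(P9_DIAG, 9))` confirm that the two gates square and ninth-power to the identity, respectively.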
The Clifford+R_{|2⟩} basis, also called the metaplectic basis, is obtained from
an MTQC by braiding of certain metaplectic non-abelian anyons and projective
measurement. The gate R_{|2⟩} is produced by injection of the magic state
|ψ⟩ = |0⟩ − |1⟩ + |2⟩. The injection circuit is coherent probabilistic, succeeds
in three iterations on average, and consumes three copies of the magic state |ψ⟩
on average. The |ψ⟩ state is produced by a relatively inexpensive protocol that
uses topological measurement and subsequent intra-qutrit projection. This
protocol requires only three qutrits and produces an exact copy of |ψ⟩ in 9/4
trials on average. This is much better than any state distillation method,
especially because it produces |ψ⟩ with fidelity 1. In [10], effective methods
were developed to compile efficient circuits in the metaplectic basis. In
particular, an arbitrary two-level Householder reflection r can be approximated
to a given precision ε by a metaplectic circuit of R_{|2⟩}-count at most
C log_3(1/ε) + O(log(log(1/ε))), with C ≤ 8.
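To get a feel for these figures, the sketch below evaluates the leading term C log_3(1/ε) of the bound and combines the two averages quoted above. Two assumptions are made for the illustration: the O(log log(1/ε)) term is ignored because its constant is not given here, and the expected number of preparation trials per R_{|2⟩} gate is taken as the product of the two averages, which presumes the preparation trials are independent of the injection outcome. The function and constant names are hypothetical.

```python
import math
from fractions import Fraction


def leading_r_count(eps, c=8):
    """Leading term C * log_3(1/eps) of the R-count bound (lower-order term omitted)."""
    return c * math.log(1 / eps, 3)


# Averages quoted in the text.
MAGIC_STATES_PER_GATE = Fraction(3)   # copies of the magic state per injected gate
TRIALS_PER_MAGIC_STATE = Fraction(9, 4)  # preparation trials per copy

# Assumption: independence, so the expectations multiply.
EXPECTED_TRIALS_PER_GATE = MAGIC_STATES_PER_GATE * TRIALS_PER_MAGIC_STATE
```

Under these assumptions `EXPECTED_TRIALS_PER_GATE` is 27/4, and `leading_r_count(1e-10)` gives the leading-order gate count for a target precision of 10^-10.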
The Clifford+P_9 basis naturally generalizes the binary basis built around the
π/8 gate. The P_9 gate can be realized by a certain deterministic
measurement-assisted circuit given a copy of the magic state
μ = e^{−2πi/9}|0⟩ + |1⟩ + e^{2πi/9}|2⟩, which in turn can be obtained from the
usual magic state distillation protocol. Specifically, it requires
O(log^3(1/δ)) raw magic states of low fixed fidelity in order to distill a copy
of the magic state μ at fidelity 1 − δ. The paper [11] developed a novel
approach to the synthesis of reversible ternary classical circuits over the
Clifford+P_9 basis. We have synthesized explicit circuits to express classical
reflections and other important classical non-Clifford gates in this basis,
which we subsequently used to build efficient ternary implementations of
integer adders and their extensions.
In [14], further optimizations were given under the assumption of
binary-encoded data, and the resulting solutions were applied to emulating the
modular exponentiation period finding (which is the quantum part of Shor's
integer factorization algorithm). We performed a comparative cost analysis of
optimized solutions between the “generic” Clifford+P_9 architecture and the
MTQC architecture (Clifford+R_{|2⟩}), using magic state counts as the cost
measure. We showed that the cost of emulating the entire binary circuit for
the period finding is almost directly proportional to the cost of emulating the
three-qubit Toffoli gate, and that the latter is proportional to the cost of
the P_9 gate.
6 Conclusions
We presented Revs, a compiler and programming language that automates the
translation of classical, irreversible programs into reversible programs.
This language does not constrain the programmer to think in a circuit-centric
way. In some cases (e.g., hash functions such as SHA-256) the savings of our
References
1. Federal information processing standards publication 180-2, 2002. See also the Wikipedia entry. http://en.wikipedia.org/wiki/SHA-2
2. Abdessaied, N., Amy, M., Drechsler, R., Soeken, M.: Complexity of reversible circuits and their quantum implementations. Theor. Comput. Sci. 618, 85–106 (2016)
3. Amy, M., Maslov, D., Mosca, M., Roetteler, M.: A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Trans. Comput. Aided Des. Integr. Circ. Syst. 32(6), 818–830 (2013)
4. Amy, M., Di Matteo, O., Gheorghiu, V., Mosca, M., Parent, A., Schanck, J.M.: Estimating the cost of generic quantum pre-image attacks on SHA-2 and SHA-3. IACR Cryptol. ePrint Arch. 2016, 992 (2016)
5. Barenco, A., Bennett, C.H., Cleve, R., DiVincenzo, D.P., Margolus, N., Shor, P., Sleator, T., Smolin, J.A., Weinfurter, H.: Elementary gates for quantum computation. Phys. Rev. A 52(5), 3457 (1995)
6. Bennett, C.H.: Logical reversibility of computation. IBM J. Res. Dev. 17, 525–532 (1973)
7. Bennett, C.H.: Time/space trade-offs for reversible computation. SIAM J. Comput. 18, 766–776 (1989)
8. Berry, D.W., Childs, A.M., Cleve, R., Kothari, R., Somma, R.D.: Exponential improvement in precision for simulating sparse Hamiltonians. In: Symposium on Theory of Computing (STOC 2014), pp. 283–292 (2014)
9. Berry, D.W., Childs, A.M., Kothari, R.: Hamiltonian simulation with nearly optimal dependence on all parameters. In: IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pp. 792–809 (2015)
10. Bocharov, A., Cui, S.X., Kliuchnikov, V., Wang, Z.: Efficient topological compilation for weakly-integral anyon model. Phys. Rev. A 93, 012313 (2016)
11. Bocharov, A., Cui, S.X., Roetteler, M., Svore, K.M.: Improved quantum ternary arithmetics. Quantum Inf. Comput. 16(9&10), 862–884 (2016). arXiv:1512.03824
12. Bocharov, A., Roetteler, M., Svore, K.M.: Efficient synthesis of probabilistic quantum circuits with fallback. Phys. Rev. A 91, 052317 (2015)
13. Bocharov, A., Roetteler, M., Svore, K.M.: Efficient synthesis of universal repeat-until-success circuits. Phys. Rev. Lett. 114, 080502 (2015). arXiv:1404.5320
14. Bocharov, A., Roetteler, M., Svore, K.M.: Factoring with qutrits: Shor's algorithm on ternary and metaplectic quantum architectures. arXiv preprint (2016). arXiv:1605.02756
15. Chrzanowska-Jeske, M., Mishchenko, A., Perkowski, M.A.: Generalized inclusive forms - new canonical Reed-Muller forms including minimum ESOPs. VLSI Des. 2002(1), 13–21 (2002)
16. Clader, B.D., Jacobs, B.C., Sprouse, C.R.: Preconditioned quantum linear system algorithm. Phys. Rev. Lett. 110, 250504 (2013)
17. Cuccaro, S.A., Draper, T.G., Kutin, S.A., Moulton, D.P.: A new quantum ripple-carry addition circuit. arXiv preprint (2004). arXiv:quant-ph/0410184
18. Draper, T.G.: Addition on a quantum computer. arXiv preprint (2000). arXiv:quant-ph/0008033
19. Fowler, A.G., Mariantoni, M., Martinis, J.M., Cleland, A.N.: Surface codes: towards practical large-scale quantum computation. Phys. Rev. A 86, 032324 (2012). arXiv:1208.0928
20. Gidney, C.: StackExchange: creating bigger controlled nots from single qubit, toffoli, and CNOT gates, without workspace (2015)
21. Green, A.S., Lumsdaine, P.L.F., Ross, N.J., Selinger, P., Valiron, B.: An introduction to quantum programming in Quipper. In: Dueck, G.W., Miller, D.M. (eds.) RC 2013. LNCS, vol. 7948, pp. 110–124. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38986-3_10
22. Green, A.S., Lumsdaine, P.L., Ross, N.J., Selinger, P., Valiron, B.: Quipper: a scalable quantum programming language. In: Proceedings of Conference on Programming Language Design and Implementation (PLDI 2013). ACM (2013)
23. Grover, L.: A fast quantum mechanical algorithm for database search. In: Proceedings of the Symposium on Theory of Computing (STOC 1996), pp. 212–219. ACM Press (1996)
24. Häner, T., Roetteler, M., Svore, K.M.: Factoring using 2n+2 qubits with Toffoli based modular multiplication. arXiv preprint (2016). arXiv:1611.07995
25. Harrow, A.W., Hassidim, A., Lloyd, S.: Quantum algorithm for linear systems of equations. Phys. Rev. Lett. 103(15), 150502 (2009)
26. Heckey, J., Patil, S., JavadiAbhari, A., Holmes, A., Kudrow, D., Brown, K.R., Franklin, D., Chong, F.T., Martonosi, M.: Compiler management of communication and parallelism for quantum computation. In: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2015), pp. 445–456. ACM (2015)
27. Kempe, J.: Quantum random walks - an introductory overview. Contemporary Phys. 44(4), 307–327 (2003)
28. Kliuchnikov, V., Maslov, D., Mosca, M.: Practical approximation of single-qubit unitaries by single-qubit quantum Clifford and T circuits. IEEE Trans. Comput. 65(1), 161–172 (2016)
29. Maslov, D.: On the advantages of using relative phase Toffolis with an application to multiple control Toffoli optimization. Phys. Rev. A 93, 022311 (2016)
30. Mishchenko, A., Brayton, R.K., Chatterjee, S.: Boolean factoring and decomposition of logic networks. In: Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pp. 38–44. IEEE Press (2008)
31. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2000)
32. Oemer, B.: Classical concepts in quantum programming. Int. J. Theor. Phys. 44(7), 943–955 (2005)
33. Paetznick, A., Svore, K.M.: Repeat-until-success: non-deterministic decomposition of single-qubit unitaries. Quantum Inf. Comput. 14(15&16), 1277–1301 (2014)
34. Parent, A., Roetteler, M., Svore, K.M.: Reversible circuit compilation with space constraints. arXiv preprint (2015). arXiv:1510.00377
35. Ross, N.J., Selinger, P.: Optimal ancilla-free Clifford+T approximation of z-rotations. arXiv preprint (2014). arXiv:1403.2975
36. Selinger, P.: Quantum circuits of T-depth one. Phys. Rev. A 87, 042302 (2013)
37. Selinger, P.: Efficient Clifford+T approximation of single-qubit operators. Quantum Inf. Comput. 15(1–2), 159–180 (2015)
38. Shor, P.W.: Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J. Comput. 26(5), 1484–1509 (1997)
39. Takahashi, Y., Tani, S., Kunihiro, N.: Quantum addition circuits, unbounded fan-out. arXiv preprint (2009). arXiv:0910.2530
40. Wecker, D., Svore, K.M.: LIQUi|⟩: a software design architecture and domain-specific language for quantum computing. arXiv preprint (2014). arXiv:1402.4467