
A Game of Surface Codes:

Large-Scale Quantum Computing with Lattice Surgery

Daniel Litinski @ Dahlem Center for Complex Quantum Systems, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany

arXiv:1808.02892v1 [quant-ph] 8 Aug 2018

Given a quantum gate circuit, how does one execute it in a fault-tolerant architecture with as little overhead as possible? This paper is a collection of strategies for surface-code quantum computing on small, intermediate and large scales. They are strategies for space-time trade-offs, going from slow computations using few qubits to fast computations using many qubits. Our schemes are based on surface-code patches, which not only feature a low space cost compared to other surface-code schemes, but are also conceptually simple – simple enough that they can be described as a tile-based game with a small set of rules. Therefore, no knowledge of quantum error correction is necessary to understand the schemes in this paper, but only the concepts of qubits and measurements.

The field of quantum computing is fuelled by the promise of fast solutions to classically intractable problems, such as simulating large quantum systems or factoring large numbers. Already ∼100 qubits can be used to solve useful problems that are out of reach for classical computers [1, 2]. Despite the exponential speed-up, the actual time required to solve these problems is orders of magnitude above the coherence times of any physical qubit. In order to store and manipulate quantum information on large time scales, it is necessary to actively correct errors by combining many physical qubits to logical qubits using a quantum error-correcting code [3–5]. Of particular interest are codes that are compatible with the locality constraints of realistic devices such as superconducting qubits, which are limited to operations that are local in two dimensions. The most prominent such code is the surface code [6, 7].

Working with logical qubits introduces additional overhead to the computation. Not only is the space cost drastically increased as physical qubits are replaced by logical qubits, but also the time cost increases due to the restricted set of accessible logical operations. Surface codes, in particular, are limited to a set of 2D-local operations, which means that arbitrary gates in a quantum circuit may require several time steps instead of one. To keep the cost of surface-code quantum computing low, it is important to find schemes to translate quantum circuits into surface-code layouts with a low space-time overhead. This is also necessary to benchmark quantum algorithms to find out how well they perform in a surface-code architecture.

There exist several encoding schemes for surface codes, among others, defect-based [7], twist-based [8] and patch-based [9] encodings. In this work, we focus on the latter. Surface-code patches have a low space overhead compared to other schemes, and offer the possibility of low-overhead Clifford gates [10, 11]. Perhaps more importantly, they are conceptually less difficult to understand, as they do not directly involve braiding of topological defects. Designing computational schemes with surface-code patches only requires the concepts of qubits and two-qubit measurements. To this end, we describe the operations of surface-code patches as a tile-based game. This is helpful to design protocols and determine their space-time cost.

Surface codes as a game. The game is played on a board partitioned into a number of tiles. An example of a 5 × 2 grid of tiles is shown in Fig. 1. The tiles can be used to host patches, which are representations of qubits. We denote the Pauli operators of each qubit as X, Y and Z. Patches have dashed and solid edges representing X- and Z-type operators. The points where two edges meet are called corners. A patch with 2N + 2 corners represents N qubits. The simplest case is a four-corner square patch (Fig. 1a) representing a single qubit. Each of the two dashed (solid) edges represents the qubit's X (Z) operator. While the square patch only occupies one tile, a four-corner patch can also be shaped to, e.g., occupy three tiles (b). A six-corner patch (c) represents two qubits. The first qubit's Pauli operators $X_1$ and $Z_1$ are represented by the two top edges, while the second qubit's operators $X_2$ and $Z_2$ are found in the two bottom edges. The general rule that assigns the operators of N qubits to the edges of a (2N + 2)-corner patch is given in Fig. 1d. Going clockwise, the dashed boundaries correspond to $X_1$, $X_1 X_2$, $X_2 X_3$, ..., $X_{N-1} X_N$ and $X_N$. Starting to the right of $X_1$, the solid edges correspond to $Z_1, Z_2, \dots, Z_N$ and the product $Z_1 Z_2 \cdots Z_N$.

Figure 1: (a-c) Examples of patches in a 5 × 2 grid of tiles: four-, six- and eight-corner patches. (d) Patches with 2N + 2 corners represent N qubits. Their 2N + 2 edges represent the shown Pauli operators.

In the following, we specify the operations that can be used to manipulate the qubits represented by patches. Some of these operations take a time step to complete, whereas others can be performed instantly. The goal is to implement quantum algorithms using as few tiles and time steps as possible. There are three types of operations: qubit initialization, qubit measurement and patch deformation.

I. Qubit initialization:

– Qubits can be initialized in the X and Z eigenstates $|+\rangle$ and $|0\rangle$. All qubits that are part of one patch must be initialized in the same state. (Performed instantly)

– Four-corner patches can be initialized in an arbitrary state. Unless this state is an X or Z eigenstate, there is a certain probability $p_{\mathrm{init}}$ that the qubit will be affected by a random error. (Performed instantly)

II. Qubit measurement:

– Qubits can be measured in the X or Z basis. All qubits that are part of the same patch are measured simultaneously and in the same basis. This measurement removes the patch from the board. (Performed instantly)

– If edges of two different patches are positioned in adjacent tiles, the product of the operators of the two edges can be measured. For example, the product Z ⊗ Z between two neighboring square patches can be measured, as highlighted in step 2 of Fig. 2a by the blue rectangle. If the edge of one patch is adjacent to multiple edges of the other patch, the product of all involved Pauli operators can be measured. For instance, if qubit A's Z edge is adjacent to both qubit B's X edge and Z edge, the operator $Z_A \otimes Y_B$ can be measured (see step 3 of Fig. 2c), since Y = iXZ. (Takes 1 time step to complete)

III. Patch deformation:

– Edges of a patch can be moved to deform the patch. If the edge is moved onto a free tile to increase the size of the patch, this takes 1 time step to complete. If the edge is moved inside the patch to make the patch smaller, the action can be performed instantly.

– Corners of a patch can be moved along the patch boundary to change its shape, as shown in Fig. 2d. (Takes 1 time step to complete)

– It is possible to initialize patches with shortened edges, such that they occupy fewer tiles. The drawback of this is that in every time step, an error corresponding to the Pauli operator represented by the shortened edge will occur with a probability $p_{\mathrm{err}}$. For instance, a six-corner patch with two shortened X edges as in Fig. 2e is susceptible to X errors.

Figure 2: Examples of short protocols. (a) Preparation of a two-qubit Bell state in 1 time step. (b) Moving a square-patch qubit over long distances in 1 time step. (c) Measurement of a square-patch qubit in the Y basis using an ancilla qubit and 2 time steps. (d) Moving corners of a four-corner patch to change its shape in 1 time step. (e) A normal six-corner patch and one with two shortened X edges.

To illustrate these operations, we go through three short example protocols in Fig. 2a-c. The first example is the preparation of a Bell pair (a). Two square patches
are initialized in the $|+\rangle$ state. Next, the operator Z ⊗ Z is measured. Before the measurement, the qubits are in the state $|+\rangle \otimes |+\rangle = (|00\rangle + |01\rangle + |10\rangle + |11\rangle)/2$. If the measurement outcome is +1, the qubits end up in the state $(|00\rangle + |11\rangle)/\sqrt{2}$. For the outcome −1, the state is $(|01\rangle + |10\rangle)/\sqrt{2}$. In both cases, the two qubits are in a maximally entangled Bell state. This protocol takes 1 time step to complete. The second example (b) is the movement of a square patch into a different tile. For this, the square patch is enlarged by patch deformation, which takes 1 time step, and then made smaller again at no time cost. The third example (c) is the measurement of a square patch in the Y basis. For this, the patch is deformed such that the X and Z edge are on the same side of the patch. An ancillary patch is initialized in the $|0\rangle$ state and the operator Z ⊗ Y between the ancilla and the qubit is measured. The ancilla is discarded by measuring it in the Z basis.

Translation to surface codes. Protocols designed within this framework can be straightforwardly translated into surface-code operations. The exact correspondence between our framework and surface-code patches is specified in Appendix A, but it is not crucial to the understanding of this paper. Essentially, patches correspond to surface-code patches with dashed and solid edges as rough and smooth boundaries. Thus, for surface codes with a code distance d, each tile corresponds to $d^2$ physical data qubits. Pauli product measurements that take 1 time step to complete correspond to (twist-based) lattice surgery [9, 11], which requires d code cycles. Thus, 1 time step corresponds to d code cycles. Qubit initialization has no time cost, since, in case of X and Z eigenstates, it can be done simultaneously with the following lattice surgery [9, 12]. For arbitrary states, initialization corresponds to state injection [12, 13]. Its time cost does not scale with d. Similarly, single-qubit measurements in the X or Z basis correspond to the simultaneous measurement of all physical data qubits in the corresponding basis and some classical error correction, which does not scale with d either. Patch deformation is code deformation, which requires d code cycles, unless the patch becomes smaller in the process, in which case it corresponds to single-qubit measurements. In essence, the framework can be used to estimate the space-time cost of a computation. The leading-order term of the space-time cost – the term that scales with $d^3$ – of a protocol that uses s tiles for t time steps is $st \cdot d^3$ in terms of (physical data qubits)·(code cycles).

Overview

Having established the rules of the game and the correspondence of our framework to surface-code operations, our goal is to find implementations of arbitrary quantum computations. In this work, we discuss strategies to tackle the following problem: Given a quantum circuit, how does one execute it as fast as possible on a surface-code-based quantum computer of a certain size? This is an optimization problem that was shown to be NP-hard [14], so the focus is rather on finding heuristics. The content of this paper is outlined in Fig. 3.

The input to our problem is an arbitrary gate circuit corresponding to the computation. We refer to the qubits that this circuit acts on as data qubits. As we review in Sec. 1, the natural universal gate set for surface codes is Clifford+T, where Clifford gates are cheap and T gates are expensive. In fact, Clifford gates can be treated entirely classically, and T gates require the consumption of a magic state $|0\rangle + e^{i\pi/4}|1\rangle$. Only faulty (undistilled) magic states can be prepared in our framework. To generate higher-fidelity magic states for large-scale quantum computation, a lengthy protocol called magic state distillation [15] is used.

It is therefore natural to partition a quantum computer into a block of tiles that is used to distill magic states (a distillation block) and a block of tiles that hosts the data qubits (a data block) and consumes magic states. The speed of a quantum computer is governed by how fast magic states can be distilled, and how fast they can be consumed by the data block.

In Sec. 2, we discuss how to design data blocks. In particular, we show three designs: compact, intermediate and fast blocks. The compact block uses 1.5n + 3 tiles to store n qubits, but takes up to 9 time steps to consume a magic state. Intermediate blocks use 2n + 4 tiles and require up to 5 time steps per magic state. Finally, the fast block uses $2n + \sqrt{8n} + 1$ tiles, but requires only 1 time step to consume a magic state. The compact block is an option for early quantum computers with few qubits, where the generation of a single magic state takes longer than 11 time steps. The fast block has a better space-time overhead, which makes it more favorable on larger scales.

Data blocks need to be combined with distillation blocks for universal quantum computing. In Sec. 3, we discuss designs of distillation blocks. Since magic state distillation is the main operation of a surface-code-based quantum computer, it is important to minimize its space-time cost. We discuss distillation protocols based on error-correcting codes with transversal T gates, such as punctured Reed-Muller codes [15, 16] and block codes [17–19]. In comparison to braiding-based implementations of distillation protocols, we reduce the space-time cost by up to 90%.

A data block combined with a distillation block constitutes a quantum computer in which T gates are performed one after the other. At this stage, the quantum computer can be sped up by increasing the number of distillation blocks, effectively decreasing the time
[Figure 3 diagram: Sec. 1: Clifford+T circuits; Sec. 2: Data blocks; Sec. 3: Distillation blocks; Sec. 4: Trade-offs limited by T count; Sec. 5: Trade-offs limited by T depth; Sec. 6: Trade-offs beyond Clifford+T; ∼100 qubits (Appendix C).]

Example: 100 qubits, $10^8$ T gates

$p = 10^{-4}$, d = 13:   55,000 qubits, 4 hours   |   120,000 qubits, 22 minutes   |   1500 × 220,000 = 330m qubits, 1 second   |   ···
$p = 10^{-3}$, d = 27:   310,000 qubits, 7 hours   |   1,000,000 qubits, 45 minutes   |   3000 × 1,500,000 ≈ 4.5b qubits, 1 second   |   ···

Figure 3: Overview of the content of this paper. To illustrate the space-time trade-offs discussed in this work, we show the number of physical qubits and the computational time required for a circuit of $10^8$ T gates distributed over $10^6$ T layers. We consider physical error rates of $p = 10^{-4}$ and $p = 10^{-3}$, for which we need code distances d = 13 and d = 27, respectively. We assume that each code cycle takes 1 µs.
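As a rough cross-check of some of the numbers in Fig. 3, the following Python sketch multiplies out the quantities stated in the text: 1 time step corresponds to d code cycles, each code cycle takes 1 µs, and in the T-count-limited regime one T gate is executed per time step. The function name `runtime_seconds` and its parameters are illustrative choices, not notation from the paper, and the sketch only covers the columns whose assumptions are spelled out explicitly.

```python
def runtime_seconds(t_count, steps_per_t_gate, code_distance, cycle_time_s=1e-6):
    """Estimated runtime when T gates are executed one after the other.

    runtime = (number of T gates) x (time steps per T gate)
              x (d code cycles per time step) x (duration of one code cycle)
    """
    return t_count * steps_per_t_gate * code_distance * cycle_time_s


T_COUNT = 10**8   # T gates in the example circuit
T_LAYERS = 10**6  # T layers in the example circuit

# One time step per T gate (several distillation blocks feeding one data block).
minutes_d13 = runtime_seconds(T_COUNT, 1, 13) / 60
minutes_d27 = runtime_seconds(T_COUNT, 1, 27) / 60

# Time-optimal regime: one T layer per 1 µs measurement time.
time_optimal_seconds = T_LAYERS * 1e-6

# Qubit count when 1500 units of 220,000 qubits run in parallel.
parallel_qubits = 1500 * 220_000

print(f"d = 13: about {minutes_d13:.0f} minutes")
print(f"d = 27: about {minutes_d27:.0f} minutes")
print(f"time-optimal: about {time_optimal_seconds:.0f} second(s), {parallel_qubits:,} qubits")
```

Under these assumptions the estimates reproduce the 22-minute and 45-minute entries and the 330 million qubits quoted for the time-optimal case.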

it takes to distill a single magic state, as we discuss in Sec. 4. In order to illustrate the resulting space-time trade-off, we consider the example of a 100-qubit computation with $10^8$ T gates, which can already be used for classically intractable computations [2]. Assuming an error rate of $p = 10^{-4}$ and a code cycle time of 1 µs, a compact data block together with a distillation block can finish the computation in 4 hours using 55,000 physical qubits.¹ Adding 10 more distillation blocks increases the qubit number to 120,000 and decreases the computational time to 22 minutes, using 1 time step per T gate.

¹ We will assume that the total number of physical qubits is twice the number of physical data qubits. This is consistent with superconducting qubit platforms, where the use of measurement ancillas doubles the qubit count. If a platform does not require the use of ancilla qubits, the total qubit count is reduced by 50% compared to the numbers reported in this paper.

For further space-time trade-offs in Sec. 5, we exploit that the T gates of a circuit are arranged in layers of gates that can be executed simultaneously. This enables linear space-time trade-offs down to the execution of one T layer per qubit measurement time, effectively implementing Fowler's time-optimal scheme [20]. If the $10^8$ T gates are distributed over $10^6$ layers, and measurements (and classical processing) can be performed in 1 µs, up to 1500 units of 220,000 qubits can be run in parallel. This way, the computational time can be brought down to 1 second using 330 million qubits. While this is a large number, the units do not necessarily need to be part of the same quantum computer, but can be distributed over up to 1500 quantum computers with 220,000 qubits each, and with the ability to share Bell pairs between neighboring computers.

In Sec. 6, we discuss further space-time trade-offs that are beyond the parallelization of Clifford+T circuits. In particular, we discuss the use of Clifford+ϕ circuits, i.e., circuits containing arbitrary-angle rotations beyond T gates. These require the use of additional resources, but can speed up the computation. We also discuss the possibility of hardware-based trade-offs by using higher code distances, but in turn shorter measurements with a decreased measurement fidelity. Ultimately, the speed of a quantum computer is limited by classical processing, which can only be solved by faster classical computing.

Finally, we note that while the qubit numbers required for useful quantum computing are orders of magnitude above what is currently available, a proof-of-principle two-qubit device demonstrating all necessary operations using undistilled magic states can be built with 48 physical data qubits, see Appendix C.

1 Clifford+T quantum circuits

Our goal is to implement full quantum algorithms with surface codes. The input to our problem is the algorithm's quantum circuit. The universal gate set Clifford+T is well-suited for surface codes, since it separates easy operations from difficult ones. Often, this set is generated using the Hadamard gate H, phase gate S, controlled-NOT (CNOT) gate, and the T gate. Instead,
[Figure 4 panels: (a) and (c) each distinguish the cases $PP' = P'P$ and $PP' = -P'P$; (b) distinguishes the cases $P_1 P' = -P' P_1$ and $P_2 P' = -P' P_2$.]

Figure 4: A generic circuit consists of π/4 rotations (orange), π/8 rotations (green) and measurements (blue). The Pauli product in each box specifies the axis of rotation or the basis of measurement. If the Pauli operator is −P instead of P, a minus sign is found in the corner of the box, such that, e.g., $Z_{-\pi/4}$ corresponds to an $S^\dagger$ gate. Using the commutation rules in (a/b), all Clifford gates can be moved to the end of the circuit. Using (c), the Clifford gates can be absorbed by the final measurements.

we choose to write our circuits using Pauli product rotations $P_\varphi$ (see Fig. 5), because it simplifies circuit manipulations. Here, $P_\varphi = \exp(-iP\varphi)$, where P is a Pauli product operator (such as Z, Y ⊗ X, or X ⊗ 1 ⊗ X) and φ is an angle. In this sense, $S = Z_{\pi/4}$, $T = Z_{\pi/8}$, and $H = Z_{\pi/4} \cdot X_{\pi/4} \cdot Z_{\pi/4}$. The CNOT gate can also be written in terms of Pauli product rotations as $\mathrm{CNOT} = (Z \otimes X)_{\pi/4} \cdot (1 \otimes X)_{-\pi/4} \cdot (Z \otimes 1)_{-\pi/4}$. In fact, we can more generally define $P_1$-controlled-$P_2$ gates as $C(P_1, P_2) = (P_1 \otimes P_2)_{\pi/4} \cdot (1 \otimes P_2)_{-\pi/4} \cdot (P_1 \otimes 1)_{-\pi/4}$. The CNOT gate is the specific case of C(Z, X).

Figure 5: Clifford+T gates in terms of Pauli rotations. (a) Single-qubit Clifford gates are π/4 rotations, and the T gate is a π/8 rotation. (b/c) $P_1$-controlled-$P_2$ gates are Clifford gates, where C(Z, X) is the CNOT gate.

Getting rid of Clifford gates. Clifford gates are considered to be easy, because, by definition, they map Pauli operators onto other Pauli operators [21]. This can be used to simplify the input circuit. A generic circuit is shown in Fig. 4, consisting of Clifford gates, $Z_{\pi/8}$ rotations and Z measurements. If all Clifford gates are commuted to the end of the circuit, the $Z_{\pi/8}$ rotations become Pauli product rotations. The rules for moving $P_{\pi/4}$ rotations past $P'_\varphi$ gates are shown in Fig. 4a: If P and P′ commute, $P_{\pi/4}$ can simply be moved past $P'_\varphi$. If they anticommute, $P'_\varphi$ turns into $(iPP')_\varphi$ when $P_{\pi/4}$ is moved to the right. Since $C(P_1, P_2)$ gates consist of π/4 rotations, similar rules can be derived in Fig. 4b: If P′ anticommutes with $P_1$, $P'_\varphi$ turns into $(P'P_2)_\varphi$ after commutation. If P′ anticommutes with $P_2$, $P'_\varphi$ turns into $(P'P_1)_\varphi$. If P′ anticommutes with both $P_1$ and $P_2$, $P'_\varphi$ turns into $(P'P_1P_2)_\varphi$.

After moving the Clifford gates to the right, the resulting circuit consists of three parts: a set of π/8 rotations, a set of π/4 rotations, and Z measurements. Because Clifford gates map Pauli operators onto other Pauli operators, the Clifford gates can be absorbed by
Figure 6: Clifford+T circuits can be written as a number of consecutive π/8 rotations. These gates are grouped into layers of
mutually commuting rotations. A simple greedy algorithm can be used to reduce the number of layers, i.e., the T depth.
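The greedy layer reduction mentioned in the caption of Fig. 6 can be sketched in Python as follows. Rotations are represented only by their Pauli strings (for example "ZIX"), since signs do not affect commutation. The helper `naive_layers` is one assumed reading of a "naive partitioning", in which a new layer is opened whenever the next rotation fails to commute with the current layer; the paper does not spell that step out, and all function names here are illustrative.

```python
def commutes(p, q):
    """Pauli strings commute iff they differ on an even number of positions
    where both are non-identity."""
    anti = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return anti % 2 == 0


def naive_layers(rotations):
    """Assumed naive partitioning: append to the current layer while the
    rotation commutes with everything in it, otherwise start a new layer."""
    layers = []
    for rot in rotations:
        if layers and all(commutes(rot, other) for other in layers[-1]):
            layers[-1].append(rot)
        else:
            layers.append([rot])
    return layers


def reduce_t_depth(layers):
    """Greedy reduction: repeatedly pull a rotation from layer i + 1 into
    layer i if it commutes with every rotation already in layer i."""
    layers = [list(layer) for layer in layers if layer]
    changed = True
    while changed:
        changed = False
        for i in range(len(layers) - 1):
            for rot in list(layers[i + 1]):
                if all(commutes(rot, other) for other in layers[i]):
                    layers[i + 1].remove(rot)
                    layers[i].append(rot)
                    changed = True
        layers = [layer for layer in layers if layer]  # drop emptied layers
    return layers


rotations = ["ZI", "XI", "IZ", "IX"]
before = naive_layers(rotations)
after = reduce_t_depth(before)
print(len(before), "layers before,", len(after), "layers after")
```

A rotation may be moved out of its layer freely because all rotations within one layer commute with each other, so the only condition to check is commutation with the target layer.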

the final measurements, turning Z measurements into Pauli product measurements. The commutation rules of this final step are shown in Fig. 4c and are similar to the commutation of Clifford gates past rotations.

T count and T depth. Thus, every n-qubit circuit can be written as a number of consecutive π/8 rotations and n final Pauli product measurements, as shown in Fig. 6. We refer to the number of π/8 rotations as the T count. An important part of circuit optimization is the minimization of the T count, for which there exist various approaches [22–25]. The π/8 rotations of a circuit can be grouped into layers. All π/8 rotations that are part of a layer need to mutually commute. The number of π/8 layers of a circuit is strictly speaking not the same quantity as the T depth, but we will still refer to it as the T depth and to π/8 layers as T layers.

When partitioning π/8 rotations into layers, the naive approach often yields more layers than are necessary. For instance, a naive partitioning of the first 6 T gates of Fig. 6 yields 4 layers. A few commutations can bring the number down to 2 layers. There are a number of algorithms for the optimization of the T depth [26–28]. Here, we use a simple greedy algorithm to reduce the number of layers:

    repeat
        for each layer i do
            for each rotation j in layer i + 1 do
                if (rotation j commutes with all rotations in layer i) then
                    Move rotation j from layer i + 1 to layer i;
                end
            end
        end
    until the partitioning no longer changes;

Note that when a reordering puts two equal π/8 rotations into the same layer, they can be combined into a π/4 rotation that is commuted to the end of the circuit, thereby decreasing the T count. As we discuss in Sec. 6, this kind of algorithm can not only be used with π/8 rotations, but, in principle, with arbitrary Pauli product rotations. The reduction of the circuit depth in terms of non-π/8 rotations can be useful when going beyond Clifford+T circuits.

1.1 Pauli product measurements

Figure 7: Circuit to perform a π/8 rotation by consuming a magic state.

When implementing circuits like Fig. 6 with surface codes, one obstacle is that π/8 rotations are not directly part of the set of available operations. Instead, one uses magic states [15] as a resource. These states are π/8-rotated Pauli eigenstates $|m\rangle = |0\rangle + e^{i\pi/4}|1\rangle$. They can be consumed in order to perform $P_{\pi/8}$ rotations. The corresponding circuit [29] is shown in Fig. 7. A $P_{\pi/8}$ rotation corresponds to a P ⊗ Z measurement involving the magic state. If the measurement outcome is P ⊗ Z = −1, then a corrective $P_{\pi/4}$ operation is necessary. Since this is a Clifford gate, it can be simply commuted to the end of the circuit, changing the axes of the following π/8 rotations. Finally, in order to discard the magic state, it is disentangled from the rest of the system by an X measurement. Here, an outcome X = −1 prompts a $P_{\pi/2}$ correction. π/2 rotations correspond to Pauli operators, i.e., $P_{\pi/2} = P$. The Pauli correction can also be commuted to the end of the circuit. When
$P_{\pi/2}$ is moved past a P′ rotation or measurement, it changes the axis of rotation or measurement basis to −P′ if P and P′ anticommute.

Pauli product measurements in 1 time step. In essence, if magic states are available, the only operations required for universal quantum computing are Pauli product measurements. Using a (2n)-corner patch as an ancilla, an n-qubit Pauli product can be measured in 1 time step [30]. An example is shown in Fig. 8. Suppose we have four qubits $|q_1\rangle$ - $|q_4\rangle$ in four two-tile four-corner patches, and we need to perform a $(Z \otimes Y \otimes 1 \otimes X)_{\pi/8}$ rotation. According to the circuit in Fig. 7, this is done by measuring $Z_{|q_1\rangle} \otimes Y_{|q_2\rangle} \otimes X_{|q_4\rangle} \otimes Z_{|m\rangle}$ between the four qubits and a magic state. Note that we only want to measure the Pauli product without learning anything about the individual Pauli operators $Z_{|q_1\rangle}$, $Y_{|q_2\rangle}$, $X_{|q_4\rangle}$ and $Z_{|m\rangle}$.

To this end, an 8-corner ancilla patch is initialized in the $|+\rangle^{\otimes 3}$ state. The shape of this patch is chosen, such that each of the four Z edges is adjacent to one of the four operators that are part of the measurement. Note that this means that some of the X edges are shortened, such that the qubits are susceptible to X errors. In this case, this is not a problem, since the qubits are initialized in X eigenstates and random X errors will cause no change to the states. Next, in step 3, we measure the four Pauli products $Z_{|q_1\rangle} \otimes Z_1$, $Y_{|q_2\rangle} \otimes Z_2$, $Z_{|m\rangle} \otimes Z_3$ and $X_{|q_4\rangle} \otimes (Z_1 \cdot Z_2 \cdot Z_3)$. Because the ancilla is initialized in an X eigenstate, the operators $Z_1$, $Z_2$ and $Z_3$ are unknown, and the outcome of each of the four aforementioned measurements is entirely random. However, multiplying the four measurement outcomes yields $Z_{|q_1\rangle} \otimes Y_{|q_2\rangle} \otimes X_{|q_4\rangle} \otimes Z_{|m\rangle} \otimes (Z_1 \cdot Z_2 \cdot Z_3 \cdot Z_1 \cdot Z_2 \cdot Z_3)$, which is precisely the operator $Z_{|q_1\rangle} \otimes Y_{|q_2\rangle} \otimes X_{|q_4\rangle} \otimes Z_{|m\rangle}$ that we wanted to measure. Finally, to discard the ancilla patch we measure its three qubits in the X basis. Again, X errors will have no effect, as they commute with the measurement basis. Measurement outcomes of $X_i = -1$ prompt a Pauli correction. If in the previous step, the $Z_i$ edge was measured together with a Pauli operator P, the correction is a $P_{\pi/2}$ gate. For instance, if in Fig. 8 the final measurements yield $X_2 = -1$ and $X_3 = -1$, the corrections are a $Y_{\pi/2}$ rotation on $|q_2\rangle$ and a $Z_{\pi/2}$ rotation on $|m\rangle$.

Figure 8: Pauli product measurement protocol. (a) Example of a measurement of the operator Z ⊗ Y ⊗ 1 ⊗ X ⊗ Z of the qubits $|q_1\rangle$, $|q_2\rangle$, $|q_3\rangle$, $|q_4\rangle$ and $|m\rangle$. (b) Ancilla patch used during the measurement.

This type of protocol can be used to measure any product of n Pauli operators. An ancilla patch needs to be initialized in the $|+\rangle^{\otimes n}$ state with Z edges adjacent to the n operators part of the measurement. We show the concrete surface-code implementation of the example of Fig. 8 in Appendix B.

Summary. Clifford+T circuits can be written in terms of π/8 rotations, π/4 rotations and measurements. To convert input circuits into a standard form, π/4 rotations can be commuted to the end of the circuit and absorbed by the final measurements. Thus, any quantum computation can be written as a sequence of π/8 rotations grouped into layers of mutually commuting rotations. The number of rotations is the T count and the number of layers is the T depth. Each rotation can be performed by consuming a magic state via a Pauli product measurement. These measurements can be implemented in our framework in 1 time step.

2 Data blocks

Since Clifford+T circuits are a sequence of π/8 rotations, each requiring the consumption of a magic state, it is natural to partition a quantum computer into a set of tiles that are used for magic state distillation (distillation blocks) and a set of tiles that hosts data qubits and consumes magic states via Pauli product measurements (data blocks). In this section, we discuss designs for the latter. In principle, the structure shown in Fig. 8 is a data block, where each qubit is stored in a two-tile patch and magic states can be consumed every time step. However, this sort of design uses 3n tiles to host n data qubits, which is a relatively large space overhead.

2.1 Compact block

The first design that we discuss uses only 1.5n + 3 tiles. This compact block is shown in Fig. 9a, where each data
[Figure 9 panels: (a) Compact block. (b) Patch rotation. (c) π/4 rotations. (d) Y ⊗ 1 ⊗ Y ⊗ Z ⊗ Y ⊗ Y rotation in 9 time steps, shown as steps 1-8.]

Figure 9: (a) Compact blocks store n data qubits in 1.5n + 3 tiles. The consumption of a magic state can take up to 9 time steps. (c) The worst-case scenario are Pauli products involving an even number of Y operators, whose treatment requires explicit π/4 rotations. The example in (d) shows the 8 steps necessary to consume the magic state, which involves π/4 rotations and patch rotations (b).
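The worst-case count of 9 time steps for the compact block can be tallied from the figures given in this section: up to two π/4 rotations in 2 time steps, 3 time steps of patch rotations per row, and 1 time step for the final Pauli product measurement. The Python sketch below is a pessimistic budget rather than a scheduler. It assumes that each π/4 rotation costs 1 time step, that both rows always need a patch rotation, and that a product without any Y operator needs no π/4 rotation; none of these assumptions is stated in exactly this form in the paper, and the function names are illustrative.

```python
def compact_block_tiles(n):
    """Tiles used by a compact block storing n data qubits (1.5n + 3)."""
    return 1.5 * n + 3


def compact_block_step_budget(pauli_product):
    """Pessimistic number of time steps to consume one magic state.

    pauli_product is a string over I, X, Y, Z, e.g. "YIYZYY".
    """
    y_count = pauli_product.count("Y")
    if y_count == 0:
        pi4_rotations = 0          # assumption: nothing to replace
    elif y_count % 2 == 0:
        pi4_rotations = 2          # even number of Y operators
    else:
        pi4_rotations = 1          # odd number of Y operators
    patch_rotation_steps = 2 * 3   # assumption: both rows rotated, 3 steps each
    measurement_steps = 1          # final Pauli product measurement
    return pi4_rotations + patch_rotation_steps + measurement_steps


print(compact_block_tiles(6), compact_block_step_budget("YIYZYY"))
```

For the six-qubit example of Fig. 9d this gives the 12-tile block and the 9 time steps quoted in the caption.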

qubit is stored in a four-corner square patch. This lowers the space cost, but restricts the operators that are accessible by Pauli product measurements, as only the Z operator is free to be measured. Using 3 time steps, patches may also be rotated (see Fig. 9b), such that the X operator becomes accessible instead of the Z operator. The problematic operators are Y operators, which are the reason why the consumption of a magic state can take up to 9 time steps.

The worst-case scenario is a π/8 rotation involving an even number of Y operators, such as the one shown in Fig. 9c. One possibility to replace Y operators by X or Z operators is via π/4 rotations, since $Y_{\pi/4} = Z_{\pi/4} X_{\pi/4} Z_{-\pi/4}$. Rotations with an even number of Y's require two π/4 rotations, while an odd number of Y's can be handled by one rotation. Only the left two π/4 rotations in Fig. 9c need to be performed explicitly. The right two rotations can be commuted to the end of the circuit, changing the later π/8 rotations. Similar to a π/8 rotation, a $P_{\pi/4}$ rotation can be executed using a resource state $|Y\rangle = |0\rangle + i|1\rangle$. However, even though this state is a Pauli eigenstate, it cannot be prepared immediately in our framework. Instead, we use a $|0\rangle$ state and Y measurements, such that a $P_{\pi/4}$ rotation is performed by a P ⊗ Y measurement between the qubits and the $|0\rangle$ state. Afterwards, the $|0\rangle$ state is measured in X. If the P ⊗ Y and the X measurement yield different outcomes, a Pauli correction is necessary.

In Fig. 9d, we go through the steps necessary to perform a $(Y \otimes 1 \otimes Y \otimes Z \otimes Y \otimes Y)_{\pi/8}$ rotation. In step 1, we start with a 12-tile data block storing 6 qubits in the blue region. The orange region is not part of the data block, but is part of the adjacent distillation block, i.e., it is the source of the magic states. In steps 2-5, we perform the two π/4 rotations that are necessary to replace the Y operators with X's. In step 6, we first rotate patches in the upper row, and then in step 7 in the lower row. Finally, in step 8, we measure the Pauli

Figure 10: Patch rotations in preparation of a Z ⊗ X ⊗ Z ⊗ Z ⊗ X measurement with an intermediate block.
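To compare the three data block designs side by side, the following Python sketch collects the tile counts and worst-case time steps per magic state quoted in this paper (1.5n + 3 tiles and 9 time steps for the compact block, 2n + 4 tiles and 5 time steps for the intermediate block, $2n + \sqrt{8n} + 1$ tiles and 1 time step for the fast block). It also reproduces the preparation time of the Fig. 10 example, where patches measured in Z are moved to the other side in 1 time step while patches measured in X are rotated in 2 time steps. The function names, the dictionary layout and the treatment of identity factors as free are assumptions for illustration.

```python
import math


def data_block_costs(n):
    """Tile count and worst-case time steps per magic state for n data qubits."""
    return {
        "compact": {"tiles": 1.5 * n + 3, "max_steps": 9},
        "intermediate": {"tiles": 2 * n + 4, "max_steps": 5},
        "fast": {"tiles": 2 * n + math.sqrt(8 * n) + 1, "max_steps": 1},
    }


def intermediate_prep_steps(pauli_product):
    """Time steps to prepare an intermediate block for a measurement.

    Z: patch is moved to the other side (1 time step).
    X: patch is rotated (2 time steps).
    I: qubit is not involved (assumed to cost nothing).
    All patches are handled simultaneously, so the slowest one dominates.
    """
    cost = {"I": 0, "Z": 1, "X": 2}
    if "Y" in pauli_product:
        raise ValueError("replace Y operators via pi/4 rotations first")
    return max((cost[p] for p in pauli_product), default=0)


costs = data_block_costs(18)
for name, c in costs.items():
    print(f"{name}: {c['tiles']:.0f} tiles, up to {c['max_steps']} time step(s)")
print("Fig. 10 preparation:", intermediate_prep_steps("ZXZZX"), "time steps")
```

Adding the 2 time steps for up to two π/4 rotations and the 1 time step for the final measurement to the 2 preparation steps gives the bound of 5 time steps for the intermediate block.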

product involving the magic state.

This general procedure can be used for any π/8 rotation. First, up to two π/4 rotations are performed in 2 time steps. Next, patches in the upper and lower row are rotated, which takes 3 time steps per row. Finally, the Pauli product is measured in 1 time step, requiring a total of 9 time steps. While this is very slow compared to Fig. 8, this is a valid choice for small quantum computers where the distillation of a magic state takes longer than 9 time steps.

2.2 Intermediate block

One possibility to speed up compact blocks is to store all qubits in one row instead of two. This is the intermediate block shown in Fig. 11a, which uses $2n + 4$ tiles to store n qubits. By eliminating one row, all patch rotations can be done simultaneously. In addition, one can save 1 time step by moving all patches to the other side, thereby eliminating the need to move patches back to their row after the rotation. An example is shown in Fig. 10. Suppose we have 5 qubits and need to prepare them for a Z ⊗ X ⊗ Z ⊗ Z ⊗ X measurement. The first, third and fourth qubit are moved to the other side, which takes 1 time step. Simultaneously, the second and fifth qubit are rotated, which takes 2 time steps. Therefore, the total number of time steps to consume a magic state is at most 5: 2 for up to two π/4 rotations, 2 for the patch rotations, and 1 for the Pauli product measurement.

2.3 Fast block

The disadvantage of square patches is that only one Pauli operator is available for Pauli product measurements at any given time. Two-tile four-corner patches as in Fig. 8, on the other hand, allow for the measurement of any Pauli operator, but use two tiles for each qubit. In order to have both compact storage and access to all Pauli operators, we use six-corner patches for our fast blocks in Fig. 11b. Six-corner patches use two tiles to represent two qubits (see Fig. 1), where the first qubit's Pauli operators are in the left two edges, and the second qubit's operators are in the right two edges. Therefore, the example in Fig. 11b is a fast block that stores 18 qubits.

Since all Pauli operators are accessible, the Pauli product measurement protocol of Fig. 8 can be used to consume a magic state every 1 time step. n qubits occupy a square arrangement of tiles with a side length of $\sqrt{2n} + 1$, i.e., a total of $2n + \sqrt{8n} + 1$ tiles. Even if $\sqrt{2n}$ is not an integer, one should keep the block as square-shaped as possible by picking the closest integer as a side length and shortening the last column. While the fast block uses more tiles compared to the compact and intermediate blocks, it has a lower space-time cost, making it more favorable for large quantum computers for which the distillation of a magic state takes less than 5 time steps.

Figure 11: (a) Intermediate blocks store n data qubits in $2n + 4$ tiles and require up to 5 time steps per magic state. (b) Fast blocks use $2n + \sqrt{8n} + 1$ tiles and require 1 time step per magic state. In both panels, the remaining tiles form the ancilla region.

Note that if undistilled magic states are sufficient, then any data block can already be used as a full quantum computer. A proof-of-principle two-qubit device in the spirit of Ref. [31] that constitutes a universal two-qubit quantum computer with undistilled magic states and can demonstrate all the operations that are used in our framework can be realized with six tiles,

Figure 12: Encode-T -decode circuit of the 15-to-1 distillation protocol. The multi-target CNOTs (orange) can be commuted past
the T gates, such that they cancel and leave 15 Z-type Pauli product rotations.
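
The tile counts and worst-case time steps per magic state of the three data-block types from Section 2 can be tabulated with a short script. This is only an illustrative sketch: the function name `data_block` and the string labels are assumptions made here, while the formulas 1.5n + 3, 2n + 4 and 2n + √(8n) + 1 and the time steps 9, 5 and 1 are the values quoted in the text.

```python
import math

def data_block(kind, n):
    """Return (tiles, worst-case time steps per magic state) for n data qubits.

    `kind` is one of "compact", "intermediate" or "fast" (hypothetical labels
    chosen for this sketch); the formulas are the ones quoted in the text.
    """
    if kind == "compact":
        return 1.5 * n + 3, 9
    if kind == "intermediate":
        return 2 * n + 4, 5
    if kind == "fast":
        return 2 * n + math.sqrt(8 * n) + 1, 1
    raise ValueError(f"unknown data block type: {kind}")

for kind in ("compact", "intermediate", "fast"):
    tiles, steps = data_block(kind, 100)
    print(f"{kind:>12}: {tiles:6.1f} tiles, up to {steps} time step(s) per magic state")
```

For 100 qubits, the compact and intermediate formulas give 153 and 204 tiles. The fast-block value is generally not an integer, in which case the text recommends picking the closest integer side length and shortening the last column.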

as shown in Appendix C. This proof-of-principle device uses $(3d - 1) \cdot 2d$ physical data qubits, i.e., 48, 140, or 280 data qubits for distances d = 3, 5 or 7. If ancilla qubits are used for stabilizer measurements, the number of physical qubits roughly doubles, but it is still within reach of near-term devices.

Summary. Data blocks store the data qubits of the computation and consume magic states. Compact blocks use $1.5n + 3$ tiles for n qubits and require up to 9 time steps to consume a magic state. Intermediate blocks use $2n + 4$ tiles and take up to 5 time steps per magic state. Fast blocks use $2n + \sqrt{8n} + 1$ tiles and take 1 time step per magic state. Data blocks need to be combined with distillation blocks for large-scale quantum computation.

3 Distillation blocks

In this section, we discuss designs of tile blocks that are used for magic state distillation. This is necessary, because with surface codes, the initialization of non-Pauli eigenstates is prone to errors, which means that π/8 rotations performed using these states may lead to errors. In order to decrease the probability of such an error, magic state distillation [15] is used to convert many low-fidelity magic states into fewer higher-fidelity states. This requires only Clifford gates (i.e., Pauli product measurements), so, in principle, any of the data blocks discussed in the previous section can be used for this purpose. However, magic state distillation is repeated extremely often for large-scale quantum computation, so it is worth optimizing these protocols.

Here, we discuss a general procedure that can be applied to any distillation protocol based on an error-correcting code with transversal T gates, such as punctured Reed-Muller codes [15, 16] or block codes [17–19]. To show the general structure of such a protocol, we go through the example of 15-to-1 distillation [15], i.e., a protocol that uses 15 faulty magic states to distill a single higher-fidelity state.

3.1 15-to-1 distillation

The 15-to-1 protocol is based on a quantum error-correcting code that uses 15 qubits to encode a single logical qubit with code distance 3. The reason why this can be used for magic state distillation is that, for this code, a physical T gate on every physical qubit corresponds to a logical T gate (actually $T^\dagger$) on the encoded qubit, which is called a transversal T gate. The general structure of a distillation circuit based on a code with transversal T gates is shown in Fig. 12 for the example of 15-to-1. It consists of four parts: an encoding circuit, transversal T gates, decoding and measurement.

The circuit begins with 5 qubits initialized in the |+⟩ state and 10 qubits in the |0⟩ state. Qubits 1-4, 5 and 6-15 are associated with the four X stabilizers, the logical X operator, and the ten Z stabilizers of the code, respectively. The first five operations are multi-target CNOTs that correspond to the code's encoding circuit. They map the X Pauli operators of qubits 1-4 onto the code's X stabilizers, the X Pauli of qubit 5 onto the logical X operator and the Z operators of qubits 6-15 onto the code's Z stabilizers. Because we start out with +1-eigenstates of X and Z, this circuit prepares the simultaneous stabi-

Figure 13: 15-to-1 distillation circuit that uses 5 qubits and 11 π/8 rotations.
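
The five-qubit circuit of Fig. 13 can be read off column by column from the matrix of X stabilizers and logical X operators in Eq. (1). The following sketch illustrates that bookkeeping under stated assumptions: the helper name `rotation_axes` and the string encoding of Pauli products ("I" for the identity, "Z" for Pauli Z) are introduced here for illustration, and the matrix entries are transcribed from Eq. (1).

```python
# Rows 1-4: X stabilizers, row 5: logical X operator (transcribed from Eq. (1)).
M_15_TO_1 = [
    [0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1],
]

def rotation_axes(matrix):
    """One pi/8 rotation axis per column: 'Z' where the entry is 1, 'I' where it is 0."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [
        "".join("Z" if matrix[row][col] else "I" for row in range(n_rows))
        for col in range(n_cols)
    ]

axes = rotation_axes(M_15_TO_1)
# Single-qubit rotations can be absorbed into the initial states.
absorbed = [axis for axis in axes if axis.count("Z") == 1]
explicit = [axis for axis in axes if axis.count("Z") > 1]
print(len(absorbed), "single-qubit rotations absorbed into initial states")
print(len(explicit), "multi-qubit rotations remain:", explicit)
```

Each string lists the operators on qubits 1 to 5 in order, so "IIZZZ" denotes the rotation $(1 ⊗ 1 ⊗ Z ⊗ Z ⊗ Z)_{\pi/8}$. The order of the rotations in this list follows the matrix columns and need not match the rearranged order drawn in Fig. 13.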

lizer eigenstate corresponding to the logical $|+\rangle_L$ state. Next, a transversal T gate is applied, transforming the logical state to $T_L|+\rangle_L$ (actually to $T_L^\dagger|+\rangle_L$). Note that the 15 $Z_{\pi/8}$ rotations are potentially faulty. Finally, the encoding circuit is reverted, shifting the logical qubit information back into qubit 5, and the information about the X and Z stabilizers into qubits 1-4 and 6-15. If no errors occurred, qubit 5 is now a magic state T|+⟩ (actually $T^\dagger$|+⟩). In order to detect whether any of the 15 π/8 rotations were affected by an error, qubits 1-4 and 6-15 are measured in the X and Z basis, respectively, effectively measuring the stabilizers of the code. Since the code distance is 3, up to two errors can be detected, which will yield a -1 measurement outcome on some stabilizers. If any error is detected, all qubits are discarded and the distillation protocol is restarted. This way, if the error probability of each of the 15 T gates is p, the error probability of the output state is reduced to $35p^3$. In other words, this protocol takes 15 magic states with error probability p as an input, and outputs a single magic state with an error of $35p^3$.

Simplifying the circuit. Using the commutation rules of Fig. 4b, we can commute the first set of multi-target CNOTs to the right. This maps the $Z_{\pi/8}$ rotations onto Z-product π/8 rotations. Since controlled-Pauli gates satisfy $C(P_1, P_2) = C(P_1, P_2)^\dagger$, the multi-target CNOTs of the encoding circuit will cancel the multi-target CNOTs of the decoding circuit, leaving a circuit of 15 Z-type π/8 rotations in Fig. 12.

Note that qubits 6-15 in this circuit are entirely redundant. They are initialized in a Z eigenstate, are then part of a Z-type rotation, and are finally measured in the Z basis, trivially yielding the outcome +1. Since they serve no purpose, they can simply be removed to yield the five-qubit circuit in Fig. 13, where we have absorbed the single-qubit π/8 rotations into the initial |+⟩ states and rearranged the remaining 11 rotations. This kind of circuit simplification is equivalent to the space-time trade-offs mentioned in Ref. [16] and can be applied to any protocol that is based on a code with transversal T gates. In general, a code with $m_x$ X stabilizers that uses n qubits to encode k logical qubits yields a circuit of $n - m_x$ π/8 rotations on $m_x + k$ qubits. Each of the $m_x + k$ qubits is associated either with an X stabilizer or with one of the k logical qubits. For each of the n qubits of the code, the circuit contains one π/8 rotation with an axis that has a Z on each stabilizer or logical X operator that this qubit is part of. In order to more easily determine the $n - m_x$ rotations, it is useful to write down an $n \times (m_x + k)$ matrix that shows the X stabilizers and logical X operators of the code. For 15-to-1, such a matrix could look like this:

\[
M_{\text{15-to-1}} = \left(\begin{array}{*{15}{c}}
0&0&0&1&0&0&0&0&1&1&1&1&1&1&1\\
0&0&1&0&0&1&1&1&0&0&0&1&1&1&1\\
0&1&0&0&1&0&1&1&0&1&1&0&0&1&1\\
1&0&0&0&1&1&0&1&1&0&1&0&1&0&1\\
\hline
0&0&0&0&1&1&1&0&1&1&0&1&0&0&1
\end{array}\right) \tag{1}
\]

Each of the first four rows describes one of the four X stabilizers of the code, where 0 stands for 1 and 1 stands for X. For instance, the first row indicates that the first X stabilizer of this 15-qubit code is 1 ⊗ 1 ⊗ 1 ⊗ X ⊗ 1 ⊗ 1 ⊗ 1 ⊗ 1 ⊗ X ⊗ X ⊗ X ⊗ X ⊗ X ⊗ X ⊗ X. The rows below the horizontal bar (in this case the last row) show the logical X operators of the code. The circuit in Fig. 13 is then obtained by placing a |+⟩ state for each row and a π/8 rotation for each column, with the axis of rotation determined by the entries in the column: a 1 for each 0 and a Z for each 1.

3.2 Triorthogonal codes

The aforementioned circuit translation can be applied to any code with transversal T gates. One particularly versatile and simple scheme to generate such codes is based on triorthogonal matrices [16, 17], which we briefly review in this section. The first step is to write

Figure 14: 20-to-4 distillation circuit that uses 7 qubits and 17 π/8 rotations.
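
The three triorthogonality conditions of Eq. (3) and the effect of puncturing can be checked numerically. The sketch below does this for the matrices of Eqs. (2) and (4); the helper names `overlap`, `is_triorthogonal` and `puncture` are assumptions of this example, and puncturing is implemented, as in the text, by dropping the leading columns.

```python
from itertools import combinations

# Triorthogonal matrix G of Eq. (2).
G = [
    [1] * 16,
    [0] * 8 + [1] * 8,
    [0, 0, 0, 0, 1, 1, 1, 1] * 2,
    [0, 0, 1, 1] * 4,
    [0, 1] * 8,
]

# Row echelon form of Eq. (4), transcribed row by row.
G_TILDE = [
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1],
]

def overlap(*rows):
    """Number of columns in which all given rows have a 1."""
    return sum(all(bits) for bits in zip(*rows))

def is_triorthogonal(matrix):
    """Check the three conditions of Eq. (3): weights mod 8, pairs mod 4, triples mod 2."""
    return (
        all(overlap(a) % 8 == 0 for a in matrix)
        and all(overlap(a, b) % 4 == 0 for a, b in combinations(matrix, 2))
        and all(overlap(a, b, c) % 2 == 0 for a, b, c in combinations(matrix, 3))
    )

def puncture(matrix, k):
    """Drop the first k columns and return (n, m_x, number of logical qubits)."""
    punctured = [row[k:] for row in matrix]
    m_x = sum(1 for row in punctured if sum(row) % 2 == 0)  # even weight: X stabilizer
    logical = len(punctured) - m_x                          # odd weight: logical X
    return len(punctured[0]), m_x, logical

print(is_triorthogonal(G))
print(puncture(G_TILDE, 1))  # parameters of the 15-to-1 code
print(puncture(G_TILDE, 2))  # parameters of the 14-to-2 code
```

Classifying rows by the parity of their weight follows the rule stated in the text: even-weight rows describe X stabilizers and odd-weight rows describe logical X operators.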

down a triorthogonal matrix G, such as

\[
G = \left(\begin{array}{*{16}{c}}
1&1&1&1&1&1&1&1&1&1&1&1&1&1&1&1\\
0&0&0&0&0&0&0&0&1&1&1&1&1&1&1&1\\
0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&1\\
0&0&1&1&0&0&1&1&0&0&1&1&0&0&1&1\\
0&1&0&1&0&1&0&1&0&1&0&1&0&1&0&1
\end{array}\right). \tag{2}
\]

Triorthogonality refers to three criteria: i) The number of 1s in each row is a multiple of 8. ii) For each pair of rows, the number of entries where both rows have a 1 is a multiple of 4. iii) For each set of three rows, the number of entries where all three rows have a 1 is a multiple of 2. In other words,

\[
\begin{aligned}
\forall a &: \sum_i G_{a,i} = 0 \pmod 8 \\
\forall a, b &: \sum_i G_{a,i} G_{b,i} = 0 \pmod 4 \\
\forall a, b, c &: \sum_i G_{a,i} G_{b,i} G_{c,i} = 0 \pmod 2
\end{aligned} \tag{3}
\]

A general procedure based on classical Reed-Muller codes to obtain such matrices is described in Ref. [16].

After obtaining a triorthogonal matrix, such as the one in Eq. (2), the second step is to put it in a row echelon form by Gaussian elimination:

\[
\tilde{G} = \left(\begin{array}{*{16}{c}}
0&0&0&0&1&0&0&0&0&1&1&1&1&1&1&1\\
0&0&0&1&0&0&1&1&1&0&0&0&1&1&1&1\\
0&0&1&0&0&1&0&1&1&0&1&1&0&0&1&1\\
0&1&0&0&0&1&1&0&1&1&0&1&0&1&0&1\\
1&0&0&0&0&1&1&1&0&1&1&0&1&0&0&1
\end{array}\right). \tag{4}
\]

The last step is to remove one of the columns that contains a single 1, i.e., one of the first five columns, which is also called puncturing. Puncturing an $a \times b$ triorthogonal matrix k times yields a code with $m_x = b - k$, $n = a - k$ and k logical qubits. The rows of the matrix after puncturing that contain an even number of 1s describe X stabilizers, whereas the rows with an odd number of 1s describe X logical operators. In terms of distillation protocols, a code described by such a matrix can be used for n-to-k distillation. Indeed, if we puncture the matrix in Eq. (4) once by removing the first column, we retrieve the 15-to-1 protocol of Eq. (1). We can also puncture it twice by removing the first two columns. This yields the matrix

\[
M_{\text{14-to-2}} = \left(\begin{array}{*{14}{c}}
0&0&1&0&0&0&0&1&1&1&1&1&1&1\\
0&1&0&0&1&1&1&0&0&0&1&1&1&1\\
1&0&0&1&0&1&1&0&1&1&0&0&1&1\\
\hline
0&0&0&1&1&0&1&1&0&1&0&1&0&1\\
0&0&0&1&1&1&0&1&1&0&1&0&0&1
\end{array}\right), \tag{5}
\]

which describes a 14-to-2 protocol. The corresponding circuit can be simply read off from this matrix. It is almost identical to the 15-to-1 protocol of Fig. 13, except that the fourth qubit is initialized in the |+⟩ state and is not measured at the end of the circuit, but instead outputs a second magic state. However, because the code of 14-to-2 has a code distance of 2, the output error probability is higher, namely $7p^2$ [17]. Puncturing the matrix $\tilde{G}$ any further would yield codes with a distance lower than 2, precluding them from detecting errors and improving the quality of magic states. In fact, the minimum number of qubits in triorthogonal codes was shown to be 14 [32].

Semi-triorthogonal codes. There are also codes that are based on "semi-triorthogonal" matrices, where all three conditions of Eq. (3) are only satisfied modulo 2. One example is the matrix

\[
\left(\begin{array}{*{24}{c}}
0&0&0&0&0&0&1&1&0&0&1&1&0&1&1&0&1&1&0&1&1&0&1&1\\
0&0&0&0&0&1&0&1&0&1&0&1&1&0&1&1&0&1&1&0&1&1&0&1\\
0&0&0&0&1&0&0&1&1&0&0&1&1&1&0&1&1&0&1&1&0&1&1&0\\
0&0&0&1&0&0&0&0&1&1&1&1&0&0&0&0&0&0&0&0&0&1&1&1\\
0&0&1&0&0&0&0&0&1&1&1&1&0&0&0&0&0&0&1&1&1&0&0&0\\
0&1&0&0&0&0&0&0&1&1&1&1&0&0&0&1&1&1&0&0&0&0&0&0\\
1&0&0&0&0&0&0&0&1&1&1&1&1&1&1&0&0&0&0&0&0&0&0&0
\end{array}\right). \tag{6}
\]

When this matrix is punctured four times, it yields a code that can be used for a 20-to-4 protocol. A scheme to generate such matrices for 3k+8-to-k distillation is shown in Ref. [17]. While semi-triorthogonal codes can

Figure 15: Implementation of the 15-to-1 distillation protocol in our framework. (a) Selective π/4 rotation. (b) Auto-corrected π/8 rotation. (c) Implementation of the 15-to-1 circuit in Fig. 13. Each time step in (c) corresponds to an auto-corrected π/8 rotation (b), which in turn is based on selective π/4 rotations (a).
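
The classical control behind the auto-corrected π/8 rotation of Fig. 15b amounts to choosing the readout basis of the |0⟩ ancilla from the outcome of the P ⊗ Z measurement. A minimal sketch of that decision rule, together with the alternating use of two ancilla blocks discussed for Fig. 17, could look as follows. The function names and the "left"/"right" labels are assumptions introduced here for illustration and do not describe actual control software.

```python
def ancilla_readout_basis(pz_outcome):
    """Basis in which the |0> ancilla is read out after the P (x) Z measurement.

    +1: no Clifford correction is required, so the ancilla is read out in Z.
    -1: the ancilla is measured in X, which performs the required correction.
    """
    if pz_outcome not in (+1, -1):
        raise ValueError("measurement outcomes are +1 or -1")
    return "Z" if pz_outcome == +1 else "X"

def ancilla_block(rotation_index):
    """2x2 |0>-|m> block used for rotation 1, 2, 3, ... when two blocks alternate."""
    return "left" if rotation_index % 2 == 1 else "right"

# The 15-to-1 circuit of Fig. 13 contains 11 rotations.
schedule = [(index, ancilla_block(index)) for index in range(1, 12)]
print(schedule)
print(ancilla_readout_basis(+1), ancilla_readout_basis(-1))
```

While one block serves the current rotation, the other is free for state injection and for processing the previous measurement outcome, which is the motivation given in the text for adding a second block.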

be used the same way for distillation as properly triorthogonal codes, their caveat is that the basis of the final qubit measurements may be different from X. A procedure to determine this correction is outlined in Ref. [17]. For the case of the 20-to-4 protocol, the matrix that describes the code

\[
M_{\text{20-to-4}} = \left(\begin{array}{*{20}{c}}
0&0&1&1&0&0&1&1&0&1&1&0&1&1&0&1&1&0&1&1\\
0&1&0&1&0&1&0&1&1&0&1&1&0&1&1&0&1&1&0&1\\
1&0&0&1&1&0&0&1&1&1&0&1&1&0&1&1&0&1&1&0\\
\hline
0&0&0&0&1&1&1&1&0&0&0&0&0&0&0&0&0&1&1&1\\
0&0&0&0&1&1&1&1&0&0&0&0&0&0&1&1&1&0&0&0\\
0&0&0&0&1&1&1&1&0&0&0&1&1&1&0&0&0&0&0&0\\
0&0&0&0&1&1&1&1&1&1&1&0&0&0&0&0&0&0&0&0
\end{array}\right) \tag{7}
\]

can be straightforwardly translated into the circuit in Fig. 14. However, in this case, the three measurements at the end of the circuit are Z ⊗ Z ⊗ X, X ⊗ Z ⊗ Z and X ⊗ X ⊗ X. The output error rate for 20-to-4 is $13p^2$ on any of the four qubits. Note that 3k+8-to-k protocols can be modified to 3k+4-to-k [32–34].

3.3 Surface-code implementation

Having outlined the general structure of distillation protocols, we now discuss their implementation with surface codes. Distillation protocols are particularly simple quantum circuits, since they exclusively consist of Z-type π/8 rotations. Therefore, we can use a construction similar to the compact data block, and still only require 1 time step per rotation. We first discuss the example of 15-to-1 distillation.

Because the distillation circuit is relatively short, it is useful to avoid the Clifford corrections of Fig. 7 that may be required with 50% probability after a magic state is consumed. These corrections slow down the protocol, because they change the final X measurements to Pauli product measurements. Instead, we use a circuit which consumes a magic state and automatically performs the Clifford correction. It is based on the selective π/4 rotation circuit in Fig. 15a. To perform a $P_{\pi/4}$ rotation according to the circuit in Fig. 9c, a |0⟩ state is initialized and P ⊗ Y is measured, which takes 1 time step. However, the π/4 rotation is only performed if the |0⟩ qubit is measured in X afterwards. If, instead, it is measured in Z, the qubit is simply discarded without performing any operation. In other words, the choice of measurement basis determines whether a $P_{\pi/4}$ or a 1 operation is performed. This can be used to construct the circuit in Fig. 15b. Here, the first step to perform a $P_{\pi/8}$ gate is to measure P ⊗ Z between the qubits and a magic state |m⟩, and simultaneously measure Z ⊗ Y between |m⟩ and |0⟩. If the outcome of the first measurement is +1, no Clifford correction is required and |0⟩ is read out in Z. If the outcome is -1, |0⟩ is measured in X, yielding the required Clifford correction.

This can be used to implement the 15-to-1 protocol of Fig. 13 in 11 time steps using 11 tiles, as shown in Fig. 15c. Four qubits are initialized in |m⟩, and a fifth in |+⟩. A 2 × 2 block of tiles to the left is reserved for the |m⟩ and |0⟩ qubits of the auto-corrected π/8 rotations. Two additional tiles are used for the ancilla qubit of the Pauli product measurement protocol. In step 2, the first π/8 rotation $(1 ⊗ 1 ⊗ Z ⊗ Z ⊗ Z)_{\pi/8}$ is performed. Depending on the measurement outcome of step 2, the |0⟩ ancilla is read out in the X or Z basis. This is repeated 11 times, once for each of the 11 rotations in Fig. 13. Finally, in step 23, qubits 1-4 are measured in X. If all four outcomes are +1, the distillation protocol yields a distilled magic state in tile 5. Since 11 tiles are

Figure 16: Implementation of the 20-to-4 protocol in our framework. The final measurements correspond to the last three measurements of the circuit in Fig. 14.
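
The space-time cost per magic state of Eq. (8) is straightforward to evaluate numerically. The sketch below is an illustration under stated assumptions: the function name `distillation_cost` and the optional `tiles` override are introduced here and are not part of Eq. (8). The override exists because the text quotes whole-number tile counts for concrete layouts (11 tiles for 15-to-1 and 44 tiles for 116-to-12), which differ slightly from 1.5(m_x + k) + 4.

```python
def distillation_cost(n, m_x, k, p, tiles=None):
    """Space-time cost per magic state in units of d^3, following Eq. (8).

    `tiles` optionally replaces the tile count 1.5 * (m_x + k) + 4 by the
    count of a concrete layout (an assumption of this sketch).
    """
    if tiles is None:
        tiles = 1.5 * (m_x + k) + 4
    time_steps = n - m_x
    success_probability = (1 - p) ** n
    return tiles * time_steps / (k * success_probability)

print(distillation_cost(15, 4, 1, p=1e-4))               # Eq. (8) as written
print(distillation_cost(15, 4, 1, p=0.0, tiles=11))      # leading order with 11 tiles
print(distillation_cost(116, 17, 12, p=0.0, tiles=44))   # leading order with 44 tiles
```

With the quoted tile counts and p set to zero, the function returns the leading-order values 121 and 363 (in units of $d^3$) stated in the text. A nonzero p increases the cost through the success probability $(1 - p)^n$.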

used for 11 time steps, the space-time cost is $121d^3$ in terms of (physical data qubits)·(code cycles) to leading order.

Caveat. Even though our leading-order estimate of the time cost of 11d code cycles is correct, the full time cost also contains contributions that do not scale with d. The two processes that may require special care in the magic state distillation protocol are state injection and classical processing. Every time step requires the initialization of a magic state and a short classical computation to determine whether the |0⟩ state needs to be measured in X or Z. While neither of these processes scales with d, they can slow down the distillation protocol, depending on the injection scheme and the control hardware that is used. This slowdown can be avoided by using additional 2 × 2 blocks of |0⟩-|m⟩ pairs, as shown in Fig. 17 for one additional block. Here, the left and right block can be used in an alternating fashion, i.e., the left block for rotations 1, 3, 5, . . . and the right block for rotations 2, 4, 6, . . . While one block is being used for a rotation, the other one can be used to prepare a new magic state and to process the measurement outcomes of the previous rotation.

Figure 17: Two 2 × 2 ancilla blocks can be used to prevent state injection and classical processing from slowing down the 15-to-1 protocol.

General space-time cost. The scheme of Fig. 15 can be used to implement any protocol based on a triorthogonal code. For an n-qubit code with k logical qubits and $m_x$ X stabilizers, the protocol uses $1.5(m_x + k) + 4$ tiles for $(n - m_x)$ time steps. In this time, it distills k magic states with a success probability of $\sim(1 - p)^n$, since any error will result in failure. Therefore, such a protocol distills k magic states on average every $(n - m_x)/(1 - p)^n$ time steps. Thus, the space-time cost per magic state is

\[
\text{cost}(n, m_x, k, p) = \frac{[1.5(m_x + k) + 4]\,(n - m_x)}{k\,(1 - p)^n}. \tag{8}
\]

In order to minimize the space-time cost for distillation in our framework, one should pick a distillation protocol that minimizes this quantity for a given input and target error rate.

20-to-4 protocol. The previous estimate is only valid for triorthogonal codes. With semi-triorthogonal codes, additional time steps may be necessary to perform the final measurements. The example of the 20-to-4 protocol is shown in Fig. 16. Because the three qubits that are measured are discarded at the end of the protocol, the three Pauli products can be measured in 2 time steps instead of 3 as in Fig. 14. For this, the operator Z ⊗ Z ⊗ X is measured in the first step. In the second step, X ⊗ 1 ⊗ 1 and 1 ⊗ Z ⊗ Z are measured simultaneously. Their product yields one of the required measurements. Finally, qubits 2 and 3 are measured in X at no time cost. Multiplying these two results with the X measurement in the previous step yields the final X ⊗ X ⊗ X measurement. Thus, the 20-to-4 protocol requires 17 time steps for the π/8 rotations and 2 time steps for the final measurements. With a space cost of 14 tiles, the total space-time cost is $266d^3$.

3.4 Benchmarking

We can use the previously described 15-to-1 and 20-to-4 schemes to benchmark our implementations. In Ref. [36], these schemes were implemented with lattice surgery and their cost compared to implementations based on braiding of hole defects. In addition, the 7-to-1 scheme was considered, which is a scheme to distill

Figure 18: 176-tile block that can be used for 225-to-1 distillation. The qubits highlighted in red are used for the second level of the distillation protocol. The blue ancilla is used to move level-1 magic states into the two |m⟩-|0⟩ blocks of the level-2 distillation.
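
The numbers quoted for concatenated 15-to-1 distillation and for the reductions in Table 1 can be reproduced with a few lines of arithmetic. This sketch only restates values from the text (the output error $35p^3$, the 176 tiles and 15 time steps of the 225-to-1 block, and the Table 1 entries); the function names are assumptions of this example.

```python
def output_error_15_to_1(p):
    """Output error rate of one round of 15-to-1 distillation, 35 p^3."""
    return 35 * p**3

def concatenated_error(p, levels):
    """Feed the output of each 15-to-1 level into the next level."""
    for _ in range(levels):
        p = output_error_15_to_1(p)
    return p

def reduction_percent(ours, reference):
    """Relative cost reduction of `ours` compared to `reference`, in percent."""
    return 100 * (1 - ours / reference)

print(concatenated_error(1e-3, levels=2))   # two levels: 225-to-1
print(176 * 15)                             # tiles x time steps of the 225-to-1 block

# Table 1: (our framework, hole braiding), leading-order costs in units of d^3.
table = {"7-to-1": (28, 70), "15-to-1": (121, 750), "20-to-4": (266, 2344)}
for name, (ours, braiding) in table.items():
    print(name, round(reduction_percent(ours, braiding)), "% reduction")
```

Two levels of 15-to-1 give the prefactor $35 \cdot 35^3 = 1500625$, and the product of 176 tiles and 15 time steps gives the space-time cost of $2640d^3$ quoted for the 225-to-1 block.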

|Y⟩ states. The distillation of these states is not necessary in our framework, but for benchmarking purposes we show the 7-to-1 protocol in Appendix D. It can be implemented using 7 tiles for 4 time steps, i.e., with a space-time cost of $28d^3$.

We summarize the leading-order space-time costs of the three protocols in Table 1. The comparison shows drastic reductions in space-time cost compared to schemes based on braiding of hole defects and compared to other approaches to optimizing lattice surgery. Compared to the braiding-based scheme, the space-time cost of 7-to-1, 15-to-1 and 20-to-4 is reduced by 60%, 84% and 89%, respectively.

                          7-to-1      15-to-1     20-to-4
Hole braiding [19, 35]    $70d^3$     $750d^3$    $2344d^3$
Lattice surgery [36]      $140d^3$    $540d^3$    $1134d^3$
Our framework             $28d^3$     $121d^3$    $266d^3$

Table 1: Comparison of the leading-order space-time cost of 7-to-1, 15-to-1 and 20-to-4 with defect-based schemes, optimized lattice surgery in Ref. [36] and our schemes. The space-time cost is in terms of (physical data qubits)·(code cycles).

3.5 Higher-fidelity protocols

So far, we have only explicitly discussed protocols that reduce the input error to $\sim p^2$ or $\sim p^3$. There are two strategies to obtain protocols with a higher output fidelity: concatenation and higher-distance codes.

Concatenation. In the 15-to-1 protocol, we use 15 undistilled magic states to obtain a distilled magic state with an error rate of $35p^3$. If we perform the same protocol, but use 15 distilled magic states from previous 15-to-1 protocols as inputs, the output state will have an error rate of $35(35p^3)^3 = 1500625p^9$. This corresponds to a 225-to-1 protocol obtained from the concatenation of two 15-to-1 protocols. It is also possible to concatenate protocols that are not identical. Strategies to combine high-yield and low-yield protocols are discussed in Ref. [17].

In Fig. 18, we show an unoptimized block that can be used for 225-to-1 distillation. It consists of 11 15-to-1 blocks that are used for the first level of distillation. Since each of these 11 blocks takes 11 time steps to finish, they can be operated such that exactly one of these blocks finishes in every time step. Therefore, in every time step, one first-level magic state can be used for second-level distillation by moving it into one of the two level-2 |m⟩-|0⟩ blocks via the blue ancilla. The qubits that are used for the second level are highlighted in red. Note that since, for the second level, the single-qubit π/8 rotations require distilled magic states, the 15-to-1 protocol of Fig. 13 requires 15 rotations instead of just 11. Therefore, the entire protocol finishes in 15 time steps using 176 tiles with a total space-time cost of $2640d^3$.

Higher-distance codes. Alternatively, we can use a code that produces higher-fidelity states. In Ref. [16], several protocols based on punctured Reed-Muller codes are discussed. One of these protocols is a 116-to-12 protocol based on a code with n = 116, k = 12 and $m_x = 17$. It yields 12 magic states which each have an

error rate of $41.25p^4$. According to Eq. (8), this protocol can be implemented using 44 tiles for 99 time steps with a space-time cost of $363d^3$ per output state and a success probability of $(1 - p)^{112}$. For protocols with a high space cost such as 116-to-12, the space-time cost can be slightly reduced by introducing additional ancilla space, such that two operations can be performed simultaneously. One possible configuration is shown in Fig. 19. This increases the space cost to 81 tiles, but reduces the time cost to 50 time steps, with a total space-time cost of $337.5d^3$ per output state.

Figure 19: 81-tile block that can be used for the 116-to-12 protocol. Here, two π/8 rotations can be performed at the same time, where one rotation uses the ancilla space denoted as ancilla 1, and the other one uses ancilla 2.

Input-to-output ratio is not everything. A popular figure of merit when comparing n-to-k distillation protocols is the ratio n/k. One of the protocols in Ref. [16] is a 912-to-112 protocol with n = 912, k = 112 and $m_x = 64$, which yields 112 output states, each with an error rate of $10.63p^6$. While the output fidelity is not as high as for 225-to-1, its input-to-output ratio n/k ≈ 8 is far more favorable than the n/k = 225 of 225-to-1. For $p = 10^{-3}$, the output error rate of 225-to-1 is $\sim 1.5 \times 10^{-21}$, while it is only $\sim 10^{-17}$ for 912-to-112. Therefore, if input-to-output ratio were a good figure of merit, we would expect the 912-to-112 protocol to be considerably less costly compared to 225-to-1. If we use an implementation in the spirit of Fig. 19, the space cost is roughly $2.5(m_x + k)$ tiles and the protocol takes $(n - m_x)/2$ time steps. Thus, 912-to-112 uses 440 tiles for 424 time steps. This would put the space-time cost per state at $1665d^3$, which is indeed lower than that of 225-to-1. However, the success probability of 912-to-112 for $p = 10^{-3}$ is only at ∼40%, which more than doubles the actual space-time cost. On the other hand, the space-time cost of 225-to-1 is barely affected by the success probability, as each of the level-1 15-to-1 blocks finishes with 98.5% success probability. This means that, with 1.5% probability, a time step of 225-to-1 is skipped, since the necessary level-1 state is missing. This only increases the space-time cost from $2640d^3$ to $2680d^3$, implying that n/k is not a good figure of merit.

Summary. The class of magic state distillation protocols that are based on an n-qubit error-correcting code with $m_x$ X stabilizers and k logical qubits can be implemented using $1.5(m_x + k) + 4$ tiles and $n - m_x$ time steps. Such protocols output k magic states with a success probability of $(1 - p)^n$. Therefore, if the input fidelity and desired output fidelity are known, the distillation protocol should minimize the cost function given in Eq. (8).

4 Trade-offs limited by T count

Having discussed data blocks and distillation blocks in the previous two sections, we are now ready to piece them together to a full quantum computer. In order to illustrate the steps that are necessary to calculate the space and time cost of a computation and to trade off space against time, we consider an example computation with a T count of $10^8$ and a T depth of $10^6$. We consider error rates of $p = 10^{-3}$ and $p = 10^{-4}$. This error rate is assumed to be the physical error rate per code cycle of every physical qubit, as well as the error rate of undistilled magic states. To calculate concrete numbers, we assume that the quantum computer can perform a code cycle every 1 µs. We want to perform the $10^8$-T-gate computation in a way that the probability of any one of the T gates being affected by an error stays below 1%. In addition, we require that the probability of an error affecting any of the logical qubits encoded in surface-code patches stays below 1%. This results in a 2% chance that the quantum computation will yield a wrong result. In order to exponentially increase the precision of the computation, it can be repeated multiple times or run in parallel on multiple quantum computers.

4.1 Step 1: Determine distillation protocol

The first step is to determine which distillation protocol is sufficient for the computation. In order to stay below 1% error probability with $10^8$ T gates, each magic state needs to have an error rate below $10^{-10}$. For $p = 10^{-4}$, the 15-to-1 protocol is sufficient, since it yields an output error rate of $35p^3 = 3.5 \cdot 10^{-11}$. For $p = 10^{-3}$, 15-to-1 is not enough. On the other hand, two levels of 15-to-1, i.e., 225-to-1, yield magic states with an error rate of $1.5 \cdot 10^{-21}$, which is many orders of magnitude lower than what is required. A less costly protocol is 116-to-12, which yields output states with an error rate of $41.25p^4 = 4.125 \cdot 10^{-11}$, which suffices for our purposes.

4.2 Step 2: Construct a minimal setup

In order to determine the necessary code distance, we first construct a minimal setup, i.e., a configuration of

Figure 20: Minimal setups using compact data blocks for (a) $p = 10^{-4}$ (with 15-to-1 distillation) and (b) $p = 10^{-3}$ (with 116-to-12 distillation). Blue tiles are data block tiles, orange tiles are distillation block tiles, green tiles are used for magic state storage and gray tiles are unused tiles.

Figure 21: Intermediate setups using intermediate data blocks and (a) two 15-to-1 distillation blocks for $p = 10^{-4}$ or (b) one compact 116-to-12 distillation block for $p = 10^{-3}$.
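
Step 1 of the example, choosing the least costly protocol whose output error rate meets the per-T-gate target, can be written as a small selection routine. This is a sketch under stated assumptions: the list `PROTOCOLS`, the function `select_protocol` and the idea of ranking candidates by their quoted leading-order cost per output state are choices made for this illustration, while the error-rate formulas and costs are the ones quoted in the text.

```python
# (name, output error rate as a function of p, leading-order cost per state in d^3)
PROTOCOLS = [
    ("15-to-1", lambda p: 35 * p**3, 121),
    ("116-to-12", lambda p: 41.25 * p**4, 363),
    ("225-to-1", lambda p: 35 * (35 * p**3) ** 3, 2640),
]

def select_protocol(p, t_count, error_budget=0.01):
    """Return the cheapest listed protocol whose output error is below budget / T count."""
    target = error_budget / t_count
    for name, output_error, _cost in sorted(PROTOCOLS, key=lambda entry: entry[2]):
        if output_error(p) < target:
            return name
    return None  # none of the listed protocols is sufficient

print(select_protocol(1e-4, 1e8))
print(select_protocol(1e-3, 1e8))
```

For a T count of $10^8$ and a 1% budget, the target is $10^{-10}$ per magic state, so the routine picks 15-to-1 for $p = 10^{-4}$ and 116-to-12 for $p = 10^{-3}$, in line with the choices made in the text.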

tiles that can be used for the computation and uses as little space as possible. The reason why this is useful to determine the code distance is that the initial space-time trade-offs that we discuss significantly improve the overall space-time cost. Therefore, the minimal setup can be used to comfortably upper-bound the required code distance.

For $p = 10^{-4}$, a minimal setup consists of a compact data block and a 15-to-1 distillation block, see Fig. 20a. The compact block stores 100 qubits in 153 tiles and requires up to 9 time steps to consume a magic state. The 15-to-1 distillation block uses 11 tiles and outputs a magic state every 11 time steps with 99.9% success. To ensure that the tile of the distillation block that is occupied by qubit 5 is not blocked during the first time step of the distillation protocol, the first π/8 rotation of the protocol should be chosen such that it does not involve qubit 5, e.g., the fourth rotation of Fig. 13. In total, this minimal setup uses 164 tiles and performs a T gate every 11 time steps, i.e., finishes the computation in $11 \cdot 10^8$ time steps.

For $p = 10^{-3}$, a minimal setup consists of a compact data block and a 116-to-12 distillation block, as shown in Fig. 20b. For the minimal setup, we do not use the larger and faster distillation block shown in Fig. 19, but instead a block in the spirit of the 15-to-1 block. This 116-to-12 distillation block uses 44 tiles and distills 12 magic states in 99 time steps with 89.4% success probability, i.e., on average one state every 9.23 time steps. Because this distillation protocol outputs magic states in bursts, i.e., 12 at the same time, these states need to be stored before being consumed. Therefore, we introduce additional storage tiles (green tiles in Fig. 20b). Here, we choose the 12 output states to be qubits 6, 8, 10, . . . , 26 and 27. In the last step of the protocol these states are moved into the green space, where they are consumed by the data block one after the other. This minimal setup uses 153 tiles for the data block, 44 tiles for the distillation block and 13 tiles for storage. In total, it uses 210 tiles and finishes the computation in $9.23 \cdot 10^8$ time steps.

4.3 Step 3: Determine code distance

Since each tile corresponds to d × d physical data qubits and each time step corresponds to d code cycles, 164 encoded logical qubits need to survive for $(11 \cdot 10^8)d$ code cycles for the minimal setup with $p = 10^{-4}$. The probability of a single logical error on any of these 164 qubits needs to stay below 1% at the end of the computation. The logical error rate per logical qubit per code cycle can be approximated [30] as

\[
p_L(p, d) = 0.1\,(100p)^{(d+1)/2} \tag{9}
\]

for circuit-level noise. Therefore, the condition to determine the required code distance is

\[
164 \cdot 11 \cdot 10^8 \cdot d \cdot p_L(10^{-4}, d) < 0.01 . \tag{10}
\]

For distance d = 11, the final error probability is 19.8%, which exceeds the 1% target. Distance d = 13, on the other hand, is sufficient, with a

17
final error probability of 0.2%. The number of physical qubits used in the minimal setup can be calculated as the number of tiles multiplied by $2d^2$, taking measurement qubits into account. The minimal setup for $p = 10^{-4}$ uses $164 \cdot 2 \cdot 13^2 \approx 55{,}400$ physical qubits and finishes the computation in $13 \cdot 11 \cdot 10^8$ code cycles. With 1 µs per code cycle, this amounts to roughly 4 hours.

[Figure 22, panels: (a) Fast setup for $p = 10^{-4}$; (b) Fast setup for $p = 10^{-3}$. Legend: fast data block, distillation block, storage tiles, unused tiles.]
Figure 22: Fast setups using fast data blocks and 11 15-to-1 distillation blocks for $p = 10^{-4}$ or 5 116-to-12 distillation blocks for $p = 10^{-3}$.

For $p = 10^{-3}$, the condition changes to

$$210 \cdot 9.23 \cdot 10^8 \cdot d \cdot p_L(10^{-3}, d) < 0.01 , \tag{11}$$

which is satisfied for $d = 27$ with a final error probability of 0.5%. The final error probability for $d = 25$ is at 4.8%. Thus, the minimal setup uses $210 \cdot 2 \cdot 27^2 \approx 306{,}000$ physical qubits and finishes the computation in $27 \cdot 9.23 \cdot 10^8$ code cycles, which amounts to roughly 7 hours. Note that, in principle, a success probability of less than 50% would be sufficient to reach arbitrary precisions by repeating computations or running them in parallel. This means that the code distances that we consider may be higher than what is necessary.

4.4 Step 4: Add distillation blocks

Only a small fraction of the tiles of the minimal setup is used for magic state distillation, i.e., 6.7% for $p = 10^{-4}$ and 21% for $p = 10^{-3}$. On the other hand, adding one additional distillation block doubles the rate of magic state production, potentially doubling the speed of computation. Therefore, in order to speed up the computation and decrease the space-time cost, we add additional distillation blocks to our setup.

For $p = 10^{-4}$, adding one more distillation block reduces the time that it takes to distill a magic state to 5.5 time steps per state. However, the compact block can only consume magic states at 9 time steps per state. In order to avoid this bottleneck, we can use the intermediate data block instead, which occupies 204 tiles, but consumes one magic state every 5 time steps. With 22 tiles for distillation (see Fig. 21), this setup uses 226 tiles and finishes the computation after $5.5 \cdot 10^8$ time steps. This increases the qubit number to 76,400, but reduces the computational time to 2 hours.

For $p = 10^{-3}$, the addition of a distillation block reduces the distillation time to 4.62 time steps. At this point, one should switch to the more efficient 116-to-12 block of Fig. 19, which uses 81 tiles and distills a magic state on average every 4.66 time steps. The intermediate data block cannot keep up with this distillation rate, but we can still use it to consume one magic state every 5 time steps instead of 4.66. Such a configuration uses 228 data tiles, 81 distillation tiles and 13 storage tiles, i.e., a total of 322 tiles corresponding to approximately 469,000 physical qubits. The computational time reduces to $5 \cdot 10^8$ time steps, i.e., 3.75 hours. Note that in Fig. 21b, the 12 output states of the 116-to-12 protocol should be chosen as 1, 3, 5, ..., 25. They can be moved into the green storage space in the last step of the protocol, since the space denoted as ancilla 2 in Fig. 19 is not being used in the last step.

Trade-offs down to 1 time step per T gate. Adding additional distillation blocks can reduce the time per T gate down to 1 time step. For $p = 10^{-4}$, 11 distillation blocks produce 1 magic state every time step. To consume these magic states fast enough, we need to use a fast data block. This fast block uses 231 tiles and the 11 distillation
blocks together with their storage tiles use $11 \cdot 12 = 132$ tiles, as shown in Fig. 22a. With a total of 363 tiles, this setup uses 123,000 qubits and finishes the computation in $10^8$ time steps, i.e., in 21 minutes and 40 seconds.

For $p = 10^{-3}$, parallelizing 5 distillation blocks produces a magic state every 0.924 time steps. This is faster than the fast block can consume the states, but allows for the execution of a T gate every time step. With 231 tiles for the fast block, 405 distillation tiles and 60 storage tiles, the total space cost is 696 tiles. The setup shown in Fig. 19b contains four unused tiles to make sure that all storage lines are connected to the data block. Storage lines need to be connected to the ancilla space of the data block either directly, via other storage lines or via unused tiles. In any case, this corresponds to roughly 1,020,000 physical qubits. The computation finishes after 45 minutes.

Avoiding the classical overhead. Every consumption of a magic state corresponds to a Pauli product measurement, the outcome of which determines whether a Clifford correction is required. This correction is commuted past the following rotations, potentially changing the axis of rotation. Therefore, the computation cannot continue before the measurement outcome is determined. This involves a small classical computation to process the physical measurements (i.e., decoding and feed-forward), which could slow down the quantum computation. In order to avoid this, the magic state consumption can be performed using the auto-corrected $\pi/8$ rotations of Fig. 15b. Here, the classical computation merely determines whether the ancilla qubit (which we refer to as the correction qubit $|c\rangle$) is measured in the $X$ or $Z$ basis. While this classical computation is running, the magic state for the following $\pi/8$ rotation can be consumed, as the auto-corrected rotation involves no Clifford correction. This means that distillation blocks should output $|m\rangle$-$|c\rangle$ pairs, for which we construct modified distillation blocks in the following section. If the classical computation is, on average, faster than 1 time step (i.e., $d$ code cycles), then classical processing does not slow down the quantum computation in the T-count-limited schemes.

Summary. Data blocks combined with distillation blocks can be used for large-scale quantum computing. The first step is to determine a sufficiently high-fidelity distillation protocol. Next, one constructs a minimal setup from a compact data block and a single distillation block to upper-bound the required code distance. Finally, one can trade off space against time by using fast data blocks and adding more distillation blocks. This can reduce the time per T gate down to 1 time step. In our example, the trade-off also reduces the space-time cost compared to the minimal setup by a factor of 5 for $p = 10^{-4}$ and by a factor of 2.8 for $p = 10^{-3}$. In order to fully exploit the space-time trade-offs discussed in this section, the input circuit should be optimized for T count.

5 Trade-offs limited by T depth

In the previous section, we parallelized distillation blocks to finish computations in a time proportional to the T count. In this section, we combine the previous constructions of data and distillation blocks to what we refer to as units. By parallelizing units, we exploit the fact that, in our example, the $10^8$ T gates are arranged in $10^6$ layers of 100 T gates to finish the computation in a time proportional to the T depth. We first slightly increase the space-time cost compared to the previous section, in order to speed up the computation down to one measurement per T layer. In this sense, we implement Fowler's time-optimal scheme [20].

[Figure 23, panels: (a) Teleportation circuit; (b) Teleportation through a $\pi/8$ rotation.]
Figure 23: (a) Circuit for quantum teleportation of $|\psi\rangle$ through a gate $U$. Only if both Bell basis measurements yield $+1$, the teleported state is $U|\psi\rangle$. If $Z \otimes Z = -1$, the state is $UX|\psi\rangle$. If $X \otimes X = -1$, the state is $UZ|\psi\rangle$. If both measurements yield $-1$, the state is $UY|\psi\rangle$. (b) If $U$ is a $\pi/8$ rotation, the corrective Paulis change $P_{\pi/8}$ to $P_{-\pi/8}$.

5.1 T layer parallelization

The main concept used to parallelize T layers is quantum teleportation. The teleportation circuit is shown in Fig. 23a. It starts with the generation of a Bell pair $(|00\rangle + |11\rangle)/\sqrt{2}$ by the $Z \otimes Z$ measurement of $|+\rangle \otimes |+\rangle$. An arbitrary gate $U$ is performed on the second half of the Bell pair. Next, a qubit $|\psi\rangle$ and the first half of the Bell pair are measured in the Bell basis, i.e., in $X \otimes X$ and $Z \otimes Z$. After the measurement, the first two qubits are discarded and $|\psi\rangle$ is teleported to the third qubit through the gate $U$. This means that the output state is $U|\psi\rangle$, if the teleportation is successful. However, it is only successful if both Bell basis measurements yield a $+1$ outcome. In the other three cases, the teleported state is $UX|\psi\rangle$, $UY|\psi\rangle$ or $UZ|\psi\rangle$. Note that the correction operation to recover the state $|\psi\rangle$ is not a Pauli
operation $P$, but instead $UPU^\dagger$, which, in general, is as difficult to perform as $U$ itself.

[Figure 24, panels: (a) Clifford+T circuit with layers 1, 2 and 3; (b) Post-corrected $\pi/8$ rotation; (c) Time-optimal Clifford+T circuit.]
Figure 24: Time-optimal implementation of a three-qubit quantum computation consisting of 9 T gates in 3 T layers. Post-corrected $\pi/8$ rotations (b) can be used to decide at a later point whether the performed operation was a $P_{\pi/8}$ or a $P_{-\pi/8}$ rotation.

If $U$ is a $P_{\pi/8}$ rotation, as in Fig. 23b, the Pauli errors change $P_{\pi/8}$ to $P_{-\pi/8}$ up to a Pauli correction. Since it is only after the Bell basis measurement that we know whether we should have performed a $P_{\pi/8}$ or a $P_{-\pi/8}$ gate, we use post-corrected $\pi/8$ rotations in Fig. 24b, which are similar to the auto-corrected rotations of Fig. 15b. The post-corrected rotation uses a resource state consisting of two qubits, a magic state $|m\rangle$ and a second qubit that we refer to as a correction qubit $|c\rangle$. The resource state is generated by initializing $|c\rangle$ in $|0\rangle$ and measuring $Z \otimes Y$ between $|m\rangle$ and $|c\rangle$. In order to perform a post-corrected $\pi/8$ rotation, the resource state is consumed by measuring $P \otimes Z$ involving the magic state, and measuring $|m\rangle$ in $X$. The correction qubit $|c\rangle$ is stored for later use. It can be used at a later moment to decide whether the rotation should have been a $+\pi/8$ or $-\pi/8$ rotation by measuring $|c\rangle$ either in the $Z$ or $X$ basis. Depending on the measurement outcome, a Pauli correction may be required.

The time-optimal circuit. This can be used to execute multiple T layers simultaneously. If $U$ is a product of mutually commuting $\pi/8$ rotations, i.e., a T layer, the teleportation corrections replace all $\pi/8$ rotations with post-corrected rotations. An example is shown in Fig. 24 for a three-qubit computation of three T layers, where all three T layers are executed simultaneously. The reason why we can only group up T gates that are part of the same layer is that otherwise the Pauli corrections of the post-corrected rotation would not commute with the other rotations. The time-optimal circuit consists of three steps: the preparation of Bell pairs for each T layer, the application of T gates, and a set of final Bell measurements. At this point, the computation is not finished, as we still need to measure the correction qubits of the post-corrected rotations. Because these involve potential Pauli corrections, the correction qubits of the different T layers need to be measured one after the other. Thus, every T layer is executed one after the other, where each execution requires the time that it takes to measure the correction qubits and perform the classical processing to determine the next set of measurements from the Pauli corrections. We refer to this
time as $t_m$. In other words, any Clifford+T circuit consisting of $n_L$ T layers can be executed in $n_L \cdot t_m$, independent of the code distance, which is the main feature of the time-optimal scheme [20].

Figure 25: An example of a time-optimal circuit using four units. In this case, each unit consists of six qubits, i.e., it is a three-qubit quantum computation, where three T layers can be executed simultaneously.

The circuit in Fig. 24c naively requires $2n \cdot n_L$ qubits for an $n$-qubit computation, which scales with the length of the computation. Since we only have a finite number of qubits at our disposal, our goal is to implement the circuit in Fig. 25 instead. Here, the qubits form groups of $2n$ qubits. We refer to each of these groups as a unit. Using $n_u$ units, $n_u - 1$ layers of T gates can be performed at the same time. In the circuit, the steps of Bell state preparation (BP), post-corrected T layer execution (T) and Bell basis measurement (BM) are performed repeatedly until the end of the computation. We refer to the block of operations (BP-T-BM) as unit preparation. Every time that unit preparation is finished, all qubits except for the correction qubits (not shown in Fig. 25) and half of the qubits of the last unit are discarded. At this point, the next set of unit preparations begins. Simultaneously, the correction qubits of the recently finished units are measured one after the other, which has a time cost of $(n_u - 1) \cdot t_m$. This means that the number of units can be increased to speed up the computation, until $(n_u - 1) \cdot t_m$ reaches the time $t_u$ that it takes to prepare a unit. At this maximum number of units $n_{\max} = t_u/t_m + 1$, a T layer is executed every $t_m$ and the computation cannot be sped up any further in the Clifford+T framework.

Note that the first and last unit differ from the other units. While all other units need to execute $n_T$ T gates every $t_u$, the first and last unit need to execute $n_T$ T gates only every $2t_u$, where $n_T$ is the number of T gates per layer. Furthermore, the other blocks need to be able to store up to $2n_T$ correction qubits, since, after the end of a unit preparation, $n_T$ correction qubits are stored, and may need to remain stored until the end of the next unit preparation. For the first and last block, on the other hand, the required storage space is halved.

In the following, we will show how to prepare units in our framework. We find that, for our examples, unit preparation takes 113 time steps. If $t_m = 1$ µs, then $n_{\max}$ is ∼1500 for $p = 10^{-4}$ and ∼3000 for $p = 10^{-3}$. Independently of the error rate, the computational time drops to one second.

5.2 Units

Units differ from the fast setups in Fig. 22 in three aspects. First, the number of qubits stored in the data block is doubled. Secondly, the distillation protocols are modified to output $|m\rangle$-$|c\rangle$ pairs, instead of just magic states $|m\rangle$. Thirdly, in order to store correction qubits $|c\rangle$, additional space is required. Contrary to magic-state storage tiles, correction-qubit storage tiles do not need to be connected to the data block's ancilla region.

Modified distillation blocks. In order to have distillation blocks output $|m\rangle$-$|c\rangle$ pairs, extra tiles and operations are required. We show the necessary modifications for the example of 15-to-1 and 116-to-12 distillation. A modified 15-to-1 block is shown in Fig. 26a. Apart from the standard 11 distillation tiles (orange) and one magic-state storage tile (green), it also contains 19 correction-qubit storage tiles (purple) and an additional tile (gray) that is used for neither distillation nor storage. The additional steps that modify the protocol are shown in Fig. 26c, which zooms into the highlighted region of Fig. 26a. Step 1 of the shown protocol is right as the distillation finishes after 11 time steps. The patch of the output state is deformed in step 2, and an additional qubit $|c\rangle$ is initialized in the $|0\rangle$ state. The $Y \otimes Z$ operator between $|c\rangle$ and $|m\rangle$ is measured in step 3. In step 4, the correction qubit is sent to storage. Finally, in step 5, the magic state $|m\rangle$ is moved to its storage tile. This operation blocks one of the orange tiles that is
used for the distillation protocol for 4 time steps. Still, this does not slow down 15-to-1 distillation, since the first 4 rotations of the protocol in Fig. 13 can be chosen such that the output qubit is not needed. Therefore, the modified distillation block outputs one $|m\rangle$-$|c\rangle$ pair every 11 time steps.

[Figure 26, panels: (a) Modified 15-to-1 block; (b) Modified 116-to-12 block; (c) Modified 15-to-1 protocol, steps 1 to 5 at time steps 11 to 15; (d) Modified 116-to-12 protocol, steps 1 to 3 at time steps 50, 52 and 53.]
Figure 26: Modified 15-to-1 distillation blocks (a) output a $|m\rangle$-$|c\rangle$ pair every 11 time steps. After the end of the distillation protocol, four additional steps (c) are necessary. The modified 116-to-12 distillation block (b) finishes after 53 time steps, due to the three additional steps in (d).

For 116-to-12 distillation, a modified block is shown in Fig. 26b. We arrange the qubits such that the 12 output states are found in the positions shown in step 1 of Fig. 26d. Using 2 time steps, correction qubits are prepared and $Y \otimes Z$ operators are measured. Finally, the patches are deformed back to square patches and all magic states are sent to the green storage, while all correction qubits are sent to the purple storage. This adds 3 time steps to the protocol, meaning that this block outputs 12 $|m\rangle$-$|c\rangle$ pairs every 53 time steps with a success probability of $(1 - p)^{112}$. For $p = 10^{-3}$, this corresponds to one output every 4.94 time steps.

As mentioned in Sec. 4, modified distillation blocks can also be used with setups in which T gates are performed one after the other, in order to deal with slow classical processing. In this case, only one correction-qubit storage tile per magic state is required.

Units. Modified distillation blocks together with fast data blocks are what we refer to as units. The units for our example computation for $p = 10^{-3}$ and $p = 10^{-4}$ are shown in Fig. 28a-b. They both consist of a 200-qubit fast data block, 200 correction-qubit storage tiles, and a number of distillation blocks. Since we will show that unit preparation takes 113 time steps in our case, the number of distillation blocks is chosen such that at least 100 $|m\rangle$-$|c\rangle$ pairs can be distilled in 113 time steps. A full time-optimal quantum computer consists of a row of multiple units, see Fig. 28c. The units shown in the figure contain some unused tiles. This gives the units a rectangular profile, even though this is not necessarily required.

[Figure 27, panels: Step 1 at time step 0; Steps 2 and 3 at time step 1; Steps 4 and 5 at time step 2.]
Figure 27: Bell basis measurement (BM) in 2 time steps.
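The average output rates quoted above for the 116-to-12 blocks can be reproduced with a few lines of arithmetic. The following Python sketch is illustrative only: the function name `avg_steps_per_state` and its parameters are hypothetical, and it assumes that a failed distillation round is simply discarded and repeated, so that the protocol duration is divided by the number of outputs times the success probability $(1 - p)^{112}$ stated in the text.

```python
def avg_steps_per_state(protocol_steps: int, outputs: int, p: float,
                        exponent: int = 112) -> float:
    """Average number of time steps per distilled output state.

    Assumes the success probability (1 - p)**exponent quoted in the text
    and that a failed round is discarded and repeated (an assumption of
    this sketch).
    """
    success = (1.0 - p) ** exponent
    return protocol_steps / (outputs * success)


# Modified 116-to-12 block: 12 |m>-|c> pairs every 53 time steps.
modified_rate = avg_steps_per_state(53, 12, 1e-3)
# Minimal-setup 116-to-12 block of Sec. 4: 12 magic states in 99 time steps.
minimal_rate = avg_steps_per_state(99, 12, 1e-3)

print(f"modified block: {modified_rate:.2f} time steps per pair")
print(f"minimal block:  {minimal_rate:.2f} time steps per state")
```

Under these assumptions the two values round to 4.94 and 9.23 time steps, matching the numbers given for $p = 10^{-3}$.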
[Figure 28, panels: (a) Unit for $p = 10^{-3}$; (b) Unit for $p = 10^{-4}$; (c) Time-optimal setup with units 1 to 4. Legend: data, distillation, $|m\rangle$ storage, $|c\rangle$ storage, unused tiles.]
Figure 28: Units consist of fast data blocks, modified distillation blocks and storage tiles. (a) The unit for $p = 10^{-3}$ consists of $54 \times 21 = 1134$ tiles. (b) For $p = 10^{-4}$, the number of tiles is $37 \times 21 = 777$. (c) A time-optimal setup consists of a row of multiple units, which means that the space to the bottom and top of the fast data blocks needs to remain free.
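The step from a tile count to a physical-qubit count can be sketched in Python as follows. This is a minimal illustration, not the authors' code: all function names are hypothetical, the error model is Eq. (9), the failure condition is the left-hand side of Eqs. (10) and (11), and the sketch assumes that the code distances found for the minimal setups in Sec. 4.3 are reused unchanged for the units.

```python
def logical_error_rate(p: float, d: int) -> float:
    """Eq. (9): logical error rate per logical qubit per code cycle."""
    return 0.1 * (100.0 * p) ** ((d + 1) / 2)


def failure_probability(tiles: int, time_steps: float, p: float, d: int) -> float:
    """Left-hand side of Eqs. (10) and (11)."""
    return tiles * time_steps * d * logical_error_rate(p, d)


def min_code_distance(tiles: int, time_steps: float, p: float,
                      target: float = 0.01) -> int:
    """Smallest odd distance whose failure probability is below the target."""
    if p >= 0.01:
        # For 100p >= 1, Eq. (9) no longer decreases with d.
        raise ValueError("Eq. (9) only suppresses errors for p < 1%")
    d = 3
    while failure_probability(tiles, time_steps, p, d) >= target:
        d += 2
    return d


def physical_qubits(tiles: int, d: int) -> int:
    """Tiles times 2*d^2, counting data and measurement qubits."""
    return tiles * 2 * d * d


d_low = min_code_distance(164, 11e8, 1e-4)      # minimal setup, p = 1e-4
d_high = min_code_distance(210, 9.23e8, 1e-3)   # minimal setup, p = 1e-3
print(d_low, physical_qubits(37 * 21, d_low))    # unit of Fig. 28b
print(d_high, physical_qubits(54 * 21, d_high))  # unit of Fig. 28a
```

With the tile footprints of Fig. 28, this gives roughly 260,000 physical qubits per unit at distance 13 and roughly 1,650,000 at distance 27.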

In our case, the units have a footprint of $54 \times 21$ and $37 \times 21$ tiles, respectively. Note that the first and last unit of a time-optimal setup are smaller, as they only require 100 correction-qubit storage tiles and half the number of distillation blocks.

Unit preparation. In order to implement the time-optimal circuit of Fig. 25 with the setup of Fig. 28, we show protocols that can be used for the BP-T-BM operations. The data blocks of every unit store $2n$ qubits in $n$ six-corner patches. We arrange the qubits in such a way that the final Bell measurements (BM) are $Z \otimes Z$ and $X \otimes X$ measurements of the two qubits of every six-corner patch. This Bell measurement can be done in 2 time steps, as shown in Fig. 27.

This arrangement of qubits implies that, for every six-corner patch, one of the qubits needs to be part of a Bell state preparation (BP) with the neighboring unit to the top, and the other with a neighboring unit to the bottom. For an $n$-qubit quantum computation, this Bell state preparation can be performed in $\sqrt{n} + 1$ time steps, as we show in Fig. 29 for the example of $n = 9$. For this, every qubit is initialized in the $|+\rangle$ state. The Bell state preparation requires a series of $Z \otimes Z$ measurements. The protocol in Fig. 29 shows that, since an $n$-qubit computation implies that the number of rows of the data block is $\sqrt{n}$, these measurements require a total of $\sqrt{n} + 1$ time steps.

In total, the unit preparation of an $n$-qubit computation with $n_T$ T gates per layer requires $\sqrt{n} + 1$ time steps for the Bell state preparation, $n_T$ time steps for the execution of the T layer, and 2 time steps for the Bell basis measurement, i.e., a total of $n_T + \sqrt{n} + 3$ time steps. In our example, this amounts to 113 time steps, which corresponds to $t_u = 1469$ µs for $p = 10^{-4}$ and $t_u = 3051$ µs for $p = 10^{-3}$. Thus, time optimality is reached with 1470 units for $p = 10^{-4}$ and 3052 units for $p = 10^{-3}$.

[Figure 29, panels: Steps 1 to 4 at time steps 1 to 4.]
Figure 29: Bell state preparation (BP) for a 9-qubit computation (18 qubits per unit) in 4 time steps. All six-corner patches are initialized in the $|+\rangle^{\otimes 2}$ state. Each red arrow is a $Z \otimes Z$ measurement between the two qubits at the ends of the arrow. For $n$-qubit computations, this requires $\sqrt{n} + 1$ time steps.

Space-time trade-offs. Of course, it is also possible to use fewer units than required for time optimality. Using $n_u$ units means that $n_T \cdot (n_u - 1)$ T gates are performed every $t_u$. In our example, $100 \cdot (n_u - 1)$ T gates are performed every 113 time steps. With three units, the computational time drops to 56.5% of the computational time of the fast setup in Fig. 22. With ten units, it drops to 11%. The number of qubits per unit is ∼260,000 for $p = 10^{-4}$ and ∼1,650,000 for $p = 10^{-3}$, so going from the fast setup to parallelized units is, initially, not a favorable space-time trade-off. Since the space-time cost has increased compared to the fast setup, it is also useful to check whether the code distance needs to be readjusted. If we use three units (ignoring that the first and last unit are, in principle, smaller), the space-time cost is still below the space-time cost of the minimal setup in both cases. Adding more units significantly improves the space-time cost. It is also a prescription to linearly speed up the quantum computer down to the time-optimal limit.

5.3 Distributed quantum computing

Note that, apart from the initial sharing of entangled Bell pairs, the units operate entirely independently of each other. This implies that, if Bell pairs can be shared between different quantum computers, each unit can be located in a separate quantum computer. The shared Bell pairs do not even need to have a high fidelity, as software-based entanglement distillation [37, 38] can be used to convert a large number of low-fidelity Bell pairs into fewer high-fidelity Bell pairs. Recent experiments have made progress towards generating entanglement between different superconducting chips [39–41].

[Figure 30, panels: (a) Distributed quantum computing, a ring of units connected by Bell pairs with entanglement distillation (ent. dist.) blocks; (b) Effective circuit.]
Figure 30: Scheme for distributed quantum computing in a circular arrangement of quantum computers with the ability to share Bell pairs between nearest neighbors. If the Bell-pair fidelity is low, entanglement distillation (ent. dist.) can be used to increase the fidelity. This scheme effectively implements the circular time-optimal circuit drawn schematically in (b).

For the time-optimal scheme, quantum computers may be arranged in a circle as shown in Fig. 30a, with the ability to share Bell pairs between neighboring quantum computers. This effectively implements the circuit that is schematically drawn in Fig. 30b. Note that in this circuit, there is no first and last unit. Here, every unit performs $n_T$ $\pi/8$ rotations every $t_u$. Therefore, time optimality is reached with one fewer unit, and each unit only needs to store $n_T$ correction qubits instead of $2n_T$. With only 100 correction-qubit storage tiles and ignoring the unused tiles, the qubit count of the units in Fig. 28 drops to ∼220,000 for $p = 10^{-4}$ and ∼1,470,000 for $p = 10^{-3}$, which are the numbers that we report in Fig. 3. Thus, if nearest-neighbor communication between quantum computers is feasible, already fewer than 2 million physical qubits per quantum computer can be used to implement the full time-optimal scheme with 1500-3000 quantum computers.

Entanglement distillation increases the qubit count.
Note that it does not slow down the computation, as Bell pairs do not need to be distilled instantly. Entanglement distillation can take up to $t_u$ to distill the $n_T$ Bell pairs required per entanglement distillation block.
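The unit counts of this section follow from the unit preparation time and the measurement time. The Python sketch below is a hedged illustration with hypothetical function names; it assumes 1 µs per code cycle, $t_m = 1$ µs, a square number of data qubits, and the code distances 13 and 27 from Sec. 4.3.

```python
import math


def unit_preparation_steps(n_qubits: int, t_gates_per_layer: int) -> int:
    """n_T + sqrt(n) + 3 time steps; assumes n is a perfect square."""
    return t_gates_per_layer + math.isqrt(n_qubits) + 3


def time_optimal_units(prep_steps: int, d: int, cycle_us: float = 1.0,
                       t_m_us: float = 1.0, circular: bool = False) -> int:
    """t_u / t_m units for a circular arrangement, one more for a linear row."""
    t_u_us = prep_steps * d * cycle_us  # one time step is d code cycles
    return math.ceil(t_u_us / t_m_us) + (0 if circular else 1)


def steps_per_t_gate(prep_steps: int, t_gates_per_layer: int,
                     n_units: int) -> float:
    """Average time steps per T gate: n_T * (n_u - 1) T gates every t_u."""
    return prep_steps / (t_gates_per_layer * (n_units - 1))


steps = unit_preparation_steps(n_qubits=100, t_gates_per_layer=100)
print(steps)                                        # unit preparation time
print(time_optimal_units(steps, d=13))              # linear row, p = 1e-4
print(time_optimal_units(steps, d=27))              # linear row, p = 1e-3
print(time_optimal_units(steps, d=13, circular=True))
print(steps_per_t_gate(steps, 100, n_units=3))      # relative to 1 step per T gate
```

For the example computation this reproduces the 113 time steps per unit preparation, the 1470 and 3052 units of the linear arrangement, and the 56.5% figure for three units; the circular arrangement needs one unit fewer.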
Summary. In order to speed up an $n$-qubit quantum computation beyond 1 time step per T gate, we parallelize T layers using units. With an average of $n_T$ T gates per layer, a unit consists of $4n + 4\sqrt{n} + 1$ tiles for the data block, $2n_T$ storage tiles for the correction qubits, and enough distillation blocks to distill $n_T$ $|m\rangle$-$|c\rangle$ pairs in the time it takes to prepare a unit, which is $n_T + \sqrt{n} + 3$ time steps. If the unit preparation time is $t_u$ and the time for single-qubit measurements and classical processing is $t_m$, a time-optimal setup consists of $t_u/t_m + 1$ units, executing one T layer every $t_m$. Using fewer units results in a linear space-time trade-off. With $n_u$ units, $n_T \cdot (n_u - 1)$ T gates are performed in $t_u$. A circular arrangement of units can be used for distributed quantum computing. This also reduces the number of correction-qubit storage tiles to $n_T$ and the number of units in a time-optimal setup to $t_u/t_m$. In order to fully exploit the space-time trade-offs discussed in this section, the input circuit should be optimized for T depth.

6 Trade-offs beyond Clifford+T

Under the assumption that measurements and feed-forward can be done in 1 µs, we described how to perform a $10^8$-T-gate computation in just 1 second. A more conservative assumption would be a measurement and feed-forward time of 10 µs, which increases the computation time to 10 seconds. Although this seems fast, many quantum computations have T counts that are significantly higher than $10^8$. While the T count of Hubbard model simulations [2] is indeed in this range, quantum chemistry simulations can be more demanding. In particular, the simulation of FeMoco [1], a structure that plays an important role in nitrogen fixation, can have a T count of up to $10^{15}$. With a serial execution of one T gate every 10 µs, the computation takes 317 years to finish. Even if the gates are grouped into 100 T gates per layer, the computation still takes over 3 years.

While Clifford+T is a gate set that is very well suited for surface codes, it is often not the gate set which is natural to the quantum computations in question. In particular, quantum simulation based on Trotterization consists of many small-angle rotations. In the Clifford+T framework, each small-angle rotation is translated into a series of T gates via gate synthesis. Depending on the desired precision, this can require ∼100 T gates for each rotation [42], which must be executed in series. In order to speed up computations beyond their T count or T depth, it is therefore constructive to consider additional resources for gates other than T gates.

6.1 Clifford+$\varphi$ circuits

Instead of requiring an input circuit that consists of Clifford gates and $\pi/8$ rotations, we consider circuits that consist of Clifford gates and arbitrary $\varphi$ rotations, which we call Clifford+$\varphi$ circuits. Using the procedure in Sec. 1, Clifford gates can be commuted to the end of the circuit, such that we end up with a circuit like the one in Fig. 31. Rotations that mutually commute can be grouped up into layers. The algorithm of Sec. 1 can also be used to reduce the number of layers. It can even reduce the number of rotations, since, if two rotations $P_{\varphi_1}$ and $P_{\varphi_2}$ with the same axis of rotation are moved into the same layer, they can be combined into a single rotation $P_{\varphi_1 + \varphi_2}$. Clifford+$\varphi$ circuits are characterized by rotation count (or $\varphi$ count) and rotation depth (or $\varphi$ depth), rather than T count and T depth.

[Figure 31: circuit diagram with rotation layers 1 and 2.]
Figure 31: Clifford+$\varphi$ circuit. The first two rotation layers ($\varphi$ layers) with three rotations per layer are shown.

Each $\varphi$ rotation can be performed using a $|\varphi\rangle = |0\rangle + e^{i(2\varphi)}|1\rangle$ resource state. When this state is consumed to perform a $P_\varphi$ rotation, there is a 50% chance that a $P_{-\varphi}$ rotation is performed instead. For $\pi/8$ rotations, this is not very problematic, since the correction operation is a $\pi/4$ rotation, which can simply be commuted to the end of the circuit. For general $P_{-\varphi}$, the correction is a $P_{2\varphi}$ rotation, which requires the use of a $|2\varphi\rangle$ state. If this fails, the next correction is a $P_{4\varphi}$ rotation requiring a $|4\varphi\rangle$ state and so on. Thus, a wide variety of resource states is required to execute arbitrary-angle rotations. These can either be pieced together from ordinary magic states $|m\rangle$, or, more efficiently, distilled using specialized protocols [34, 43].

All the schemes discussed in this work can be used with Clifford+$\varphi$ circuits by replacing magic state distillation blocks by distillation blocks that produce resource states for arbitrary-angle rotations. In order to consume these states in a systematic way similar to the post-corrected $\pi/8$ rotations in Fig. 24b, we can use the post-corrected version of $\varphi$ rotations shown in Fig. 32.
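The 50% failure chance per attempt makes the length of the correction cascade a geometric random variable. The following Python sketch (hypothetical function names, not from the paper) estimates the expected cascade length for a single rotation and for a whole layer, assuming that every attempt succeeds independently with probability 1/2. The second function uses a simple union bound, which is an assumption of this sketch, to estimate how many resource states $|\varphi\rangle, |2\varphi\rangle, |4\varphi\rangle, \ldots$ should be available per rotation.

```python
import math


def expected_cascade_steps(rotations: int = 1, max_steps: int = 200) -> float:
    """Expected number of attempts until all `rotations` cascades have ended.

    Each attempt is assumed to succeed independently with probability 1/2,
    so P(a single cascade needs more than k attempts) = 2**-k.
    Uses E[X] = sum over k >= 0 of P(X > k), truncated at max_steps.
    """
    return sum(1.0 - (1.0 - 0.5 ** k) ** rotations for k in range(max_steps))


def resource_states_needed(rotation_count: int, failure_budget: float) -> int:
    """Resource states per rotation such that, by a union bound, the chance
    that any rotation runs out of states is at most failure_budget."""
    return math.ceil(math.log2(rotation_count / failure_budget))


print(expected_cascade_steps(1))            # single rotation
print(expected_cascade_steps(100))          # layer of 100 rotations
print(resource_states_needed(10**6, 0.01))  # grows only logarithmically
```

Under these assumptions a single rotation needs 2 attempts on average, while a layer of 100 rotations needs about 8, and the number of prepared resource states grows only logarithmically with the rotation count.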
[Figure 32, panels: (a) Post-corrected $\varphi$ rotation; (b) $C(P_1, P_2)$ gates via measurements.]
Figure 32: (a) A post-corrected $\varphi$ rotation can be used to decide at a later point whether the performed operation was a $P_\varphi$ or a $P_{-\varphi}$ gate. (b) A $C(P_1, P_2)$ gate can be performed explicitly using a $|+\rangle$ ancilla and Pauli product measurements.

First, the $n$ resource states are entangled with the data qubits via a $C(P, Z^{\otimes n})$ gate. Just like magic state consumption, this can be done every time step, since the data qubits are only part of one measurement in the measurement circuit in Fig. 32b. Next, the $|\varphi\rangle$ state is measured in $Z$. If the outcome of this measurement is $+1$, then the rotation is successful and all other resource states are discarded by measuring them in $X$. If the outcome is $-1$, the $|2\varphi\rangle$ state is measured in $Z$. If the outcome of the $Z$ measurement is $+1$, the correction is successful, and the remaining resource states are discarded by $X$ measurements. For $-1$, the corrections continue with a $Z$ measurement of $|4\varphi\rangle$. Note that, in most cases, this cascade of measurements finishes in the second step. Therefore, on average, it takes $2t_m$ to perform these measurements. However, sufficiently many resource states are required in order to be prepared for the most unlikely situations, in which many measurement steps are required. The probability to require $n$ measurement steps (i.e., $n$ resource states down to $|2^n\varphi\rangle$) is exponentially low, $2^{-n}$. Therefore, the number of resource states that need to be generated for each $\varphi$ rotation scales logarithmically with the rotation count of the circuit, if one wants to stay below a certain probability that any of these rotations is slowed down by a missing resource state. If $|\pi/2^k\rangle$ states are used, the cascade of measurements terminates after $k$ steps. This is also referred to as programmable ancilla rotations [44].

Note that the cascade of measurements can also be postponed to a later point, such that the post-corrected $\varphi$ rotations can be used in the time-optimal scheme. Using the T-count-limited scheme of Sec. 4, we can execute a $\varphi$ rotation every time step. For 100 T gates per $\varphi$ rotation, this speeds up the computation by a factor of 100. Also, the time-optimal setting of Sec. 5 can be used with Clifford+$\varphi$ circuits. However, the execution of a $\varphi$ layer can take more than $2t_m$, as the measurement cascades for all rotations in the layer need to terminate. For instance, for 100 rotations per layer, each layer execution takes, on average, $8t_m$. For 100 T gates per rotation, $\varphi$ layer parallelization reduces the computational time by a factor of 12.5 compared to T layer parallelization, i.e., from over 3 years to 3 months. In the specific case of quantum chemistry simulations, their T count can be reduced by using more advanced algorithms [45–47], which also profit from arbitrary-angle rotations. Thus, if distributed quantum computing is feasible, Clifford+$\varphi$ circuits such as the ones used for quantum chemistry can be executed with qubit counts per quantum computer not far above the numbers reported in Fig. 3. The only difference to Clifford+T units is that larger distillation blocks are required to produce and store the $|\varphi\rangle$ resource states.

Figure 33: $C(P_1, P_2, P_3)$ gate in terms of seven $\pi/8$ rotations.

Multi-controlled Pauli gates. Other gates that are used extensively in quantum algorithms are multi-controlled Paulis, such as Toffoli or CCZ gates. In Fig. 5, we have shown how $C(P_1, P_2)$ gates can be written in terms of $\pi/4$ rotations. A similar decomposition is possible for multi-controlled Pauli gates. In Fig. 33, we show how a $C(P_1, P_2, P_3)$ gate is a product of 7 $\pi/8$ rotations. For instance, $C(Z, Z, X)$ is the Toffoli gate. From the circuit, it is evident that the T depth of $C(P_1, P_2, P_3)$ gates is one [27]. In principle, these doubly-controlled Pauli gates can be written with just four T gates [48], but this increases the number of layers and a similar effect can be obtained by cancelling $\pi/8$ rotations from pairs of doubly-controlled gates in a circuit. Reducing the T count by increasing the circuit
technique of cascading resource state measurements is depth [49] can still be a useful circuit manipulation for

26
T-count-limited setups. We also note that the T count can be reduced by combining gate synthesis and magic state distillation (synthillation) [50, 51].

C(P1, P2, P3, P4) gates, i.e., triply-controlled Pauli gates, can be written as 15 π/16 rotations, as shown in Fig. 34. While the T depth of this circuit is no longer 1, the rotation depth is. In fact, any multi-controlled Pauli gate C(P1, ..., Pn) acting on n qubits can be constructed from 2^n − 1 P_{π/2^n} rotations by following the pattern shown in Figs. 5, 33 and 34. The rotation depth of all these gates is 1. Multi-controlled gates can also be pieced together from C(P1, P2, P3) gates, but this increases the circuit depth. By using small-angle rotations, any multi-controlled Pauli gate can be executed in one step.

Figure 34: C(P1, P2, P3, P4) gate in terms of 15 π/16 rotations.

6.2 Shorter measurements

If the bottleneck of slow classical processing can be overcome, then the only hardware-based restriction to the speed of quantum computation is the time it takes to measure a physical qubit. In the time-optimal scheme, the execution time of each rotation layer is governed by the measurement time. This measurement time only needs to be long if the measurement fidelity is required to be high. In order to speed up the computation, one can use shorter qubit measurements. This exponentially decreases the measurement fidelity. On the other hand, the measurement fidelity of encoded surface-code qubits increases exponentially with the number of qubits comprising the logical qubit. Thus, by using twice as many physical qubits to encode the measured logical qubit, the measurement time can be decreased by a factor of two, doubling the computational speed of the quantum computer. In fact, not all qubits need to use a higher code distance. Only the correction qubits that are measured to execute each rotation layer need to be larger, and only right before they are measured. The physical qubit measurement does not need to be a quantum non-demolition measurement, but can be a destructive measurement. Ultimately, however, the speed of quantum computation is limited by the speed of classical computation. Exploring superconducting logic [52] to speed up classical computation may be a viable route to speed up quantum computers.

Summary. All the schemes discussed in this paper can not only be used with Clifford+T circuits, but also with Clifford+ϕ circuits. The only difference is that more and different resource states are required. Their distillation and storage requires more space than ordinary magic state distillation, but their use can speed up the computation by several orders of magnitude.

7 Conclusion

In this work, we described how full quantum computations can be performed in surface-code-based architectures of different sizes. Previous works on the translation of quantum computations into surface-code schemes [36, 53–55] attempted to optimize the logical qubit arrangement via algorithms that take a quantum circuit as an input. Here, we took a different approach by discussing computational schemes that do not require any prior knowledge about the input circuit. This has the advantage that a resource count with our schemes only requires the T count and T depth of the input circuit, and that the schemes consist of modular blocks that can be optimized independently of each other. In addition, the space-time cost is lower compared to earlier works [19, 36].

Big quantum computers are fast. Starting from the minimal setup in Fig. 20 that consists of a compact data block and a single distillation block, we traded off space versus time, increasing the size of the quantum

[Figure 35 plot: three panels over the schemes A-P, showing the space-time cost, the space cost, and the time cost, each normalized to the minimal setup.]
Legend: A: Compact block + 1 distillation block (Fig. 20). B: Intermediate block + 2 distillation blocks (Fig. 21). C-K: Fast block + 3-11 distillation blocks (Fig. 22). L: 2 units (Figs. 28, 30). M: 3 units. N: 10 units. O: 100 units. P: 1469/1470 units (time-optimal).

Figure 35: Space-time, space, and time cost of the schemes discussed in this paper for the example of a 100-qubit quantum computation with T count 10^8 and T depth 10^6, under the assumption of a 1 µs code cycle time, and a 1 µs measurement and classical processing time. The solid and dashed lines in M-P are for circular (solid) and linear (dashed) arrangements of units.
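
The two ends of the time axis can be checked with a few lines of arithmetic. The sketch below is an illustration, not the paper's own resource estimator. It assumes a code distance of d = 13 (inferred here from 55,400 ≈ 164 · 2 · 13², not stated in the caption), one time step of d code cycles, and 2d² physical qubits per tile. The 164 tiles and 11 time steps per T gate of the minimal setup are the values quoted in the Conclusion; all function names are hypothetical.

```python
# Rough run-time and qubit estimates for the example computation of Fig. 35.
# Assumptions (not stated in the caption): code distance d = 13, one time
# step = d code cycles, one tile = 2 * d**2 physical qubits.

T_COUNT = 10**8          # number of T gates
T_DEPTH = 10**6          # number of T layers
CODE_CYCLE_S = 1e-6      # 1 µs code cycle
MEASUREMENT_S = 1e-6     # 1 µs measurement and classical processing
D = 13                   # assumed code distance


def t_count_limited_seconds(time_steps_per_t_gate, d=D):
    """Run time when T gates are executed one after the other."""
    return T_COUNT * time_steps_per_t_gate * d * CODE_CYCLE_S


def time_optimal_seconds():
    """Run time when every T layer costs one measurement time."""
    return T_DEPTH * MEASUREMENT_S


def physical_qubits(tiles, d=D):
    """Physical qubits (data plus measurement) occupied by `tiles` tiles."""
    return tiles * 2 * d * d


if __name__ == "__main__":
    print(t_count_limited_seconds(11) / 3600)  # scheme A, in hours
    print(t_count_limited_seconds(1) / 60)     # one T gate per time step, in minutes
    print(time_optimal_seconds())              # scheme P, in seconds
    print(physical_qubits(164))                # minimal setup
```

Under these assumptions the minimal setup comes out at roughly 4 hours on about 55,400 physical qubits, one T gate per time step gives about 22 minutes, and the time-optimal limit is 1 second.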

computer and, in return, decreasing the computational time. For the example of a computation with T count 10^8 and T depth 10^6 with an error rate of p = 10^{−4}, the minimal setup consists of 164 tiles and executes one T gate every 11 time steps, corresponding to a computational time of 4 hours with 55,400 physical qubits. From here, the space-time cost is drastically reduced by adding more distillation blocks, as shown in Fig. 35 and Tab. 2. With this strategy, the computational time is reduced to one time step per T gate, where the computational cost of a circuit is governed by its T count.

For further space-time trade-offs, we parallelized T layers using units. This increases the space-time cost, especially for linear arrangements of units (dashed line in Fig. 35), but enables further space-time trade-offs. Linearly trading off space versus time, the computational time can be reduced to one measurement per T layer. Units are well-suited for distributed quantum computing, as the sharing of Bell pairs between neighboring units is part of the parallelization scheme.

This exhausts the space-time trade-offs that are possible within the Clifford+T framework. Switching to Clifford+ϕ circuits can provide further trade-offs, as additional resources are introduced for arbitrary-angle rotations. This can be used to execute circuits in a time proportional to their rotation depth, as opposed to their T depth. We have not investigated how this trade-off affects the space-time cost in our scheme.

Room for optimization. In our T-count-limited schemes and for the preparation of units, one T gate is performed after the other. If the input circuit is known, it is reasonable to assume that qubits can be arranged in a way that allows for the parallel execution of multiple T gates in the same data block. Furthermore, there is a strict separation between tiles used for magic state distillation and tiles used for data blocks in our schemes. By sharing tiles between blocks, the space overhead may be reduced. Moreover, we have only considered a handful of distillation protocols. It would be interesting to see which distillation protocols can be used to optimize the cost function of Eq. (8). Finally, concrete tile layouts that can be used to distill and consume the additional resources necessary for Clifford+ϕ computing are still missing.

Beyond surface codes. Even though we designed our schemes with surface codes in mind, they can, in principle, be applied to other toric-code-based patches, such as Majorana surface-code patches [11] or color-code patches [12, 56]. Color codes can reduce the number of physical qubits due to more compact encoding, but require more elaborate hardware to measure the higher-weight check operators. The space cost is reduced by replacing all surface-code patches by color-code patches, with the exception of Pauli product measurement ancillas. In order to keep the space cost low, measurement ancillas should remain surface-code

scheme   physical qubits                                      computational time
A        55,400                                               4 h
B        76,400                                               2 h
C-K      90,200 - 123,000                                     79 - 22 min
L        447,000                                              12 min
M        679,000 (788,000)                                    490 sec (734 sec)
N-P      2,230,000 - 328,000,000 (2,630,000 - 386,000,000)    147 sec - 1 sec (163 sec - 1 sec)

Table 2: Space and time cost of the schemes plotted in Fig. 35. The numbers in parentheses are for linear arrangements of units (dashed lines in Fig. 35).
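
The normalized space-time cost shown in the top panel of Fig. 35 can be approximated directly from the rounded entries of Table 2. The following sketch multiplies qubit count by run time and divides by the value for scheme A. It assumes that the endpoints of the ranges pair up in order (for example, K corresponds to 123,000 qubits and 22 min), and it uses the circular-arrangement numbers only; the dictionary and function names are illustrative.

```python
# Space-time cost of selected schemes, normalized to scheme A.
# Values are the rounded entries of Table 2 (circular arrangement).
# Assumption: range endpoints pair up in order, e.g. K = (123,000 qubits, 22 min).

SCHEMES = {
    # name: (physical qubits, run time in seconds)
    "A": (55_400, 4 * 3600),
    "B": (76_400, 2 * 3600),
    "C": (90_200, 79 * 60),
    "K": (123_000, 22 * 60),
    "L": (447_000, 12 * 60),
    "M": (679_000, 490),
    "N": (2_230_000, 147),
    "P": (328_000_000, 1),
}


def normalized_spacetime_cost(schemes, reference="A"):
    """Return qubits * seconds for each scheme, divided by the reference value."""
    ref_qubits, ref_seconds = schemes[reference]
    ref_cost = ref_qubits * ref_seconds
    return {
        name: qubits * seconds / ref_cost
        for name, (qubits, seconds) in schemes.items()
    }


if __name__ == "__main__":
    for name, cost in normalized_spacetime_cost(SCHEMES).items():
        print(f"{name}: {cost:.0%}")
```

With these inputs the cost falls to about 20% of the minimal setup at scheme K and rises again to roughly 40% once units are used, which matches the statement that unit-based parallelization trades a higher space-time cost for a shorter run time.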

patches and color-to-surface code lattice surgery [57] should be used during the Pauli product measurement protocol, as described in Ref. [58].

Outlook. If the number of qubits continues to double every 8 months [59], the 60,000 - 300,000 physical qubits necessary for classically intractable Hubbard model simulations with a T count of 10^8 will be available in 7-9 years. If multiple quantum computers can be connected in a network, time-optimal quantum computing becomes available shortly thereafter, facilitating the implementation of more difficult algorithms such as quantum chemistry simulations or Shor's algorithm. Classical processing in terms of measurements, feed-forward and decoding is expected to be a significant roadblock in speeding up quantum computers. Ultimately, faster classical control hardware will be necessary to build faster quantum computers. I hope that the schemes discussed in this work are a useful roadmap towards large-scale quantum computing, and that the patch-based framework is a valuable toolbox to construct surface-code-based implementations of quantum algorithms.

Acknowledgments

This work would not have been possible without insightful discussions with Austin Fowler and Craig Gidney about Pauli product measurements and 15-to-1 distillation, with Jens Eisert, Markus Kesselring and Felix von Oppen about Clifford tracking and space-time trade-offs, with Jeongwan Haah and Matthew Hastings about magic state distillation, with Guang Hao Low and Nathan Wiebe about quantum simulation algorithms, and with Ali Lavasani about few-qubit surface-code architectures. This work has been supported by the Deutsche Forschungsgemeinschaft (Bonn) within the network CRC TR 183.

References

[1] M. Reiher, N. Wiebe, K. M. Svore, D. Wecker, and M. Troyer, Elucidating reaction mechanisms on quantum computers, PNAS 114, 7555 (2017).
[2] R. Babbush, C. Gidney, D. W. Berry, N. Wiebe, J. McClean, A. Paler, A. Fowler, and H. Neven, Encoding electronic spectra in quantum circuits with linear T complexity, arXiv:1805.03662 (2018).
[3] J. Preskill, Reliable quantum computers, Proc. Roy. Soc. Lond. A 454, 385 (1998).
[4] B. M. Terhal, Quantum error correction for quantum memories, Rev. Mod. Phys. 87, 307 (2015).
[5] E. T. Campbell, B. M. Terhal, and C. Vuillot, Roads towards fault-tolerant universal quantum computation, Nature 549, 172 (2017).
[6] A. Y. Kitaev, Fault-tolerant quantum computation by anyons, Ann. Phys. 303, 2 (2003).
[7] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, Surface codes: Towards practical large-scale quantum computation, Phys. Rev. A 86, 032324 (2012).
[8] H. Bombin, Topological order with a twist: Ising anyons from an abelian model, Phys. Rev. Lett. 105, 030403 (2010).
[9] C. Horsman, A. G. Fowler, S. Devitt, and R. V. Meter, Surface code quantum computing by lattice surgery, New J. Phys. 14, 123011 (2012).
[10] B. J. Brown, K. Laubscher, M. S. Kesselring, and J. R. Wootton, Poking holes and cutting corners to achieve Clifford gates with the surface code, Phys. Rev. X 7, 021029 (2017).
[11] D. Litinski and F. v. Oppen, Lattice Surgery with a Twist: Simplifying Clifford Gates of Surface Codes, Quantum 2, 62 (2018).
[12] A. J. Landahl and C. Ryan-Anderson, Quantum computing by color-code lattice surgery, arXiv:1407.5103 (2014).
[13] Y. Li, A magic state's fidelity can be superior to the operations that created it, New J. Phys. 17, 023037 (2015).
[14] D. Herr, F. Nori, and S. J. Devitt, Optimization of lattice surgery is NP-hard, npj Quant. Inf. 3 (2017), 10.1038/s41534-017-0035-1.
[15] S. Bravyi and A. Kitaev, Universal quantum computation with ideal Clifford gates and noisy ancillas, Phys. Rev. A 71, 022316 (2005).
[16] J. Haah and M. B. Hastings, Codes and Protocols for Distilling T, controlled-S, and Toffoli Gates, Quantum 2, 71 (2018).
[17] S. Bravyi and J. Haah, Magic-state distillation with low overhead, Phys. Rev. A 86, 052329 (2012).
[18] C. Jones, Multilevel distillation of magic states for quantum computing, Phys. Rev. A 87, 042305 (2013).
[19] A. G. Fowler, S. J. Devitt, and C. Jones, Surface code implementation of block code state distillation, Scientific Rep. 3, 1939 (2013).
[20] A. G. Fowler, Time-optimal quantum computation, arXiv:1210.4626 (2012).
[21] D. Gottesman, The Heisenberg representation of quantum computers, Proc. XXII Int. Coll. Group. Th. Meth. Phys. 1, 32 (1999).
[22] V. Kliuchnikov, D. Maslov, and M. Mosca, Fast and efficient exact synthesis of single qubit unitaries generated by Clifford and T gates, arXiv:1206.5236 (2012).
[23] V. Kliuchnikov, D. Maslov, and M. Mosca, Asymptotically optimal approximation of single qubit unitaries by Clifford and T circuits using a constant number of ancillary qubits, Phys. Rev. Lett. 110, 190502 (2013).
[24] D. Gosset, V. Kliuchnikov, M. Mosca, and V. Russo, An algorithm for the T-count, arXiv:1308.4134 (2013).
[25] L. Heyfron and E. T. Campbell, An efficient quantum compiler that reduces T count, arXiv:1712.01557 (2017).
[26] M. Amy, D. Maslov, M. Mosca, and M. Roetteler, A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32, 818 (2013).
[27] P. Selinger, Quantum circuits of T-depth one, Phys. Rev. A 87, 042302 (2013).
[28] M. Amy, D. Maslov, and M. Mosca, Polynomial-time T-depth optimization of Clifford+T circuits via matroid partitioning, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 33, 1476 (2014).
[29] D. Litinski and F. von Oppen, Quantum computing with Majorana fermion codes, Phys. Rev. B 97, 205404 (2018).
[30] A. G. Fowler and C. Gidney, Low overhead quantum computation using lattice surgery, in preparation.
[31] A. Lavasani and M. Barkeshli, Low overhead Clifford gates from joint measurements in surface, color, and hyperbolic codes, arXiv:1804.04144 (2018).
[32] E. T. Campbell and M. Howard, Magic state parity-checker with pre-distilled components, Quantum 2, 56 (2018).
[33] A. M. Meier, B. Eastin, and E. Knill, Magic-state distillation with the four-qubit code, Quant. Inf. Comp. 13, 195 (2013).
[34] E. T. Campbell and J. O'Gorman, An efficient magic state approach to small angle rotations, Quantum Science and Technology 1, 015007 (2016).
[35] A. G. Fowler and S. J. Devitt, A bridge to lower overhead quantum computation, arXiv:1209.0510 (2012).
[36] D. Herr, F. Nori, and S. J. Devitt, Lattice surgery translation for quantum computation, New J. Phys. 19, 013034 (2017).
[37] C. H. Bennett, G. Brassard, S. Popescu, B. Schumacher, J. A. Smolin, and W. K. Wootters, Purification of noisy entanglement and faithful teleportation via noisy channels, Phys. Rev. Lett. 76, 722 (1996).
[38] C. H. Bennett, H. J. Bernstein, S. Popescu, and B. Schumacher, Concentrating partial entanglement by local operations, Phys. Rev. A 53, 2046 (1996).
[39] C. Dickel, J. J. Wesdorp, N. K. Langford, S. Peiter, R. Sagastizabal, A. Bruno, B. Criger, F. Motzoi, and L. DiCarlo, Chip-to-chip entanglement of transmon qubits using engineered measurement fields, Phys. Rev. B 97, 064508 (2018).
[40] P. Campagne-Ibarcq, E. Zalys-Geller, A. Narla, S. Shankar, P. Reinhold, L. Burkhart, C. Axline, W. Pfaff, L. Frunzio, R. J. Schoelkopf, and M. H. Devoret, Deterministic remote entanglement of superconducting circuits through microwave two-photon transitions, Phys. Rev. Lett. 120, 200501 (2018).
[41] C. J. Axline, L. D. Burkhart, W. Pfaff, M. Zhang, K. Chou, P. Campagne-Ibarcq, P. Reinhold, L. Frunzio, S. Girvin, L. Jiang, et al., On-demand quantum state transfer and entanglement between remote microwave cavity memories, Nat. Phys. , 705 (2018).
[42] N. J. Ross and P. Selinger, Optimal ancilla-free Clifford+T approximation of z-rotations, arXiv:1403.2975 (2014).
[43] G. Duclos-Cianci and D. Poulin, Reducing the quantum-computing overhead with complex gate distillation, Phys. Rev. A 91, 042315 (2015).
[44] N. C. Jones, J. D. Whitfield, P. L. McMahon, M.-H. Yung, R. V. Meter, A. Aspuru-Guzik, and Y. Yamamoto, Faster quantum chemistry simulation on fault-tolerant quantum computers, New J. Phys. 14, 115023 (2012).
[45] G. H. Low and I. L. Chuang, Hamiltonian simulation by qubitization, arXiv:1610.06546 (2016).
[46] G. H. Low and I. L. Chuang, Optimal Hamiltonian simulation by quantum signal processing, Phys. Rev. Lett. 118, 010501 (2017).
[47] R. Babbush, D. W. Berry, J. R. McClean, and H. Neven, Quantum simulation of chemistry with sublinear scaling to the continuum, arXiv:1807.09802 (2018).
[48] C. Jones, Low-overhead constructions for the fault-tolerant Toffoli gate, Phys. Rev. A 87, 022328 (2013).
[49] C. Gidney, Halving the cost of quantum addition, Quantum 2, 74 (2018).
[50] E. T. Campbell and M. Howard, Unified framework for magic state distillation and multiqubit gate synthesis with reduced resource cost, Phys. Rev. A 95, 022316 (2017).
[51] J. O'Gorman and E. T. Campbell, Quantum computation with realistic magic-state factories, Phys. Rev. A 95, 032338 (2017).
[52] K. K. Likharev and V. K. Semenov, RSFQ logic/memory family: A new Josephson-junction technology for sub-terahertz-clock-frequency digital systems, IEEE Transactions on Applied Superconductivity 1, 3 (1991).
[53] A. Paler, A. G. Fowler, and R. Wille, Synthesis of arbitrary quantum circuits to topological assembly: Systematic, online and compact, Scientific Rep. 7, 10414 (2017).
[54] A. Paler, I. Polian, K. Nemoto, and S. J. Devitt, Fault-tolerant, high-level quantum circuits: form, compilation and description, Quantum Science and Technology 2, 025003 (2017).
[55] L. Lao, B. van Wee, I. Ashraf, J. van Someren, N. Khammassi, K. Bertels, and C. Almudever, Mapping of lattice surgery-based quantum circuits on surface code architectures, arXiv:1805.11127 (2018).
[56] H. Bombin and M. A. Martin-Delgado, Topological quantum distillation, Phys. Rev. Lett. 97, 180501 (2006).
[57] H. P. Nautrup, N. Friis, and H. J. Briegel, Fault-tolerant interface between quantum memories and quantum processors, Nat. Commun. 8, 1321 (2017).
[58] D. Litinski and F. von Oppen, Braiding by Majorana tracking and long-range CNOT gates with color codes, Phys. Rev. B 96, 205413 (2017).
[59] IBM doubling qubits every 8 months, https://www.nextbigfuture.com/2018/02/ibm-doubling-qubits-every-8-months-and-ecommerce-cryptography-at-risk-in-7-15-years.html, accessed: 2018-08-01.

A Surface-code qubits and lattice-surgery operations

To illustrate the translation of our framework to surface-code patches, we show how the protocols of Fig. 2 are implemented with surface codes. This correspondence is explained in detail in Ref. [11].

Bell pair preparation. The first operation demonstrates square patches, qubit initialization and standard lattice surgery. It is shown in Fig. 36. Physical qubits are placed on vertices, light faces correspond to Z stabilizers and dark faces to X stabilizers. Two surface-code patches are initialized in the logical |+⟩ state by initializing all physical qubits in |+⟩ and measuring the stabilizers. Simultaneously, lattice surgery between the two patches is performed, measuring the logical Z ⊗ Z operator as the product of newly introduced Z stabilizers. To account for measurement errors, this is done for d code cycles. Finally, the patch is split into two patches again.

Figure 36: Surface-code implementation of the protocol in Fig. 2a.

Moving boundaries. The protocol to move patches is essentially the same as the previous protocol. It is shown in Fig. 38. Extending the patch via its Z boundary in the second step is the same operation as a Z ⊗ Z lattice surgery between the patch and a rectangular |+⟩ ancilla qubit to the right. This needs to be done for d
code cycles to account for measurement errors. Finally, the patch is shortened again by measuring the left 2/3 of physical qubits in the X basis.

Figure 38: Surface-code implementation of the protocol in Fig. 2b.

Y measurements. The third protocol in Fig. 37 shows patch deformation and lattice surgery involving the Y operator. First, a patch is deformed to a wider patch by initializing physical qubits in the X basis and measuring the new stabilizers, which takes d code cycles. Note that the wide patch only occupies (2d−1)×d physical qubits. Below the wide patch, a rectangular ancilla patch is initialized in the |0⟩ state. A column of physical qubits in the center is missing, so that, in the next step, the ancilla can be used for twist-based lattice surgery [11], measuring the Y operator. This lattice surgery in the third step involves dislocation operators and a five-qubit twist defect. Even though these stabilizers are irregular, they can still be measured in a square lattice of physical qubits with nearest-neighbor couplings, as we show in Fig. 39. For the measurement of twist operators and wide X and Z stabilizers, up to three measurement ancillas can be used.

Figure 37: Surface-code implementation of the protocol in Fig. 2c.

Figure 39: Twist-based lattice surgery in a square lattice of qubits with nearest-neighbor couplings. The black dots are physical data qubits and the white dots are physical measurement qubits.

Moving corners. The movement of corners of a surface-code patch is shown in Fig. 41. It corresponds to a change of boundary stabilizers. In order to account for measurement errors of the newly measured stabilizers, this requires d code cycles.

Six-corner patches and shortened boundaries. A six-corner patch corresponds to a surface-code patch with three X boundaries and three Z boundaries. A distance-5 patch is shown in Fig. 42, which uses 2d^2 physical data qubits to encode two logical qubits. Shortened boundaries correspond to decreasing the length of certain surface-code boundaries, making them susceptible to errors. With the shortened boundaries in Fig. 42,
the distance to X errors is reduced to 2. Note that, in this case, the qubit occupies more than d^2 physical data qubits. This will not be the case for the Pauli product measurement protocol in Appendix B, where shortened patches are used.

Figure 40: Left: Naive, straightforward translation of the Pauli product measurement protocol in Fig. 8 to surface codes. Right: Topologically equivalent protocol with a reduced space cost. This way, any Pauli product measurement only requires a free ancilla region of width d.

Figure 41: Surface-code implementation of the protocol in Fig. 2d.

Figure 42: Surface-code implementation of six-corner patches and shortened boundaries in Fig. 2e.

B Pauli product measurement protocol

A straightforward implementation of the Pauli product measurement protocol of Fig. 8 is shown in the left two panels of Fig. 40. Here, an 8-corner ancilla patch is initialized in the |+⟩^{⊗3} state by initializing all physical qubits in |+⟩ and measuring the stabilizers. Next, a set of lattice surgeries is performed, yielding the desired measurement outcomes. Note that in this naive one-to-one translation, there are small gaps between the patches to account for the shortened edges of the ancilla qubits. This can be avoided by, instead, performing the Pauli product measurement protocol in one combined step, as shown in the right panels of Fig. 40. As pointed out in Ref. [30], it is entirely equivalent to connect the patches involved in the Pauli product measurement, such that the resulting patch has the same shape as the configuration in the bottom left panel of Fig. 40. Here, the patches shown in the two bottom panels of Fig. 40 have the exact same boundaries, except that the boundary lengths are different. In both cases, the separation between boundaries is such that the code distance is
not decreased. Therefore, for Pauli product measurements, it is sufficient to leave a free ancilla region of width d between the qubits that are part of the measurement.

C Proof-of-principle device

Here, we discuss how (3d − 1) · 2d physical data qubits can be used to build a proof-of-principle device that is a universal two-qubit error-corrected quantum computer that uses undistilled magic states and can demonstrate all the operations required for large-scale quantum computing. We go through the example of a computation that starts with three π/8 rotations around Z ⊗ Z, Y ⊗ X and Y ⊗ Y in Fig. 43. For the first rotation, we need to measure Z_1 ⊗ Z_2 ⊗ Z_{|m⟩}. A magic state is initialized in a long patch in step 2, which is equivalent to initializing a magic state and measuring X ⊗ X between the magic state and neighboring |0⟩ ancillas. This effectively encodes the magic state in a three-qubit repetition code with a logical Z operator Z_L = Z ⊗ Z ⊗ Z. To consume the magic state and thereby perform the Z ⊗ Z rotation, Z_1 ⊗ Z_2 ⊗ Z_L is measured in step 3.

Figure 43: Proof-of-principle two-qubit device implemented with 48 physical data qubits.

The next rotation is a Y ⊗ X rotation. Here, we first need to deform |q_1⟩, such that both the X and Z boundaries of the qubit are accessible. Qubit |q_2⟩ is rotated in steps 5-8 using the protocol in Fig. 9b. In step 9, again, a magic state is initialized in a two-qubit repetition code with Z_L = Z_{a1} ⊗ Z_{a2}. In step 10, the magic state is consumed via a Y_1 ⊗ Z_{a1} and an X_2 ⊗ Z_{a2} measurement.

This kind of protocol consisting of patch deformations and patch rotations can be used to perform any π/8 rotation with the exception of (Y ⊗ Y)_{π/8}, since there is not enough space to make both Y operators accessible for lattice surgery. For this rotation, we first explicitly execute a Clifford gate to change (Y ⊗ Y)_{π/8} to any other rotation. Any Clifford gate that does not commute with Y ⊗ Y will do. In our example, we choose a Z_{π/4} rotation. It is performed by initializing a |0⟩ state in step 13, and measuring Z_1 ⊗ Y between |q_1⟩ and the ancilla, following the protocol of Fig. 9c.

This demonstrates that a proof-of-principle experiment can be built with 48 physical data qubits. In general, this requires 6d^2 − 2d qubits, i.e., 48 for d = 3, 140 for d = 5 and 280 for d = 7. If measurement qubits are required for syndrome readout, the number of physical qubits roughly doubles.
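
The qubit counts quoted above follow directly from the product (3d − 1) · 2d. The short sketch below recomputes them and adds a rough total under the assumption, made here only for illustration, that every data qubit is accompanied by exactly one measurement qubit; the function names are hypothetical.

```python
# Data-qubit count of the proof-of-principle device as a function of the
# code distance d, using the expression (3d - 1) * 2d from the text.

def data_qubits(d):
    """Physical data qubits of the two-qubit device."""
    return (3 * d - 1) * 2 * d


def total_qubits_estimate(d):
    """Rough total, assuming one measurement qubit per data qubit."""
    return 2 * data_qubits(d)


if __name__ == "__main__":
    for d in (3, 5, 7):
        print(d, data_qubits(d), total_qubits_estimate(d))
```

For d = 3, 5 and 7 this gives 48, 140 and 280 data qubits, and roughly twice as many physical qubits once measurement qubits are included.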

D Implementation of the 7-to-1 protocol

Even though the distillation of |Y⟩ = |0⟩ + i|1⟩ states has no use in our framework, we show how to implement the 7-to-1 distillation protocol for benchmarking purposes in Fig. 44. The protocol is based on the 7-qubit Steane code. Its X stabilizers are the faces shown in Fig. 44a, and its logical X operator can be chosen as the X ⊗ X ⊗ X operator with support on the three qubits drawn in red.

Following the procedure in Sec. 3, the distillation circuit is obtained by initializing m_x + k = 4 qubits in the |+⟩ state, where the first three qubits are associated with the three X stabilizers, and the last qubit is associated with the logical X operator. For each qubit of the Steane code, the circuit contains a π/4 rotation with Z's on each stabilizer and logical operator that the qubit is part of. The three qubits in the corners of the triangle are only part of a single stabilizer and no logical operator, therefore they contribute with single-qubit Z_{π/4} rotations, which can be absorbed into the initial state. The remaining four rotations are shown in Fig. 44c.

Figure 44: The Steane code (a) is the basis of 7-to-1 distillation (c). In our framework, the corresponding distillation block (b) uses 7 tiles for 4 time steps.

A distillation block that can be used for this protocol is shown in Fig. 44b. Since the consumption of |Y⟩ resource states requires no Clifford correction, this block consists of only 7 tiles. With four rotations, the leading order of the space-time cost of this protocol is 7d^2 · 4d = 28d^3.
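
The construction of the rotations from the code can be made concrete with a small sketch. The qubit labeling below is an assumption, since Fig. 44a is not reproduced here: qubits 0-2 are the corners of the triangle, 3-5 the edge midpoints and 6 the center, each face contains one corner, its two adjacent midpoints and the center, and the logical X operator is taken to act on the three midpoints. With this choice, one rotation axis is generated per code qubit, with a Z on every stabilizer or logical-operator qubit whose support contains that code qubit.

```python
from itertools import combinations

# Assumed labeling of the Steane code on a triangle (not taken from Fig. 44a):
# 0, 1, 2 = corners; 3, 4, 5 = edge midpoints; 6 = center.
FACES = [{0, 3, 5, 6}, {1, 3, 4, 6}, {2, 4, 5, 6}]  # X stabilizers
LOGICAL_X = {3, 4, 5}                                # assumed logical X support
GENERATORS = FACES + [LOGICAL_X]                     # m_x + k = 4 circuit qubits


def check_code():
    """Sanity checks: all overlaps are even, so the operators commute with
    the Z stabilizers, which have the same supports as the faces."""
    for a, b in combinations(FACES, 2):
        assert len(a & b) % 2 == 0
    for face in FACES:
        assert len(face & LOGICAL_X) % 2 == 0


def rotation_axes():
    """One Pauli-Z product per code qubit, acting on the four circuit qubits."""
    return [
        "".join("Z" if q in support else "I" for support in GENERATORS)
        for q in range(7)
    ]


def leading_spacetime_cost(d, tiles=7, time_steps=4):
    """Leading-order cost: tiles * d^2 qubits for time_steps * d code cycles."""
    return tiles * d**2 * time_steps * d


check_code()
axes = rotation_axes()
absorbed = [a for a in axes if a.count("Z") == 1]   # single-qubit rotations
remaining = [a for a in axes if a.count("Z") > 1]   # multi-qubit rotations

if __name__ == "__main__":
    print("absorbed into initial state:", absorbed)
    print("remaining rotations:", remaining)
    print("cost for d = 3:", leading_spacetime_cost(3))
```

Under this labeling, three of the seven rotations are single-qubit rotations and four multi-qubit rotations remain, consistent with the count given above. The specific axes depend on the assumed labeling and may differ from those drawn in Fig. 44c by a permutation of qubits.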

