A_Game_of_Surface_Codes_Large-Scale_Quantum_Comput
Given a quantum gate circuit, how does one execute it in a fault-tolerant architecture with as little overhead as possible? This paper is a collection of strategies for surface-code quantum computing on small, intermediate and large scales. They are strategies for space-time trade-

perform in a surface-code architecture. There exist several encoding schemes for surface codes, among others, defect-based [7], twist-based [8] and patch-based [9] encodings. In this work, we focus on the latter. Surface-code patches have a low space
arXiv:1808.02892v1 [quant-ph] 8 Aug 2018
ator. While the square patch only occupies one tile, four-corner patches can also be shaped to, e.g., occupy three tiles (b). A six-corner patch (c) represents two qubits. The first qubit's Pauli operators X_1 and Z_1 are represented by the two top edges, while the second qubit's operators X_2 and Z_2 are found in the two bottom edges. The general rule that assigns the operators of N qubits to the edges of a (2N + 2)-corner patch is given in Fig. 1d. Going clockwise, the dashed boundaries correspond to X_1, X_1 X_2, X_2 X_3, ..., X_{N−1} X_N and X_N. Starting to the right of X_1, the solid edges correspond to Z_1, Z_2, ..., Z_N and the product Z_1 Z_2 ··· Z_N.

In the following, we specify the operations that can be used to manipulate the qubits represented by patches. Some of these operations take a time step to complete, whereas others can be performed instantly. The goal is to implement quantum algorithms using as few tiles and time steps as possible. There are three types of operations: qubit initialization, qubit measurement and patch deformation.

I. Qubit initialization:

[Figure panels: (a) Bell state preparation; (b) Qubit movement; (c) Y basis measurement; (d) Moving corners; (e) Shortened edges (normal, shortened)]
are initialized in the |+⟩ state. Next, the operator Z ⊗ Z is measured. Before the measurement, the qubits are in the state |+⟩ ⊗ |+⟩ = (|00⟩ + |01⟩ + |10⟩ + |11⟩)/2. If the measurement outcome is +1, the qubits end up in the state (|00⟩ + |11⟩)/√2. For the outcome −1, the state is (|01⟩ + |10⟩)/√2. In both cases, the two qubits are in a maximally entangled Bell state. This protocol takes 1 time step to complete. The second example (b) is the movement of a square patch into a different tile. For this, the square patch is enlarged by patch deformation, which takes 1 time step, and then made smaller again at no time cost. The third example (c) is the measurement of a square patch in the Y basis. For this, the patch is deformed such that the X and Z edge are on the same side of the patch. An ancillary patch is initialized in the |0⟩ state and the operator Z ⊗ Y between the ancilla and the qubit is measured. The ancilla is discarded by measuring it in the Z basis.

Translation to surface codes. Protocols designed within this framework can be straightforwardly translated into surface-code operations. The exact correspondence between our framework and surface-code patches is specified in Appendix A, but it is not crucial to the understanding of this paper. Essentially, patches correspond to surface-code patches with dashed and solid edges as rough and smooth boundaries. Thus, for surface codes with a code distance d, each tile corresponds to d^2 physical data qubits. Pauli product measurements that take 1 time step to complete correspond to (twist-based) lattice surgery [9, 11], which requires d code cycles. Thus, 1 time step corresponds to d code cycles. Qubit initialization has no time cost, since, in case of X and Z eigenstates, it can be done simultaneously with the following lattice surgery [9, 12]. For arbitrary states, initialization corresponds to state injection [12, 13]. Its time cost does not scale with d. Similarly, single-qubit measurements in the X or Z basis correspond to the simultaneous measurement of all physical data qubits in the corresponding basis and some classical error correction, which does not scale with d either. Patch deformation is code deformation, which requires d code cycles, unless the patch becomes smaller in the process, in which case it corresponds to single-qubit measurements.

In essence, the framework can be used to estimate the space-time cost of a computation. The leading-order term of the space-time cost (the term that scales with d^3) of a protocol that uses s tiles for t time steps is st · d^3 in terms of (physical data qubits)·(code cycles).

Overview

Having established the rules of the game and the correspondence of our framework to surface-code operations, our goal is to find implementations of arbitrary quantum computations. In this work, we discuss strategies to tackle the following problem: Given a quantum circuit, how does one execute it as fast as possible on a surface-code-based quantum computer of a certain size? This is an optimization problem that was shown to be NP-hard [14], so the focus is rather on finding heuristics. The content of this paper is outlined in Fig. 3.

The input to our problem is an arbitrary gate circuit corresponding to the computation. We refer to the qubits that this circuit acts on as data qubits. As we review in Sec. 1, the natural universal gate set for surface codes is Clifford+T, where Clifford gates are cheap and T gates are expensive. In fact, Clifford gates can be treated entirely classically, and T gates require the consumption of a magic state |0⟩ + e^{iπ/4}|1⟩. Only faulty (undistilled) magic states can be prepared in our framework. To generate higher-fidelity magic states for large-scale quantum computation, a lengthy protocol called magic state distillation [15] is used.

It is therefore natural to partition a quantum computer into a block of tiles that is used to distill magic states (a distillation block) and a block of tiles that hosts the data qubits (a data block) and consumes magic states. The speed of a quantum computer is governed by how fast magic states can be distilled, and how fast they can be consumed by the data block.

In Sec. 2, we discuss how to design data blocks. In particular, we show three designs: compact, intermediate and fast blocks. The compact block uses 1.5n + 3 tiles to store n qubits, but takes up to 9 time steps to consume a magic state. Intermediate blocks use 2n + 4 tiles and require up to 5 time steps per magic state. Finally, the fast block uses 2n + √(8n) + 1 tiles, but requires only 1 time step to consume a magic state. The compact block is an option for early quantum computers with few qubits, where the generation of a single magic state takes longer than 9 time steps. The fast block has a better space-time overhead, which makes it more favorable on larger scales.

Data blocks need to be combined with distillation blocks for universal quantum computing. In Sec. 3, we discuss designs of distillation blocks. Since magic state distillation is the main operation of a surface-code-based quantum computer, it is important to minimize its space-time cost. We discuss distillation protocols based on error-correcting codes with transversal T gates, such as punctured Reed-Muller codes [15, 16] and block codes [17–19]. In comparison to braiding-based implementations of distillation protocols, we reduce the space-time cost by up to 90%.

A data block combined with a distillation block constitutes a quantum computer in which T gates are performed one after the other. At this stage, the quantum computer can be sped up by increasing the number of distillation blocks, effectively decreasing the time
Sec. 1: Clifford+T circuits | Sec. 2: Data blocks | Sec. 3: Distillation blocks
Example: 100 qubits, 10^8 T gates

Setting              Sec. 4: Trade-offs limited by T count    Sec. 5: Trade-offs limited by T depth       Sec. 6: Trade-offs beyond Clifford+T
p = 10^-4, d = 13    55,000 qubits, 4 hours                   1500 × 220,000 = 330m qubits, 1 second      ···
                     120,000 qubits, 22 minutes
p = 10^-3, d = 27    310,000 qubits, 7 hours                  3000 × 1,500,000 ≈ 4.5b qubits, 1 second    ···
                     1,000,000 qubits, 45 minutes
Also shown: ∼100 qubits (Appendix C)

Figure 3: Overview of the content of this paper. To illustrate the space-time trade-offs discussed in this work, we show the number of physical qubits and the computational time required for a circuit of 10^8 T gates distributed over 10^6 T layers. We consider physical error rates of p = 10^-4 and p = 10^-3, for which we need code distances d = 13 and d = 27, respectively. We assume that each code cycle takes 1 µs.
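The time figures in Fig. 3 can be roughly re-derived from the conventions of the framework: one time step corresponds to d code cycles, a code cycle is assumed to take 1 µs, and the T-depth-limited scheme executes one T layer per measurement time. The following Python lines are a minimal sketch of that arithmetic; the function and constant names are illustrative and not taken from the paper, and the last line only infers how many time steps per T gate the quoted 4-hour figure implies.

```python
# Rough re-derivation of the time and qubit figures quoted for Fig. 3.
T_COUNT = 10**8          # number of T gates in the example circuit
T_LAYERS = 10**6         # number of T layers in the example circuit
CODE_CYCLE_US = 1        # assumed code cycle time in microseconds
MEASUREMENT_US = 1       # assumed measurement + classical processing time


def t_count_limited_seconds(distance, steps_per_t_gate=1):
    """Runtime if every T gate costs `steps_per_t_gate` time steps of d code cycles."""
    return T_COUNT * steps_per_t_gate * distance * CODE_CYCLE_US / 1e6


def t_depth_limited_seconds():
    """Runtime if one T layer is executed per measurement time."""
    return T_LAYERS * MEASUREMENT_US / 1e6


for d in (13, 27):
    minutes = t_count_limited_seconds(d) / 60
    print(f"d = {d}: {minutes:.0f} minutes at 1 time step per T gate")

print(f"T-depth limit: {t_depth_limited_seconds():.0f} second")
print(f"parallel units, d = 13: {1500 * 220_000:,} qubits")
print(f"parallel units, d = 27: {3000 * 1_500_000:,} qubits")

# Time steps per T gate implied by the quoted 4 hours at d = 13 (an inference).
implied_steps = 4 * 3600 / t_count_limited_seconds(13)
print(f"implied time steps per T gate for the 4-hour figure: {implied_steps:.1f}")
```

Under these assumptions the sketch reproduces the 22-minute and 45-minute entries for d = 13 and d = 27 and the 1-second entry of the T-depth-limited column.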
it takes to distill a single magic state, as we discuss in Sec. 4. In order to illustrate the resulting space-time trade-off, we consider the example of a 100-qubit computation with 10^8 T gates, which can already be used for classically intractable computations [2]. Assuming an error rate of p = 10^-4 and a code cycle time of 1 µs, a compact data block together with a distillation block can finish the computation in 4 hours using 55,000 physical qubits (see footnote 1). Adding 10 more distillation blocks increases the qubit number to 120,000 and decreases the computational time to 22 minutes, using 1 time step per T gate.

Footnote 1: We will assume that the total number of physical qubits is twice the number of physical data qubits. This is consistent with superconducting qubit platforms, where the use of measurement ancillas doubles the qubit count. If a platform does not require the use of ancilla qubits, the total qubit count is reduced by 50% compared to the numbers reported in this paper.

For further space-time trade-offs in Sec. 5, we exploit that the T gates of a circuit are arranged in layers of gates that can be executed simultaneously. This enables linear space-time trade-offs down to the execution of one T layer per qubit measurement time, effectively implementing Fowler's time-optimal scheme [20]. If the 10^8 T gates are distributed over 10^6 layers, and measurements (and classical processing) can be performed in 1 µs, up to 1500 units of 220,000 qubits can be run in parallel. This way, the computational time can be brought down to 1 second using 330 million qubits. While this is a large number, the units do not necessarily need to be part of the same quantum computer, but can be distributed over up to 1500 quantum computers with 220,000 qubits each, and with the ability to share Bell pairs between neighboring computers.

In Sec. 6, we discuss further space-time trade-offs that are beyond the parallelization of Clifford+T circuits. In particular, we discuss the use of Clifford+ϕ circuits, i.e., circuits containing arbitrary-angle rotations beyond T gates. These require the use of additional resources, but can speed up the computation. We also discuss the possibility of hardware-based trade-offs by using higher code distances, but in turn shorter measurements with a decreased measurement fidelity. Ultimately, the speed of a quantum computer is limited by classical processing, which can only be solved by faster classical computing.

Finally, we note that while the qubit numbers required for useful quantum computing are orders of magnitude above what is currently available, a proof-of-principle two-qubit device demonstrating all necessary operations using undistilled magic states can be built with 48 physical data qubits, see Appendix C.

1 Clifford+T quantum circuits

Our goal is to implement full quantum algorithms with surface codes. The input to our problem is the algorithm's quantum circuit. The universal gate set Clifford+T is well-suited for surface codes, since it separates easy operations from difficult ones. Often, this set is generated using the Hadamard gate H, phase gate S, controlled-NOT (CNOT) gate, and the T gate. Instead,
[Figure 4 panels: (a), (b) and (c), with rules conditioned on whether P P′ = P′ P or P P′ = −P′ P, and on P_1 P′ = −P′ P_1 and P_2 P′ = −P′ P_2]

Figure 4: A generic circuit consists of π/4 rotations (orange), π/8 rotations (green) and measurements (blue). The Pauli product in each box specifies the axis of rotation or the basis of measurement. If the Pauli operator is −P instead of P, a minus sign is found in the corner of the box, such that, e.g., Z_{−π/4} corresponds to an S† gate. Using the commutation rules in (a/b), all Clifford gates can be moved to the end of the circuit. Using (c), the Clifford gates can be absorbed by the final measurements.
we choose to write our circuits using Pauli product rotations P_ϕ (see Fig. 5), because it simplifies circuit manipulations. Here, P_ϕ = exp(−iPϕ), where P is a Pauli product operator (such as Z, Y ⊗ X, or X ⊗ 1 ⊗ X) and ϕ is an angle. In this sense, S = Z_{π/4}, T = Z_{π/8}, and H = Z_{π/4} · X_{π/4} · Z_{π/4}. The CNOT gate can also be written in terms of Pauli product rotations as CNOT = (Z ⊗ X)_{π/4} · (1 ⊗ X)_{−π/4} · (Z ⊗ 1)_{−π/4}. In fact, we can more generally define P_1-controlled-P_2 gates as C(P_1, P_2) = (P_1 ⊗ P_2)_{π/4} · (1 ⊗ P_2)_{−π/4} · (P_1 ⊗ 1)_{−π/4}. The CNOT gate is the specific case of C(Z, X).

[Figure panel: (a) Single-qubit rotations]

into (P′P_1)_ϕ. If P′ anticommutes with both P_1 and P_2, P′_ϕ turns into (P′P_1P_2)_ϕ.

After moving the Clifford gates to the right, the resulting circuit consists of three parts: a set of π/8 rotations, a set of π/4 rotations, and Z measurements. Because Clifford gates map Pauli operators onto other Pauli operators, the Clifford gates can be absorbed by
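The identification of S, T, H and CNOT with Pauli product rotations holds up to a global phase, which can be checked numerically. The following NumPy sketch builds P_ϕ = exp(−iPϕ) from the identity exp(−iPϕ) = cos(ϕ)·1 − i sin(ϕ)·P (valid because P² = 1) and compares the products with the standard gate matrices. The helper names `rotation`, `same_up_to_phase` and `controlled` are illustrative and not from the paper.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def rotation(pauli, phi):
    """P_phi = exp(-i P phi) = cos(phi) 1 - i sin(phi) P, using P^2 = 1."""
    return np.cos(phi) * np.eye(pauli.shape[0]) - 1j * np.sin(phi) * pauli


def same_up_to_phase(a, b):
    """True if the unitaries a and b differ only by a global phase."""
    overlap = np.trace(a.conj().T @ b)
    return bool(np.isclose(abs(overlap), a.shape[0]))


def controlled(p1, p2):
    """C(P1, P2) = (P1 x P2)_{pi/4} (1 x P2)_{-pi/4} (P1 x 1)_{-pi/4}."""
    return (rotation(np.kron(p1, p2), np.pi / 4)
            @ rotation(np.kron(I2, p2), -np.pi / 4)
            @ rotation(np.kron(p1, I2), -np.pi / 4))


# Standard gate matrices (first qubit is the control of the CNOT).
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])
H = (X + Z) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

h_from_rotations = (rotation(Z, np.pi / 4)
                    @ rotation(X, np.pi / 4)
                    @ rotation(Z, np.pi / 4))

print(same_up_to_phase(rotation(Z, np.pi / 4), S))
print(same_up_to_phase(rotation(Z, np.pi / 8), T))
print(same_up_to_phase(h_from_rotations, H))
print(same_up_to_phase(controlled(Z, X), CNOT))
```

Choosing Z as the target Pauli, i.e. C(Z, Z), gives the controlled-Z gate rather than the CNOT under the same check.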
[Figure 6 labels: layer 1, layer 2, layer 3, layer 4; layer 1, layer 2]

Figure 6: Clifford+T circuits can be written as a number of consecutive π/8 rotations. These gates are grouped into layers of mutually commuting rotations. A simple greedy algorithm can be used to reduce the number of layers, i.e., the T depth.
the final measurements, turning Z measurements into Pauli product measurements. The commutation rules of this final step are shown in Fig. 4c and are similar to the commutation of Clifford gates past rotations.

T count and T depth. Thus, every n-qubit circuit can be written as a number of consecutive π/8 rotations and n final Pauli product measurements, as shown in Fig. 6. We refer to the number of π/8 rotations as the T count. An important part of circuit optimization is the minimization of the T count, for which there exist various approaches [22–25]. The π/8 rotations of a circuit can be grouped into layers. All π/8 rotations that are part of a layer need to mutually commute. The number of π/8 layers of a circuit is strictly speaking not the same quantity as the T depth, but we will still refer to it as the T depth and to π/8 layers as T layers.

When partitioning π/8 rotations into layers, the naive approach often yields more layers than are necessary. For instance, a naive partitioning of the first 6 T gates of Fig. 6 yields 4 layers. A few commutations can bring the number down to 2 layers. There are a number of algorithms for the optimization of the T depth [26–28]. Here, we use a simple greedy algorithm to reduce the number of layers:

repeat
    for each layer i do
        for each rotation j in layer i + 1 do
            if (rotation j commutes with all rotations in layer i) then
                Move rotation j from layer i + 1 to layer i;
            end
        end
    end
until the partitioning no longer changes;

Note that when a reordering puts two equal π/8 rotations into the same layer, they can be combined into a π/4 rotation that is commuted to the end of the circuit, thereby decreasing the T count. As we discuss in Sec. 6, this kind of algorithm can not only be used with π/8 rotations, but, in principle, with arbitrary Pauli product rotations. The reduction of the circuit depth in terms of non-π/8 rotations can be useful when going beyond Clifford+T circuits.

1.1 Pauli product measurements

When implementing circuits like Fig. 6 with surface codes, one obstacle is that π/8 rotations are not directly part of the set of available operations. Instead, one uses magic states [15] as a resource. These states are π/8-rotated Pauli eigenstates |m⟩ = |0⟩ + e^{iπ/4}|1⟩. They can be consumed in order to perform P_{π/8} rotations. The corresponding circuit [29] is shown in Fig. 7. A P_{π/8} rotation corresponds to a P ⊗ Z measurement involving the magic state. If the measurement outcome is P ⊗ Z = −1, then a corrective P_{π/4} operation is necessary. Since this is a Clifford gate, it can be simply commuted to the end of the circuit, changing the axes of the following π/8 rotations. Finally, in order to discard the magic state, it is disentangled from the rest of the system by an X measurement. Here, an outcome X = −1 prompts a P_{π/2} correction. π/2 rotations correspond to Pauli operators, i.e., P_{π/2} = P. The Pauli correction can also be commuted to the end of the circuit. When P_{π/2} is moved past a P′ rotation or measurement, it changes the axis of rotation or measurement basis to −P′ if P and P′ anticommute.

Figure 7: Circuit to perform a π/8 rotation by consuming a magic state.

[Figure 8, panel (a): Measurement of Z|q1⟩ ⊗ Y|q2⟩ ⊗ X|q4⟩ ⊗ Z|m⟩]
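Both the sign rule for Pauli corrections and the greedy layering procedure of Sec. 1 rest on a single test: whether two Pauli products commute. The following Python lines are a minimal sketch of that test and of the repeat/until procedure, under some assumptions that are not from the paper: Pauli products are written as strings over "I", "X", "Y", "Z" (with "I" standing for the identity 1), the initial "naive" partitioning opens a new layer whenever the next rotation does not commute with the whole current layer, and equal rotations that end up in one layer are not merged into π/4 rotations.

```python
def commutes(p, q):
    """Two Pauli strings commute iff they hold different non-identity
    Paulis on an even number of qubits."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def naive_layers(rotations):
    """Assumed naive partitioning: append to the current layer while possible."""
    layers = []
    for r in rotations:
        if layers and all(commutes(r, s) for s in layers[-1]):
            layers[-1].append(r)
        else:
            layers.append([r])
    return layers


def greedy_layers(rotations):
    """Greedy layer reduction following the repeat/until pseudocode."""
    layers = naive_layers(rotations)
    changed = True
    while changed:                      # repeat ... until nothing changes
        changed = False
        for i in range(len(layers) - 1):
            for r in list(layers[i + 1]):
                if all(commutes(r, s) for s in layers[i]):
                    layers[i + 1].remove(r)   # move rotation to layer i
                    layers[i].append(r)
                    changed = True
        layers = [layer for layer in layers if layer]  # drop emptied layers
    return layers


example = ["ZI", "XI", "IX", "IZ"]      # axes of four pi/8 rotations
print(naive_layers(example))
print(greedy_layers(example))
```

For this four-rotation example the naive partitioning has three layers and the greedy pass reduces it to two, since IX commutes with ZI and IZ commutes with XI.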
Pauli product measurements in 1 time step. In essence, if magic states are available, the only operations required for universal quantum computing are Pauli product measurements. Using a (2n)-corner patch as an ancilla, an n-qubit Pauli product can be measured in 1 time step [30]. An example is shown in Fig. 8. Suppose we have four qubits |q1⟩ to |q4⟩ in four two-tile four-corner patches, and we need to perform a (Z ⊗ Y ⊗ 1 ⊗ X)_{π/8} rotation. According to the circuit in Fig. 7, this is done by measuring Z|q1⟩ ⊗ Y|q2⟩ ⊗ X|q4⟩ ⊗ Z|m⟩ between the four qubits and a magic state. Note that we only want to measure the Pauli product without learning anything about the individual Pauli operators Z|q1⟩, Y|q2⟩, X|q4⟩ and Z|m⟩.

To this end, an 8-corner ancilla patch is initialized in the |+⟩^{⊗3} state. The shape of this patch is chosen, such that each of the four Z edges is adjacent to one of the four operators that are part of the measurement. Note that this means that some of the X edges are shortened, such that the qubits are susceptible to X errors. In this case, this is not a problem, since the qubits are initialized in X eigenstates and random X errors will cause no change to the states. Next, in step 3, we measure the four Pauli products Z|q1⟩ ⊗ Z_1, Y|q2⟩ ⊗ Z_2, Z|m⟩ ⊗ Z_3 and X|q4⟩ ⊗ (Z_1 · Z_2 · Z_3). Because the ancilla is initialized in an X eigenstate, the operators Z_1, Z_2 and Z_3 are unknown, and the outcome of each of the four aforementioned measurements is entirely random. However, multiplying the four measurement outcomes yields Z|q1⟩ ⊗ Y|q2⟩ ⊗ X|q4⟩ ⊗ Z|m⟩ ⊗ (Z_1 · Z_2 · Z_3 · Z_1 · Z_2 · Z_3), which is precisely the operator Z|q1⟩ ⊗ Y|q2⟩ ⊗ X|q4⟩ ⊗ Z|m⟩ that we wanted to measure. Finally, to discard the ancilla patch we measure its three qubits in the X basis. Again, X errors will have no effect, as they commute with the measurement basis. Measurement outcomes of X_i = −1 prompt a Pauli correction. If in the previous step, the Z_i edge was measured together with a Pauli operator P, the correction is a P_{π/2} gate. For instance, if in Fig. 8 the final measurements yield X_2 = −1 and X_3 = −1, the corrections are a Y_{π/2} rotation on |q2⟩ and a Z_{π/2} rotation on |m⟩.

This type of protocol can be used to measure any product of n Pauli operators. An ancilla patch needs to be initialized in the |+⟩^{⊗n} state with Z edges adjacent to the n operators that are part of the measurement. We show the concrete surface-code implementation of the example of Fig. 8 in Appendix B.

[Figure 8 panels: Steps 1 to 3; (b) Ancilla patch]

Figure 8: Pauli product measurement protocol. (a) Example of a measurement of the operator Z ⊗ Y ⊗ 1 ⊗ X ⊗ Z of the qubits |q1⟩, |q2⟩, |q3⟩, |q4⟩ and |m⟩. (b) Ancilla patch used during the measurement.

Summary. Clifford+T circuits can be written in terms of π/8 rotations, π/4 rotations and measurements. To convert input circuits into a standard form, π/4 rotations can be commuted to the end of the circuit and absorbed by the final measurements. Thus, any quantum computation can be written as a sequence of π/8 rotations grouped into layers of mutually commuting rotations. The number of rotations is the T count and the number of layers is the T depth. Each rotation can be performed by consuming a magic state via a Pauli product measurement. These measurements can be implemented in our framework in 1 time step.

2 Data blocks

Since Clifford+T circuits are a sequence of π/8 rotations, each requiring the consumption of a magic state, it is natural to partition a quantum computer into a set of tiles that are used for magic state distillation (distillation blocks) and a set of tiles that hosts data qubits and consumes magic states via Pauli product measurements (data blocks). In this section, we discuss designs for the latter. In principle, the structure shown in Fig. 8 is a data block, where each qubit is stored in a two-tile patch and magic states can be consumed every time step. However, this sort of design uses 3n tiles to host n data qubits, which is a relatively large space overhead.

2.1 Compact block

The first design that we discuss uses only 1.5n + 3 tiles. This compact block is shown in Fig. 9a, where each data
[Figure 9 panels: (a) Compact block; (c) π/4 rotations; (d) Y ⊗ 1 ⊗ Y ⊗ Z ⊗ Y ⊗ Y rotation in 9 time steps (Steps 1 to 4 shown here)]

Figure 9: (a) Compact blocks store n data qubits in 1.5n + 3 tiles. The consumption of a magic state can take up to 9 time steps. (c) The worst-case scenario is a Pauli product involving an even number of Y operators, whose treatment requires explicit π/4 rotations. The example in (d) shows the 8 steps necessary to consume the magic state, which involves π/4 rotations and patch rotations (b).
qubit is stored in a four-corner square patch. This lowers the space cost, but restricts the operators that are accessible by Pauli product measurements, as only the Z operator is free to be measured. Using 3 time steps, patches may also be rotated (see Fig. 9b), such that the X operator becomes accessible instead of the Z operator. The problematic operators are Y operators, which are the reason why the consumption of a magic state can take up to 9 time steps.

The worst-case scenario is a π/8 rotation involving an even number of Y operators, such as the one shown in Fig. 9c. One possibility to replace Y operators by X or Z operators is via π/4 rotations, since Y_{π/4} = Z_{π/4} X_{π/4} Z_{−π/4}. Rotations with an even number of Y's require two π/4 rotations, while an odd number of Y's can be handled by one rotation. Only the left two π/4 rotations in Fig. 9c need to be performed explicitly. The right two rotations can be commuted to the end of the circuit, changing the later π/8 rotations. Similar to a π/8 rotation, a P_{π/4} rotation can be executed using a resource state |Y⟩ = |0⟩ + i|1⟩. However, even though this state is a Pauli eigenstate, it cannot be prepared immediately in our framework. Instead, we use a |0⟩ state and Y measurements, such that a P_{π/4} rotation is performed by a P ⊗ Y measurement between the qubits and the |0⟩ state. Afterwards, the |0⟩ state is measured in X. If the P ⊗ Y and the X measurement yield different outcomes, a Pauli correction is necessary.

In Fig. 9d, we go through the steps necessary to perform a (Y ⊗ 1 ⊗ Y ⊗ Z ⊗ Y ⊗ Y)_{π/8} rotation. In step 1, we start with a 12-tile data block storing 6 qubits in the blue region. The orange region is not part of the data block, but is part of the adjacent distillation block, i.e., it is the source of the magic states. In steps 2-5, we perform the two π/4 rotations that are necessary to replace the Y operators with X's. In step 6, we first rotate patches in the upper row, and then in step 7 in the lower row. Finally, in step 8, we measure the Pauli
[Figure panels: Step 1, Step 2, Step 3]

product involving the magic state.

This general procedure can be used for any π/8 rotation. First, up to two π/4 rotations are performed in 2 time steps. Next, patches in the upper and lower row are rotated, which takes 3 time steps per row. Finally, the Pauli product is measured in 1 time step, requiring a total of 9 time steps. While this is very slow compared to Fig. 8, this is a valid choice for small quantum computers where the distillation of a magic state takes longer than 9 time steps.

2.2 Intermediate block

One possibility to speed up compact blocks is to store all qubits in one row instead of two. This is the intermediate block shown in Fig. 11a, which uses 2n + 4 tiles to store n qubits. By eliminating one row, all patch rotations can be done simultaneously. In addition, one can save 1 time step by moving all patches to the other side, thereby eliminating the need to move patches back to their row after the rotation. An example is shown in Fig. 10. Suppose we have 5 qubits and need to prepare them for a Z ⊗ X ⊗ Z ⊗ Z ⊗ X measurement. The first, third and fourth qubit are moved to the other side, which takes 1 time step. Simultaneously, the second and fifth qubit are rotated, which takes 2 time steps. Therefore, the total number of time steps to consume a magic state is at most 5: 2 for up to two π/4 rotations, 2 for the patch rotations, and 1 for the Pauli product measurement.

2.3 Fast block

The disadvantage of square patches is that only one Pauli operator is available for Pauli product measurements at any given time. Two-tile four-corner patches as in Fig. 8, on the other hand, allow for the measurement of any Pauli operator, but use two tiles for each qubit. In order to have both compact storage and access to all Pauli operators, we use six-corner patches for our fast blocks in Fig. 11b. Six-corner patches use two tiles to represent two qubits (see Fig. 1), where the first qubit's Pauli operators are in the left two edges, and the second qubit's operators are in the right two edges. Therefore, the example in Fig. 11b is a fast block that stores 18 qubits.

Since all Pauli operators are accessible, the Pauli product measurement protocol of Fig. 8 can be used to consume a magic state every time step. n qubits occupy a square arrangement of tiles with a side length of √(2n) + 1, i.e., a total of 2n + √(8n) + 1 tiles. Even if √(2n) is not integer, one should keep the block as square-shaped as possible by picking the closest integer as a side length and shortening the last column. While the fast block uses more tiles compared to the compact and intermediate blocks, it has a lower space-time cost, making it more favorable for large quantum computers for which the distillation of a magic state takes less than 5 time steps.

Note that if undistilled magic states are sufficient, then any data block can already be used as a full quantum computer. A proof-of-principle two-qubit device in the spirit of Ref. [31] that constitutes a universal two-qubit quantum computer with undistilled magic states and can demonstrate all the operations that are used in our framework can be realized with six tiles,

[Figure 11 panels: (a) Intermediate block; (b) Fast block; each with an ancilla region]

Figure 11: (a) Intermediate blocks store n data qubits in 2n + 4 tiles and require up to 5 time steps per magic state. (b) Fast blocks use 2n + √(8n) + 1 tiles and require 1 time step per magic state.
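The trade-off between the three data block designs can be made concrete by tabulating tiles and worst-case time steps per magic state, using the tile counts stated in the text of Sec. 2 (1.5n + 3, 2n + 4 and 2n + √(8n) + 1) and the step counts 9, 5 and 1. The Python sketch below is illustrative only: the function names are not from the paper, it multiplies tiles by worst-case steps as a stand-in for the s·t prefactor of the space-time cost, and it ignores the distillation block and assumes a magic state is always available.

```python
import math

# Worst-case time steps needed to consume one magic state (Sec. 2).
WORST_CASE_STEPS = {"compact": 9, "intermediate": 5, "fast": 1}


def tiles(block, n):
    """Number of tiles used by a data block storing n qubits."""
    if block == "compact":
        return 1.5 * n + 3
    if block == "intermediate":
        return 2 * n + 4
    if block == "fast":
        return 2 * n + math.sqrt(8 * n) + 1
    raise ValueError(f"unknown block type: {block}")


def space_time_per_magic_state(block, n):
    """Tiles times worst-case time steps, i.e. the s*t prefactor of s*t*d^3."""
    return tiles(block, n) * WORST_CASE_STEPS[block]


for n in (18, 100):
    for block in WORST_CASE_STEPS:
        print(f"n = {n:3d}  {block:12s}  tiles = {tiles(block, n):6.1f}  "
              f"tiles x steps = {space_time_per_magic_state(block, n):7.1f}")
```

For n = 18 this gives 30, 40 and 49 tiles for the compact, intermediate and fast block, and for n = 6 the compact block uses 12 tiles, matching the 12-tile example of Fig. 9d.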
Figure 12: Encode-T -decode circuit of the 15-to-1 distillation protocol. The multi-target CNOTs (orange) can be commuted past
the T gates, such that they cancel and leave 15 Z-type Pauli product rotations.
as shown in Appendix C. This proof-of-principle device uses (3d − 1) · 2d physical data qubits, i.e., 48, 140, or 280 data qubits for distances d = 3, 5 or 7. If ancilla qubits are used for stabilizer measurements, the number of physical qubits roughly doubles, but it is still within reach of near-term devices.

Summary. Data blocks store the data qubits of the computation and consume magic states. Compact blocks use 1.5n + 3 tiles for n qubits and require up to 9 time steps to consume a magic state. Intermediate blocks use 2n + 4 tiles and take up to 5 time steps per magic state. Fast blocks use 2n + √(8n) + 1 tiles and take 1 time step per magic state. Data blocks need to be combined with distillation blocks for large-scale quantum computation.

3 Distillation blocks

In this section, we discuss designs of tile blocks that are used for magic state distillation. This is necessary, because with surface codes, the initialization of non-Pauli eigenstates is prone to errors, which means that π/8 rotations performed using these states may lead to errors. In order to decrease the probability of such an error, magic state distillation [15] is used to convert many low-fidelity magic states into fewer higher-fidelity states. This requires only Clifford gates (i.e., Pauli product measurements), so, in principle, any of the data blocks discussed in the previous section can be used for this purpose. However, magic state distillation is repeated extremely often for large-scale quantum computation, so it is worth optimizing these protocols.

Here, we discuss a general procedure that can be applied to any distillation protocol based on an error-correcting code with transversal T gates, such as punctured Reed-Muller codes [15, 16] or block codes [17–19]. To show the general structure of such a protocol, we go through the example of 15-to-1 distillation [15], i.e., a protocol that uses 15 faulty magic states to distill a single higher-fidelity state.

3.1 15-to-1 distillation

The 15-to-1 protocol is based on a quantum error-correcting code that uses 15 qubits to encode a single logical qubit with code distance 3. The reason why this can be used for magic state distillation is that, for this code, a physical T gate on every physical qubit corresponds to a logical T gate (actually T†) on the encoded qubit, which is called a transversal T gate. The general structure of a distillation circuit based on a code with transversal T gates is shown in Fig. 12 for the example of 15-to-1. It consists of four parts: an encoding circuit, transversal T gates, decoding and measurement.

The circuit begins with 5 qubits initialized in the |+⟩ state and 10 qubits in the |0⟩ state. Qubits 1-4, 5 and 6-15 are associated with the four X stabilizers, the logical X operator, and the ten Z stabilizers of the code. The first five operations are multi-target CNOTs that correspond to the code's encoding circuit. They map the X Pauli operators of qubits 1-4 onto the code's X stabilizers, the X Pauli of qubit 5 onto the logical X operator and the Z operators of qubits 6-15 onto the code's Z stabilizers. Because we start out with +1-eigenstates of X and Z, this circuit prepares the simultaneous stabi-
Figure 13: 15-to-1 distillation circuit that uses 5 qubits and 11 π/8 rotations.
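The count of 5 qubits and 11 rotations in Fig. 13 can be reproduced from the stabilizer matrix of Eq. (1) in Sec. 3.1: each row corresponds to one qubit of the circuit and each column to one π/8 rotation whose axis has a Z for every 1 in that column. The Python sketch below is illustrative; the string encoding of the matrix and the function name `rotation_axes` are not from the paper, and "I" stands for the identity 1.

```python
# Rows of the 15-to-1 matrix: four X stabilizers, then the logical X operator.
M_15_TO_1 = [
    "000100001111111",
    "001001110001111",
    "010010110110011",
    "100011011010101",
    "000011101101001",
]


def rotation_axes(matrix):
    """One Z-type Pauli product per column: Z where the column has a 1."""
    n_columns = len(matrix[0])
    return ["".join("Z" if row[c] == "1" else "I" for row in matrix)
            for c in range(n_columns)]


axes = rotation_axes(M_15_TO_1)

# Single-qubit rotations can be absorbed into the initial |+> states.
single_qubit = [a for a in axes if a.count("Z") == 1]
remaining = [a for a in axes if a.count("Z") > 1]

print("qubits in the circuit:", len(M_15_TO_1))
print("absorbed single-qubit rotations:", len(single_qubit))
print("remaining multi-qubit rotations:", len(remaining))
for axis in remaining:
    print(axis)
```

With this matrix, four of the 15 columns have a single 1, so 11 multi-qubit rotations remain on 5 qubits.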
lizer eigenstate corresponding to the logical |+⟩_L state. Next, a transversal T gate is applied, transforming the logical state to T_L|+⟩_L (actually to T_L†|+⟩_L). Note that the 15 Z_{π/8} rotations are potentially faulty. Finally, the encoding circuit is reverted, shifting the logical qubit information back into qubit 5, and the information about the X and Z stabilizers into qubits 1-4 and 6-15. If no errors occurred, qubit 5 is now a magic state T|+⟩ (actually T†|+⟩). In order to detect whether any of the 15 π/8 rotations were affected by an error, qubits 1-4 and 6-15 are measured in the X and Z basis, respectively, effectively measuring the stabilizers of the code. Since the code distance is 3, up to two errors can be detected, which will yield a −1 measurement outcome on some stabilizers. If any error is detected, all qubits are discarded and the distillation protocol is restarted. This way, if the error probability of each of the 15 T gates is p, the error probability of the output state is reduced to 35p^3. In other words, this protocol takes 15 magic states with error probability p as an input, and outputs a single magic state with an error of 35p^3.

Simplifying the circuit. Using the commutation rules of Fig. 4b, we can commute the first set of multi-target CNOTs to the right. This maps the Z_{π/8} rotations onto Z-product π/8 rotations. Since controlled-Pauli gates satisfy C(P_1, P_2) = C(P_1, P_2)†, the multi-target CNOTs of the encoding circuit will cancel the multi-target CNOTs of the decoding circuit, leaving a circuit of 15 Z-type π/8 rotations in Fig. 12.

Note that qubits 6-15 in this circuit are entirely redundant. They are initialized in a Z eigenstate, are then part of a Z-type rotation, and are finally measured in the Z basis, trivially yielding the outcome +1. Since they serve no purpose, they can simply be removed to yield the five-qubit circuit in Fig. 13, where we have absorbed the single-qubit π/8 rotations into the initial |+⟩ states and rearranged the remaining 11 rotations.

transversal T gates. In general, a code with m_x X stabilizers that uses n qubits to encode k logical qubits yields a circuit of n − m_x π/8 rotations on m_x + k qubits. Each of the m_x + k qubits is either associated with an X stabilizer or one of the k logical qubits. For each of the n qubits of the code, the circuit contains one π/8 rotation with an axis that has a Z on each stabilizer or logical X operator that this qubit is part of. In order to more easily determine the n − m_x rotations, it is useful to write down an (m_x + k) × n matrix that shows the X stabilizers and logical X operators of the code. For 15-to-1, such a matrix could look like this:

\[
M_{\text{15-to-1}} = \left(\begin{array}{ccccccccccccccc}
0&0&0&1&0&0&0&0&1&1&1&1&1&1&1\\
0&0&1&0&0&1&1&1&0&0&0&1&1&1&1\\
0&1&0&0&1&0&1&1&0&1&1&0&0&1&1\\
1&0&0&0&1&1&0&1&1&0&1&0&1&0&1\\
\hline
0&0&0&0&1&1&1&0&1&1&0&1&0&0&1
\end{array}\right) \tag{1}
\]

Each of the first four rows describes one of the four X stabilizers of the code, where 0 stands for the identity 1 and 1 stands for X. For instance, the first row indicates that the first X stabilizer of this 15-qubit code is 1 ⊗ 1 ⊗ 1 ⊗ X ⊗ 1 ⊗ 1 ⊗ 1 ⊗ 1 ⊗ X ⊗ X ⊗ X ⊗ X ⊗ X ⊗ X ⊗ X. The rows below the horizontal bar (in this case the last row) show the logical X operators of the code. The circuit in Fig. 13 is then obtained by placing a |+⟩ state for each row and a π/8 rotation for each column, with the axis of rotation determined by the indices in the column: an identity 1 for each 0 and a Z for each 1.

3.2 Triorthogonal codes

The aforementioned circuit translation can be applied to any code with transversal T gates. One particu-
This kind of circuit simplification is equivalent to the larly versatile and simple scheme to generate such codes
space-time trade-offs mentioned in Ref. [16] and can be is based on triorthogonal matrices [16, 17], which we
applied to any protocol that is based on a code with briefly review in this section. The first step is to write
down a triorthogonal matrix G, such as

\[
G = \left(\begin{array}{*{16}{c}}
1&1&1&1&1&1&1&1&1&1&1&1&1&1&1&1\\
0&0&0&0&0&0&0&0&1&1&1&1&1&1&1&1\\
0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&1\\
0&0&1&1&0&0&1&1&0&0&1&1&0&0&1&1\\
0&1&0&1&0&1&0&1&0&1&0&1&0&1&0&1
\end{array}\right). \tag{2}
\]

Triorthogonality refers to three criteria: i) The number of 1s in each row is a multiple of 8. ii) For each pair of rows, the number of entries where both rows have a 1 is a multiple of 4. iii) For each set of three rows, the number of entries where all three rows have a 1 is a multiple of 2. In other words,

\[
\begin{aligned}
\forall a &: \sum_i G_{a,i} = 0 \pmod 8,\\
\forall a,b &: \sum_i G_{a,i}\,G_{b,i} = 0 \pmod 4,\\
\forall a,b,c &: \sum_i G_{a,i}\,G_{b,i}\,G_{c,i} = 0 \pmod 2.
\end{aligned} \tag{3}
\]

A general procedure based on classical Reed-Muller codes to obtain such matrices is described in Ref. [16].

After obtaining a triorthogonal matrix, such as the one in Eq. (2), the second step is to put it in a row echelon form by Gaussian elimination

\[
\tilde{G} = \left(\begin{array}{*{16}{c}}
0&0&0&0&1&0&0&0&0&1&1&1&1&1&1&1\\
0&0&0&1&0&0&1&1&1&0&0&0&1&1&1&1\\
0&0&1&0&0&1&0&1&1&0&1&1&0&0&1&1\\
0&1&0&0&0&1&1&0&1&1&0&1&0&1&0&1\\
1&0&0&0&0&1&1&1&0&1&1&0&1&0&0&1
\end{array}\right). \tag{4}
\]

The last step is to remove one of the columns that contains a single 1, i.e., one of the first five columns, which is also called puncturing. Puncturing an a × b triorthogonal matrix k times yields a code with m_x = a − k, n = b − k and k logical qubits. The rows of the matrix after puncturing that contain an even number of 1s describe X stabilizers, whereas the rows with an odd number of 1s describe X logical operators. In terms of distillation protocols, a code described by such a matrix can be used for n-to-k distillation. Indeed, if we puncture the matrix in Eq. (4) once by removing the first column, we retrieve the 15-to-1 protocol of Eq. (1). We can also puncture it twice by removing the first two columns. This yields the matrix

\[
M_{\text{14-to-2}} = \left(\begin{array}{*{14}{c}}
0&0&1&0&0&0&0&1&1&1&1&1&1&1\\
0&1&0&0&1&1&1&0&0&0&1&1&1&1\\
1&0&0&1&0&1&1&0&1&1&0&0&1&1\\
0&0&0&1&1&0&1&1&0&1&0&1&0&1\\
0&0&0&1&1&1&0&1&1&0&1&0&0&1
\end{array}\right), \tag{5}
\]

which describes a 14-to-2 protocol. The corresponding circuit can be simply read off from this matrix. It is almost identical to the 15-to-1 protocol of Fig. 13, except that the fourth qubit is initialized in the |+⟩ state and is not measured at the end of the circuit, but instead outputs a second magic state. However, because the code of 14-to-2 has a code distance of 2, the output error probability is higher, namely 7p^2 [17]. Puncturing the matrix G̃ any further would yield codes with a distance lower than 2, precluding them from detecting errors and improving the quality of magic states. In fact, the minimum number of qubits in triorthogonal codes was shown to be 14 [32].

Figure 14: 20-to-4 distillation circuit that uses 7 qubits and 17 π/8 rotations.

Semi-triorthogonal codes. There are also codes that are based on “semi-triorthogonal” matrices, where all three conditions of Eq. (3) are only satisfied modulo 2. One example is the matrix

\[
\left(\begin{array}{*{24}{c}}
0&0&0&0&0&0&1&1&0&0&1&1&0&1&1&1&0&1&1&0&1&1&0&1\\
0&0&0&0&0&1&0&1&0&1&0&1&1&1&0&1&1&0&1&1&0&1&1&0\\
0&0&0&0&1&0&0&1&1&0&0&1&1&0&1&0&1&1&0&1&1&0&1&1\\
0&0&0&1&0&0&0&0&1&1&1&1&0&1&0&0&0&0&0&0&0&0&1&1\\
0&0&1&0&0&0&0&0&1&1&1&1&0&0&0&0&0&0&0&1&1&1&0&0\\
0&1&0&0&0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&0&0&0&0&0\\
1&0&0&0&0&0&0&0&1&1&1&1&1&0&1&1&0&0&0&0&0&0&0&0
\end{array}\right). \tag{6}
\]

When this matrix is punctured four times, it yields a code that can be used for a 20-to-4 protocol. A scheme to generate such matrices for 3k+8-to-k distillation is shown in Ref. [17]. While semi-triorthogonal codes can
be used the same way for distillation as properly triorthogonal codes, their caveat is that the basis of the final qubit measurements may be different from X. A procedure to determine this correction is outlined in Ref. [17]. For the case of the 20-to-4 protocol, the matrix that describes the code

\[
M_{\text{20-to-4}} = \left(\begin{array}{*{20}{c}}
0&0&1&1&0&0&1&1&0&1&1&0&1&1&0&1&1&0&1&1\\
0&1&0&1&0&1&0&1&1&0&1&1&0&1&1&0&1&1&0&1\\
1&0&0&1&1&0&0&1&1&1&0&1&1&0&1&1&0&1&1&0\\
0&0&0&0&1&1&1&1&0&0&0&0&0&0&0&0&0&1&1&1\\
0&0&0&0&1&1&1&1&0&0&0&0&0&0&1&1&1&0&0&0\\
0&0&0&0&1&1&1&1&0&0&0&1&1&1&0&0&0&0&0&0\\
0&0&0&0&1&1&1&1&1&1&1&0&0&0&0&0&0&0&0&0
\end{array}\right) \tag{7}
\]

can be straightforwardly translated into the circuit in Fig. 14. However, in this case, the three measurements at the end of the circuit are Z ⊗ Z ⊗ X, X ⊗ Z ⊗ Z and X ⊗ X ⊗ X. The output error rate for 20-to-4 is 13p^2 on any of the four qubits. Note that 3k+8-to-k protocols can be modified to 3k+4-to-k [32–34].

3.3 Surface-code implementation

Having outlined the general structure of distillation protocols, we now discuss their implementation with surface codes. Distillation protocols are particularly simple quantum circuits, since they exclusively consist of Z-type π/8 rotations. Therefore, we can use a construction similar to the compact data block, and still only require 1 time step per rotation. We first discuss the example of 15-to-1 distillation.

Because the distillation circuit is relatively short, it is useful to avoid the Clifford corrections of Fig. 7 that may be required with 50% probability after a magic state is consumed. These corrections slow down the protocol, because they change the final X measurements to Pauli product measurements. Instead, we use a circuit which consumes a magic state and automatically performs the Clifford correction. It is based on the selective π/4 rotation circuit in Fig. 15a. To perform a P_{π/4} rotation according to the circuit in Fig. 9c, a |0⟩ state is initialized and P ⊗ Y is measured, which takes 1 time step. However, the π/4 rotation is only performed if the |0⟩ qubit is measured in X afterwards. If, instead, it is measured in Z, the qubit is simply discarded without performing any operation. In other words, the choice of measurement basis determines whether a P_{π/4} or a 𝟙 operation is performed. This can be used to construct the circuit in Fig. 15b. Here, the first step to perform a P_{π/8} gate is to measure P ⊗ Z between the qubits and a magic state |m⟩, and simultaneously measure Z ⊗ Y between |m⟩ and |0⟩. If the outcome of the first measurement is +1, no Clifford correction is required and |0⟩ is read out in Z. If the outcome is -1, |0⟩ is measured in X, yielding the required Clifford correction.

Figure 15: Implementation of the 15-to-1 distillation protocol in our framework. Each time step in (c) corresponds to an auto-corrected π/8 rotation (b), which in turn is based on selective π/4 rotations (a). Panel titles: (a) Selective π/4 rotation, (b) Auto-corrected π/8 rotation.

This can be used to implement the 15-to-1 protocol of Fig. 13 in 11 time steps using 11 tiles, as shown in Fig. 15c. Four qubits are initialized in |m⟩, and a fifth in |+⟩. A 2 × 2 block of tiles to the left is reserved for the |m⟩ and |0⟩ qubits of the auto-corrected π/8 rotations. Two additional tiles are used for the ancilla qubit of the Pauli product measurement protocol. In step 2, the first π/8 rotation (𝟙 ⊗ 𝟙 ⊗ Z ⊗ Z ⊗ Z)_{π/8} is performed. Depending on the measurement outcome of step 2, the |0⟩ ancilla is read out in the X or Z basis. This is repeated 11 times, once for each of the 11 rotations in Fig. 13. Finally, in step 23, qubits 1-4 are measured in X. If all four outcomes are +1, the distillation protocol yields a distilled magic state in tile 5. Since 11 tiles are
used for 11 time steps, the space-time cost is 121d^3 in terms of (physical data qubits)·(code cycles) to leading order.

Caveat. Even though our leading-order estimate of the time cost of 11d code cycles is correct, the full time cost also contains contributions that do not scale with d. The two processes that may require special care in the magic state distillation protocol are state injection and classical processing. Every time step requires the initialization of a magic state and a short classical computation to determine whether the |0⟩ state needs to be measured in X or Z. While neither of these processes scales with d, they can slow down the distillation protocol, depending on the injection scheme and the control hardware that is used. This slowdown can be avoided by using additional 2 × 2 blocks of |0⟩-|m⟩ pairs, as shown in Fig. 17 for one additional block. Here, the left and right block can be used in an alternating fashion, i.e., the left block for rotations 1, 3, 5, ... and the right block for rotations 2, 4, 6, ... While one block is being used for a rotation, the other one can be used to prepare a new magic state and to process the measurement outcomes of the previous rotation.

General space-time cost. The scheme of Fig. 15 can be used to implement any protocol based on a triorthogonal code. For an n-qubit code with k logical qubits and m_x X stabilizers, the protocol uses 1.5(m_x + k) + 4 tiles for (n − m_x) time steps. In this time, it distills k magic states with a success probability of ∼(1 − p)^n, since any error will result in failure. Therefore, such a protocol distills k magic states on average every (n − m_x)/(1 − p)^n time steps. Thus, the space-time cost per magic state is

\[
\mathrm{cost}(n, m_x, k, p) = \frac{[1.5(m_x + k) + 4]\,(n - m_x)}{k\,(1 - p)^n}. \tag{8}
\]

In order to minimize the space-time cost for distillation in our framework, one should pick a distillation protocol that minimizes this quantity for a given input and target error rate.

Figure 16: Implementation of the 20-to-4 protocol in our framework. The final measurements correspond to the last three measurements of the circuit in Fig. 14.

20-to-4 protocol. The previous estimate is only valid for triorthogonal codes. With semi-triorthogonal codes, additional time steps may be necessary to perform the final measurements. The example of the 20-to-4 protocol is shown in Fig. 16. Because the three qubits that are measured are discarded at the end of the protocol, the three Pauli products can be measured in 2 time steps instead of 3 as in Fig. 14. For this, the operator Z ⊗ Z ⊗ X is measured in the first step. In the second step, X ⊗ 𝟙 ⊗ 𝟙 and 𝟙 ⊗ Z ⊗ Z are measured simultaneously. Their product yields one of the required measurements. Finally, qubits 2 and 3 are measured in X at no time cost. Multiplying these two results with the X measurement in the previous step yields the final X ⊗ X ⊗ X measurement. Thus, the 20-to-4 protocol requires 17 time steps for the π/8 rotations and 2 time steps for the final measurements. With a space cost of 14 tiles, the total space-time cost is 266d^3.
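As a rough numerical illustration of the general cost estimate in Eq. (8), the following Python sketch evaluates the leading-order tile count, the number of time steps and the space-time cost per output state for the two triorthogonal protocols discussed so far. The function names are illustrative and not part of the paper. The tile count is deliberately left unrounded, so it returns 11.5 tiles for 15-to-1, whereas the explicit layout of Fig. 15 uses 11.

```python
def tile_count(m_x: int, k: int) -> float:
    """Leading-order space cost in tiles, 1.5(m_x + k) + 4 (not rounded)."""
    return 1.5 * (m_x + k) + 4


def time_steps(n: int, m_x: int) -> int:
    """Number of time steps, one per pi/8 rotation: n - m_x."""
    return n - m_x


def cost_per_state(n: int, m_x: int, k: int, p: float) -> float:
    """Eq. (8): space-time cost per output magic state, in units of d^3."""
    success_probability = (1 - p) ** n
    return tile_count(m_x, k) * time_steps(n, m_x) / (k * success_probability)


if __name__ == "__main__":
    # (n, m_x, k) for the two triorthogonal protocols introduced in the text.
    protocols = {"15-to-1": (15, 4, 1), "14-to-2": (14, 3, 2)}
    for name, (n, m_x, k) in protocols.items():
        for p in (1e-4, 1e-3):
            cost = cost_per_state(n, m_x, k, p)
            print(f"{name}, p = {p:g}: {cost:.1f} d^3 per output state")
```

Because the success probability enters as (1 − p)^n, the correction to the nominal cost is small for short protocols such as these and only becomes significant when n·p approaches 1.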
Figure 17: Two 2 × 2 ancilla blocks can be used to prevent state injection and classical processing from slowing down the 15-to-1 protocol.

3.4 Benchmarking

We can use the previously described 15-to-1 and 20-to-4 schemes to benchmark our implementations. In Ref. [36], these schemes were implemented with lattice surgery and their cost compared to implementations based on braiding of hole defects. In addition, the 7-to-1 scheme was considered, which is a scheme to distill
|Y⟩ states. The distillation of these states is not necessary in our framework, but for benchmarking purposes we show the 7-to-1 protocol in Appendix D. It can be implemented using 7 tiles for 4 time steps, i.e., with a space-time cost of 28d^3.

We summarize the leading-order space-time costs of the three protocols in Table 1. The comparison shows drastic reductions in space-time cost compared to schemes based on braiding of hole defects and compared to other approaches to optimizing lattice surgery. Compared to the braiding-based scheme, the space-time cost of 7-to-1, 15-to-1 and 20-to-4 is reduced by 60%, 84% and 89%, respectively.

                          7-to-1     15-to-1    20-to-4
Hole braiding [19, 35]    70d^3      750d^3     2344d^3
Lattice surgery [36]      140d^3     540d^3     1134d^3
Our framework             28d^3      121d^3     266d^3

Table 1: Comparison of the leading-order space-time cost of 7-to-1, 15-to-1 and 20-to-4 with defect-based schemes, optimized lattice surgery in Ref. [36] and our schemes. The space-time cost is in terms of (physical data qubits)·(code cycles).

3.5 Higher-fidelity protocols

So far, we have only explicitly discussed protocols that reduce the input error to ∼p^2 or ∼p^3. There are two strategies to obtain protocols with a higher output fidelity: concatenation and higher-distance codes.

Concatenation. In the 15-to-1 protocol, we use 15 undistilled magic states to obtain a distilled magic state with an error rate of 35p^3. If we perform the same protocol, but use 15 distilled magic states from previous 15-to-1 protocols as inputs, the output state will have an error rate of 35(35p^3)^3 = 1500625p^9. This corresponds to a 225-to-1 protocol obtained from the concatenation of two 15-to-1 protocols. It is also possible to concatenate protocols that are not identical. Strategies to combine high-yield and low-yield protocols are discussed in Ref. [17].

In Fig. 18, we show an unoptimized block that can be used for 225-to-1 distillation. It consists of 11 15-to-1 blocks that are used for the first level of distillation. Since each of these 11 blocks takes 11 time steps to finish, they can be operated such that exactly one of these blocks finishes in every time step. Therefore, in every time step, one first-level magic state can be used for second-level distillation by moving it into one of the two level-2 |m⟩-|0⟩ blocks via the blue ancilla. The qubits that are used for the second level are highlighted in red. Note that since, for the second level, the single-qubit π/8 rotations require distilled magic states, the 15-to-1 protocol of Fig. 13 requires 15 rotations instead of just 11. Therefore, the entire protocol finishes in 15 time steps using 176 tiles with a total space-time cost of 2640d^3.

Figure 18: 176-tile block that can be used for 225-to-1 distillation. The qubits highlighted in red are used for the second level of the distillation protocol. The blue ancilla is used to move level-1 magic states into the two |m⟩-|0⟩ blocks of the level-2 distillation.

Higher-distance codes. Alternatively, we can use a code that produces higher-fidelity states. In Ref. [16], several protocols based on punctured Reed-Muller codes are discussed. One of these protocols is a 116-to-12 protocol based on a code with n = 116, k = 12 and m_x = 17. It yields 12 magic states which each have an
error rate of 41.25p^4. According to Eq. (8), this protocol can be implemented using 44 tiles for 99 time steps with a space-time cost of 363d^3 per output state and a success probability of (1 − p)^{112}. For protocols with a high space cost such as 116-to-12, the space-time cost can be slightly reduced by introducing additional ancilla space, such that two operations can be performed simultaneously. One possible configuration is shown in Fig. 19. This increases the space cost to 81 tiles, but reduces the time cost to 50 time steps, with a total space-time cost of 337.5d^3 per output state.

Figure 19: 81-tile block that can be used for the 116-to-12 protocol. Here, two π/8 rotations can be performed at the same time, where one rotation uses the ancilla space denoted as ancilla 1, and the other one uses ancilla 2.

Input-to-output ratio is not everything. A popular figure of merit when comparing n-to-k distillation protocols is the ratio n/k. One of the protocols in Ref. [16] is a 912-to-112 protocol with n = 912, k = 112 and m_x = 64, which yields 112 output states, each with an error rate of 10.63p^6. While the output fidelity is not as high as for 225-to-1, the input-to-output ratio n/k is much lower. For p = 10^{-3}, the output error rate of 225-to-1 is ∼1.5 × 10^{-21}, while it is only ∼10^{-17} for 912-to-112. Therefore, if input-to-output ratio were a good figure of merit, we would expect the 912-to-112 protocol to be considerably less costly compared to 225-to-1. If we use an implementation in the spirit of Fig. 19, the space cost is roughly 2.5(m_x + k) tiles and the protocol takes (n − m_x)/2 time steps. Thus, 912-to-112 uses 440 tiles for 424 time steps. This would put the space-time cost per state at 1665d^3, which is indeed lower than that of 225-to-1. However, the success probability of 912-to-112 for p = 10^{-3} is only at ∼40%, which more than doubles the actual space-time cost. On the other hand, the space-time cost of 225-to-1 is barely affected by the success probability, as each of the level-1 15-to-1 blocks finishes with 98.5% success probability. This means that, with 1.5% probability, a time step of 225-to-1 is skipped, since the necessary level-1 state is missing. This only increases the space-time cost from 2640d^3 to 2680d^3, implying that n/k is not a good figure of merit.

Summary. The class of magic state distillation protocols that are based on an n-qubit error-correcting code with m_x X stabilizers and k logical qubits can be implemented using 1.5(m_x + k) + 4 tiles and n − m_x time steps. Such protocols output k magic states with a success probability of (1 − p)^n. Therefore, if the input fidelity and desired output fidelity are known, the distillation protocol should minimize the cost function given in Eq. (8).

4 Trade-offs limited by T count

Having discussed data blocks and distillation blocks in the previous two sections, we are now ready to piece them together to a full quantum computer. In order to illustrate the steps that are necessary to calculate the space and time cost of a computation and to trade off space against time, we consider an example computation with a T count of 10^8 and a T depth of 10^6. We consider error rates of p = 10^{-3} and p = 10^{-4}. This error rate is assumed to be the physical error rate per code cycle of every physical qubit, as well as the error rate of undistilled magic states. To calculate concrete numbers, we assume that the quantum computer can perform a code cycle every 1 µs. We want to perform the 10^8-T-gate computation in a way that the probability of any one of the T gates being affected by an error stays below 1%. In addition, we require that the probability of an error affecting any of the logical qubits encoded in surface-code patches stays below 1%. This results in a chance of at most 2% that the quantum computation will yield a wrong result. In order to exponentially increase the precision of the computation, it can be repeated multiple times or run in parallel on multiple quantum computers.

4.1 Step 1: Determine distillation protocol

The first step is to determine which distillation protocol is sufficient for the computation. In order to stay below 1% error probability with 10^8 T gates, each magic state needs to have an error rate below 10^{-10}. For p = 10^{-4}, the 15-to-1 protocol is sufficient, since it yields an output error rate of 35p^3 = 3.5 · 10^{-11}. For p = 10^{-3}, 15-to-1 is not enough. On the other hand, two levels of 15-to-1, i.e., 225-to-1, yield magic states with an error rate of 1.5 · 10^{-21}, which is many orders of magnitude lower than what is required. A less costly protocol is 116-to-12, which yields output states with an error rate of 41.25p^4 = 4.125 · 10^{-11}, which suffices for our purposes.

4.2 Step 2: Construct a minimal setup

In order to determine the necessary code distance, we first construct a minimal setup, i.e., a configuration of
[Figure panels: (a) Minimal setup for p = 10^{-4}; (a) Intermediate setup for p = 10^{-4}]
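The comparison between 912-to-112 and 225-to-1 in the discussion of the input-to-output ratio can be reproduced with a few lines of arithmetic. The following Python sketch assumes, as a simplification, that the effective space-time cost is the nominal cost divided by the relevant success probability; the variable names are illustrative and all numerical inputs are the ones quoted in the text.

```python
p = 1e-3  # error rate of the undistilled input magic states

# 912-to-112 in the spirit of Fig. 19:
# roughly 2.5(m_x + k) tiles for (n - m_x)/2 time steps.
n, m_x, k = 912, 64, 112
tiles_912 = 2.5 * (m_x + k)
steps_912 = (n - m_x) / 2
nominal_912 = tiles_912 * steps_912 / k      # cost per state in units of d^3
success_912 = (1 - p) ** n                   # any error causes a restart
effective_912 = nominal_912 / success_912

# 225-to-1: 176 tiles for 15 time steps. A level-2 time step is skipped
# whenever the level-1 15-to-1 block that should feed it has failed.
nominal_225 = 176 * 15
success_level1 = (1 - p) ** 15
effective_225 = nominal_225 / success_level1

print(f"912-to-112: nominal {nominal_912:.0f} d^3, "
      f"success {success_912:.1%}, effective {effective_912:.0f} d^3")
print(f"225-to-1:   nominal {nominal_225} d^3, "
      f"success {success_level1:.1%}, effective {effective_225:.0f} d^3")
```

Under this assumption the nominally cheaper 912-to-112 protocol ends up more expensive per output state than 225-to-1 at p = 10^{-3}, which is the point made in the text.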
Figure 22: Fast setups using fast data blocks and eleven 15-to-1 distillation blocks for p = 10^{-4} (a) or five 116-to-12 distillation blocks for p = 10^{-3} (b).
final error probability of 0.2%. The number of physical qubits used in the minimal setup can be calculated as the number of tiles multiplied by 2d^2, taking measurement qubits into account. The minimal setup for p = 10^{-4} uses 164 · 2 · 13^2 ≈ 55,400 physical qubits and finishes the computation in 13 · 11 · 10^8 code cycles. With 1 µs per code cycle, this amounts to roughly 4 hours.

For p = 10^{-3}, the condition changes to

\[
210 \cdot 9.23 \cdot 10^{8} \cdot d \cdot p_L(10^{-3}, d) < 0.01, \tag{11}
\]

which is satisfied for d = 27 with a final error probability of 0.5%. The final error probability for d = 25 is at 4.8%. Thus, the minimal setup uses 210 · 2 · 27^2 ≈ 306,000 physical qubits and finishes the computation in 27 · 9.23 · 10^8 code cycles, which amounts to roughly 7 hours. Note that, in principle, a success probability of less than 50% would be sufficient to reach arbitrary precisions by repeating computations or running them in parallel. This means that the code distances that we consider may be higher than what is necessary.

4.4 Step 4: Add distillation blocks

Only a small fraction of the tiles of the minimal setup is used for magic state distillation, i.e., 6.7% for p = 10^{-4} and 21% for p = 10^{-3}. On the other hand, adding one additional distillation block doubles the rate of magic state production, potentially doubling the speed of computation. Therefore, in order to speed up the computation and decrease the space-time cost, we add additional distillation blocks to our setup.

For p = 10^{-4}, adding one more distillation block reduces the time that it takes to distill a magic state to 5.5 time steps per state. However, the compact block can only consume magic states at 9 time steps per state. In order to avoid this bottleneck, we can use the intermediate data block instead, which occupies 204 tiles, but consumes one magic state every 5 time steps. With 22 tiles for distillation (see Fig. 21), this setup uses 226 tiles and finishes the computation after 5.5 · 10^8 time steps. This increases the qubit number to 76,400, but reduces the computational time to 2 hours.

For p = 10^{-3}, the addition of a distillation block reduces the distillation time to 4.62 time steps. At this point, one should switch to the more efficient 116-to-12 block of Fig. 19, which uses 81 tiles and distills a magic state on average every 4.66 time steps. The intermediate data block cannot keep up with this distillation rate, but we can still use it to consume one magic state every 5 time steps instead of 4.66. Such a configuration uses 228 data tiles, 81 distillation tiles and 13 storage tiles, i.e., a total of 322 tiles corresponding to approximately 469,000 physical qubits. The computational time reduces to 5 · 10^8 time steps, i.e., 3.75 hours. Note that in Fig. 21b, the 12 output states of the 116-to-12 protocol should be chosen as 1, 3, 5, ..., 25. They can be moved into the green storage space in the last step of the protocol, since the space denoted as ancilla 2 in Fig. 19 is not being used in the last step.

Trade-offs down to 1 time step per T gate. Adding additional distillation blocks can reduce the time per T gate down to 1 time step. For p = 10^{-4}, 11 distillation blocks produce 1 magic state every time step. To consume these magic states fast enough, we need to use a fast data block. This fast block uses 231 tiles and the 11 distillation
blocks together with their storage tiles use 11 · 12 = 132 tiles, as shown in Fig. 22a. With a total of 363 tiles, this setup uses 123,000 qubits and finishes the computation in 10^8 time steps, i.e., in 21 minutes and 40 seconds.

For p = 10^{-3}, parallelizing 5 distillation blocks produces a magic state every 0.924 time steps. This is faster than the fast block can consume the states, but allows for the execution of a T gate every time step. With 231 tiles for the fast block, 405 distillation tiles and 60 storage tiles, the total space cost is 696 tiles. The setup shown in Fig. 22b contains four unused tiles to make sure that all storage lines are connected to the data block. Storage lines need to be connected to the ancilla space of the data block either directly, via other storage lines or via unused tiles. In any case, this corresponds to roughly 1,020,000 physical qubits. The computation finishes after 45 minutes.

Avoiding the classical overhead. Every consumption of a magic state corresponds to a Pauli product measurement, the outcome of which determines whether a Clifford correction is required. This correction is commuted past the following rotations, potentially changing the axis of rotation. Therefore, the computation cannot continue before the measurement outcome is determined. This involves a small classical computation to process the physical measurements (i.e., decoding and feed-forward), which could slow down the quantum computation. In order to avoid this, the magic state consumption can be performed using the auto-corrected π/8 rotations of Fig. 15b. Here, the classical computation merely determines whether the ancilla qubit, which we refer to as the correction qubit |c⟩, is measured in the X or Z basis. While this classical computation is running, the magic state for the following π/8 rotation can be consumed, as the auto-corrected rotation involves no Clifford correction. This means that distillation blocks should output |m⟩-|c⟩ pairs, for which we construct modified distillation blocks in the following section. If the classical computation is, on average, faster than 1 time step (i.e., d code cycles), then classical processing does not slow down the quantum computation in the T-count-limited schemes.

Summary. Data blocks combined with distillation blocks can be used for large-scale quantum computing. The first step is to determine a sufficiently high-fidelity distillation protocol. Next, one constructs a minimal setup from a compact data block and a single distillation block to upper-bound the required code distance. Finally, one can trade off space against time by using fast data blocks and adding more distillation blocks. This can reduce the time per T gate down to 1 time step. In our example, the trade-off also reduces the space-time cost compared to the minimal setup by a factor of 5 for p = 10^{-4} and by a factor of 2.8 for p = 10^{-3}. In order to fully exploit the space-time trade-offs discussed in this section, the input circuit should be optimized for T count.

5 Trade-offs limited by T depth

In the previous section, we parallelized distillation blocks to finish computations in a time proportional to the T count. In this section, we combine the previous constructions of data and distillation blocks to what we refer to as units. By parallelizing units, we exploit the fact that, in our example, the 10^8 T gates are arranged in 10^6 layers of 100 T gates to finish the computation in a time proportional to the T depth. We first slightly increase the space-time cost compared to the previous section, in order to speed up the computation down to one measurement per T layer. In this sense, we implement Fowler's time-optimal scheme [20].

Figure 23: (a) Circuit for quantum teleportation of |ψ⟩ through a gate U. Only if both Bell basis measurements yield +1, the teleported state is U|ψ⟩. If Z ⊗ Z = −1, the state is UX|ψ⟩. If X ⊗ X = −1, the state is UZ|ψ⟩. If both measurements yield -1, the state is UY|ψ⟩. (b) If U is a π/8 rotation, the corrective Paulis change P_{π/8} to P_{−π/8}. Panel titles: (a) Teleportation circuit, (b) Teleportation through a π/8 rotation.

5.1 T layer parallelization

The main concept used to parallelize T layers is quantum teleportation. The teleportation circuit is shown in Fig. 23a. It starts with the generation of a Bell pair (|00⟩ + |11⟩)/√2 by the Z ⊗ Z measurement of |+⟩ ⊗ |+⟩. An arbitrary gate U is performed on the second half of the Bell pair. Next, a qubit |ψ⟩ and the first half of the Bell pair are measured in the Bell basis, i.e., in X ⊗ X and Z ⊗ Z. After the measurement, the first two qubits are discarded and |ψ⟩ is teleported to the third qubit through the gate U. This means that the output state is U|ψ⟩ if the teleportation is successful. However, it is only successful if both Bell basis measurements yield a +1 outcome. In the other three cases, the teleported state is UX|ψ⟩, UY|ψ⟩ or UZ|ψ⟩. Note that the correction operation to recover the state U|ψ⟩ is not a Pauli
operation P, but instead UPU†, which, in general, is as difficult to perform as U itself.

If U is a P_{π/8} rotation, as in Fig. 23b, the Pauli errors change P_{π/8} to P_{−π/8} up to a Pauli correction. Since it is only after the Bell basis measurement that we know whether we should have performed a P_{π/8} or a P_{−π/8} gate, we use post-corrected π/8 rotations in Fig. 24b, which are similar to the auto-corrected rotations of Fig. 15b. The post-corrected rotation uses a resource state consisting of two qubits, a magic state |m⟩ and a second qubit that we refer to as a correction qubit |c⟩. The resource state is generated by initializing |c⟩ in |0⟩ and measuring Z ⊗ Y between |m⟩ and |c⟩. In order to perform a post-corrected π/8 rotation, the resource state is consumed by measuring P ⊗ Z involving the magic state, and measuring |m⟩ in X. The correction qubit |c⟩ is stored for later use. It can be used at a later moment to decide whether the rotation should have been a +π/8 or −π/8 rotation by measuring |c⟩ either in the Z or X basis. Depending on the measurement outcome, a Pauli correction may be required.

Figure 24: Time-optimal implementation of a three-qubit quantum computation consisting of 9 T gates in 3 T layers (layer 1, layer 2, layer 3). Post-corrected π/8 rotations (b) can be used to decide at a later point whether the performed operation was a P_{π/8} or a P_{−π/8} rotation. Panel titles: (a) Clifford+T circuit, (b) Post-corrected π/8 rotation.

The time-optimal circuit. This can be used to execute multiple T layers simultaneously. If U is a product of mutually commuting π/8 rotations, i.e., a T layer, the teleportation corrections replace all π/8 rotations with post-corrected rotations. An example is shown in Fig. 24 for a three-qubit computation of three T layers, where all three T layers are executed simultaneously. The reason why we can only group up T gates that are part of the same layer is that otherwise the Pauli corrections of the post-corrected rotation would not commute with the other rotations. The time-optimal circuit consists of three steps: the preparation of Bell pairs for each T layer, the application of T gates, and a set of final Bell measurements. At this point, the computation is not finished, as we still need to measure the correction qubits of the post-corrected rotations. Because these involve potential Pauli corrections, the correction qubits of the different T layers need to be measured one after the other. Thus, every T layer is executed one after the other, where each execution requires the time that it takes to measure the correction qubits and perform the classical processing to determine the next set of measurements from the Pauli corrections. We refer to this
20
Figure 25: An example of a time-optimal circuit using four units. In this case, each unit consists of six qubits, i.e., it is a three-qubit
quantum computation, where three T layers can be executed simultaneously.
time as tm . In other words, any Clifford+T circuit con- next unit preparation. For the first and last block, on
sisting of nL T layers can be executed in nL · tm , inde- the other hand, the required storage space is halved.
pendent of the code distance, which is the main feature In the following, we will show how to prepare units
of the time-optimal scheme [20]. in our framework. We find that, for our examples, unit
preparation takes 113. If tm = 1 µs, then nmax is
The circuit in Fig. 24c naively requires 2n · nL qubits
∼1500 for p = 10−4 and ∼3000 for p = 10−3 . Indepen-
for an n-qubit computation, which scales with the
dently of the error rate, the computational time drops
length of the computation. Since we only have a finite
to one second.
number of qubits at our disposal, our goal is to imple-
ment the circuit in Fig. 25 instead. Here, the qubits
form groups of 2n qubits. We refer to each of these 5.2 Units
groups as a unit. Using nu units, nu −1 layers of T gates
Units differ from the fast setups in Fig. 22 in three as-
can be performed at the same time. In the circuit, the
pects. First, the number of qubits stored in the data
steps of Bell state preparation (BP ), post-corrected T
block is doubled. Secondly, the distillation protocols are
layer execution (T ) and Bell basis measurement (BM )
modified to output |mi-|ci pairs, instead of just magic
are performed repeatedly until the end of the computa-
states |mi. Thirdly, in order to store correction qubits
tion. We refer to the block of operations (BP -T -BM )
|ci, additional space is required. Contrary to magic-
as unit preparation. Every time that unit preparation is
state storage tiles, correction-qubit storage tiles do not
finished, all qubits except for the correction qubits (not
need to be connected to the data block’s ancilla region.
shown in Fig. 25) and half of the qubits of the last unit
Modified distillation blocks. In order to have dis-
are discarded. At this point, the next set of unit prepa-
tillation blocks output |mi-|ci pairs, extra tiles and op-
rations begins. Simultaneously, the correction qubits of
erations are required. We show the necessary modifi-
the recently finished units are measured one after the
cations for the example of 15-to-1 and 116-to-12 distil-
other, which has a time cost of (nu −1)·tm . This means
lation. A modified 15-to-1 block is shown in Fig. 26a.
that the number of units can be increased to speed up
Apart from the standard 11 distillation tiles (orange)
the computation, until (nu − 1) · tm reaches the time
and one magic-state storage tile (green), it also contains
that it takes to prepare a unit tu . At this maximum
19 correction-qubit storage tiles (purple) and an addi-
number of units tmax = tu /tm + 1, a T layer is executed
tional tile (gray) that is used for neither distillation nor
every tm and the computation cannot be sped up any
storage. The additional steps that modify the protocol
further in the Clifford+T framework.
are shown in Fig. 26c, which zooms into the highlighted
Note that the first and last unit differ from the other region of Fig. 26a. Step 1 of the shown protocol is right
units. While all other units need to execute nT T gates as the distillation finishes after 11. The patch of the
every tu , the first and last unit need to execute nT T output state is deformed in step 2, and an additional
gates only every 2tu , where nT is the number of T gates qubit |ci is initialized in the |0i state. The Y ⊗ Z op-
per layer. Furthermore, the other blocks need to be able erator between |ci and |mi is measured in step 3. In
to store up to 2nT correction qubits, since, after the end step 4, the correction qubit is sent to storage. Finally,
of a unit preparation, nT correction qubits are stored, in step 5, the magic state |mi is moved to its storage
and may need to remain stored until the end of the tile. This operation blocks one of the orange tiles that is
21
(a) Modified 15-to-1 block (c) Modified 15-to-1 protocol
11 Step 1 12 Step 2
Figure 26: Modified 15-to-1 distillation blocks (a) output a |mi-|ci pair every 11. After the end of the distillation protocol, four
additional steps (c) are necessary. The modified 116-to-12 distillation block (b) finishes after 53, due to the three additional
steps in (d).
used for the distillation protocol for 4. Still, this does and a number of distillation blocks. Since we will show
not slow down 15-to-1 distillation, since the first 4 rota- that unit preparation takes 113 in our case, the num-
tion of the protocol in Fig. 13 can be chosen, such that ber of distillation blocks is chosen such that at least
the output qubit is not needed. Therefore, the modified 100 |mi-|ci pairs can be distilled in 113. A full time-
distillation block outputs one |mi-|ci pair every 11. optimal quantum computer consists of a row of multiple
For 116-to-12 distillation, a modified block is shown units, see Fig. 28c. The units shown in the figure con-
in Fig. 26b. We arrange the qubits, such that the 12 out- tain some unused tiles. This gives the units a rectangu-
put states are found in the positions shown in step 1 of lar profiles, even though this is not necessarily required.
Fig. 26d. Using 2, correction qubits are prepared and
Y ⊗ Z operators are measured. Finally, the patches are 0 Step 1
deformed back to square patches and all magic states
are sent to the green storage, while all correction qubits
are sent to the purple storage. This adds 3 to the pro-
tocol, meaning that this block outputs 12 |mi-|ci pairs
every 53 with a success probability of (1 − p)112 . For 1 Step 2 1 Step 3
p = 10−3 , this corresponds to one output every 4.94.
As mentioned in Sec. 4, modified distillation blocks
can also be used with setups, in which T gates are per-
formed one after the other, in order to deal with slow
classical processing. In this case, only one correction 2 Step 4 2 Step 5
qubit storage tile per magic state is required.
Units. Modified distillation blocks together with fast
data blocks are what we refer to as units. The units for
our example computation for p = 10−3 and p = 10−4
are shown in Fig. 28a-b. They both consist of a 200-
qubit fast data block, 200 correction-qubit storage tiles, Figure 27: Bell basis measurement (BM ) in 2.
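The output rate quoted for the modified 116-to-12 block, and the requirement of at least 100 |m⟩-|c⟩ pairs per unit preparation, can be checked with a few lines of arithmetic. The following is a rough sketch only: the function names are hypothetical, and the block-count estimate simply divides average rates, which is an assumption and not necessarily how the layouts in Fig. 28 were chosen.

```python
from math import ceil


def steps_per_output(protocol_steps, n_outputs, success_exponent, p):
    """Average number of time steps per distilled output state.

    A protocol run takes `protocol_steps` time steps, yields `n_outputs`
    states, and succeeds with probability (1 - p) ** success_exponent.
    """
    success_probability = (1.0 - p) ** success_exponent
    return protocol_steps / (n_outputs * success_probability)


def blocks_needed(pairs, window_steps, steps_per_pair):
    """Smallest number of blocks whose combined average rate delivers
    `pairs` outputs within `window_steps` time steps (average-rate estimate)."""
    return ceil(pairs * steps_per_pair / window_steps)


# Modified 116-to-12 block at p = 1e-3: 12 pairs every 53 time steps,
# success probability (1 - p)^112.
rate_116_to_12 = steps_per_output(53, 12, 112, 1e-3)
print(f"{rate_116_to_12:.2f} time steps per output")

# Average-rate estimate of the blocks needed for 100 pairs in 113 time steps.
print(blocks_needed(100, 113, rate_116_to_12))
print(blocks_needed(100, 113, 11))  # modified 15-to-1 block, one pair per 11 steps
```

The first value reproduces the figure of one output every 4.94 time steps; the block counts are only indicative, since they ignore that outputs arrive in batches.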
Figure 28: Units consist of fast data blocks, modified distillation blocks and storage tiles. (a) The unit for p = 10^{−3} consists of 54 × 21 = 1134 tiles. (b) For p = 10^{−4}, the number of tiles is 37 × 21 = 777. (c) A time-optimal setup consists of a row of multiple units, which means that the space to the bottom and top of the fast data blocks needs to remain free.
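The tile counts in the caption can be converted into approximate physical-qubit counts. The sketch below assumes that every tile holds 2d² physical qubits and that the code distances are d = 27 for p = 10^{−3} and d = 13 for p = 10^{−4}. Neither assumption is stated in this section; they are used here because they reproduce the per-unit qubit counts and the 55,400 qubits of the 164-tile minimal setup quoted elsewhere in the paper.

```python
def physical_qubits(tiles, code_distance):
    """Physical qubits for `tiles` tiles, assuming 2 * d^2 qubits per tile
    (d^2 data qubits plus d^2 measurement qubits; an assumption)."""
    return 2 * code_distance ** 2 * tiles


# (tiles, assumed code distance) for the two units of Fig. 28
units = {
    "p = 1e-3": (54 * 21, 27),
    "p = 1e-4": (37 * 21, 13),
}

for label, (tiles, d) in units.items():
    print(label, tiles, "tiles ->", physical_qubits(tiles, d), "physical qubits")

# The 164-tile minimal setup at the assumed d = 13
print(physical_qubits(164, 13))
```

Under these assumptions the two units come out at roughly 1.65 million and 260,000 physical qubits, respectively.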
In our case, the units have a footprint of 54 × 21 and 37 × 21 tiles, respectively. Note that the first and last unit of a time-optimal setup are smaller, as they only require 100 correction-qubit storage tiles and half the number of distillation blocks.

Unit preparation. In order to implement the time-optimal circuit of Fig. 25 with the setup of Fig. 28, we show protocols that can be used for the BP-T-BM operations. The data blocks of every unit store 2n qubits in n six-corner patches. We arrange the qubits in such a way that the final Bell measurements (BM) are Z ⊗ Z and X ⊗ X measurements of the two qubits of every six-corner patch. This Bell measurement can be done in 2 time steps, as shown in Fig. 27.

This arrangement of qubits implies that, for every six-corner patch, one of the qubits needs to be part of a Bell state preparation (BP) with the neighboring unit to the top, and the other with a neighboring unit to the bottom. For an n-qubit quantum computation, this Bell state preparation can be performed in √n + 1 time steps, as we show in Fig. 29 for the example of n = 9. For this, every qubit is initialized in the |+⟩ state. The Bell state preparation requires a series of Z ⊗ Z measurements. The protocol in Fig. 29 shows that, since an n-qubit computation implies that the number of rows of the data block is √n, these measurements require a total of √n + 1 time steps.

Figure 29: Bell state preparation (BP) for a 9-qubit computation (18 qubits per unit) in 4 time steps. All six-corner patches are initialized in the |+⟩^{⊗2} state. Each red arrow is a Z ⊗ Z measurement between the two qubits at the ends of the arrow. For n-qubit computations, this requires √n + 1 time steps.

In total, the unit preparation of an n-qubit computation with n_T T gates per layer requires √n + 1 time steps for the Bell state preparation, n_T time steps for the execution of the T layer, and 2 time steps for the Bell basis measurement, i.e., a total of n_T + √n + 3 time steps. In our example, this amounts to 113 time steps, which corresponds to t_u = 1469 µs for p = 10^{−4} and t_u = 3051 µs for p = 10^{−3}. Thus, time optimality is reached with 1470 units for p = 10^{−4} and 3052 units for p = 10^{−3}.

Space-time trade-offs. Of course, it is also possible to use fewer units than required for time optimality. Using n_u units means that n_T · (n_u − 1) T gates are performed every t_u. In our example, 100 · (n_u − 1) T gates are performed every 113 time steps. With three units, the computational time drops to 56.5% of the computational time of the fast setup in Fig. 22. With ten units, it drops to 11%. The number of qubits per unit is ∼260,000 for p = 10^{−4} and ∼1,650,000 for p = 10^{−3}, so going from the fast setup to parallelized units is, initially, not a favorable space-time trade-off. Since the space-time cost has increased compared to the fast setup, it is also useful to check whether the code distance needs to be readjusted. If we use three units, ignoring that the first and last unit are, in principle, smaller, the space-time cost is still below the space-time cost of the minimal setup in both cases. Adding more units significantly improves the space-time cost. It is also a prescription to linearly speed up the quantum computer down to the time-optimal limit.

5.3 Distributed quantum computing

Note that, apart from the initial sharing of entangled Bell pairs, the units operate entirely independently of each other. This implies that, if Bell pairs can be shared between different quantum computers, each unit can be located in a separate quantum computer. The shared Bell pairs do not even need to have a high fidelity, as software-based entanglement distillation [37, 38] can be used to convert a large number of low-fidelity Bell pairs into fewer high-fidelity Bell pairs. Recent experiments have made progress towards generating entanglement between different superconducting chips [39–41].

Figure 30: Scheme for distributed quantum computing (a) in a circular arrangement of quantum computers with the ability to share Bell pairs between nearest neighbors. If the Bell-pair fidelity is low, entanglement distillation (ent. dist.) can be used to increase the fidelity. This scheme effectively implements the circular time-optimal circuit drawn schematically in (b).

For the time-optimal scheme, quantum computers may be arranged in a circle as shown in Fig. 30a, with the ability to share Bell pairs between neighboring quantum computers. This effectively implements the circuit that is schematically drawn in Fig. 30b. Note that in this circuit, there is no first and last unit. Here, every unit performs n_T π/8 rotations every t_u. Therefore, time optimality is reached with one fewer unit, and each unit only needs to store n_T correction qubits instead of 2n_T. With only 100 correction-qubit storage tiles and ignoring the unused tiles, the qubit count of the units in Fig. 28 drops to ∼220,000 for p = 10^{−4} and ∼1,470,000 for p = 10^{−3}, which are the numbers that we report in Fig. 3. Thus, if nearest-neighbor communication between quantum computers is feasible, fewer than 2 million physical qubits per quantum computer already suffice to implement the full time-optimal scheme with 1500-3000 quantum computers.

Entanglement distillation increases the qubit count.
Note that it does not slow down the computation, as Bell pairs do not need to be distilled instantly. Entanglement distillation can take up to t_u to distill the n_T Bell pairs required per entanglement distillation block.

Summary. In order to speed up an n-qubit quantum computation beyond 1 time step per T gate, we parallelize T layers using units. With an average of n_T T gates per layer, a unit consists of 4n + 4√n + 1 tiles for the data block, 2n_T storage tiles for the correction qubits, and enough distillation blocks to distill n_T |m⟩-|c⟩ pairs in the time it takes to prepare a unit, which is n_T + √n + 3 time steps. If the unit preparation time is t_u and the time for single-qubit measurements and classical processing is t_m, a time-optimal setup consists of t_u/t_m + 1 units, executing one T layer every t_m. Using fewer units results in a linear space-time trade-off. With n_u units, n_T · (n_u − 1) T gates are performed in t_u. A circular arrangement of units can be used for distributed quantum computing. This also reduces the number of correction-qubit storage tiles to n_T and the number of units in a time-optimal setup to t_u/t_m. In order to fully exploit the space-time trade-offs discussed in this section, the input circuit should be optimized for T depth.

6 Trade-offs beyond Clifford+T

Under the assumption that measurements and feed-forward can be done in 1 µs, we described how to perform a 10^8-T-gate computation in just 1 second. A more conservative assumption would be a measurement and feed-forward time of 10 µs, which increases the computation time to 10 seconds. Although this seems fast, many quantum computations have T counts that are significantly higher than 10^8. While the T count of Hubbard model simulations [2] is indeed in this range, quantum chemistry simulations can be more demanding. In particular, the simulation of FeMoco [1], a structure that plays an important role in nitrogen fixation, can have a T count of up to 10^15. With a serial execution of one T gate every 10 µs, the computation takes 317 years to finish. Even if the gates are grouped into 100 T gates per layer, the computation still takes over 3 years.

While Clifford+T is a gate set that is very well suited for surface codes, it is often not the gate set which is natural to the quantum computations in question. In particular, quantum simulation based on Trotterization consists of many small-angle rotations. In the Clifford+T framework, each small-angle rotation is translated into a series of T gates via gate synthesis. Depending on the desired precision, this can require ∼100 T gates for each rotation [42], which must be executed in series. In order to speed up computations beyond their T count or T depth, it is therefore constructive to consider additional resources for gates other than T gates.

6.1 Clifford+ϕ circuits

Instead of requiring an input circuit that consists of Clifford gates and π/8 rotations, we consider circuits that consist of Clifford gates and arbitrary ϕ rotations, which we call Clifford+ϕ circuits. Using the procedure in Sec. 1, Clifford gates can be commuted to the end of the circuit, such that we end up with a circuit like the one in Fig. 31. Rotations that mutually commute can be grouped up into layers. The algorithm of Sec. 1 can also be used to reduce the number of layers. It can even reduce the number of rotations, since, if two rotations P_{ϕ_1} and P_{ϕ_2} with the same axis of rotation are moved into the same layer, they can be combined into a single rotation P_{ϕ_1+ϕ_2}. Clifford+ϕ circuits are characterized by rotation count (or ϕ count) and rotation depth (or ϕ depth), rather than T count and T depth.

Figure 31: Clifford+ϕ circuit. The first two rotation layers (ϕ layers) with three rotations per layer are shown.

Each ϕ rotation can be performed using a |ϕ⟩ = |0⟩ + e^{i(2ϕ)}|1⟩ resource state. When this state is consumed to perform a P_ϕ rotation, there is a 50% chance that a P_{−ϕ} rotation is performed instead. For π/8 rotations, this is not very problematic, since the correction operation is a π/4 rotation, which can simply be commuted to the end of the circuit. For general P_{−ϕ}, the correction is a P_{2ϕ} rotation, which requires the use of a |2ϕ⟩ state. If this fails, the next correction is a P_{4ϕ} rotation requiring a |4ϕ⟩ state, and so on. Thus, a wide variety of resource states is required to execute arbitrary-angle rotations. These can either be pieced together from ordinary magic states |m⟩, or, more efficiently, distilled using specialized protocols [34, 43].

All the schemes discussed in this work can be used with Clifford+ϕ circuits by replacing magic state distillation blocks by distillation blocks that produce resource states for arbitrary-angle rotations. In order to consume these states in a systematic way similar to the post-corrected π/8 rotations in Fig. 24b, we can use the post-corrected version of ϕ rotations shown in Fig. 32.
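The chain of corrections described above (a failed P_ϕ needs a |2ϕ⟩ state, a failed P_{2ϕ} needs a |4ϕ⟩ state, and so on) can be made concrete with a small sketch. It assumes that the chain stops as soon as the correction angle is a multiple of π/4, i.e., a Clifford rotation, as in the π/8 case. The function names and the cap `max_states` are hypothetical and only serve the illustration.

```python
from fractions import Fraction


def resource_state_chain(phi_over_pi, max_states=16):
    """Angles (in units of pi) of the resource states |phi>, |2phi>, |4phi>, ...
    that may be needed for one phi rotation.

    The chain ends once the next correction is a multiple of pi/4, which is
    assumed to be a Clifford rotation that needs no resource state. For angles
    that never reach a multiple of pi/4, the list is cut off at `max_states`.
    """
    clifford_angle = Fraction(1, 4)
    angle = Fraction(phi_over_pi)
    chain = []
    while angle % clifford_angle != 0 and len(chain) < max_states:
        chain.append(angle)
        angle *= 2  # a failed rotation is corrected by a rotation of twice the angle
    return chain


def expected_states_consumed(chain):
    """Expected number of consumed resource states if every rotation
    produces the wrong sign with probability 1/2."""
    return sum(Fraction(1, 2 ** k) for k in range(len(chain)))


for phi in (Fraction(1, 8), Fraction(1, 32)):
    chain = resource_state_chain(phi)
    print(phi, [str(a) for a in chain], expected_states_consumed(chain))
```

For ϕ = π/8 the chain contains only the magic state itself, while for ϕ = π/32 up to three different resource states (π/32, π/16, π/8) can be required.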
(a) Post-corrected ϕ rotation
Figure 34: C(P_1, P_2, P_3, P_4) gate in terms of 15 π/16 rotations.
T-count-limited setups. We also note that the T count can be reduced by combining gate synthesis and magic state distillation (synthillation) [50, 51].

C(P_1, P_2, P_3, P_4) gates, i.e., triply-controlled Pauli gates, can be written as 15 π/16 rotations, as shown in Fig. 34. While the T depth of this circuit is no longer 1, the rotation depth is. In fact, any multi-controlled Pauli gate C(P_1, ..., P_n), i.e., with n − 1 controls, can be constructed from 2^n − 1 P_{π/2^n} rotations by following the pattern shown in Figs. 5, 33 and 34. The rotation depth of all these gates is 1. Multi-controlled gates can also be pieced together from C(P_1, P_2, P_3) rotations, but this increases the circuit depth. By using small-angle rotations, any multi-controlled Pauli gate can be executed in one step.

they are measured. The physical qubit measurement does not need to be a quantum non-demolition measurement, but can be a destructive measurement. Ultimately, however, the speed of quantum computation is limited by the speed of classical computation. Exploring superconducting logic [52] to speed up classical computation may be a viable route to speed up quantum computers.

Summary. All the schemes discussed in this paper can be used not only with Clifford+T circuits, but also with Clifford+ϕ circuits. The only difference is that more and different resource states are required. Their distillation and storage requires more space than ordinary magic state distillation, but their use can speed up the computation by several orders of magnitude.
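The rotation count for multi-controlled Pauli gates can be checked numerically for the special case where all Paulis are Z, since every operator is then diagonal. The sketch below counts n as the number of Pauli operators in C(P_1, ..., P_n), so the triply-controlled gate of Fig. 34 has n = 4. It uses the convention exp(iθP) for a rotation and includes a global phase; both choices are assumptions made for this check, and the signs may differ from the convention used in the figures.

```python
import cmath
from itertools import combinations, product
from math import pi


def multi_controlled_z(n):
    """Build C(Z, ..., Z) on n qubits from commuting Z-product rotations.

    Returns the list of non-empty qubit subsets (one rotation per subset)
    and the diagonal of the resulting operator in the computational basis.
    """
    angle = pi / 2 ** n
    subsets = [s for k in range(1, n + 1) for s in combinations(range(n), k)]
    diagonal = []
    for bits in product((0, 1), repeat=n):
        value = cmath.exp(1j * angle)  # global phase (empty-subset term)
        for subset in subsets:
            eigenvalue = (-1) ** sum(bits[j] for j in subset)  # eigenvalue of the Z product
            sign = (-1) ** len(subset)                         # alternating rotation sign
            value *= cmath.exp(1j * sign * angle * eigenvalue)
        diagonal.append(value)
    return subsets, diagonal


for n in (2, 3, 4):
    subsets, diagonal = multi_controlled_z(n)
    # All entries should be +1 except the last one (all qubits in |1>), which is -1.
    ok = all(abs(v - 1) < 1e-9 for v in diagonal[:-1]) and abs(diagonal[-1] + 1) < 1e-9
    print(n, len(subsets), f"rotations of angle pi/{2 ** n}", ok)
```

For n = 4 this gives 15 mutually commuting rotations of angle π/16. Since all rotations commute, they form a single rotation layer.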
[Figure 35 plot. Axis label: space-time cost normalized to minimal setup. Schemes A-P on the horizontal axis.]
A: Compact block + 1 distillation block (Fig. 20)
B: Intermediate block + 2 distillation blocks (Fig. 21)
C-K: Fast block + 3-11 distillation blocks (Fig. 22)
L: 2 units (Figs. 28, 30)
M: 3 units
N: 10 units
O: 100 units
P: 1469/1470 units (time-optimal)
Figure 35: Space-time, space, and time cost of the schemes discussed in this paper for the example of a 100-qubit quantum computation with T count 10^8 and T depth 10^6, under the assumption of a 1 µs code cycle time, and a 1 µs measurement and classical processing time. The solid and dashed lines in M-P are for circular (solid) and linear (dashed) arrangements of units.
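The computational times of the schemes in Fig. 35 can be approximately reproduced from the quantities given in the text. This is a sketch under stated assumptions: one time step is taken to be d code cycles with d = 13 for p = 10^{−4} (consistent with 113 time steps corresponding to t_u = 1469 µs), a linear arrangement of n_u units executes n_u − 1 layers per unit preparation, and a circular arrangement executes n_u layers. The function names are hypothetical.

```python
CODE_DISTANCE = 13        # assumption: d = 13 for p = 1e-4
CODE_CYCLE_S = 1e-6       # 1 µs code cycle, as in the caption of Fig. 35
T_COUNT = 10 ** 8
T_DEPTH = 10 ** 6         # i.e., 100 T gates per layer on average
UNIT_PREP_STEPS = 113     # time steps per unit preparation

TIME_STEP_S = CODE_DISTANCE * CODE_CYCLE_S  # one time step = d code cycles (assumed)


def serial_time_s(steps_per_t_gate):
    """Run time when T gates are executed one after the other."""
    return T_COUNT * steps_per_t_gate * TIME_STEP_S


def unit_time_s(n_units, circular):
    """Run time when T layers are parallelized over n_units units."""
    layers_per_preparation = n_units if circular else n_units - 1
    preparations = T_DEPTH / layers_per_preparation
    return preparations * UNIT_PREP_STEPS * TIME_STEP_S


print(f"A (one T gate per 11 steps): {serial_time_s(11) / 3600:.2f} h")
print(f"K (one T gate per step):     {serial_time_s(1) / 60:.1f} min")
for n_units in (3, 10):
    print(f"{n_units} units: circular {unit_time_s(n_units, True):.1f} s, "
          f"linear {unit_time_s(n_units, False):.1f} s")
print(f"1469 units, circular: {unit_time_s(1469, True):.2f} s")
```

With these assumptions, three units give roughly 490 s (circular) and 734 s (linear), and the time-optimal setup reaches about one second.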
computer and, in return, decreasing the computational time. For the example of a computation with T count 10^8 and T depth 10^6 with an error rate of p = 10^{−4}, the minimal setup consists of 164 tiles and executes one T gate every 11 time steps, corresponding to a computational time of 4 hours with 55,400 physical qubits. From here, the space-time cost is drastically reduced by adding more distillation blocks, as shown in Fig. 35 and Tab. 2. With this strategy, the computational time is reduced to 1 time step per T gate, where the computational cost of a circuit is governed by its T count.

Table 2: Space and time cost of the schemes plotted in Fig. 35. The numbers in parentheses are for linear arrangements of units (dashed lines in Fig. 35).

scheme   physical qubits                computational time
A        55,400                         4 h
B        76,400                         2 h
C-K      90,200 - 123,000               79 - 22 min
L        447,000                        12 min
M        679,000 (788,000)              490 sec (734 sec)
N-P      2,230,000 - 328,000,000        147 sec - 1 sec
         (2,630,000 - 386,000,000)      (163 sec - 1 sec)

For further space-time trade-offs, we parallelized T layers using units. This is an increase in space-time cost, especially for linear arrangements of units (dashed line in Fig. 35), but enables further space-time trade-offs. Linearly trading off space versus time, the computational time can be reduced to one measurement per T layer. Units are well-suited for distributed quantum computing, as the sharing of Bell pairs between neighboring units is part of the parallelization scheme.

This exhausts the space-time trade-offs that are possible within the Clifford+T framework. Switching to Clifford+ϕ circuits can provide further trade-offs, as additional resources are introduced for arbitrary-angle rotations. This can be used to execute circuits in a time proportional to their rotation depth, as opposed to their T depth. We have not investigated how this trade-off affects the space-time cost in our scheme.

Room for optimization. In our T-count-limited schemes and for the preparation of units, one T gate is performed after the other. If the input circuit is known, it is reasonable to assume that qubits can be arranged in a way that allows for the parallel execution of multiple T gates in the same data block. Furthermore, there is a strict separation between tiles used for magic state distillation and tiles used for data blocks in our schemes. By sharing tiles between blocks, the space overhead may be reduced. Moreover, we have only considered a handful of distillation protocols. It would be interesting to see which distillation protocols can be used to optimize the cost function of Eq. (8). Finally, concrete tile layouts that can be used to distill and consume the additional resources necessary for Clifford+ϕ computing are still missing.

Beyond surface codes. Even though we designed our schemes with surface codes in mind, they can, in principle, be applied to other toric-code-based patches, such as Majorana surface-code patches [11] or color-code patches [12, 56]. Color codes can reduce the number of physical qubits due to more compact encoding, but require more elaborate hardware to measure the higher-weight check operators. The space cost is reduced by replacing all surface-code patches by color-code patches, with the exception of Pauli product measurement ancillas. In order to keep the space cost low, measurement ancillas should remain surface-code patches and color-to-surface code lattice surgery [57] should be used during the Pauli product measurement protocol, as described in Ref. [58].

Outlook. If the number of qubits continues to double every 8 months [59], the 60,000 - 300,000 physical qubits necessary for classically intractable Hubbard model simulations with a T count of 10^8 will be available in 7-9 years. If multiple quantum computers can be connected in a network, time-optimal quantum computing becomes available shortly thereafter, facilitating the implementation of more difficult algorithms such as quantum chemistry simulations or Shor's algorithm. Classical processing in terms of measurements, feed-forward and decoding is expected to be a significant roadblock in speeding up quantum computers. Ultimately, faster classical control hardware will be necessary to build faster quantum computers. I hope that the schemes discussed in this work are a useful roadmap towards large-scale quantum computing, and that the patch-based framework is a valuable toolbox to construct surface-code-based implementations of quantum algorithms.

Acknowledgments

This work would not have been possible without insightful discussions with Austin Fowler and Craig Gidney about Pauli product measurements and 15-to-1 distillation, with Jens Eisert, Markus Kesselring and Felix von Oppen about Clifford tracking and space-time trade-offs, with Jeongwan Haah and Matthew Hastings about magic state distillation, with Guang Hao Low and Nathan Wiebe about quantum simulation algorithms, and with Ali Lavasani about few-qubit surface-code architectures. This work has been supported by the Deutsche Forschungsgemeinschaft (Bonn) within the network CRC TR 183.

References

[1] M. Reiher, N. Wiebe, K. M. Svore, D. Wecker, and M. Troyer, Elucidating reaction mechanisms on quantum computers, PNAS 114, 7555 (2017).
[2] R. Babbush, C. Gidney, D. W. Berry, N. Wiebe, J. McClean, A. Paler, A. Fowler, and H. Neven, Encoding electronic spectra in quantum circuits with linear T complexity, arXiv:1805.03662 (2018).
[3] J. Preskill, Reliable quantum computers, Proc. Roy. Soc. Lond. A 454, 385 (1998).
[4] B. M. Terhal, Quantum error correction for quantum memories, Rev. Mod. Phys. 87, 307 (2015).
[5] E. T. Campbell, B. M. Terhal, and C. Vuillot, Roads towards fault-tolerant universal quantum computation, Nature 549, 172 (2017).
[6] A. Y. Kitaev, Fault-tolerant quantum computation by anyons, Ann. Phys. 303, 2 (2003).
[7] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, Surface codes: Towards practical large-scale quantum computation, Phys. Rev. A 86, 032324 (2012).
[8] H. Bombin, Topological order with a twist: Ising anyons from an abelian model, Phys. Rev. Lett. 105, 030403 (2010).
[9] C. Horsman, A. G. Fowler, S. Devitt, and R. V. Meter, Surface code quantum computing by lattice surgery, New J. Phys. 14, 123011 (2012).
[10] B. J. Brown, K. Laubscher, M. S. Kesselring, and J. R. Wootton, Poking holes and cutting corners to achieve Clifford gates with the surface code, Phys. Rev. X 7, 021029 (2017).
[11] D. Litinski and F. v. Oppen, Lattice Surgery with a Twist: Simplifying Clifford Gates of Surface Codes, Quantum 2, 62 (2018).
[12] A. J. Landahl and C. Ryan-Anderson, Quantum computing by color-code lattice surgery, arXiv:1407.5103 (2014).
[13] Y. Li, A magic state's fidelity can be superior to the operations that created it, New J. Phys. 17, 023037 (2015).
[14] D. Herr, F. Nori, and S. J. Devitt, Optimization of lattice surgery is NP-hard, npj Quant. Inf. 3 (2017), 10.1038/s41534-017-0035-1.
[15] S. Bravyi and A. Kitaev, Universal quantum computation with ideal Clifford gates and noisy ancillas, Phys. Rev. A 71, 022316 (2005).
[16] J. Haah and M. B. Hastings, Codes and Protocols [32] E. T. Campbell and M. Howard, Magic state
for Distilling T , controlled-S, and Toffoli Gates, parity-checker with pre-distilled components,
Quantum 2, 71 (2018). Quantum 2, 56 (2018).
[17] S. Bravyi and J. Haah, Magic-state distillation with [33] A. M. Meier, B. Eastin, and E. Knill, Magic-
low overhead, Phys. Rev. A 86, 052329 (2012). state distillation with the four-qubit code, Quant.
[18] C. Jones, Multilevel distillation of magic states Inf. Comp. 13, 195 (2013).
for quantum computing, Phys. Rev. A 87, 042305 [34] E. T. Campbell and J. OGorman, An effi-
(2013). cient magic state approach to small angle rota-
[19] A. G. Fowler, S. J. Devitt, and C. Jones, Surface tions, Quantum Science and Technology 1, 015007
code implementation of block code state distillation, (2016).
Scientific rep. 3, 1939 (2013). [35] A. G. Fowler and S. J. Devitt, A bridge to lower
[20] A. G. Fowler, Time-optimal quantum computation, overhead quantum computation, arXiv:1209.0510
arXiv:1210.4626 (2012). (2012).
[21] D. Gottesman, The Heisenberg representation of [36] D. Herr, F. Nori, and S. J. Devitt, Lattice surgery
quantum computers, Proc. XXII Int. Coll. Group. translation for quantum computation, New J. Phys.
Th. Meth. Phys. 1, 32 (1999). 19, 013034 (2017).
[22] V. Kliuchnikov, D. Maslov, and M. Mosca, [37] C. H. Bennett, G. Brassard, S. Popescu, B. Schu-
Fast and efficient exact synthesis of single qubit macher, J. A. Smolin, and W. K. Wootters, Pu-
unitaries generated by Clifford and T gates, rification of noisy entanglement and faithful tele-
arXiv:1206.5236 (2012). portation via noisy channels, Phys. Rev. Lett. 76,
722 (1996).
[23] V. Kliuchnikov, D. Maslov, and M. Mosca, Asymp-
[38] C. H. Bennett, H. J. Bernstein, S. Popescu, and
totically optimal approximation of single qubit uni-
B. Schumacher, Concentrating partial entangle-
taries by Clifford and T circuits using a constant
ment by local operations, Phys. Rev. A 53, 2046
number of ancillary qubits, Phys. Rev. Lett. 110,
(1996).
190502 (2013).
[39] C. Dickel, J. J. Wesdorp, N. K. Langford, S. Peiter,
[24] D. Gosset, V. Kliuchnikov, M. Mosca, and
R. Sagastizabal, A. Bruno, B. Criger, F. Mot-
V. Russo, An algorithm for the T -count,
zoi, and L. DiCarlo, Chip-to-chip entanglement
arXiv:1308.4134 (2013).
of transmon qubits using engineered measurement fields, Phys. Rev. B 97, 064508 (2018).
[25] L. Heyfron and E. T. Campbell, An efficient quantum compiler that reduces T count, arXiv:1712.01557 (2017).
[26] M. Amy, D. Maslov, M. Mosca, and M. Roetteler, A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32, 818 (2013).
[27] P. Selinger, Quantum circuits of T-depth one, Phys. Rev. A 87, 042302 (2013).
[28] M. Amy, D. Maslov, and M. Mosca, Polynomial-time T-depth optimization of Clifford+T circuits via matroid partitioning, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 33, 1476 (2014).
[29] D. Litinski and F. von Oppen, Quantum computing with Majorana fermion codes, Phys. Rev. B 97, 205404 (2018).
[30] A. G. Fowler and C. Gidney, Low overhead quantum computation using lattice surgery, in preparation.
[31] A. Lavasani and M. Barkeshli, Low overhead Clifford gates from joint measurements in surface, color, and hyperbolic codes, arXiv:1804.04144 (2018).
[40] P. Campagne-Ibarcq, E. Zalys-Geller, A. Narla, S. Shankar, P. Reinhold, L. Burkhart, C. Axline, W. Pfaff, L. Frunzio, R. J. Schoelkopf, and M. H. Devoret, Deterministic remote entanglement of superconducting circuits through microwave two-photon transitions, Phys. Rev. Lett. 120, 200501 (2018).
[41] C. J. Axline, L. D. Burkhart, W. Pfaff, M. Zhang, K. Chou, P. Campagne-Ibarcq, P. Reinhold, L. Frunzio, S. Girvin, L. Jiang, et al., On-demand quantum state transfer and entanglement between remote microwave cavity memories, Nat. Phys. , 705 (2018).
[42] N. J. Ross and P. Selinger, Optimal ancilla-free Clifford+T approximation of z-rotations, arXiv:1403.2975 (2014).
[43] G. Duclos-Cianci and D. Poulin, Reducing the quantum-computing overhead with complex gate distillation, Phys. Rev. A 91, 042315 (2015).
[44] N. C. Jones, J. D. Whitfield, P. L. McMahon, M.-H. Yung, R. V. Meter, A. Aspuru-Guzik, and Y. Yamamoto, Faster quantum chemistry simulation on fault-tolerant quantum computers, New J. Phys. 14, 115023 (2012).
[45] G. H. Low and I. L. Chuang, Hamiltonian simulation by qubitization, arXiv:1610.06546 (2016).
[46] G. H. Low and I. L. Chuang, Optimal Hamiltonian simulation by quantum signal processing, Phys. Rev. Lett. 118, 010501 (2017).
[47] R. Babbush, D. W. Berry, J. R. McClean, and H. Neven, Quantum simulation of chemistry with sublinear scaling to the continuum, arXiv:1807.09802 (2018).
[48] C. Jones, Low-overhead constructions for the fault-tolerant Toffoli gate, Phys. Rev. A 87, 022328 (2013).
[49] C. Gidney, Halving the cost of quantum addition, Quantum 2, 74 (2018).
[50] E. T. Campbell and M. Howard, Unified framework for magic state distillation and multiqubit gate synthesis with reduced resource cost, Phys. Rev. A 95, 022316 (2017).
[51] J. O'Gorman and E. T. Campbell, Quantum computation with realistic magic-state factories, Phys. Rev. A 95, 032338 (2017).
[52] K. K. Likharev and V. K. Semenov, RSFQ logic/memory family: A new Josephson-junction technology for sub-terahertz-clock-frequency digital systems, IEEE Transactions on Applied Superconductivity 1, 3 (1991).
[53] A. G. Fowler, S. J. Devitt, and C. Jones, Synthesis of arbitrary quantum circuits to topological assembly: Systematic, online and compact, Scientific Rep. 7, 10414 (2017).
[54] A. Paler, I. Polian, K. Nemoto, and S. J. Devitt, Fault-tolerant, high-level quantum circuits: form, compilation and description, Quantum Science and Technology 2, 025003 (2017).
[55] L. Lao, B. van Wee, I. Ashraf, J. van Someren, N. Khammassi, K. Bertels, and C. Almudever, Mapping of lattice surgery-based quantum circuits on surface code architectures, arXiv:1805.11127 (2018).
[56] H. Bombin and M. A. Martin-Delgado, Topological quantum distillation, Phys. Rev. Lett. 97, 180501 (2006).
[57] H. P. Nautrup, N. Friis, and H. J. Briegel, Fault-tolerant interface between quantum memories and quantum processors, Nat. Commun. 8, 1321 (2017).
[58] D. Litinski and F. von Oppen, Braiding by Majorana tracking and long-range CNOT gates with color codes, Phys. Rev. B 96, 205413 (2017).
[59] IBM doubling qubits every 8 months, https://www.nextbigfuture.com/2018/02/ibm-doubling-qubits-every-8-months-and-ecommerce-cryptography-at-risk-in-7-15-years.html, accessed: 2018-08-01.

A Surface-code qubits and lattice-surgery operations

To illustrate the translation of our framework to surface-code patches, we show how the protocols of Fig. 2 are implemented with surface codes. This correspondence is explained in detail in Ref. [11].
Bell pair preparation. The first operation demonstrates square patches, qubit initialization and standard lattice surgery. It is shown in Fig. 36. Physical qubits are placed on vertices, light faces correspond to Z stabilizers and dark faces to X stabilizers. Two surface-code patches are initialized in the logical |+⟩ state by initializing all physical qubits in |+⟩ and measuring the stabilizers. Simultaneously, lattice surgery between the two patches is performed, measuring the logical Z ⊗ Z operator as the product of newly introduced Z stabilizers. To account for measurement errors, this is done for d code cycles. Finally, the patch is split into two patches again.

Figure 36: Surface-code implementation of the protocol in Fig. 2a.

Moving boundaries. The protocol to move patches is essentially the same as the previous protocol. It is shown in Fig. 38. Extending the patch via its Z boundary in the second step is the same operation as a Z ⊗ Z lattice surgery between the patch and a rectangular |+⟩ ancilla qubit to the right. This needs to be done for d
code cycles to account for measurement errors. Finally, the patch is shortened again by measuring the left 2/3 of physical qubits in the X basis.
Y measurements. The third protocol in Fig. 37 shows patch deformation and lattice surgery involving the Y operator. First, a patch is deformed to a wider patch by initializing physical qubits in the X basis and measuring the new stabilizers, which takes d code cycles. Note that the wide patch only occupies (2d−1)×d physical qubits. Below the wide patch, a rectangular ancilla patch is initialized in the |0⟩ state. A column of physical qubits in the center is missing, so that, in the next step, the ancilla can be used for twist-based lattice surgery [11], measuring the Y operator. This lattice surgery in the third step involves dislocation operators and a five-qubit twist defect. Even though these stabilizers are irregular, they can still be measured in a square lattice of physical qubits with nearest-neighbor couplings, as we show in Fig. 39. For the measurement of twist operators and wide X and Z stabilizers, up to three measurement ancillas can be used.
Moving corners. The movement of corners of a surface-code patch is shown in Fig. 41. It corresponds to a change of boundary stabilizers. In order to account for measurement errors of the newly measured stabilizers, this requires d code cycles.
Six-corner patches and shortened boundaries. A six-corner patch corresponds to a surface-code patch with three X boundaries and three Z boundaries. A distance-5 patch is shown in Fig. 42, which uses 2d^2 physical data qubits to encode two logical qubits. Shortened boundaries correspond to decreasing the length of certain surface-code boundaries, making them susceptible to errors. With the shortened boundaries in Fig. 42,
Figure 40: Left: Naive straightforward translation of the Pauli product measurement protocol in Fig. 8 to surface codes. Right:
Topologically equivalent protocol with a reduced space cost. This way, any Pauli product measurement only requires a free ancilla
region of width d.
the distance to X errors is reduced to 2. Note that, in this case, the qubit occupies more than d^2 physical data qubits. This will not be the case for the Pauli product measurement protocol in Appendix B, where shortened patches are used.

B Pauli product measurement protocol

A straightforward implementation of the Pauli product measurement protocol of Fig. 8 is shown in the left two panels of Fig. 40. Here, an 8-corner ancilla patch is initialized in the |+⟩^{⊗3} state by initializing all physical qubits in |+⟩ and measuring the stabilizers. Next, a set of lattice surgeries is performed, yielding the desired measurement outcomes. Note that in this naive one-to-one translation, there are small gaps between the patches to account for the shortened edges of the ancilla qubits. This can be avoided by, instead, performing the Pauli product measurement protocol in one combined step, as shown in the right panels of Fig. 40. As pointed out in Ref. [30], it is entirely equivalent to connect the patches involved in the Pauli product measurement, such that the resulting patch has the same shape as the configuration in the bottom left panel of Fig. 40. Here, the patches shown in the two bottom panels of Fig. 40 have the exact same boundaries, except that the boundary lengths are different. In both cases, the separation between boundaries is such that the code distance is
Figure 41: Surface-code implementation of the protocol in Fig. 2d.
Figure 42: Surface-code implementation of six-corner patches and shortened boundaries in Fig. 2e.
Figure 43: Proof-of-principle two-qubit device implemented with 48 physical data qubits.
not decreased. Therefore, for Pauli product measurements, it is sufficient to leave a free ancilla region of width d between the qubits that are part of the measurement.

C Proof-of-principle device

Here, we discuss how (3d − 1) · 2d physical data qubits can be used to build a proof-of-principle device that is a universal two-qubit error-corrected quantum computer that uses undistilled magic states and can demonstrate all the operations required for large-scale quantum computing. We go through the example of a computation that starts with three π/8 rotations around Z ⊗ Z, Y ⊗ X and Y ⊗ Y in Fig. 43. For the first rotation, we need to measure Z_1 ⊗ Z_2 ⊗ Z_{|m⟩}. A magic state is initialized in a long patch in step 2, which is equivalent to initializing a magic state and measuring X ⊗ X between the magic state and neighboring |0⟩ ancillas. This effectively encodes the magic state in a three-qubit repetition code with a logical Z operator Z_L = Z ⊗ Z ⊗ Z. To consume the magic state, Z_1 ⊗ Z_2 ⊗ Z_L is measured in step 3. This consumes a magic state for the Z ⊗ Z rotation.
The next rotation is a Y ⊗ X rotation. Here, we first need to deform |q_1⟩, such that both the X and Z boundaries of the qubit are accessible. Qubit |q_2⟩ is rotated in steps 5-8 using the protocol in Fig. 9b. In step 9, again, a magic state is initialized in a two-qubit repetition code with Z_L = Z_{a1} ⊗ Z_{a2}. In step 10, the magic state is consumed via a Y_1 ⊗ Z_{a1} and an X_1 ⊗ Z_{a2} measurement.
This kind of protocol consisting of patch deformations and patch rotations can be used to perform any π/8 rotation with the exception of (Y ⊗ Y)_{π/8}, since there is not enough space to make both Y operators accessible for lattice surgery. For this rotation, we first explicitly execute a Clifford gate to change (Y ⊗ Y)_{π/8} to any other rotation. Any Clifford gate that does not commute with Y ⊗ Y will do. In our example, we choose a Z_{π/4} rotation. It is performed by initializing a |0⟩ state in step 13, and measuring Z_1 ⊗ Y between |q_1⟩ and the ancilla, following the protocol of Fig. 9c.
This demonstrates that a proof-of-principle experiment can be built with 48 physical data qubits. In general, this requires 6d^2 − 2d qubits, i.e., 48 for d = 3, 140 for d = 5 and 280 for d = 7. If measurement qubits are
required for syndrome readout, the number of physical qubits roughly doubles.

[Figure panels: (a) Steane code, (b) Distillation block]
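
As a quick sanity check of the qubit counts quoted for the proof-of-principle device, the following Python sketch evaluates (3d − 1) · 2d = 6d^2 − 2d for the three code distances mentioned in the text. The function names `data_qubits` and `total_qubits_estimate` are hypothetical and chosen only for this illustration, and the factor of exactly 2 for measurement qubits is an assumption standing in for "roughly doubles".

```python
def data_qubits(d: int) -> int:
    """Physical data qubits of the two-qubit device: (3d - 1) * 2d = 6d^2 - 2d."""
    return (3 * d - 1) * 2 * d


def total_qubits_estimate(d: int) -> int:
    """Rough total when measurement qubits are needed for syndrome readout.

    Assumption: the count is taken as exactly twice the data qubits,
    whereas the text only states that it roughly doubles.
    """
    return 2 * data_qubits(d)


for d in (3, 5, 7):
    # Both forms of the formula given in the text must agree.
    assert data_qubits(d) == 6 * d**2 - 2 * d
    print(f"d = {d}: {data_qubits(d)} data qubits, "
          f"about {total_qubits_estimate(d)} with measurement qubits")
```

For d = 3, 5 and 7 this yields 48, 140 and 280 data qubits, matching the numbers stated above.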