Quantum Convolutional Neural Networks
https://doi.org/10.1038/s41567-019-0648-8
Neural network-based machine learning has recently proven successful for many complex applications ranging from image
recognition to precision medicine. However, its direct application to problems in quantum physics is challenging due to the
exponential complexity of many-body systems. Motivated by recent advances in realizing quantum information processors, we
introduce and analyse a quantum circuit-based algorithm inspired by convolutional neural networks, a highly effective model
in machine learning. Our quantum convolutional neural network (QCNN) uses only O(log(N)) variational parameters for input
sizes of N qubits, allowing for its efficient training and implementation on realistic, near-term quantum devices. To explicitly
illustrate its capabilities, we show that QCNNs can accurately recognize quantum states associated with a one-dimensional
symmetry-protected topological phase, with performance surpassing existing approaches. We further demonstrate that
QCNNs can be used to devise a quantum error correction scheme optimized for a given, unknown error model that substantially
outperforms known quantum codes of comparable complexity. The potential experimental realizations and generalizations of
QCNNs are also discussed.
The complex nature of quantum many-body systems motivates the use of machine learning techniques to analyse them. Indeed, large-scale neural networks have successfully solved classically difficult problems such as image recognition or optimization of classical error correction1, and their architectures have been related to various physical concepts2,3. As such, a number of recent works have used neural networks to study properties of quantum many-body systems4–10. However, the direct application of these classical algorithms is challenging for intrinsically quantum problems, which take quantum states or processes as inputs. This is because the extremely large many-body Hilbert space hinders the efficient translation of such problems into a classical framework without performing exponentially difficult quantum state or process tomography11,12.

Recent experimental progress towards realizing quantum information processors13–16 has led to proposals for the use of quantum computers to enhance conventional machine learning tasks17–20. Motivated by such developments, we introduce and analyse a machine learning-inspired quantum circuit model, the quantum convolutional neural network (QCNN), and demonstrate its ability to solve important classes of intrinsically quantum many-body problems. The first class of problems we consider is quantum phase recognition (QPR), which asks whether a given input quantum state ρin belongs to a particular quantum phase of matter. In contrast to many existing schemes based on tensor network descriptions21–23, we assume that ρin is prepared in a physical system without direct access to its classical description. The second class, quantum error correction (QEC) optimization, asks for an optimal QEC code for a given, a priori unknown error model such as dephasing or potentially correlated depolarization in realistic experimental settings. We provide both theoretical insight and numerical demonstrations for the successful application of a QCNN to these important problems, and show its feasibility for near-term experimental implementation.

QCNN circuit model
Convolutional neural networks provide a successful machine learning architecture for classification tasks such as image recognition1,24,25. A CNN generally consists of a sequence of different (interleaved) layers of image processing; in each layer, an intermediate two-dimensional array of pixels, called a feature map, is produced from the previous one (Fig. 1a). (More generally, CNN layers connect 'volumes' of multiple feature maps to subsequent volumes; for simplicity, we consider only a single feature map per volume and leave the generalization to future works.) The convolution layers compute new pixel values $x^{(\ell)}_{i,j}$ from a linear combination of nearby ones in the preceding map, $x^{(\ell)}_{i,j} = \sum_{a,b=1}^{w} w_{a,b}\, x^{(\ell-1)}_{i+a,\,j+b}$, where the weights $w_{a,b}$ form a w × w matrix. Pooling layers reduce feature map size, for example by taking the maximum value from a few contiguous pixels, and are often followed by the application of a nonlinear (activation) function. Once the feature map size becomes sufficiently small, the final output is computed from a function that depends on all remaining pixels (the fully connected layer). The weights and fully connected function are optimized by training on large datasets. In contrast, variables such as the number of convolution and pooling layers and the size w of the weight matrices (known as hyperparameters) are fixed for a specific CNN1. The key properties of a CNN are thus translationally invariant convolution and pooling layers, each characterized by a constant number of parameters (independent of system size) and sequential data size reduction (that is, a hierarchical structure).

Motivated by this architecture, we introduce a QCNN circuit model extending these key properties to the quantum domain (Fig. 1b). The circuit's input is an unknown quantum state ρin. A convolution layer applies a single quasilocal unitary (Ui) in a translationally invariant manner for finite depth. For pooling, a fraction of qubits are measured, and their outcomes determine unitary rotations (Vj) applied to nearby qubits. Hence, nonlinearities in QCNNs arise from reducing the number of degrees of freedom. Convolution and pooling layers are performed until the system size is sufficiently small; then, a fully connected layer is applied as a unitary F on the remaining qubits. Finally, the outcome of the circuit is obtained by measuring a fixed number of output qubits. As in the classical case, QCNN hyperparameters such as the number of convolution and pooling layers are fixed, and the unitaries themselves are learned.
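As a concrete illustration of the classical convolution and pooling steps described above, the following is a minimal sketch in plain Python. The function names `convolve` and `max_pool`, the 9 × 9 test image and the uniform 2 × 2 weights are all hypothetical choices made for this example, and the offsets a, b run from 0 to w − 1 (0-based) rather than from 1 to w as in the text.

```python
def convolve(x, w):
    """New pixel = sum over a, b of w[a][b] * x[i + a][j + b] (valid region only)."""
    k = len(w)
    rows, cols = len(x) - k + 1, len(x[0]) - k + 1
    return [[sum(w[a][b] * x[i + a][j + b] for a in range(k) for b in range(k))
             for j in range(cols)]
            for i in range(rows)]


def max_pool(x, size=2):
    """Keep the maximum of each non-overlapping size x size patch."""
    rows, cols = len(x) // size, len(x[0]) // size
    return [[max(x[size * i + a][size * j + b]
                 for a in range(size) for b in range(size))
             for j in range(cols)]
            for i in range(rows)]


# Hypothetical 9 x 9 input image and a uniform 2 x 2 weight matrix.
image = [[float((i * j) % 5) for j in range(9)] for i in range(9)]
weights = [[0.25, 0.25], [0.25, 0.25]]

feature_map = convolve(image, weights)  # 8 x 8 feature map
pooled = max_pool(feature_map)          # 4 x 4 feature map after pooling
print(len(feature_map), len(pooled))
```

The number of weights (here four) does not depend on the image size, while each pooling step shrinks the feature map; these are the two properties the QCNN carries over to the quantum setting.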
1Department of Physics, Harvard University, Cambridge, MA, USA. 2Department of Physics, University of California, Berkeley, Berkeley, CA, USA.
*e-mail: [email protected]
Fig. 1 | The concept of QCNNs. a, Simplified illustration of classical CNNs. A sequence of image-processing layers transforms an input image into a series of feature maps (blue rectangles) and finally into an output probability distribution (purple bars). C, convolution; P, pooling; FC, fully connected. b, QCNNs inherit a similar layered structure. Boxes represent unitary gates or measurement with feed-forwarding. c, The QCNN and the MERA share the same circuit structure, but run in reverse directions. Image of cat from https://www.pexels.com/photo/grey-and-white-short-fur-cat-104827/.

A QCNN to classify N-qubit input states is thus characterized by O(log(N)) parameters. This corresponds to a double exponential reduction compared with a generic quantum circuit-based classifier19 and allows for efficient learning and implementation. For example, given a set of M classified training vectors {(|ψα〉, yα): α = 1, …, M}, where |ψα〉 are input states and yα = 0 or 1 are corresponding binary classification outputs, one could compute the mean squared error

$$\mathrm{MSE} = \frac{1}{2M} \sum_{\alpha=1}^{M} \left( y_\alpha - f_{\{U_i, V_j, F\}}(|\psi_\alpha\rangle) \right)^2 \qquad (1)$$

Here, $f_{\{U_i, V_j, F\}}(|\psi_\alpha\rangle)$ denotes the expected QCNN output value for input |ψα〉. Learning then consists of initializing all unitaries and successively optimizing them until convergence, for example via gradient descent.

To gain physical insight into the mechanism underlying QCNNs and motivate their application to the problems under consideration, we now relate our circuit model to two well-known concepts in quantum information theory: the multiscale entanglement renormalization ansatz26 (MERA) and QEC. The MERA framework provides an efficient tensor network representation of many classes of interesting many-body wavefunctions, including those associated with critical systems26–28. A MERA can be understood as a quantum state generated by a sequence of unitary and isometry layers applied […] multiple layers. In this sense, the QCNN circuit can mimic renormalization-group flow, a methodology that successfully classifies many families of quantum phases30. For QEC optimization, the QCNN structure allows for simultaneous optimization of efficient encoding and decoding schemes with potentially rich entanglement structure.

Detecting a 1D symmetry-protected topological phase
We first demonstrate the potential of a QCNN by applying it to QPR in a class of 1D many-body systems. Specifically, we consider a Z2 × Z2 symmetry-protected topological (SPT) phase $\mathcal{P}$, a phase containing the S = 1 Haldane chain31, and ground states {|ψG〉} of a family of Hamiltonians on a spin-1/2 chain with open boundary conditions:

$$H = -J \sum_{i=1}^{N-2} Z_i X_{i+1} Z_{i+2} - h_1 \sum_{i=1}^{N} X_i - h_2 \sum_{i=1}^{N-1} X_i X_{i+1} \qquad (2)$$

where Xi, Zi are Pauli operators for the spin at site i, and h1, h2 and J are parameters of the Hamiltonian. The Z2 × Z2 symmetry is generated by $X_{\mathrm{even(odd)}} = \prod_{i \in \mathrm{even(odd)}} X_i$. Figure 2a shows the phase diagram as a function of (h1/J, h2/J). When h2 = 0, the Hamiltonian is exactly solvable via the Jordan–Wigner transformation30, confirming that $\mathcal{P}$ is characterized by non-local order parameters. When h1 = h2 = 0, all terms are mutually commuting, and a ground state is the 1D cluster state. Our goal is to identify whether a given, unknown ground state drawn from the phase diagram belongs to $\mathcal{P}$.

As an example, we first present an exact, analytical QCNN circuit that recognizes $\mathcal{P}$ (Fig. 2b). The convolution layers involve controlled-phase gates as well as Toffoli gates with controls in the X-basis, and pooling layers perform phase-flips on remaining qubits when one adjacent measurement yields X = −1. This convolution–pooling unit is repeated d times, where d is the QCNN depth. The fully connected layer measures Zi−1XiZi+1 on the remaining qubits. Figure 2c shows the QCNN output for a system of N = 135 spins and d = 1, …, 4 along h1 = 0.5J, obtained using matrix product state simulations. As d increases, the measurement outcomes show sharper changes around the critical point, and the output of a d = 2 circuit already reproduces the phase diagram with high accuracy (Fig. 2a). This QCNN can also be used for other Hamiltonian models belonging to the same phase, such as the S = 1 Haldane chain31 (see Methods).
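The commutation statements made about the Hamiltonian in equation (2) can be checked mechanically. The sketch below is an illustration only: it represents each Pauli string as a dictionary from site index to Pauli label and uses the standard rule that two Pauli strings commute exactly when they carry different non-identity Paulis on an even number of shared sites. The chain length N = 9 and all function and variable names are hypothetical choices for this example.

```python
def commute(p, q):
    """Pauli strings commute iff they differ on an even number of shared sites."""
    clashes = sum(1 for site, op in p.items() if site in q and q[site] != op)
    return clashes % 2 == 0


def zxz(i):
    """Cluster term Z_i X_{i+1} Z_{i+2}."""
    return {i: "Z", i + 1: "X", i + 2: "Z"}


N = 9  # hypothetical chain length, sites numbered 1..N
cluster_terms = [zxz(i) for i in range(1, N - 1)]           # i = 1..N-2
field_terms = [{i: "X"} for i in range(1, N + 1)]           # i = 1..N
xx_terms = [{i: "X", i + 1: "X"} for i in range(1, N)]      # i = 1..N-1
x_even = {i: "X" for i in range(2, N + 1, 2)}               # symmetry generator
x_odd = {i: "X" for i in range(1, N + 1, 2)}                # symmetry generator

# With h1 = h2 = 0 only the cluster terms remain, and they mutually commute.
cluster_commutes = all(commute(p, q) for p in cluster_terms for q in cluster_terms)

# Both symmetry generators commute with every term of H.
all_terms = cluster_terms + field_terms + xx_terms
symmetric = all(commute(g, t) for g in (x_even, x_odd) for t in all_terms)

# A single-site X field does not commute with every cluster term.
field_breaks_commutation = not all(
    commute(f, t) for f in field_terms for t in cluster_terms
)

print(cluster_commutes, symmetric, field_breaks_commutation)
```

This only verifies operator algebra; it says nothing about the ground state away from h1 = h2 = 0, where the paper relies on matrix product state simulations.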
[Figure 2, panels a–d. Labels recovered from the plots: h1/J, h2/J, 〈X〉, Sample complexity, Reduction (inset); phase labels SPT and Antiferromagnetic.]
Fig. 2 | Application to quantum phase recognition. a, The phase diagram of the Hamiltonian in the main text. The phase boundary points (blue and red diamonds) are extracted from infinite-size DMRG numerical simulations, while the background shading (colour scale) represents the output from the exact QCNN circuit for input size N = 45 spins (see Methods). b, Exact QCNN circuit to recognize a Z2 × Z2 SPT phase. Blue line segments represent controlled-phase gates, blue three-qubit gates are Toffoli gates with the control qubits in the X basis, and orange two-qubit gates flip the target qubit's phase when the X measurement yields −1. The fully connected layer applies controlled-phase gates followed by an Xi projection, effectively measuring Zi−1XiZi+1. c, Exact QCNN output along h1 = 0.5J for N = 135 spins, depths d = 1, …, 4 (from light to dark blue). d, Sample complexity of QCNN at depths d = 1, …, 4 (from light to dark blue) versus SOPs of length N/2, N/3, N/5 and N/6 (from light to dark red) to detect the SPT/paramagnet phase transition along h1 = 0.5J for N = 135 spins. The critical point is identified as h2/J = 0.423 using infinite-size DMRG. In the shaded area, the correlation length exceeds the system size, and finite-size effects can considerably affect our results. Inset: the ratio of SOP sample complexity to QCNN sample complexity is plotted as a function of d on a logarithmic scale for h2/J = 0.3918. In the numerically accessible regime, this reduction of sample complexity scales exponentially as $1.73\,e^{0.28d}$ (trendline).
Sample complexity. The performance of a QPR solver can be quantified by sample complexity11: what is the expected number of copies of the input state required to identify its quantum phase? We demonstrate that the sample complexity of our exact QCNN circuit is substantially better than that of conventional methods. In principle, $\mathcal{P}$ can be detected by measuring a non-zero expectation value of string order parameters (SOPs)32,33 S such as

$$S_{ab} = Z_a X_{a+1} X_{a+3} \cdots X_{b-3} X_{b-1} Z_b \qquad (3)$$

In practice, however, the expectation values of SOP vanish near the phase boundary due to diverging correlation length33; since quantum projection noise is maximal in this vicinity, many experimental repetitions are required to affirm a non-zero expectation value. In contrast, the QCNN output is much sharper near the phase transition, so fewer repetitions are required.

Quantitatively, given some |ψin〉 and SOP S, a projective measurement of S can be modelled as a (generalized) Bernoulli random variable, where the outcome is 1 with probability p = (〈ψin|S|ψin〉 + 1)/2 and −1 with probability 1 − p (since $S^2$ equals the identity operator); after M binary measurements, we estimate p. Finding p > p0 = 0.5 signifies $|\psi_{\mathrm{in}}\rangle \in \mathcal{P}$. We define the sample complexity Mmin as the minimum M to test whether p > p0 with 95% confidence using an arcsine variance-stabilizing transformation34:

$$M_{\min} = \frac{1.96^2}{\left( \arcsin\sqrt{p} - \arcsin\sqrt{p_0} \right)^2} \qquad (4)$$

Similarly, the sample complexity for a QCNN can be determined by replacing 〈ψin|S|ψin〉 by the QCNN output expectation value in the expression for p.

Figure 2d shows the sample complexity for the QCNN at various depths and SOPs of different lengths. The QCNN clearly requires substantially fewer input copies throughout the parameter regime, especially near criticality. More importantly, although the SOP sample complexity scales independently of string length, the QCNN sample complexity consistently improves with increasing depth and is limited only by finite size effects in our simulations. In particular, compared with SOPs, the QCNN reduces sample complexity by a factor that scales exponentially with the depth of the QCNN in numerically accessible regimes (inset). Such scaling arises from the iterative QEC performed at each depth and is not expected from any measurements of simple (potentially nonlocal) observables. We show in the Methods that our QCNN circuit measures a multiscale […]
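The sample complexity of equation (4) is straightforward to evaluate numerically. The following sketch assumes the definitions given in the text (p = (〈O〉 + 1)/2, p0 = 0.5 and the factor 1.96 for 95% confidence). The function name `min_samples` is hypothetical, and returning infinity when p does not exceed p0 is a convention chosen for this example, not something stated in the paper.

```python
import math


def min_samples(expectation, p0=0.5, z=1.96):
    """Evaluate equation (4) for an observable with outcomes +1 and -1.

    expectation: expectation value of the measured observable, in [-1, 1].
    Returns math.inf when p <= p0 (assumed convention: the test cannot succeed).
    """
    p = (expectation + 1.0) / 2.0
    gap = math.asin(math.sqrt(p)) - math.asin(math.sqrt(p0))
    if gap <= 0.0:
        return math.inf
    return z ** 2 / gap ** 2


# A weak signal (expectation close to zero) needs far more copies
# of the input state than a strong one.
for value in (0.05, 0.2, 0.8):
    print(value, min_samples(value))
```

Because the denominator vanishes as p approaches p0, the required number of copies diverges when the measured expectation value goes to zero, which is why an observable that stays sharp near the phase boundary lowers the sample complexity.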
Nature Physics | VOL 15 | December 2019 | 1273–1278 | www.nature.com/naturephysics
[Figure residue: circuit diagrams and plots; labels recovered: h2/J, Logical error rate, QEC. Captions not recovered.]
Outlook
[…]
[…] $|\psi_0(\mathcal{P})\rangle$ has a tensor network representation in isometric or G-isometric form57 (Supplementary Fig. 3a), one can systematically construct a corresponding QCNN circuit. This family of quantum phases includes all 1D SPT and 2D string-net phases47,57,58. In these cases, one can explicitly construct a commuting parent Hamiltonian for $|\psi_0(\mathcal{P})\rangle$ and a MERA structure in which $|\psi_0(\mathcal{P})\rangle$ is a fixed-point wavefunction (Supplementary Fig. 3a for 1D systems). The diagrammatic proof of this fixed-point property is given in Supplementary Fig. 3b. Furthermore, any 'local error' perturbing an input state away from $|\psi_0(\mathcal{P})\rangle$ can be identified by measuring a fraction of terms in the parent Hamiltonian, similar to syndrome measurements in stabilizer-based QEC29. Then, a QCNN for $\mathcal{P}$ simply consists of the MERA for $|\psi_0(\mathcal{P})\rangle$ and a nested QEC scheme in which an input state with error density below the QEC threshold59 'flows' to the RG fixed point. Such a QCNN can be optimized via our learning procedure.

While our generic learning protocol begins with completely random unitaries, as in the classical case1, this initialization may not be the most efficient for gradient descent. Instead, motivated by deep learning techniques such as pre-training1, a better initial parameterization would consist of a MERA representation of $|\psi_0(\mathcal{P})\rangle$ and one choice of nested QEC. With such an initialization, the learning procedure serves to optimize the QEC scheme, expanding its threshold to the target phase boundary (Supplementary Fig. 3c).

Experimental resource analysis. To compute the gate depth of the cluster model QCNN circuit in a Rydberg atom implementation, we analyse each gate shown in Fig. 2b. By postponing pooling layer measurements to the end of the circuit, the multi-qubit gates required are

$$C_zZ_{ij} = e^{i\pi(-1+Z_i)(-1+Z_j)/4} \qquad (12)$$

$$C_xZ_{ij} = e^{i\pi(-1+X_i)(-1+Z_j)/4} \qquad (13)$$

$$C_xC_xX_{ijk} = e^{i\pi(-1+X_i)(-1+X_j)(-1+X_k)/8} \qquad (14)$$

By using Rydberg blockade-mediated controlled gates60, it is straightforward to implement $C_zZ_{ij}$ and $C_zC_zZ_{ijk} = e^{i\pi(-1+Z_i)(-1+Z_j)(-1+Z_k)/8}$. The desired $C_xZ_{ij}$ and $C_xC_xX_{ijk}$ gates can then be obtained by conjugating $C_zZ_{ij}$ and $C_zC_zZ_{ijk}$ by single-qubit rotations. For an input size of N spins, the kth convolution–pooling unit thus applies $4N/3^{k-1}$ $C_zZ_{ij}$ gates, $N/3^{k-1}$ $C_xC_xX_{ijk}$ gates and $2N/3^{k-1}$ layers of $C_xZ_{ij}$ gates. The depth of single-qubit rotations required is 4d, as these rotations can be implemented in parallel on all N qubits. Finally, the fully connected layer consists of $N3^{1-d}$ $C_zZ_{ij}$ gates. Thus, the total number of multi-qubit operations required for a QCNN of depth d operating on N spins is $\frac{7N}{2}(1 - 3^{1-d}) + N3^{1-d}$. Note that we do not need to use SWAP gates since the Rydberg interaction is long-range.

Demonstration of learning procedure for QEC. To obtain the QEC code considered in the main text, we consider a QCNN with N = 9 input physical qubits and simulate the circuit evolution of its $2^N \times 2^N$ density matrix exactly. Strictly speaking, our QCNN has three layers: a three-qubit convolution layer U1, a 3-to-1 pooling layer and a 3-to-1 fully connected layer U2. Without loss of generality, we may ignore the optimization over the pooling layer by absorbing its effect into the first convolution layer, leading to the effective two-layer structure shown in Fig. 5a. The generic three-qubit unitary operations U1 and U2 are parameterized using 63 Gell-Mann coefficients each.

As discussed in the main text, we consider three different error models: (1) independent single-qubit errors on all qubits with equal probabilities pμ for μ = X, Y and Z errors, (2) independent single-qubit errors on all qubits, with anisotropic probabilities px ≠ py = pz and (3) independent single-qubit anisotropic errors with additional two-qubit correlated errors XiXi+1 with probability pxx. More specifically, the first two error models are realized by applying a (generally anisotropic) depolarization quantum channel to each of the nine physical qubits:

$$\mathcal{N}_{1,i}: \rho \mapsto \Bigl(1 - \sum_{\mu} p_\mu\Bigr)\rho + \sum_{\mu} p_\mu\, \sigma_i^{\mu} \rho\, \sigma_i^{\mu} \qquad (15)$$

with Pauli matrices $\sigma_i^{\mu}$ for i ∈ {1, 2, …, 9} (the qubit indices are defined from bottom to top in Fig. 5a). For the anisotropic case, we trained the QCNN on various different error models with the same total error probability px + py + pz = 0.001 but different relative ratios; the resulting ratio between the logical error probability of the Shor code and that of the QCNN code is plotted as a function of anisotropy in Supplementary Fig. 4. For strongly anisotropic models, the QCNN outperforms the Shor code, while for nearly isotropic models, the Shor code is optimal and QCNN can achieve the same logical error rate.

For the correlated error model, we additionally apply a quantum channel:

$$\mathcal{N}_{2,i}: \rho \mapsto (1 - p_{xx})\rho + p_{xx}\, X_i X_{i+1}\, \rho\, X_i X_{i+1} \qquad (16)$$

for pairs of nearby qubits, that is i ∈ {1, 2, 4, 5, 7, 8}. Such a geometrically local correlation is motivated from experimental considerations. In this case, we train our QCNN circuit on a specific error model with parameter choices $p_x = 5.8 \times 10^{-3}$, $p_y = p_z = 2 \times 10^{-3}$, $p_{xx} = 2 \times 10^{-4}$ and evaluate the logical error probabilities for various physical error models with the same relative ratios but different total error per qubit px + py + pz + pxx. In general, for an anisotropic logical error model with probabilities pμ for σμ logical errors, the overlap f is $(1 - 2\sum_{\mu} p_\mu/3)$, since $\langle \pm\nu | \sigma_\mu | \pm\nu \rangle = (-1)^{\delta_{\mu,\nu}+1}$. Because of this, we compute the total logical error probability from f as 1.5(1 − f). Hence, our goal is to maximize the logical state overlap f defined in equation (5). If we naively apply the gradient descent method based on f directly to both U1 and U2, we find that the optimization is easily trapped in a local optimum. Instead, we optimize two unitaries U1 and U2 sequentially, similar to the layer-by-layer optimization in backpropagation for conventional CNN1.

A few remarks are in order. First, since U1 is optimized prior to U2, one needs to devise an efficient cost function C1 that is independent of U2. In particular, simply maximizing f with an assumption on U2, for example that it equals the identity, may not be ideal, since such a choice does not capture a potential interplay between U1 and U2. Second, because U1 captures arbitrary single-qubit rotations, the definition of C1 should be basis-independent. Finally, we note that the tree structure of our circuit allows one to view the first layer as an independent quantum channel:

$$\mathcal{M}_{U_1}: \rho \mapsto \mathrm{tr}_a\Bigl[ U_1\, \mathcal{N}\bigl( U_1^{\dagger} (|0\rangle\langle 0| \otimes \rho \otimes |0\rangle\langle 0|)\, U_1 \bigr)\, U_1^{\dagger} \Bigr] \qquad (17)$$

where $\mathrm{tr}_a[\cdot]$ denotes tracing over the ancilla qubits that are measured in the intermediate step. From this perspective, $\mathcal{M}_{U_1}$ describes an effective error model to be corrected by the second layer.

With these considerations, we optimize U1 such that the effective error model $\mathcal{M}_{U_1}$ becomes as classical as possible, that is $\mathcal{M}_{U_1}$ is dominated by a 'flip' error along a certain axis with a strongly suppressed 'phase' error. Only then will the remnant, simpler errors be corrected by the second layer. More specifically, one may represent $\mathcal{M}_{U_1}$ using a map $\mathcal{M}_{U_1}: \mathbf{r} \mapsto M\mathbf{r} + \mathbf{c}$, where $\mathbf{r} \in \mathbb{R}^3$ is the Bloch vector for a qubit state $\rho = \frac{1}{2}(\mathbb{1} + \mathbf{r} \cdot \boldsymbol{\sigma})$, where $\mathbb{1}$ is the identity operator and $\boldsymbol{\sigma}$ = (X, Y, Z) is the vector of Pauli matrices53. The singular values of the real matrix M encode the probabilities p1 ≥ p2 ≥ p3 for three different types of errors. We choose our cost function for the first layer as $C_1 = p_1^2 + p_2 + p_3$, which is relatively more sensitive to p2 and p3 than p1 and ensures that the resultant, optimized channel $\mathcal{M}_{U_1}$ is dominated by one type of error (with probability p1). We note that M can be efficiently evaluated from a quantum device without knowing $\mathcal{N}$, by performing quantum process tomography for a single logical qubit. Once U1 is optimized, we use gradient descent to find an optimal U2 to maximize f. As with QPR, gradients are computed via the finite-difference method, and the learning rate is determined by the bold driver technique1.

Data availability
The data that support the plots within this paper and other findings of this study are available from the corresponding author on reasonable request.

References
51. McCulloch, I. P. Infinite size density matrix renormalization group, revisited. Preprint at https://arxiv.org/abs/0804.2509 (2008).
52. Vidal, G. Efficient classical simulation of slightly entangled quantum computations. Phys. Rev. Lett. 91, 147902 (2003).
53. Nielsen, M. A. & Chuang, I. Quantum Computation and Quantum Information (Cambridge Univ. Press, 2000).
54. Verresen, R., Moessner, R. & Pollmann, F. One-dimensional symmetry-protected topological phases and their transitions. Phys. Rev. B 96, 165124 (2017).
55. Bertlmann, R. A. & Krammer, P. Bloch vectors for qudits. J. Phys. A 41, 235303 (2008).
56. Hinton, G. Lecture notes for CSC2515: Introduction to machine learning (Univ. Toronto, 2007).
57. Schuch, N., Pérez-García, D. & Cirac, J. I. Classifying quantum phases using matrix product states and projected entangled pair states. Phys. Rev. B 84, 165139 (2011).
58. Chen, X., Gu, Z.-C. & Wen, X.-G. Classification of gapped symmetric phases in one-dimensional spin systems. Phys. Rev. B 83, 035107 (2011).
59. Aharonov, D. & Ben-Or, M. in Proc. 29th Annu. ACM Symp. on the Theory of Computing 176–188 (ACM, 1997).
60. Saffman, M., Walker, T. & Mølmer, K. Quantum information with Rydberg atoms. Rev. Mod. Phys. 82, 2313 (2010).