The power of quantum neural networks (Abbas et al., 2021)
https://doi.org/10.1038/s43588-021-00084-1
It is unknown whether near-term quantum computers are advantageous for machine learning tasks. In this work we address this
question by trying to understand how powerful and trainable quantum machine learning models are in relation to popular clas-
sical neural networks. We propose the effective dimension—a measure that captures these qualities—and prove that it can be
used to assess any statistical model’s ability to generalize on new data. Crucially, the effective dimension is a data-dependent
measure that depends on the Fisher information, which allows us to gauge the ability of a model to train. We demonstrate
numerically that a class of quantum neural networks is able to achieve a considerably better effective dimension than compa-
rable feedforward networks and train faster, suggesting an advantage for quantum machine learning, which we verify on real
quantum hardware.
The power of a model lies in its ability to fit a variety of functions1. In machine learning, power is often referred to as a model's capacity to express different relationships between variables2. Deep neural networks have proven to be extremely powerful models, capable of capturing intricate relationships by learning from data3. Quantum neural networks serve as a newer class of machine learning models that are deployed on quantum computers and use quantum effects such as superposition, entanglement and interference to perform computation. Some proposals for quantum neural networks include4–11—and hint at—potential advantages such as speed-ups in training and faster processing. Although there has been much development in the growing field of quantum machine learning, a systematic study of the trade-offs between quantum and classical models has yet to be conducted12. In particular, the question of whether quantum neural networks are more powerful than classical neural networks is still open.

A common way to quantify the power of a model is by its complexity13. In statistical learning theory, the Vapnik–Chervonenkis dimension is an established complexity measure, from which error bounds on how well a model generalizes (that is, performs on unseen data) can be derived14. Although the Vapnik–Chervonenkis dimension has attractive properties in theory, computing it in practice is notoriously difficult. Furthermore, using the Vapnik–Chervonenkis dimension to bound generalization error requires several unrealistic assumptions, including that the model has access to infinite data15,16. The measure also scales with the number of parameters in the model and ignores the distribution of data. As modern deep neural networks are heavily overparameterized, generalization bounds based on the Vapnik–Chervonenkis dimension—and other measures alike—are typically vacuous17,18.

In ref. 19, the authors analyzed the expressive power of parameterized quantum circuits using memory capacity and found that quantum neural networks had limited advantages over classical neural networks. Memory capacity is, however, closely related to the Vapnik–Chervonenkis dimension and is thus subject to similar criticisms. In ref. 20, a quantum neural network is presented that exhibits a higher expressibility than certain classical models, captured by the types of probability distributions it can generate. Another result from ref. 21 is based on strong heuristics and provides systematic examples of possible advantages for quantum neural networks.

We turn our attention to measures that are easy to estimate in practice and, importantly, incorporate the distribution of data. In particular, measures such as the effective dimension have been motivated from an information-theoretic standpoint and depend on the Fisher information, a quantity that describes the geometry of a model's parameter space and is essential in both statistics and machine learning22–24. We argue that the effective dimension is a robust capacity measure through proof of a generalization error bound and supporting numerical analyses, and thus use this measure to study the power of a popular class of neural networks in both classical and quantum regimes.

Despite a lack of quantitative statements on the power of quantum neural networks, another issue is rooted in the trainability of these models. A precise connection between expressibility and trainability for certain classes of quantum neural networks is outlined in refs. 25,26. Quantum neural networks often suffer from the barren plateau phenomenon, wherein the loss landscape is perilously flat and parameter optimization is therefore extremely difficult27. As shown in ref. 28, barren plateaus may be noise induced, where certain noise models are assumed on the hardware. In other words, the effect of hardware noise can make it very difficult to train a quantum model. Furthermore, barren plateaus can be circuit induced, which relates to the design of a model and random parameter initialization. Methods to avoid the latter have been explored in refs. 29–32, but noise-induced barren plateaus remain problematic.

A particular attempt to understand the loss landscape of quantum models uses the Hessian33, which quantifies the curvature of a model's loss function at a point in its parameter space34. Properties of the Hessian, such as its spectrum, provide useful diagnostic information on the trainability of a model35. It was discovered that the entries of the Hessian vanish exponentially in models suffering from a barren plateau36. For certain loss functions, the Fisher information matrix coincides with the Hessian of the loss function37. Consequently, we can examine the trainability of quantum and classical neural networks by analyzing the Fisher information matrix, which is incorporated by the effective dimension. In this way, we may explicitly relate the effective dimension to model trainability38.

We find that a class of quantum neural networks is able to achieve a considerably higher capacity and faster training ability numerically than comparable classical feedforward neural networks. A higher capacity is captured by a higher effective dimension, whereas faster training is reflected in the Fisher information spectrum and confirmed by training experiments, including runs on real quantum hardware.
[Figure panels omitted: a circuit schematic (INPUT, ∣0〉, OUTPUT labels) and the Fig. 2 box plots of eigenvalue distributions; y axes show eigenvalues.]
Fig. 2 | Average Fisher information spectrum distribution. Here, box plots are used to reveal the average distribution of eigenvalues of the Fisher
information matrix for the classical feedforward neural network and the quantum neural network with two different feature maps. The dots in the box plots
represent outlier values relative to the length of the whiskers. The lower whiskers are at the lowest data points above Q1 – 1.5 × (Q3 – Q1), whereas the
upper whiskers are at the highest data points below Q3 + 1.5 × (Q3 – Q1), where Q1 and Q3 are the first and third quartiles, respectively. This is a standard
method to compute these plots. The easy quantum model has a classically simulatable data encoding strategy, whereas the quantum neural network’s
encoding scheme is conjectured to be difficult. In each model, we compute the Fisher information matrix 100 times using parameters sampled uniformly at
random and plot the resulting average distribution of the eigenvalues. We fix d = 40, input size at sin = 4 and output size at sout = 2. The top row contains the
average distribution of all eigenvalues for each model, whereas the bottom row contains the average distribution of eigenvalues less than 1 for each model.
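To make the sampling procedure in the caption concrete, the following is a minimal Python sketch of how (empirical) Fisher information matrices and their eigenvalue spectra might be collected for a generic parameterized model. The functions score_fn (the gradient of the log-likelihood with respect to the parameters) and sample_labels are hypothetical placeholders, not interfaces from the paper, and the empirical Fisher matrix is used as a stand-in for the exact one.

```python
# Hedged sketch: sampling Fisher information matrices and their eigenvalue
# spectra, mirroring the procedure described in the Fig. 2 caption.
# score_fn(theta, x, y) is a placeholder returning the gradient of
# log p_theta(y | x) with respect to theta, of shape (d,).
import numpy as np

def empirical_fisher(score_fn, theta, inputs, labels):
    """Average outer product of the score vector over the data samples."""
    d = theta.shape[0]
    fisher = np.zeros((d, d))
    for x, y in zip(inputs, labels):
        s = score_fn(theta, x, y)
        fisher += np.outer(s, s)
    return fisher / len(inputs)

def sample_fisher_spectra(score_fn, sample_labels, d=40, s_in=4,
                          n_param_sets=100, n_data=100, seed=0):
    """Eigenvalue spectra at parameters drawn uniformly from [-1, 1]^d,
    with inputs drawn from a standard Gaussian, as in Fig. 2."""
    rng = np.random.default_rng(seed)
    spectra = []
    for _ in range(n_param_sets):
        theta = rng.uniform(-1.0, 1.0, size=d)
        inputs = rng.standard_normal((n_data, s_in))
        labels = sample_labels(theta, inputs)  # e.g. drawn from the model itself
        f = empirical_fisher(score_fn, theta, inputs, labels)
        spectra.append(np.linalg.eigvalsh(f))
    return np.array(spectra)  # shape (n_param_sets, d), used for the box plots
```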
where V_Θ := ∫_Θ dθ ∈ R_+ is the volume of the parameter space. The matrix F̂(θ) ∈ R^{d×d} is the normalized Fisher information matrix defined as

\hat{F}_{ij}(\theta) := d \, \frac{V_\Theta}{\int_\Theta \mathrm{tr}(F(\theta))\,\mathrm{d}\theta} \, F_{ij}(\theta).

Remark 1 (properties of the effective dimension). In the limit n → ∞, the effective dimension converges to the maximal rank r̄ := max_{θ∈Θ} r_θ, where r_θ ≤ d denotes the rank of the Fisher information matrix F(θ). The proof of this result can be seen in Supplementary Section 2.1, but it is worthwhile to note that the effective dimension does not necessarily increase monotonically with n, as explained in Supplementary Section 2.2. The geometric operational meaning of the effective dimension only holds if n is sufficiently large. We conduct experiments over a wide range of n and ensure that conclusions are drawn from results where the choice of n is sufficient.

Another noteworthy point is that the effective dimension is easy to estimate. To see this, recall that we need to first estimate F(θ) and, second, calculate the integral over Θ given in equation (2). Both of these steps can be achieved via Monte Carlo integration which, in practice, does not depend on the model's dimension.

There are also two minor differences between equation (2) and the effective dimension from ref. 22: the presence of the constant γ ∈ (0,1], and the log n term. These modifications are helpful in proving a generalization bound such that the effective dimension may be interpreted as a bounded capacity measure, serving as a useful tool to analyze the power of statistical models. We demonstrate this in the Methods.

The Fisher information spectrum. Classically, the Fisher information spectrum reveals a lot about the optimization landscape of a model. The magnitude of the eigenvalues illustrates the curvature of a model for a particular parameterization. If there is a large concentration of eigenvalues near zero, the optimization landscape will be predominantly flat and parameters become difficult to train with gradient-based methods38. On the quantum side, we show in Supplementary Section 4 that if a model is in a barren plateau, the Fisher information spectrum will be concentrated around zero and training also becomes unfeasible. We can thus make connections to trainability via the spectrum of the Fisher information matrix by using the effective dimension. Looking closely at equation (2), we see that the effective dimension converges to its maximum fastest if the Fisher information spectrum is evenly distributed, on average.

We analyze the Fisher information spectra for the quantum neural network, the easy quantum model, and all possible configurations of the fully connected feedforward neural network—where all models share a specified triple (d, sin, sout). To be robust, we sample 100 sets of parameters uniformly on Θ = [−1, 1]^d and compute the Fisher information matrix 100 times using data sampled from a standard Gaussian distribution.
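As an illustration of the Monte Carlo estimation described above, here is a hedged Python sketch. The normalization of F̂ follows the definition given earlier; the outer formula assumes that equation (2) takes the form d_{γ,n} = 2 log( (1/V_Θ) ∫_Θ √det(id_d + κ_n F̂(θ)) dθ ) / log κ_n with κ_n = γn/(2π log n). That equation is not reproduced in this excerpt, so treat the sketch as illustrative rather than as the authors' reference implementation.

```python
# Hedged Monte Carlo sketch of the effective dimension, computed from Fisher
# information matrices sampled uniformly over the parameter space. The assumed
# functional form of equation (2) is noted in the lead-in above.
import numpy as np

def effective_dimension(fishers, n, gamma=1.0):
    """fishers: array of shape (k, d, d) holding Fisher matrices F(theta_k)."""
    k, d, _ = fishers.shape
    # Normalized Fisher matrices: F_hat = d * V_Theta / (int tr F dtheta) * F,
    # which under uniform sampling reduces to rescaling by d / (mean trace).
    f_hat = fishers * (d / np.mean(np.trace(fishers, axis1=1, axis2=2)))
    kappa = gamma * n / (2 * np.pi * np.log(n))
    # log of the Monte Carlo average of sqrt(det(I + kappa * F_hat(theta))),
    # evaluated with slogdet and a log-sum-exp for numerical stability.
    half_logdets = np.array(
        [0.5 * np.linalg.slogdet(np.eye(d) + kappa * f)[1] for f in f_hat])
    m = half_logdets.max()
    log_avg = m + np.log(np.mean(np.exp(half_logdets - m)))
    return 2.0 * log_avg / np.log(kappa)

# Dividing by d gives the normalized effective dimension plotted in Fig. 3a.
```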
[Fig. 3 panels omitted: a, normalized effective dimension versus number of data (×10^6); b, loss value versus number of training iterations, for the classical neural network, easy quantum model, quantum neural network and ibmq_montreal backend.]
Fig. 3 | Normalized effective dimension and training loss. a, The normalized effective dimension plotted for the quantum neural network (green), the easy
quantum model (blue) and the classical feedforward neural network (purple). We fix sin = 4, sout = 2 and d = 40. b, Using the first two classes of the Iris
dataset55, we train all three models at d = 8, with a full batch size. The ADAM optimizer, with an initial learning rate of 0.1, is selected. For a fixed number
of training iterations (100), we train all models over 100 trials and plot the average training loss along with ± 1 s.d. We further verify the performance
of the quantum neural network on real quantum hardware and train the model using the ibmq_montreal 27-qubit device. We plot the hardware results
until they stabilize, at roughly 33 training iterations; thereafter, we stop training and denote this final loss value with a dashed line. The actual hardware
implementation contains fewer CNOT gates by using linear connectivity for the feature map and variational circuit instead of all-to-all connectivity to cope
with limited resources, leading to the lower loss values.
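As a point of reference for the setup in Fig. 3b, the following is a hedged PyTorch sketch of the classical baseline only: the first two Iris classes, full-batch training with ADAM at an initial learning rate of 0.1 and a cross-entropy loss for 100 iterations. The specific architecture (a bias-free 4 × 2 linear layer, giving d = 8 trainable parameters) is an assumption chosen to match d = 8 and is not taken from the paper; the quantum models and hardware runs are not reproduced here.

```python
# Hedged sketch of the classical baseline in the Fig. 3b experiment. The exact
# architecture realizing d = 8 parameters is an assumption (4 * 2 weights, no bias).
import torch
from sklearn.datasets import load_iris

iris = load_iris()
mask = iris.target < 2                                    # first two Iris classes
X = torch.tensor(iris.data[mask], dtype=torch.float32)    # 4 features per sample
y = torch.tensor(iris.target[mask], dtype=torch.long)

model = torch.nn.Linear(4, 2, bias=False)                 # 8 trainable parameters
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # initial learning rate 0.1
loss_fn = torch.nn.CrossEntropyLoss()

losses = []
for step in range(100):                                   # fixed number of iterations
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)                           # full batch
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```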
The resulting average distributions of the eigenvalues of these 100 Fisher information matrices are plotted in the top row of Fig. 2 for d = 40, sin = 4 and sout = 2. A sensitivity analysis is included in Supplementary Section 3.1 to verify that 100 parameter samples are reasonable for the models we consider. In higher dimensions, this number will need to increase. The bottom row of Fig. 2 contains the distribution for eigenvalues less than 1.

The classical model depicted in Fig. 2 is the one with the highest average rank of Fisher information matrices. The majority of eigenvalues are negligible (of the order 10^−14), with a few very large values. This behavior is observed across all classical configurations that we consider and is consistent with results from the literature, where the Fisher information matrix of non-linear feedforward neural networks is known to be highly degenerate, with a few large eigenvalues38. The concentration around zero becomes more evident in the bottom row of the plot, which depicts the distribution of just the eigenvalues less than 1.

The easy quantum model also has most of its eigenvalues close to zero, and although there are some large eigenvalues, their magnitudes are not as extreme as those of the classical model.

The quantum neural network, on the other hand, has a distribution of eigenvalues that is more uniform, with no outlying values. This can be seen from the range of the eigenvalues on the y axis in Fig. 2. This distribution remains more or less constant as the number of qubits increases, even in the presence of hardware noise (see Supplementary Section 3.2); this has implications for capacity and trainability, which we examine next.

Capacity analysis. In Fig. 3a, we plot the normalized effective dimension for all three model types. The normalization ensures that the effective dimension lies between 0 and 1 by simply dividing by d. The convergence speed of the effective dimension to its maximum is slowed down by smaller eigenvalues and uneven Fisher information spectra. As the classical models contain highly degenerate Fisher matrices, the effective dimension converges the slowest, followed by the easy quantum model. The quantum neural network has non-degenerate Fisher information matrices and more even spectra, and it therefore consistently achieves the highest effective dimension over all ranges of finite data considered. Intuitively, we would expect the additional effects of quantum operations such as entanglement and superposition—if used effectively—to generate models with higher capacity. The quantum neural network with a strong feature map is thus expected to deliver the highest capacity, but recall that in the limit n → ∞, all models will converge to an effective dimension equal to the maximum rank of the Fisher information matrix (see Remark 1).

To support these observations, we calculate the capacity of each model using a different measure, the Fisher–Rao norm44. The average Fisher–Rao norm after training each model 100 times is roughly 250% higher in the quantum neural network than in the classical neural network, with the easy quantum model in between (see Supplementary Section 3.3).

Trainability. The observed Fisher information spectrum of the feedforward model is known to have undesirable optimization properties, where the outlying eigenvalues slow down training and loss convergence35. These large eigenvalues become even more pronounced in bigger models, as seen in Supplementary Fig. 5. On examining the easy quantum model over an increasing system size, the average Fisher spectrum becomes more concentrated around zero. This is characteristic of models encountering a barren plateau, presenting another unfavorable scenario for optimization. The quantum neural network, however, maintains its more even distribution of eigenvalues as the number of qubits and trainable parameters increase. Furthermore, a large proportion of the eigenvalues are not near zero. This highlights the importance of the feature map in a quantum model. The harder data encoding strategy used in the quantum neural network seems to structurally change the optimization landscape and remove the flatness usually associated with suboptimal optimization conditions such as barren plateaus.

We confirm the training statements for all three models with an experiment illustrated in Fig. 3b. Using a cross-entropy loss function, optimized with ADAM for a fixed number of training
qubit i depends on the ith feature of the data point x, normalized between [−1, 1]. RZZ gates are then applied to every pair of qubits. This time, the value of the controlled Z rotations depends on a product of feature values. For example, if the RZZ gate is controlled by qubit i and targets qubit j, then the angle of the controlled rotation applied to qubit j is dependent on the product of feature values x_i x_j. The RZZ gates are implemented using a decomposition into two CNOT gates and one RZ gate; thereafter, the RZ and RZZ gates are repeated once. The classically simulatable feature map employed in the easy quantum model is simply the first sets of Hadamard and RZ gates with no entanglement between

\leq c_{d,\Lambda} \left( \frac{\gamma n^{1/\alpha}}{2\pi \log n^{1/\alpha}} \right)^{d_{\gamma, n^{1/\alpha}}/2} \exp\!\left( - \frac{\gamma n}{16 M B^2 \pi \log n} \right), \qquad (5)

where M = M_1^{\alpha} M_2.

The proof is given in Supplementary Section 5.1. Note that the choice of the norm to bound the gradient of the Fisher information matrix is irrelevant due to the presence of the dimensional constant c_{d,Λ}. In the special case where the Fisher information matrix does not depend on θ, we have Λ = 0 and (5)
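Returning to the data-encoding circuit described at the start of this excerpt (rather than to inequality (5)), the following is a hedged Qiskit sketch of a feature map of that form: a layer of Hadamards, single-qubit RZ rotations with angles set by the individual features, and pairwise ZZ interactions with angles set by feature products, each realized with two CNOT gates and one RZ gate, with the RZ/RZZ block repeated once. The exact angle conventions (for example, factors of two) and the repetition pattern are assumptions of this sketch, not taken from the paper.

```python
# Hedged Qiskit sketch of the pairwise-entangling feature map described above.
# Angle conventions and the precise repetition pattern are assumptions.
from itertools import combinations
import numpy as np
from qiskit import QuantumCircuit

def feature_map(x, reps=2):
    """Encode a feature vector x (entries normalized to [-1, 1]) on len(x) qubits."""
    n = len(x)
    qc = QuantumCircuit(n)
    for i in range(n):
        qc.h(i)                        # initial Hadamard layer
    for _ in range(reps):              # RZ and RZZ block, applied twice
        for i in range(n):
            qc.rz(x[i], i)             # RZ angle set by the ith feature
        for i, j in combinations(range(n), 2):
            qc.cx(i, j)                # RZZ(x_i * x_j) decomposed into
            qc.rz(x[i] * x[j], j)      # two CNOTs and one RZ gate
            qc.cx(i, j)
    return qc

# Example: a 4-feature input, matching s_in = 4 in the experiments.
circuit = feature_map(np.array([0.1, -0.4, 0.7, 0.2]))
```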