
Quadratic quantum speedup in evaluating bilinear risk functions
Gabriele Agliardi¹, Corey O’Meara², Kavitha Yogaraj³, Kumar Ghosh⁴, Piergiacomo Sabino⁵,⁶,
Marina Fernández-Campoamor⁴, Giorgio Cortiana⁴, Juan Bernabé-Moreno⁷, Francesco Tacchino⁸,
Antonio Mezzacapo⁹, and Omar Shehab⁹

¹ IBM Italia, Milan, Italy
² E.ON Digital Technology GmbH, Hannover, Germany
³ IBM Quantum, IBM Research, India
⁴ E.ON Digital Technology GmbH, Essen, Germany
⁵ E.ON SE, Essen, Germany
⁶ University of Helsinki, Finland, Department of Mathematics and Statistics
⁷ IBM Research Europe, Dublin, Ireland
⁸ IBM Quantum, IBM Research Europe, Zurich, Switzerland
⁹ IBM Quantum, IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA

arXiv:2304.10385v1 [quant-ph] 20 Apr 2023

Computing nonlinear functions over multilinear forms is a general problem with applications in risk analysis. For instance, in the domain of energy economics, accurate and timely risk management demands efficient simulation of millions of scenarios, largely benefiting from computational speedups. We develop a novel hybrid quantum-classical algorithm based on polynomial approximation of nonlinear functions and compare different implementation variants. We prove a quadratic quantum speedup, up to polylogarithmic factors, when forms are bilinear and approximating polynomials have second degree, if efficient loading unitaries are available for the input data sets. We also enhance the bidirectional encoding, which allows tuning the balance between circuit depth and width, proposing an improved version that can be exploited for the calculation of inner products. Lastly, we exploit the dynamic circuit capabilities, recently introduced on IBM Quantum devices, to reduce the average depth of the Quantum Hadamard Product circuit. A proof of principle is implemented and validated on IBM Quantum systems.

Gabriele Agliardi: [email protected]
Corey O’Meara: corey.o’[email protected]

1 Introduction

Any speedup in the calculation of contract values is of paramount importance in the risk management of the energy industry, enabling real-time planning, finer risk diversification, and faster renegotiation of hedging contracts [1]. Gaining a speedup with quantum computing, though, poses two challenges. On one side, the native linear behavior of quantum operators makes nonlinearities in the contract functions harder to represent through quantum circuits. On the other side, classical algorithms with linear scaling in the number of data points are available. In this work, we design a quantum approach based on polynomial approximations, and provide a comprehensive study of alternative implementations based on tuned combinations of quantum subroutines, comparing their performance. Finally, we select an optimized implementation with quadratic speedup up to polylogarithmic terms, for second-degree polynomials.

The idea of applying quantum computing techniques to risk management problems was widely explored in the context of finance [2–4] and recently exported to the energy field by the authors in Ref. [1]. The main objective of these works is to accelerate the execution of Monte Carlo methods used for the assessment of risk measures of contract portfolios, exploiting the Quantum Amplitude Estimation algorithm that provides a quadratic speedup over classical Monte Carlo simulations [5, 6].

In this manuscript, we instead focus on the more fundamental task of efficiently calculating the value of a contract, and by extension of portfolios, thus providing the data for subsequent risk measure calculation. The conceptual structure of the work is summarized in Fig. 1 and described in Subsec. 1.1. Our target quantity takes the form of an inner product $\sum_j E_j\,p(T_j)$, where $[E_j]$ and $[T_j]$ are arbitrary vectors, and $p$ is a polynomial approximating a function $f$. This formulation is highly general and makes it possible to address diverse forms of energy contracts, beyond the exemplary one adopted here, as well as other use cases across industries, e.g. finance [7] and climate science [8].

[Figure 1 diagram. Objective: estimation of $\sum_j E_j\,p(T_j)$ as an approximation for $\sum_j E_j\,f(T_j)$, where $f$ is an arbitrary function and $p$ is a polynomial. Columns: proposed approach (data preparation, data loading, polynomial approximation with QHP, estimation of inner products, post-processing); intermediate findings with broader implications ((i) dynamic stopping, (ii) quantum methods for inner products, (iii) investigation of QHP, (iv) adaptation for *QAE, (v) bidirectional orthogonal encoding); implementation variants ((a) ancilla-free inner product and mid-resets, (b) ancilla-free inner product and no mid-resets, (c) ancilla-free inner product and *QAE, (d) BOE and *QAE); final result (quadratic speedup up to polylogarithmic factors for polynomial $p$ of degree 2 under efficient data loading).]
Figure 1: A conceptual representation of the paper, detailed in Subsec. 1.1. We propose a hybrid approach to solve the problem. While tuning and implementing the algorithm, we derive five intermediate findings, which contribute to different implementation variants. Variant (c) is finally selected because it provides the desired speedup, which constitutes our final result. The following terms are used in the picture: QHP for Quantum Hadamard Product (see Subsec. 2.2), *QAE for Quantum Amplitude Estimation or equivalent alternatives, BOE for Bidirectional Orthogonal Encoding (see Subsec. 2.1).

Quantum circuits that produce nonlinear behaviors are not straightforwardly available, due to the native linearity of quantum operators [9–12]. Methods for treating nonlinearities either exploit black-box approaches for usage in trainable circuits [13, 14], or represent specific types of nonlinear functions [15, 16]. Ref. [9] discusses the role of selective operations and the probability of failure for nonlinear circuits. The work [17] pursues an objective that is complementary to ours: it estimates an arbitrary (activation) function that takes an inner product as the input, while we approximate the inner product of two vectors, one of which is the output of an arbitrary function.

The remainder of this section provides a technical overview of our approach and findings, as well as the context of risk analysis. Section 2 describes the proposed approach, the main building blocks, and the different implementation variants, with a focus on the best-performing one, namely (c). Section 3 studies the algorithmic complexity from a theoretical standpoint, while Section 4 gives experimental results. The final Section 5 contains conclusions and outlook. The paper is complemented with a rich set of Appendices, containing multiple implementation variants with the technical details of the underlying theory.

1.1 Technical overview

Our overall approach is sketched in the first column of Fig. 1. Specifically, we employ polynomial approximation of nonlinear functions and then compute polynomials by resorting to Quantum Hadamard Products (QHP) [10, 18] for powers. Note that in a classical workflow, a polynomial approximation would also be a step, hidden inside the low-level calculation of the nonlinear function.

In the process of implementing and optimizing the algorithm, we derive the following intermediate findings that are of interest beyond our specific use case (see second column in Fig. 1):

(i) Dynamic stopping: we show that QHP can be implemented via dynamic circuits, introducing dynamic stopping, namely the early termination of the circuit execution to reduce the average circuit depth, based on mid-circuit measurements, which are a relatively new capability in commercial quantum processors;

(ii) Quantum methods for inner products: we show that the sampling complexity of the swap test for the calculation of an inner product $p$ is unbounded when $p$ tends to 0, making other techniques (e.g. what we call ancilla-free) more convenient, whenever applicable;

(iii) Investigation of QHP: we provide a simplified formalization of the QHP, which describes it as a unitary providing a desired output state with a given success rate, and we clarify the impact of normalization on the performance of the QHP algorithm;

(iv) Adaptation for QAE: we show how to adapt our approach based on QHP in order to make use of the QAE algorithm, appropriately modifying the circuit structure to postpone measurements; and

(v) Bidirectional Orthogonal Encoding: we highlight that the encoding strategy proposed in Ref. [19] is not compatible with swap tests (used for computing the inner product of data vectors), but it can be modified, giving rise to the new Bidirectional Orthogonal Encoding (BOE), which is suitable for such a task.

By combining the considerations above, we generate multiple implementation variants for our approach, which differ in the encoding and loading subroutines, in the presence of mid-circuit measurements and resets in the circuits, in the quantum subroutine applied to calculate inner products, and in the presence of Quantum Amplitude Estimation (QAE) as a performance booster. Not all of these combinations are compatible with each other, resulting in four selected variants (see third column in Fig. 1 and details in Fig. 2), namely: (a) ancilla-free inner product and mid-resets, (b) ancilla-free inner product and no mid-resets, (c) ancilla-free inner product and *QAE, (d) BOE and *QAE. The first three variants use the amplitude encoding and the ancilla-free method for inner products, while the last one employs the Bidirectional Orthogonal Encoding (BOE) and the swap test. The notation *QAE indicates that QAE can be replaced by any equivalent alternative, such as Iterative QAE (IQAE) [20], Chebyshev QAE (ChebQAE) [6] or Dynamic QAE [1].

We prove that variant (c) outperforms the others and achieves a quadratic speedup up to polylogarithmic factors when the approximating polynomial has degree 2, under the assumption that an efficient data loading unitary is available in amplitude encoding. Since the cost of contract valuation on classical computers is already linear in the number of data points, any quantum improvement on that result is strictly connected to the ability to load data efficiently, because in general the data loading procedure has a linear cost. We discuss in depth the impact of data encoding and loading on the performance of the methods.

Our approach is validated on IBM Quantum devices, for small problem instances (N = 4 data points). In this setting, overall errors are in line with the theory, but the effect of noise is already clearly observable when drilling down to the errors associated with high powers in the approximating polynomial.

1.2 Background on portfolio risk analysis in the energy industry

The gas demand of households or for heating can be well described by a deterministic dependency between gas volumes and weather variables, typically the temperature. Standard contracts for private or industrial customers normally entail full-supply gas delivery, without volume constraints.

| Variant | Classical data | Encoding and loading | Powers | Inner product | Circuit design | Time performance | Remarks |
|---|---|---|---|---|---|---|---|
| (a) Ancilla-free inner product and mid-resets | Without pre-processing | Assumption: efficient data loading by unitaries $U_T$, $U_E$ in amplitude encoding | QHP with mid-resets | Ancilla-free method | Without *QAE | Depending on depth | Low width |
| (b) Ancilla-free inner product and no mid-resets | Without pre-processing | As above | QHP without mid-resets | Ancilla-free method | Without *QAE | Depending on depth | Low depth |
| (c) Ancilla-free inner product and *QAE | Without pre-processing | As above | QHP without mid-resets | Ancilla-free method | With *QAE | As above, plus additional speedup given by *QAE | *QAE implies much higher depths than (a) and (b) due to Grover powers |
| (d) BOE and *QAE | Assumption: binary tree loading for $\tilde T$, $\tilde E$ | BOE | | Swap test | With *QAE | Depending on split level $s$ | (d) with $s = 1$ has a bad depth scaling |

Figure 2: Characteristics of the four proposed implementation variants. Each row is a variant. See Table 1 for quantitative time scaling analysis. Variant (c) is highlighted as it provides the desired quantum speedup. Terms used in the table are defined in Section 2: see Subsec. 2.1 for encoding ($U_T$ and $U_E$ are the loading unitaries for temperatures and prices, BOE stands for Bidirectional Orthogonal Encoding), Subsec. 2.2 for powers (QHP is the Quantum Hadamard Product), Subsec. 2.3 for inner product, Subsec. 2.4 for the circuit design (*QAE is the Quantum Amplitude Estimation, or any equivalent technique). Refer to Appendix D for a detailed description of the variants and for the extensive complexity analysis comparison.

Namely, the customer pays a fixed price for the individual consumption, whereas the supplier takes the risk of volume deviations from the projected load profile of the customer. For example, on cold days the gas demand is likely to be higher than expected; therefore, in order to meet the demand, a supplier has to buy the extra gas needed on the day-ahead market, typically at prices that are higher than those contracted with the customer. In contrast, when temperatures are higher than expected, excess volumes need to be sold by the supplier at lower prices in order to balance the economic position. This leads to costs and risks for the gas supplier, which can be managed either with purely temperature-based weather derivatives or with cross-commodity temperature derivative contracts, often called quantos.

Accordingly, risk managers perform risk analysis and compute the fair value and some statistics of the entire weather-related portfolio and of the contracts it consists of. To this end, one defines upfront a joint stochastic model for the gas prices and temperatures and relies on extensive and time-consuming Monte Carlo simulations to estimate the fair value, which is seen as the sample mean, and some risk measures related to the sample statistics [21–24].

Let us consider a simplified weather-related portfolio which depends on gas and temperature. For simplicity, we consider one market and one temperature station, whereas a real portfolio would consider multiple markets as well as several weather stations. Suppose we have a one-year time horizon from 1-Jan-2022 to 31-Dec-2022, with daily granularity.

We assume that the gas prices and the temperatures, denoted¹ $[E'_j]$ and $[T'_j]$, $j = 0, 1, \dots, 364$, respectively, are given. Typically they are each generated by a two-factor Markov model, namely the value at time $j$ is updated with 2 random variables; hence the total number of random draws is 365 × 2 = 730 both for the gas and for the temperature, times the number of markets, leading

¹ We introduce here primed notations for non-normalized vectors, consistently with the entire paper.

| Class | Algorithm variant | Time complexity |
|---|---|---|
| Classical | Classical exact | $O(N)$ |
| Classical | Classical polynomial approx. | $O(N)$ |
| Classical | Sampling-based polynomial approx. | $O_{\beta,K}\big(\epsilon^{-2}\lg N\big)$ |
| QAE-free | (a) Ancilla-free inner prod., mid-resets | $O_{\beta,K,b}\big(C_{c,\mathrm{load}}(N) + C_{d,\mathrm{load}}(N)\,N^{K-1}\epsilon^{-2}\big)$ |
| QAE-free | (b) Ancilla-free inner prod., no mid-resets | $O_{\beta,K,b}\big(C_{c,\mathrm{load}}(N) + C_{d,\mathrm{load}}(N)\,N^{K-1}\epsilon^{-2}\big)$ |
| QAE-based | (c) Ancilla-free inner prod., *QAE | $O_{\beta,K,b}\big(C_{c,\mathrm{load}}(N) + [C_{d,\mathrm{load}}(N) + \lg N]\,N^{\frac{K-1}{2}}\epsilon^{-1}\big)$ |
| QAE-based | (d) BOE and *QAE | $O_{\beta,K,b}\big(N^{K}\,(2^{s} + \lg^{2} N - s^{2})\,\epsilon^{-1}\big)$ |

Table 1: Time complexity of the proposed algorithm variants, introduced in Fig. 2, in comparison with the classical benchmarks. Variant (c) is highlighted as it provides the desired speedup. An extended version is Table 7 of Appendix D, which contains other complexity metrics as well as references to the Propositions justifying the results. Notations are summarized in Table 2. The parameters in the analysis are the data set size $N$, the polynomial degree $K$ ($K \ge 1$), the target precision $\epsilon$ and confidence level $\beta$ such that $P(|V - v^*| \le \epsilon|v|) \ge \beta$, the coefficients $b_k(\eta)$ in Eq. (8), and the split level $s = s(N) \in \{1, \dots, \lg N\}$ for the BOE. Notably, the error is measured against the exact polynomial evaluation, under the assumption that the polynomial itself is a good approximation of the target volume function. Asymptotic estimates are provided for $\epsilon \to 0$ or $N \to \infty$. Constants affecting asymptotic estimates are marked in the subscript of the big-O notation: for instance, the notation $O_\beta(\epsilon^{-2})$ is intended for $\epsilon \to 0$ uniformly in $N$, with factors depending only on $\beta$. For readability, in the subscripts we use $b$ for $b_k(\eta)$. The time scaling is calculated under the additional assumption that norms scale as $\sqrt{N}$, as justified in Subsection A.4. We use the star notation in front of QAE (*QAE) to emphasize that any QAE technique is applicable, such as Iterative QAE (IQAE) [20] or Dynamic QAE [1], as long as it shares the same substantial time-scaling with the usual QAE.

to thousands. These random variables are usually assumed to be normally distributed and mutually correlated, even across gas and temperature. If one then considers $R$ Monte Carlo repetitions, one has to generate a random sample scaling linearly with $R$.

As mentioned, the portfolio we focus on consists of full-supply contracts, based on which the customer can nominate gas volumes at an agreed sales price, denoted $asp$. These contracts are thus implicitly temperature dependent: indeed, their volume can be described by a function $f$ of the temperature, which is supposed to model the customers' demand. Such a function is often a so-called² sigmoid function, namely

$$ f(T') = \frac{A}{1 + \left(\dfrac{B}{T' - T_0}\right)^{C}} + D, \tag{1} $$

with $T' \le T_0$. Of course, the higher the temperature, the lower the volume demand, and vice versa. The parameters $A \ge 0$, $B < 0$, $C > 1$, $D > 0$ are given, in the sense that they are either part of the contract or are estimated from the historical demand of the cluster a specific customer belongs to (e.g. households, medium-size enterprises, etc.), and normally one takes $T_0 = 40$ degrees Celsius.

² Despite the name being widespread in the energy industry, the function is not a sigmoid according to the usual mathematical definition.

Finally, we are given a vector $[\tau'_j]$ named season-normal temperature, which describes our daily expectation of the temperature at the station.

In our study we focus on the calculation of the change of gross margin, defined as the (unknown, random) difference between the net random sales less the random costs at a certain future time $j$ and the planned, therefore known, sales minus costs at the same future time $j$. Formally, the change (delta) of gross margin ($\Delta GM$) of a contract, depending on a certain gas market and a certain temperature, can be written as

$$ \Delta GM = \sum_j \big(f(T'_j) - f(\tau'_j)\big)\big(asp - E'_j\big). \tag{2} $$

In this study, we present multiple quantum approaches to evaluate $\Delta GM$ given the temperature and energy price vectors, and discuss their potential advantages over classical counterparts under different conditions. More specifically, we provide methods to efficiently compute contract value functions of the form

$$ v = \sum_j f(T'_j)\,E'_j, \tag{3} $$

which can be used to reconstruct the expression in Eq. (2).
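The following sketch makes Eqs. (1) to (3) concrete for a single simulated scenario. It is pure Python and evaluates everything classically. It computes $\Delta GM$ term by term via Eq. (2), and then rebuilds it from four sums shaped like Eq. (3), namely $\Delta GM = asp\sum_j f(T'_j) - \sum_j f(T'_j)E'_j - asp\sum_j f(\tau'_j) + \sum_j f(\tau'_j)E'_j$. This shows one way Eq. (3) can be used to reconstruct Eq. (2). The parameter values (A, B, C, D), the season-normal profile, the noise levels and $asp$ are hypothetical illustration values, not taken from the paper. The function names are also our own.

```python
import math
import random

# Hypothetical contract parameters: the text only fixes their signs
# (A >= 0, B < 0, C > 1, D > 0) and the customary T0 = 40 degrees Celsius.
A, B, C, D = 3.0, -37.0, 5.7, 0.1
T0 = 40.0

def volume(t):
    """Eq. (1): f(T') = A / (1 + (B / (T' - T0))**C) + D, for T' < T0."""
    if t >= T0:
        raise ValueError("Eq. (1) requires T' < T0")
    return A / (1.0 + (B / (t - T0)) ** C) + D

def contract_value(temps, prices):
    """Eq. (3): v = sum_j f(T'_j) E'_j."""
    return sum(volume(t) * e for t, e in zip(temps, prices))

def delta_gm(temps, normal, prices, asp):
    """Eq. (2), evaluated term by term."""
    return sum((volume(t) - volume(tau)) * (asp - e)
               for t, tau, e in zip(temps, normal, prices))

def delta_gm_from_values(temps, normal, prices, asp):
    """Eq. (2) rebuilt from four sums of the form of Eq. (3)."""
    ones = [1.0] * len(prices)
    return (asp * contract_value(temps, ones) - contract_value(temps, prices)
            - asp * contract_value(normal, ones) + contract_value(normal, prices))

# Hypothetical one-year daily scenario (a single Monte Carlo draw).
rng = random.Random(0)
days = 365
normal = [10.0 - 8.0 * math.cos(2 * math.pi * d / days) for d in range(days)]
temps = [tau + rng.gauss(0.0, 3.0) for tau in normal]
prices = [30.0 + rng.gauss(0.0, 5.0) for _ in range(days)]
asp = 32.0

print(delta_gm(temps, normal, prices, asp))
print(delta_gm_from_values(temps, normal, prices, asp))  # same value up to rounding
```

Classically, each of these sums costs $O(N)$ operations. This is the linear baseline that Table 1 compares the quantum variants against.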

Notations introduced across the paper are collected in Table 2.

2 Hybrid quantum-classical approach

Consider a single realization of the time series representing temperatures and prices, namely two real vectors $[T'_j]_{j=0}^{N-1}$ and $[E'_j]_{j=0}^{N-1}$ indexed over time. Suppose they are classically generated and transformed into the normalized versions $[T_j]_{j=0}^{N-1}$ and $[E_j]_{j=0}^{N-1}$ through the affine maps

$$ \begin{cases} T_j = \rho_T\,(T'_j - \eta), \\ E_j = \rho_E\,E'_j, \end{cases} \tag{4} $$

where $\eta$ is an appropriate translation to guarantee $T_j > 0$ for all $j$, at least in probability (refer to Assumption 2.1 below), and $\rho_T$ and $\rho_E$ are suitable scale factors:

$$ \rho_T^{-2} = \sum_{j=0}^{N-1} (T'_j - \eta)^2, \tag{5} $$

$$ \rho_E^{-2} = \sum_{j=0}^{N-1} (E'_j)^2. \tag{6} $$

Our proposed quantum algorithm, outlined in Fig. 3, approximates the volume function $f(T')$ by means of a polynomial of degree $K$. Without loss of generality, we can write the polynomial in the form

$$ f(T'_j) \approx \sum_{k=0}^{K} b_k(\eta)\,(T'_j - \eta)^k, \tag{7} $$

where $b_k(\eta)$ are real coefficients. Consequently, the contract value of Eq. (3) reads

$$ v = \sum_{j=0}^{N-1} E'_j\,f(T'_j) \approx \sum_{j,k} b_k(\eta)\,E'_j\,(T'_j - \eta)^k. \tag{8} $$

The algorithm starts by loading the normalized temperature and price vectors into a quantum register. A sequence of non-linear transformations is then exploited to calculate the powers $T_j^k$ for all $j = 0, \dots, N-1$ and $k = 0, \dots, K$. Finally, the inner product of the processed vectors is efficiently evaluated, thus returning

$$ y_k := \sum_{j=0}^{N-1} E_j\,T_j^k \tag{9} $$

for all $k$. The estimation of $y_k$ is boosted by the QAE algorithm, which provides the quadratic speedup (up to polylogarithmic factors).

Finally, the contract value in Eq. (8) is reconstructed as the summation

$$ v \approx v^* := \sum_{k=0}^{K} \rho_T^{-k}\,\rho_E^{-1}\,b_k(\eta)\,y_k, \tag{10} $$

where $b_k(\eta)$ are known classically. It will be useful to write $v^* = \sum_k b_k(\eta)\,y'_k$, where

$$ y'_k := \rho_T^{-k}\,\rho_E^{-1}\,y_k. \tag{11} $$

In the remainder of this Section, we discuss in detail the key parts of the quantum algorithm, namely data encoding, power calculation, and inner product. We focus on the main implementation variant (c), while highlighting the key differences where relevant.

2.1 Data encoding and loading

There are multiple ways to represent the same data set in quantum registers [25–32], such as the basis (aka digital, or equally-weighted) encoding, the amplitude (aka analog) encoding, the angle encoding, etc. The subsequent quantum processing techniques are highly dependent on the data encoding protocol. Here, we focus on the amplitude encoding. Additionally, we introduce the Bidirectional Orthogonal Encoding (BOE), which can be seen as a variant of the amplitude encoding designed to balance circuit width and depth.

In the amplitude encoding, a normalized classical vector $[D_j]$ of linear size $N$ is represented as

$$ |\psi_D\rangle := \sum_{j=0}^{N-1} D_j\,|j\rangle, \tag{12} $$

where $\{|j\rangle\}$ are computational basis states of $\lg N$ qubits. While amplitude encoding offers an effective use of memory resources, the exact preparation of an arbitrary state $|\psi_D\rangle$ of the form shown in Eq. (12) requires $O(N)$ operations in the worst case, thus jeopardizing the benefits of many quantum algorithms. In practice, amplitude encoding remains attractive in combination with approximate data loading techniques (e.g., qGANs) [33, 34], in the presence of specific data structures [35] or under standard quantum memory assumptions [36]. In our complexity analysis, we assume access to amplitude-encoded quantum states to derive complexity considerations in the limit of qubit-efficient representations.

Appendix C details a second version of our protocol that exploits an alternative scheme, based

[Figure 3 diagram. Generation of normalized temperatures $[T_j]_{j=0}^{N-1}$ → encoding of temperatures $|\psi_T\rangle = \sum_j T_j|j\rangle$ → calculation of powers $|\psi_T^{(k)}\rangle = a_k\sum_j T_j^k|j\rangle$, $k = 2, \dots, K$. Generation of normalized prices $[E_j]_{j=0}^{N-1}$ → encoding of prices $|\psi_E\rangle = \sum_j E_j|j\rangle$. Both branches → estimation of inner products $y_k = \sum_j E_j T_j^k$, $k = 0, \dots, K$ → contract value reconstruction $v \approx \sum_k \rho_T^{-k}\rho_E^{-1} b_k(\eta)\,y_k$.]
Figure 3: Algorithm for the estimation of the contract value through the polynomial approximation of $f$. The figure instantiates the approach represented in the first column of Fig. 1. Blue boxes represent classical pre- and post-processing, while green boxes are the core quantum processing.

on a newly introduced data encoding that we name Bidirectional Orthogonal Encoding (BOE). Even though the BOE does not achieve the same performance as the amplitude encoding in the current application, it may prove useful beyond our use case, as it is designed to balance memory and time resources (i.e., circuit width and depth). On one side, it builds upon the bidirectional encoding [19]; on the other side, it guarantees that side registers are orthogonal [37], thus enabling subsequent processing through swap tests.

The BOE is therefore characterised by states of the form

$$ |\tilde\psi_D\rangle := \sum_{j=0}^{N-1} D_j\,|j\rangle\,|\phi^D_j\rangle, \tag{13} $$

where the second register is auxiliary and contains states entangled with the main register, with the orthonormality property $\langle\phi^D_i|\phi^D_j\rangle = \delta_{ij}$. Similar to the bidirectional encoding, the BOE requires classical data to be organized in a binary tree structure (Fig. 13). The split level $s$ steers the balance between circuit depth and width: for $s = 1$, the width is $O(N)$ and the depth $O(\lg^2 N)$ (depth efficient), while for $s = \lg N$ the width is $O(\lg N)$ and the depth $O(N)$ (memory efficient, akin to amplitude encoding). The technique is less favourable than the amplitude encoding in our context, as Eq. (10) gets replaced by Eq. (60), which has a quadratic dependence on the norms instead of a linear one, implying a worse scaling with $N$. Refer to Subsection 2.5 for more information on the error scaling. The data loading procedure is detailed in Subsection C.1.

2.2 Non-linear transformation with QHP

Let us now discuss the non-linear transformation, namely the calculation of monomial powers. Assume temperatures can be encoded in the amplitudes of a quantum state. More precisely, let $[T_j]_{j=0}^{N-1}$ be a normalized time series of temperatures, namely a vector of real numbers such that $\sum_j T_j^2 = 1$. We can assume $[T_j]_{j=0}^{N-1}$ is obtained from simulated temperatures through a translation and re-scaling as in Eq. (4). We are able to produce the state

$$ |\psi_T\rangle := \sum_{j=0}^{N-1} T_j\,|j\rangle. \tag{14} $$

In order to evaluate the approximating polynomial, we need the powers $T_j^k$, in the form of the state

$$ |\psi_T^{(k)}\rangle := a_k \sum_{j=0}^{N-1} T_j^k\,|j\rangle, \tag{15} $$

for $k = 1, \dots, K$, where $a_k$ is the appropriate scale factor,

$$ a_k := \Big(\sum_j T_j^{2k}\Big)^{-1/2}. \tag{16} $$

Note that $a_k$ is an increasing finite sequence in $k$: the normalization $\sum_j T_j^2 = 1$ implies $|T_j| \le 1$, so $\sum_j T_j^{2k}$ cannot grow with $k$. In particular, $a_k \ge 1$ for all $k$, since $a_1 = 1$.
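The classical bookkeeping around the quantum core can be sanity-checked with a short pure-Python sketch. This covers the normalization of Eqs. (4) to (6), the reconstruction of Eqs. (10) and (11), and the scale factors $a_k$ of Eq. (16). Here $y_k$ of Eq. (9) is computed classically. In the algorithm, these are the quantities estimated on the quantum device. The sketch checks that $v^*$ from Eq. (10) matches the right-hand side of Eq. (8). The inputs, the shift $\eta$ and the coefficients $b_k(\eta)$ are hypothetical values chosen only so that every $T'_j - \eta$ is positive.

```python
import math
import random

def normalize(temps, prices, eta):
    """Eqs. (4)-(6): affine maps to unit-norm vectors T and E."""
    rho_T = 1.0 / math.sqrt(sum((t - eta) ** 2 for t in temps))
    rho_E = 1.0 / math.sqrt(sum(e ** 2 for e in prices))
    T = [rho_T * (t - eta) for t in temps]
    E = [rho_E * e for e in prices]
    return T, E, rho_T, rho_E

def y(T, E, k):
    """Eq. (9): y_k = sum_j E_j T_j^k (the quantity the quantum part estimates)."""
    return sum(e * t ** k for t, e in zip(T, E))

def a(T, k):
    """Eq. (16): a_k = (sum_j T_j^(2k))^(-1/2)."""
    return sum(t ** (2 * k) for t in T) ** -0.5

def v_star(b, T, E, rho_T, rho_E):
    """Eqs. (10)-(11): v* = sum_k b_k y'_k with y'_k = rho_T^-k rho_E^-1 y_k."""
    return sum(bk * rho_T ** (-k) / rho_E * y(T, E, k) for k, bk in enumerate(b))

# Hypothetical inputs: 16 daily temperatures and prices, a shift eta that makes
# every T'_j - eta positive, and hand-picked degree-2 coefficients b_k(eta).
rng = random.Random(1)
temps = [rng.uniform(-5.0, 25.0) for _ in range(16)]
prices = [rng.uniform(20.0, 40.0) for _ in range(16)]
eta = -10.0
b = [1.5, -0.2, 0.01]

T, E, rho_T, rho_E = normalize(temps, prices, eta)
direct = sum(bk * e * (t - eta) ** k            # right-hand side of Eq. (8)
             for t, e in zip(temps, prices) for k, bk in enumerate(b))
print(v_star(b, T, E, rho_T, rho_E), direct)    # agree up to rounding
print([round(a(T, k), 3) for k in range(1, 4)])  # a_1 = 1, then increasing
```

The factor $\rho_T^{-k}$ grows with $k$. Any absolute error on an estimated $y_k$ is therefore amplified in $v^*$, which is one reason the error analysis treats each monomial separately.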

For the calculation of the powers, we resort to the non-linear transformation known as Quantum Hadamard Product (QHP), recently introduced by Holmes et al. [10]. Given two states $|\psi_0\rangle$ and $|\psi_1\rangle$, their QHP is the state

$$ |\psi_0 \odot \psi_1\rangle := a \sum_{j=0}^{N-1} \langle\psi_0|j\rangle\,\langle\psi_1|j\rangle\,|j\rangle, \tag{17} $$

which, in the circuit of Fig. 4, is obtained in the first register as the result of a post-selection conditioned on the second register being in $|0\rangle^{\otimes n}$. The probability of measuring $|0\rangle^{\otimes n}$ in the second register, i.e. the success rate for the calculation of the QHP, equals $a^{-2}$. It is important to mention that the QHP was originally defined [10] as the (not necessarily normalized) weighted state arising from the application of a rank-1 measurement operator $|0\rangle\langle 0|^{\otimes n}$, i.e., as the output of the circuit in Fig. 4 without post-selection. Our simpler formulation in Eq. (17) is, nevertheless, fully equivalent and well suited for our purposes.

[Figure 4 circuit diagram: two $n$-qubit registers prepared in $|\psi_0\rangle$ and $|\psi_1\rangle$; the bottom register is projected with $|0\rangle\langle 0|$ and the top register outputs $|\psi_0 \odot \psi_1\rangle$.]
Figure 4: Circuit implementation of the Quantum Hadamard Product producing the state $|\psi_0 \odot \psi_1\rangle$ when the bottom register is measured in the state $|0\rangle^{\otimes n}$. The success probability is $a^{-2}$, where $a$ is the normalization constant appearing in Eq. (17).

The QHP can easily be iterated to compute the Hadamard product of multiple vectors, hence producing higher-order powers. In particular, states of the form $|\psi_T^{(k)}\rangle$ as defined in Eq. (15) are obtained by loading $k$ copies of $|\psi_T\rangle$ as input states and calculating their QHPs:

$$ |\psi_T^{(k)}\rangle := \Big|\bigodot^{k} \psi_T\Big\rangle. \tag{18} $$

Figures 5 and 6 show two implementations of the QHP of four states. Note that the former requires mid-circuit measurements, and has higher depth but lower width than the latter. The former is also suited for an additional improvement, which we call dynamic stopping. Thanks to the 'dynamic circuits' feature recently made available on commercial hardware [38, 39], the execution of the circuit can be aborted right after a mid-circuit measurement if the measurement does not output 0, as this corresponds to a failure of the QHP. In this way, dynamic stopping reduces the average circuit depth.

In the main algorithm variant (c), highlighted in Fig. 2 and in Table 1, we want to exploit QAE and related techniques for improved performance. We therefore need a unitary circuit, which forces us to adopt the implementation without mid-resets. Variants that include mid-resets and dynamic stopping are discussed in Appendices A and D.

2.3 Computing inner products

The third quantum step of the algorithm (Fig. 3) is the calculation of the inner products

$$ y_k := \sum_{j=0}^{N-1} E_j\,T_j^k, \tag{19} $$

for $k = 1, \dots, K$.

Again, let us start by assuming that we have two real vectors mapped to quantum states of the form $|\psi_\alpha\rangle = \sum_j \psi^{(\alpha)}_j\,|j\rangle$ via amplitude encoding. Multiple quantum techniques are known for the computation of the absolute inner product $|\langle\psi_0|\psi_1\rangle| = \big|\sum_j \psi^{(0)}_j \psi^{(1)}_j\big| =: p$. Specifically, in Appendix A, we describe and compare the so-called swap test and the ancilla-free method. The latter is simply grounded on the observation that, given the two loading unitaries $U_T$ and $U_E$, the expression $\langle 0|U_E^\dagger U_T|0\rangle$ calculates the inner product of the two vectors.

Now, the swap test provides an estimate of $\frac{1}{2} + \frac{1}{2}p^2$, while the ancilla-free method directly outputs $p^2$. As a consequence, with the ancilla-free method the sampling complexity is bounded uniformly in $p$, while with the swap test it is unbounded for $p \to 0$, as shown numerically in Fig. 10. In other words, with the swap test it is impossible to estimate the number of samples required to achieve a given precision without knowing (an estimate of) the result itself; see Fig. 7. For this reason, for all results discussed in the main text, we choose the ancilla-free method in association with the amplitude encoding. Refer to Appendix A for more information on both techniques, and in particular to Table 6 for an in-depth comparison.

Assembling all the algorithm components introduced so far, the inner product $\langle\psi_T^{(k)}|\psi_E\rangle$ can be estimated, provided that the following holds

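[Figure 5 circuit diagram: the top $n$-qubit register starts in $|\psi_0\rangle$ and outputs $|\psi_0 \odot \psi_1 \odot \psi_2 \odot \psi_3\rangle$; a second $n$-qubit register is loaded successively with $|\psi_1\rangle$, $|\psi_2\rangle$, $|\psi_3\rangle$, each load followed by a $|0\rangle\langle 0|$ projection.]
Figure 5: QHP of k = 4 vectors with mid-measurements and mid-resets, requiring k − 1 = 3 iterations.

[Figure 6 circuit diagram: four $n$-qubit registers start in $|\psi_0\rangle, \dots, |\psi_3\rangle$; the registers holding $|\psi_1\rangle$, $|\psi_2\rangle$, $|\psi_3\rangle$ end with a $|0\rangle\langle 0|$ projection, and the top register outputs $|\psi_0 \odot \psi_1 \odot \psi_2 \odot \psi_3\rangle$.]
Figure 6: QHP of k = 4 vectors with no mid-measurements and no mid-resets, requiring lg k = 2 iterations (note that all measurements can be postponed to the end). This implementation is adopted in [18], without being named QHP.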

[Plot: sampling complexity prefactor $f(p)$ versus the target inner product $p \in [0, 1]$, for the direct (ancilla-free) method, $f(p) = (1 - p^2)/4$, and for the swap test, $f(p) = (1 - p^4)/(4p^2)$.]
Figure 7: The dependence of the sampling complexity on the target inner product p between two statevectors. For the swap test, the complexity is not upper bounded when p → 0. Refer to Prop. A.2 for the details. Numerical evidence can be found in Fig. 10.

Assumption 2.1. For all powers k = 0, ..., K, we assume that the inner product of the normalized vectors, $\sum_j T_j^k E_j$, is nonzero, and that its sign can be determined a priori.

For simplicity, we rely on the following stronger version:

Assumption 2.2. For all j = 0, ..., N − 1, we assume $T_j > 0$ and $E_j > 0$, and $\sum_j T_j^k E_j$ is nonzero (and therefore positive).

and similarly for E, under the hypothesis that the binary trees for $\tilde{T}$ and $\tilde{E}$ are available, so that the inner product provides the expected result. Details are given in Subsec. C.2.

2.4 Quantum Amplitude Estimation techniques

The final building block for our main implementation variant (c) is Quantum Amplitude Estimation (QAE). QAE is typically described as the quantum alternative to classical Monte Carlo methods, providing a quadratic speedup over the classical version (see e.g. [40]). Here, we are rather interested in the fact that it also provides a quadratic speedup in terms of precision when estimating the amplitude of a state, compared with straightforwardly averaging repeated measurement shots of a quantum circuit (refer again to Ref. [40] for a concise explanation). We often use the star notation in front of QAE (*QAE) to emphasize that any QAE technique is applicable, such as Iterative QAE (IQAE) [20] or Dynamic QAE [1], as long as it shares the same substantial time scaling as the usual QAE.
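The quadratic gap in precision can be illustrated by counting resources. The sketch below is only indicative. It compares the shots required by plain averaging (normal approximation) with the number M of oracle calls at which the canonical QAE error bound of Brassard et al. [5] falls below ε. It uses a hypothetical probability a = 0.25. The two confidence levels differ: α for averaging, and at least $8/\pi^2$ for canonical QAE. Practical variants such as IQAE [20] also have different constants.

```python
import math
from statistics import NormalDist

def shots_for_averaging(a, eps, alpha=0.95):
    """Shots needed to estimate a probability a to within eps by averaging
    Bernoulli outcomes (normal approximation, two-sided confidence alpha)."""
    z = NormalDist().inv_cdf((1 + alpha) / 2)
    return math.ceil(z * z * a * (1 - a) / eps ** 2)

def oracle_calls_for_qae(a, eps):
    """Smallest M = 2**m for which the canonical QAE bound
    2*pi*sqrt(a(1-a))/M + pi**2/M**2 is at most eps.
    That bound holds with probability at least 8/pi**2 (about 0.81)."""
    m = 0
    while True:
        M = 2 ** m
        bound = 2 * math.pi * math.sqrt(a * (1 - a)) / M + math.pi ** 2 / M ** 2
        if bound <= eps:
            return M
        m += 1

a = 0.25  # hypothetical target probability (squared amplitude)
for eps in (1e-2, 1e-3, 1e-4):
    print(f"eps={eps:g}  averaging shots={shots_for_averaging(a, eps)}  "
          f"QAE oracle calls={oracle_calls_for_qae(a, eps)}")
```

Each tenfold reduction of ε multiplies the averaging cost by about 100, whereas the QAE count grows only by about 10.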

2.5 Assembling the main variant and adapting the circuit for QAE

We assume data is available in the amplitude encoding. The circuit is fed with k copies of the temperature state $|\psi_T\rangle$ in Eq. (14), so that the power Eq. (15) of the temperature vector is calculated through the QHP. Moreover, we resort to the QHP implementation without mid-circuit resets, as depicted in Fig. 6. Afterwards, under the assumption that a loading unitary $U_E$ for $|\psi_E\rangle$ is known, the inner products $y_k$ can be calculated with the ancilla-free method. The circuit described so far can be slightly modified as shown in Fig. 8(c), to be fed to a *QAE technique. Finally, the volume v is reconstructed classically via Eq. (10).

[Circuit diagrams (a), (b), (c): two n-qubit registers initialized to $|0\rangle$ and loaded with $U_T$, followed by $U_E^\dagger$ on the first register; (a) and (b) read the outcomes R = 0 and $Z_1 = 0$, with a mid-circuit projection $|0\rangle\langle 0|$ in (a); (c) adds one ancilla qubit, read with outcome X = 1.]
Figure 8: An exemplary demonstration of the algorithm based on QHP and the ancilla-free method, for the calculation of the inner product between temperatures to the power k and prices, when k = 2. $U_T$ and $U_E$ are the loading unitaries for T and E respectively. In (a), the algorithm in its original formulation, where the gray box highlights the QHP. In (b), the version without mid-measurements: all measurements are deferred to the end. In (c), through a multi-controlled NOT, a single qubit needs measurement. This suggests the definition of unitaries A and R, marked in green, that can be fed to a QAE algorithm for efficient estimation, as described in Prop. B.2.

Note that the various monomials in Eq. (7) affect the overall error differently, as a result of the factors $\rho_T^{-k}\rho_E^{-1} b_k(\eta)$ in Eq. (10). In order to achieve a relative error $\leq \epsilon$ on the final result v, the absolute error of the monomial of index k must be controlled by
$$\epsilon_k := \rho_T^{k-1} K^{-1} |b_k(\eta)|^{-1} \epsilon. \qquad (20)$$
The reason for such scaling is that the monomial of power k decays as $\rho_T^k$ when N grows. Consequently, when mapping back $y_k$ to $y'_k$, the absolute error $\epsilon_k$ associated to $y_k$ becomes $\epsilon'_k = \rho_T^{-k}\epsilon_k$ in terms of $y'_k$. On the other hand, the final result v only scales as $\rho_T^{-1}$ due to bilinearity. The relative error, defined as the absolute error divided by the target value, then scales as $\rho_T^{1-k}\epsilon_k$. The argument is formalized and discussed in Subsection A.4. Consequently, $\epsilon_k$ is the threshold used for the *QAE.

Theoretical details for the implementation outlined above are presented in Appendix B, building on the version without *QAE introduced in Appendix A (specifically in Subsec. A.2).

3 Quantum complexity analysis

In this Section we discuss the space and time complexity of the algorithm.

The time complexity is summarized by the sum of the classical runtime and the quantum runtime. The latter is defined in turn as the sum over k = 0, ..., K of the circuit depths (number of layers in the quantum circuit) multiplied by the respective sampling complexities (number of circuit executions). This product is an approximate measure of the overall quantum circuit execution cost under the simplification that all circuit layers have similar duration.

The overall assumption is that the polynomial of degree K represents the objective function sufficiently well on the relevant domain. Under this condition, we can measure the error scaling of each monomial individually, and combine the results in an overall estimate.

Additionally, we can consider the 2-norms of the input vectors to scale as $O(\sqrt{N})$ when N grows. For instance, this holds when all temperatures (prices) are independently sampled from the same random variable $T'$ ($E'$), whatever its distribution, as shown in Subsection A.4 (see specifically Remark A.13 and Example A.14).

Suppose an efficient loading procedure in amplitude encoding is known, with classical cost $C_{c,\mathrm{load}}(N) \ll N$ and quantum depth $C_{d,\mathrm{load}}(N) \ll N^{1/2}$. Then the proposed main variant (c) has a quantum speedup over the classical case when K = 2. If instead $C_{c,\mathrm{load}}(N) = O(1)$ and $C_{d,\mathrm{load}}(N) = O(1)$, for K = 3 the quantum complexity is comparable to the classical one, up to logarithmic factors.

More generally, for any K, the time complexity of the main variant (c) is
$$O\left(C_{c,\mathrm{load}}(N) + \epsilon^{-1}\left[C_{d,\mathrm{load}}(N) + \lg N\right] N^{\frac{K-1}{2}}\right),$$
where $\epsilon$ is the acceptable relative error threshold. This cost compares with classical benchmarks of O(N), implying an advantage for K ≤ 2 and efficient data loading. Table 1 (and the extended Table 7) show that the other selected variants have higher costs. In particular, the analyzed QAE-free methods have the same asymptotic time complexity as the classical techniques when N grows, if K = 2 and data loading is performed in O(1) depth.

As a final remark on time, consider the case where the same temperature series and the same price series are used for multiple contracts that differ in the definition of the volume function f (for instance, constants A, B, C, D in Eq. (1) are varied). Then the polynomial approximation approach, whether implemented quantumly or classically, is advantageous, as $y_k$ in Eq. (10) can be computed just once for the different versions of f. Only the polynomial coefficients $b_k(\eta)$ and the sum in Eq. (10) need to be recomputed for each f. Specifically for the quantum case, the circuits

are executed once, independently of how many f functions are being evaluated.

The space complexity is represented by the circuit width $C_w$, namely the total number of qubits. For our main variant (c), it is $O(k \lg N)$.

The derivation of the theoretical bounds and their validation on simulators are contained in the Appendices, which consider one variant at a time. The detailed comparison of the variants, under multiple complexity metrics, can be found in Appendix D and specifically in Table 7.

4 Experimental demonstration

We focus our experiments on an instance as small as N = 4. We take realistic temperatures simulated for a weather station, and the volume function in Eq. (1) with parameters $T_0 = 40$, A = 20000, B = −35, C = 3, D = 6000. These constants specify the shape of the sigmoid function and are generated by fitting the curve to historical weather temperatures and gas volumes. Since all generated temperatures are positive, we take η = 0. For energy prices, we use a four-dimensional random vector.

For demonstration purposes, we consider here K = 3, even though our algorithm is asymptotically advantageous in N only for K = 2. Note that the asymptotic regime in N would require temperature and weather forecast data extending infinitely into the future, which clearly does not occur in any real application such as the one described here. Polynomial coefficients are $[b_k]_k = [17976, -360, -7.17, 0.0072]$ if we consider the Taylor expansion in 0, or $[b_k]_k = [17957, -393, -6.50, 0.225]$ if we consider the best-fit polynomial. In both cases, given that the $b_k$ decrease fast, the only relevant terms for a 1% error threshold are $b_0$ and $b_1$, according to Eq. (20). Let us emphasize again that this is due to the small N, while for increasing N the term $\rho_T^K$ always becomes dominant, independently of the coefficients.

To provide a more insightful example, we run the algorithm with $\epsilon_k = 0.04$ for all k, violating the prescription in Eq. (20). We apply IQAE with 100 shots per iteration. Circuits are executed on the IBM Jakarta device, which has 7 qubits, a Quantum Volume [41] of 16, and 2.4K CLOPS [42].

Data loading in amplitude encoding is performed through the usual non-efficient methods, given the unstructured nature of the input data. For circuit optimization and error reduction we use three techniques, namely: ‘Mapomatic’ for finding the best qubit layout identifying the low noise

subgraph [43], ‘dynamic decoupling’ (using a single X-gate configuration) for mitigating decoherence in the idle qubits [44], and ‘M3 error mitigation’ for measurement error mitigation [45].

Tables 3 and 4 show the outputs. As expected, the error is very high for k = 2 and 3, but contributes little to the overall result given the low associated coefficients $b_k(\eta)$. The error level for large k is not only a consequence of the choice of $\epsilon_k$, but mostly an effect of noise: indeed, some iterations have a depth as high as 596 or 652, with more than 400 CNOT gates, as shown in Table 5.

5 Conclusion

We proposed an approach for the calculation of the inner product $\sum_j f(T'_j)E'_j$, where $[T'_j]_j$ and $[E'_j]_j$ are two input vectors, and f is a function well approximated by a polynomial p. The approach breaks the workload into multiple parts, where the bottleneck becomes the calculation of the inner products $\sum_j T_j^k E_j$, where $[T_j]_j$ and $[E_j]_j$ are suitably normalized vectors. Consequently, we explored the application of quantum computing to accelerate the summations $\sum_j T_j^k E_j$ for all relevant k.

By applying additional QHPs, the method extends to the calculation of values resulting from multilinear functions $\sum_j X_{j,1} \cdots X_{j,v}$, where $X_{j,i}$ are the inputs, thus generalizing the bilinearity in our methodology. It also adapts to the case where the $X_{j,i}$ result from the elementwise application of functions $f_i$, as long as the $f_i$ can be polynomially approximated by $p_i$ for all i. These extensions, though, do not preserve the asymptotic analysis presented in the manuscript.

To encode the input vectors, we adopted the amplitude encoding technique, after having evaluated the novel Bidirectional Orthogonal Encoding (BOE). The latter was proposed here to overcome the limitations of the bidirectional encoding for our purposes, and specifically for the calculation of inner products through the swap test.

We compared the ancilla-free method against the swap test approach for the calculation of inner products, discussing their asymptotic width and time complexity as functions of the data set size and the error, and highlighting that the ancilla-free method is preferable.

We proposed to calculate powers of the input vectors through QHP. We introduced a variant of QHP with dynamic stopping, thereby leveraging dynamic quantum circuits and achieving an improvement in the leading constants of time scaling compared to the naive implementation. This variant, though, is incompatible with QAE and is therefore excluded from the core algorithm.

We merged these building blocks and adapted them to apply *QAE. This yields an asymptotic improvement over the classical performance for degree K = 2 of the approximating polynomial, under the assumption of efficient data loading. The optimal implementation was selected after evaluating multiple variants, collected and discussed in detail in the accompanying Appendices.

We provided a rigorous analysis and evaluation of four variants of the algorithm, differing with respect to data loading, inner product computations, and sequential algorithmic steps. A potential area of future investigation is the scope for improvement if one adopts the Szegedy walk [46] to create the input.

Experiments on real quantum hardware were conducted for the main variant, and showed errors in line with theoretical expectations for small problem instances. The effect of noise becomes relevant for high power orders of the polynomial expansion, and it is known from the theory that the error gets amplified when dealing with longer input vectors. Extensive experiments on quantum simulators are included in the Appendices and validate the overall theoretical framework across the different variants.

Acknowledgements

G.A., K.Y., and O.S. acknowledge Travis L. Scholten, Raja Hebbar, and Morgan Delk for helping with the business case analysis; Francois Varchon, Winona Murphy, and Matthew Stypulkoski from the IBM Quantum Support team for helping to execute the experiments; Kristan Temme, Daniel Egger, and Stefan Woerner for their feedback on the manuscript; Jay Gambetta, Thomas Alexander and Sarah Sheldon for allocating compute time on advanced hardware; Maria Cristina Ferri, Jeannette M. Garcia, Gianmarco Quarti Trevano, Katie Pizzolato, Jae-Eun Park, Heather Higgins, and Saif Rayyan for their support in cross-team collaborations.

Symbol | Meaning
$j = 0, \dots, N-1$ | Index over time
$[T'_j]$, $[E'_j]$ | Temperature and price series
$[T_j]$, $[E_j]$ | Normalized temperature and price series, see Eqs. (5), (6)
$[\tilde{T}_j]$, $[\tilde{E}_j]$ | Normalized square-rooted temperature and price series, see Eq. (53)
$\rho_T$, $\rho_E$, $\tilde{\rho}_T$, $\tilde{\rho}_E$ | Normalization factors, see Eqs. (5), (6), (54)
$\eta$ | Translation term for temperatures, see Eq. (4)
$f(T')$ | Volume function, see Eq. (1)
$v$ | Contract value, see Eq. (3)
$v^*$ | Polynomial approximate contract value, see Eq. (10)
$k = 0, \dots, K$ | Index over monomials in the approximating polynomial, see Eq. (7)
$b_k(\eta)$ | Coefficients in the approximating polynomial, see Eq. (7)
$y_k$ | Normalized inner product defined in Eq. (9)
$y'_k$ | Inner product defined in Eq. (11)
$|\psi_T\rangle$, $|\psi_T^{(k)}\rangle$ | Amplitude encoding for $[T_j]$, $[T_j^k]$, see Eqs. (14) and (15)
$|\tilde{\psi}_{\tilde{T}}\rangle$, $|\tilde{\psi}_{\tilde{T}}^{(k)}\rangle$ | BOE for $[T_j]$, $[T_j^k]$, see Eqs. (55) and (56)
$a_k$, $\tilde{a}_k$ | Normalization factor for $[T_j^k]$ and $[\tilde{T}_j^k]$, see Eqs. (16) and (57)
$\odot$ | Quantum Hadamard Product, see Eq. (17)
$s$ | Split level in the BOE
$S$ | Number of samples
$C_\cdot$ | Complexity measures, see Subsec. D.3
$\epsilon$, $\alpha$ | Acceptable error threshold and associated confidence level for estimators
$\lg$ | Base-2 logarithm
$\|\cdot\|$ | 2-norm of a vector (Euclidean norm)

Table 2: Summary of notations. Primed letters are used for the non-normalized version of variables, while tildes indicate normalization according to the square.

Column groups: overall relative error; relative error for $y'_k$ (k = 0, 1, 2, 3); error for the contribution of $y'_k$ (k = 0, 1, 2, 3).

Run | Overall | Rel. k=0 | Rel. k=1 | Rel. k=2 | Rel. k=3 | Contr. k=0 | Contr. k=1 | Contr. k=2 | Contr. k=3
1 | 1.36% | 1.71% | 0.05% | 16.52% | 89.86% | 2.23% | 0.01% | 0.83% | 0.05%
2 | 1.02% | 1.71% | 0.37% | 21.25% | 76.94% | 2.23% | 0.09% | 1.07% | 0.04%
3 | 4.97% | 1.67% | 0.34% | 57.79% | 82.79% | 2.18% | 0.09% | 2.92% | 0.04%
4 | 4.76% | 2.94% | 0.07% | 17.05% | 78.73% | 3.84% | 0.02% | 0.86% | 0.04%
5 | 0.81% | 0.20% | 0.05% | 20.57% | 90.48% | 0.26% | 0.01% | 1.04% | 0.05%
Mean | 2.58% | 1.65% | 0.18% | 26.64% | 83.76% | 2.15% | 0.04% | 1.35% | 0.04%
Std. dev. | (0.0209) | (0.0097) | (0.0016) | (0.1754) | (0.0622) | (0.0126) | (0.0004) | (0.0080) | (0.0000)

Table 3: Errors arising in the estimation, in five independent runs of the algorithm with N = 4 and K = 3. The algorithm is modified to set $\epsilon_k = 0.04$ as described in Sec. 4. The first column represents the overall relative error $|V^* - v^*|/v^*$. The second group of columns contains the relative error for the estimators $Y'_k$ of $y'_k$, namely $|Y'_k - y'_k|/y'_k$. The last group contains the same errors as the second, but rescaled according to the contributions they give to the overall estimation, namely $|b_k(\eta)(Y'_k - y'_k)|/v^*$. The bottom two rows report the average and the standard deviation (in parentheses, expressed as fractions rather than percentages) of the values above. In the table, the coefficients $b_k(\eta)$ are those for the Taylor expansion centered in 0, see Sec. 4.

Column groups: overall relative error; relative error for $y'_k$ (k = 0, 1, 2, 3); error for the contribution of $y'_k$ (k = 0, 1, 2, 3).

Run | Overall | Rel. k=0 | Rel. k=1 | Rel. k=2 | Rel. k=3 | Contr. k=0 | Contr. k=1 | Contr. k=2 | Contr. k=3
1 | 0.03% | 1.71% | 0.05% | 16.52% | 89.86% | 2.23% | 0.01% | 0.76% | 1.46%
2 | 0.10% | 1.71% | 0.37% | 21.25% | 76.94% | 2.23% | 0.10% | 0.98% | 1.25%
3 | 3.41% | 1.67% | 0.34% | 57.79% | 82.79% | 2.19% | 0.10% | 2.66% | 1.35%
4 | 5.93% | 2.94% | 0.07% | 17.05% | 78.73% | 3.85% | 0.02% | 0.78% | 1.28%
5 | 2.14% | 0.20% | 0.05% | 20.57% | 90.48% | 0.27% | 0.01% | 0.95% | 1.47%
Mean | 2.32% | 1.65% | 0.18% | 26.64% | 83.76% | 2.16% | 0.05% | 1.23% | 1.36%
Std. dev. | (0.0247) | (0.0097) | (0.0016) | (0.1754) | (0.0622) | (0.0127) | (0.0004) | (0.0080) | (0.0010)

Table 4: Same as Table 3. Here, the coefficients $b_k(\eta)$ are those for the best-fit polynomial, see Sec. 4. The estimation is based on the same shots as in the previous table; only the post-processing is modified to account for the different coefficients.

k | Grover power | Width | Depth | RZ count | SX count | X count | CNOT count | Meas count
0 | 20 | 2 | 11 | 9 | 5 | 1 | 2 | 2
0 | 20 | 2 | 11 | 9 | 5 | 1 | 2 | 2
0 | 22 | 2 | 118 | 73 | 37 | 6 | 52 | 2
1 | 20 | 2 | 12 | 8 | 6 | 1 | 2 | 2
1 | 20 | 2 | 12 | 8 | 6 | 1 | 2 | 2
1 | 22 | 2 | 14 | 9 | 8 | 0 | 2 | 2
2 | 20 | 4 | 22 | 16 | 18 | 2 | 12 | 4
2 | 21 | 4 | 167 | 101 | 73 | 7 | 112 | 4
2 | 24 | 4 | 596 | 364 | 243 | 27 | 416 | 4
3 | 20 | 6 | 39 | 30 | 29 | 3 | 23 | 6
3 | 21 | 6 | 652 | 365 | 162 | 23 | 470 | 6

Table 5: Collection of the IQAE iterations needed for the estimation underlying Tables 3 and 4. Each row represents an iteration and contains the metrics of the circuit transpiled for IBM Jakarta. For high k, some circuits have depths that lie beyond the capabilities of current hardware, even under error mitigation.
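The error budget of Eq. (20) in Sec. 2.5 and the cost expression given in Sec. 3 can be evaluated numerically. The sketch below is illustrative only. The value $\rho_T = 0.5$ and the unit constants hidden in the $O(\cdot)$ notation are hypothetical placeholders, not quantities from the experiments. Only the Taylor coefficients $b_k$ come from Sec. 4. The function names are made up for the example.

```python
import math

def eps_thresholds(eps, rho_T, b):
    """Absolute thresholds of Eq. (20), one per monomial:
    eps_k = rho_T**(k - 1) * K**-1 * |b_k|**-1 * eps, with K = len(b) - 1."""
    K = len(b) - 1
    return [rho_T ** (k - 1) * eps / (K * abs(b_k)) for k, b_k in enumerate(b)]

def quantum_cost(N, K, eps, c_load=1.0, d_load=1.0):
    """Cost of the main variant (c) with every hidden constant set to 1:
    C_c,load + eps**-1 * (C_d,load + lg N) * N**((K - 1) / 2)."""
    return c_load + (d_load + math.log2(N)) * N ** ((K - 1) / 2) / eps

def classical_cost(N):
    """Classical benchmark O(N), again with unit constant."""
    return float(N)

b_taylor = [17976, -360, -7.17, 0.0072]  # Taylor coefficients reported in Sec. 4
print(eps_thresholds(0.01, 0.5, b_taylor))  # rho_T = 0.5 is a hypothetical value

for N in (2 ** 10, 2 ** 20, 2 ** 30):
    ratio_k2 = quantum_cost(N, 2, 0.01) / classical_cost(N)
    ratio_k3 = quantum_cost(N, 3, 0.01) / classical_cost(N)
    print(f"N=2^{int(math.log2(N))}: quantum/classical K=2 -> {ratio_k2:.3g}, "
          f"K=3 -> {ratio_k3:.3g}")
```

With O(1) loading, the K = 2 ratio shrinks as N grows, while the K = 3 ratio grows only through the lg N factor. This mirrors the discussion in Sec. 3. The thresholds grow with k here because the $|b_k|$ decrease quickly, so the low-order monomials need the tightest precision.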

References

[1] Kumar Ghosh, Corey O’Meara, Kavitha Yogaraj, Gabriele Agliardi, Omar Shehab, Piergiacomo Sabino, Giorgio Cortiana, Marina Fernández-Campoamor, and Juan Bernabé-Moreno. “Energy contract portfolio risk analysis using quantum amplitude estimation”. unpublished (2023).
[2] Nikitas Stamatopoulos, Daniel J. Egger, Yue Sun, Christa Zoufal, Raban Iten, Ning Shen, and Stefan Woerner. “Option Pricing using Quantum Computers”. Quantum 4, 291 (2020).
[3] Stefan Woerner and Daniel J. Egger. “Quantum Risk Analysis”. npj Quantum Information 5, 15 (2019).
[4] Dylan Herman, Cody Googin, Xiaoyuan Liu, Alexey Galda, Ilya Safro, Yue Sun, Marco Pistoia, and Yuri Alexeev. “A Survey of Quantum Computing for Finance” (2022). arXiv:2201.02773.
[5] Gilles Brassard, Peter Høyer, Michele Mosca, and Alain Tapp. “Quantum amplitude amplification and estimation”. In Samuel J. Lomonaco and Howard E. Brandt, editors, Contemporary Mathematics. Volume 305, pages 53–74. American Mathematical Society (2002).
[6] Patrick Rall and Bryce Fuller. “Amplitude Estimation from Quantum Signal Processing” (2022). arXiv:2207.08628 [quant-ph].
[7] Adam Bouland, Wim van Dam, Hamed Joorati, Iordanis Kerenidis, and Anupam Prakash. “Prospects and challenges of quantum finance” (2020). arXiv:2011.06492 [quant-ph, q-fin].
[8] Casey Berger, Agustin Di Paolo, Tracey Forrest, Stuart Hadfield, Nicolas Sawaya, Michał Stęchły, and Karl Thibault. “Quantum technologies for climate change: Preliminary assessment” (2021). arXiv:2107.05362 [quant-ph].
[9] Daniel R. Terno. “Nonlinear operations in quantum-information theory”. Physical Review A 59, 3320–3324 (1999).
[10] Zoë Holmes, Nolan Coble, Andrew T. Sornborger, and Yiğit Subaşı. “On nonlinear transformations in quantum computation” (2021). arXiv:2112.12307 [quant-ph].
[11] Paweł Horodecki. “From limits of quantum operations to multicopy entanglement witnesses and state-spectrum estimation”. Physical Review A 68, 052101 (2003).
[12] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. “The quest for a quantum neural network”. Quantum Information Processing 13, 2567–2586 (2014).
[13] Iris Cong, Soonwon Choi, and Mikhail D. Lukin. “Quantum convolutional neural networks”. Nature Physics 15, 1273–1278 (2019).
[14] Kerstin Beer, Dmytro Bondarenko, Terry Farrelly, Tobias J. Osborne, Robert Salzmann, Daniel Scheiermann, and Ramona Wolf. “Training deep quantum neural networks”. Nature Communications 11, 808 (2020).
[15] Sarah K. Leyton and Tobias J. Osborne. “A quantum algorithm to solve nonlinear differential equations” (2008). arXiv:0812.4423 [quant-ph].
[16] Todd A. Brun. “Measuring polynomial functions of states”. Quantum Info. Comput. 4, 401–408 (2004).
[17] Marco Maronese, Claudio Destri, and Enrico Prati. “Quantum activation functions for quantum neural networks”. Quantum Information Processing 21, 128 (2022).
[18] Michael Lubasch, Jaewoo Joo, Pierre Moinier, Martin Kiffner, and Dieter Jaksch. “Variational quantum algorithms for nonlinear problems”. Physical Review A 101, 010301 (2020).
[19] Israel F. Araujo, Daniel K. Park, Teresa B. Ludermir, Wilson R. Oliveira, Francesco Petruccione, and Adenilton J. da Silva. “Configurable sublinear circuits for quantum state preparation” (2022). arXiv:2108.10182 [quant-ph].
[20] Dmitry Grinko, Julien Gacon, Christa Zoufal, and Stefan Woerner. “Iterative quantum amplitude estimation”. npj Quantum Information 7, 52 (2021). arXiv:1912.05559.

[21] F.E. Benth and J. Saltyte-Benth. “Stochastic modelling of temperature variations with a view towards weather derivatives”. Applied Mathematical Finance 12, 53–85 (2005).
[22] F.E. Benth and J. Saltyte-Benth. “The volatility of temperature and pricing of weather derivatives”. Quantitative Finance 7, 553–561 (2007).
[23] L. Cucu, R. Döttling, P. Heider, and S. Maina. “Managing temperature-driven volume risks”. Journal of energy markets 9, 95–110 (2016).
[24] P. Sabino and N. Cufaro Petroni. “Fast Pricing of Energy Derivatives with Mean-Reverting Jump-diffusion Processes”. Applied Mathematical Finance 0, 1–22 (2021).
[25] Manuela Weigold, Johanna Barzen, Frank Leymann, and Marie Salm. “Encoding patterns for quantum algorithms”. IET Quantum Communication 2, 141–152 (2021).
[26] Adriano Barenco, Charles H Bennett, Richard Cleve, David P DiVincenzo, Norman Margolus, Peter Shor, Tycho Sleator, John A Smolin, and Harald Weinfurter. “Elementary gates for quantum computation”. Physical review A 52, 3457 (1995).
[27] P. Kumar. “Direct implementation of an n-qubit controlled-unitary gate in a single step”. Quantum information processing 12, 1201–1223 (2013).
[28] J. A. Cortese and T. M. Braje. “Loading classical data into a quantum computer” (2018). arXiv:1803.01958.
[29] M. Plesch and Časlav Brukner. “Quantum-state preparation with universal gate decompositions”. Phys. Rev. A 83, 032302 (2011).
[30] J. A. Miszczak. “Singular value decomposition and matrix reorderings in quantum information theory”. International Journal of Modern Physics C 22, 897–918 (2011).
[31] T. Heinosaari and M. Ziman. “Guide to mathematical concepts of quantum theory”. AcPSl 58, 487–674 (2008).
[32] I. Bengtsson and K. Zyczkowski. “Geometry of quantum states” (2006).
[33] Christa Zoufal, Aurélien Lucchi, and Stefan Woerner. “Quantum generative adversarial networks for learning and loading random distributions”. npj Quantum Information 5, 1–9 (2019).
[34] Gabriele Agliardi and Enrico Prati. “Optimal Tuning of Quantum Generative Adversarial Networks for Multivariate Distribution Loading”. Quantum Reports 4, 75–105 (2022).
[35] Lov Grover and Terry Rudolph. “Creating superpositions that correspond to efficiently integrable probability distributions” (2002). arXiv:quant-ph/0208112.
[36] Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. “Architectures for a quantum random access memory”. Physical Review A 78, 052310 (2008).
[37] Israel F Araujo, Daniel K Park, Francesco Petruccione, and Adenilton J da Silva. “A divide-and-conquer algorithm for quantum state preparation”. Scientific Reports 11, 1–12 (2021).
[38] “Quantum circuits get a dynamic upgrade with the help of concurrent classical computation”. url: https://www.ibm.com/blogs/research/2021/02/quantum-phase-estimation/ (2021).
[39] A. D. Córcoles, Maika Takita, Ken Inoue, Scott Lekuch, Zlatko K. Minev, Jerry M. Chow, and Jay M. Gambetta. “Exploiting dynamic quantum circuits in a quantum algorithm with superconducting qubits”. Phys. Rev. Lett. 127, 100501 (2021).
[40] Patrick Rebentrost, Brajesh Gupt, and Thomas R. Bromley. “Quantum computational finance: Monte carlo pricing of financial derivatives”. Physical Review A 98, 022321 (2018). arXiv:1805.00109.
[41] Andrew W. Cross, Lev S. Bishop, Sarah Sheldon, Paul D. Nation, and Jay M. Gambetta. “Validating quantum computers using randomized model circuits”. Physical Review A 100, 032328 (2019).
[42] Andrew Wack, Hanhee Paik, Ali Javadi-Abhari, Petar Jurcevic, Ismael Faro, Jay M. Gambetta, and Blake R. Johnson. “Quality, speed, and scale: three key attributes to measure the performance of near-term quantum computers” (2021). arXiv:2110.14108 [quant-ph].

[43] Matthew Treinish et al. “mapomatic: Automatic mapping of compiled circuits to low-noise sub-graphs”. url: https://github.com/Qiskit-Partners/mapomatic (2022).
[44] Qiskit software developers. “Dynamical decoupling insertion pass”. url: https://qiskit.org/documentation/stubs/qiskit.transpiler.passes.DynamicalDecoupling.html (2022).
[45] Paul D. Nation, Hwajung Kang, Neereja Sundaresan, and Jay M. Gambetta. “Scalable mitigation of measurement errors on quantum computers”. PRX Quantum 2, 040326 (2021).
[46] M. Szegedy. “Quantum Speed-Up of Markov Chain Based Algorithms”. In 45th Annual IEEE Symposium on Foundations of Computer Science. Pages 32–41. Rome, Italy (2004). IEEE.
[47] M. Fanizza, M. Rosati, M. Skotiniotis, J. Calsamiglia, and V. Giovannetti. “Beyond the swap test: Optimal estimation of quantum state overlap”. Physical Review Letters 124, 060503 (2020).
[48] Vanio Markov, Charlee Stefanski, Abhijit Rao, and Constantin Gonciulea. “A generalized quantum inner product and applications to financial engineering” (2022). arXiv:2201.09845.
[49] Harry Buhrman, Richard Cleve, John Watrous, and Ronald de Wolf. “Quantum fingerprinting”. Physical Review Letters 87, 167902 (2001).
[50] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. “An introduction to quantum machine learning”. Contemporary Physics 56, 172–185 (2015).
[51] Kouhei Nakaji. “Faster amplitude estimation”. Quantum Information and Computation 20, 1109–1123 (2020). arXiv:2003.02417.
[52] Ashley Montanaro. “Quantum speedup of monte carlo methods”. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 471, 20150301 (2015).
[53] Dmitri Maslov. “Advantages of using relative-phase Toffoli gates with an application to multiple control Toffoli optimization”. Physical Review A 93, 022311 (2016).
[54] Ewin Tang. “A quantum-inspired classical algorithm for recommendation systems”. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing. Pages 217–228. (2019). arXiv:1807.04271.

A Alternate ways to compute the inner product in amplitude encoding
Consider two vectors $|\psi_0\rangle$ and $|\psi_1\rangle$, and suppose one wants to calculate their inner product p. Multiple quantum techniques for the computation of their inner product are known [47]. In this Appendix we focus on two, namely the swap test and the ancilla-free method. We show in Appendix B how these methods can be further enhanced by resorting to Quantum Amplitude Estimation techniques, as suggested in Ref. [48].

A.1 The swap test and the ancilla-free method: definition and sampling complexity
The swap-test [49] is depicted in Fig. 9. Being characterized by low gate depth, it is widely used in
near-term applications including quantum machine learning [50].
To discuss its convergence, we need a Lemma:

Lemma A.1. Let $\bar{X}_S$ be the mean of S i.i.d. random variables with mean $\mu > 0$ and variance $\sigma^2$, and let $Y_S := \sqrt{\max\{a\bar{X}_S + b, 0\}}$, for some real constants a, b with $a \neq 0$ and $a\mu + b > 0$. Then $\frac{Y_S - \sqrt{a\mu + b}}{|a|\sigma/\sqrt{4S(a\mu + b)}}$ is asymptotically a standard normal random variable when $S \to \infty$. Therefore, the error is controlled by $P\left(|Y_S - \sqrt{a\mu + b}| < \epsilon\right) = \alpha$ once S is chosen as
$$S = \frac{a^2\sigma^2}{4(a\mu + b)\epsilon^2}\left[\Phi^{-1}\left(\frac{1+\alpha}{2}\right)\right]^2 \qquad (21)$$
asymptotically when $\epsilon \to 0$, where Φ is the CDF of the standard normal distribution.

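[Circuit diagram: n-qubit registers $|\psi_0\rangle$ and $|\psi_1\rangle$, and an ancilla qubit $|0\rangle$ that undergoes H, a controlled swap of the two registers, H, and measurement.]
Figure 9: The swap test. The probability to measure $|0\rangle$ in the result qubit is $\frac{1}{2} + \frac{1}{2}|\langle\psi_0|\psi_1\rangle|^2$.

Proof. By the Central Limit Theorem, for any real β,
$$P\left(\frac{\bar{X}_S - \mu}{\sigma/\sqrt{S}} < \beta\right) \to \Phi(\beta)$$
when $S \to \infty$. Therefore, using the symmetry of Φ if a < 0,
$$P\left(\frac{a\bar{X}_S - a\mu}{|a|\sigma/\sqrt{S}} < \beta\right) \to \Phi(\beta)$$
and
$$P\left(\frac{(a\bar{X}_S + b) - (a\mu + b)}{|a|\sigma/\sqrt{S}} < \beta\right) \to \Phi(\beta).$$
Then also
$$P\left(\frac{(a\bar{X}_S + b) - (a\mu + b)}{|a|\sigma/\sqrt{S}} < \beta + \beta^2\,\frac{|a|\sigma/\sqrt{S}}{4(a\mu + b)}\right) \to \Phi(\beta)$$
by continuity of Φ, since the additional term is well defined for $a\mu + b > 0$ and tends to 0. Rearranging the inequality:
$$P\left(\frac{a\bar{X}_S + b}{|a|\sigma/\sqrt{S}} < \frac{a\mu + b}{|a|\sigma/\sqrt{S}} + \beta + \beta^2\,\frac{|a|\sigma/\sqrt{S}}{4(a\mu + b)}\right) \to \Phi(\beta),$$
namely
$$P\left(\frac{a\bar{X}_S + b}{|a|\sigma/\sqrt{S}} < c^2\right) \to \Phi(\beta) \quad \text{where} \quad c := \sqrt{\frac{a\mu + b}{|a|\sigma/\sqrt{S}}} + \beta\sqrt{\frac{|a|\sigma/\sqrt{S}}{4(a\mu + b)}}.$$
At the same time, the probability can be decomposed as
$$P\left(\frac{a\bar{X}_S + b}{|a|\sigma/\sqrt{S}} < c^2 \,\middle|\, a\bar{X}_S + b \geq 0\right) P\left(a\bar{X}_S + b \geq 0\right) + P\left(\frac{a\bar{X}_S + b}{|a|\sigma/\sqrt{S}} < c^2 \,\middle|\, a\bar{X}_S + b < 0\right) P\left(a\bar{X}_S + b < 0\right),$$
where the second term tends to 0 by the strong law of large numbers, since $a\mu + b > 0$. In the first term, $a\bar{X}_S + b$ equals $Y_S^2$, so that we obtain
$$P\left(\frac{Y_S^2}{|a|\sigma/\sqrt{S}} < c^2\right) \to \Phi(\beta).$$
Taking the square roots (c > 0 for S large enough):
$$P\left(\frac{Y_S}{\sqrt{|a|\sigma/\sqrt{S}}} < \sqrt{\frac{a\mu + b}{|a|\sigma/\sqrt{S}}} + \beta\sqrt{\frac{|a|\sigma/\sqrt{S}}{4(a\mu + b)}}\right) \to \Phi(\beta),$$
which can be rewritten as
$$P\left(\frac{Y_S - \sqrt{a\mu + b}}{|a|\sigma/\sqrt{4S(a\mu + b)}} < \beta\right) \to \Phi(\beta).$$
This proves the asymptotic standard normality. Therefore, $P\left(|Y_S - \sqrt{a\mu + b}| < \epsilon\right) = \alpha$ is asymptotically equivalent to
$$\frac{\epsilon}{|a|\sigma/\sqrt{4S(a\mu + b)}} = \Phi^{-1}\left(\frac{1+\alpha}{2}\right),$$
which, solved for S, gives (21).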
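Eq. (21) can be checked numerically. The sketch below draws Bernoulli samples, which is the situation of Props. A.2 and A.5, and computes S from Eq. (21). It then measures how often $Y_S$ falls within ε of $\sqrt{a\mu + b}$. The parameter values, number of trials, seed, and function names are arbitrary illustrative choices. Because the lemma is asymptotic, the empirical coverage is expected to match α only approximately.

```python
import math
import random
from statistics import NormalDist

def lemma_sample_size(a, b, mu, sigma2, eps, alpha):
    """Eq. (21): S = a^2 sigma^2 / (4 (a mu + b) eps^2) * [Phi^-1((1 + alpha) / 2)]^2."""
    z = NormalDist().inv_cdf((1 + alpha) / 2)
    return math.ceil(a * a * sigma2 * z * z / (4 * (a * mu + b) * eps ** 2))

def empirical_coverage(a, b, mu, eps, alpha, trials=500, seed=7):
    """Monte Carlo check of Lemma A.1 for Bernoulli(mu) samples.

    Uses S from Eq. (21) and returns S together with the fraction of trials in
    which Y_S = sqrt(max(a * mean + b, 0)) lies within eps of sqrt(a * mu + b)."""
    rng = random.Random(seed)
    S = lemma_sample_size(a, b, mu, mu * (1 - mu), eps, alpha)
    target = math.sqrt(a * mu + b)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < mu for _ in range(S)) / S
        y = math.sqrt(max(a * mean + b, 0.0))
        hits += abs(y - target) < eps
    return S, hits / trials

# a = 1, b = 0, mu = p^2 with a hypothetical p = 0.6; coverage should be close to alpha.
S, coverage = empirical_coverage(a=1.0, b=0.0, mu=0.36, eps=0.02, alpha=0.9)
print(S, coverage)
```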

Proposition A.2 (Sampling complexity of the swap test in amplitude encoding). Let $\alpha \in (0, 1)$. Let $X_i$, for i = 1, ..., S, be a r.v. representing the output of the swap-test measurement after the i-th shot of the circuit in Fig. 9. Call $\bar{X}_S$ the mean r.v. resulting from the S independent shots. Then $Y_S := \sqrt{\max\{0, 1 - 2\bar{X}_S\}}$ is an estimator for $p = \langle\psi_0|\psi_1\rangle$. Assuming p > 0, the error is controlled by
$$P(|Y_S - p| < \epsilon) = \alpha, \qquad (22)$$
once S is chosen as
$$S = \frac{1 - p^4}{4\epsilon^2 p^2}\left[\Phi^{-1}\left(\frac{1+\alpha}{2}\right)\right]^2 \qquad (23)$$
asymptotically when $\epsilon \to 0$, where Φ is the CDF of the standard normal distribution.

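Proof. The probability to measure $|0\rangle$ in the swap test qubit is $\frac{1}{2} + \frac{1}{2}p^2$. Therefore, the $X_i$ are independent Bernoulli variables with mean $\mu = \frac{1}{2} - \frac{1}{2}p^2$ and variance $\sigma^2 = \mu(1-\mu) = \frac{1}{4} - \frac{1}{4}p^4$. Then apply Lemma A.1 with a = −2 and b = 1, so that $a\mu + b = p^2$.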
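The probability used in this proof and the sample size of Eq. (23) can be cross-checked on a toy example. The sketch below simulates the swap test of Fig. 9 for real statevectors by tracking the two branches of the controlled swap, and compares the result with $\frac{1}{2} + \frac{1}{2}p^2$. It then verifies that Eq. (23) coincides with Eq. (21) for a = −2, b = 1. The states and function names are hypothetical.

```python
import math
from statistics import NormalDist

def kron(u, v):
    """Kronecker product of two real vectors given as lists."""
    return [x * y for x in u for y in v]

def swap_test_p0(psi0, psi1):
    """Probability of reading 0 on the ancilla of the swap test in Fig. 9
    (H, controlled-SWAP, H), for real normalized statevectors."""
    direct = kron(psi0, psi1)   # ancilla |0> branch: registers untouched
    swapped = kron(psi1, psi0)  # ancilla |1> branch: registers swapped
    # After the final H, the amplitude on ancilla |0> is (direct + swapped) / 2.
    return sum(((d + s) / 2) ** 2 for d, s in zip(direct, swapped))

def z_factor(alpha):
    """[Phi^-1((1 + alpha) / 2)]^2."""
    return NormalDist().inv_cdf((1 + alpha) / 2) ** 2

def swap_test_shots(p, eps, alpha):
    """Eq. (23), assuming p > 0."""
    return (1 - p ** 4) / (4 * eps ** 2 * p ** 2) * z_factor(alpha)

def lemma_shots(a, b, mu, sigma2, eps, alpha):
    """Eq. (21), without rounding up to an integer."""
    return a * a * sigma2 / (4 * (a * mu + b) * eps ** 2) * z_factor(alpha)

psi0, psi1 = [0.6, 0.8], [1.0, 0.0]          # hypothetical one-qubit states
p = sum(x * y for x, y in zip(psi0, psi1))   # p = 0.6
print(swap_test_p0(psi0, psi1), 0.5 + 0.5 * p ** 2)  # both approximately 0.68

mu = 0.5 - 0.5 * p ** 2   # mean of X_i (probability of reading 1)
sigma2 = mu * (1 - mu)    # equals (1 - p**4) / 4
print(swap_test_shots(p, 0.01, 0.95), lemma_shots(-2, 1, mu, sigma2, 0.01, 0.95))
```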

Remark A.3. Since the presence of p in the denominator of Eq. (23) may come as a surprise, let us briefly comment. It derives from the fact that our estimator is tied to the mean r.v. through a square root. Indeed, $\frac{\bar{X}_S - \mu}{\sigma/\sqrt{S}}$ being asymptotically standard normal is equivalent to $\frac{Y_S^2 - p^2}{2\sigma/\sqrt{S}}$ being asymptotically standard normal. Now, we can write $Y_S^2 - p^2 = (Y_S - p)(Y_S + p)$ and informally observe that $Y_S + p \to 2p$. So, informally, $\frac{2p(Y_S - p)}{2\sigma/\sqrt{S}} = \frac{Y_S - p}{\sigma/(p\sqrt{S})}$ is asymptotically standard normal, and p appears in the denominator of the estimator variance.

Remark A.4. The previous Proposition gives a sufficient condition for $y_k \neq 0$; we also have the following sufficient condition for $y_k = 0$ (and more generally for $|y_k| < \epsilon$):
" # 2
1 1 1+α
S≥ 2 − yk4 Φ −1
2 − yk2 ak ρ4k
4
T 2

via the CLT applied to $Y_S^2 - y_k^2$. Notice that the denominator scales as $\epsilon^4$ instead of $\epsilon^2$ in this case.

We call ancilla-free method another, even simpler way [48] to calculate the inner product of two statevectors $|\psi_0\rangle$ and $|\psi_1\rangle$. It can be applied once an (efficient) unitary for loading at least one of the two states, say $|\psi_1\rangle$, is known, namely an operator $U_1$ such that $U_1|0\rangle = |\psi_1\rangle$. Indeed:
$$\langle 0|U_1^\dagger|\psi_0\rangle = \langle\psi_1|\psi_0\rangle = p, \qquad (24)$$
so that it is sufficient to build the $U_1^\dagger|\psi_0\rangle$ circuit and project onto $|0\rangle$.
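Eq. (24) can be checked on small real vectors. The sketch below builds one possible loading unitary $U_1$ as a Householder reflection mapping $|0\rangle$ to $|\psi_1\rangle$. This is an illustrative choice, not the efficient loader the method presupposes. The sketch then applies $U_1^\dagger$ to $|\psi_0\rangle$ and reads off the amplitude and probability of the all-zero outcome. The vectors and function names are hypothetical.

```python
def householder_loader(psi1):
    """Real orthogonal U1 with U1|0> = |psi1>: the reflection I - 2 v v^T / (v^T v),
    with v = e_0 - psi1 (identity if psi1 is already |0>)."""
    n = len(psi1)
    v = [(1.0 if i == 0 else 0.0) - psi1[i] for i in range(n)]
    vv = sum(x * x for x in v)
    if vv < 1e-15:
        return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return [[(1.0 if i == j else 0.0) - 2 * v[i] * v[j] / vv for j in range(n)]
            for i in range(n)]

def ancilla_free_overlap(psi0, psi1):
    """Amplitude of |0> in U1^dagger |psi0> (Eq. (24)) and the probability of that outcome."""
    U1 = householder_loader(psi1)
    # U1 is real, so U1^dagger is its transpose; row 0 of U1^T is column 0 of U1.
    amp0 = sum(U1[j][0] * psi0[j] for j in range(len(psi0)))
    return amp0, amp0 ** 2

psi0 = [0.5, 0.5, 0.5, 0.5]
psi1 = [0.1, 0.7, 0.1, 0.7]  # normalized: 0.01 + 0.49 + 0.01 + 0.49 = 1
amp, prob = ancilla_free_overlap(psi0, psi1)
print(amp, prob)  # amp = <psi1|psi0> = 0.8 and prob = p^2 = 0.64, up to rounding
```

Only the probability of the all-zero outcome is observable on hardware. This is why Prop. A.5 estimates $p^2$ first and then takes a square root.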

Proposition A.5 (Sampling complexity of the ancilla-free method). Let $\alpha \in (0, 1)$. Suppose the ancilla-free method for the calculation of the inner product is implemented, the register is measured, and the execution is repeated S times. Let $R_i \in \{0, ..., N-1\}$ be the measurement output for the i-th shot, for i = 1, ..., S, and let $X_i$ be a r.v. valued 1 if $R_i = 0$, and valued 0 otherwise. Call $\bar{X}_S$ the mean r.v. resulting from the S independent shots. Then $Y_S := \sqrt{\bar{X}_S}$ is an estimator for p, and the error is controlled by
$$P(|Y_S - p| < \epsilon) = \alpha, \qquad (25)$$
once S is chosen as
$$S = \frac{1 - p^2}{4\epsilon^2}\left[\Phi^{-1}\left(\frac{1+\alpha}{2}\right)\right]^2 \qquad (26)$$
asymptotically when $\epsilon \to 0$, where Φ is the CDF of the standard normal distribution.

Proof. By Eq. (24), the $X_i$ are independent Bernoulli variables with mean $\mu = p^2$ and variance $\sigma^2 = \mu(1-\mu) = p^2(1 - p^2)$. The proof is an application of Lemma A.1.

Remark A.6. Lemma A.1, and therefore the proof of Prop. A.2, leverage the fact that $Y_S$ is eventually positive. Fig. 10 shows, however, that this does not hold for small $p$. When $p = 0.072$, even with $S$ as high as 10,000, we empirically obtain a negative estimate of the squared inner product (the quantity under the square root defining $Y_S$) with a probability of 38%. This implies that the estimator $Y_S$ is remarkably biased for the swap test. Fortunately, this is not the case for the ancilla-free method, since the quantity under its square root, $\bar X_S$, is never negative.

The key differences between the ancilla-free method and the swap test are summarized in Table 6. The sampling complexity is plotted in Fig. 7 as a function of the inner product $p$. It is easy to verify analytically that the sampling complexity of the swap test is unbounded for small $p$, which makes it impossible to choose the number of shots a priori. Fig. 10 empirically shows the different behavior of the two methods when $p$ is large (left plot) and when it is small (right plot).
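The two sampling-complexity formulas in Table 6 can be compared numerically. The sketch below evaluates Eq. (26) and the swap-test expression from Table 6, using Python's standard `statistics.NormalDist` for $\Phi^{-1}$. The function names are illustrative, and both formulas are only asymptotic in $\epsilon$.

```python
from statistics import NormalDist

def z_value(alpha):
    """Phi^{-1}((1 + alpha) / 2), the two-sided standard normal quantile."""
    return NormalDist().inv_cdf((1.0 + alpha) / 2.0)

def shots_ancilla_free(p, eps, alpha):
    # Eq. (26): bounded for every p in [0, 1]
    return (1.0 - p**2) / (4.0 * eps**2) * z_value(alpha) ** 2

def shots_swap_test(p, eps, alpha):
    # swap-test entry of Table 6: diverges as p -> 0
    return (1.0 - p**4) / (4.0 * eps**2 * p**2) * z_value(alpha) ** 2

for p in (0.767, 0.072, 0.01):
    print(p, round(shots_ancilla_free(p, 0.01, 0.95)), round(shots_swap_test(p, 0.01, 0.95)))
```

The ratio between the two expressions is $(1+p^2)/p^2$. The swap test therefore always needs more shots for the same $\epsilon$ and $\alpha$, and dramatically more for small $p$.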

A.2 Using the inner product after the Quantum Hadamard Product
Two aspects must be taken into account to exploit the inner-product techniques presented so far in our algorithm based on the polynomial expansion. The first is the effect of the rescaling factors $\rho_T$ and $\rho_E$ on the sampling complexity. The second is the consequences of the success rate $a_k^{-2}$ of the QHPs.

Proposition A.7 (Algorithm QHP + swap test in amplitude encoding). Let $k$ be a fixed power order. Consider a circuit that produces the state $|\psi_T^{(k)}\rangle$ defined in Eq. (15) through QHPs (with or without mid-measurements), then loads $|\psi_E\rangle$ and applies the swap test between $|\psi_T^{(k)}\rangle$ and $|\psi_E\rangle$, as depicted

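| | Ancilla-free method | Swap test |
|---|---|---|
| Number of measured qubits | $\lg N$ [Higher] | $1$ [Lower] |
| Sampling complexity for absolute error $\epsilon$ and confidence $\alpha$ | $\frac{1-p^2}{4\epsilon^2}\left[\Phi^{-1}\left(\frac{1+\alpha}{2}\right)\right]^2$ [Bounded in $p$] | $\frac{1-p^4}{4\epsilon^2p^2}\left[\Phi^{-1}\left(\frac{1+\alpha}{2}\right)\right]^2$ [Unbounded for $p \to 0$] |
| Need for the loading unitary $U_1$ | Yes | No |
| Depth | $U_1^\dagger$ is applied on the same register where $\lvert\psi_0\rangle$ is loaded [Higher] | $\lvert\psi_0\rangle$ and $\lvert\psi_1\rangle$ are loaded in parallel, and the test takes depth $O(\lg N)$ [Lower] |

Table 6: Comparison of the ancilla-free method and the swap test for the calculation of the inner product $p$ between two statevectors $|\psi_0\rangle$ and $|\psi_1\rangle$.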

[Figure 10 plots, panels (a) and (b): estimated value versus estimation experiment (0 to 100) for the direct method, the swap test and the true inner product $p$; panel (b) also marks swap-test runs with estimate $\le 0$.]

Figure 10: Each panel shows 100 independent estimations of the inner product between the same two statevectors. All estimations are obtained through 10,000 shots of the circuits on a noiseless simulator, so that the different outcomes are only an effect of random sampling. (a) The target inner product is large ($p = 0.767$). In this case, the variance is $1.01 \cdot 10^{-5}$ for the ancilla-free method and $2.77 \cdot 10^{-5}$ for the swap test. (b) The behavior of the swap test worsens when $p$ is small (in this case, $p = 0.072$). The variance is $2.60 \cdot 10^{-5}$ for the ancilla-free method and $3.20 \cdot 10^{-3}$ for the swap test, which is significantly higher. Notice the presence of runs that provide a negative squared inner product, in 38 cases out of 100.

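[Figure 11 circuit diagrams (a) and (b): two $n$-qubit registers loaded by $U_T$, one loaded by $U_E$, and a swap-test ancilla between two H gates; the QHP outcome is labelled $Z_1 = 0$ (with a $|0\rangle\langle 0|$ projection in (a)) and the swap-test outcome $X = 1$.]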
Figure 11: An example of the algorithm based on QHP and the swap test, for the calculation of the inner product between temperatures to the power $k$ and prices, when $k = 2$. $U_T$ and $U_E$ are the loading unitaries for $T$ and $E$, respectively. (a) The algorithm in its original formulation; the gray box highlights the QHP and the blue box the swap test. (b) The version without mid-measurements, where all measurements are deferred to the end.

in Fig. 11. Call $X \in \{0, 1\}$ the output of the measurement of the control qubit in the swap test, and $Z \in \{0, \dots, N-1\}^{k-1}$ the outputs of all the $k-1$ measurements in the QHPs. Define $X_i \sim X$ and $Z_i \sim Z$, for $i = 1, \dots, S$, as the outcomes of $S$ independent samples from the circuit. Let
$$Y_S := \sqrt{\frac{2\#\{i : X_i = 0, Z_i = 0\} - \#\{i : Z_i = 0\}}{S}}; \qquad Y_S' := \rho_T^{-k}\rho_E^{-1}Y_S.$$
Then

1. $E[Y_S] \to \sum_j E_jT_j^k =: y_k$ when $S \to \infty$, and $E[Y_S'] \to \sum_j E_j'(T_j' - \eta)^k =: y_k'$ when $S \to \infty$;

2. assuming $y_k \neq 0$, the absolute error for $Y_S$ is controlled by $P(|Y_S - y_k| < \epsilon) \ge \alpha$ once $S$ is chosen as
$$S \ge \max\left\{4y_k^2(a_k^2 - 1),\ \frac{1 - a_k^4y_k^4}{a_k^2y_k^2}\right\}\frac{1}{\epsilon^2}\left[\Phi^{-1}\left(\frac{3+\alpha}{4}\right)\right]^2 \tag{27}$$
asymptotically when $\epsilon \to 0$, where $\Phi$ is the CDF of the standard normal distribution;

3. assuming again $y_k \neq 0$, $P(|Y_S - y_k| < \epsilon) \ge \alpha$ is also guaranteed by the stronger condition
$$S \ge \max\left\{4,\ a_k^{-2}y_k^{-2}\right\}\frac{1}{\epsilon^2}\left[\Phi^{-1}\left(\frac{3+\alpha}{4}\right)\right]^2 \tag{28}$$
asymptotically when $\epsilon \to 0$;

4. assuming $y_k' \neq 0$, either of the conditions in Eqs. (27) or (28) is also sufficient to control the error of the originally scaled problem in the following sense: $P\left(|Y_S' - y_k'| < \epsilon\rho_T^{-k}\rho_E^{-1}\right) \ge \alpha$.

Proof. $X$ is a Bernoulli r.v. and, by the swap test theory,
$$P(X = 0 \mid Z = 0) = \frac{1}{2} + \frac{1}{2}\left|\langle\psi_E|\psi_T^{(k)}\rangle\right|^2 = \frac{1}{2} + \frac{1}{2}a_k^2\left(\sum_{j=0}^{N-1}E_jT_j^k\right)^2. \tag{29}$$
On the other hand,
$$P(X = 0 \mid Z = 0) = \frac{P(X = 0, Z = 0)}{P(Z = 0)}. \tag{30}$$
Recalling that
$$P(Z = 0) = a_k^{-2}, \tag{31}$$
by Eqs. (29) and (30) we derive
$$P(X = 0, Z = 0) = \frac{1}{2a_k^2} + \frac{1}{2}\left(\sum_{j=0}^{N-1}E_jT_j^k\right)^2$$
and therefore
$$2P(X = 0, Z = 0) - P(Z = 0) = \left(\sum_{j=0}^{N-1}E_jT_j^k\right)^2. \tag{32}$$
The first claim is an application of the law of large numbers. For the second part of this claim, consider that $y_k' = \rho_T^{-k}\rho_E^{-1}y_k$.

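Moving to the second claim, observe that
$$Y_S = \sqrt{1 - 2\frac{\sum_{i:Z_i=0}X_i}{\#\{i : Z_i = 0\}}}\;\sqrt{\frac{\#\{i : Z_i = 0\}}{S}}.$$
Let us name the two square roots $A_S$ and $B_S$, respectively. We know that $B_S \to a_k^{-1}$ from Eq. (31). Then
$$\begin{aligned}
P(|Y_S - y_k| < \epsilon) &= P\left(\left|A_SB_S - A_Sa_k^{-1} + A_Sa_k^{-1} - y_k\right| < \epsilon\right)\\
&\ge P\left(\left|A_SB_S - A_Sa_k^{-1}\right| < \epsilon/2 \text{ and } \left|A_Sa_k^{-1} - y_k\right| < \epsilon/2\right)\\
&= 1 - P\left(\left|A_SB_S - A_Sa_k^{-1}\right| \ge \epsilon/2 \text{ or } \left|A_Sa_k^{-1} - y_k\right| \ge \epsilon/2\right)\\
&\ge 1 - P\left(\left|A_SB_S - A_Sa_k^{-1}\right| \ge \epsilon/2\right) - P\left(\left|A_Sa_k^{-1} - y_k\right| \ge \epsilon/2\right)\\
&= P\left(\left|A_SB_S - A_Sa_k^{-1}\right| < \epsilon/2\right) + P\left(\left|A_Sa_k^{-1} - y_k\right| < \epsilon/2\right) - 1\\
&\ge 2\beta - 1 = \alpha,
\end{aligned}$$
once we prove that the following inequalities hold for $\beta = \frac{1+\alpha}{2}$:
$$P\left(\left|A_SB_S - A_Sa_k^{-1}\right| < \epsilon/2\right) \ge \beta \tag{33}$$
$$P\left(|A_S - a_ky_k| < a_k\epsilon/2\right) \ge \beta \tag{34}$$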

Let us start with the latter. Apply Prop. A.2 to $X_i$ conditioned on $Z_i = 0$, taking $a_ky_k$ as the $p$ in Prop. A.2, $a_k\epsilon/2$ as the $\epsilon$, $\#\{i : Z_i = 0\}$ as the $S$, and $\beta$ as the $\alpha$. This gives
$$\#\{i : Z_i = 0\} = \frac{1 - a_k^4y_k^4}{a_k^4\epsilon^2y_k^2}\left[\Phi^{-1}\left(\frac{1+\beta}{2}\right)\right]^2.$$
Since $\#\{i : Z_i = 0\}$ is asymptotic to $a_k^{-2}S$, and $\frac{1+\beta}{2} = \frac{3+\alpha}{4}$, the second term of the maximum in Eq. (27) guarantees the last expression and therefore Eq. (34).
Let us consider Eq. (33) now. Since $A_S$ tends to $a_ky_k$, it is eventually dominated by $2a_ky_k$. Eq. (33) is then implied by
$$\left|B_S - a_k^{-1}\right| < \frac{\epsilon}{4a_ky_k}.$$
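Now, $B_S^2$ is the mean of $S$ i.i.d. Bernoulli variables with $\mu = a_k^{-2}$ and $\sigma^2 = \mu(1-\mu) = a_k^{-2}(1 - a_k^{-2})$. Applying Lemma A.1, an asymptotically sufficient condition for Eq. (33) is
$$S \ge \frac{a_k^{-2}(1 - a_k^{-2})}{4a_k^{-2}\left(\epsilon^2/16a_k^2y_k^2\right)}\left[\Phi^{-1}\left(\frac{1+\beta}{2}\right)\right]^2 = \frac{4y_k^2(a_k^2 - 1)}{\epsilon^2}\left[\Phi^{-1}\left(\frac{1+\beta}{2}\right)\right]^2,$$
which is the first term of the maximum in Eq. (27). This proves the second claim.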
The third claim derives from the second one, once we consider that, by the Cauchy-Schwarz inequality, $y_k = \sum_j E_jT_j^k \le \|E\|\,\|T^k\| = a_k^{-1}$. The last claim is trivial.
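A minimal Monte Carlo sketch of the estimator of Prop. A.7 follows. It does not simulate the circuit gate by gate. Instead, it assumes the outcome probabilities of Eqs. (29) and (31), draws $(Z, X)$ pairs from them, and forms $Y_S$ from the counts. The function name is hypothetical. Negative radicands (the bias discussed in Remark A.6) are reported as NaN rather than clipped.

```python
import numpy as np

def prop_a7_estimator(T, E, k, shots, rng):
    """Draw (Z, X) outcomes of the QHP + swap-test circuit and form Y_S.

    Classical stand-in: outcomes are sampled from Eqs. (29) and (31)
    instead of simulating the circuit gate by gate.
    """
    a_k = np.sum(T ** (2 * k)) ** -0.5
    y_k = np.sum(E * T**k)
    z_ok = rng.random(shots) < a_k**-2                         # Z = 0 with prob. a_k^{-2}
    x_zero = rng.random(shots) < 0.5 + 0.5 * (a_k * y_k) ** 2  # X = 0 given Z = 0
    n_z = np.count_nonzero(z_ok)                               # #{i : Z_i = 0}
    n_xz = np.count_nonzero(z_ok & x_zero)                     # #{i : X_i = 0, Z_i = 0}
    radicand = (2 * n_xz - n_z) / shots
    y_hat = np.sqrt(radicand) if radicand >= 0 else float("nan")
    return y_hat, y_k

rng = np.random.default_rng(0)
T = rng.random(16); T /= np.linalg.norm(T)
E = rng.random(16); E /= np.linalg.norm(E)
y_hat, y_true = prop_a7_estimator(T, E, k=2, shots=200_000, rng=rng)
print(f"y_k = {y_true:.4f}, Y_S = {y_hat:.4f}")
```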

Proposition A.8 (Algorithm QHP + ancilla-free method in amplitude encoding). Let $k$ be a fixed power order. Implement a circuit that produces the state $|\psi_T^{(k)}\rangle$ defined in Eq. (15) through QHPs (with or without mid-measurements). The circuit then loads $|\psi_E\rangle$, applies the ancilla-free method for the inner product between $|\psi_T^{(k)}\rangle$ and $|\psi_E\rangle$, and subsequently measures the target register, as depicted in Fig. 8(a) and (b). Let $R \in \{0, \dots, N-1\}$ be the measurement output of the target register, and let $Z \in \{0, \dots, N-1\}^{k-1}$ be the outputs of all the $k-1$ measurements in the QHPs. Let $X$ be a r.v. valued 1 if $R = 0$ and $Z = 0$, and valued 0 otherwise. Consider $S$ independent shots, and let $X_i \sim X$ be their outcomes, for $i = 1, \dots, S$. Finally, define
$$Y_S := \sqrt{\bar X_S}; \qquad Y_S' := \rho_T^{-k}\rho_E^{-1}Y_S.$$
Then

1. $E[Y_S] \to \sum_j E_jT_j^k =: y_k$ when $S \to \infty$, and $E[Y_S'] \to \sum_j E_j'(T_j' - \eta)^k =: y_k'$ when $S \to \infty$;

2. assuming $y_k \neq 0$, the absolute error for $Y_S$ is controlled by $P(|Y_S - y_k| < \epsilon) \ge \alpha$ once $S$ is chosen as
$$S = \frac{1 - y_k^2}{4\epsilon^2}\left[\Phi^{-1}\left(\frac{1+\alpha}{2}\right)\right]^2 \tag{35}$$
asymptotically when $\epsilon \to 0$, where $\Phi$ is the CDF of the standard normal distribution;

3. assuming again $y_k \neq 0$, $P(|Y_S - y_k| < \epsilon) \ge \alpha$ is also guaranteed by the stronger condition
$$S = \frac{1}{4\epsilon^2}\left[\Phi^{-1}\left(\frac{1+\alpha}{2}\right)\right]^2 \tag{36}$$
asymptotically when $\epsilon \to 0$;

4. assuming $y_k' \neq 0$, either of the conditions in Eqs. (35) or (36) is also sufficient for the error of the originally scaled problem in the following sense: $P\left(|Y_S' - y_k'| < \epsilon\rho_T^{-k}\rho_E^{-1}\right) \ge \alpha$.

Proof. Consider that
$$E[X] = P(R = 0, Z = 0) = P(R = 0 \mid Z = 0)\,P(Z = 0) = a_k^2y_k^2\,a_k^{-2} = y_k^2.$$
Now the first claim is an application of the law of large numbers. For the second part of this claim, consider that $y_k' = \rho_T^{-k}\rho_E^{-1}y_k$. The second claim is an application of Lemma A.1. The third and fourth claims are obvious.
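For $k = 2$, the QHP and the ancilla-free readout can be checked with a small state-vector computation. The sketch below makes three assumptions, and all names in it are illustrative:

- Inputs are real and non-negative.
- The QHP is modelled as bitwise CNOTs from the first register onto the second, so that post-selecting the second register on 0 keeps $T_j^2$ on the diagonal.
- A Householder reflection stands in for the price loader $U_E$.

The computation recovers $P(Z = 0) = a_2^{-2}$ and $E[X] = P(R = 0, Z = 0) = y_2^2$, as in the proof above.

```python
import numpy as np

def householder_loader(psi):
    # Real orthogonal matrix whose first column is psi (psi real, unit norm).
    e0 = np.zeros_like(psi)
    e0[0] = 1.0
    v = e0 - psi
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return np.eye(len(psi))
    v = v / nv
    return np.eye(len(psi)) - 2.0 * np.outer(v, v)

def qhp_ancilla_free_k2(T, E):
    """Exact P(Z = 0) and P(R = 0, Z = 0) for k = 2 without mid-measurements.

    Index i: target register (read out as R); index j: QHP register (read out as Z).
    """
    N = len(T)                                  # assumed to be a power of 2
    state = np.outer(T, T)                      # amplitude of |i>|j> is T_i T_j
    after_cnot = np.zeros_like(state)
    for i in range(N):
        for j in range(N):
            after_cnot[i, i ^ j] = state[i, j]  # bitwise CNOTs: |i>|j> -> |i>|i XOR j>
    final = householder_loader(E).T @ after_cnot  # U_E^dagger on the target register
    p_z0 = np.sum(after_cnot[:, 0] ** 2)        # Z = 0 keeps only i = j, i.e. T_i^2
    p_r0_z0 = final[0, 0] ** 2
    return p_z0, p_r0_z0

rng = np.random.default_rng(11)
T = rng.random(8); T /= np.linalg.norm(T)
E = rng.random(8); E /= np.linalg.norm(E)
p_z0, p_r0_z0 = qhp_ancilla_free_k2(T, E)
print(p_z0, np.sum(T**4))                 # a_2^{-2}
print(p_r0_z0, np.sum(E * T**2) ** 2)     # y_2^2
```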

A.3 Circuit width and depth

Proposition A.9. Let $k$ be a fixed power order. Let an amplitude-encoding state preparation routine be given for both $|\psi_T\rangle$ and $|\psi_E\rangle$. Suppose each state preparation works in depth $C_{d,\mathrm{load}}(N)$. Then the algorithm described in Prop. A.8 (called QHP + ancilla-free method) and that in Prop. A.7 (QHP + swap test) have the following width and depth:
$$C_w(k) = r(k)\lg N + \delta^{(\mathrm{swap})}, \tag{37}$$
$$C_d(k) \le m(k)\,C_{d,\mathrm{load}}(N) + k + (3\lg N + 1)\,\delta^{(\mathrm{swap})}, \tag{38}$$
where
$$\delta^{(\mathrm{swap})} = \begin{cases} 1 & \text{if the swap test is used},\\ 0 & \text{if the ancilla-free method is used},\end{cases} \qquad r(k) = \begin{cases} 2 & \text{if mid-reset is used},\\ k + \delta^{(\mathrm{swap})} & \text{otherwise},\end{cases}$$
$$m(k) = \begin{cases} k + (1 - \delta^{(\mathrm{swap})}) & \text{if mid-reset is used and } \mu_1 = \dots = \mu_{k-1} = 0,\\ t + (1 - \delta^{(\mathrm{swap})}) & \text{if mid-reset is used, } \mu_1 = \dots = \mu_{t-1} = 0 \text{ and } \mu_t = 1 \text{ for some } t \le k-1,\\ 1 + (1 - \delta^{(\mathrm{swap})}) & \text{if no mid-reset is used}.\end{cases}$$
Here $\mu_t = 0$ denotes a successful application of the QHP: $\mu_t$ is the output of measuring the $Z_t$ variable, defined in Fig. 11 and Fig. 8. If mid-reset is used, $m(k) \le k$ and
k−1
X −2(k0 −1)
E[m(k)] = + (1 − k )
−2(k−1)
kak a−2 k 0 ak .
k0 =1

Proof. Let us start by calculating the space complexity $C_w$ of the quantum circuit, namely the circuit width. The width required to load a data set of size $N$ is $\lg N$. Let us now justify the prefactor $r(k)$. When calculating the width required to encode data for the calculation, two scenarios must be taken into account. If we do not resort to mid-measurements, $k$ copies of $|\psi_T\rangle$ in different registers are needed. One copy of $|\psi_E\rangle$ is also needed, and it lies in a separate register only in the case of the swap test. If, conversely, we can apply mid-circuit resets, the number of registers can be reduced to 2, regardless of $k$. Finally, the swap test requires only one additional qubit, and the overall space cost is that in Eq. (37).
Moving to depth, if mid-reset is not used, the data encoding of the $k$ copies of $|\psi_T\rangle$ is performed in parallel, as is that of $|\psi_E\rangle$. If, instead, mid-reset is used, data encoding is done in series. In a given shot, an iteration of encoding is then performed only if the measurement of the previous iteration was successful. This is called dynamic stopping. The inequality and the expectation of $m(k)$ are obvious.
To complete the derivation of Eq. (38), notice that once data is loaded, the additional depth for each QHP is 1, since all CNOTs can be performed in parallel. Consequently, the additional depth required to produce $|\psi_T^{(k)}\rangle$ is $t$, and is therefore dominated by $k$. Finally, the swap test has a depth of $3n + 1$, with $n = \lg N$, as all swaps are controlled by the same ancilla.
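Eqs. (37) and (38) can be transcribed directly into a helper function. This is a sketch with two assumptions:

- `load_depth` is a placeholder for $C_{d,\mathrm{load}}(N)$, which depends on the chosen state-preparation routine.
- The parameter `t` (index of the first failed QHP in a dynamically stopped shot, `None` if all QHPs succeed) is an assumed interface, not something defined in the text.

```python
import math

def qhp_circuit_cost(k, N, swap_test, mid_reset, t=None, load_depth=1):
    """Width (Eq. 37) and depth upper bound (Eq. 38) of the QHP-based estimators."""
    n = int(math.log2(N))                 # lg N, N assumed to be a power of 2
    delta = 1 if swap_test else 0         # delta^(swap)
    r = 2 if mid_reset else k + delta     # number of lg N-qubit registers
    if not mid_reset:
        m = 1 + (1 - delta)               # all loadings in parallel
    elif t is None:
        m = k + (1 - delta)               # every QHP succeeded
    else:
        m = t + (1 - delta)               # dynamic stopping after the t-th QHP
    width = r * n + delta
    depth_bound = m * load_depth + k + (3 * n + 1) * delta
    return width, depth_bound

print(qhp_circuit_cost(3, 16, swap_test=True, mid_reset=False, load_depth=10))
print(qhp_circuit_cost(3, 16, swap_test=False, mid_reset=True, load_depth=10))
```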

A.4 Putting powers together

So far, we have constructed an algorithm providing an estimator $Y_S$, called $Y_k$ in this subsection, that is able to approximate $y_k$, for a single power $k$, up to a given error. Now, following Eq. (10), we define an estimator
$$V := \sum_{k=0}^{K}\rho_T^{-k}\rho_E^{-1}b_k(\eta)\,Y_k \tag{39}$$
and we want to verify that $V$ is a good approximation for $v$, thanks to Eq. (10).
As part of our asymptotic analysis, we shall discuss the error scaling when $N \to \infty$. Since we can expect the contract value to be affected by the growth of $N$, the analysis must be conducted in terms of relative error.
Proposition A.10 (Convergence rate in amplitude encoding). Let $w \in [0, 1]^K$ such that $\sum_k w_k = 1$. Also, let $\alpha \in [0, 1)^K$ such that $\sum_k (1 - \alpha_k) = 1 - \beta$ for some $\beta \in (0, 1)$. Finally, let $\epsilon > 0$. For instance, one may take $w_k = K^{-1}$ for all $k$ and $\alpha_k = \frac{K - 1 + \beta}{K}$ for all $k$.

Then $V$ defined in Eq. (39) is an estimator for $v^*$ such that
$$P(|V - v^*| \le \epsilon) \ge \beta, \qquad \text{where } v^* := \sum_{k=0}^{K} b_k(\eta)\,y_k' = \sum_{k=0}^{K}\rho_T^{-k}\rho_E^{-1}b_k(\eta)\,y_k,$$
provided that $P\left(\rho_T^{-k}\rho_E^{-1}|b_k(\eta)|\,|Y_k - y_k| \le w_k\epsilon\right) \ge \alpha_k$ for all $k \le K$.

Consequently, a similar estimate holds for the relative error:
$$P(|V - v^*| \le \epsilon|v|) \ge \beta,$$
provided that $P\left(|Y_k - y_k| \le r_kw_k\epsilon\,|b_k(\eta)|^{-1}\right) \ge \alpha_k$ for all $k \le K$, where
$$r_k := |v|\,\rho_T^k\rho_E. \tag{40}$$

Finally, let $K$ be fixed. When $N$ grows, $r_K$ dominates the asymptotic behavior of $r_k$ for all other $k \le K$.

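Proof. Since $|V - v^*| \le \sum_k \rho_T^{-k}\rho_E^{-1}|b_k(\eta)|\,|Y_k - y_k|$ and $\sum_k w_k = 1$, a union bound gives
$$P(|V - v^*| \le \epsilon) \ge 1 - \sum_{k=0}^{K} P\left(\rho_T^{-k}\rho_E^{-1}|b_k(\eta)|\,|Y_k - y_k| > w_k\epsilon\right) \ge 1 - \sum_{k=0}^{K}(1 - \alpha_k) = \beta.$$
The second claim is an application of the first one. As for the third, if $r_k \to 0$ for some $k$ when $N \to \infty$, then $\rho_T^k\rho_E \to 0$. In such a case, since $\rho_T \le 1$, $r_K$ goes to 0 at least as fast as $r_k$.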

Remark A.11. $\alpha_k = \frac{K - 1 + \beta}{K}$ is very close to 1 when $K$ grows, implying a very high sampling complexity from Eq. (28) or (36). Therefore, the technique is effective only if $K$ is low. Additionally, one may leverage the knowledge of $b_k(\eta)$ to refine the definition of $\alpha_k$ and $w_k$.
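Combining Prop. A.10 with Eq. (36) gives a per-power shot budget for the ancilla-free estimator. The sketch below makes the following assumptions:

- It uses the uniform choice of $w_k$ and $\alpha_k$ suggested in Prop. A.10, applied over the powers actually estimated.
- It treats $b_k(\eta)$, $\rho_T$ and $\rho_E$ as given numbers.
- The function name and inputs are hypothetical.

It makes Remark A.11 tangible: as more powers are added, $\alpha_k$ approaches 1 and every term needs more shots.

```python
import math
from statistics import NormalDist

def per_power_shots(b, rho_T, rho_E, eps, beta):
    """Shots per power k for the ancilla-free estimator (hypothetical helper).

    Assumes the uniform split w_k = 1/n and alpha_k = (n - 1 + beta)/n over the
    n = len(b) estimated powers, so that sum_k (1 - alpha_k) = 1 - beta.
    """
    n = len(b)
    w = 1.0 / n
    alpha_k = (n - 1 + beta) / n
    z = NormalDist().inv_cdf((1.0 + alpha_k) / 2.0)
    shots = []
    for k, b_k in enumerate(b):
        if b_k == 0:
            shots.append(0)                      # term does not contribute
            continue
        # tolerance on |Y_k - y_k| so that rho_T^{-k} rho_E^{-1} |b_k| |Y_k - y_k| <= w * eps
        eps_k = w * eps * rho_T**k * rho_E / abs(b_k)
        shots.append(math.ceil(z**2 / (4.0 * eps_k**2)))   # Eq. (36)
    return alpha_k, shots

alpha_k, shots = per_power_shots([1.0, 0.5, 0.1], rho_T=0.5, rho_E=0.5, eps=0.1, beta=0.9)
print(alpha_k, shots)
```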

Remark A.12. Given the linear behavior of the inner product, we can assume $|v| = O(\rho_T^{-1}\rho_E^{-1})$ when $N$ grows. Additionally, by the Cauchy-Schwarz inequality, $|v| \approx |v^*| \le \rho_T^{-1}\rho_E^{-1}$. As a consequence, we often replace $r_k$ with the estimate $\rho_T^{k-1}$.

It is clear from the previous Proposition that the scaling of the error when $N$ grows is bound to that of the norms $\rho_E^{-1}$ and $\rho_T^{-1}$, as well as to the powers $k$.
Remark A.13. In general, it is reasonable to assume that the norms $\rho_E^{-1}$ and $\rho_T^{-1}$ scale as $O(\sqrt{N})$. Indeed, if the $T_j'$ are sampled from the same r.v. $T'$, then $\|T' - \eta\|/\sqrt{N}$ tends to the finite quantity $\sqrt{E[(T' - \eta)^2]}$, and similarly for $E'$. In the same fashion, one can assume $|v| = O(N)$. As a consequence,
$$r_k = O\left(N^{K/2 + 1/2 - 1}\right) = O\left(N^{K/2 - 1/2}\right).$$

Figure 12 confirms the previous remark. Additionally, it shows the effect of bk in agreement with
Prop. A.10.

[Figure 12 plots: panel (a) with $S = 100$ and a second panel (sample size not recoverable). Left: error $|Y_k' - y_k'|/y_k'$; right: error $|b_k(\eta)|\,|Y_k' - y_k'|/|v|$. Both are plotted versus $N$ from 2 to 512, for $k = 0, 1, 2, 3$.]

Figure 12: The error of the estimator $Y_k'$ relative to its own target value $y_k'$ (left), and the error of the power contribution $b_k(\eta)Y_k'$ relative to the global target $v$ (right), averaged over $S$ samples, for the ancilla-free algorithm without mid-resets. Each point in the plot is the average of 50 independent runs on the qasm simulator. Since the polynomial has a very low coefficient $b_3$, the error for $k = 3$ provides a modest contribution to the overall result for small problem sizes.

Example A.14. Consider a first problem. We are given two four-dimensional inputs $[T_j']_{j=0}^{3} = [x_j]_{j=0}^{3}$ and $[E_j']_{j=0}^{3} = [y_j]_{j=0}^{3}$, which we assume to be positive and normalized. For simplicity, take the volume function to be $f(x) = x^K$. Then the quantum algorithm is able to estimate $\sum_j (T_j')^K E_j' = \sum_j x_j^K y_j$.
Now consider a second problem. This time, we are given as inputs the same values as before, but twice. So we have two eight-dimensional vectors $[T_j']_{j=0}^{7} = [x_0, \dots, x_3, x_0, \dots, x_3]$ and $[E_j']_{j=0}^{7} = [y_0, \dots, y_3, y_0, \dots, y_3]$. Obviously, this time $\sum_j (T_j')^K E_j' = 2\sum_j x_j^K y_j$, and therefore we accept an error that is double the one we would accept in the previous problem. Now, we apply the quantum algorithm. We encode $T_j = 2^{-1/2}T_j'$ and $E_j = 2^{-1/2}E_j'$, obtaining as a result $\sum_j 2^{-K/2}(T_j')^K\,2^{-1/2}E_j' = 2^{-K/2-1/2}\sum_j (T_j')^K E_j'$. This needs to be rescaled by a factor $2^{K/2+1/2}$ to obtain the final result. Unfortunately, this rescaling implies an error propagation factor that is not 2, but $2^{K/2+1/2}$.
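Example A.14 can be reproduced with a few lines of NumPy. The data below are random positive placeholders for $[x_j]$ and $[y_j]$, and $K = 3$ is an arbitrary choice. The point is the rescaling factor $2^{K/2+1/2}$, which multiplies any absolute error of the normalized estimate. This is illustrated here with a hypothetical error `delta`.

```python
import numpy as np

K = 3
rng = np.random.default_rng(1)
x = rng.random(4); x /= np.linalg.norm(x)      # placeholder [T'_j] of the first problem
y = rng.random(4); y /= np.linalg.norm(y)      # placeholder [E'_j] of the first problem
first = np.sum(x**K * y)

T_prime = np.concatenate([x, x])               # second problem: data duplicated
E_prime = np.concatenate([y, y])
second = np.sum(T_prime**K * E_prime)          # equals 2 * first

T = T_prime / np.sqrt(2.0)                     # normalized inputs actually loaded
E = E_prime / np.sqrt(2.0)
encoded = np.sum(T**K * E)                     # equals 2^{-K/2-1/2} * second
rescale = 2.0 ** (K / 2 + 1 / 2)

delta = 1e-3                                   # hypothetical absolute error of the estimate
propagated = rescale * delta                   # error after rescaling, vs. 2 * delta tolerated
print(second / first, rescale, propagated)
```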

Consistently with Remark A.13, the previous example shows that the relative error scales as $r_k = O(N^{k/2-1/2})$. The sampling complexity of the method described in this Appendix scales as $O(\epsilon^{-2})$, and the circuit depth scaling must be added on top to obtain the quantum time. We therefore conclude that we can improve on the classical case $O(N)$ only for $K = 1$. In the next Appendix, we introduce QAE to partially overcome this limitation.

B Applying Quantum Amplitude Estimation techniques in amplitude encoding
To outperform known classical results, we leverage the Quantum Amplitude Estimation (QAE) technique [5] and its variants, such as Faster QAE [51], Iterative QAE (IQAE) [20], Chebyshev QAE (ChebQAE) [6] and Dynamic QAE [1].
Let us recall the general result for QAE. Suppose we are given a r.v. providing $x$ with probability $|w_x|^2$, with $x \in \{0, \dots, 2^n - 1\}$. Let $f$ be a real function defined on the same integer domain. Suppose we have an $n$-qubit loading unitary $A$ such that $A|0\rangle = \sum_x w_x|x\rangle$, and an $(n+1)$-qubit unitary $R$ such that $R|x\rangle|0\rangle = |x\rangle\left(\sqrt{1 - f(x)}\,|0\rangle + \sqrt{f(x)}\,|1\rangle\right)$. The objective is to estimate $E[f(A)] = \sum_x f(x)|w_x|^2$. Define $F := R(A \otimes I)$ and $|\chi\rangle := F|0\rangle$. Let $Z := I - 2|0\rangle\langle 0|$ and $U := I - 2|\chi\rangle\langle\chi| = FZF^\dagger$. The following holds:

Theorem B.1 (QAE scaling with bounded output values). Let $f$ and $A$ be as defined above, such that $f(A)$ is valued in $[0, 1]$. Let the desired accuracy be $\epsilon$. There exists a quantum algorithm, called QAE, that uses $O(1)$ copies of $|\chi\rangle$ and uses $U$ a number of times $O(1/\epsilon)$. It estimates $z := E[f(A)]$ up to an additive error $\epsilon$ with success probability at least $8/\pi^2 > 0.81$. It suffices to sample from the quantum circuit $S = O(\lg 1/(1-\alpha))$ times and take the median $Z_S$ to obtain an estimate such that $P(|Z_S - z| \le \epsilon) \ge \alpha$.

Proof. See [52, Thm 2.3 and Lemma 2.1]. Also refer to [40].

The idea of QAE is to connect the desired expectation value to an eigenfrequency of an oscillating quantum system. The phase estimation algorithm is then used to obtain the estimate up to a desired accuracy [40]. The desired expectation $z$ is linked to the corresponding phase $\theta$ via
$$z = \frac{1}{2}\left(1 - \cos\frac{\theta}{2}\right), \tag{41}$$
and similarly the estimator for $z$ is defined by
$$Z = \frac{1}{2}\left(1 - \cos\frac{\hat\theta}{2}\right). \tag{42}$$
The error in $z$ is then linearly controlled by the error in $\theta$ when $\hat\theta \to \theta$:
$$|Z - z| = \frac{1}{4}\left|\sin\frac{\theta}{2}\right|\left|\hat\theta - \theta\right| + o\left(\left|\hat\theta - \theta\right|\right) = O\left(\left|\hat\theta - \theta\right|\right), \tag{43}$$
through a Taylor expansion of Eq. (41), as shown for instance in Ref. [40, Appendix F].
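The linear error control can be checked numerically from Eq. (41) alone. Differentiating $z(\theta) = \frac{1}{2}\left(1 - \cos\frac{\theta}{2}\right)$ gives $z'(\theta) = \frac{1}{4}\sin\frac{\theta}{2}$, so the ratio $|Z - z|/|\hat\theta - \theta|$ should approach $\frac{1}{4}\left|\sin\frac{\theta}{2}\right|$. The sketch below uses an arbitrary phase $\theta = 1.3$.

```python
import math

def z_of_theta(theta):
    # Eq. (41)
    return 0.5 * (1.0 - math.cos(theta / 2.0))

theta = 1.3                                     # arbitrary phase for illustration
slope = 0.25 * abs(math.sin(theta / 2.0))       # first-order coefficient from Eq. (41)
for d in (1e-2, 1e-3, 1e-4):
    ratio = abs(z_of_theta(theta + d) - z_of_theta(theta)) / d
    print(d, ratio, slope)                      # ratio approaches slope as d shrinks
```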
Now, let us apply the Theorem to our case. Specifically, we start by discussing QAE where the unitary is taken from the ancilla-free method without mid-measurements, as depicted in Fig. 8(c). Set $A$ to be the full unitary of the ancilla-free method, which loads temperatures, computes QHPs without mid-measurements, and applies the inverse of the price loading. By Eq. (24), we get $w_0 = \sum_j T_j^kE_j$, while $w_x$ is garbage for $x \neq 0$. Therefore, it is sufficient to define
$$f(x) := \begin{cases} 1 & \text{if } x = 0,\\ 0 & \text{otherwise},\end{cases} \tag{44}$$
and the algorithm will estimate the desired inner product. Implementing $f$ through a quantum circuit $R$ is trivial, as it is simply a multi-controlled NOT gate that tests whether all qubits in all registers are 0. Now, $z$ is the squared inner product, so we can use $Y_S := \sqrt{Z_S}$ to estimate the inner product $y := \sqrt{z}$. It turns out that the error bounds given for $z$ in Thm. B.1 are also valid for $y = \sqrt{z}$, as stated by the following Proposition.

Proposition B.2 (Oracle complexity for QHP + ancilla-free method + QAE). Let $f$ and $A$ be those specified right above. By applying QAE, it is possible to estimate $y_k := \sum_j T_j^kE_j$ up to an additive error $\epsilon$ with success probability at least $8/\pi^2 > 0.81$, using $O(1)$ copies of $|\chi\rangle$ and using $U$ a number of times $O(1/\epsilon)$. It suffices to sample the circuit output $Y$ for $S = O(\lg 1/(1-\alpha))$ times and take the median $Y_S$ of the samples, to obtain an estimate such that $P(|Y_S - y_k| \le \epsilon) \ge \alpha$.

Proof. Applying the Taylor expansion to $y = \sqrt{z}$, as done in Eq. (43), one obtains again
$$|Y - y| = O\left(\left|\hat\theta - \theta\right|\right) \tag{45}$$
for $\hat\theta \to \theta$, uniformly in $\theta$. The rest of the proof of Thm. B.1 follows in the same way.
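On $\theta \in [0, 2\pi]$, where $z$ covers $[0, 1]$, Eq. (41) gives $y = \sqrt{z} = \sin\frac{\theta}{4}$. Its derivative is bounded by $\frac{1}{4}$, which is why the $O(|\hat\theta - \theta|)$ bound holds uniformly. The short NumPy check below works on a grid and assumes that both $\theta$ and $\hat\theta$ lie in that interval.

```python
import numpy as np

thetas = np.linspace(0.0, 2.0 * np.pi, 401)   # phases for which z covers [0, 1]
z = 0.5 * (1.0 - np.cos(thetas / 2.0))        # Eq. (41)
y = np.sqrt(z)                                # the inner product estimated in Prop. B.2
assert np.allclose(y, np.sin(thetas / 4.0))   # y = sin(theta / 4) on this interval

# largest ratio |Y - y| / |theta_hat - theta| over all grid pairs
dy = np.abs(y[:, None] - y[None, :])
dt = np.abs(thetas[:, None] - thetas[None, :])
off_diag = dt > 0
worst = np.max(dy[off_diag] / dt[off_diag])
print(worst)                                  # bounded by 1/4
```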

Proposition B.3 (Oracle complexity for QHP + ancilla-free method + IQAE). The results of
Prop. B.2 are valid also if the Iterative QAE is applied instead of QAE.

Proof. The scaling of IQAE is grounded on the estimate Eq. (43) too, refer to [20, Algorithm 1 and
Appendix B]. Therefore, the same argument of Prop. B.2 can be adopted.

To compare the classical and the quantum case, the cost of a single query must be considered. The cost of $U$ is derived from the costs of $A$ and $R$. The cost of $A$ was already calculated in Prop. A.9. As far as $R$ is concerned, Ref. [26] shows that an $n$-controlled NOT can be implemented in two ways. One uses 1 ancilla qubit with $O(2^n)$ gates and depth [26, Lemma 7.1]. The other uses $n - 1$ ancillas with $O(n)$ gates and depth [26, Lemma 7.2]. Ref. [53] further reduced the number of ancillas needed to achieve linear depth to $\lceil (n-3)/2 \rceil$ for $n \ge 5$.

Proposition B.4. The implementation of the oracle $U$ required by Props. B.2 and B.3 has a width and depth, respectively, of:
$$C_w(k) = O(k\lg N), \tag{46}$$
$$C_d(k) = 2C_{d,\mathrm{load}}(N) + O(k\lg N), \tag{47}$$
where $k$ is the usual monomial degree in the approximating polynomial.

Proof. Recall that $U = FZF^\dagger$, where $F := R(A \otimes I)$ and $Z := I - 2|0\rangle\langle 0|$. $Z$ can be implemented with two 1-qubit gates, plus a $(k\lg N)$-controlled NOT.
Concerning width, $k$ copies of $|\psi_T\rangle$ need to be loaded in parallel for $A$, leading to $k\lg N$. The $(k\lg N)$-CNOT requires $O(k\lg N)$ ancillas [53]³. Finally, $R$ is a $(k\lg N)$-CNOT as well.
Moving to depth, the data loading is performed in parallel on the different registers for $|\psi_T\rangle$, as are the CNOTs for the QHPs. The data loading of $|\psi_E\rangle$ is performed afterwards. The other operations are dominated by the two $(k\lg N)$-CNOTs, which require $O(k\lg N)$ depth.

Remark B.5. For the sake of completeness, the same QAE techniques can also be applied to the swap test, with a slightly more complex design and no additional advantage over QAE with the ancilla-free method. In this case, one needs to estimate two quantities. In a first estimation problem, defining $f_1 = 1$ when the QHPs are successful yields the success rate $a_k^{-2}$. A second estimation problem is then run, setting $f_2 = 1$ when the QHPs are successful and the swap-test ancilla provides 1. Finally, the two quantities are merged into Eq. (32), recalling also Eq. (31), to estimate the desired inner product.

We comment on the asymptotic performance of this technique, in comparison with the others, in Appendix D.

³ For detailed depth and width constants of the n-CNOT, refer to the cited manuscript.

[Figure 13 trees for a four-element data set $D_0, \dots, D_3$. Left: state decomposition tree with root $D_0^2 + D_1^2 + D_2^2 + D_3^2$, children $D_0^2 + D_1^2$ and $D_2^2 + D_3^2$, and leaves $D_0^2, \dots, D_3^2$. Right: angle tree whose nodes are of the form $2\arcsin\sqrt{\cdot}$ applied to ratios of these partial sums.]

Figure 13: Two classical binary tree representations of a data set [19]: on the left, the state decomposition representation, and on the right, the angle representation. The state decomposition can be built bottom-up starting from a classical array, and also applies to unnormalized data sets. The angle representation is specifically suited for quantum data loading, and can be derived by traversing the state decomposition tree top-down. Dashed nodes are redundant, since they can be inferred from their sibling.
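The two trees of Fig. 13 can be built as described in the caption. Sums of squares are computed bottom-up, then angles $2\arcsin\sqrt{\text{right child}/\text{parent}}$ top-down. The sketch below assumes non-negative real data (sign handling is omitted). It checks that walking the angle tree with $\cos(\theta/2)$ and $\sin(\theta/2)$ factors reproduces the normalized amplitudes. The function names are illustrative and not taken from Ref. [19].

```python
import numpy as np

def angle_tree(D):
    """State decomposition bottom-up, then angle tree top-down (cf. Fig. 13).

    Assumes len(D) is a power of 2 and D is real and non-negative.
    """
    levels = [np.asarray(D, dtype=float) ** 2]          # leaves: D_j^2
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append(prev[0::2] + prev[1::2])          # parent = sum of its two children
    levels.reverse()                                    # levels[0] is the root
    angles = []
    for parents, children in zip(levels[:-1], levels[1:]):
        right = children[1::2]
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(parents > 0, right / parents, 0.0)
        angles.append(2.0 * np.arcsin(np.sqrt(ratio)))  # 2 arcsin sqrt(right / parent)
    return angles

def amplitudes_from_angles(angles):
    # Top-down walk: left child gets cos(theta/2), right child sin(theta/2).
    amps = np.array([1.0])
    for level in angles:
        amps = np.stack([amps * np.cos(level / 2.0), amps * np.sin(level / 2.0)], axis=1).ravel()
    return amps

D = np.array([0.1, 0.5, 0.2, 0.8])
amps = amplitudes_from_angles(angle_tree(D))
print(amps, D / np.linalg.norm(D))
```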

C Bidirectional Orthogonal Encoding of Data

In this section, we define the novel concept of Bidirectional Orthogonal Encoding (BOE) of data. We then derive the same results covered in Appendices A and B, but this time in the context of the BOE.

C.1 Towards the BOE

Let $[D_j]$ be a vector of length $N$, a power of 2. Suppose an angle tree is available (see Fig. 13). In this setting, multiple encodings are possible: (a) the amplitude encoding, (b) the D&C encoding, (c) the bidirectional encoding, and (d) the D&C-orth encoding. Let us describe these encodings briefly, thus explaining why we introduced an additional one, which we call Bidirectional Orthogonal Encoding (BOE).
We are already familiar with the amplitude encoding, which produces the state
$$|\psi_D\rangle := \sum_{j=0}^{N-1}D_j|j\rangle. \tag{48}$$
It has the advantage of requiring only $\lg N$ qubits, but unfortunately needs $O(N)$ depth for exact loading.
The D&C encoding is a variant of the analog encoding, recently proposed in Ref. [37]. We call it divide-and-conquer encoding, or D&C encoding for short, after the paper title. In this case, the state produced is
$$|\tilde\psi_D\rangle := \sum_{j=0}^{N-1}D_j|j\rangle|\tilde\phi_j\rangle. \tag{49}$$
The idea is to resort to an additional register that contains auxiliary qubits, entangled with the main register.
The advantage of this method is that exact loading can then be performed efficiently, namely in depth $O(\lg^2 N)$. The downside is that the required side register has size $O(N)$.
This led to the definition of the bidirectional encoding [19], a configurable mixed encoding that combines the amplitude and the D&C approaches. It defines a family of encoding techniques parameterized by a so-called split level $s \in \{1, \dots, \lg N\}$, which steers the balance between circuit depth and width. For $s = 1$ the encoding coincides with D&C, while for $s = \lg N$ the amplitude encoding is retrieved. Finally, for $s = \frac{1}{2}\lg N$, it is possible to achieve a sublinear scaling in both depth and width. The state again takes the form of Eq. (49).
Eqs. (48) and (49) are similar, and measuring the primary register gives the same results in both cases. It is nonetheless essential to remark that D&C is in fact a different encoding from the amplitude encoding. Not all algorithms that require the amplitude encoding can be trivially applied to data in the D&C encoding. Specifically, the techniques introduced so far for the calculation of the inner product do not apply in the D&C encoding. Even more so, they cannot be employed in the bidirectional encoding.
Here the D&C-orth encoding comes to our aid. As the original paper [37] shows, the D&C encoding can be modified to guarantee that the auxiliary states are orthonormal, i.e. $\langle\phi_i|\phi_j\rangle = \delta_{i,j}$, at the expense of an additional side register of small width $\lg N$. The new encoding is relevant for us because it is compatible with the swap test. The swap test does not provide the same result as in the amplitude encoding (see Prop. C.2), but it is still useful for the calculation of the inner product.
Combining these elements, we define the Bidirectional Orthogonal Encoding (BOE) in the following way:
$$|\tilde\psi_D\rangle := \sum_{j=0}^{N-1}D_j|j\rangle|\tilde\phi_j\rangle|j\rangle = \sum_{j=0}^{N-1}D_j|j\rangle|\phi_j\rangle, \tag{50}$$
where $\sum_{j=0}^{N-1}D_j|j\rangle|\tilde\phi_j\rangle$ is constructed through the bidirectional encoding, and the additional third register guarantees orthonormality of $\{|\phi_j\rangle\}_j$.
When $s = 1$, the D&C-orth encoding is retrieved.

Proposition C.1 (Circuit depth and width of the BOE). Let $[D_j]$ be a vector of length $N$, a power of 2, and suppose a classical binary tree representation is available. Let $s$ be any integer in $\{1, \dots, \lg N\}$, called split level. Then the state in Eq. (50) can be constructed in depth $2^s + \frac{1}{2}\left(\lg^2 N - \lg N - s^2 + s\right) + 1 = O\left(2^s + \lg^2 N - s^2\right)$ and width $(s+1)N2^{-s} - 1 + \lg N = O\left((s+1)N2^{-s}\right)$.

Proof. It is known [19] that $\sum_{j=0}^{N-1}D_j|j\rangle|\tilde\phi_j\rangle$, namely the associated bidirectional encoding, can be constructed in depth $2^s + \frac{1}{2}\left(\lg^2 N - \lg N - s^2 + s\right)$ and width $(s+1)N2^{-s} - 1$. Copying the primary register into the third register of Eq. (50) takes $\lg N$ additional qubits and one layer of parallel CNOTs, which accounts for the extra $\lg N$ in width and the extra 1 in depth.

For $s = \frac{1}{2}\lg N$, both width and depth are sublinear.
Proposition C.2 (Swap test in the BOE). Let $[D_j^{(0)}]$ and $[D_j^{(1)}]$ be two vectors of length $N$, a power of 2, represented in the BOE with split level $s$. Apply the swap test between the primary registers of the two statevectors (see Fig. 9). Then

1. The swap test qubit is measured in the state $|0\rangle$ with probability $\frac{1}{2} + \frac{1}{2}\sum_j \left(D_j^{(0)}\right)^2\left(D_j^{(1)}\right)^2$.

2. Let $\epsilon > 0$ and $\alpha \in (0, 1)$. For $i = 1, \dots, S$, let $X_i$ be a r.v. representing the output of the swap-test measurement after the $i$-th shot of the circuit. Call $\bar X_S$ the mean r.v. resulting from the $S$ independent shots. Then $Y_S := 1 - 2\bar X_S$ is an estimator for $p = \sum_j \left(D_j^{(0)}\right)^2\left(D_j^{(1)}\right)^2$, and the error is controlled by
$$P(|Y_S - p| < \epsilon) = \alpha, \tag{51}$$
once $S$ is chosen as
$$S = \frac{1 - p^2}{\epsilon^2}\left[\Phi^{-1}\left(\frac{1+\alpha}{2}\right)\right]^2 \tag{52}$$
asymptotically when $S \to \infty$, where $\Phi$ is the CDF of the standard normal distribution.

Proof. The proof of the first claim is the same as that of the swap test for the D&C encoding, see [37]. For the second claim, consider that the $X_i$ are i.i.d. Bernoulli variables with mean $\mu = \frac{1}{2} - \frac{1}{2}p$ and variance $\sigma^2 = \mu(1-\mu) = \frac{1}{4} - \frac{1}{4}p^2$. By the Central Limit Theorem, $\frac{\bar X_S - \mu}{\sigma/\sqrt{S}}$ is asymptotically a standard normal. Therefore, $\frac{Y_S - p}{2\sigma/\sqrt{S}} = \frac{Y_S - p}{\sqrt{1 - p^2}/\sqrt{S}}$ is asymptotically a standard normal as well.
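Claim 1 of Prop. C.2 can be checked with reduced density matrices. A swap test on the primary registers returns 0 with probability $\frac{1}{2} + \frac{1}{2}\mathrm{Tr}(\rho_0\rho_1)$. The copy of $|j\rangle$ in the third register of Eq. (50) makes each reduced state diagonal, $\rho = \mathrm{diag}(D_j^2)$. In the sketch below, the auxiliary states $|\tilde\phi_j\rangle$ are random non-orthogonal vectors, a stand-in for the actual bidirectional encoding. The helper names are hypothetical.

```python
import numpy as np

def boe_state(D, side_dim, rng):
    """Tensor psi[j, a, b] of sum_j D_j |j>|phi~_j>|j> with random (non-orthogonal) phi~_j."""
    N = len(D)
    phi = rng.random((N, side_dim))
    phi /= np.linalg.norm(phi, axis=1, keepdims=True)
    psi = np.zeros((N, side_dim, N))
    for j in range(N):
        psi[j, :, j] = D[j] * phi[j]            # third register holds a copy of j
    return psi

def primary_density(psi):
    # rho[j, j'] = sum over the side registers of psi[j, ...] * conj(psi[j', ...])
    flat = psi.reshape(psi.shape[0], -1)
    return flat @ flat.conj().T

rng = np.random.default_rng(3)
N = 8
D0 = rng.random(N); D0 /= np.linalg.norm(D0)
D1 = rng.random(N); D1 /= np.linalg.norm(D1)
rho0 = primary_density(boe_state(D0, 4, rng))
rho1 = primary_density(boe_state(D1, 4, rng))
p_swap0 = 0.5 + 0.5 * np.trace(rho0 @ rho1).real   # swap-test probability of outcome 0
p = np.sum(D0**2 * D1**2)
print(p_swap0, 0.5 + 0.5 * p)
```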

C.2 Data encoding and inner product
The previous Proposition motivates us to load the normalized versions of $\sqrt{T' - \eta}$ and $\sqrt{E'}$, under Assumption 2.2 that all terms are positive, so that the estimator provides a normalized version of the desired inner product. More specifically, define
$$\tilde T_j = \tilde\rho_T\sqrt{T_j' - \eta}, \qquad \tilde E_j = \tilde\rho_E\sqrt{E_j'}, \tag{53}$$
where
$$\tilde\rho_T^{-2} = \sum_{j=0}^{N-1}(T_j' - \eta), \qquad \tilde\rho_E^{-2} = \sum_{j=0}^{N-1}E_j'. \tag{54}$$
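In other words, we start with two BOE states
$$|\tilde\psi_{\tilde T}\rangle := \sum_{j=0}^{N-1}\tilde T_j|j\rangle|\phi_j^T\rangle, \qquad |\tilde\psi_{\tilde E}\rangle := \sum_{j=0}^{N-1}\tilde E_j|j\rangle|\phi_j^E\rangle. \tag{55}$$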
j=0 j=0
It is trivial to verify that the QHP applied to the primary registers of multiple copies of $|\tilde\psi_{\tilde T}\rangle$ still calculates the power. When successful, it outputs
$$|\tilde\psi_{\tilde T}^{(k)}\rangle := \tilde a_k\sum_{j=0}^{N-1}\tilde T_j^k|j\rangle|\phi_j^T\rangle, \tag{56}$$
preserving the orthonormality of the side register, where $\tilde a_k$ is the appropriate scale factor
$$\tilde a_k := \left(\sum_j\tilde T_j^{2k}\right)^{-1/2}, \tag{57}$$
and the success rate of the QHP is $\tilde a_k^{-2}$.

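Remark C.3. This setting can be exploited to estimate
$$\tilde y_k := \sum_{j=0}^{N-1}\tilde E_j^2\tilde T_j^{2k} = \tilde\rho_E^2\tilde\rho_T^{2k}\sum_{j=0}^{N-1}E_j'(T_j' - \eta)^k = \tilde\rho_E^2\tilde\rho_T^{2k}\,y_k', \tag{58}$$
as we demonstrate shortly. Before moving on, it is worth underlining some differences between the newly introduced $\tilde a_k$, $\tilde y_k$ and the previous $a_k$, $y_k$. First of all, $\tilde y_k$ scales quadratically with factors applied to $[\tilde T_j^k]$ or $[\tilde E_j]$, while $y_k$ scales linearly with $[T_j^k]$ and $[E_j]$. We will comment later on this fact, which has a major impact on performance (see Prop. C.6). Secondly, and consistently with the former observation, the bounds $0 \le a_ky_k \le 1$ are now replaced by $0 \le \tilde a_k^2\tilde y_k \le 1$. Indeed, by the Cauchy-Schwarz inequality and $\|\cdot\|_4 \le \|\cdot\|_2$,
$$\tilde y_k = \sum_j\tilde E_j^2\tilde T_j^{2k} \le \left\|[\tilde E_j^2]\right\|\,\left\|[\tilde T_j^{2k}]\right\| = \left\|[\tilde E_j]\right\|_4^2\left\|[\tilde T_j^k]\right\|_4^2 \le \left\|[\tilde E_j]\right\|^2\left\|[\tilde T_j^k]\right\|^2 = \tilde a_k^{-2}. \tag{59}$$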

Third, in the context of the BOE, Eq. (10) is replaced by:
$$v \approx v^* = \sum_{k=0}^{K}b_k(\eta)\,y_k' = \sum_{k=0}^{K}\tilde\rho_T^{-2k}\tilde\rho_E^{-2}b_k(\eta)\,\tilde y_k. \tag{60}$$
Correspondingly, the estimator for $v^*$ is defined as
$$\tilde V := \sum_{k=0}^{K}\tilde\rho_T^{-2k}\tilde\rho_E^{-2}b_k(\eta)\,\tilde Y_k. \tag{61}$$
Even though the estimation method is different, $y_k'$, and therefore $v^*$, are the same as in the amplitude encoding.
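The rescalings of Eqs. (53), (54) and (57), the identity (58) and the bound (59) can be verified on synthetic data. The sketch below assumes random positive prices and temperatures above $\eta$, consistent with the positivity assumption. All numbers are placeholders.

```python
import numpy as np

rng = np.random.default_rng(5)
N, k, eta = 16, 3, 0.2
T_raw = eta + rng.random(N)                    # T'_j > eta (positivity assumption)
E_raw = rng.random(N)                          # E'_j > 0

rho_T = np.sum(T_raw - eta) ** -0.5            # Eq. (54)
rho_E = np.sum(E_raw) ** -0.5
T_t = rho_T * np.sqrt(T_raw - eta)             # Eq. (53)
E_t = rho_E * np.sqrt(E_raw)

a_t = np.sum(T_t ** (2 * k)) ** -0.5           # Eq. (57)
y_t = np.sum(E_t**2 * T_t ** (2 * k))          # Eq. (58), left-hand side
y_prime = np.sum(E_raw * (T_raw - eta) ** k)   # y'_k

print(y_t, rho_E**2 * rho_T ** (2 * k) * y_prime)  # Eq. (58): the two values agree
print(a_t**2 * y_t)                                 # Eq. (59): lies in [0, 1]
```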

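Proposition C.4 (Algorithm E: BOE + swap test). Let $k$ be a fixed power order. Implement a circuit that produces the state $|\tilde\psi^{(k)}_{\tilde T}\rangle$ defined in Eq. (56) through QHPs (with or without mid-measurements). The circuit then loads $|\tilde\psi_{\tilde E}\rangle$ and applies the swap test between $|\tilde\psi^{(k)}_{\tilde T}\rangle$ and $|\tilde\psi_{\tilde E}\rangle$, as depicted in Fig. 14. Call $X \in \{0, 1\}$ the output of the measurement of the control qubit in the swap test, and $Z \in \{0, \dots, N-1\}^{k-1}$ the outputs of all the $k - 1$ measurements in the QHPs. Define $X_i \sim X$ and $Z_i \sim Z$, for $i = 1, \dots, S$, as the outcomes of $S$ independent samples from the circuit. Let

$$\tilde Y_S := \frac{2\,\#\{i : X_i = 0, Z_i = 0\} - \#\{i : Z_i = 0\}}{S}, \qquad Y'_S := \tilde\rho_T^{-2k} \tilde\rho_E^{-2}\, \tilde Y_S.$$

Then

1. $E[\tilde Y_S] \to \sum_j \tilde E_j^2 \tilde T_j^{2k} =: \tilde y_k$ and $E[Y'_S] \to \sum_j E'_j (T'_j - \eta)^k =: y'_k$ when $S \to \infty$;

2. the absolute error for $\tilde Y_S$ is controlled by $P(|\tilde Y_S - \tilde y_k| < \epsilon) \ge \alpha$ once $S$ is chosen as
$$S \ge \max\left\{16\, \tilde y_k^2 (\tilde a_k^2 - 1),\; 4\,\frac{1 - \tilde a_k^4 \tilde y_k^2}{\tilde a_k^2}\right\} \frac{1}{\epsilon^2} \left[\Phi^{-1}\!\left(\frac{3+\alpha}{4}\right)\right]^2 \qquad (62)$$
asymptotically when $\epsilon \to 0$, where $\Phi$ is the CDF of the standard normal distribution;

3. $P(|\tilde Y_S - \tilde y_k| < \epsilon) \ge \alpha$ is also guaranteed by the stronger condition
$$S \ge \frac{16}{\epsilon^2} \left[\Phi^{-1}\!\left(\frac{3+\alpha}{4}\right)\right]^2 \qquad (63)$$
asymptotically when $\epsilon \to 0$;

4. either of the conditions in Eqs. (62) or (63) is also sufficient for the error of the originally scaled problem, in the following sense: $P\left(|Y'_S - y'_k| < \epsilon\, \tilde\rho_T^{-2k} \tilde\rho_E^{-2}\right) \ge \alpha$.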
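Claim 1 can be illustrated with a classical Monte Carlo model of the circuit outcomes. The sketch below does not simulate the quantum circuit. It assumes, as in the proof that follows, that $P(Z = 0) = \tilde a_k^{-2}$ and that, conditioned on $Z = 0$, the swap-test control returns $X = 0$ with probability $(1 + \tilde a_k^2 \tilde y_k)/2$. The function name and parameter values are hypothetical.

```python
import random


def sample_Y_tilde(a2, y_tilde, S, seed=0):
    """Draw S shots and return Y~_S = (2 #{X=0, Z=0} - #{Z=0}) / S.

    a2      : a~_k^2, so the QHP success probability is 1 / a2 (assumed model)
    y_tilde : target y~_k, with 0 <= a2 * y_tilde <= 1
    """
    rng = random.Random(seed)
    n_z0 = 0      # shots where all QHPs succeeded (Z = 0)
    n_x0_z0 = 0   # ... and the swap-test control qubit returned X = 0
    p_x0 = (1.0 + a2 * y_tilde) / 2.0
    for _ in range(S):
        if rng.random() < 1.0 / a2:
            n_z0 += 1
            if rng.random() < p_x0:
                n_x0_z0 += 1
    return (2 * n_x0_z0 - n_z0) / S


print(sample_Y_tilde(a2=2.0, y_tilde=0.3, S=200_000))
```

Each shot contributes $+1$, $-1$ or $0$ to the numerator, with expectation $\tilde a_k^{-2} \cdot \tilde a_k^2 \tilde y_k = \tilde y_k$, so the average concentrates around $\tilde y_k$ as $S$ grows.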

Proof. The proof follows that of Prop. A.7. In particular, the first claim is straightforward. Concerning the second claim, one defines

$$A_S := 1 - 2\,\frac{\sum_{i : Z_i = 0} X_i}{\#\{i : Z_i = 0\}} \quad\text{and}\quad B_S := \frac{\#\{i : Z_i = 0\}}{S}.$$

This time $B_S \to \tilde a_k^{-2}$. Following the proof structure of Prop. A.7,

$$P(|\tilde Y_S - \tilde y_k| < \epsilon) \ge P\left(|A_S B_S - A_S \tilde a_k^{-2}| < \epsilon/2\right) + P\left(|A_S \tilde a_k^{-2} - \tilde y_k| < \epsilon/2\right) - 1 \ge 2\beta - 1 = \alpha,$$

once $\beta = \frac{1+\alpha}{2}$ and

$$P\left(|A_S B_S - A_S \tilde a_k^{-2}| < \epsilon/2\right) \ge \beta, \qquad (64)$$

$$P\left(|A_S - \tilde a_k^2 \tilde y_k| < \tilde a_k^2 \epsilon/2\right) \ge \beta. \qquad (65)$$

For the latter, apply Prop. C.2 to $X_i$ conditioned on $Z_i = 0$. Take $\tilde a_k^2 \tilde y_k$ as the $p$ in Prop. C.2, $\tilde a_k^2 \epsilon/2$ as the $\epsilon$, $\#\{i : Z_i = 0\}$ as the $S$, and $\beta$ as the $\alpha$. Then recall that $\#\{i : Z_i = 0\}$ is asymptotic to $\tilde a_k^{-2} S$.

For the former, since $A_S$ tends to $\tilde a_k^2 \tilde y_k$, it is sufficient to prove $|B_S - \tilde a_k^{-2}| < \epsilon/(4 \tilde a_k^2 \tilde y_k)$. $B_S$ is the mean of $S$ i.i.d. Bernoulli variables with $\mu = \tilde a_k^{-2}$ and $\sigma^2 = \mu(1 - \mu) = \tilde a_k^{-2}(1 - \tilde a_k^{-2})$. Since no square roots are involved in this case, the CLT can be applied directly, without the need for Lemma A.1, to verify that the condition in Eq. (62) is sufficient. This proves the second claim.

The third claim follows from Eq. (59), also recalling that $\tilde a_k^{-2} \le 1$. The fourth is immediate as well.
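The shot counts in Eqs. (62) and (63) are easy to evaluate. The sketch below uses `statistics.NormalDist` from the Python standard library for $\Phi^{-1}$. The function names and inputs are illustrative. As in the proposition, the bounds are only asymptotically sufficient as $\epsilon \to 0$. For admissible $\tilde a_k^2 \ge 1$ and $0 \le \tilde y_k \le \tilde a_k^{-2}$, the prefactor in Eq. (62) never exceeds 16, so the first bound is never larger than the second, as claim 3 states.

```python
import math
from statistics import NormalDist


def _z(alpha):
    # Phi^{-1}((3 + alpha) / 4), with Phi the standard normal CDF
    return NormalDist().inv_cdf((3.0 + alpha) / 4.0)


def shots_eq62(a2, y_tilde, eps, alpha):
    """Shots S from Eq. (62); a2 stands for a~_k^2 (so a~_k^4 = a2 ** 2)."""
    factor = max(16.0 * y_tilde ** 2 * (a2 - 1.0),
                 4.0 * (1.0 - a2 ** 2 * y_tilde ** 2) / a2)
    return math.ceil(factor * _z(alpha) ** 2 / eps ** 2)


def shots_eq63(eps, alpha):
    """Shots S from the simpler, stronger condition of Eq. (63)."""
    return math.ceil(16.0 * _z(alpha) ** 2 / eps ** 2)


print(shots_eq62(a2=2.0, y_tilde=0.3, eps=0.01, alpha=0.95), shots_eq63(0.01, 0.95))
```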

[Figure 14: circuit diagrams for k = 2, in three panels (a), (b) and (c).]

Figure 14: An exemplary demonstration of the algorithm based on BOE data loading, QHP, and the swap test, for the calculation of the inner product between temperatures to the power k and prices, when k = 2. $U_{\tilde T}$ and $U_{\tilde E}$ are the loading unitaries for $\tilde T$ and $\tilde E$ respectively, in the BOE. (a) shows the algorithm in its original formulation: the gray box highlights the QHP and the blue box the swap test. (b) shows the version without mid-measurements, where all measurements are deferred to the end. (c) shows a version with single-qubit measurement, suggesting unitaries A and R for QAE techniques. To adopt QAE, a separate circuit based on A and R' is required to estimate the success rate of the QHPs. For this purpose, A can be reduced to a smaller circuit that only computes the QHPs. As usual, we highlight in gray the QHP, in blue the swap test, and in green the QAE unitaries.

Proposition C.5 (Circuit width and depth in BOE). Let $k$ be a fixed power order. Then the algorithm described in Prop. C.4 (called BOE + swap test) has the following width and depth:

$$C_w(k) = r(k)\left[(s+1) N 2^{-s} - 1 + \lg N\right] + 1, \qquad (66)$$

$$C_d(k) \le m(k)\left[2^s + \tfrac{1}{2}\left(\lg^2 N - \lg N - s^2 + s\right)\right] + k + (3 \lg N + 1)\,\delta^{(\mathrm{swap})}, \qquad (67)$$

where $r(k)$ and $m(k)$ are those defined in Prop. A.9, with $\delta^{(\mathrm{swap})} = 1$.

Proof. Similar to that of Prop. A.9, also exploiting Prop. C.1.
Proposition C.6 (Convergence rate in BOE). Let $w \in [0,1]^K$ such that $\sum_k w_k = 1$. Also, let $\alpha \in [0,1)^K$ such that $\sum_k (1 - \alpha_k) = 1 - \beta$ for some $\beta \in (0,1)$. Finally, let $\epsilon > 0$. For instance, one may take $w_k = K^{-1}$ and $\alpha_k = \frac{K - 1 + \beta}{K}$ for all $k$.

Then $\tilde V$ defined in Eq. (61) is an estimator for $v^*$ such that $P(|\tilde V - v^*| \le \epsilon) \ge \beta$, provided that

$$P\left(\tilde\rho_T^{-2k} \tilde\rho_E^{-2}\, |b_k(\eta)|\, |\tilde Y_k - \tilde y_k| \le \epsilon\, w_k\right) \ge \alpha_k \quad \text{for all } k \le K.$$

Consequently, a similar estimate holds for the relative error: $P(|\tilde V - v^*| \le \epsilon |v|) \ge \beta$, provided that $P\left(|\tilde Y_k - \tilde y_k| \le \epsilon\, \tilde r_k w_k |b_k(\eta)|^{-1}\right) \ge \alpha_k$ for all $k \le K$, where

$$\tilde r_k := |v|\, \tilde\rho_T^{2k} \tilde\rho_E^2. \qquad (68)$$

Finally, let $K$ be fixed. When $N$ grows, $\tilde r_K$ dominates the asymptotic behavior of $\tilde r_k$ for all other $k \le K$.

Proof. Similar to that of Prop. A.10.
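The error budget of Prop. C.6 can be sketched as follows. The sketch uses the uniform choice $w_k = 1/n$ and $\alpha_k = (n - 1 + \beta)/n$ over the $n$ terms of the sum, and takes $\tilde r_k$ from Eq. (68). The coefficients, $|v|$ and scaling factors below are placeholders, not values from the paper. The final line checks the worst case: if every term meets its tolerance exactly, the weighted errors add up to exactly $\epsilon |v|$.

```python
import math


def relative_budget(b, eps, beta, v_abs, rho_E2, rho_T2):
    """Per-term weights w_k, confidences alpha_k and tolerances on Y~_k.

    rho_E2 and rho_T2 stand for rho~_E^2 and rho~_T^2; r~_k follows Eq. (68).
    All b_k are assumed nonzero in this sketch.
    """
    n = len(b)
    w = [1.0 / n] * n
    alpha = [(n - 1 + beta) / n] * n            # sum(1 - alpha_k) = 1 - beta
    r_tilde = [v_abs * rho_T2 ** k * rho_E2 for k in range(n)]
    tol = [eps * r * wk / abs(bk) for r, wk, bk in zip(r_tilde, w, b)]
    return w, alpha, tol


b = [0.5, -1.0, 2.0]            # placeholder coefficients b_k(eta)
w, alpha, tol = relative_budget(b, eps=0.01, beta=0.9, v_abs=3.0,
                                rho_E2=0.1, rho_T2=1 / 12)
# Worst case: every |Y~_k - y~_k| equals its tolerance, mapped back via Eq. (61).
worst = sum(abs(bk) * t / (0.1 * (1 / 12) ** k) for k, (bk, t) in enumerate(zip(b, tol)))
print(worst, 0.01 * 3.0)
```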

Remark C.7. Let us emphasize here that $\tilde r_k$ scales quadratically with $\tilde\rho_T^k$, whereas $r_k$ scales linearly with $\rho_T^k$. Under the assumptions of Remark A.13, $\tilde\rho_T$ scales as $O(\sqrt N)$, similarly to $\rho_T$. As a consequence,

$$\tilde r_K = O(N^{K+1-1}) = O(N^K).$$

Given the discussion at the end of Subsection A.4, this compares with $r_K = O(N^{K/2-1/2})$ for the amplitude encoding, which severely limits the applicability of the BOE for large values of $N$. The complexity of the different techniques is summarized and discussed in Appendix D. For the moment, note that it can be improved by resorting to QAE, as was done for the amplitude encoding.

C.3 QAE techniques

Inspired by Fig. 14 and similarly to Remark B.5, one can define two QAE estimation problems: the former to approximate the success rate $\tilde a_k^{-2}$, and the latter to estimate $\frac{1}{2}(\tilde y_k + \tilde a_k^{-2})$. As shown in the figure, A is composed of data loading, QHP, and swap test. We then define R' to set an additional qubit to $|1\rangle$ iff all the QHPs were successful. Similarly, R sets an additional qubit to $|1\rangle$ iff all the QHPs were successful and the swap test is $|1\rangle$. With these building blocks, we can achieve the following performance.
Proposition C.8 (Complexity for BOE + *QHP + swap + QAE). Let A, R' and R be as specified right above. Let U' and U be the Grover oracles for A and R', and for A and R respectively, according to the usual QAE technique. Let $Y_S^{(1)}$ be the median outcome of $S$ executions of a QAE estimation applied to U'. Let $Y_S^{(0)}$ similarly be the median outcome of the QAE technique applied to A and R. Then $Y_S := 2Y_S^{(0)} - Y_S^{(1)}$ is an estimator for $\tilde y_k := \sum_j \tilde E_j^2 \tilde T_j^{2k}$.

Moreover, $P(|Y_S - \tilde y_k| \le \epsilon) \ge \alpha$ is obtained by using the Grover oracles U and U' a number of times $O(1/\epsilon)$, and taking $S = O(\lg 1/(1-\alpha))$.

Additionally, under the same conditions, $P\left(|Y'_S - y'_k| \le \epsilon\, \tilde\rho_T^{-2k} \tilde\rho_E^{-2}\right) \ge \alpha$, where $Y'_S := \tilde\rho_T^{-2k} \tilde\rho_E^{-2} Y_S$.

Proof. Consider a circuit that performs U' and measures the last qubit. By definition of A and R', the probability of getting 1 is the success rate $\tilde a_k^{-2}$. Similarly, if one executes U and measures the last qubit, one obtains 1 with probability $\frac{1}{2}(\tilde y_k + \tilde a_k^{-2})$, applying the usual argument of Eq. (32).

As a consequence, by Thm. B.1, we obtain $P\left(|Y_S^{(1)} - \tilde a_k^{-2}| \le \epsilon/2\right) \ge \beta$ by using U' a number of times $O(1/\epsilon)$ and taking $S = O(\lg 1/(1-\beta))$, where $\beta = \frac{1+\alpha}{2}$. In the same way, $P\left(|Y_S^{(0)} - \frac{1}{2}(\tilde y_k + \tilde a_k^{-2})| \le \epsilon/4\right) \ge \beta$ by using U a number of times $O(1/\epsilon)$ and taking $S = O(\lg 1/(1-\beta))$.

To combine the two estimates, use the triangle inequality, $|Y_S - \tilde y_k| \le 2\,|Y_S^{(0)} - \frac{1}{2}(\tilde y_k + \tilde a_k^{-2})| + |Y_S^{(1)} - \tilde a_k^{-2}|$, together with a union bound. This gives $P(|Y_S - \tilde y_k| \le \epsilon) \ge 2\beta - 1 = \alpha$. The proof is complete once we observe that $O(\lg 1/(1-\beta)) = O(\lg 1/(1-\alpha))$.

For the originally scaled version of the estimation, simply substitute $Y'_S$ and $y'_k$ in the previous result.
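The classical post-processing of Prop. C.8 can be sketched as follows. Each QAE run is replaced by a toy model and is not a quantum simulation. The model returns an estimate within the requested tolerance with probability $8/\pi^2$ (the success guarantee of canonical QAE) and an arbitrary value otherwise. The names, tolerances and number of runs are illustrative assumptions. Taking the median over an odd number of runs suppresses the failed runs. Twice the estimate of $\frac{1}{2}(\tilde y_k + \tilde a_k^{-2})$ minus the estimate of $\tilde a_k^{-2}$ then recovers $\tilde y_k$.

```python
import math
import random
import statistics


def toy_qae(target, tol, rng, p_good=8 / math.pi ** 2):
    """Toy stand-in for one QAE run (hypothetical model, not a circuit)."""
    if rng.random() < p_good:
        return target + rng.uniform(-tol, tol)
    return rng.random()                       # failed run: arbitrary value


def median_of_runs(target, tol, S, rng):
    """Median of S independent toy runs (S odd)."""
    return statistics.median([toy_qae(target, tol, rng) for _ in range(S)])


def combine(y0, y1):
    """Twice the estimate of (y~ + a~^-2)/2 minus the estimate of a~^-2."""
    return 2.0 * y0 - y1


rng = random.Random(7)
a_inv2, y_tilde, eps, S = 0.5, 0.3, 0.02, 31
y1 = median_of_runs(a_inv2, eps / 2, S, rng)                    # oracle U'
y0 = median_of_runs(0.5 * (y_tilde + a_inv2), eps / 4, S, rng)  # oracle U
print(combine(y0, y1), y_tilde)
```

The tolerances $\epsilon/4$ and $\epsilon/2$ are chosen so that $2 \cdot \epsilon/4 + \epsilon/2 = \epsilon$, which mirrors the triangle-inequality step of the proof.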

Proposition C.9 (Oracle depth and width in the BOE). In the setting of the previous Proposition, and assuming a split level $s = s(N) \in \{1, \dots, \lg N\}$, the width and depth of U are:

$$C_w(k) = O\left((s+1) N 2^{-s} + k \lg N\right), \qquad (69)$$

$$C_d(k) = O\left(2^s + \lg^2 N - s^2 + k \lg N\right), \qquad (70)$$

and they dominate those of U'.

Proof. The proof is similar to that of Prop. B.4, using the fact that data loading requires the resources listed in Prop. C.1.
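One way to see the trade-off governed by the split level $s$ in Eqs. (69) and (70) is to evaluate the two expressions with all constants set to one. This assumption is made only for illustration. Small $s$ gives a width of order $N$, while $s = \lg N$ gives a depth of order $N$. The helper below is hypothetical.

```python
import math


def boe_oracle_cost(N, s, k):
    """Leading terms of Eqs. (69)-(70) with unit constants (illustrative)."""
    lg = int(math.log2(N))
    if not 1 <= s <= lg:
        raise ValueError("split level must satisfy 1 <= s <= lg N")
    width = (s + 1) * N / 2 ** s + k * lg
    depth = 2 ** s + lg ** 2 - s ** 2 + k * lg
    return width, depth


N, k = 1024, 2
for s in range(1, int(math.log2(N)) + 1):
    w, d = boe_oracle_cost(N, s, k)
    print(f"s={s:2d}  width~{w:7.1f}  depth~{d:7.1f}")
```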

C.4 The classical sampling-based algorithm as a benchmark

For comparison with the classical case, we recall the following results, which exploit the idea of sample access to design efficient classical quantum-inspired algorithms [54]. Having sample access to a real non-null vector $v$ means having a tool that (efficiently) returns the index $j$ of an element of the array with probability proportional to its squared value. In other words, it means having access to a random variable $J$ such that $P(J = j) = v_j^2/\|v\|^2$. By contrast, we say that query access is given to a vector $v$ if it is possible to obtain $v_j$ from $j$.

Proposition C.10 (Sample access [54, Prop. 3.2]). The state decomposition tree of a vector of length
N , described in Fig. 13, also allows for classical sampling from the vector in O(lg N ) time.
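Sample and query access can be sketched with an array-based binary tree. Its leaves hold $v_j^2$ and each internal node holds the sum of its children, so a root-to-leaf descent draws $j$ with probability $v_j^2/\|v\|^2$ in $O(\lg N)$ steps. This is a minimal illustration in the spirit of Prop. C.10, not necessarily the layout of Fig. 13. The class name is hypothetical.

```python
import random


class SampleTree:
    """Binary tree over squared entries, stored in an array (root at index 1)."""

    def __init__(self, v):
        n = 1
        while n < len(v):
            n *= 2
        self.n = n
        self.v = list(v) + [0.0] * (n - len(v))   # zero-pad to a power of two
        self.tree = [0.0] * (2 * n)
        for j, x in enumerate(self.v):
            self.tree[n + j] = x * x                # leaves: v_j^2
        for i in range(n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def norm_sq(self):
        return self.tree[1]                         # ||v||^2 at the root

    def query(self, j):
        return self.v[j]                            # query access: j -> v_j

    def sample(self, rng):
        """Return j with probability v_j^2 / ||v||^2 (requires v != 0)."""
        i = 1
        while i < self.n:
            left = self.tree[2 * i]
            i = 2 * i if rng.random() * self.tree[i] < left else 2 * i + 1
        return i - self.n


tree = SampleTree([1.0, -2.0, 0.0, 3.0])
rng = random.Random(0)
counts = [0] * 4
for _ in range(100_000):
    counts[tree.sample(rng)] += 1
print([c / 100_000 for c in counts], tree.norm_sq())
```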

Proposition C.11 (Sampling-based inner product [54, Prop. 4.2]). Given query access to two real vectors $v$, $w$, sample access to $v$, and knowledge of $\|v\|$, the inner product $\sum_j v_j w_j$ can be estimated to additive error $\epsilon \|v\| \|w\|$ with probability at least $\alpha$ using $O\left(\frac{1}{\epsilon^2} \lg \frac{1}{1-\alpha}\right)$ queries and samples.
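The estimator behind Prop. C.11 works as follows. Draw $J$ with $P(J = j) = v_j^2/\|v\|^2$, query $w_J$, and average $Z = w_J \|v\|^2 / v_J$. The mean of $Z$ is $\sum_j v_j w_j$ and its second moment is $\|v\|^2 \|w\|^2$. Taking the median of several group means turns the Chebyshev bound into the $\lg\frac{1}{1-\alpha}$ dependence. In this sketch, `random.choices` stands in for sample access, at $O(N)$ cost per call rather than the $O(\lg N)$ of Prop. C.10. The group size $9/\epsilon^2$ and the group count of roughly $3.5 \ln\frac{1}{1-\alpha}$ are illustrative constants, not those of [54]. They are chosen so that each group fails with probability at most 1/9 by Chebyshev's inequality, and a Hoeffding bound on the number of failed groups then keeps the overall failure probability below $1 - \alpha$.

```python
import math
import random
import statistics


def inner_product_estimate(v, w, eps, alpha, seed=0):
    """Estimate sum_j v_j w_j to additive error eps*||v||*||w|| w.p. >= alpha."""
    rng = random.Random(seed)
    weights = [x * x for x in v]
    norm_sq = sum(weights)                      # knowledge of ||v||^2
    per_group = math.ceil(9 / eps ** 2)         # Chebyshev: group fails w.p. <= 1/9
    groups = max(1, math.ceil(3.5 * math.log(1 / (1 - alpha))))
    if groups % 2 == 0:
        groups += 1                             # odd count, so the median is one group mean
    means = []
    for _ in range(groups):
        # Sample access to v: J ~ v_j^2 / ||v||^2; query access to w.
        idx = rng.choices(range(len(v)), weights=weights, k=per_group)
        means.append(statistics.fmean(w[j] * norm_sq / v[j] for j in idx))
    return statistics.median(means)


v = [1.0, 2.0, 0.0, -1.0]
w = [0.5, -1.0, 3.0, 2.0]
est = inner_product_estimate(v, w, eps=0.05, alpha=0.95)
print(est, sum(a * b for a, b in zip(v, w)))
```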

The better error scaling of Prop. B.3 and Prop. C.8, compared with the classical sampling-based version in Prop. C.11, is demonstrated empirically in Fig. 15.

Remark C.12. Let us emphasize that the classical sampling-based algorithm requires sample access to only one of the inputs. In our case, we can assume sample access to prices, which do not undergo any transformation. In the quantum case, by contrast, both vectors must be normalized. As a consequence, the scaling of the sampling-based algorithm in $N$ does not depend on $k$, since we can assume $\|[f(T'_j)]_j\| = O(N^{1/2})$ for any $k$, in contrast with the quantum behavior highlighted in Subsection A.4.

In conclusion, the error scaling of the proposed quantum algorithm is better, but only for fixed vector
size N .

[Figure 15: log-log plots of error versus number of queries (10^2 to 10^7), panels (a) k = 1 and (b) k = 2; series: direct+IQAE, D&C-orth+swap+IQAE, classical sampling-based, with reference lines error = 1/queries and error = 1/sqrt(queries).]

Figure 15: As the number of queries grows, the error of IQAE decreases quadratically faster than that of the classical sampling-based algorithm described in Prop. C.10. Queries are counted as distribution samples in the classical case and as oracle calls in the quantum case. Recall that D&C-orth is the same as BOE with s = 1. The underlying problem is the calculation of the inner product of two fixed vectors of dimension N = 4, with power k = 1 in (a) and k = 2 in (b). Quantum runs are executed on a noiseless QASM simulator in Qiskit. Dashed lines are the best-fit lines in log-log space.
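The dashed best-fit lines of Fig. 15 are straight lines in log-log space, and their slope gives the empirical exponent. Slopes close to $-1$ and $-1/2$ match the reference lines error = 1/queries and error = 1/sqrt(queries). A least-squares slope can be computed as below. The inputs are synthetic power laws used only to check the helper; they are not the data behind the figure.

```python
import math


def loglog_slope(queries, errors):
    """Least-squares slope of log10(error) against log10(queries)."""
    xs = [math.log10(q) for q in queries]
    ys = [math.log10(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


q = [1e2, 1e3, 1e4, 1e5, 1e6]
print(loglog_slope(q, [1 / x for x in q]), loglog_slope(q, [1 / math.sqrt(x) for x in q]))
```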

D Comparative complexity analysis
D.1 Summary of the algorithm variants
In the previous appendices we introduced multiple implementations of our approach, generated by
different data encodings (amplitude or BOE), different implementations of the QHP (with or without
mid-reset), and different techniques for the inner product (ancilla-free method or swap test), as well
as by the introduction of *QAE in some cases. Here we collect and compare the main ones, to discuss
their complexity. Their key characteristics are also summarized in Fig. 2.
(a) Ancilla-free inner product and mid-resets. The first variant we consider assumes data are available in the amplitude encoding. Each circuit is fed with k copies of the temperature state $|\psi_T\rangle$ in Eq. (14), so that the power Eq. (15) of the temperature vector is calculated through the QHP. Moreover, the same register is reused for the multiple copies of $|\psi_T\rangle$, as shown in Fig. 5. At the level of quantum circuit implementation, this variant can benefit from the dynamic stopping technique introduced in Subsec. 2.2. Afterwards, under the assumption that a loading unitary $U_E$ for $|\psi_E\rangle$ is known, the inner products $y_k$ can be calculated with the direct method. Finally, the volume $v$ is reconstructed classically via Eq. (10). We refer to Subsection A.2 for the detailed implementation and analysis.
(b) Ancilla-free inner product and no mid-resets. In a second variant, we resort to the QHP imple-
mentation without mid-circuit resets, as depicted in Fig. 6, while leaving all other steps unchanged.
This results in a shallower circuit, at the cost of more qubits. Details are again in Subsec. A.2.
In both techniques introduced so far, the ancilla-free method for the estimation of inner products can be replaced with the swap test. We do not discuss these variants extensively here. As previously mentioned, the swap test comes with the relevant disadvantage that its sampling complexity is in general unbounded (Fig. 7). On the other hand, the swap test does not require knowledge of a loading unitary $U_E$, as it is enough to have access to a copy of the state $|\psi_E\rangle$ at each iteration. Requiring $U_E$ does not appear to be a significant limitation in our use case, where we assume data are classically available and loaded via circuits. The algorithm complexity is also not improved by using the swap test; see Subsec. A.2.
(c) Ancilla-free inner product and *QAE. The version above without mid-circuit resets inspires another interesting variant. Assume that loading unitaries $U_T$ and $U_E$ are available for both vectors. Then the overall circuit before any measurement can be seen as a unitary, which is fed to a Quantum Amplitude Estimation (QAE) technique to obtain a quadratic speedup in the precision $\epsilon$, in line with the general result of QAE. Details are presented in Appendix B. This version (c) is the main variant described in the body of the manuscript.
(d) BOE and *QAE. Additional algorithms can be obtained by resorting to the BOE, as discussed
in Appendix C. In this context, the swap test is the only viable option. We focus on the variant that
loads data in BOE, applies the QHP for the power calculation, the swap test for the inner product,
and boosts precision via QAE techniques.

D.2 Classical benchmarking algorithms

For comparison, we consider three classical algorithms:

Exact algorithm. By this term, we refer to the direct calculation of $f(T'_j)$ for all $j$, and of the inner product $\sum_j f(T'_j) E'_j$, without resorting to a polynomial approximation.

Classical polynomial approximation. This algorithm evaluates the summation in Eq. (10), with the $y_k$ computed classically.

Sampling-based polynomial approximation. In her recent work [54], Tang showed that inner products can be approximated efficiently on classical computers as well. This requires a binary tree data structure, similar to the one needed for the BOE loading, to be available for at least one of the two inputs, as further described in Subsec. C.4. By 'Sampling-based polynomial approximation' we mean an algorithm that efficiently calculates the $y_k$ by means of this technique and then evaluates the usual summation in Eq. (10).

D.3 Complexity measures

We use the following metrics for complexity.

The classical time $C_c$ is the pre- and post-processing time, computed in terms of elementary operations. For purely classical benchmarking algorithms, it accounts for all the processing. Classical data is assumed to be given in a specific format, and the cost of data preparation is not included in the pre-processing phase.

The circuit depth $C_d$ is the depth of a single, independently instantiated run of a circuit shot, measured in layers of 1-qubit gates and CNOTs. In big-O notation, this is equivalent to measuring it in terms of 1- and 2-qubit gates. Notice that $C_d$ may even depend on the single shot if dynamic stopping is applied; in this case, we can treat it as a random variable. The maximal $C_d$ is an important metric for an approximate estimation of the impact of circuit noise.

The sampling complexity $C_S$ is the number of repeated executions (shots) of each circuit required for the statistical estimation of outputs, and is again indexed on the circuit parameters k and l.

The oracle complexity $C_o$, which appears in the discussion of the classical sampling-based algorithm or when QAE techniques are involved, represents the number of calls made to a black-box operation by a general-purpose method.

We also define the quantum time $C_q$ as the sum of circuit depths multiplied by the respective sampling complexities. This gives an approximate measure of the overall quantum circuit execution cost, under the assumption that all circuit layers have similar duration.

We call the sum of quantum time and classical time the time scaling. For this specific metric, we add the assumption that all norms scale as $O(\sqrt N)$ (see Subsection A.4) instead of keeping the explicit contribution of norms, so that the scaling in N is easier to read.

Finally, the circuit width $C_w$ denotes the spatial requirements of an algorithm, measured as the total number of qubits necessary for its implementation.

D.4 Space and time complexity

Table 7 collects the results obtained so far on the performance of the techniques listed in Subsec. D.1, for all the metrics defined above. In terms of time scaling in N, the best method is variant (c), which is $O(C_{c,load}(N) + C_{d,load}(N) N^{K/2-1/2} \lg N)$. If $C_{c,load}(N) \ll N$, $C_{d,load}(N) \ll N^{1/2}$ and $K \le 2$, then we have a quantum speedup over the classical case. Notice that the sampling-based method is not applicable here, since we are not assuming that a binary tree structure is available. If instead $C_{d,load}(N) = O(1)$ and $K = 3$, the quantum complexity is almost comparable to the classical one, in the sense that it is worse only by logarithmic factors.

The following main takeaways can also be observed. If $K = 2$ and data loading is performed in $O(1)$ depth, methods (a) and (b) have the same asymptotic time complexity as the classical techniques when N grows. Furthermore, method (d) is not competitive with the classical counterparts in terms of N, although it remains better than the sampling-based algorithm in terms of $\epsilon$ for fixed N, as shown in Fig. 15. Finally, note that the space complexity, indicated in the Table as circuit width, is $O(k \lg N)$ for algorithms (a), (b), (c), with better constants in the QAE-free cases.
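The comparison in terms of N can be made concrete by evaluating the leading terms while dropping constants, $\epsilon$ and the classical loading cost. This is an illustrative simplification. The terms are $N$ for the classical algorithms, $C_{d,load}(N) N^{K-1}$ for (a) and (b), and $C_{d,load}(N) N^{K/2-1/2} \lg N$ for (c). The helper name is hypothetical, and $C_{d,load}(N)$ is passed as a constant.

```python
import math


def leading_time_terms(N, K, c_d_load=1.0):
    """Leading N-dependence of the time scaling (constants and eps dropped)."""
    return {
        "classical": float(N),
        "(a)/(b)": c_d_load * N ** (K - 1),
        "(c)": c_d_load * N ** ((K - 1) / 2) * math.log2(N),
    }


N = 2 ** 20
for K in (2, 3):
    print(K, leading_time_terms(N, K))
```

With $K = 2$, (a) and (b) match the classical term while (c) is well below it. With $K = 3$, (c) exceeds the classical term by exactly the factor $\lg N$.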

| Classical algorithms | Classical exact | Classical polynomial approximation | Sampling-based polynomial approximation |
|---|---|---|---|
| Classical: encoding | Array | Array | Binary tree |
| Oracle: complexity $C_o(k)$ | N/A | N/A | $O_\alpha(\rho_E^2 \rho_T^2 \lvert v\rvert^{-2} \epsilon^{-2})$ [c] |
| Summary: classical time $C_c$ | $O(N)$ [b] | $O(N)$ [b] | $O(\lg N)\, O_\beta(r_1^2 \epsilon^{-2}) + O_K(1)$ [c, d, e] |
| Summary: time scaling $C_c$ [a] | $O(N)$ | $O(N)$ | $O_{\beta,K}(\epsilon^{-2} \lg N)$ |

| QAE-free algorithms | (a) Ancilla-free and mid-resets | (b) Ancilla-free, no mid-resets |
|---|---|---|
| Classical: encoding | Various | Various |
| Quantum: encoding | Amplitude | Amplitude |
| Circuit: depth $C_d(k)$ | $\le C_{d,load}(N)\,k + k$ [f] | $2C_{d,load}(N) + k$ [f] |
| Circuit: samples $C_S(k)$ | $O_\alpha(\rho_E^2 \rho_T^{2k} (y'_k)^{-2} \epsilon^{-2})$ [g] | $O_\alpha(\rho_E^2 \rho_T^{2k} (y'_k)^{-2} \epsilon^{-2})$ [g] |
| Circuit: width $C_w(k)$ | $2 \lg N$ [f] | $k \lg N$ [f] |
| Summary: classical time $C_c$ | $2C_{c,load}(N) + O_K(1)$ [h, e] | $2C_{c,load}(N) + O_K(1)$ [h, e] |
| Summary: quantum time $C_q$ | $C_{d,load}(N)\, O_{\beta,K,b}(r_K^2 \epsilon^{-2})$ [j] | $C_{d,load}(N)\, O_{\beta,K,b}(r_K^2 \epsilon^{-2})$ [j] |
| Summary: time scaling $C_c + C_q$ [a] | $O_{\beta,K,b}(C_{c,load}(N) + C_{d,load}(N) N^{K-1} \epsilon^{-2})$ [k] | $O_{\beta,K,b}(C_{c,load}(N) + C_{d,load}(N) N^{K-1} \epsilon^{-2})$ [k] |

| QAE-based algorithms | (c) Ancilla-free and *QAE | (d) BOE and *QAE |
|---|---|---|
| Classical: encoding | Various | Sqrt binary tree |
| Quantum: encoding | Amplitude | BOE |
| Oracle: depth $C_d(k)$ | $2C_{d,load}(N) + O_k(\lg N)$ [l] | $O(2^s + \lg^2 N - s^2 + k \lg N)$ [p] |
| Oracle: width $C_w(k)$ | $O_k(\lg N)$ [l] | $O((s+1) N 2^{-s} + k \lg N)$ [p] |
| Oracle: complexity $C_o(k)$ | $O(\rho_E \rho_T^k (y'_k)^{-1} \epsilon^{-1})$ [m] | $O(\tilde\rho_E \tilde\rho_T^k (\tilde y'_k)^{-1} \epsilon^{-1})$ [q] |
| Circuit: depth $C_d(k)$ | Various [n] | Various [n] |
| Circuit: width $C_w(k)$ | $O_k(\lg N)$ [l] | $O((s+1) N 2^{-s} + k \lg N)$ [p] |
| Circuit: samples $C_S(k)$ | $O_\alpha(1)$ [m] | $O_\alpha(1)$ [q] |
| Summary: classical time $C_c$ | $2C_{c,load}(N) + O_K(1)$ [h, e] | $O_K(1)$ [d, e] |
| Summary: quantum time $C_q$ | $C_{d,load}(N)\, O_{\beta,K,b}(r_K \epsilon^{-1}) + O_{\beta,K,b}(r_K \epsilon^{-1} \lg N)$ [o] | $O_{\beta,K,b}(\tilde r_K \epsilon^{-1} (2^s + \lg^2 N - s^2))$ [r, p] |
| Summary: time scaling $C_c + C_q$ [a] | $O_{\beta,K,b}(C_{c,load}(N) + \epsilon^{-1} [C_{d,load}(N) + \lg N]\, N^{(K-1)/2})$ [k] | $O_{\beta,K,b}(\epsilon^{-1} N^K (2^s + \lg^2 N - s^2))$ [s] |

Notes: [a] Under the additional assumption that norms scale as $\sqrt N$, see Subsection A.4. [b] Under the simplification that floating point operations are performed in O(1) time and without precision loss. [c] Prop. C.10, Prop. C.11 and Remark C.12. [d] Assuming that the input binary tree is available; otherwise, its preparation requires O(N) time [37]. [e] The term $O_K(1)$ encompasses the polynomial reconstruction of Eq. (10) or (60). [f] Prop. A.9. [g] Prop. A.8. [h] $C_{c,load}$ is the classical pre-processing needed to load a copy of one input vector. [i] Easily derived from the circuit depth. [j] Use Prop. A.10 and the fact that quantum time is defined as the product of circuit depth and samples. [k] Remark A.13. [l] Prop. B.4. [m] Prop. B.2 and Prop. B.3. [n] The circuit depth depends on the specific QAE technique, but all the techniques considered share the same overall depth. [o] Prop. A.10 and Prop. B.2 (Prop. B.3). [p] Prop. C.9. [q] Prop. C.8. [r] Prop. C.6 and Prop. C.8. [s] Remark C.7.

Table 7: Complexity analysis of the algorithm variants proposed in Subsec. D.1, in comparison with the classical benchmarks listed in Subsec. D.2. Refer to Subsec. D.3 for a description of the complexity measures.

For the circuit complexity and oracle complexity measures, the parameters in the analysis are:
- the data set size $N$;
- the monomial degree $k$ ($k \ge 1$);
- the target precision $\epsilon$ and confidence level $\alpha$ such that $P(|Y'_k - y'_k| \le \epsilon y'_k) \ge \alpha$;
- the success rates $a_k^{-2}$ and the target value $y_k$;
- the scaling factors $\rho_E$, $\rho_T$, and the 'tilde versions' of the variables above;
- the split level $s = s(N) \in \{1, \dots, \lg N\}$ for the BOE.

For the summary complexity measures, the parameters are:
- $N$ as before;
- the polynomial degree $K$ ($K \ge 1$);
- the target precision $\epsilon$ and confidence level $\beta$ such that $P(|V - v^*| \le \epsilon |v|) \ge \beta$;
- the coefficients $b_k(\eta)$ in Eq. (8);
- the scaling factors $\rho_E$, $\rho_T$;
- the convergence ratios $r_k$ defined in Eq. (40), or $\tilde r_k$ in Eq. (68).

Note that the error is measured against the exact polynomial evaluation, under the assumption that the polynomial itself is a good approximation of the target volume function. Asymptotic estimates are provided for $\epsilon \to 0$ or $N \to \infty$. Constants affecting asymptotic estimates are marked in the subscript of the big-O notation: for instance, the notation $O_\alpha(\epsilon^{-2})$ is intended for $\epsilon \to 0$ uniformly in $N$, with factors depending only on $\alpha$. For readability, in the subscripts we use $b$ for $b_k(\eta)$ and $a$ for $a_k$. It is worth emphasizing that all estimates in the table are independent of $a_k$.

[Figure 16: two panels versus N = 2, 4, ..., 256, with curves for k = 0, 1, 2, 3; left, error of $Y'_k$ relative to $y'_k$ (axis 0% to 20%); right, error of $b_k(\eta) Y'_k$ relative to $|v|$ (axis 0% to 1%).]

Figure 16: Left, the error of the estimator $Y'_k$ relative to its own target value $y'_k$. Right, the error of the power contribution $b_k(\eta) Y'_k$ relative to the global target $v$. Both panels use IQAE with the ancilla-free method. The IQAE is set to obtain a precision for $Y_k$ of $0.5\, N^{-k/2+1/2}$, where 0.5 is an arbitrary constant and the scaling in N is designed to compensate for the error scaling. Each point in the plot is the average of 20 independent runs on the qasm simulator. The lines show no increasing or decreasing trend. This is consistent with the theory, which states that the error stays approximately constant if the required precision scales as the inverse of $r_k = O(N^{k/2-1/2})$. For comparison, in Fig. 12 the number of samples S is kept constant as N grows, and so the error increases. Here, by contrast, the precision is scaled appropriately with N to keep the relative error constant.

Portfolio Assignment SUS1501 - Sustainability and Greed
100% (2)
Portfolio Assignment SUS1501 - Sustainability and Greed
14 pages
PhD Thesis Haoya-Augmented
No ratings yet
PhD Thesis Haoya-Augmented
238 pages
1504.06987
No ratings yet
1504.06987
28 pages
Quantum Algorithm For Nonhomogeneous Linear Partial Differential Equations
No ratings yet
Quantum Algorithm For Nonhomogeneous Linear Partial Differential Equations
9 pages
2407.19857v1
No ratings yet
2407.19857v1
7 pages
Solving Partial Differential Equations in Quantum Computers
No ratings yet
Solving Partial Differential Equations in Quantum Computers
17 pages
Potential Applications of Quantum Computing For The Insurance Industry
No ratings yet
Potential Applications of Quantum Computing For The Insurance Industry
43 pages
VQNet Library For A Quantum-Classical Hybrid Neural Network
No ratings yet
VQNet Library For A Quantum-Classical Hybrid Neural Network
11 pages
QuantumLinearSolvers
No ratings yet
QuantumLinearSolvers
42 pages
2403.16791v2 (1)
No ratings yet
2403.16791v2 (1)
21 pages
Experimental Quantum Learning Spectral Decomposition
No ratings yet
Experimental Quantum Learning Spectral Decomposition
10 pages
Homework_1
No ratings yet
Homework_1
3 pages
Quantum Data-Fitting: PACS Numbers: 03.67.-A, 03.67.ac, 42.50.Dv
No ratings yet
Quantum Data-Fitting: PACS Numbers: 03.67.-A, 03.67.ac, 42.50.Dv
6 pages
Deutsch - S and The Deutsch-Jozsa Algorithms
No ratings yet
Deutsch - S and The Deutsch-Jozsa Algorithms
9 pages
2505.17713v1
No ratings yet
2505.17713v1
10 pages
Classical Simulation of Peaked Shallow
No ratings yet
Classical Simulation of Peaked Shallow
32 pages
线性代数的变分算法
No ratings yet
线性代数的变分算法
15 pages
Experimental Quantum Computing To Solve Systems of Linear Equations
No ratings yet
Experimental Quantum Computing To Solve Systems of Linear Equations
5 pages
2501.01154v1
No ratings yet
2501.01154v1
6 pages
2412.19150v1
No ratings yet
2412.19150v1
18 pages
Markov Chain Monte-Carlo Enhanced Variational Quantum Algorithms
No ratings yet
Markov Chain Monte-Carlo Enhanced Variational Quantum Algorithms
9 pages
Quantum Algorithms for Reinformcementlearning
No ratings yet
Quantum Algorithms for Reinformcementlearning
11 pages
Barren Plateaus in Quantum Neural Network Training Landscapes
No ratings yet
Barren Plateaus in Quantum Neural Network Training Landscapes
7 pages
Option Pricing Ibm Q Skit
No ratings yet
Option Pricing Ibm Q Skit
14 pages
Classiq’s Generalized Arithmetic Challenge
No ratings yet
Classiq’s Generalized Arithmetic Challenge
3 pages
QPOL Meyer23a
No ratings yet
QPOL Meyer23a
22 pages
QANplatform Technical White Paper
No ratings yet
QANplatform Technical White Paper
21 pages
Quantum Advantage With Shallow Circuits
No ratings yet
Quantum Advantage With Shallow Circuits
23 pages
Building Quantum Software in Python: A Complete Developer's Guide to Quantum Programming and Applications
From Everand
Building Quantum Software in Python: A Complete Developer's Guide to Quantum Programming and Applications
Aarav Joshi
No ratings yet
Quantum Algorithms For Solving Ordinary Differential Equations Via Classical Integration Methods
No ratings yet
Quantum Algorithms For Solving Ordinary Differential Equations Via Classical Integration Methods
13 pages
Quantum Learning Algorithms Imply Circuit Lower Bounds: Srinivasan Arunachalam Alex B. Grilo
No ratings yet
Quantum Learning Algorithms Imply Circuit Lower Bounds: Srinivasan Arunachalam Alex B. Grilo
74 pages
Quantum Computing Basics
No ratings yet
Quantum Computing Basics
27 pages
Avoiding Symmetry Roadblocks and Minimizing The Measure-Ment Overhead of Adaptive Variational Quantum Eigensolvers
No ratings yet
Avoiding Symmetry Roadblocks and Minimizing The Measure-Ment Overhead of Adaptive Variational Quantum Eigensolvers
25 pages
A Quantum Algorithm For Solving Systems of Nonlinear Algebraic Equations
No ratings yet
A Quantum Algorithm For Solving Systems of Nonlinear Algebraic Equations
10 pages
Quantum Hyperparallel Algorithm
No ratings yet
Quantum Hyperparallel Algorithm
7 pages
E0_270_RL
No ratings yet
E0_270_RL
10 pages
Quantum Algorithms in Action: A Practical Guide to Implementation with Qiskit
From Everand
Quantum Algorithms in Action: A Practical Guide to Implementation with Qiskit
Robert Johnson
No ratings yet
1995 - Arxiv Preprint Quant-Ph9511026 - Kitaev
No ratings yet
1995 - Arxiv Preprint Quant-Ph9511026 - Kitaev
22 pages
Quantum Machine Learning Matrix Product States
No ratings yet
Quantum Machine Learning Matrix Product States
6 pages
s41534-020-00347-1
No ratings yet
s41534-020-00347-1
10 pages
1887_4139200-Full text
No ratings yet
1887_4139200-Full text
20 pages
QPOL - s41534 019 0141 3
No ratings yet
QPOL - s41534 019 0141 3
8 pages
A Quantum Algorithm For The Kalman Filter Using
No ratings yet
A Quantum Algorithm For The Kalman Filter Using
23 pages
Provably Efficient Exploration in Quantum Reinforcement Learning With Logarithmic Worst-Case Regret
No ratings yet
Provably Efficient Exploration in Quantum Reinforcement Learning With Logarithmic Worst-Case Regret
39 pages
2403.14703v3
No ratings yet
2403.14703v3
19 pages
10.1515 - Phys 2019 0087
No ratings yet
10.1515 - Phys 2019 0087
11 pages
Safe Optimal Control Using Stochastic Barrier Functions and Deep Forward-Backward Sdes
No ratings yet
Safe Optimal Control Using Stochastic Barrier Functions and Deep Forward-Backward Sdes
19 pages
Quantum Algorithms For Fixed Qubit Architectures
No ratings yet
Quantum Algorithms For Fixed Qubit Architectures
20 pages
Quantum Algorithm for Linear Systems of Equations
No ratings yet
Quantum Algorithm for Linear Systems of Equations
4 pages
Articulo 2
No ratings yet
Articulo 2
17 pages
CSE 599d - Quantum Computing Grover's Algorithm: Department of Computer Science & Engineering, University of Washington
No ratings yet
CSE 599d - Quantum Computing Grover's Algorithm: Department of Computer Science & Engineering, University of Washington
6 pages
1704.06174v2
No ratings yet
1704.06174v2
6 pages
Efficient Quantum State Preparation with Walsh Series
No ratings yet
Efficient Quantum State Preparation with Walsh Series
19 pages
Lecture 9
No ratings yet
Lecture 9
87 pages
OSQP
No ratings yet
OSQP
39 pages
Analysis of Quantum Circuits Via Abstract Stabilizer Simulation
No ratings yet
Analysis of Quantum Circuits Via Abstract Stabilizer Simulation
22 pages
Quantum Summary
No ratings yet
Quantum Summary
2 pages
Linear Quadratic Control
No ratings yet
Linear Quadratic Control
7 pages
bolean to hamilton
No ratings yet
bolean to hamilton
20 pages
Quantum Algorithms Lecture Notes Waterloo Co781 Itebooks pdf download
No ratings yet
Quantum Algorithms Lecture Notes Waterloo Co781 Itebooks pdf download
45 pages
3 Reachability
No ratings yet
3 Reachability
5 pages
Fisiha Fikiru - The Effect of HRM Practices On HCQ at TASH-1
No ratings yet
Fisiha Fikiru - The Effect of HRM Practices On HCQ at TASH-1
110 pages
Indian Towns On River Banks GK Notes in PDF 1
No ratings yet
Indian Towns On River Banks GK Notes in PDF 1
2 pages
(c) As above *QAE implies

Figure 5: QHP of k = 4 vectors with mid-measurements and mid-resets, requiring k − 1 = 3 iterations.
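Figure 5 describes a QHP over k = 4 vectors that reuses registers through mid-circuit measurements and resets, so k − 1 = 3 products are taken in sequence. The plain-Python statevector sketch below illustrates that iteration. It reads QHP as the quantum Hadamard product and assumes a common CNOT-based variant:

- two amplitude-encoded registers;
- bitwise CNOTs from the first register into the second;
- post-selection of the second register on |0⟩.

The function names are hypothetical, and the dimension is assumed to be a power of two. The sketch shows that the per-step success probabilities multiply to the squared norm of the Hadamard product of the normalized inputs.

```python
import math


def normalize(v):
    """Scale v to unit Euclidean norm (amplitude encoding); assumes v != 0."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def qhp_step(a, b):
    """One hypothetical CNOT-based Hadamard-product step.

    Simulates |a>|b> -> sum_ij a_i b_j |i>|i XOR j>, then post-selects the
    second register on |0>, which keeps only the terms with j == i.
    Returns the normalized output state and the post-selection probability.
    The dimension must be a power of two so that i XOR j stays in range.
    """
    dim = len(a)
    out = [[0.0] * dim for _ in range(dim)]  # out[i][j]: amplitude of |i>|j>
    for i in range(dim):
        for j in range(dim):
            out[i][i ^ j] += a[i] * b[j]  # bitwise CNOTs map j -> i XOR j
    kept = [out[i][0] for i in range(dim)]  # second register measured as 0
    p_success = sum(x * x for x in kept)
    return normalize(kept), p_success


def iterated_qhp(vectors):
    """Hadamard product of k vectors via k - 1 sequential QHP steps."""
    state = normalize(vectors[0])
    total_p = 1.0
    for v in vectors[1:]:
        state, p = qhp_step(state, normalize(v))
        total_p *= p  # mid-circuit measurement: each step must succeed
    return state, total_p


vecs = [[1, 2, 3, 4], [2, 1, 1, 2], [1, 1, 2, 1], [3, 1, 1, 1]]
state, p = iterated_qhp(vecs)
```

For these four vectors the output state is proportional to (6, 2, 6, 8). The overall success probability is 140/25200 = 1/180, which illustrates how the post-selection cost compounds as k grows.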

Columns: k | Grover | Width | Depth | RZ count | SX count | X count | CNOT | Meas

asymptotically when  → 0, where Φ is the CDF of the standard normal distribution.

Proof. By the Central Limit Theorem, for any real β,

when S → ∞. Therefore, using the symmetry of Φ if a < 0,

Taking the square roots:

P (|YS − p| < ) = α, (22)

once S is chosen as

$\langle 0 | U_1^\dagger | \psi_0 \rangle = \langle \psi_1 | \psi_0 \rangle = p, \qquad (24)$

Need for the loading unitary U1 Yes No

direct method direct method

3. assuming again $y_k \neq 0$, $P(|Y_S - y_k| < \varepsilon) \le \alpha$ is also guaranteed by the stronger condition

4. assuming $y_k' \neq 0$, any of the conditions in Eqs. (27) or (28)

Proof. X is a Bernoulli r.v. and, by the swap test theory,

Moving to the second claim, observe that

Applying Lemma A.1, an asymptotically sufficient condition for Eq. (33) is

3. assuming again $y_k \neq 0$, $P(|Y_S - y_k| < \varepsilon) \le \alpha$ is also guaranteed by the stronger condition

Proof. Consider that

$\mathbb{E}[X] = P(R = 0, Z = 0) = P(R = 0 \mid Z = 0)\, P(Z = 0) = a_k^2\, y_k^2\, a^{-2} \cdots$

claims are obvious.
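Several proofs above rely on the swap test producing a Bernoulli random variable X, and on a normal approximation such as Eq. (22). The sketch below assumes that Y_S is the empirical frequency of ancilla outcome 0 over S independent swap-test shots. Here q denotes that Bernoulli success probability, which is not necessarily the p of Eq. (24), and the helper names are hypothetical. The sketch uses two standard results:

- the swap-test identity P(0) = (1 + |⟨ψ|φ⟩|²)/2;
- the textbook CLT sample size, which is not necessarily the exact expression for S used in the paper.

```python
import math
import random
from statistics import NormalDist


def swap_test_p0(psi, phi):
    """P(ancilla = 0) in a swap test on real unit vectors: (1 + <psi|phi>^2) / 2."""
    overlap = sum(x * y for x, y in zip(psi, phi))
    return 0.5 * (1.0 + overlap ** 2)


def shots_for_accuracy(q, eps, alpha):
    """Normal-approximation shot count S with P(|Y_S - q| < eps) ~= alpha.

    Y_S is the mean of S Bernoulli(q) outcomes. By the CLT,
    P(|Y_S - q| < eps) ~= 2 * Phi(eps * sqrt(S / (q * (1 - q)))) - 1,
    and solving for S gives q * (1 - q) * (Phi^{-1}((1 + alpha) / 2) / eps)^2.
    """
    z = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
    return math.ceil(q * (1.0 - q) * (z / eps) ** 2)


def estimate_overlap_sq(psi, phi, shots, rng):
    """Estimate <psi|phi>^2 as 2 * Y_S - 1 from simulated swap-test shots."""
    q = swap_test_p0(psi, phi)
    zeros = sum(1 for _ in range(shots) if rng.random() < q)
    return 2.0 * zeros / shots - 1.0


# Usage: <psi|phi>^2 = 0.5, so the ancilla reads 0 with probability q = 0.75.
psi = [1.0, 0.0]
phi = [1 / math.sqrt(2), 1 / math.sqrt(2)]
shots = shots_for_accuracy(swap_test_p0(psi, phi), eps=0.005, alpha=0.95)
estimate = estimate_overlap_sq(psi, phi, shots, random.Random(7))
```

With q = 0.5, ε = 0.01 and α = 0.95, the formula gives the familiar S = 9604. The squared overlap is estimated as 2Y_S − 1, so an accuracy ε on Y_S becomes 2ε on the squared overlap.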

A.3 Circuit width and depth

$C_w(k) = r(k) \lg N + \delta^{(\mathrm{swap})}, \qquad (37)$

A.4 Putting powers together

T ρE |bk (η)| |Yk − yk | ≤ wk ) ≥ αk for all k ≤ K.

Consequently, a similar estimation holds for the relative error:

$r_k := |v|\, \rho_T^{k} \rho_E. \qquad (40)$

Proof. Consider that

N grows. Additionally, by the Cauchy-Schwarz inequality, |v| ≈ |v ∗ | ≤ ρ−1 −1

$r_k = O(N^{K/2 + 1/2 - 1}) = O(N^{K/2 - 1/2})$.

|bk (η)| |Yk′ −yk


and similarly the estimator for z is defined by

The error in z is then linearly controlled by the error in θ when θ̂ → θ by

$C_d(k) = 2\, C_{d,\mathrm{load}}(N) + O(k \lg N), \qquad (47)$

C Bidirectional Orthogonal Encoding of Data

C.1 Towards the BOE

In other words, we start with two BOE states

and the success rate of the QHP is $\tilde{a}^{-2}$

Remark C.3. This setting can be exploited to estimate

Third, in the context of the BOE, Eq. (10) is replaced by:

Correspondingly, the estimator for $v^*$ is defined as

asymptotically when  → 0, where Φ is the CDF of the standard normal distribution;

This time $B_S \to \tilde{a}^{-2}$

holds for the relative error: $P(|V - v| \le \varepsilon |v|) \ge \beta$, provided

r̃k := |v| ρ̃2k 2

C.3 QAE techniques

and they dominate those of $U'$.
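The claim that the error in z is linearly controlled by the error in θ can be illustrated under one assumption: that z is encoded as in canonical amplitude estimation, z = sin²θ. A first-order (delta-method) expansion then gives |ẑ − z| ≈ |sin 2θ| |θ̂ − θ|.

The sketch also compares two error scalings:

- the well-known Brassard–Høyer–Mosca–Tapp error bound for canonical QAE, which shrinks like 1/M in the resolution parameter M (of the same order as the number of Grover-operator applications);
- the standard deviation of classical sampling, which shrinks like 1/√S in the number of samples S.

This makes the quadratic gap visible. The function names are hypothetical.

```python
import math


def z_from_theta(theta):
    """Assumed encoding, as in canonical amplitude estimation: z = sin^2(theta)."""
    return math.sin(theta) ** 2


def first_order_error(theta, dtheta):
    """Delta method: |z_hat - z| ~= |dz/dtheta| * |dtheta| = |sin(2 theta)| * |dtheta|."""
    return abs(math.sin(2.0 * theta)) * abs(dtheta)


def canonical_qae_bound(z, m):
    """Brassard-Hoyer-Mosca-Tapp bound (k = 1), holding with probability >= 8 / pi^2:
    |z_hat - z| <= 2 * pi * sqrt(z * (1 - z)) / M + pi^2 / M^2."""
    return 2.0 * math.pi * math.sqrt(z * (1.0 - z)) / m + math.pi ** 2 / m ** 2


def classical_std(z, s):
    """Standard deviation of the mean of S Bernoulli(z) samples."""
    return math.sqrt(z * (1.0 - z) / s)


# The linear (first-order) control of the z-error by the theta-error.
theta, dtheta = 0.7, 1e-6
exact = abs(z_from_theta(theta + dtheta) - z_from_theta(theta))
approx = first_order_error(theta, dtheta)

# Same budget for both: the QAE bound falls like 1/M, sampling like 1/sqrt(S).
for budget in (10**2, 10**4):
    print(budget, canonical_qae_bound(0.5, budget), classical_std(0.5, budget))
```

At a budget of 10⁴, the QAE bound is about 3.1 × 10⁻⁴, while the sampling standard deviation is 5 × 10⁻³. Reaching a target error ε therefore costs O(1/ε) oracle calls in the first case and O(1/ε²) samples in the second.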

C.4 The classical sampling-based algorithm as a benchmark


D.2 Classical benchmarking algorithms

D.3 Complexity measures

D.4 Space and time complexity

