Quadratic Quantum Speedup in Evaluating Bilinear r
functions
Gabriele Agliardi1, Corey O'Meara2, Kavitha Yogaraj3, Kumar Ghosh4, Piergiacomo Sabino5,6, Marina Fernández-Campoamor4, Giorgio Cortiana4, Juan Bernabé-Moreno7, Francesco Tacchino8, Antonio Mezzacapo9, and Omar Shehab9

1 IBM Italia, Milan, Italy
2 E.ON Digital Technology GmbH, Hannover, Germany
3 IBM Quantum, IBM Research, India
4 E.ON Digital Technology GmbH, Essen, Germany
5 E.ON SE, Essen, Germany
6 University of Helsinki, Finland, Department of Mathematics and Statistics
7 IBM Research Europe, Dublin, Ireland
8 IBM Quantum, IBM Research Europe, Zurich, Switzerland
9 IBM Quantum, IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA

arXiv:2304.10385v1 [quant-ph] 20 Apr 2023
[Figure 1 (diagram). Objective: estimation of $\sum_j E_j\, p(T_j)$ as an approximation for $\sum_j E_j\, f(T_j)$, where $f$ is an arbitrary function and $p$ is a polynomial. Proposed approach: data preparation; data loading; polynomial approximation with QHP; estimation of inner products; post-processing. Intermediate findings with broader implications: (i) dynamic stopping; (ii) quantum methods for inner products; (iii) investigation of QHP; (iv) adaptation for *QAE; (v) Bidirectional Orthogonal Encoding. Implementation variants: (a) ancilla-free inner product and mid-resets; (b) ancilla-free inner product and no mid-resets; (c) ancilla-free inner product and *QAE; (d) BOE and *QAE. Final result: quadratic speedup up to polylogarithmic factors for polynomial $p$ of degree 2 under efficient data loading.]
Figure 1: A conceptual representation of the paper, detailed in Subsec. 1.1. We propose a hybrid approach for
the problem resolution. While tuning and implementing the algorithm, we derive five intermediate findings, which
contribute to different implementation variants. Variant (c) is finally selected for providing the desired speedup, which
constitutes our final result. The following terms are used in the picture: QHP for Quantum Hadamard Product (see
Subsec. 2.2), *QAE for Quantum Amplitude Estimation or equivalent alternatives, BOE for Bidirectional Orthogonal
Encoding (see Subsec. 2.1).
ulations [5, 6].

In this manuscript, we rather focus on the more fundamental task of efficiently calculating the value of a contract, and by extension of portfolios, thus providing the data for subsequent risk measure calculation. The conceptual structure of the work is summarized in Fig. 1 and described in Subsec. 1.1. Our target quantity takes the form of an inner product $\sum_j E_j\, p(T_j)$, where $[E_j]$ and $[T_j]$ are arbitrary vectors, and $p$ is a polynomial approximating a function $f$. This formulation is highly general and allows addressing diverse forms of energy contracts, beyond the exemplary one adopted here, as well as other use cases across industries, e.g. finance [7], climate science [8], etc.

Quantum circuits that produce nonlinear behaviors are not straightforwardly available, due to the native linearity of quantum operators [9–12]. Methods for treating nonlinearities either exploit black-box approaches for usage in trainable circuits [13, 14], or represent specific types of nonlinear functions [15, 16]. Ref. [9] discusses the role of selective operations and the probability of failure for nonlinear circuits. The work [17] pursues an objective that is complementary to ours: it estimates an arbitrary (activation) function that takes an inner product as the input, while we approximate the inner product of two vectors, one of which is the output of an arbitrary function.

The remainder of the section provides a technical overview of our approach and findings, as well as the context of risk analysis. Section 2 describes the proposed approach, the main building blocks, and the different implementation variants, with a focus on the best-performing one, namely (c). Section 3 studies the algorithmic complexity from a theoretical standpoint, while Section 4 gives experimental results. The final Section 5 contains conclusions and outlooks. The paper is complemented with a rich set of Appendices, containing multiple implementation variants with the technical details for the underlying theory.

1.1 Technical overview

Our overall approach is sketched in the first column of Fig. 1. Specifically, we employ polynomial approximation of nonlinear functions and then compute polynomials by resorting to Quantum Hadamard Products (QHP) [10, 18] for powers. Note that in a classical workflow, a polynomial approximation would also be a step, hidden inside the low-level calculation of the nonlinear function.

In the process of implementing and optimizing the algorithm, we derive the following intermediate findings that are interesting also beyond our specific use case (see second column in Fig. 1):

(i) Dynamic stopping: we show that QHP can be implemented via dynamic circuits, introducing the dynamic stopping, namely the early abortion of the circuit execution to reduce the average circuit depth, based on mid-circuit measurements, which are a relatively new capability in commercial quantum processors,

(ii) Quantum methods for inner products: we show that the sampling complexity of the swap test for the calculation of an inner product $p$ is unbounded when $p$ tends to 0, making other techniques (e.g. what we call ancilla-free) more convenient, whenever applicable,

(iii) Investigation of QHP: we provide a simplified formalization of the QHP, that describes it as a unitary providing a desired output state under a success rate, and we clarify the impact of normalization on the performance of the QHP algorithm,

(iv) Adaptation for QAE: we show how to adapt our approach based on QHP, in order to make use of the QAE algorithm, appropriately modifying the circuit structure to postpone measurements, and

(v) Bidirectional Orthogonal Encoding: we highlight that the encoding strategy proposed in Ref. [19] is not compatible with swap tests (used for computing the inner product of data vectors), but it can be modified giving rise to the new Bidirectional Orthogonal Encoding (BOE), which is suitable for such a task.

By combining the considerations above, we generate multiple implementation variants for our approach, that specifically differ for the encoding and loading subroutines, for the presence of mid-circuit measurements and resets in the circuits, for the quantum subroutine applied to calculate inner products, as well as for the presence of the Quantum Amplitude Estimation (QAE) as a performance booster. Not all of these combinations are compatible with each other, resulting in four selected variants (see third column in Fig. 1 and details in Fig. 2), namely: (a) ancilla-free inner product and mid-resets, (b) ancilla-free inner product and no mid-resets, (c) ancilla-free inner product and *QAE, (d) BOE and *QAE. The first three variants use the amplitude encoding and the ancilla-free method for inner products, while the last one employs the Bidirectional Orthogonal Encoding (BOE) and the swap test. The notation *QAE remarks that QAE can be replaced by any equivalent alternative, such as the Iterative QAE (IQAE) [20], the Chebyshev QAE (ChebQAE) [6] or the Dynamic QAE [1].

We prove that variant (c) outperforms the others and achieves a quadratic speedup up to polylogarithmic factors when the approximating polynomial has degree 2, under the assumption that an efficient data loading unitary is available in amplitude encoding. Since the cost of contract valuation is already linear in the number of data points on classical computers, the possibility to improve on that result via quantum algorithms is strictly connected to the ability of loading data efficiently, because in general the data loading procedure has a linear cost. We discuss in depth the impact of data encoding and loading on the performance of the methods.

Our approach is validated on IBM Quantum devices, for small problem instances ($N = 4$ data points). In this setting, overall errors are in line with the theory, but the effect of noise is already clearly observable when drilling down to the errors associated to high powers in the approximating polynomial.

1.2 Background on portfolio risk analysis in the energy industry

The gas demand of households or heating can be well described by a deterministic dependency of gas volumes on weather variables, typically the temperature. Standard contracts for private or industrial customers normally entail full-supply gas delivery, without volume constraints.
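[Figure 2 (table, only partially recoverable). Column groups: "Variant design" (classical data, encoding and loading, powers, inner product, circuit design) and "Complexity" (time performance, remarks). Variants (a) "Ancilla-free inner product and mid-resets" and (b) "Ancilla-free inner product and no mid-resets" share the assumption of efficient data loading unitaries $U_T$, $U_E$ in amplitude encoding, the classical pre-processing required by $U_T$, $U_E$, the ancilla-free method for inner products, and a circuit design without *QAE. Variant (a) computes powers through the QHP with mid-resets (remark: low width); variant (b) carries the remark "low depth".]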
Figure 2: Characteristics of the four proposed implementation variants. Each row is a variant. See Table 1 for
quantitative time scaling analysis. Variant (c) is highlighted as it provides the desired quantum speedup. Terms
used in the table are defined in Section 2: see Subsec. 2.1 for encoding (UT and UE are the loading unitaries for
temperatures and prices, BOE stands for Bidirectional Orthogonal Encoding), Subsec. 2.2 for powers (QHP is the
Quantum Hadamard Product), Subsec. 2.3 for inner product, Subsec. 2.4 for the circuit design (*QAE is the Quantum
Amplitude Estimation, or any equivalent technique). Refer to Appendix D for a detailed description of the variants
and for the extensive complexity analysis comparison.
Namely, the customer pays a fixed price for the individual consumption whereas the supplier takes the risk of volume deviations from the projected load profile of the customer. For example, on cold days the gas demand is likely to be higher than expected, therefore in order to meet the demand, a supplier has to buy the extra gas needed on the day-ahead market, typically at prices that are higher than those contracted with the customer.

In contrast, excess volumes need to be sold by the supplier for lower prices in order to balance the economic position when temperatures are higher than expected. This leads to costs and risks for the gas supplier which can be managed with either purely temperature-based weather derivatives or with cross-commodity temperature derivative contracts, often called quantos.

Accordingly, risk managers perform risk analysis and compute the fair value and some statistics of the entire weather-related portfolio and of the contracts it consists of. To this end, one defines upfront a joint stochastic model for the gas prices and temperatures and relies on extensive and time-consuming Monte Carlo simulations to estimate the fair value, which is seen as the sample mean, and some risk measures related to the sample statistics [21–24].

Let us consider a simplified weather-related portfolio which depends on gas and temperature. For simplicity, we consider one market and one temperature station. On the other hand, a real portfolio would consider multiple markets as well as several weather stations. Suppose we have a one-year time horizon from 1-Jan-2022 to 31-Dec-2022, with daily granularity.

We assume that the gas prices and the temperatures, denoted $[E'_j]$ and $[T'_j]$, $j = 0, 1, \ldots, 364$, respectively, are given (here and throughout the paper, primed notations indicate non-normalized vectors). Typically they are generated by a two-factor Markov model each, namely the value at time $j$ is updated with 2 random variables, hence the total number of random draws is $365 \times 2 = 730$ both for the gas and for the temperature, times the number of markets, leading
Group | Algorithm variant | Time complexity
Classical | Classical exact | $O(N)$
Classical | Classical polynomial approx | $O(N)$
Classical | Sampling-based polynomial approx | $O_{\beta,K}(\epsilon^{-2} \lg N)$
QAE-free | (a) Ancilla-free inner prod, mid-resets | $O_{\beta,K,b}\big(C_{c,load}(N) + C_{d,load}(N)\, N^{K-1} \epsilon^{-2}\big)$
QAE-free | (b) Ancilla-free inner prod, no mid-resets | $O_{\beta,K,b}\big(C_{c,load}(N) + C_{d,load}(N)\, N^{K-1} \epsilon^{-2}\big)$
QAE-based | (c) Ancilla-free inner prod, *QAE | $O_{\beta,K,b}\big(C_{c,load}(N) + [C_{d,load}(N) + \lg N]\, N^{\frac{K-1}{2}} \epsilon^{-1}\big)$
QAE-based | (d) BOE and *QAE | $O_{\beta,K,b}\big(N^{K} (2^{s} + \lg^2 N - s^2)\, \epsilon^{-1}\big)$

Table 1: Time complexity of the proposed algorithm variants, introduced in Fig. 2, in comparison with the classical benchmarks. Variant (c) is highlighted as it provides the desired speedup. An extended version is Table 7 of Appendix D, that contains other complexity metrics as well as references to the Propositions justifying the results. Notations are summarized in Table 2. The parameters in the analysis are the data set size $N$, the polynomial degree $K$ ($K \ge 1$), the target precision $\epsilon$ and confidence level $\beta$ such that $P(|V - v^*| \le \epsilon |v|) \ge \beta$, the coefficients $b_k(\eta)$ in Eq. (8), and the split level $s = s(N) \in \{1, \ldots, \lg N\}$ for the BOE. Remarkably, the error is measured against the exact polynomial evaluation, under the assumption that the polynomial itself is a good approximation of the target volume function. Asymptotic estimations are provided for $\epsilon \to 0$ or $N \to \infty$. Constants affecting asymptotic estimates are marked in the subscript of the big O notation: for instance the notation $O_\beta(\epsilon^{-2})$ is intended for $\epsilon \to 0$ uniformly in $N$, with factors depending only on $\beta$. For readability in the subscripts we use $b$ for $b_k(\eta)$. The time scaling is calculated under the additional assumption that norms scale as $\sqrt{N}$, as justified in Subsection A.4. We use the star notation in front of QAE (*QAE) to emphasize that any QAE technique is applicable, such as Iterative QAE (IQAE) [20] or Dynamic QAE [1], as long as it shares the same substantial time-scaling with the usual QAE.
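To get a feeling for the scalings in Table 1, the following minimal sketch compares the classical $O(N)$ cost with the leading terms of the QAE-free variants and of variant (c). It is only illustrative: all hidden constants (including those depending on $\beta$, $K$ and $b_k(\eta)$) are set to 1, the loading costs `c_classical` and `c_depth` are hypothetical parameters, and the function names are not taken from the paper.

```python
import math


def classical_cost(n):
    """Classical exact or polynomial evaluation: O(N), constant set to 1."""
    return float(n)


def qae_free_cost(n, degree, eps, c_classical=0.0, c_depth=1.0):
    """Variants (a)/(b): C_c,load + C_d,load * N^(K-1) * eps^-2."""
    return c_classical + c_depth * n ** (degree - 1) / eps ** 2


def variant_c_cost(n, degree, eps, c_classical=0.0, c_depth=1.0):
    """Variant (c): C_c,load + [C_d,load + lg N] * N^((K-1)/2) * eps^-1."""
    return c_classical + (c_depth + math.log2(n)) * n ** ((degree - 1) / 2) / eps


def crossover_size(degree, eps, max_exponent=60):
    """Smallest N = 2^e (e <= max_exponent) where variant (c) is below O(N).

    Returns None if no such size exists in the scanned range.
    """
    for exponent in range(1, max_exponent + 1):
        n = 2 ** exponent
        if variant_c_cost(n, degree, eps) < classical_cost(n):
            return n
    return None


if __name__ == "__main__":
    # Idealized O(1) loading: a crossover exists for K = 2, none for K = 3.
    for degree in (2, 3):
        print(degree, crossover_size(degree, eps=0.01))
```

With unit constants and O(1) loading, a crossover only appears for $K = 2$, in line with the statement that the advantage requires $K \le 2$ and efficient data loading; the actual crossover size depends on the constants that are ignored here.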
to thousands. These random variables are usually assumed to be normally distributed and mutually correlated even across gas and temperature. If one then considers $R$ Monte Carlo repetitions, one has to generate a random sample linearly scaling with $R$.

As mentioned, the portfolio we focus on consists of full-supply contracts based on which the customer can nominate gas volumes at an agreed sales price, denoted $asp$. These contracts are then implicitly temperature dependent: indeed, their volume can be described by a function $f$ of the temperature which is supposed to model the customers' demand. Such a function is often a so-called2 sigmoid function, namely

\[ f(T') = \frac{A}{1 + \left( \frac{B}{T' - T_0} \right)^{C}} + D, \tag{1} \]

with $T' \le T_0$. Of course, the higher the temperature, the lower is the volume demand and vice-versa. The parameters $A \ge 0$, $B < 0$, $C > 1$, $D > 0$ are given, in the sense that they are either part of the contract or are estimated from the historical demand of the cluster a specific customer belongs to (e.g. households, medium-size enterprises, etc.), and normally one takes $T_0 = 40$ degrees Celsius.

Finally, we are given a vector $\tau'_j$ named season-normal temperature which describes our daily expectation of the temperature station.

In our study we focus on the calculation of the change of gross margin, defined as the (unknown random) difference between the net random sales less the random costs at a certain future time $j$ and the planned, therefore known, sales minus cost at the same future time $j$. Formally, the change (delta) gross margin ($\Delta GM$) of a contract, depending on a certain gas market and a certain temperature, can be written as

\[ \Delta GM = \sum_j \big( f(T'_j) - f(\tau'_j) \big) \big( asp - E'_j \big). \tag{2} \]

In this study, we present multiple quantum approaches to evaluate $\Delta GM$ given the temperature and energy price vectors, and discuss their potential advantages over classical counterparts under different conditions. More specifically, we provide methods to efficiently compute contract value functions of the form

\[ v = \sum_j f(T'_j)\, E'_j, \tag{3} \]

which can be used to reconstruct the expression in Eq. (2).

2 Despite the name being widespread in the energy industry, the function is not a sigmoid according to the usual mathematical definition.
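The statement that Eq. (2) can be reconstructed from value functions of the form of Eq. (3) can be checked classically with a short sketch. Expanding the product in Eq. (2) gives four inner products, each of the form $\sum_j f(\cdot)_j\, w_j$ with $w$ either the price vector or a constant vector. The parameter values, the synthetic temperature and price series, and the function names below are all hypothetical and chosen only for illustration.

```python
import numpy as np


def volume(temps, a, b, c, d, t0=40.0):
    """Eq. (1): f(T') = A / (1 + (B / (T' - T0))**C) + D, for T' < T0."""
    temps = np.asarray(temps, dtype=float)
    return a / (1.0 + (b / (temps - t0)) ** c) + d


def contract_value(volumes, weights):
    """Eq. (3): inner product of volumes with a weight (price) vector."""
    return float(np.dot(volumes, weights))


def delta_gross_margin(temps, normal_temps, prices, asp, **params):
    """Eq. (2), evaluated term by term."""
    diff = volume(temps, **params) - volume(normal_temps, **params)
    return float(np.sum(diff * (asp - np.asarray(prices, dtype=float))))


# Hypothetical one-year daily series (not data from the paper).
rng = np.random.default_rng(0)
days = 365
normal_temps = 10.0 - 8.0 * np.cos(2 * np.pi * np.arange(days) / days)
temps = normal_temps + rng.normal(0.0, 3.0, days)
prices = rng.uniform(20.0, 60.0, days)
params = dict(a=3.0, b=-35.0, c=6.0, d=0.1)  # A >= 0, B < 0, C > 1, D > 0
asp = 45.0

# Reconstruct Eq. (2) from four value functions of the form of Eq. (3).
f_t = volume(temps, **params)
f_tau = volume(normal_temps, **params)
ones = np.ones(days)
dgm_from_values = (
    asp * (contract_value(f_t, ones) - contract_value(f_tau, ones))
    - contract_value(f_t, prices)
    + contract_value(f_tau, prices)
)
dgm_direct = delta_gross_margin(temps, normal_temps, prices, asp, **params)
```

Both routes give the same number up to floating-point error, which is why the rest of the paper can concentrate on the single bilinear quantity $v$.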
Notations introduced across the paper are collected in Table 2.

2 Hybrid quantum-classical approach

Consider a single realization of the time series representing temperatures and prices, namely two real vectors $[T'_j]_{j=0}^{N-1}$ and $[E'_j]_{j=0}^{N-1}$ indexed over time. Suppose they are classically generated, and transformed into the normalized versions $[T_j]_{j=0}^{N-1}$ and $[E_j]_{j=0}^{N-1}$ through the affine maps:

\[ T_j = \rho_T (T'_j - \eta), \qquad E_j = \rho_E E'_j, \tag{4} \]

where $\eta$ is an appropriate translation to guarantee $T_j > 0$ for all $j$ at least in probability (refer to Assumption 2.1 below), and $\rho_T$ and $\rho_E$ are suitable scale factors:

\[ \rho_T^{-2} = \sum_{j=0}^{N-1} (T'_j - \eta)^2, \tag{5} \]

\[ \rho_E^{-2} = \sum_{j=0}^{N-1} (E'_j)^2. \tag{6} \]

Our proposed quantum algorithm, outlined in Fig. 3, approximates the volume function $f(T')$ by means of a polynomial of degree $K$. WLOG, we can write the polynomial in the form

\[ f(T'_j) \approx \sum_{k=0}^{K} b_k(\eta)\, (T'_j - \eta)^k, \tag{7} \]

where $b_k(\eta)$ are real coefficients. Consequently the contract value of Eq. (3) can be written as

\[ v = \sum_{j=0}^{N-1} E'_j f(T'_j) \approx \sum_{j,k} b_k(\eta)\, E'_j\, (T'_j - \eta)^k. \tag{8} \]

The algorithm starts by loading normalized temperature and price vectors into a quantum register. A sequence of non-linear transformations is then exploited to calculate the powers $T_j^k$ for all $j = 0, \ldots, N-1$ and $k = 0, \ldots, K$. Finally, the inner product of the processed vectors is efficiently evaluated, thus returning

\[ y_k := \sum_{j=0}^{N-1} E_j T_j^k \tag{9} \]

for all $k$. The estimation of $y_k$ is boosted by the QAE algorithm, that provides the quadratic speedup (up to polylogarithmic factors).

Finally, the contract value in Eq. (8) is reconstructed as the summation

\[ v \approx v^* := \sum_{k=0}^{K} \rho_T^{-k} \rho_E^{-1}\, b_k(\eta)\, y_k, \tag{10} \]

where $b_k(\eta)$ are known classically. It will be useful to write $v^* = \sum_k b_k(\eta)\, y'_k$ where

\[ y'_k := \rho_T^{-k} \rho_E^{-1}\, y_k. \tag{11} \]

In the remainder of this Section, we discuss in detail the key parts of the quantum algorithm, namely data encoding, power calculation, and inner product. We focus on the main implementation variant (c), while highlighting the key differences where relevant.

2.1 Data encoding and loading

There are multiple ways to represent the same data set in quantum registers [25–32], such as the basis (aka digital, or equally-weighted) encoding, the amplitude (aka analog) encoding, the angle encoding, etc. The subsequent quantum processing techniques are highly dependent on the data encoding protocol. Here, we focus on the amplitude encoding. Additionally, we introduce the Bidirectional Orthogonal Encoding (BOE), which can be seen as a variant of the amplitude encoding designed to balance circuit width and depth.

In the amplitude encoding, a normalized classical vector $[D_j]$ of linear size $N$ is represented as

\[ |\psi_D\rangle := \sum_{j=0}^{N-1} D_j\, |j\rangle, \tag{12} \]

where $\{|j\rangle\}$ are computational basis states of $\lg N$ qubits. While amplitude encoding offers an effective use of memory resources, the exact preparation of an arbitrary state $|\psi_D\rangle$ of the form shown in Eq. (12) requires $O(N)$ operations in the worst case, thus jeopardizing the benefits of many quantum algorithms. In practice, amplitude encoding remains attractive in combination with approximate data loading techniques (e.g., qGANs) [33, 34], in the presence of specific data structures [35] or under standard quantum memory assumptions [36]. In our complexity analysis, we assume access to amplitude-encoded quantum states to derive complexity considerations in the limit of qubit-efficient representations.

Appendix C details a second version of our protocol that exploits an alternative scheme, based
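[Figure 3 (flow diagram). Temperature branch: generation of normalized temperatures $[T_j]_{j=0}^{N-1}$; encoding of temperatures, $|\psi_T\rangle = \sum_j T_j |j\rangle$; calculation of powers, $|\psi_T^{(k)}\rangle = a_k \sum_j T_j^k |j\rangle$, $k = 2, \ldots, K$. Price branch: generation of normalized prices $[E_j]_{j=0}^{N-1}$; encoding of prices, $|\psi_E\rangle = \sum_j E_j |j\rangle$. Both branches feed the estimation of inner products, $y_k = \sum_j E_j T_j^k$, $k = 0, \ldots, K$, followed by the contract value reconstruction, $v \approx \sum_k \rho_T^{-k} \rho_E^{-1}\, b_k(\eta)\, y_k$.]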
Figure 3: Algorithm for the estimation of the contract value through the polynomial approximation of f . The
figure instantiates the approach represented in the first column of Fig. 1. Blue boxes represent classical pre- and
post-processing, while green boxes are the core quantum processing.
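The classical pre- and post-processing of Fig. 3 can be emulated with a few lines of NumPy. In the sketch below the quantum estimation of $y_k$ is replaced by an exact classical inner product, so it only verifies the bookkeeping of Eqs. (4)-(6) and (9)-(10) against the direct evaluation of Eq. (8). The data, the shift $\eta$, the coefficients $b_k$ and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def normalize(temps, prices, eta):
    """Eqs. (4)-(6): shift and rescale so that both vectors have unit 2-norm."""
    shifted = np.asarray(temps, dtype=float) - eta
    prices = np.asarray(prices, dtype=float)
    rho_t = 1.0 / np.linalg.norm(shifted)
    rho_e = 1.0 / np.linalg.norm(prices)
    return rho_t * shifted, rho_e * prices, rho_t, rho_e


def inner_products(t_norm, e_norm, degree):
    """Eq. (9): y_k for k = 0..K (classical stand-in for the quantum step)."""
    return np.array([np.dot(e_norm, t_norm ** k) for k in range(degree + 1)])


def reconstruct(y, b, rho_t, rho_e):
    """Eq. (10): v* = sum_k rho_T^-k rho_E^-1 b_k y_k."""
    k = np.arange(len(b))
    return float(np.sum(b * y * rho_t ** (-k) / rho_e))


# Hypothetical inputs (not data from the paper).
rng = np.random.default_rng(1)
temps = rng.uniform(-5.0, 25.0, 64)
prices = rng.uniform(20.0, 60.0, 64)
eta = temps.min() - 1.0              # guarantees T_j > 0 for all j
b = np.array([2.0, -0.05, 0.001])    # b_k(eta) for a degree-2 polynomial

t_norm, e_norm, rho_t, rho_e = normalize(temps, prices, eta)
y = inner_products(t_norm, e_norm, degree=len(b) - 1)
v_star = reconstruct(y, b, rho_t, rho_e)

# Direct evaluation of the right-hand side of Eq. (8).
v_direct = float(np.dot(prices, P.polyval(temps - eta, b)))
```

Since $y_k$ is computed exactly here, `v_star` and `v_direct` agree to floating-point precision; in the quantum algorithm the only additional error comes from the estimation of each $y_k$.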
on a newly introduced data encoding that we name Bidirectional Orthogonal Encoding (BOE). Even though in the current application the BOE does not achieve the same performance as the amplitude encoding, it may prove useful beyond our use case, as it is designed to balance memory and time resources (i.e., circuit width and depth). On one side, it builds upon the bidirectional encoding [19], and on the other side, it guarantees that side registers are orthogonal [37], thus enabling subsequent processing through swap tests. The BOE is therefore characterised by states of the form

\[ |\tilde\psi_D\rangle := \sum_{j=0}^{N-1} D_j\, |j\rangle\, |\phi_j^D\rangle, \tag{13} \]

where the second register is auxiliary, and contains states entangled to the main register, with the orthonormal property $\langle \phi_i^D | \phi_j^D \rangle = \delta_{ij}$. Similar to the bidirectional encoding, the BOE requires classical data to be organized in a binary tree structure (Fig. 13). The split level $s$ steers the balance between circuit depth and width: for $s = 1$, the width is $O(N)$ and the depth $O(\lg^2 N)$ (depth efficient), while for $s = \lg N$ the width is $O(\lg N)$ and the depth $O(N)$ (memory efficient, akin to amplitude encoding). The technique is less favourable than the amplitude encoding in our context as Eq. (10) gets replaced by Eq. (60), that has a quadratic dependence on the norms instead of a linear one, implying a worse scaling with $N$. Refer to Subsection 2.5 for more information on the error scaling. The data loading procedure is detailed in Subsection C.1.

2.2 Non-linear transformation with QHP

Let us now discuss the non-linear transformation, namely the calculation of monomial powers. Assume temperatures can be encoded in the amplitudes of a quantum state. More precisely, let $[T_j]_{j=0}^{N-1}$ be a normalized time series of temperatures, namely a vector of real numbers such that $\sum_j T_j^2 = 1$. We can assume $[T_j]_{j=0}^{N-1}$ is obtained from simulated temperatures through a translation and re-scaling as in Eq. (4). We are able to produce the state

\[ |\psi_T\rangle := \sum_{j=0}^{N-1} T_j\, |j\rangle. \tag{14} \]

In order to evaluate the approximating polynomial, we need the powers $T_j^k$, in the form of the state

\[ |\psi_T^{(k)}\rangle := a_k \sum_{j=0}^{N-1} T_j^k\, |j\rangle, \tag{15} \]

for $k = 1, \ldots, K$, where $a_k$ is the appropriate scale factor,

\[ a_k := \Big( \sum_j T_j^{2k} \Big)^{-1/2}. \tag{16} \]

Note that $a_k$ is an increasing finite sequence, when $k$ grows. In particular, $a_k \ge 1$ for all $k$ since $a_1 = 1$.
For the calculation of the powers, we resort to the non-linear transformation known as Quantum Hadamard Product (QHP), recently introduced by Holmes et al. [10]. Given two states $|\psi_0\rangle$ and $|\psi_1\rangle$, their QHP is the state

\[ |\psi_0 \odot \psi_1\rangle := a \sum_{j=0}^{N-1} \langle \psi_0 | j \rangle \langle \psi_1 | j \rangle\, |j\rangle, \tag{17} \]

which, in the circuit of Fig. 4, is obtained in the first register as the result of a post-selection conditioned on the second register being in $|0\rangle^{\otimes n}$. The probability to measure $|0\rangle^{\otimes n}$ in the second register, i.e. the success rate for the calculation of the QHP, equals $a^{-2}$. It is important to mention that the QHP was originally defined [10] as the (not necessarily normalized) weighted state arising from the application of a rank-1 measurement operator $|0\rangle\langle 0|^{\otimes n}$, i.e., as the output of the circuit in Fig. 4 without post-selection. Our simpler formulation in Eq. (17) is, nevertheless, fully equivalent and well suited for our purposes.

The QHP can easily be iterated to compute the Hadamard product of multiple vectors, hence producing higher-order powers. In particular, states of the form $|\psi_T^{(k)}\rangle$ as defined in Eq. (15) are obtained by loading $k$ copies of $|\psi_T\rangle$ as input states and calculating their QHPs:

\[ |\psi_T^{(k)}\rangle := \Big| \bigodot^{k} \psi_T \Big\rangle. \tag{18} \]

Figures 5 and 6 show two implementations for the QHP of four states. Note that the former requires mid-circuit measurements, and has higher depth but lower width than the latter. The former is also suited for an additional improvement, that we call dynamic stopping: thanks to the feature of 'dynamic circuits' recently made available on commercial hardware [38, 39], the execution of the circuit can be aborted right after a mid-circuit measurement if the measurement does not output a 0, as this corresponds to a failure in the QHP. This way, dynamic stopping allows reducing the average circuit depth.

In the main algorithm variant (c), highlighted in Fig. 2 and in Table 1, we want to exploit QAE and related techniques for improved performance, and therefore we need a unitary circuit, which forces us to adopt the implementation without mid-resets. Variants that include mid-resets and dynamic stopping are discussed in the Appendices A and D.

[Figure 4 (circuit diagram): two $n$-qubit registers prepared in $|\psi_0\rangle$ and $|\psi_1\rangle$; the bottom register is projected onto $|0\rangle\langle 0|$ and the top register outputs $|\psi_0 \odot \psi_1\rangle$.]

Figure 4: Circuit implementation of the Quantum Hadamard Product producing the state $|\psi_0 \odot \psi_1\rangle$ when the bottom register is measured in the state $|0\rangle^{\otimes n}$. The success probability is $a^{-2}$, where $a$ is the normalization constant appearing in Eq. (17).

2.3 Computing inner products

The third quantum step of the algorithm (Fig. 3) is the calculation of the inner products

\[ y_k := \sum_{j=0}^{N-1} E_j T_j^k, \tag{19} \]

for $k = 1, \ldots, K$.

Again, let us start assuming that we have two real vectors mapped to quantum states of the form $|\psi_\alpha\rangle = \sum_j \psi_j^{(\alpha)} |j\rangle$ via amplitude encoding. Multiple quantum techniques for the computation of the absolute inner product $|\langle \psi_0 | \psi_1 \rangle| = \big| \sum_j \psi_j^{(0)} \psi_j^{(1)} \big| =: p$ are known. Specifically, in Appendix A, we describe and compare the so-called swap test and the ancilla-free method. The latter is simply grounded on the observation that, given the two loading unitaries $U_T$ and $U_E$, the expression $\langle 0 | U_E^\dagger U_T | 0 \rangle$ calculates the inner product of the two vectors.

Now, the swap test provides an estimation of $\frac{1}{2} + \frac{1}{2} p^2$, while the ancilla-free method directly outputs $p^2$. As an effect, one obtains that with the ancilla-free method, the sampling complexity can be bounded independently of $p$, while with the swap test it is unbounded for $p \to 0$, as shown numerically in Fig. 10. In other words, with the swap test it is impossible to estimate the number of required samples to achieve a given precision, without knowing (an estimation of) the result itself, see Fig. 7. For this reason we choose, for all results discussed in the main text, the ancilla-free method in association with amplitude encoding. Refer to the Appendix A for more information on both techniques, and in particular to Table 6 for an in-depth comparison.

Assembling all the algorithm components introduced so far, the inner product $\langle \psi_T^{(k)} | \psi_E \rangle$ can be estimated, provided that the following holds
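[Figures 5 and 6 (circuit diagrams): QHP of four states $|\psi_0\rangle, \ldots, |\psi_3\rangle$ on $n$-qubit registers, producing $|\psi_0 \odot \psi_1 \odot \psi_2 \odot \psi_3\rangle$ on the first register. In Figure 5 a single auxiliary register is loaded in sequence with $|\psi_1\rangle$, $|\psi_2\rangle$, $|\psi_3\rangle$, with a projection onto $|0\rangle\langle 0|$ after each. In Figure 6 the states $|\psi_1\rangle$, $|\psi_2\rangle$, $|\psi_3\rangle$ are loaded on three separate registers, each projected onto $|0\rangle\langle 0|$.]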
Figure 6: QHP of k = 4 vectors with no mid-measurements and no mid-resets, requiring lg k = 2 iterations (note
that all measurements can be postponed to the end). This implementation is adopted in [18], without being named
QHP.
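The post-selection mechanism behind Figs. 4-6 can be emulated on small state vectors. The sketch below assumes that the QHP circuit consists of bitwise CNOT gates controlled by the first register and targeting the second, followed by the projection of the second register onto $|0\rangle^{\otimes n}$; this gate-level structure is an assumption of the example and is not spelled out in the text above. Real amplitudes and a dimension equal to a power of two are also assumed, and the function names are hypothetical.

```python
import numpy as np


def qhp(psi0, psi1):
    """Emulate one QHP; return (post-selected state, success probability)."""
    psi0 = np.asarray(psi0, dtype=float)
    psi1 = np.asarray(psi1, dtype=float)
    n = len(psi0)
    if n != len(psi1) or n & (n - 1):
        raise ValueError("both states need the same power-of-two dimension")
    joint = np.outer(psi0, psi1)  # amplitude of |i>|l> is psi0[i] * psi1[l]
    # Assumed circuit: bitwise CNOTs map |i>|l> to |i>|l XOR i>.
    idx = np.arange(n)
    after = np.zeros_like(joint)
    after[idx[:, None], idx[:, None] ^ idx[None, :]] = joint
    # Post-select the second register on |0...0>.
    unnormalized = after[:, 0]
    prob = float(np.sum(unnormalized ** 2))
    return unnormalized / np.sqrt(prob), prob


def hadamard_power(psi, k):
    """Iterate the QHP with k copies of psi; return (state, overall success)."""
    state, total = np.asarray(psi, dtype=float), 1.0
    for _ in range(k - 1):
        state, prob = qhp(state, psi)
        total *= prob
    return state, total


# Hypothetical normalized vector on 2 qubits (squares sum to 1).
psi_t = np.array([0.1, 0.5, 0.7, 0.5])
state_3, success_3 = hadamard_power(psi_t, 3)
```

Under these assumptions the output state has amplitudes proportional to $T_j^3$, and the overall success probability equals $\sum_j T_j^6 = a_3^{-2}$, consistent with the success rate quoted for Eq. (17).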
Assumption 2.1. For all powers $k = 0, \ldots, K$, we assume that the inner product of the normalized vectors $\sum_j T_j^k E_j$ is not null, and that its sign can be determined a priori.

For simplicity, we rely on the following stronger version:

Assumption 2.2. For all $j = 0, \ldots, N-1$, we assume $T_j > 0$ and $E_j > 0$, and $\sum_j T_j^k E_j$ is not null (and therefore positive).

and similarly for $E$, under hypothesis of availability of the binary trees for $\tilde T$ and $\tilde E$, so that the inner product provides the expected result. Details are given in Subsec. C.2.

[Figure 7 (plot). Horizontal axis: $p$, from 0 to 1. Vertical axis: sampling complexity (prefactor), from 0 to 4. Curves: direct method, $f(p) = \frac{1 - p^2}{4}$; swap test, $f(p) = \frac{1 - p^4}{4 p^2}$.]

Figure 7: The dependence of the sampling complexity on the target inner product $p$ between two statevectors. For the swap test, the complexity is not upper bounded when $p \to 0$. Refer to Prop. A.2 for the details. Numerical evidence can be found in Fig. 10.

2.4 Quantum Amplitude Estimation techniques

The final building block for our main implementation variant (c) is the Quantum Amplitude Estimation (QAE), that is typically described as the quantum alternative to classical Monte Carlo methods, providing a quadratic speedup against the classical version (see e.g. [40]). In this case, we are rather interested in the fact that it also provides a quadratic speedup in terms of precision when estimating the amplitude of a state, compared with the straightforward averaging of repeated shots of a quantum circuit (refer again to Ref. [40] for a concise explanation). We often use the star notation in front of QAE (*QAE) to emphasize that any QAE technique is applicable, such as Iterative QAE (IQAE) [20] or Dynamic QAE [1], as long as it shares the same substantial time-scaling with the usual QAE.
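As a baseline for the shot-averaging regime that *QAE improves upon, the sampling prefactors of Subsec. 2.3 can be reproduced with a small Monte Carlo experiment. The sketch assumes that the ancilla-free method returns the all-zero outcome with probability $p^2$ and that the swap test returns its "success" outcome with probability $\frac{1}{2} + \frac{1}{2} p^2$, and compares the rescaled variance of the resulting estimators of $p$ with the first-order prefactors $(1 - p^2)/4$ and $(1 - p^4)/(4p^2)$. Shot counts, seeds and function names are hypothetical.

```python
import numpy as np


def prefactor_direct(p):
    """Ancilla-free (direct) method: variance of the p-estimate times shots."""
    return (1.0 - p ** 2) / 4.0


def prefactor_swap(p):
    """Swap test: variance of the p-estimate times shots (diverges as p -> 0)."""
    return (1.0 - p ** 4) / (4.0 * p ** 2)


def empirical_prefactor(p, method, shots=10_000, repetitions=2_000, seed=0):
    """Estimate shots * Var(p_hat) by repeating a shot-averaging experiment."""
    rng = np.random.default_rng(seed)
    if method == "direct":
        freq = rng.binomial(shots, p ** 2, size=repetitions) / shots
        estimates = np.sqrt(freq)
    elif method == "swap":
        freq = rng.binomial(shots, 0.5 + 0.5 * p ** 2, size=repetitions) / shots
        estimates = np.sqrt(np.clip(2.0 * freq - 1.0, 0.0, None))
    else:
        raise ValueError("method must be 'direct' or 'swap'")
    return shots * float(np.var(estimates))


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.2):
        print(p,
              empirical_prefactor(p, "direct"), prefactor_direct(p),
              empirical_prefactor(p, "swap"), prefactor_swap(p))
```

In both cases the error of plain shot averaging decreases as the inverse square root of the number of shots; the role of *QAE in variant (c) is to improve this dependence quadratically, which is only possible for the unitary (measurement-deferred) circuits.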
2.5 Assembling the main variant and adapting the circuit for QAE

We assume data is available in the amplitude encoding. The circuit is fed with $k$ copies of the temperature state $|\psi_T\rangle$ in Eq. (14), so that the power state of Eq. (15) of the temperature vector is calculated through the QHP. Moreover, we resort to the QHP implementation without mid-circuit resets, as depicted in Fig. 6. Afterwards, under the assumption that a loading unitary $U_E$ for $|\psi_E\rangle$ is known, the inner products $y_k$ can be calculated with the ancilla-free method. The circuit described so far can be slightly modified as shown in Fig. 8(c), to be fed to a *QAE technique. Finally the volume $v$ is reconstructed classically via Eq. (10).

It should be noted now that various monomials in Eq. (7) affect the overall error differently, as a result of the factors $\rho_T^{-k} \rho_E^{-1}\, b_k(\eta)$ in Eq. (10). In order to achieve a relative error $\le \epsilon$ on the final result $v$, the absolute error of the monomial of index $k$ must be controlled by

\[ \epsilon_k := \epsilon\, \rho_T^{k-1}\, K^{-1}\, |b_k(\eta)|^{-1}. \tag{20} \]

The reason for such scaling is that the monomial of power $k$ decays as $\rho_T^{k}$ when $N$ grows, meaning that, when mapping back $y_k$ to $y'_k$, the absolute error $\epsilon_k$ associated to $y_k$ becomes $\epsilon'_k = \rho_T^{-k} \epsilon_k$ in terms of $y'_k$. On the other side, the final result $v$ only scales as $\rho_T^{-1}$ due to bilinearity. The relative error, defined as the absolute error divided by the target value, then scales as $\rho_T^{1-k} \epsilon_k$. The argument is formalized and discussed in Subsection A.4. Consequently, $\epsilon_k$ is the threshold used for the *QAE.

Theoretical details for the implementation outlined above are presented in the Appendix B, building on the version without *QAE introduced in Appendix A (specifically in Subsec. A.2).

3 Quantum complexity analysis

In this Section we discuss the space and time complexity of the algorithm.

The time complexity is summarized by the sum of the classical runtime and quantum runtime. The latter is defined in turn as the sum on $k = 0, \ldots, K$ of the circuit depths (number of layers in the quantum circuit) multiplied by the respective sampling complexities (number of circuit executions). This product is an approximate measure of the overall quantum circuit execution cost under the simplification that all circuit layers have similar duration.

The overall assumption is that the polynomial of degree $K$ represents the objective function sufficiently well on the relevant domain. Under this condition, we can measure the error scaling of each monomial individually, and combine them in an overall estimate.

Additionally we can consider the 2-norms of the input vectors to scale as $O(\sqrt{N})$ when $N$ grows: for instance, this holds when all temperatures (prices) are independently sampled from the same random variable $T'$ ($E'$), whatever its distribution, as shown in Subsection A.4 (see specifically Remark A.13 and Example A.14).

If an efficient loading procedure in amplitude encoding is known, having classical cost $C_{c,load}(N) \ll N$ and quantum depth $C_{d,load}(N) \ll N^{1/2}$, then the proposed main variant (c) has a quantum speedup against the classical case when $K = 2$. If instead $C_{c,load}(N) = O(1)$ and $C_{d,load}(N) = O(1)$, for $K = 3$ the quantum complexity is comparable to the classical one, up to logarithmic factors.

More in general, for any $K$, the time complexity of the main variant (c) is

\[ O\Big( C_{c,load}(N) + \epsilon^{-1}\, [C_{d,load}(N) + \lg N]\, N^{\frac{K-1}{2}} \Big), \]

where $\epsilon$ is the acceptable relative error threshold. This cost compares with classical benchmarks of $O(N)$, implying an advantage for $K \le 2$ and efficient data loading. Table 1 (and the extended Table 7) show that the other selected variants have higher costs. In particular, the analyzed QAE-free methods have the same asymptotic time complexity as the classical techniques when $N$ grows, if $K = 2$ and data loading is performed in $O(1)$ depth.

As a final remark on time, consider what happens if the same temperature series and the same price series are used for multiple contracts, that differ for the definition of the volume function $f$ (for instance, constants $A$, $B$, $C$, $D$ in Eq. (1) are varied). Then the polynomial approximation approach, whether implemented quantumly or classically, is advantageous as $y_k$ in Eq. (10) can be computed just once for the different versions of $f$. Only the polynomial coefficients $b_k(\eta)$ and the sum in Eq. (10) need to be recomputed for each $f$. Specifically for the quantum case, the circuits
[Figure 8: circuit diagrams, panels (a), (b) and (c).]
Figure 8: An exemplary demonstration of the algorithm based on QHP and the ancilla-free method, for the calculation
of the inner product between temperatures to the power k and prices, when k = 2. UT and UE are the loading
unitaries for T and E respectively. In (a), the algorithm in its original formulation, where the gray box highlights the
QHP. In (b), the version without mid-measurements. All measurements are deferred to the end. In (c), through a
multi-controlled NOT, a single qubit needs measurement. This suggests the definition of unitaries A and R, marked
in green, that can be fed to a QAE algorithm for efficient estimation, as described in Prop. B.2.
are executed once, independently of how many f functions are being evaluated.

The space complexity is represented by the circuit width $C_w$, namely the total number of qubits. For our main variant (c), it is O(k lg N).

The derivation of the theoretical bounds and their validation on simulators is contained in the Appendices, which consider one variant at a time. The detailed comparison of the variants, under multiple complexity metrics, can be found in Appendix D and specifically in Table 7.

4 Experimental demonstration

We focus our experiments on an instance as small as N = 4. We take realistic temperatures simulated for a weather station, and the volume function in Eq. (1) with parameters $T_0 = 40$, A = 20000, B = −35, C = 3, D = 6000. These constants specify the shape of the sigmoid function and are generated by fitting the curve to historic weather temperatures and gas volumes. Since all generated temperatures are positive, we take η = 0. For energy prices, we use a four-dimensional random vector.

For demonstration purposes, we consider here K = 3, even though our algorithm is asymptotically advantageous in N only for K = 2. Note that the asymptotic regime in N would require temperature and weather forecast data extending infinitely far into the future, which clearly does not occur in any real application as described here. Polynomial coefficients are $[b_k]_k$ = [17976, −360, −7.17, 0.0072] if we consider the Taylor expansion in 0, or $[b_k]_k$ = [17957, −393, −6.50, 0.225] if we consider the best-fit polynomial. In both cases, given that the $b_k$ are fast decreasing, the only relevant terms, with a 1% error threshold, are $b_0$ and $b_1$, according to Eq. (20). Let us emphasize again that this is due to small N, while for increasing N the term $\rho_T^K$ always becomes dominant, independently of the coefficients.

To provide a more insightful example, we run the algorithm with $\epsilon_k = 0.04$ for all k, violating the prescription in Eq. (20). We apply IQAE with 100 shots per iteration. Circuits are executed on the IBM Jakarta device, having 7 qubits, a Quantum Volume [41] of 16, and 2.4K CLOPS [42].

Data loading in amplitude encoding is performed through usual non-efficient methods, given the unstructured nature of input data. For circuit optimization and error reduction we use three techniques, namely: ‘Mapomatic’ for finding the best qubit-layout identifying the low noise
subgraph [43], ‘dynamic decoupling’ (using single X-gate configuration) for mitigating decoherence in the idle qubits [44] and ‘M3 error mitigation’ for measurement error mitigation [45].

Tables 3 and 4 show the outputs. As expected, the error is very high for k = 2 and 3, but contributes little to the overall result given the low associated coefficients $b_k(\eta)$. The error level for large k is not only a consequence of the choice of $\epsilon_k$, but mostly an effect of noise: indeed, some iterations have a depth as high as 596 or 652, with more than 400 CNOT gates, as shown in Table 5.

5 Conclusion

We proposed an approach for the calculation of the inner product $\sum_j f(T'_j) E'_j$ where $[T'_j]_j$ and $[E'_j]_j$ are two input vectors, and f is a function well approximated by a polynomial p. The approach allows breaking the workload into multiple parts, where the bottleneck becomes the calculation of the inner products $\sum_j T_j^k E_j$, where $[T_j]_j$ and $[E_j]_j$ are suitably normalized vectors. Consequently, we explored the application of quantum computing to accelerate the summations $\sum_j T_j^k E_j$ for all relevant k.

By applying additional QHPs, the method extends to the calculation of values resulting from multilinear functions $\sum_j X_{j,1} \cdots X_{j,v}$, where $X_{j,i}$ are the inputs, thus generalizing the bilinearity in our methodology. It also adapts to the case where $X_{j,i}$ result from the elementwise application of functions $f_i$, as long as the $f_i$ can be polynomially approximated by $p_i$ for all i. Said extensions, though, do not preserve the asymptotic analysis expressed in the manuscript.

To encode the input vectors, we adopted the amplitude encoding technique, after having evaluated the novel Bidirectional Orthogonal Encoding (BOE). The latter was proposed here to overcome the limitations of the bidirectional encoding for our purposes, and specifically for the calculation of inner products through the swap test.

We compared the ancilla-free method against the swap test approach for the calculation of inner products, discussing their asymptotic width and time complexity as functions of the data set size and the error, and highlighting that the ancilla-free method is preferable.

We proposed to calculate powers of the input vectors through QHP. We introduced a variant of QHP with dynamic stopping, thereby leveraging dynamic quantum circuits and achieving an improvement in the leading constants of time scaling compared to the naive implementation. This variant, though, is incompatible with QAE and is therefore excluded from the core algorithm.

We merged these building blocks and adapted them to apply *QAE, thus getting an asymptotic improvement against the classical performance under the assumption of efficient data loading, for degree K = 2 of the approximating polynomial. The optimal implementation was selected after evaluating multiple variants, collected and discussed in detail in the accompanying Appendices. We provided a rigorous analysis and evaluation of four variants of the algorithm, differing with respect to data loading, inner product computations, and sequential algorithmic steps. A potential area of future investigation is the scope of improvement if one adopts the Szegedy walk [46] to create the input.

Experiments on real quantum hardware were conducted for the main variant, and showed errors in line with theoretical expectations for small problem instances. The effect of noise becomes relevant for high power orders of polynomial expansion, and it is known from the theory that error gets amplified when dealing with lengthier input vectors. Extensive experiments on quantum simulators are included in the Appendices and validate the overall theoretical framework across the different variants.

Acknowledgements

G.A., K.Y., and O.S. acknowledge Travis L. Scholten, Raja Hebbar, and Morgan Delk for helping with the business case analysis; Francois Varchon, Winona Murphy, and Matthew Stypulkoski from the IBM Quantum Support team for helping to execute the experiments; Kristan Temme, Daniel Egger, and Stefan Woerner for their feedback on the manuscript; Jay Gambetta, Thomas Alexander and Sarah Sheldon for allocating compute time on advanced hardware; Maria Cristina Ferri, Jeannette M. Garcia, Gianmarco Quarti Trevano, Katie Pizzolato, Jae-Eun Park, Heather Higgins, and Saif Rayyan for their support in cross-team collaborations.
Symbol                                   Meaning
j = 0, ..., N − 1                        Index over time
$[T'_j]$, $[E'_j]$                       Temperature and price series
$[T_j]$, $[E_j]$                         Normalized temperature and price series, see Eqs. (5), (6)
$[\tilde{T}_j]$, $[\tilde{E}_j]$         Normalized square-rooted temperature and price series, see Eq. (53)
$\rho_T, \rho_E, \tilde\rho_T, \tilde\rho_E$   Normalization factors, see Eqs. (5), (6), (54)
η                                        Translation term for temperatures, see Eq. (4)
$f(T')$                                  Volume function, see Eq. (1)
v                                        Contract value, see Eq. (3)
$v^*$                                    Polynomial approximate contract value, see Eq. (10)
k = 0, ..., K                            Index over monomials in the approximating polynomial, see Eq. (7)
$b_k(\eta)$                              Coefficients in the approximating polynomial, see Eq. (7)
$y_k$                                    Normalized inner product defined in Eq. (9)
$y'_k$                                   Inner product defined in Eq. (11)
$|\psi_T\rangle$, $|\psi_T^{(k)}\rangle$           Amplitude encoding for $[T_j]$, $[T_j^k]$, see Eqs. (14) and (15)
$|\tilde\psi_{\tilde T}\rangle$, $|\tilde\psi_{\tilde T}^{(k)}\rangle$   BOE for $[T_j]$, $[T_j^k]$, see Eqs. (55) and (56)
$a_k$, $\tilde a_k$                      Normalization factor for $[T_j^k]$ and $[\tilde T_j^k]$, see Eqs. (16) and (57)
                                         Quantum Hadamard Product, see Eq. (17)
s                                        Split level in the BOE
S                                        Number of samples
$C_\cdot$                                Complexity measures, see Subsec. D.3
$\epsilon$, α                            Acceptable error threshold and associated confidence level for estimators
lg                                       Base-2 logarithm
$\|\cdot\|$                              2-norm of a vector (Euclidean norm)
Table 2: Summary of notations. Primed letters are used for the non-normalized version of variables, while tildes indicate normalization according to the square.
Overall   | Relative error for y'_k             | Error for the contribution of y'_k
relative  |                                     |
error     | k=0      k=1      k=2      k=3      | k=0      k=1      k=2      k=3
1.36%     | 1.71%    0.05%    16.52%   89.86%   | 2.23%    0.01%    0.83%    0.05%
1.02%     | 1.71%    0.37%    21.25%   76.94%   | 2.23%    0.09%    1.07%    0.04%
4.97%     | 1.67%    0.34%    57.79%   82.79%   | 2.18%    0.09%    2.92%    0.04%
4.76%     | 2.94%    0.07%    17.05%   78.73%   | 3.84%    0.02%    0.86%    0.04%
0.81%     | 0.20%    0.05%    20.57%   90.48%   | 0.26%    0.01%    1.04%    0.05%
2.58%     | 1.65%    0.18%    26.64%   83.76%   | 2.15%    0.04%    1.35%    0.04%
(0.0209)  | (0.0097) (0.0016) (0.1754) (0.0622) | (0.0126) (0.0004) (0.0080) (0.0000)

Table 3: Errors arising in the estimation, in five independent runs of the algorithm with N = 4 and K = 3. The algorithm is modified to set $\epsilon_k = 0.04$ as described in Sec. 4. The first column represents the overall relative error $\frac{V^* - v^*}{v^*}$. The second group of columns contains the relative error for the estimators of $y'_k$, namely $\frac{Y'_k - y'_k}{y'_k}$. The last group contains the same errors as the second, but rescaled according to the contributions they give to the overall estimation, namely $\frac{b_k(\eta)\,(Y'_k - y'_k)}{v^*}$. On the bottom two rows, the average and standard deviation of the values above. In the table, the coefficients $b_k(\eta)$ are those for the Taylor expansion centered in 0, see Sec. 4.
Overall   | Relative error for y'_k             | Error for the contribution of y'_k
relative  |                                     |
error     | k=0      k=1      k=2      k=3      | k=0      k=1      k=2      k=3
0.03%     | 1.71%    0.05%    16.52%   89.86%   | 2.23%    0.01%    0.76%    1.46%
0.10%     | 1.71%    0.37%    21.25%   76.94%   | 2.23%    0.10%    0.98%    1.25%
3.41%     | 1.67%    0.34%    57.79%   82.79%   | 2.19%    0.10%    2.66%    1.35%
5.93%     | 2.94%    0.07%    17.05%   78.73%   | 3.85%    0.02%    0.78%    1.28%
2.14%     | 0.20%    0.05%    20.57%   90.48%   | 0.27%    0.01%    0.95%    1.47%
2.32%     | 1.65%    0.18%    26.64%   83.76%   | 2.16%    0.05%    1.23%    1.36%
(0.0247)  | (0.0097) (0.0016) (0.1754) (0.0622) | (0.0127) (0.0004) (0.0080) (0.0010)

Table 4: Same as Table 3. Here, the coefficients $b_k(\eta)$ are those for the best-fit polynomial, see Sec. 4. Estimation is based on the same shots as in the previous Table; only the post-processing is modified to account for the modified coefficients.
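The classical post-processing referred to in the table captions can be sketched as follows. This is a minimal illustration, not the code used for the experiments. It assumes the normalization $T_j = \rho_T (T'_j - \eta)$ and $E_j = \rho_E E'_j$, with both normalized vectors of unit 2-norm (consistent with the relation $y'_k = \rho_T^{-k} \rho_E^{-1} y_k$ used in the Appendix). It computes $y_k$ exactly where the algorithm would use a quantum estimate, and the temperature and price values are made up. The helper names `normalize` and `contract_value` are hypothetical; the coefficients are the Taylor set quoted in Sec. 4.

```python
import math


def normalize(vec):
    """Return (rho, unit_vector), where rho = 1 / ||vec||_2 (assumed normalization)."""
    rho = 1.0 / math.sqrt(sum(x * x for x in vec))
    return rho, [rho * x for x in vec]


def contract_value(temps, prices, coeffs, eta=0.0, y_estimates=None):
    """Rebuild v* = sum_k b_k * rho_T^-k * rho_E^-1 * y_k from normalized inner products."""
    rho_t, t = normalize([x - eta for x in temps])
    rho_e, e = normalize(prices)
    total = 0.0
    for k, b_k in enumerate(coeffs):
        if y_estimates is None:
            # Exact classical y_k = sum_j E_j * T_j^k, a stand-in for the quantum estimate.
            y_k = sum(e_j * t_j ** k for e_j, t_j in zip(e, t))
        else:
            y_k = y_estimates[k]
        # Undo the normalization: y'_k = rho_T^-k * rho_E^-1 * y_k.
        total += b_k * rho_t ** (-k) / rho_e * y_k
    return total


# Made-up inputs with N = 4; Taylor coefficients as quoted in Sec. 4.
temps = [12.0, 8.5, 15.0, 10.0]
prices = [30.0, 42.0, 25.0, 38.0]
taylor = [17976, -360, -7.17, 0.0072]

v_star = contract_value(temps, prices, taylor)
# Direct evaluation of sum_j E'_j * p(T'_j) for comparison.
direct = sum(p * sum(b * t ** k for k, b in enumerate(taylor))
             for t, p in zip(temps, prices))
print(v_star, direct)
```

Changing the coefficient list (for instance to the best-fit set) only changes the final sum; the values $y_k$ are reused, which is why Table 4 can be derived from the same shots as Table 3.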
Table 5: Collection of the IQAE iterations needed for the estimation underlying Tables 3 and 4. Each row represents
an iteration, and contains the metrics of the circuit transpiled for IBM Jakarta. For high k, some circuits have depths
that lie beyond the possibilities of current hardware, even under error mitigation.
References

[1] Kumar Ghosh, Corey O’Meara, Kavitha Yogaraj, Gabriele Agliardi, Omar Shehab, Piergiacomo Sabino, Giorgio Cortiana, Marina Fernández-Campoamor, and Juan Bernabé-Moreno. “Energy contract portfolio risk analysis using quantum amplitude estimation”. unpublished (2023).
[2] Nikitas Stamatopoulos, Daniel J. Egger, Yue Sun, Christa Zoufal, Raban Iten, Ning Shen, and Stefan Woerner. “Option Pricing using Quantum Computers”. Quantum 4, 291 (2020).
[3] Stefan Woerner and Daniel J. Egger. “Quantum Risk Analysis”. npj Quantum Information 5, 15 (2019).
[4] Dylan Herman, Cody Googin, Xiaoyuan Liu, Alexey Galda, Ilya Safro, Yue Sun, Marco Pistoia, and Yuri Alexeev. “A Survey of Quantum Computing for Finance” (2022). arXiv:2201.02773.
[5] Gilles Brassard, Peter Høyer, Michele Mosca, and Alain Tapp. “Quantum amplitude amplification and estimation”. In Samuel J. Lomonaco and Howard E. Brandt, editors, Contemporary Mathematics. Volume 305, pages 53–74. American Mathematical Society (2002).
[6] Patrick Rall and Bryce Fuller. “Amplitude Estimation from Quantum Signal Processing” (2022). arXiv:2207.08628 [quant-ph].
[7] Adam Bouland, Wim van Dam, Hamed Joorati, Iordanis Kerenidis, and Anupam Prakash. “Prospects and challenges of quantum finance” (2020). arXiv:2011.06492 [quant-ph, q-fin].
[8] Casey Berger, Agustin Di Paolo, Tracey Forrest, Stuart Hadfield, Nicolas Sawaya, Michał Stęchły, and Karl Thibault. “Quantum technologies for climate change: Preliminary assessment” (2021). arXiv:2107.05362 [quant-ph].
[9] Daniel R. Terno. “Nonlinear operations in quantum-information theory”. Physical Review A 59, 3320–3324 (1999).
[10] Zoë Holmes, Nolan Coble, Andrew T. Sornborger, and Yiğit Subaşı. “On nonlinear transformations in quantum computation” (2021). arXiv:2112.12307 [quant-ph].
[11] Paweł Horodecki. “From limits of quantum operations to multicopy entanglement witnesses and state-spectrum estimation”. Physical Review A 68, 052101 (2003).
[12] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. “The quest for a quantum neural network”. Quantum Information Processing 13, 2567–2586 (2014).
[13] Iris Cong, Soonwon Choi, and Mikhail D. Lukin. “Quantum convolutional neural networks”. Nature Physics 15, 1273–1278 (2019).
[14] Kerstin Beer, Dmytro Bondarenko, Terry Farrelly, Tobias J. Osborne, Robert Salzmann, Daniel Scheiermann, and Ramona Wolf. “Training deep quantum neural networks”. Nature Communications 11, 808 (2020).
[15] Sarah K. Leyton and Tobias J. Osborne. “A quantum algorithm to solve nonlinear differential equations” (2008). arXiv:0812.4423 [quant-ph].
[16] Todd A. Bruni. “Measuring polynomial functions of states”. Quantum Info. Comput. 4, 401–408 (2004).
[17] Marco Maronese, Claudio Destri, and Enrico Prati. “Quantum activation functions for quantum neural networks”. Quantum Information Processing 21, 128 (2022).
[18] Michael Lubasch, Jaewoo Joo, Pierre Moinier, Martin Kiffner, and Dieter Jaksch. “Variational quantum algorithms for nonlinear problems”. Physical Review A 101, 010301 (2020).
[19] Israel F. Araujo, Daniel K. Park, Teresa B. Ludermir, Wilson R. Oliveira, Francesco Petruccione, and Adenilton J. da Silva. “Configurable sublinear circuits for quantum state preparation” (2022). arXiv:2108.10182 [quant-ph].
[20] Dmitry Grinko, Julien Gacon, Christa Zoufal, and Stefan Woerner. “Iterative quantum amplitude estimation”. npj Quantum Information 7, 52 (2021). arXiv:1912.05559.
[21] F.E. Benth and J. Saltyte-Benth. “Stochastic modelling of temperature variations with a view towards weather derivatives”. Applied Mathematical Finance 12, 53–85 (2005).
[22] F.E. Benth and J. Saltyte-Benth. “The volatility of temperature and pricing of weather derivatives”. Quantitative Finance 7, 553–561 (2007).
[23] L. Cucu, R. Döttling, P. Heider, and S. Maina. “Managing temperature-driven volume risks”. Journal of energy markets 9, 95–110 (2016).
[24] P. Sabino and N. Cufaro Petroni. “Fast Pricing of Energy Derivatives with Mean-Reverting Jump-diffusion Processes”. Applied Mathematical Finance 0, 1–22 (2021).
[25] Manuela Weigold, Johanna Barzen, Frank Leymann, and Marie Salm. “Encoding patterns for quantum algorithms”. IET Quantum Communication 2, 141–152 (2021).
[26] Adriano Barenco, Charles H Bennett, Richard Cleve, David P DiVincenzo, Norman Margolus, Peter Shor, Tycho Sleator, John A Smolin, and Harald Weinfurter. “Elementary gates for quantum computation”. Physical review A 52, 3457 (1995).
[27] P. Kumar. “Direct implementation of an n-qubit controlled-unitary gate in a single step”. Quantum information processing 12, 1201–1223 (2013).
[28] J. A. Cortese and T. M. Braje. “Loading classical data into a quantum computer” (2018). arXiv:1803.01958.
[29] M. Plesch and Časlav Brukner. “Quantum-state preparation with universal gate decompositions”. Phys. Rev. A 83, 032302 (2011).
[30] J. A. Miszczak. “Singular value decomposition and matrix reorderings in quantum information theory”. International Journal of Modern Physics C 22, 897–918 (2011).
[31] T. Heinosaari and M. Ziman. “Guide to mathematical concepts of quantum theory”. AcPSl 58, 487–674 (2008).
[32] I. Bengtsson and K. Zyczkowski. “Geometry of quantum states” (2006).
[33] Christa Zoufal, Aurélien Lucchi, and Stefan Woerner. “Quantum generative adversarial networks for learning and loading random distributions”. npj Quantum Information 5, 1–9 (2019).
[34] Gabriele Agliardi and Enrico Prati. “Optimal Tuning of Quantum Generative Adversarial Networks for Multivariate Distribution Loading”. Quantum Reports 4, 75–105 (2022).
[35] Lov Grover and Terry Rudolph. “Creating superpositions that correspond to efficiently integrable probability distributions” (2002). arXiv:quant-ph/0208112.
[36] Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. “Architectures for a quantum random access memory”. Physical Review A 78, 052310 (2008).
[37] Israel F Araujo, Daniel K Park, Francesco Petruccione, and Adenilton J da Silva. “A divide-and-conquer algorithm for quantum state preparation”. Scientific Reports 11, 1–12 (2021).
[38] “Quantum circuits get a dynamic upgrade with the help of concurrent classical computation”. url: https://www.ibm.com/blogs/research/2021/02/quantum-phase-estimation/ (2021).
[39] A. D. Córcoles, Maika Takita, Ken Inoue, Scott Lekuch, Zlatko K. Minev, Jerry M. Chow, and Jay M. Gambetta. “Exploiting dynamic quantum circuits in a quantum algorithm with superconducting qubits”. Phys. Rev. Lett. 127, 100501 (2021).
[40] Patrick Rebentrost, Brajesh Gupt, and Thomas R. Bromley. “Quantum computational finance: Monte carlo pricing of financial derivatives”. Physical Review A 98, 022321 (2018). arXiv:1805.00109.
[41] Andrew W. Cross, Lev S. Bishop, Sarah Sheldon, Paul D. Nation, and Jay M. Gambetta. “Validating quantum computers using randomized model circuits”. Physical Review A 100, 032328 (2019).
[42] Andrew Wack, Hanhee Paik, Ali Javadi-Abhari, Petar Jurcevic, Ismael Faro, Jay M. Gambetta, and Blake R. Johnson. “Quality, speed, and scale: three key attributes to measure the performance of near-term quantum computers” (2021). arXiv:2110.14108 [quant-ph].
[43] Matthew Treinish et al. “mapomatic: Automatic mapping of compiled circuits to low-noise sub-graphs”. url: https://github.com/Qiskit-Partners/mapomatic (2022).
[44] Qiskit software developers. “Dynamical decoupling insertion pass”. url: https://qiskit.org/documentation/stubs/qiskit.transpiler.passes.DynamicalDecoupling.html (2022).
[45] Paul D. Nation, Hwajung Kang, Neereja Sundaresan, and Jay M. Gambetta. “Scalable mitigation of measurement errors on quantum computers”. PRX Quantum 2, 040326 (2021).
[46] M. Szegedy. “Quantum Speed-Up of Markov Chain Based Algorithms”. In 45th Annual IEEE Symposium on Foundations of Computer Science. Pages 32–41. Rome, Italy (2004). IEEE.
[47] M. Fanizza, M. Rosati, M. Skotiniotis, J. Calsamiglia, and V. Giovannetti. “Beyond the swap test: Optimal estimation of quantum state overlap”. Physical Review Letters 124, 060503 (2020).
[48] Vanio Markov, Charlee Stefanski, Abhijit Rao, and Constantin Gonciulea. “A generalized quantum inner product and applications to financial engineering” (2022). arXiv:2201.09845.
[49] Harry Buhrman, Richard Cleve, John Watrous, and Ronald de Wolf. “Quantum fingerprinting”. Physical Review Letters 87, 167902 (2001).
[50] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. “An introduction to quantum machine learning”. Contemporary Physics 56, 172–185 (2015).
[51] Kouhei Nakaji. “Faster amplitude estimation”. Quantum Information and Computation 20, 1109–1123 (2020). arXiv:2003.02417.
[52] Ashley Montanaro. “Quantum speedup of monte carlo methods”. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 471, 20150301 (2015).
[53] Dmitri Maslov. “Advantages of using relative-phase Toffoli gates with an application to multiple control Toffoli optimization”. Physical Review A 93, 022311 (2016).
[54] Ewin Tang. “A quantum-inspired classical algorithm for recommendation systems”. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing. Pages 217–228. (2019). arXiv:1807.04271.
A Alternate ways to compute the inner product in amplitude encoding
Consider two vectors $|\psi_0\rangle$ and $|\psi_1\rangle$, and suppose one wants to calculate their inner product p. Multiple quantum techniques for the computation of their inner product are known [47]. In this Appendix we focus on two, namely the swap test and the ancilla-free method. We show in Appendix B how these methods can be further enhanced by resorting to Quantum Amplitude Estimation techniques, as suggested in Ref. [48].
A.1 The swap test and the ancilla-free method: definition and sampling complexity
The swap-test [49] is depicted in Fig. 9. Being characterized by low gate depth, it is widely used in
near-term applications including quantum machine learning [50].
To discuss its convergence, we need a Lemma:
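Lemma A.1. Let $\bar{X}_S$ be the mean of S i.i.d. random variables with mean $\mu > 0$ and variance $\sigma^2$, and let $Y_S := \sqrt{\max\{a\bar{X}_S + b, 0\}}$, for some real constants a, b with $a \neq 0$ and $a\mu + b > 0$. Then

$\frac{Y_S - \sqrt{a\mu + b}}{|a|\sigma / \sqrt{4S(a\mu + b)}}$

is asymptotically a standard normal random variable when $S \to \infty$. Therefore, the error is controlled by $P\left(\left|Y_S - \sqrt{a\mu + b}\right| < \epsilon\right) = \alpha$ once S is chosen as

$S = \frac{a^2 \sigma^2}{4(a\mu + b)\,\epsilon^2} \left[\Phi^{-1}\left(\frac{1+\alpha}{2}\right)\right]^2$ (21)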
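and

$P\left( \frac{(a\bar{X}_S + b) - (a\mu + b)}{|a|\sigma/\sqrt{S}} < \beta \right) \to \Phi^{-1}(\beta).$

Then also

$P\left( \frac{(a\bar{X}_S + b) - (a\mu + b)}{|a|\sigma/\sqrt{S}} < \beta + \beta^2 \frac{|a|\sigma/\sqrt{S}}{4(a\mu + b)} \right) \to \Phi^{-1}(\beta)$

by continuity of $\Phi^{-1}$, since the additional term is defined for $a\mu + b > 0$ and tends to 0. Rearranging the inequality:

$P\left( \frac{a\bar{X}_S + b}{|a|\sigma/\sqrt{S}} < \frac{a\mu + b}{|a|\sigma/\sqrt{S}} + \beta + \beta^2 \frac{|a|\sigma/\sqrt{S}}{4(a\mu + b)} \right) \to \Phi^{-1}(\beta)$

namely

$P\left( \frac{a\bar{X}_S + b}{|a|\sigma/\sqrt{S}} < c^2 \right) \to \Phi^{-1}(\beta)$ where $c := \sqrt{\frac{a\mu + b}{|a|\sigma/\sqrt{S}}} + \beta \sqrt{\frac{|a|\sigma/\sqrt{S}}{4(a\mu + b)}}$.

At the same time, the probability can be decomposed as

$P\left( \frac{a\bar{X}_S + b}{|a|\sigma/\sqrt{S}} < c^2 \,\Big|\, \bar{X}_S \ge 0 \right) P\left(\bar{X}_S \ge 0\right) + P\left( \frac{a\bar{X}_S + b}{|a|\sigma/\sqrt{S}} < c^2 \,\Big|\, \bar{X}_S < 0 \right) P\left(\bar{X}_S < 0\right)$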
Figure 9: The swap test. The probability to measure $|0\rangle$ in the result qubit is $\frac{1}{2} + \frac{1}{2} |\langle\psi_0|\psi_1\rangle|^2$.
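where the second term tends to 0 by the strong law of large numbers, since $\mu > 0$. On the first term, $a\bar{X}_S + b$ equals $Y_S^2$, so that we obtain

$P\left( \frac{Y_S^2}{|a|\sigma/\sqrt{S}} < c^2 \right) \to \Phi^{-1}(\beta)$

which rewrites as

$P\left( \frac{Y_S - \sqrt{a\mu + b}}{|a|\sigma/\sqrt{4S(a\mu + b)}} < \beta \right) \to \Phi^{-1}(\beta).$

This proves the asymptotic standard normality. Therefore $P\left(\left|Y_S - \sqrt{a\mu + b}\right| < \epsilon\right) = \alpha$ is equivalent to

$\frac{\epsilon}{|a|\sigma/\sqrt{4S(a\mu + b)}} = \Phi^{-1}\left(\frac{1+\alpha}{2}\right)$

asymptotically, where the last equation turns out to be (21).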
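As a quick numerical companion to Lemma A.1, the sample size of Eq. (21) can be evaluated directly. The sketch below is illustrative only: the function name `shots_from_lemma` and the chosen values of p, $\epsilon$ and α are assumptions, and the two parameter choices (a = 1, b = 0 and a = −2, b = 1) are the ones used later in this Appendix for the ancilla-free method and the swap test.

```python
import math
from statistics import NormalDist


def shots_from_lemma(a, b, mu, sigma2, eps, alpha):
    """Eq. (21): S = a^2 sigma^2 / (4 (a mu + b) eps^2) * [Phi^-1((1 + alpha) / 2)]^2."""
    target_sq = a * mu + b
    if a == 0 or target_sq <= 0:
        raise ValueError("Lemma A.1 requires a != 0 and a*mu + b > 0")
    z = NormalDist().inv_cdf((1 + alpha) / 2)  # quantile of the standard normal
    return a * a * sigma2 * z * z / (4 * target_sq * eps * eps)


p, eps, alpha = 0.767, 0.01, 0.95

# Ancilla-free method: Bernoulli outcomes with mean p^2, estimator sqrt(mean).
mu_free = p ** 2
shots_free = shots_from_lemma(1, 0, mu_free, mu_free * (1 - mu_free), eps, alpha)

# Swap test: Bernoulli outcomes with mean 1/2 - p^2/2, estimator sqrt(1 - 2 * mean).
mu_swap = 0.5 - 0.5 * p ** 2
shots_swap = shots_from_lemma(-2, 1, mu_swap, mu_swap * (1 - mu_swap), eps, alpha)

print(math.ceil(shots_free), math.ceil(shots_swap))
```

With these substitutions the general formula reduces to the two closed forms listed in Table 6.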
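Proposition A.2 (Sampling complexity of the swap test in amplitude encoding). Let α ∈ (0, 1). Let $X_i$, for i = 1, ..., S, be a r.v. representing the output of the swap-test measurement after the i-th shot of the circuit in Fig. 9. Call $\bar{X}_S$ the mean r.v. resulting from the S independent shots. Then $Y_S := \sqrt{\max\{0, 1 - 2\bar{X}_S\}}$ is an estimator for $p = \langle\psi_0|\psi_1\rangle$. Assuming p > 0, the error is controlled by

$P(|Y_S - p| < \epsilon) = \alpha$, (22)

once S is chosen as

$S = \frac{1 - p^4}{4\epsilon^2 p^2} \left[\Phi^{-1}\left(\frac{1+\alpha}{2}\right)\right]^2$ (23)

asymptotically when $\epsilon \to 0$, where Φ is the CDF of the standard normal distribution.

Proof. The probability to measure $|0\rangle$ in the swap test qubit is $\frac{1}{2} + \frac{1}{2} p^2$. Therefore, $X_i$ are independent Bernoulli with mean $\mu = \frac{1}{2} - \frac{1}{2} p^2$ and variance $\sigma^2 = \mu(1 - \mu) = \frac{1}{4} - \frac{1}{4} p^4$. Then apply Lemma A.1 with a = −2 and b = 1.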
Remark A.3. Since the presence of p in the denominator of Eq. (23) may come unexpected, let us shortly comment: it derives from the fact that our estimator is bound to the mean r.v. through a square root. Indeed, $\frac{\bar{X}_S - \mu}{\sigma/\sqrt{S}}$ being asymptotically standard normal is equivalent to $\frac{Y_S^2 - p^2}{2\sigma/\sqrt{S}}$ being asymptotically standard normal. Now, we can write $Y_S^2 - p^2 = (Y_S - p)(Y_S + p)$ and informally observe that $Y_S + p \to 2p$. So, informally, $\frac{2p(Y_S - p)}{2\sigma/\sqrt{S}} = \frac{Y_S - p}{\sigma/(p\sqrt{S})}$ is asymptotically standard normal, and p appears at the denominator in the estimator variance.
Remark A.4. The previous Proposition gives a sufficient condition for $y \neq 0$; we also have the following sufficient condition for $y_k = 0$ (and more in general for $|y_k| < \epsilon$):
" # 2
1 1 1+α
S≥ 2 − yk4 Φ −1
2 − yk2 ak ρ4k
4
T 2
via CLT applied to $Y_S^2 - y_k^2$. Notice that the denominator scales as $\epsilon^4$ instead of $\epsilon^2$ in this case.
We call ancilla-free method another, even simpler way [48] to calculate the inner product of two statevectors $|\psi_0\rangle$ and $|\psi_1\rangle$. Its application is possible once an (efficient) unitary for loading at least one of the two states, say $|\psi_1\rangle$, is known. Namely, an operator $U_1$ is given such that $U_1 |0\rangle = |\psi_1\rangle$. Indeed:

$\langle\psi_1|\psi_0\rangle = \langle 0| U_1^\dagger |\psi_0\rangle$, (24)

so that it is sufficient to build the $U_1^\dagger |\psi_0\rangle$ circuit, and project on $|0\rangle$.
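The identity behind the ancilla-free method can be checked numerically on a small real-valued example. The sketch below is only an illustration under stated assumptions: the loading unitary is replaced by a Householder reflection that maps the first basis vector to $|\psi_1\rangle$ (a convenient stand-in, not the loading circuit of the paper), the two vectors are made up, and the helper names are hypothetical.

```python
import math


def unit(vec):
    """Normalize a real vector to unit 2-norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def householder_from_e0(psi):
    """Real orthogonal matrix U with U e0 = psi, for a real unit vector psi.

    U = I - 2 w w^T / (w^T w) with w = e0 - psi; U is symmetric, so U^dagger = U.
    """
    n = len(psi)
    w = [(1.0 if i == 0 else 0.0) - psi[i] for i in range(n)]
    ww = sum(x * x for x in w)
    if ww < 1e-15:  # psi already equals e0: the identity does the job
        return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return [[(1.0 if i == j else 0.0) - 2.0 * w[i] * w[j] / ww for j in range(n)]
            for i in range(n)]


psi0 = unit([3.0, 1.0, 2.0, 1.0])
psi1 = unit([1.0, 2.0, 2.0, 4.0])
u1 = householder_from_e0(psi1)

# Amplitude of the all-zero outcome after applying U1^dagger to |psi0>.
amp0 = sum(u1[0][j] * psi0[j] for j in range(len(psi0)))
inner = sum(x * y for x, y in zip(psi0, psi1))
prob0 = amp0 ** 2  # probability of measuring 0, expected to equal p^2

print(amp0, inner, prob0)
```

The probability of reading the all-zero outcome equals the squared inner product, which is the Bernoulli mean used in the sampling analysis of this method.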
Proposition A.5 (Sampling complexity of the ancilla-free method). Let α ∈ (0, 1). Suppose the ancilla-free method for the calculation of the inner product is implemented, the register is measured, and the execution is repeated S times. Let $R_i \in \{0, ..., N-1\}$ be the measurement output for the i-th shot, for i = 1, ..., S, and let $X_i$ be a r.v. valued 1 if $R_i = 0$, and valued 0 otherwise. Call $\bar{X}_S$ the mean r.v. resulting from the S independent shots. Then $Y_S := \sqrt{\bar{X}_S}$ is an estimator for p, and the error is controlled by

$P(|Y_S - p| < \epsilon) = \alpha$, (25)

once S is chosen as

$S = \frac{1 - p^2}{4\epsilon^2} \left[\Phi^{-1}\left(\frac{1+\alpha}{2}\right)\right]^2$ (26)

asymptotically when $\epsilon \to 0$, where Φ is the CDF of the standard normal distribution.
Proof. By Eq. (24), $X_i$ are independent Bernoulli with mean $\mu = p^2$ and variance $\sigma^2 = \mu(1 - \mu) = p^2(1 - p^2)$. The proof is an application of Lemma A.1.
Remark A.6. Lemma A.1 and therefore the proof of Prop. A.2 leverage the fact that YS is definitely
positive. Nonetheless we comment in Fig. 10 that when p is small (p = 0.072), even with S as high
as 10, 000, we empirically get the left-hand side to be negative with a probability of 38%. This implies
that the estimator YS is remarkably biased for the swap test. Fortunately this is not the case of the
ancilla-free method since YS is always positive.
The key differences between the ancilla-free method and the swap test are summarized in Table 6. The sampling complexity is plotted in Fig. 7 as a function of the inner product p. It is easy to verify analytically that the sampling complexity of the swap test is unbounded for small p, which makes it impossible to choose the number of shots a priori. Fig. 10 empirically shows the different behavior of the two methods when p is large (left plot) rather than small (right plot).
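The behavior described above for small p can be reproduced with a purely classical simulation of the measurement statistics. This is a rough sketch rather than the experiment behind Fig. 10: it draws Bernoulli outcomes with the means given in the proofs of Props. A.2 and A.5 instead of running circuits, the seed and function names are arbitrary choices, and the printed counts will differ from the ones reported in the paper.

```python
import math
import random


def swap_test_estimate(p, shots, rng):
    """Return (estimate, radicand) for the swap test; outcome 1 has probability 1/2 - p^2/2."""
    mean = sum(rng.random() < 0.5 - 0.5 * p * p for _ in range(shots)) / shots
    radicand = 1.0 - 2.0 * mean  # estimates p^2 and may turn out negative
    return math.sqrt(max(0.0, radicand)), radicand


def ancilla_free_estimate(p, shots, rng):
    """Return the ancilla-free estimate; the all-zero outcome has probability p^2."""
    mean = sum(rng.random() < p * p for _ in range(shots)) / shots
    return math.sqrt(mean)


rng = random.Random(0)
p, shots, runs = 0.072, 10_000, 100

swap_runs = [swap_test_estimate(p, shots, rng) for _ in range(runs)]
free_runs = [ancilla_free_estimate(p, shots, rng) for _ in range(runs)]

# Number of swap-test runs whose estimated squared inner product is negative.
negative_runs = sum(radicand < 0 for _, radicand in swap_runs)
worst_free_error = max(abs(y - p) for y in free_runs)

print(negative_runs, worst_free_error)
```

For the swap test the radicand is a small difference of two quantities close to 1, so its sampling noise is comparable to $p^2$ itself; the ancilla-free radicand is a frequency and cannot become negative.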
A.2 Using the inner product after the Quantum Hadamard Product
Two aspects must be taken into account to exploit the inner product techniques expressed so far into our algorithm based on the polynomial expansion: on one hand, the effect of the rescaling factors $\rho_T$ and $\rho_E$ on the sampling complexity, and on the other one, the consequences of the success rate $a_k^{-2}$ of the QHPs.
Proposition A.7 (Algorithm QHP + swap test in amplitude encoding). Let k be a fixed power order. Consider a circuit that produces the state $|\psi_T^{(k)}\rangle$ defined in Eq. (15) through QHPs (with or without mid-measurements), then loads $|\psi_E\rangle$ and applies the swap test between $|\psi_T^{(k)}\rangle$ and $|\psi_E\rangle$, as depicted
                                        | Ancilla-free method | Swap test
Number of measured qubits               | lg N [Higher] | 1 [Lower]
Sampling complexity for absolute error  | $\frac{1 - p^2}{4\epsilon^2} \left[\Phi^{-1}\left(\frac{1+\alpha}{2}\right)\right]^2$ [Bounded in p] | $\frac{1 - p^4}{4\epsilon^2 p^2} \left[\Phi^{-1}\left(\frac{1+\alpha}{2}\right)\right]^2$ [Unbounded for p → 0]
$\epsilon$ and confidence α             |  |
Depth                                   | $U_1^\dagger$ is applied on the same register where $|\psi_0\rangle$ is loaded [Higher] | $|\psi_0\rangle$ and $|\psi_1\rangle$ are loaded in parallel, and the test takes depth O(lg N) [Lower]

Table 6: Comparison of the ancilla-free method and the swap test for the calculation of the inner product p between two statevectors $|\psi_0\rangle$ and $|\psi_1\rangle$.
[Figure 10: two plots, panels (a) and (b), of the estimated value (vertical axis) against the estimation experiment index from 0 to 100 (horizontal axis).]
Figure 10: Each figure shows 100 independent estimations of the inner product between the same two statevectors. All estimations are obtained through 10,000 shots of the circuits on a noiseless simulator, so that the different outcomes are only an effect of random sampling. (a) The target inner product is big (p = 0.767). In this case, the variance is $1.01 \cdot 10^{-5}$ for the ancilla-free method, and $2.77 \cdot 10^{-5}$ for the swap test. (b) The behavior of the swap test worsens when p is small (in this case, p = 0.072). The variance is $2.60 \cdot 10^{-5}$ for the ancilla-free method and $3.20 \cdot 10^{-3}$ for the swap test, which is significantly higher. Notice the presence of runs that provide a negative squared inner product, in 38 cases out of 100.
[Figure 11: circuit diagrams, panels (a) and (b).]
Figure 11: An exemplary demonstration of the algorithm based on QHP and the swap test, for the calculation of the
inner product between temperatures to the power k and prices, when k = 2. UT and UE are the loading unitaries
for T and E respectively. In (a), the algorithm in its original formulation, where the gray box highlights the QHP,
and the blue box the swap test. In (b), the version without mid-measurements. All measurements are deferred to
the end.
in Fig. 11. Call X ∈ {0, 1} the output of the measurement of the control qubit in the swap test, and $Z \in \{0, ..., N-1\}^{k-1}$ the outputs of all the k − 1 measurements in the QHPs. Define $X_i \sim X$ and $Z_i \sim Z$, for i = 1, ..., S, as the outcomes of S independent samples from the circuit. Let

$Y_S := \sqrt{\frac{2\#\{i : X_i = 0, Z_i = 0\} - \#\{i : Z_i = 0\}}{S}}$ ; $Y'_S := \rho_T^{-k} \rho_E^{-1} Y_S$.

Then

1. $E[Y_S] \to \sum_j E_j T_j^k =: y_k$ when S → ∞ and $E[Y'_S] \to \sum_j E'_j (T'_j - \eta)^k =: y'_k$ when S → ∞;

2. assuming $y_k \neq 0$, the absolute error for $Y_S$ is controlled by $P(|Y_S - y_k| < \epsilon) \le \alpha$ once S is chosen as

$S \ge \max\left\{ 4 y_k^2 (a_k^2 - 1), \; \frac{1 - a_k^4 y_k^4}{a_k^2 y_k^2} \right\} \frac{1}{\epsilon^2} \left[\Phi^{-1}\left(\frac{3+\alpha}{4}\right)\right]^2$ (27)

asymptotically when $\epsilon \to 0$, where Φ is the CDF of the standard normal distribution;
On the other hand,

$P(X = 0 \mid Z = 0) = \frac{P(X = 0, Z = 0)}{P(Z = 0)}$. (30)

Recalling that

$P(Z = 0) = a_k^{-2}$, (31)

by Eqs. (29) and (30) we derive

$P(X = 0, Z = 0) = \frac{1}{2a_k^2} + \frac{1}{2} \left( \sum_{j=0}^{N-1} E_j T_j^k \right)^2$

and therefore

$2P(X = 0, Z = 0) - P(Z = 0) = \left( \sum_{j=0}^{N-1} E_j T_j^k \right)^2$. (32)

The first claim is an application of the law of large numbers. For the second part of this claim, consider that $y'_k = \rho_T^{-k} \rho_E^{-1} y_k$.
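Therefore, let us name the two square roots $A_S$ and $B_S$ respectively. We know $B_S \to a_k^{-1}$ from Eq. (31), then

$P(|Y_S - y_k| < \epsilon) = P\left( \left| A_S B_S - A_S a_k^{-1} + A_S a_k^{-1} - y_k \right| < \epsilon \right)$
$\ge P\left( \left| A_S B_S - A_S a_k^{-1} \right| < \epsilon/2 \text{ and } \left| A_S a_k^{-1} - y_k \right| < \epsilon/2 \right)$
$= 1 - P\left( \left| A_S B_S - A_S a_k^{-1} \right| > \epsilon/2 \text{ or } \left| A_S a_k^{-1} - y_k \right| > \epsilon/2 \right)$
$\ge 1 - P\left( \left| A_S B_S - A_S a_k^{-1} \right| > \epsilon/2 \right) - P\left( \left| A_S a_k^{-1} - y_k \right| > \epsilon/2 \right)$
$= P\left( \left| A_S B_S - A_S a_k^{-1} \right| < \epsilon/2 \right) + P\left( \left| A_S a_k^{-1} - y_k \right| < \epsilon/2 \right) - 1$
$\ge 2\beta - 1 = \alpha$,

once we prove that the following inequalities hold for $\beta = \frac{1+\alpha}{2}$:

$P\left( \left| A_S B_S - A_S a_k^{-1} \right| < \epsilon/2 \right) \ge \beta$ (33)

$P\left( |A_S - a_k y_k| < a_k \epsilon/2 \right) \ge \beta$ (34)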
Let us start with the latter. Apply Prop. A.2 to $X_i$ conditioned to $Z_i = 0$, taking $a_k y_k$ as the $p$ in Prop. A.2, $a_k \epsilon / 2$ as the $\epsilon$ in Prop. A.2, $\#\{i : Z_i = 0\}$ as the $S$ in Prop. A.2, and $\beta$ as the $\alpha$ in Prop. A.2, thus getting
\[
\#\{i : Z_i = 0\} = \frac{1 - a_k^4 y_k^4}{a_k^4 y_k^2 \epsilon^2} \left( \Phi^{-1}\!\left( \frac{1 + \beta}{2} \right) \right)^2 .
\]
Since $\#\{i : Z_i = 0\}$ is asymptotic to $a_k^{-2} S$, Eq. (27) guarantees the last expression and therefore Eq. (34).
Let us consider Eq. (33) now. Since $A_S$ tends to $a_k y_k$, it is eventually dominated by $2 a_k y_k$. Eq. (33) is then implied by
\[
\left| B_S - a_k^{-1} \right| < \frac{\epsilon}{4 a_k y_k} .
\]
Now, $B_S^2$ is the mean of $S$ i.i.d. Bernoulli variables with $\mu = a_k^{-2}$ and $\sigma^2 = \mu(1 - \mu) = a_k^{-2}(1 - a_k^{-2})$.
which is again implied by Eq. (27). This way the second claim is proved.
The third claim derives from the second one, once we consider that $y_k = \sum_j E_j T_j^k \leq \|E\| \, \|T^k\| = a_k^{-1}$.
The last claim is trivial.
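As an illustration of the estimator discussed in this proof, the following Python sketch evaluates the shot count of Eq. (27) and simulates the measurement statistics classically. It is only a sketch under stated assumptions: the function names are ours, no quantum circuit is executed, and the outcome probabilities $P(Z=0) = a_k^{-2}$ and $P(X=0 \mid Z=0) = \frac{1}{2} + \frac{1}{2} a_k^2 y_k^2$ are the ones implied by Eqs. (30)-(32).

```python
import math
import random
from statistics import NormalDist


def shots_eq27(a_k: float, y_k: float, eps: float, alpha: float) -> int:
    """Number of shots suggested by Eq. (27) (asymptotic regime, eps -> 0)."""
    z = NormalDist().inv_cdf((3 + alpha) / 4)
    first = 4 * y_k**2 * (a_k**2 - 1)
    second = (1 - a_k**4 * y_k**4) / (a_k**2 * y_k**2)
    return math.ceil(max(first, second) * z * z / eps**2)


def simulate_squared_estimate(a_k: float, y_k: float, shots: int, rng: random.Random) -> float:
    """Classically sample (X, Z) and return (2#{X=0,Z=0} - #{Z=0}) / S.

    The value estimates y_k**2; Y_S is its square root when it is non-negative.
    For small S the value can be negative, in which case no real Y_S exists.
    """
    p_z0 = a_k ** -2                              # success rate of the QHPs, Eq. (31)
    p_x0_given_z0 = 0.5 + 0.5 * (a_k * y_k) ** 2  # swap-test statistics given success
    n_z0 = 0
    n_x0_z0 = 0
    for _ in range(shots):
        if rng.random() < p_z0:
            n_z0 += 1
            if rng.random() < p_x0_given_z0:
                n_x0_z0 += 1
    return (2 * n_x0_z0 - n_z0) / shots


# Hypothetical values, chosen only so that a_k * y_k <= 1.
needed = shots_eq27(a_k=2.0, y_k=0.3, eps=0.05, alpha=0.9)
squared = simulate_squared_estimate(2.0, 0.3, 200_000, random.Random(0))
```

With many shots the simulated value concentrates around $y_k^2$, while for few shots negative values do occur, in line with the behavior reported for the swap test.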
Proposition A.8 (Algorithm QHP + ancilla-free method in amplitude encoding). Let $k$ be a fixed power order. Implement a circuit that produces the state $|\psi_T^{(k)}\rangle$ defined in Eq. (15) through QHPs (with or without mid-measurements), then loads $|\psi_E\rangle$ and applies the ancilla-free method for the inner product between $|\psi_T^{(k)}\rangle$ and $|\psi_E\rangle$, and subsequently measures the target register, as depicted in Fig. 8(a) and (b). Let $R \in \{0, \dots, N-1\}$ be the measurement output of the target register, let $Z \in \{0, \dots, N-1\}^{k-1}$ be the outputs of all the $k-1$ measurements in the QHPs, and let $X$ be a r.v. valued 1 if $R = 0$ and $Z = 0$, and valued 0 otherwise. Consider $S$ independent shots, and let $X_i \sim X$ be their outcomes, for $i = 1, \dots, S$. Finally define
\[
Y_S := \sqrt{\bar{X}_S}; \qquad Y'_S := \rho_T^{-k} \rho_E^{-1} Y_S .
\]
Then
1. $\mathbb{E}[Y_S] \to \sum_j E_j T_j^k =: y_k$ when $S \to \infty$ and $\mathbb{E}[Y'_S] \to \sum_j E'_j (T'_j - \eta)^k =: y'_k$ when $S \to \infty$;
2. assuming $y_k \neq 0$, the absolute error for $Y_S$ is controlled by $P(|Y_S - y_k| < \epsilon) \geq \alpha$ once $S$ is chosen as
\[
S = \frac{1 - y_k^2}{4 \epsilon^2} \left( \Phi^{-1}\!\left( \frac{1 + \alpha}{2} \right) \right)^2 \tag{35}
\]
asymptotically when $\epsilon \to 0$, where $\Phi$ is the CDF of the standard normal distribution;
4. assuming $y'_k \neq 0$, any of the conditions in Eqs. (35) or (36) is also sufficient for the error of the originally scaled problem in the following sense: $P\left( |Y'_S - y'_k| < \rho_T^{-k} \rho_E^{-1} \epsilon \right) \geq \alpha$.
Now the first claim is an application of the law of large numbers. For the second part of this claim, consider that $y'_k = \rho_T^{-k} \rho_E^{-1} y_k$. The second claim is an application of Lemma A.1. The third and fourth
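\[
C_d(k) \leq m(k)\, C_{d,\mathrm{load}}(N) + k + (3 \lg N + 1)\, \delta^{(\mathrm{swap})}, \tag{38}
\]
where
\[
\delta^{(\mathrm{swap})} = \begin{cases} 1 & \text{if swap test is used,} \\ 0 & \text{if ancilla-free method is used,} \end{cases}
\]
\[
r(k) = \begin{cases} 2 & \text{if mid-reset is used,} \\ k + \delta^{(\mathrm{swap})} & \text{otherwise,} \end{cases}
\]
\[
m(k) = \begin{cases} k + (1 - \delta^{(\mathrm{swap})}) & \text{if mid-reset is used and } \mu_1 = \dots = \mu_{k-1} = 0, \\ t + (1 - \delta^{(\mathrm{swap})}) & \text{if mid-reset is used, } \mu_1 = \dots = \mu_{t-1} = 0 \text{ and } \mu_t = 1, \text{ some } t \leq k-1, \\ 1 + (1 - \delta^{(\mathrm{swap})}) & \text{if no mid-reset is used,} \end{cases}
\]
$\mu_t = 0$ being a successful application of the QHP (namely $\mu_t$ being the output of measuring the $Z_t$ variable, defined in Fig. 11 and Fig. 8). If mid-reset is used $m(k) \leq k$ and
\[
\mathbb{E}[m(k)] = k\, a_k^{-2(k-1)} + (1 - a_k^{-2}) \sum_{k'=1}^{k-1} k'\, a_k^{-2(k'-1)} .
\]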
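The expectation of $m(k)$ under dynamic stopping can be checked numerically. The Python sketch below is an illustration under our own reading of the formula: each of the $k-1$ QHPs is treated as succeeding independently with the same probability $p$, which stands for $a_k^{-2}$, and the offset $(1 - \delta^{(\mathrm{swap})})$ is omitted. The function names are hypothetical. The closed form $(1 - p^k)/(1 - p)$ used as a cross-check follows from $\mathbb{E}[m] = \sum_{t=1}^{k} P(m \geq t)$.

```python
def expected_loads(k: int, p: float) -> float:
    """E[m(k)] = k p^(k-1) + (1 - p) * sum_{t=1}^{k-1} t p^(t-1), with p = a_k^(-2)."""
    all_succeed = k * p ** (k - 1)
    stopped_early = (1 - p) * sum(t * p ** (t - 1) for t in range(1, k))
    return all_succeed + stopped_early


def expected_loads_closed_form(k: int, p: float) -> float:
    """Same expectation written as a truncated geometric series (requires p != 1)."""
    return (1 - p ** k) / (1 - p)


# Hypothetical example: k = 3 and a success probability of 1/2 per QHP.
value = expected_loads(3, 0.5)
```

In this reading the expected number of loading rounds stays below $1/(1-p)$ for every $k$, which is why dynamic stopping helps when the QHP success rate is low.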
Proof. Let us start by calculating the space complexity $C_w$ of the quantum circuit, namely the circuit width. The width required to load a data set of size $N$ is $\lg N$. Let us now justify the prefactor $r(k)$. When calculating the width required to encode data for the calculation, two scenarios must be taken into account. If we do not resort to mid-measurements, $k$ copies of $|\psi_T\rangle$ in different registers are needed, plus one copy of $|\psi_E\rangle$ that lies in a different register only in the case of the swap test. If we conversely can apply mid-circuit resets, the number of registers can be reduced to 2, regardless of $k$. Finally, the swap test requires only one additional qubit, and the overall space cost is that in Eq. (37).
Moving to depth, if mid-reset is not used, the data encoding of the $k$ copies of $|\psi_T\rangle$ is performed in parallel, as well as that of $|\psi_E\rangle$. Vice versa, if mid-reset is used, data encoding is done in series, and in a given shot, an iteration of encoding is performed only if the measurement of the previous iteration was successful. This is called dynamic stopping. The inequality and the expectation of $m(k)$ are obvious.
To complete the derivation of Eq. (38), notice that once data is loaded, the additional depth for each QHP is 1, since all CNOTs can be performed in parallel. Consequently, the additional depth required to produce $|\psi_T^{(k)}\rangle$ is $t$, and is therefore dominated by $k$. Finally, the swap test has a depth of $3n + 1$, as all swaps are controlled by the same ancilla.
and we want to verify that V is a good approximation for v, thanks to Eq. (10).
As a part of our asymptotic analysis, we shall discuss the error scaling when $N \to \infty$. Since we can expect the contract value to be affected by the growth of $N$, the analysis must be conducted under relative error.
Proposition A.10 (Convergence rate in amplitude encoding). Let $w \in [0, 1]^K$ such that $\sum_k w_k = 1$. Also, let $\alpha \in [0, 1)^K$ such that $\sum_k (1 - \alpha_k) = 1 - \beta$ for some $\beta \in (0, 1)$. Finally, let $\epsilon > 0$. For instance, one may take $w_k = K^{-1}$ for all $k$ and $\alpha_k = \frac{K - 1 + \beta}{K}$ for all $k$.
Then $V$ defined in Eq. (39) is an estimator for $v^*$ such that
\[
P(|V - v^*| \leq \epsilon) \geq \beta, \quad \text{where } v^* := \sum_{k=0}^{K} b_k(\eta)\, y'_k = \sum_{k=0}^{K} \rho_T^{-k} \rho_E^{-1} b_k(\eta)\, y_k,
\]
\[
P(|V - v^*| \leq \epsilon |v|) \geq \beta,
\]
provided that $P\left( |Y_k - y_k| \leq r_k\, \epsilon\, w_k\, |b_k(\eta)|^{-1} \right) \geq \alpha_k$ for all $k \leq K$, where
Finally, let $K$ be fixed. When $N$ grows, $r_K$ dominates the asymptotic behavior of $r_k$ for all other $k \leq K$.
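\begin{align*}
&\geq 1 - \sum_{k=0}^{K} P\left( \rho_T^{-k} \rho_E^{-1} |b_k(\eta)|\, |Y_k - y_k| \geq \epsilon w_k \right) \\
&\geq 1 - \sum_{k=0}^{K} (1 - \alpha_k) = \beta .
\end{align*}
The second claim is an application of the first one. As for the third, if $r_k \to 0$ for some $k$ when $N \to \infty$, then $\rho_T^k \rho_E \to 0$. In such case, since $\rho_T \leq 1$, $r_K$ goes to 0 at least as fast as $r_k$.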
Remark A.11. $\alpha_k = \frac{K - 1 + \beta}{K}$ is very close to 1 if $K$ grows, implying a very high sampling complexity from Eq. (28) or (36). Therefore the technique is effective only if $K$ is low. Additionally, let us point out that one may leverage the knowledge of $b_k(\eta)$ to refine the definition of $\alpha_k$ and $w_k$.
Remark A.12. Given the linear behavior of the inner product, we can assume $|v| = O(\rho_T^{-1} \rho_E^{-1})$ when
It is clear from the previous Proposition that the scaling of the error when $N$ grows is bound to that of the norms $\rho_E^{-1}$ and $\rho_T^{-1}$, as well as to the powers $k$.
Remark A.13. In general, it is reasonable to assume that the norms $\rho_E^{-1}$, $\rho_T^{-1}$ scale as $O(\sqrt{N})$: indeed if $T'_j$ are sampled from a same r.v. $T'$, $\|T'_j - \eta\| / \sqrt{N}$ tends to the finite quantity $\sqrt{\mathbb{E}[(T' - \eta)^2]}$, and similar for $E'$. In the same fashion, one can assume $|v| = O(N)$. As a consequence,
Figure 12 confirms the previous remark. Additionally, it shows the effect of bk in agreement with
Prop. A.10.
[Figure 12, plots: panel (a) S = 100; further panel titles are not recoverable. Left plots show the error relative to y'_k, right plots the error relative to |v|, both in percent, with legend entries k = 0, k = 1, k = 2.]
Figure 12: The error of the estimator $Y'_k$, relatively to its own target value $y'_k$ (left) and the error of the power contribution $b_k(\eta) Y'_k$, relatively to the global target $v$ (right), averaged over $S$ samples, for the ancilla-free algorithm without mid-resets. Each point in the plot is the average of 50 independent runs on the qasm simulator. Since the polynomial has a very low coefficient $b_3$, the error for $k = 3$ provides a modest contribution to the overall result for small problem sizes.
Example A.14. Consider a first problem. We are given two four-dimensional inputs $[T'_j]_0^3 = [x_j]_0^3$ and $[E'_j]_0^3 = [y_j]_0^3$, which we assume to be positive and normalized. For simplicity, take the volume function to be $f(x) = x^K$. Then the quantum algorithm is able to estimate $\sum_j (T'_j)^K E'_j = \sum_j x_j^K y_j$.
Now consider a second problem. This time, we are given as inputs the same values as before, but twice: so we have two eight-dimensional vectors $[T'_j]_0^7 = [x_0, \dots, x_3, x_0, \dots, x_3]$ and $[E'_j]_0^7 = [y_0, \dots, y_3, y_0, \dots, y_3]$. Obviously this time $\sum_j (T'_j)^K E'_j = 2 \sum_j x_j^K y_j$, and therefore we accept an error that is the double of the one that we would accept in the previous problem. Now, we apply the quantum algorithm: we encode $T_j = 2^{-1/2} T'_j$ and $E_j = 2^{-1/2} E'_j$, obtaining as a result $\sum_j 2^{-K/2} (T'_j)^K\, 2^{-1/2} E'_j = 2^{-K/2 - 1/2} \sum_j (T'_j)^K E'_j$, which needs to be rescaled by a factor $2^{K/2 + 1/2}$ to obtain the final result. Unfortunately though this rescaling implies an error propagation that is not 2, but $2^{K/2 + 1/2}$.
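The arithmetic of this example can be reproduced with a few lines of Python. The sketch below is only illustrative: the vectors `x` and `y` are hypothetical unit-norm data, $K = 3$ is an arbitrary choice, and the helper names are ours. For each of the two problems it computes the quantity the circuit would estimate on normalized data, the rescaling factor back to the original units, and the target value.

```python
import math


def normalize(vec):
    """Return the unit-norm version of vec together with its Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec], norm


def power_inner_product(t, e, K):
    """sum_j t_j^K e_j."""
    return sum(tj ** K * ej for tj, ej in zip(t, e))


K = 3
x = [0.1, 0.5, 0.7, 0.5]   # hypothetical temperatures, already of unit norm
y = [0.5, 0.5, 0.5, 0.5]   # hypothetical prices, already of unit norm

rows = []
for t_raw, e_raw in [(x, y), (x + x, y + y)]:   # first problem, then the doubled one
    t, t_norm = normalize(t_raw)
    e, e_norm = normalize(e_raw)
    encoded = power_inner_product(t, e, K)        # value available from normalized data
    factor = t_norm ** K * e_norm                 # rescaling to the original units
    target = power_inner_product(t_raw, e_raw, K)
    rows.append((factor, target, encoded * factor))
```

For these inputs the target doubles between the two problems, while the rescaling factor grows from 1 to $2^{K/2 + 1/2} = 4$, so any error on the encoded value is amplified twice as much as the tolerance grows.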
Consistently with Remark A.13, the previous example shows that the relative error scales as $r_k = O(N^{k/2 - 1/2})$. If we additionally consider that the sampling complexity of the method described in this Appendix scales as $O(\epsilon^{-2})$, and that we need to add the circuit depth scaling on top to calculate the quantum time, we conclude that we can improve on the classical case $O(N)$ only for $K = 1$. In the next Appendix, we introduce QAE to partially overcome such limitation.
B Applying Quantum Amplitude Estimation techniques in amplitude encoding
To outperform known classical results, we leverage the Quantum Amplitude Estimation technique [5] and its variants, such as Faster QAE [51], Iterative QAE (IQAE) [20], Chebyshev QAE (ChebQAE) [6] and Dynamic QAE [1].
Let us recall the general result for QAE. Suppose we are given a r.v. providing $x$ with probability $|w_x|^2$, with $x \in \{0, \dots, 2^n - 1\}$. Let $f$ be a real function defined on the same integer domain. Suppose we have an $n$-qubit loading unitary $A$ such that $A|0\rangle = \sum_x w_x |x\rangle$, and an $(n+1)$-qubit unitary $R$ such that $R|x\rangle|0\rangle = |x\rangle \left( \sqrt{1 - f(x)}\, |0\rangle + \sqrt{f(x)}\, |1\rangle \right)$. The objective is to estimate $\mathbb{E}[f(A)] = \sum_x f(x) |w_x|^2$. Define $F := R(A \otimes I)$ and $|\chi\rangle := F|0\rangle$. Let $Z := I - 2|0\rangle\langle 0|$ and $U := I - 2|\chi\rangle\langle\chi| = F Z F^\dagger$. The following holds:
Theorem B.1 (QAE scaling with bounded output values). Let $f$ and $A$ be as defined above, such that $f(A)$ is valued in $[0, 1]$. Let the desired accuracy be $\epsilon$. There exists a quantum algorithm, called QAE, that uses $O(1)$ copies of $|\chi\rangle$ and uses $U$ for a number of times $O(1/\epsilon)$, and estimates $z := \mathbb{E}[f(A)]$ up to an additive error $\epsilon$ with success probability at least $8/\pi^2 > 0.81$. It suffices to sample from the quantum circuit $S = O(\lg 1/(1 - \alpha))$ times and take the median $Z_S$ to obtain an estimate such that $P(|Z_S - z| \leq \epsilon) \geq \alpha$.
Proof. See [52, Thm 2.3 and Lemma 2.1]. Also refer to [40].
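The median step of the Theorem can be illustrated with a short classical computation. The Python sketch below assumes that the $S$ runs are independent, that $S$ is odd, and that each run lands within $\epsilon$ of $z$ with probability at least $q = 8/\pi^2$; the function names are ours. It uses the fact that the median is within $\epsilon$ whenever more than half of the runs are, so the binomial tail computed here is a lower bound on the success probability of the median.

```python
import math
from math import comb


def median_success_lower_bound(q: float, shots: int) -> float:
    """P(more than half of `shots` independent runs succeed), each with probability q."""
    need = shots // 2 + 1
    return sum(comb(shots, j) * q ** j * (1 - q) ** (shots - j)
               for j in range(need, shots + 1))


def shots_for_confidence(q: float, alpha: float) -> int:
    """Smallest odd number of runs whose median reaches confidence alpha."""
    shots = 1
    while median_success_lower_bound(q, shots) < alpha:
        shots += 2
    return shots


q = 8 / math.pi ** 2            # single-run success probability from the Theorem
three_runs = median_success_lower_bound(q, 3)
runs_for_99 = shots_for_confidence(q, 0.99)
```

The failure probability of the median decays exponentially in the number of runs, which is the origin of the $O(\lg 1/(1 - \alpha))$ dependence.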
The idea of the QAE is to connect the desired expectation value to an eigenfrequency of an oscillating quantum system and then use the phase estimation algorithm to obtain the estimation up to a desired accuracy [40]. The desired expectation $z$ is linked to the corresponding phase $\theta$ via
\[
z = \frac{1}{2} \left( 1 - \cos \frac{\theta}{2} \right), \tag{41}
\]
\[
|Z - z| = \frac{1}{2} \sin \frac{\theta}{2}\, \left| \hat{\theta} - \theta \right| + o\left( \left| \hat{\theta} - \theta \right| \right) = O\left( \left| \hat{\theta} - \theta \right| \right), \tag{43}
\]
through a Taylor expansion of Eq. (41), as shown for instance in Ref. [40, Appendix F].
Now, let us apply the Theorem to our case. Specifically, we start discussing QAE where the unitary is taken from the ancilla-free method without mid-measurements, as depicted in Fig. 8(c). Set $A$ to be the full unitary of the ancilla-free method, that loads temperatures, computes QHPs without mid-measurements, and applies the inverse of the price loading. By Eq. (24), we get $w_0 = \sum_j T_j^k E_j$, while $w_x$ is garbage for $x \neq 0$. Therefore it is sufficient to define
\[
f(x) := \begin{cases} 1 & \text{if } x = 0, \\ 0 & \text{otherwise,} \end{cases} \tag{44}
\]
and the algorithm will estimate the desired inner product. Implementing $f$ through a quantum circuit $R$ is trivial, as it is simply a multi-controlled NOT gate, testing all qubits in all registers to be 0. Now, $z$ is the squared inner product, so that we can use $Y_S := \sqrt{Z_S}$ to estimate the inner product $y := \sqrt{z}$. It turns out that the error bounds given for $z$ in Thm. B.1 are also valid for $y = \sqrt{z}$, as stated by the following Proposition.
Proposition B.2 (Oracle complexity for QHP + ancilla-free method + QAE). Let $f$ and $A$ be those specified right above. By applying QAE, it is possible to estimate $y_k := \sum_j T_j^k E_j$ up to an additive error $\epsilon$ with success probability at least $8/\pi^2 > 0.81$, using $O(1)$ copies of $|\chi\rangle$ and using $U$ for a number of times $O(1/\epsilon)$. It suffices to sample the circuit output $Y$ for $S = O(\lg 1/(1 - \alpha))$ times and take the median $Y_S$ of the samples, to obtain an estimate such that $P(|Y_S - y_k| \leq \epsilon) \geq \alpha$.
Proof. Applying the Taylor expansion to $y = \sqrt{z}$, as done in Eq. (43), one obtains again
\[
|Y - y| = O\left( \left| \hat{\theta} - \theta \right| \right) \tag{45}
\]
for $\hat{\theta} \to \theta$, uniformly in $\theta$. The rest of the proof proceeds as that of Thm. B.1.
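The uniformity in $\theta$ can be made explicit with the half-angle identity $\frac{1}{2}(1 - \cos\frac{\theta}{2}) = \sin^2\frac{\theta}{4}$: if the phase is taken in $[0, 2\pi]$ (an assumption of this sketch), then $y = \sqrt{z} = \sin\frac{\theta}{4}$, which is Lipschitz in $\theta$ with constant $1/4$. The following Python check, with hypothetical helper names, verifies both facts numerically on a grid.

```python
import math


def z_of(theta: float) -> float:
    """Eq. (41): z = (1 - cos(theta / 2)) / 2."""
    return 0.5 * (1.0 - math.cos(theta / 2.0))


def y_of(theta: float) -> float:
    """y = sqrt(z), clipped at zero to guard against rounding."""
    return math.sqrt(max(0.0, z_of(theta)))


grid = [2 * math.pi * i / 1000 for i in range(1001)]   # theta in [0, 2*pi]

# Largest deviation between sqrt(z) and sin(theta / 4) on the grid.
identity_gap = max(abs(y_of(t) - math.sin(t / 4)) for t in grid)

# Largest ratio |y(a) - y(b)| / |a - b| over neighboring grid points.
lipschitz_ratio = max(abs(y_of(a) - y_of(b)) / abs(a - b)
                      for a, b in zip(grid, grid[1:]))
```

Since the ratio never exceeds $1/4$, an error on the phase translates into an error on $y$ of at most a quarter of its size, regardless of $\theta$.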
Proposition B.3 (Oracle complexity for QHP + ancilla-free method + IQAE). The results of
Prop. B.2 are valid also if the Iterative QAE is applied instead of QAE.
Proof. The scaling of IQAE is grounded on the estimate Eq. (43) too, refer to [20, Algorithm 1 and
Appendix B]. Therefore, the same argument of Prop. B.2 can be adopted.
To perform a comparison between the classical and the quantum case, the cost of a single query
must be considered. The cost of U is obviously derived by the cost of A and of R. Now, the cost of
A was already calculated in Prop. A.9. As far as R is concerned, Ref. [26] shows that an n-controlled
NOT can either be implemented with 1 ancilla qubit and O(2n ) gates and depth [26, Lemma 7.1], or
with $n - 1$ ancillas with $O(n)$ gates and depth [26, Lemma 7.2]. Ref. [53] further reduced the number of ancillas necessary to achieve a linear depth, to $\lceil \frac{n-3}{2} \rceil$ for $n \geq 5$.
Proposition B.4. The implementation of the oracle U required by Prop. B.2 and B.3, has a width
and depth respectively of:
Cw (k) = O(k lg N ), (46)
Proof. Recall that $U = F Z F^\dagger$, where $F := R(A \otimes I)$ and $Z := I - 2|0\rangle\langle 0|$. $Z$ can be implemented with two 1-qubit gates, plus a $(k \lg N)$-controlled NOT.
Concerning width, $k$ copies of $|\psi_T\rangle$ need to be loaded in parallel for $A$, leading to $k \lg N$. The $(k \lg N)$-CNOT requires $O(k \lg N)$ ancillas [53]³. Finally, $R$ is a $(k \lg N)$-CNOT as well.
Moving to depth, the data loading is performed in parallel on the different registers for $|\psi_T\rangle$, as well as CNOTs for QHPs. The data loading of $|\psi_E\rangle$ is performed afterwards. The other operations are dominated by the two $(k \lg N)$-CNOTs, which require $O(k \lg N)$ depth.
Remark B.5. For the sake of completeness, let us highlight that the same QAE techniques can be applied to the swap test as well, with a slightly more complex design, without any additional advantage compared to QAE with the ancilla-free method. In this case, indeed, one needs to estimate two quantities: with a first estimation problem, by defining $f_1 = 1$ when QHPs are successful, one derives the success rate $a_k^{-2}$. Then a second estimation problem is run, by setting $f_2 = 1$ when both the QHPs are successful and the swap test ancilla provides 1. Finally, the two quantities are merged into Eq. (32), recalling also Eq. (31), to estimate the desired inner product.
We comment on the asymptotic performance of this technique, in comparison with the others, in Appendix D.
3
For detailed depth and width constants of n-CNOT refer to the cited manuscript.
[Figure 13, tree diagrams: on the left, a binary tree with root D_0² + D_1² + D_2² + D_3², internal nodes D_0² + D_1² and D_2² + D_3², and leaves D_0², D_1², D_2², D_3²; on the right, the corresponding tree of angles of the form 2 arcsin(·), built from ratios of these partial sums.]
Figure 13: Two classical binary tree representations of a data set [19]: on the left, the state decomposition representation, and on the right the angle representation. The state decomposition can be built bottom up starting from a classical array, and also applies to unnormalized data sets. The angle representation is specifically suited for quantum data loading, and can be derived by traversing the state decomposition tree top-down. Dashed nodes are redundant, since they can be inferred from their sibling.
It has the advantage of requiring only lg N qubits, but unfortunately needs O(N ) depth for exact
loading.
The D&C encoding is a variant of the analog encoding, recently proposed in Ref. [37]. We call it divide-and-conquer encoding, or shortly D&C encoding, after the paper title. In this case, the state produced is
\[
|\tilde{\psi}_D\rangle := \sum_{j=0}^{N-1} D_j\, |j\rangle\, |\tilde{\phi}_j\rangle . \tag{49}
\]
The idea is to resort to an additional register, that contains auxiliary qubits, entangled with the main register.
The advantage of this method is that exact loading can then be performed efficiently, namely in $O(\lg^2 N)$. The downside is that the required side register is sized $O(N)$.
This led to the definition of the bidirectional encoding [19], a configurable mixed encoding, that combines the amplitude and the D&C approaches, and defines a family of encoding techniques parameterized over a so-called split level $s \in \{1, \dots, \lg N\}$, that steers the balance between circuit depth and width. For $s = 1$ the encoding coincides with the D&C, while for $s = \lg N$ the amplitude encoding is retrieved. Finally, for $s = \frac{1}{2} \lg N$, it is possible to achieve a sublinear scaling both in depth and width. The state takes again the form of Eq. (49).
Despite the similarity between Eqs. (48) and (49), and despite the fact that measurement of the primary register provides the same results in both cases, it is essential to remark that D&C is in fact a different encoding from the amplitude encoding. The algorithms that require the amplitude encoding cannot all be trivially applied to data in the D&C encoding, and specifically the techniques introduced so far for the calculation of the inner product do not apply in the D&C encoding. Even more so, they cannot be employed in the bidirectional encoding.
Here the D&C-orth encoding comes to the aid. As the original paper [37] shows, the D&C encoding can be modified to guarantee that the auxiliary states are orthonormal, i.e. $\langle \phi_i | \phi_j \rangle = \delta_{i,j}$, at the expense of an additional side register of small width $\lg N$. The new encoding is relevant for us, since it is compatible with the application of the swap test, that does not provide the same result as in the amplitude encoding (see Prop. C.2), but is still useful for the calculation of the inner product.
Combining these elements, we define the Bidirectional Orthogonal Encoding (BOE) in the following way:
\[
|\tilde{\psi}_D\rangle := \sum_{j=0}^{N-1} D_j\, |j\rangle\, |\tilde{\phi}_j\rangle\, |j\rangle = \sum_{j=0}^{N-1} D_j\, |j\rangle\, |\phi_j\rangle, \tag{50}
\]
where $\sum_{j=0}^{N-1} D_j |j\rangle |\tilde{\phi}_j\rangle$ is constructed through the bidirectional encoding, and the additional third register guarantees orthonormality of $\{|\phi_j\rangle\}_j$.
When $s = 1$, the D&C-orth encoding is retrieved.
Proposition C.1 (Circuit depth and width of the BOE). Let $[D_j]$ be a vector of length $N$, a power of 2, and suppose a classical binary tree representation is available. Let $s$ be any integer in $\{1, \dots, \lg N\}$ called split level. Then the state in Eq. (50) can be constructed in depth $2^s + \frac{1}{2}\left( \lg^2 N - \lg N - s^2 + s \right) + 1 = O\left( 2^s + \lg^2 N - s^2 \right)$ and width $(s+1) N 2^{-s} - 1 + \lg N = O\left( (s+1) N 2^{-s} \right)$.
Proof. It is known [19] that $\sum_{j=0}^{N-1} D_j |j\rangle |\tilde{\phi}_j\rangle$, namely the associated bidirectional encoding, can be constructed in depth $2^s + \frac{1}{2}\left( \lg^2 N - \lg N - s^2 + s \right)$ and width $(s+1) N 2^{-s} - 1$. The conclusion is then trivial.
For $s = \frac{1}{2} \lg N$, both width and depth are sublinear.
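To get a feeling for the trade-off steered by the split level, the following Python sketch evaluates the depth and width of Prop. C.1 for a given $N$ and $s$. It is an illustration under one assumption: the factor $\frac{1}{2}$ in the depth is read as applying to the whole bracket $\lg^2 N - \lg N - s^2 + s$. The function name is ours.

```python
import math


def boe_resources(N: int, s: int):
    """Depth and width of the BOE loading circuit for split level s (Prop. C.1)."""
    n = int(math.log2(N))
    if 2 ** n != N:
        raise ValueError("N must be a power of 2")
    if not 1 <= s <= n:
        raise ValueError("the split level must lie in {1, ..., lg N}")
    # n*n - n and s*s - s are both even, so the integer division is exact.
    depth = 2 ** s + (n * n - n - s * s + s) // 2 + 1
    # 2**s divides N, so (s + 1) * N / 2**s is an integer.
    width = (s + 1) * N // 2 ** s - 1 + n
    return depth, width


N = 2 ** 16
dc_orth = boe_resources(N, 1)          # D&C-orth end of the family
balanced = boe_resources(N, 8)         # s = (lg N) / 2
amplitude_like = boe_resources(N, 16)  # s = lg N
```

For $N = 2^{16}$ the two extremes are shallow but wide ($s = 1$) and narrow but deep ($s = \lg N$), while the intermediate split level keeps both figures far below $N$.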
Proposition C.2 (Swap test in the BOE). Let $[D_j^{(0)}]$ and $[D_j^{(1)}]$ be two vectors of length $N$, a power of 2, represented in the BOE with split level $s$. Apply the swap test between the primary register of the two statevectors (see Fig. 9). Then
1. The swap test qubit is measured in the state $|1\rangle$ with probability $\frac{1}{2} + \frac{1}{2} \sum_j \left( D_j^{(0)} \right)^2 \left( D_j^{(1)} \right)^2$.
2. Let $\epsilon > 0$ and $\alpha \in (0, 1)$. Let $X_i$, for $i = 1, \dots, S$, be a r.v. representing the output of the swap-test measurement after the $i$-th shot of the circuit. Call $\bar{X}_S$ the mean r.v. resulting from the $S$ independent shots. Then $Y_S := 1 - 2\bar{X}_S$ is an estimator for $p = \sum_j \left( D_j^{(0)} \right)^2 \left( D_j^{(1)} \right)^2$ and the error is controlled by
\[
P(|Y_S - p| < \epsilon) = \alpha, \tag{51}
\]
once $S$ is chosen as
\[
S = \frac{1 - p^2}{\epsilon^2} \left( \Phi^{-1}\!\left( \frac{1 + \alpha}{2} \right) \right)^2 \tag{52}
\]
asymptotically when $S \to \infty$, where $\Phi$ is the CDF of the standard normal distribution.
Proof. The proof of the first claim is the same as that of the swap test for the D&C encoding, see [37]. For the second claim, consider that $X_i$ are i.i.d. Bernoullis with mean $\mu = \frac{1}{2} - \frac{1}{2} p$ and variance $\sigma^2 = \mu(1 - \mu) = \frac{1}{4} - \frac{1}{4} p^2$. By the Central Limit Theorem, $\frac{\bar{X}_S - \mu}{\sigma / \sqrt{S}}$ is asymptotically a standard normal. Therefore $\frac{Y_S - p}{2\sigma / \sqrt{S}} = \frac{Y_S - p}{\sqrt{1 - p^2} / \sqrt{S}}$ is asymptotically a standard normal as well.
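The sample size of Eq. (52) can be checked against a classical simulation of the swap-test outcomes. The Python sketch below is only a sanity check under stated assumptions: it follows the convention of the proof, where each $X_i$ is a Bernoulli variable with mean $\frac{1}{2} - \frac{1}{2} p$, it draws the outcomes with a pseudo-random generator instead of running a circuit, and all function names and numerical values are ours.

```python
import math
import random
from statistics import NormalDist


def shots_eq52(p: float, eps: float, alpha: float) -> int:
    """Sample size of Eq. (52), rounded up to an integer."""
    z = NormalDist().inv_cdf((1 + alpha) / 2)
    return math.ceil((1 - p * p) / eps ** 2 * z * z)


def simulate_estimator(p: float, shots: int, rng: random.Random) -> float:
    """Y_S = 1 - 2 * mean(X_i), with X_i ~ Bernoulli(1/2 - p/2) as in the proof."""
    mu = 0.5 - 0.5 * p
    ones = sum(rng.random() < mu for _ in range(shots))
    return 1 - 2 * ones / shots


def empirical_coverage(p: float, eps: float, alpha: float, runs: int, seed: int = 0) -> float:
    """Fraction of independent runs whose estimate falls within eps of p."""
    rng = random.Random(seed)
    shots = shots_eq52(p, eps, alpha)
    hits = sum(abs(simulate_estimator(p, shots, rng) - p) < eps for _ in range(runs))
    return hits / runs


shots = shots_eq52(p=0.5, eps=0.05, alpha=0.95)
coverage = empirical_coverage(p=0.5, eps=0.05, alpha=0.95, runs=400)
```

With these hypothetical parameters the observed coverage should be close to the requested $\alpha$, up to the statistical fluctuation of the finite number of runs.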
C.2 Data encoding and inner product
The previous Proposition motivates us to load the normalized versions of $\sqrt{T' - \eta}$ and $\sqrt{E'}$, under the Assumption 2.2 that all terms are positive, so that the estimator provides a normalized version of the desired inner product. More specifically, define
\[
\tilde{T}_j = \tilde{\rho}_T \sqrt{T'_j - \eta}, \qquad \tilde{E}_j = \tilde{\rho}_E \sqrt{E'_j}, \tag{53}
\]
where
\[
\tilde{\rho}_T^{-2} = \sum_{j=0}^{N-1} (T'_j - \eta), \qquad \tilde{\rho}_E^{-2} = \sum_{j=0}^{N-1} E'_j . \tag{54}
\]
when successful, preserving orthonormality of the side register, where $\tilde{a}_k$ is the appropriate scale factor
\[
\tilde{a}_k := \Big( \sum_j \tilde{T}_j^{2k} \Big)^{-1/2}. \tag{57}
\]
as we demonstrate shortly. Before moving to that, it is worth underlining some differences between the newly introduced $\tilde{a}_k$, $\tilde{y}_k$ and the previous $a_k$, $y_k$. First of all, $\tilde{y}_k$ scales quadratically with factors applied to $[\tilde{T}_j^k]$ or $[\tilde{E}_j]$, while $y_k$ scales linearly with $[T_j^k]$ and $[E_j]$. We will comment later on this fact, that has a major impact on performance (see Prop. C.6). Secondly, and consistently with the former observation, the bounds $0 \leq a_k y_k \leq 1$ are now replaced by $0 \leq \tilde{a}_k^2 \tilde{y}_k \leq 1$. Indeed:
\[
\tilde{y}_k = \sum_j \tilde{E}_j^2 \tilde{T}_j^{2k} \leq \left\| \tilde{E}_j^2 \right\| \left\| \tilde{T}_j^{2k} \right\| = \left\| \tilde{E}_j \right\|_4^2 \left\| \tilde{T}_j^k \right\|_4^2 \leq \left\| \tilde{E}_j \right\|^2 \left\| \tilde{T}_j^k \right\|^2 = \tilde{a}_k^{-2}. \tag{59}
\]
Even though the estimation method is different, $y'_k$ and therefore $v^*$ are the same as in the amplitude encoding.
Proposition C.4 (Algorithm BOE + swap test). Let $k$ be a fixed power order. Implement a circuit that produces the state $|\tilde{\psi}_{\tilde{T}}^{(k)}\rangle$ defined in Eq. (56) through QHPs (with or without mid-measurements), then loads $|\tilde{\psi}_{\tilde{E}}\rangle$ and applies the swap test between $|\tilde{\psi}_{\tilde{T}}^{(k)}\rangle$ and $|\tilde{\psi}_{\tilde{E}}\rangle$, as depicted in Fig. 14. Call $X \in \{0, 1\}$ the output of the measurement of the control qubit in the swap test, and $Z \in \{0, \dots, N-1\}^{k-1}$ the outputs of all the $k-1$ measurements in the QHPs. Define $X_i \sim X$ and $Z_i \sim Z$, for $i = 1, \dots, S$, as the outcomes of $S$ independent samples from the circuit. Let
\[
\tilde{Y}_S := \frac{2\,\#\{i : X_i = 0, Z_i = 0\} - \#\{i : Z_i = 0\}}{S}; \qquad Y'_S := \tilde{\rho}_T^{-2k} \tilde{\rho}_E^{-2} \tilde{Y}_S .
\]
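Then
1. $\mathbb{E}[\tilde{Y}_S] \to \sum_j \tilde{E}_j^2 \tilde{T}_j^{2k} =: \tilde{y}_k$ when $S \to \infty$ and $\mathbb{E}[Y'_S] \to \sum_j E'_j (T'_j - \eta)^k =: y'_k$ when $S \to \infty$;
2. the absolute error for $\tilde{Y}_S$ is controlled by $P\left( |\tilde{Y}_S - \tilde{y}_k| < \epsilon \right) \geq \alpha$ once $S$ is chosen as
\[
S \geq \max\left\{ 16\, \tilde{y}_k^2 (\tilde{a}_k^2 - 1),\ 4\, \frac{1 - \tilde{a}_k^4 \tilde{y}_k^2}{\tilde{a}_k^2} \right\} \frac{1}{\epsilon^2} \left( \Phi^{-1}\!\left( \frac{3 + \alpha}{4} \right) \right)^2 \tag{62}
\]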
Proof. The proof follows that in Prop. A.7. In particular, the first claim is straightforward. Concerning the second claim, one defines
\[
A_S := 1 - 2\, \frac{\sum_{i : Z_i = 0} X_i}{\#\{i : Z_i = 0\}} \quad \text{and} \quad B_S := \frac{\#\{i : Z_i = 0\}}{S} .
\]
For the latter, apply Prop. C.2 to $X_i$ conditioned to $Z_i = 0$, taking $\tilde{a}_k^2 \tilde{y}_k$ as the $p$ in Prop. C.2, $\tilde{a}_k^2 \epsilon / 2$ as the $\epsilon$ in Prop. C.2, $\#\{i : Z_i = 0\}$ as the $S$ in Prop. C.2, and $\beta$ as the $\alpha$ in Prop. C.2, and recalling that $\#\{i : Z_i = 0\}$ is asymptotic to $\tilde{a}_k^{-2} S$.
For the former instead, since $A_S$ tends to $\tilde{a}_k^2 \tilde{y}_k$, it is sufficient to prove $\left| B_S - \tilde{a}_k^{-2} \right| < \frac{\epsilon}{4 \tilde{a}_k^2 \tilde{y}_k}$. $B_S$ is the mean of $S$ i.i.d. Bernoulli variables with $\mu = \tilde{a}_k^{-2}$ and $\sigma^2 = \mu(1 - \mu) = \tilde{a}_k^{-2}(1 - \tilde{a}_k^{-2})$. Since there are no square roots in this case, the CLT can be applied directly without need for Lemma A.1, to verify that the condition in Eq. (62) is sufficient. This proves the second claim.
The third one is simple through Eq. (59), also remembering $\tilde{a}_k^{-2} \leq 1$. The fourth is trivial as well.
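[Figure 14, circuit diagrams, panels (a), (b) and (c): three pairs of registers (an n-qubit primary register and an auxiliary register) initialized in |0⟩ and loaded with U_T̃, U_T̃ and U_Ẽ, plus a control qubit with two H gates; the QHP outcome is labeled Z_1 = 0 and the swap-test outcome X = 1. Panel (a) additionally shows a |0⟩⟨0| box on the second primary register; panel (c) adds one more qubit and marks the unitaries A, R and R′.]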
Figure 14: An exemplary demonstration of the algorithm based on BOE data loading, QHP, and the swap test, for the calculation of the inner product between temperatures to the power $k$ and prices, when $k = 2$. $U_{\tilde{T}}$ and $U_{\tilde{E}}$ are the loading unitaries for $\tilde{T}$ and $\tilde{E}$ respectively, in the BOE. In (a), the algorithm in its original formulation, where the gray box highlights the QHP, and the blue box the swap test. In (b), the version without mid-measurements. All measurements are deferred to the end. In (c), a version with single-qubit measurement, suggesting unitaries $A$ and $R$ for QAE techniques. To adopt QAE, a separate circuit based on $A$ and $R'$ is required to estimate the success rate of QHPs (for this purpose $A$ can be reduced to a smaller circuit that only computes the QHPs). As usual, we highlighted in gray the QHP, in blue the swap test, and in green the QAE unitaries.
Proposition C.5 (Circuit width and depth in BOE). Let $k$ be a fixed power order. Then the algorithm described in Prop. C.4 (called BOE + swap test) has the following width and depth:
2 + 2 lg2 N − lg N − s2 + s + 1.
s 1
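\[
C_w(k) = r(k) \left( (s+1) N 2^{-s} - 1 + \lg N \right) + 1, \tag{66}
\]
\[
C_d(k) \leq m(k) \left( 2^s + \frac{1}{2} \left( \lg^2 N - \lg N - s^2 + s \right) \right) + k + (3 \lg N + 1)\, \delta^{(\mathrm{swap})}, \tag{67}
\]
where $r(k)$ and $m(k)$ are those defined in Prop. A.9 with $\delta^{(\mathrm{swap})} = 1$.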
Proof. Similar to that of Prop. A.9, exploiting also Prop. C.1.
Proposition C.6 (Convergence rate in BOE). Let $w \in [0, 1]^K$ such that $\sum_k w_k = 1$. Also, let $\alpha \in [0, 1)^K$ such that $\sum_k (1 - \alpha_k) = 1 - \beta$ for some $\beta \in (0, 1)$. Finally, let $\epsilon > 0$. For instance, one may take $w_k = K^{-1}$ for all $k$ and $\alpha_k = \frac{K - 1 + \beta}{K}$ for all $k$.
Then $\tilde{V}$ defined in Eq. (61) is an estimator for $v^*$ such that $P(|\tilde{V} - v^*| \leq \epsilon) \geq \beta$ provided that $P\left( \rho_T^{-k} \rho_E^{-1} |b_k(\eta)|\, |Y_k - y_k| \leq \epsilon w_k \right) \geq \alpha_k$ for all $k \leq K$.
Finally, let $K$ be fixed. When $N$ grows, $r_K$ dominates the asymptotic behavior of $r_k$ for all other $k \leq K$.
Proof. Similar to that of Prop. A.10.
Remark C.7. Let us emphasize here that $\tilde{r}_k$ scales quadratically with $\tilde{\rho}_T^k$, whereas $r_k$ scales linearly with $\rho_T^k$. Under the assumptions of Remark A.13, $\tilde{\rho}_T^{-1}$ scales as $O(\sqrt{N})$, similarly to $\rho_T^{-1}$. As a consequence,
\[
\tilde{r}_k = O(N^{K + 1 - 1}) = O(N^K).
\]
Given the discussion at the end of Subsection A.4, this compares to $r_k = O(N^{K/2 - 1/2})$ of the amplitude encoding, imposing strong limitations on the applicability of the BOE for big values of $N$.
Complexity of the different techniques is summarized and discussed in Appendix D. For the moment,
let us say that it can be improved by resorting to QAE, as done for the amplitude encoding.
figure, $A$ is composed of data loading, QHP, and swap test. Then we define $R'$ to set an additional qubit to $|1\rangle$ iff all the QHPs were successful, and $R$ to set an additional qubit to $|1\rangle$ iff all the QHPs were successful and the swap test is $|1\rangle$. With these building blocks, we can achieve the following performance.
Proposition C.8 (Complexity for BOE + *QHP + swap + QAE). Let $A$, $R'$ and $R$ be as specified right above. Let $U'$ and $U$ be the Grover oracles for $A$ and $R'$, and $A$ and $R$ respectively, according to the usual QAE technique. Let $Y_S^{(1)}$ be the median outcome of $S$ executions of a QAE estimation applied to $U'$, and let $Y_S^{(0)}$ be similarly the median outcome of the QAE technique applied to $A$ and $R$. Then $Y_S := 2 Y_S^{(0)} - Y_S^{(1)}$ is an estimator for $\tilde{y}_k := \sum_j \tilde{E}_j^2 \tilde{T}_j^{2k}$.
Moreover, $P(|Y_S - \tilde{y}_k| \leq \epsilon) \geq \alpha$ is obtained by using the Grover oracles $U$ and $U'$ for a number of times $O(1/\epsilon)$, and taking $S = O(\lg 1/(1 - \alpha))$.
Additionally, under the same conditions, $P\left( |Y'_S - \tilde{y}'_k| \leq \tilde{\rho}_T^{-2k} \tilde{\rho}_E^{-2} \epsilon \right) \geq \alpha$.
Proof. Consider a circuit that performs $U'$ and measures the last qubit: by definition of $A$ and $R'$, the probability to get 1 is the success rate $\tilde{a}_k^{-2}$. Similarly, if one executes $U$ and measures the last qubit, one obtains 1 with probability $\frac{1}{2}\left( \tilde{y}_k + \tilde{a}_k^{-2} \right)$, applying the usual argument of Eq. (32).
As a consequence, by Thm. B.1, we obtain $P\left( \left| Y_S^{(1)} - \tilde{a}_k^{-2} \right| \leq \frac{\epsilon}{2} \right) \geq \beta$ by using $U'$ a number of times $O(1/\epsilon)$, and taking $S = O(\lg 1/(1 - \beta))$, where $\beta = \frac{1 + \alpha}{2}$. In the same way, $P\left( \left| Y_S^{(0)} - \frac{1}{2}\left( \tilde{y}_k + \tilde{a}_k^{-2} \right) \right| \leq \frac{\epsilon}{2} \right) \geq \beta$ by using $U$ a number of times $O(1/\epsilon)$, and taking $S = O(\lg 1/(1 - \beta))$.
Combining the two estimates, we get $P(|Y_S - \tilde{y}_k| \leq \epsilon) \geq \alpha$. The proof is complete once we observe that $O(\lg 1/(1 - \beta)) = O(\lg 1/(1 - \alpha))$.
As far as the originally scaled version of the estimation is concerned, simply substitute $Y'_S$ and $y'_k$ in the previous result.
Proposition C.9 (Oracle depth and width in the BOE). In the setting of the previous Proposition, and assuming a split level $s = s(N) \in \{1, \dots, \lg N\}$, the width and depth of $U$ are:
\[
C_w(k) = O\left( (s+1) N 2^{-s} + k \lg N \right), \tag{69}
\]
\[
C_d(k) = O\left( 2^s + \lg^2 N - s^2 + k \lg N \right). \tag{70}
\]
Proof. The proof is similar to that of Prop. B.4, using the fact that data loading requires resources
listed in Prop. C.1.
Proposition C.10 (Sample access [54, Prop. 3.2]). The state decomposition tree of a vector of length
N , described in Fig. 13, also allows for classical sampling from the vector in O(lg N ) time.
Proposition C.11 (Sampling-based inner product [54, Prop. 4.2]). Given query access to two real vectors $v$, $w$, sample access to $v$, and knowledge of $\|v\|$, the inner product $\sum_j v_j w_j$ can be estimated to additive error $\epsilon \|v\| \|w\|$ with probability at least $\alpha$ using $O\left( \frac{1}{\epsilon^2} \lg \frac{1}{1 - \alpha} \right)$ queries and samples.
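The two classical ingredients of Props. C.10 and C.11 can be sketched in Python as follows. This is an illustrative sketch, not the algorithm of Ref. [54] verbatim: it builds the tree of partial sums of squares, samples an index $j$ with probability $v_j^2 / \|v\|^2$ by descending the tree, and averages the unbiased quantity $w_j \|v\|^2 / v_j$. For brevity a plain mean is used instead of a median of means, and all names and data are hypothetical.

```python
import random


def build_tree(v):
    """Partial sums of squares: levels[0] is the root, levels[-1] the squared entries."""
    if len(v) == 0 or len(v) & (len(v) - 1):
        raise ValueError("the vector length must be a power of 2")
    level = [x * x for x in v]
    levels = [level]
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        levels.append(level)
    return levels[::-1]


def sample_index(levels, rng):
    """Draw j with probability v_j^2 / ||v||^2 in lg N steps."""
    j = 0
    for level in levels[1:]:
        left, right = level[2 * j], level[2 * j + 1]
        go_left = rng.random() * (left + right) < left
        j = 2 * j + (0 if go_left else 1)
    return j


def estimate_inner_product(v, w, samples, rng):
    """Average of w_j * ||v||^2 / v_j over sampled indices; its mean is sum_j v_j w_j."""
    levels = build_tree(v)
    norm_sq = levels[0][0]
    total = 0.0
    for _ in range(samples):
        j = sample_index(levels, rng)   # entries with v_j = 0 are never sampled
        total += w[j] * norm_sq / v[j]
    return total / samples


v = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # hypothetical data
w = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
estimate = estimate_inner_product(v, w, 20_000, random.Random(0))
exact = sum(a * b for a, b in zip(v, w))
```

The variance of each sampled term is bounded by $\|v\|^2 \|w\|^2$, which is where the $1/\epsilon^2$ dependence of the classical estimator comes from.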
The better error scaling of Prop. B.3 and Prop. C.4, in contrast with the classical sampling-based
version in Prop. C.11, is empirically demonstrated in Fig. 15.
Remark C.12. Let us emphasize that the classical sampling-based algorithm requires sample access to only one of the inputs: in our case, we can assume sample access to prices, that do not undergo any transformation. On the contrary, in the quantum case we obviously need to normalize both vectors. This translates into the fact that the scaling of the sampling-based algorithm in $N$ does not depend on $k$, since we can assume $\left\| [f(T'_j)]_j \right\| = O(N^{1/2})$ for any $k$, in contrast with the quantum behavior highlighted in Subsection A.4.
In conclusion, the error scaling of the proposed quantum algorithm is better, but only for fixed vector size $N$.
[Figure 15, plots: panel (a) k = 1 and panel (b) k = 2. Vertical axis: error, on a logarithmic scale. Legend: direct+IQAE, D&C-orth+swap+IQAE, classical sampling-based, error=1/queries, error=1/sqrt(queries).]
Figure 15: IQAE scales quadratically better than the classical sampling-based algorithm described in Prop. C.11, in terms of error, when the number of queries grows. Queries are intended as the distribution samples in the classical case, and the oracle calls in the quantum case. Recall that D&C-orth is the same as BOE with $s = 1$. The underlying problem is that of calculating the inner product of two fixed vectors of dimension $N = 4$, and power $k = 1$ in (a), $k = 2$ in (b). Quantum runs are executed on a noiseless QASM simulator in Qiskit. Dashed lines are the best-fit lines in the log-log space.
D Comparative complexity analysis
D.1 Summary of the algorithm variants
In the previous appendices we introduced multiple implementations of our approach, generated by
different data encodings (amplitude or BOE), different implementations of the QHP (with or without
mid-reset), and different techniques for the inner product (ancilla-free method or swap test), as well
as by the introduction of *QAE in some cases. Here we collect and compare the main ones, to discuss
their complexity. Their key characteristics are also summarized in Fig. 2.
(a) Ancilla-free inner product and mid-resets. The first variant we consider assumes data are available
in the amplitude encoding. Each circuit is fed with k copies of the temperature state $|\psi_T\rangle$ in
Eq. (14), so that the power of the temperature vector in Eq. (15) is calculated through the QHP. Moreover,
the same register is reused for the multiple copies of $|\psi_T\rangle$, as shown in Fig. 5. At the level of
quantum circuit implementation, this variant can benefit from the dynamic stopping technique
introduced in Subsec. 2.2. Afterwards, under the assumption that a loading unitary $U_E$
for $|\psi_E\rangle$ is known, the inner products $y_k$ can be calculated with the direct method. Finally, the volume
v is reconstructed classically via Eq. (10). We refer to Subsection A.2 for the detailed implementation
and analysis.
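The data flow of this variant can be emulated classically. The sketch below assumes the standard picture of a post-selected Hadamard product, in which k copies of an amplitude vector yield its normalised element-wise k-th power with success probability $\sum_j |a_j|^{2k}$. The vectors `T` and `E` and all function names are hypothetical, and no quantum circuit is simulated.

```python
import numpy as np

def amplitude_encode(vec):
    """Return the unit-norm amplitude vector and its norm (the scaling factor)."""
    norm = np.linalg.norm(vec)
    return vec / norm, norm

def hadamard_power_state(amps, k):
    """Classically emulate the post-selected output of a QHP on k copies.

    Returns the normalised element-wise k-th power and the post-selection
    success probability sum_j |amps_j|^(2k).
    """
    raw = amps**k
    success_prob = float(np.sum(np.abs(raw) ** 2))
    return raw / np.sqrt(success_prob), success_prob

T = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical temperature vector
E = np.array([0.5, 1.5, 1.0, 2.0])   # hypothetical second vector
psi_T, rho_T = amplitude_encode(T)
psi_E, rho_E = amplitude_encode(E)

k = 2
psi_Tk, p_succ = hadamard_power_state(psi_T, k)
overlap = float(psi_E @ psi_Tk)      # inner product of the two normalised states

# Undo the normalisations to recover sum_j E_j * T_j^k
recovered = rho_E * rho_T**k * np.sqrt(p_succ) * overlap
print(recovered, float(np.sum(E * T**k)))
```

The rescaling in the last step makes explicit why the scaling factors and the success probability enter the sample complexity: the quantity estimated on hardware is the normalised overlap, not the raw sum.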
(b) Ancilla-free inner product and no mid-resets. In a second variant, we resort to the QHP implementation
without mid-circuit resets, as depicted in Fig. 6, while leaving all other steps unchanged.
This results in a shallower circuit, at the cost of more qubits. Details are again in Subsec. A.2.
In both variants introduced so far, the ancilla-free method for the estimation of inner products can
be replaced with the swap test. We do not discuss these variants extensively here since, as previously
mentioned, the swap test has the significant disadvantage that its sampling complexity is in
general unbounded (Fig. 7). On the other hand, the swap test does not require knowledge of a loading
unitary $U_E$, as it is enough to have access to a copy of the state $|\psi_E\rangle$ at each iteration. Requiring
a loading unitary does not appear to be a significant limitation in our use case, where we assume data
are classically available and loaded via circuits. The algorithm complexity is also not improved by using
the swap test, see Subsec. A.2.
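One way to see why the swap-test sampling cost can grow without bound is the textbook relation $P(0) = (1 + |\langle a|b\rangle|^2)/2$ for the ancilla outcome. The sketch below uses only that relation and a one-standard-deviation error estimate; the function names are hypothetical and the analysis of Fig. 7 may differ in its details.

```python
import numpy as np

def swap_test_p0(overlap_sq):
    """Probability of measuring 0 on the swap-test ancilla."""
    return 0.5 * (1.0 + overlap_sq)

def swap_test_shots(overlap_sq, rel_eps):
    """Shots for a one-standard-deviation relative error rel_eps on overlap_sq."""
    p0 = swap_test_p0(overlap_sq)
    # overlap_sq = 2*p0 - 1, so its standard error is 2*sqrt(p0*(1-p0)/shots)
    return 4.0 * p0 * (1.0 - p0) / (rel_eps * overlap_sq) ** 2

def simulate_swap_test(overlap_sq, shots, rng):
    """Draw ancilla outcomes and return the resulting estimate of overlap_sq."""
    zeros = rng.binomial(shots, swap_test_p0(overlap_sq))
    return 2.0 * zeros / shots - 1.0

rng = np.random.default_rng(0)
for c in (0.5, 0.1, 0.01):
    # The required number of shots blows up as the squared overlap c shrinks
    print(c, round(swap_test_shots(c, rel_eps=0.1)), simulate_swap_test(c, 10_000, rng))
```

As the squared overlap tends to zero, the outcome probability tends to 1/2 and the required number of shots grows like the inverse square of the overlap, so no bound holds uniformly over the inputs.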
(c) Ancilla-free inner product and *QAE. The version above without mid-circuit resets inspires
another interesting variant: under the assumption that loading unitaries $U_T$ and $U_E$ are available for
both vectors, the overall circuit before any measurement can be seen as a unitary, which is fed
to a Quantum Amplitude Estimation (QAE) technique to obtain a quadratic speedup in the precision
$\epsilon$, in line with the general result of QAE. Details are presented in Appendix B. This version (c) is the
main variant described in the body of the manuscript.
(d) BOE and *QAE. Additional algorithms can be obtained by resorting to the BOE, as discussed
in Appendix C. In this context, the swap test is the only viable option. We focus on the variant that
loads data in BOE, applies the QHP for the power calculation, the swap test for the inner product,
and boosts precision via QAE techniques.
of the usual summation in Eq. (10).
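Classical algorithms

| Group | Measure | Classical exact | Classical polynomial approximation | Sampling-based polynomial approximation |
|---|---|---|---|---|
| Classical | Encoding | Array | Array | Binary Tree |
| Oracle | Complexity $C_o(k)$ | N/A | N/A | $O_\alpha(\rho_E^2 \rho_T^2 \lvert v \rvert^{-2} \epsilon^{-2})$ $^{c}$ |
| Summary | Classical time $C_c$ | $O(N)$ $^{b}$ | $O(N)$ $^{b}$ | $O(\lg N)\, O_\beta(r_1^2 \epsilon^{-2}) + O_K(1)$ $^{c,d,e}$ |
| Summary | Time scaling $C_c$ $^{a}$ | $O(N)$ | $O(N)$ | $O_{\beta,K}(\epsilon^{-2} \lg N)$ |

QAE-free algorithms

| Group | Measure | (a) Ancilla-free and mid-resets | (b) Ancilla-free, no mid-resets |
|---|---|---|---|
| Classical | Encoding | Various | Various |
| Quantum | Encoding | Amplitude | Amplitude |
| Circuit | Depth $C_d(k)$ | $\le C_{d,\mathrm{load}}(N)\,k + k$ $^{f}$ | $2 C_{d,\mathrm{load}}(N) + k$ $^{f}$ |
| Circuit | Samples $C_S(k)$ | $O_\alpha(\rho_E^2 \rho_T^{2k} (y'_k)^{-2} \epsilon^{-2})$ $^{g}$ | $O_\alpha(\rho_E^2 \rho_T^{2k} (y'_k)^{-2} \epsilon^{-2})$ $^{g}$ |
| Circuit | Width $C_w(k)$ | $2 \lg N$ $^{f}$ | $k \lg N$ $^{f}$ |
| Summary | Classical time $C_c$ | $2 C_{c,\mathrm{load}}(N) + O_K(1)$ $^{h,e}$ | $2 C_{c,\mathrm{load}}(N) + O_K(1)$ $^{h,e}$ |
| Summary | Quantum time $C_q$ | $C_{d,\mathrm{load}}(N)\, O_{\beta,K,b}(r_K^2 \epsilon^{-2})$ $^{j}$ | $C_{d,\mathrm{load}}(N)\, O_{\beta,K,b}(r_K^2 \epsilon^{-2})$ $^{j}$ |
| Summary | Time scaling $C_c + C_q$ $^{a}$ | $O_{\beta,K,b}(C_{c,\mathrm{load}}(N) + C_{d,\mathrm{load}}(N)\, N^{K-1} \epsilon^{-2})$ $^{k}$ | $O_{\beta,K,b}(C_{c,\mathrm{load}}(N) + C_{d,\mathrm{load}}(N)\, N^{K-1} \epsilon^{-2})$ $^{k}$ |

QAE-based algorithms

| Group | Measure | (c) Ancilla-free and *QAE | (d) BOE and *QAE |
|---|---|---|---|
| Classical | Encoding | Various | Sqrt Binary Tree |
| Quantum | Encoding | Amplitude | BOE |
| Oracle | Depth $C_d(k)$ | $2 C_{d,\mathrm{load}}(N) + O_k(\lg N)$ $^{l}$ | $O(2^s + \lg^2 N - s^2 + k \lg N)$ $^{p}$ |
| Oracle | Width $C_w(k)$ | $O_k(\lg N)$ $^{l}$ | $O((s+1) N 2^{-s} + k \lg N)$ $^{p}$ |
| Oracle | Complexity $C_o(k)$ | $O(\rho_E \rho_T^{k} (y'_k)^{-1} \epsilon^{-1})$ $^{m}$ | $O(\tilde\rho_E \tilde\rho_T^{k} (\tilde y'_k)^{-1} \epsilon^{-1})$ $^{q}$ |
| Circuit | Depth $C_d(k)$ | Various $^{n}$ | Various $^{n}$ |
| Circuit | Width $C_w(k)$ | $O_k(\lg N)$ $^{l}$ | $O((s+1) N 2^{-s} + k \lg N)$ $^{p}$ |
| Circuit | Samples $C_S(k)$ | $O_\alpha(1)$ $^{m}$ | $O_\alpha(1)$ $^{q}$ |
| Summary | Classical time $C_c$ | $2 C_{c,\mathrm{load}}(N) + O_K(1)$ $^{h,e}$ | $O_K(1)$ $^{d,e}$ |
| Summary | Quantum time $C_q$ | $C_{d,\mathrm{load}}(N)\, O_{\beta,K,b}(r_K \epsilon^{-1}) + O_{\beta,K,b}(r_K \epsilon^{-1} \lg N)$ $^{o}$ | $O_{\beta,K,b}(\tilde r_K \epsilon^{-1} (2^s + \lg^2 N - s^2))$ $^{r,p}$ |
| Summary | Time scaling $C_c + C_q$ $^{a}$ | $O_{\beta,K,b}(C_{c,\mathrm{load}}(N) + \epsilon^{-1} [C_{d,\mathrm{load}}(N) + \lg N]\, N^{\frac{K-1}{2}})$ $^{k}$ | $O_{\beta,K,b}(\epsilon^{-1} N^{K} (2^s + \lg^2 N - s^2))$ $^{s}$ |

Table footnotes:
a Under the additional assumption that norms scale as $\sqrt{N}$, see Subsection A.4.
b Under the simplification that floating point operations are performed in O(1) time and without precision loss.
c Prop. C.10, Prop. C.11 and Remark C.12.
d Assuming that the input binary tree is available; otherwise, its preparation requires O(N) time [37].
e The term $O_K(1)$ encompasses the polynomial reconstruction of Eq. (10) or (60).
f Prop. A.9.
g Prop. A.8.
h $C_{c,\mathrm{load}}$ is the classical pre-processing needed to load a copy of one input vector.
i Easily derived from the circuit depth.
j Use Prop. A.10 and the fact that quantum time is defined as the product of circuit depth and samples.
k Remark A.13.
l Prop. B.4.
m Prop. B.2 and Prop. B.3.
n The circuit depth depends on the specific QAE technique, but all the techniques considered share the same overall depth.
o Prop. A.10 and Prop. B.2 (Prop. B.3).
p Prop. C.9.
q Prop. C.8.
r Prop. C.6 and Prop. C.8.
s Remark C.7.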
Table 7: Complexity analysis of the algorithm variants proposed in Subsec. D.1, in comparison with the classical
benchmarks listed in Subsec. D.2. Refer to Subsec. D.3 for a description of the complexity measures. For the circuit
complexity and oracle complexity measures, the parameters in the analysis are the data set size $N$, the monomial
degree $k$ ($k \ge 1$), the target precision $\epsilon$ and confidence level $\alpha$ such that
$P(|Y'_k - y'_k| \le \epsilon\, y'_k) \ge \alpha$, the success rates $a_k^{-2}$, the target value $y_k$, the scaling
factors $\rho_E$, $\rho_T$, the 'tilde version' of the variables above, and the split level
$s = s(N) \in \{1, ..., \lg N\}$ for the BOE. For the summary complexity measures, we use $N$ as before, the
polynomial degree $K$ ($K \ge 1$), the target precision $\epsilon$ and confidence level $\beta$ such that
$P(|V - v^*| \le \epsilon |v|) \ge \beta$, the coefficients $b_k(\eta)$ in Eq. (8), the scaling factors $\rho_E$,
$\rho_T$, and the convergence ratios $r_k$ defined in Eq. (40) or $\tilde r_k$ in Eq. (68). Notably, the error is
measured against the exact polynomial evaluation, under the assumption that the polynomial itself is a good
approximation of the target volume function. Asymptotic estimates are provided for $\epsilon \to 0$ or
$N \to \infty$. Constants affecting asymptotic estimates are marked in the subscript of the big-O notation: for
instance, the notation $O_\alpha(\epsilon^{-2})$ is intended for $\epsilon \to 0$ uniformly in $N$, with factors
depending only on $\alpha$. For readability, in the subscripts we use $b$ for $b_k(\eta)$ and $a$ for $a_k$. It is
worth emphasising that all estimates in the table are independent of $a_k$.
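To get a feeling for the "Time scaling" rows, the expressions can be evaluated numerically. The sketch below is only an order-of-magnitude illustration: all hidden constants are set to 1, the loading costs default to $\lg N$ (an assumption standing in for efficient data loading, not a statement of the paper), variant (d) is left out, and the function and key names are hypothetical.

```python
import math

def time_scalings(N, K, eps, c_load=None, d_load=None):
    """Evaluate 'Time scaling' entries of Table 7 with all constants set to 1."""
    lgN = math.log2(N)
    # Assumption: polylogarithmic loading cost unless the caller overrides it
    c_load = lgN if c_load is None else c_load
    d_load = lgN if d_load is None else d_load
    return {
        "classical exact": float(N),
        "classical sampling": lgN / eps**2,
        "(a)/(b) QAE-free": c_load + d_load * N ** (K - 1) / eps**2,
        "(c) ancilla-free + QAE": c_load + (d_load + lgN) * N ** ((K - 1) / 2) / eps,
    }

for eps in (1e-1, 1e-2, 1e-3):
    row = time_scalings(N=2**20, K=2, eps=eps)
    print(eps, {name: f"{value:.2e}" for name, value in row.items()})
```

Halving `eps` multiplies the quantum term of variants (a) and (b) by four and that of variant (c) by two, which is the quadratic improvement in the precision; the dependence on $N$ also drops from $N^{K-1}$ to $N^{(K-1)/2}$.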
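[Figure 16 plots: left panel, error relative to $y'_k$ (vertical axis from 0% to 20%); right panel, error relative to $|v|$ (vertical axis from 0% to 1%); one line each for k = 0, k = 1 and k = 2.]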
Figure 16: The error of the estimator $Y'_k$ relative to its own target value $y'_k$ (left), and the error of the power
contribution $b_k(\eta) Y'_k$ relative to the global target $v$ (right), for IQAE with the ancilla-free method. The IQAE is
set to obtain a precision for $Y_k$ of $0.5\, N^{-k/2+1/2}$, where 0.5 is an arbitrary constant and the scaling in $N$ is
designed to compensate for the error scaling. Each point in the plot is the average of 20 independent runs on the QASM
simulator. The lines show no increasing or decreasing trend, consistent with the theory, which states that the error
stays approximately constant if the required precision scales as the inverse of $r_k = O(N^{k/2-1/2})$. By comparison,
in Fig. 12 the number of samples S is kept constant as N grows, so the error increases. Here, by contrast, the precision
is scaled with N to keep the relative error constant.
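The precision schedule in this caption can be written out as a short sketch. It assumes that $r_k$ scales exactly as $N^{k/2-1/2}$ with a hidden constant of 1; the function names are hypothetical. The product of the target precision and this scale is then independent of $N$, which is the mechanism the caption describes.

```python
import math

def iqae_target_precision(N, k, c=0.5):
    """Target precision c * N^(-k/2 + 1/2) quoted in the caption of Fig. 16."""
    return c * N ** (-k / 2 + 0.5)

def convergence_ratio_scale(N, k):
    """Leading-order scale N^(k/2 - 1/2) of r_k (hidden constant set to 1)."""
    return N ** (k / 2 - 0.5)

for N in (4, 16, 64):
    for k in (1, 2):
        eps = iqae_target_precision(N, k)
        # The product stays at c for every N, so the relative error does not drift
        print(N, k, eps, eps * convergence_ratio_scale(N, k))
```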