Adaptive Sampling Bi-Fidelity Stochastic Trust Region Method For Derivative-Free Stochastic Optimization
Abstract
Bi-fidelity stochastic optimization is increasingly favored for streamlining optimization pro-
cesses by employing a cost-effective low-fidelity (LF) function, with the goal of optimizing a
more expensive high-fidelity (HF) function. In this paper, we introduce ASTRO-BFDF, a
new adaptive sampling trust region method specifically designed for solving unconstrained
bi-fidelity stochastic derivative-free optimization problems. Within ASTRO-BFDF, the LF
function serves two purposes: first, to identify better iterates for the HF function when a high
correlation between them is indicated by the optimization process, and second, to reduce the
variance of the HF function estimates by Bi-fidelity Monte Carlo (BFMC). In particular, the
sample sizes are dynamically determined with the option of employing either crude Monte
Carlo or BFMC, while balancing optimization error and sampling error. We demonstrate that
the iterates generated by ASTRO-BFDF converge to the first-order stationary point almost
surely. Additionally, we numerically demonstrate the superiority of our proposed algorithm
by testing it on synthetic problems and simulation optimization problems with discrete event
simulations.
1 Introduction
We consider the simulation optimization problem
$$\min_{x \in \Re^d} f^h(x) := \mathbb{E}[F^h(x, \xi)] = \int_{\Xi} F^h(x, \xi)\, P(d\xi), \qquad (1)$$
1.1 Bi-fidelity Derivative-free Stochastic Optimization
The iterative algorithms designed to solve problem (1) typically produce a random sequence
{Xk , k ≥ 1}. In the context of SO, these algorithms generate the sequence by determining
both the direction and the step size. Given that direct gradient information is not available from
the simulation oracle, we rely on approximation techniques like a finite difference method [1],
interpolation/regression models [2, 3, 4], and Gaussian smoothing [5] to determine the direction.
These approximation methods are based on function estimates, which are obtained by repeatedly
invoking the stochastic simulation oracle, as shown below:
$$\bar{F}^h(x, n) = \frac{1}{n}\sum_{i=1}^{n} F^h(x, \xi_i), \qquad (2)$$
with a variance estimate $(\hat{\sigma}^h)^2(x, n) := n^{-1}\sum_{i=1}^{n}\big(F^h(x, \xi_i) - \bar{F}^h(x, n)\big)^2$. To obtain convergence results in some probabilistic sense, the algorithms must have sufficiently large sample sizes
for each design point during the optimization process. Therefore, it is a logical step to aim for
reducing the total number of simulation replications during the optimization process while still
achieving convergence, as this is typically the main source of computational load. In line with
these efforts, one strategy involves leveraging a LF simulation oracle F l (·, ξ), which is less costly
than the original high-fidelity (HF) simulation oracle F h (·, ξ), whenever possible throughout the
optimization process. This particular method of optimization falls under the category of bi-fidelity
stochastic optimization [6, 7, 8, 9].
Figure 1: An illustration of bi-fidelity functions. The black and red curves represent the true
objective functions of the HF and LF versions, respectively. Meanwhile, the blue and orange
curves illustrate a single sample path of the stochastic objective functions for the HF and LF
versions, respectively.
a near-optimal solution of the HF function, even though the LF function is solely utilized, as
illustrated in Figure 1. However, the specifics of how and when to utilize the LF simulation oracle
have remained elusive, prompting us to pose two overarching questions.
Q1. When is it appropriate to utilize the LF simulation oracle, and when should it not be used
during optimization?
Q2. How many samples from the HF and LF simulation oracles are necessary to attain sufficiently accurate function estimates for optimization?
In this paper, we propose a sample-efficient solver for bi-fidelity stochastic optimization, aiming
to address questions Q1 and Q2 specifically. We begin by introducing relevant existing sampling
methods that have been used for sample-efficient uncertainty quantification, regardless of their
purpose for optimization.
available, the HF function estimate is obtained using both LF and HF simulation oracles, in the hope of reducing the variance of the HF function estimate [13]. A bi-fidelity Monte Carlo (BFMC) estimator is then $\bar{F}^{bf}(x, n, v, c) = \bar{F}^h(x, n) - c(\bar{F}^l(x, n) - \bar{F}^l(x, v))$ for any $c \in \Re$, which is an unbiased estimator of the expectation of the HF function. Here, the independence or dependence between $\bar{F}^l(x, v)$ and $\bar{F}^h(x, n)$, as well as $\bar{F}^l(x, n)$, hinges on how the random variable $\xi$ is managed. For instance, consider (3), which employs $v^{-1}\sum_{i=1}^{v} F^l(x, \xi_i)$ for $\bar{F}^l(x, v)$:
$$\bar{F}^{bf}(x, n, v, c) = \frac{1}{n}\sum_{i=1}^{n} F^h(x, \xi_i) - c\left(\frac{1}{n}\sum_{i=1}^{n} F^l(x, \xi_i) - \frac{1}{v}\sum_{i=1}^{v} F^l(x, \xi_i)\right). \qquad (3)$$
The variance of this estimator is
$$\begin{aligned} &c^2\big(\mathrm{Var}(\bar{F}^l(x, n)) + \mathrm{Var}(\bar{F}^l(x, v))\big) - 2c\,\mathrm{Cov}(\bar{F}^h(x, n), \bar{F}^l(x, n)) \\ &\quad + 2c\,\mathrm{Cov}(\bar{F}^h(x, n), \bar{F}^l(x, v)) - 2c^2\,\mathrm{Cov}(\bar{F}^l(x, n), \bar{F}^l(x, v)) + \mathrm{Var}(\bar{F}^h(x, n)). \end{aligned} \qquad (4)$$
Therefore, variance reduction becomes feasible with appropriate covariances and variances for
certain values of n, v, and c.
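To make the mechanics concrete, the following is a minimal numerical sketch of the estimator in (3); the toy HF/LF oracles, the shared noise stream inducing the correlation, and the specific values of $n$, $v$, and $c$ are our illustrative assumptions, not part of the algorithm.

```python
# A minimal sketch of BFMC (3) versus crude Monte Carlo at one design point.
# The toy oracles and all constants below are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)
x = 0.7                        # a fixed design point
n, v, c = 50, 400, 0.8         # HF/LF sample sizes (v > n) and coefficient

def oracles(m):
    """Draw m paired HF/LF replications at x; the common xi induces correlation."""
    xi = rng.normal(size=m)
    f_h = np.sin(x) + xi                              # HF output, mean sin(x)
    f_l = np.sin(x) + xi + 0.3 * rng.normal(size=m)   # noisier LF output
    return f_h, f_l

def bfmc():
    """Estimator (3): the first n of the v LF draws are paired with the HF draws."""
    f_h, f_l = oracles(v)
    return f_h[:n].mean() - c * (f_l[:n].mean() - f_l.mean())

bf = [bfmc() for _ in range(2000)]
cmc = [oracles(n)[0].mean() for _ in range(2000)]     # CMC with same HF budget
print("Var BFMC:", np.var(bf), "Var CMC:", np.var(cmc))  # BFMC is smaller here
```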
In our proposed algorithm, we have developed an innovative AS strategy, referred to as bi-fidelity
Adaptive Sampling (BAS), that leverages both LF and HF oracles. Our approach dynamically
employs BFMC and CMC, guided by estimates of covariances and variances for the functions.
While we expect that BAS can be deployed within a broad range of iterative solvers, we focus exclusively on stochastic Trust Region (TR) algorithms to solve (1), which iterate over the following four steps:
(a) (model construction) a local model is constructed to approximate the objective function f
by utilizing specific design points and their function estimates within a designated area of
confidence, i.e., TR, typically defined as an L2 region with a radius of ∆k centered around
the current iterate Xk;
(b) (candidate generation) the local model is approximately minimized within the TR to obtain a candidate solution Xks;
(c) (candidate evaluation) the objective function at Xks is estimated by querying the oracle, and depending on this evaluation, Xks is either accepted or rejected; and
(d) (TR management) if Xks is accepted, it becomes the subsequent iterate Xk+1 , and the TR
radius ∆k is either enlarged or remains unchanged; conversely, if Xks is rejected, Xk retains
its position as the next iterate Xk+1 , and ∆k decreases to facilitate the construction of a
more accurate local model to approximate the objective function f .
As described in Section 3.2, our proposed algorithm performs the aforementioned four steps
multiple times within a single iteration to address Q1, utilizing both HF and LF simulation oracles.
Specifically, when the correlation between the LF and HF function is expected to be high, which is
determined by the optimization history, the local models will be constructed for the LF function.
If we fail to find a better solution using the local model for the LF function, the local model will
be constructed for the HF function within the same iteration.
(a) We propose a novel stochastic TR method with adaptive sampling tailored specifically for
stochastic bi-fidelity optimization problems, aptly named ASTRO-BFDF. Addressing Q1,
ASTRO-BFDF integrates two separate TRs to handle HF and LF functions, along with a
novel concept termed an adaptive correlation constant. This constant dynamically assesses
the need for constructing a local model for the LF function, with its value evolving in
response to the historical data gathered during the optimization process.
(b) To provide an answer for Q2, we suggest a new adaptive sampling algorithm utilizing both HF
and LF simulation oracles, named BAS. The following three critical decisions are dynamically
made in BAS while replicating function evaluations.
(c) We prove the almost sure convergence, i.e., limk→∞ ∥∇f h (Xk )∥ = 0 w.p.1, of ASTRO-
BFDF. The analysis revolves around two key points. Firstly, when the candidate solution
from the local model for the LF function is accepted, it must ensure a sufficient reduction
in the HF function. Secondly, the estimates for stochastic errors obtained with BFMC
should be sufficiently smaller than the optimality error. Together, these aspects enable the
algorithm to find a better solution for the objective function with reduced computational
effort.
(d) The performance of ASTRO-BFDF has been evaluated using test problems from the SimOpt
library [14]. We started with synthetic problems, created by adding artificial stochastic
noise to deterministic functions. For more realistic numerical experiments, we also tested
simulation optimization problems involving discrete event simulation. Our findings not only
highlight the superior performance of ASTRO-BFDF but also explore various scenarios in
which using the LF function in optimization is beneficial or not.
2 Preliminaries
In this section, we provide key definitions, standing assumptions, and some useful results that will
be invoked in the convergence analysis of the proposed algorithm.
2.1 Notation
We use bold font for vectors; x = (x1 , x2 , · · · , xd ) ∈ ℜd denotes a d-dimensional vector. Calli-
graphic fonts represent sets, and sans serif fonts denote matrices. Our default norm ∥ · ∥ is the L2
norm. The closed ball of radius ∆ > 0 centered at x0 is B(x0 ; ∆) = {x ∈ ℜd : ∥x − x0 ∥ ≤ ∆}.
For a sequence of sets An , An i.o. denotes lim supn→∞ An . We write f (x) = O(g(x)) if there are
positive constants ε and m such that |f (x)| ≤ mg(x) for all x with 0 < ∥x∥ < ε. Capital letters
denote random scalars and vectors. For a sequence of random vectors $X_k$, $k \in \mathbb{N}$, $X_k \xrightarrow{\text{w.p.1}} X$ denotes almost sure convergence. “iid” means independent and identically distributed, and “w.p.1”
means with probability 1. The superscripts h and l indicate that the terminology is related to
high-fidelity and low-fidelity simulations, respectively. The terms σ̂ h (x, n) and σ̂ l (x, n) are the
standard deviation estimates of HF and LF functions at x with sample size n, while σ̂ h,l (x, n) is
the covariance estimate between them.
where

The matrix $\mathsf{M}(\Phi, \mathcal{X}_k)$ is nonsingular if the set $\mathcal{X}_k$ is poised in $B(X_k; \Delta_k^q)$. A set $\mathcal{X}_k$ is $\Lambda$-poised in $B(X_k; \Delta_k^q)$ if $\Lambda \geq \max_{i=0,\dots,p}\max_{z \in B(X_k; \Delta_k^q)} |l_i(z)|$, where $l_i(z)$ are the Lagrange polynomials. If there exists a solution to (5), then the function $M_k^q : B(X_k; \Delta_k^q) \to \Re$, defined as $M_k^q(x) = \sum_{j=0}^{p} \nu_{k,j}^q \phi_j(x)$, is a stochastic polynomial interpolation of estimated values of $f^q$ on $B(X_k; \Delta_k^q)$.
In particular, if $G_k^q := \big[\nu_{k,1}^q\ \nu_{k,2}^q\ \cdots\ \nu_{k,d}^q\big]^{\intercal}$ and $\mathsf{H}_k^q$ is a symmetric $d \times d$ matrix with elements uniquely defined by $(\nu_{k,d+1}^q, \nu_{k,d+2}^q, \dots, \nu_{k,p}^q)$, then we can define the stochastic quadratic model $M_k^q : B(X_k; \Delta_k^q) \to \Re$ as
$$M_k^q(x) = \nu_{k,0}^q + (x - X_k)^{\intercal} G_k^q + \frac{1}{2}(x - X_k)^{\intercal}\mathsf{H}_k^q(x - X_k). \qquad (6)$$
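As an illustration of (6), the sketch below fits such a model by interpolation; the diagonal-Hessian basis and the helper names are our assumptions (the diagonal-Hessian variant is the one mentioned in Appendix C), not the paper's implementation.

```python
# A sketch: fit nu in (6) with a diagonal Hessian from 2d+1 function
# estimates by solving the interpolation conditions M(Phi, X) nu = Fbar.
import numpy as np

def fit_quadratic_diag(points, fbar, center):
    """points: (2d+1, d) design points; fbar: function estimates at them."""
    S = points - center                                     # shift to TR center
    Phi = np.hstack([np.ones((len(S), 1)), S, 0.5 * S**2])  # basis [1, s, s^2/2]
    nu = np.linalg.solve(Phi, fbar)                         # interpolation system
    d = S.shape[1]
    return nu[0], nu[1:d + 1], np.diag(nu[d + 1:])          # nu_0, G, H (diagonal)
```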
Definition 2 (stochastic fully linear models). Given x ∈ ℜd , ∆q > 0 and q ∈ {h, l}, a function
M q : B(x; ∆q ) → ℜ is a stochastic fully linear model of f q on B(X; ∆q ) if ∇f q is Lipschitz
continuous with constant κL , and there exist positive constants κeg and κef dependent on κL but
independent of x and ∆q such that almost surely
∥∇f q (x) − ∇M q (x)∥ ≤ κeg ∆q and |f q (x) − M q (x)| ≤ κef (∆q )2 ∀x ∈ B(x; ∆q ).
Definition 3 (Cauchy reduction). Given Xk ∈ ℜd , ∆qk > 0, q ∈ {h, l}, and a function Mkq :
B(Xk ; ∆qk ) → ℜ obtained following Definition 1, Skc is called the Cauchy step if
$$M^q(X_k) - M^q(X_k + S_k^c) \geq \frac{1}{2}\|\nabla M^q(X_k)\|\min\left\{\frac{\|\nabla M^q(X_k)\|}{\|\nabla^2 M^q(X_k)\|}, \Delta_k^q\right\}.$$
When ∥∇2 Mkq (Xk )∥ = 0, we assume ∥∇M q (Xk )∥/∥∇2 M q (Xk )∥ = +∞. The Cauchy step is
derived by minimizing the model Mkq (·) along the steepest descent direction within B(Xk ; ∆qk ),
making it easy and quick to compute.
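For a quadratic model, this one-dimensional minimization has a closed form; the sketch below (our helper names, the standard TR textbook formula) computes it.

```python
# A sketch of the Cauchy step for a quadratic model m(s) = f0 + g's + 0.5 s'Hs:
# minimize along the steepest descent direction -g within radius delta.
import numpy as np

def cauchy_step(g, H, delta):
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    t_max = delta / gnorm               # step length hitting the TR boundary
    gHg = g @ H @ g
    if gHg <= 0.0:                      # nonpositive curvature: go to boundary
        t = t_max
    else:
        t = min(gnorm**2 / gHg, t_max)  # unconstrained 1-D minimizer, capped
    return -t * g
```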
Definition 4 (filtration and stopping time). A filtration {Fk }k≥1 on a probability space (Ω, P, F)
is a sequence of σ-algebras, each contained within the next, such that for all k, Fk is a subset of
Fk+1 , and all are subsets of F. A function N : Ω → {0, 1, 2, . . . , ∞} is referred to as a stopping
time with respect to the filtration $\{\mathcal{F}_n\}$ if the set $\{\omega \in \Omega : N(\omega) = n\}$ is an element of $\mathcal{F}_n$ for every $n < \infty$.
Assumption 1 (function). The HF function f h and the LF function f l are twice continuously
differentiable in an open domain Ω, ∇f h and ∇f l are Lipschitz continuous in Ω with constant
κLg > 0, and ∇2 f h and ∇2 f l are Lipschitz continuous in Ω with constant κL > 0.
We make the next assumption on the higher moments of the stochastic noise, resembling the Bernstein condition. Random variables fulfilling Assumption 2 exhibit sub-exponential tail behavior.
Assumption 2 (stochastic noise). The Monte Carlo oracles generate iid random variables $F^q(X_k^i, \xi_j) = f^q(X_k^i) + E_{k,j}^{i,q}$ with $E_{k,j}^{i,q} \in \mathcal{F}_{k,j}$ for $i \in \{0, 1, 2, \dots, p, s\}$ and $q \in \{h, l\}$, where $X_k^s$ is the candidate iterate at iteration $k$ and $\mathcal{F}_k := \mathcal{F}_{k,0} \subset \mathcal{F}_{k,1} \subset \cdots \subset \mathcal{F}_{k+1}$ for all $k$. Then the stochastic errors $E_{k,j}^{i,q}$ are independent of $\mathcal{F}_{k-1}$, $\mathbb{E}[E_{k,j}^{i,q} \mid \mathcal{F}_{k,j-1}] = 0$, and there exist $(\sigma^q)^2 > 0$ and $b^q > 0$ such that for a fixed $n$,
$$\frac{1}{n}\sum_{j=1}^{n}\mathbb{E}\big[|E_{k,j}^{i,q}|^m \mid \mathcal{F}_{k,j-1}\big] \leq \frac{m!}{2}(b^q)^{m-2}(\sigma^q)^2, \quad \forall m = 2, 3, \cdots, \ \forall k.$$
Theorem 1 (Stochastic noise [15]). Let $c_f > 0$ and $\Delta_k^q > 0$ be given and let $E_{k,j}^{i,q}$ denote the stochastic noise following Assumption 2 for $q \in \{h, l\}$. If the sample size $N(X_k^i)$ is determined by (7), for any $\sigma_0 > 0$ and $k \in \mathbb{N}$, where $\lambda_k = O(\log k)$ and $SV(X_k^i, n)$ is a finite random variable that depends on the sample size $n$ and the design point $X_k^i$, we obtain
$$\sum_{k=1}^{\infty} P\left\{\left|\frac{1}{N(X_k^i)}\sum_{j=1}^{N(X_k^i)} E_{k,j}^{i,q}\right| \geq c_f(\Delta_k^q)^2\right\} < \infty.$$
Despite the fact that Theorem 1 has been proven with SV (Xki , n) = σ̂ h (Xki , n) in Section 3.2
of [15], it can also be trivially established with a finite random variable SV (Xki , n) by employing
the same logical framework. The next result provides an upper bound for the gradient error norm
at any design point within the TR when a stochastic linear or quadratic interpolation model is
used. Combined with Theorem 1, it indicates that the gradient error norm will be bounded by
the order of the TR radius after sufficiently many iterations.
Lemma 1 (Stochastic Interpolation Model [11]). If Mkq (z) is a stochastic linear interpolation
model or a stochastic quadratic interpolation model of f q with the design set Xk := {Xki }pi=0 ⊂
B(Xk ; ∆qk ) and corresponding function estimates F̄ q (Xki , N (Xki )) = f q (Xki ) + Ēki,q (Nki ) for q ∈
{h, l}, there exist positive constants κeg1 and κeg2 such that for any z ∈ B(Xk ; ∆qk ),
$$\|\nabla M^q(z) - \nabla f^q(z)\| \leq \kappa_{eg1}\Delta^q + \kappa_{eg2}\frac{\sqrt{\sum_{i=1}^{p}\big(\bar{E}_k^{i,q}(N_k^i) - \bar{E}_k^{0,q}(N_k^0)\big)^2}}{\Delta^q}, \qquad (8)$$
where $\bar{E}_k^{i,q}(N_k^i) = N(X_k^i)^{-1}\sum_{j=1}^{N(X_k^i)} E_{k,j}^{i,q}$.
Lastly, we present the variance of the BFMC estimator. To make sure that the variance of BFMC is reduced, the second and third terms on the RHS of (9) should be less than zero for some $n$, $v$, and $c$. Another important point is that $\sigma^h(x)$, $\sigma^l(x)$, and $\sigma^{h,l}(x)$ are usually unknown in reality, forcing us to use estimates such as $\hat\sigma^h(x, n)$, $\hat\sigma^l(x, \max\{n, v\})$, and $\hat\sigma^{h,l}(x, \min\{n, v\})$.
Lemma 2 (Variance of BFMC [13]). Let $x \in \Re^d$, $n, v \in \mathbb{N}$, and $c \in \mathbb{R}$. Then the variance of the BFMC estimator $\bar{F}^{bf}(x, n, v, c)$ is
$$\mathrm{Var}(\bar{F}^{bf}(x, n, v, c)) = \frac{(\sigma^h(x))^2}{n} + c^2\left(\frac{1}{n} + \frac{1}{v} - \frac{2}{\max\{n, v\}}\right)(\sigma^l(x))^2 + 2c\left(\frac{1}{\max\{n, v\}} - \frac{1}{n}\right)\sigma^{h,l}(x). \qquad (9)$$
Remark 1. When $v \leq n$, $\mathrm{Var}(\bar{F}^{bf}(x, n, v, c)) \geq \mathrm{Var}(\bar{F}^h(x, n))$. Hence, a variance reduction is only available when $v > n$.
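The formula in (9) is cheap to evaluate, which the algorithm exploits when trading off $n$, $v$, and $c$; the sketch below (our helper names) evaluates it along with the coefficient minimizing it when $v > n$.

```python
# A sketch evaluating the BFMC variance formula (9) and, for v > n, the
# coefficient c* minimizing it; inputs are (sigma^h)^2, (sigma^l)^2, sigma^{h,l}.
def bfmc_variance(var_h, var_l, cov_hl, n, v, c):
    m = max(n, v)
    return (var_h / n
            + c * c * (1.0 / n + 1.0 / v - 2.0 / m) * var_l
            + 2.0 * c * (1.0 / m - 1.0 / n) * cov_hl)

def optimal_c(var_l, cov_hl):
    """(9) is quadratic in c; for v > n its minimizer is sigma^{h,l}/(sigma^l)^2."""
    return cov_hl / var_l
```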
(1) The sample sizes are intricately managed through adaptive sampling using BFMC or CMC.
Within this approach, two critical decisions are made. Initially, as the samples stream in, the
method discerns between employing CMC or BFMC. Secondly, it dynamically adjusts the
sample sizes for both HF and LF simulation oracles, alongside determining the coefficient c
in (3), all in real-time as the sample sizes increase.
(2) At each iteration k, two local models can be constructed using HF and LF simulation oracles,
respectively, each with its own TR: ∆lk for the LF function and ∆hk for the HF function.
The local model utilizing the LF oracle serves two purposes: 1) identifying the candidate
solution for the next iterate, and 2) updating the adaptive correlation constant determining
the utilization of the local model for the LF function in Algorithm 3.
We first introduce the bi-fidelity adaptive sampling (BAS) strategy, which corresponds to the
first feature.
$$N^p(x) = \min\left\{n \in \mathbb{N} : \frac{\hat\sigma^h(x, n)}{\sqrt{n}} \leq \frac{\kappa\Delta_k^2}{\sqrt{\lambda_k}}\right\} \qquad (10)$$
4: loop
5: Approximately compute C ∗ , N ∗ , and V ∗ by solving the problem (11) and set c = C ∗ .
6: if wh N ∗ + wl V ∗ ≤ wh N p then
7: Set v = max{n + 1, v} and update σ̂ h,l (x, n) and σ̂ l (x, v) by calling the LF oracle.
8: if $\mathrm{Var}(\bar{F}^{bf}(x, n, v, c)) \leq \kappa^2\Delta_k^4\lambda_k^{-1}$ then
9: return [n, v, c, F̄ bf (x, n, v, c)] (BFMC)
10: end if
11: if n ≥ N ∗ − 1 then
12: Set $v = v + s^l$, get $s^l$ additional replications of the LF oracle, and update $\hat\sigma^l(x, v)$.
13: else
14: Set n = n + sh and update σ̂ h (x, n) and σ̂ h,l (x, n) by calling the LF and HF oracles.
15: end if
16: else
17: if n ≥ N p (x) then
18: return [n, v, c, F̄ h (x, n)] (CMC)
19: end if
20: Set n = n + sh and update σ̂ h (x, n) and N p (x) by calling the HF oracle.
21: end if
22: end loop
3.1 Adaptive Sampling for Bi-Fidelity Stochastic Optimization
While BFMC has the capability to reduce the variance of the function estimate, blindly employing
BFMC may not always be advantageous. For example, when the expense of invoking the LF
simulation oracle is comparable or marginally lower than that of the HF one, and the inherent
variance of the LF simulation significantly exceeds that of the HF simulation, BFMC should
be avoided. Therefore, it is essential to decide which Monte Carlo method to employ at a given
design point based on the variance of the LF and HF simulation output and the covariance between
them. However, a challenge is that the true variances of the LF and HF simulation output are
unknown. Therefore, when adaptive sampling is utilized, the choice of the MC method needs to
be dynamically determined based on variance and covariance estimates, which are sequentially
updated using the simulation results. In summary, we should dynamically determine N, V , and C
while streaming the simulation replications, where N and V are the sample sizes for the HF and
LF oracle and C represents the coefficient in the BFMC estimate, which is denoted as c in (3).
To achieve this for any design point x at iteration k, we suggest BAS, as listed in Algorithm 1.
Algorithm 1 starts by drawing n replications from the HF oracle and the LF oracle to estimate the variance and covariance terms in (9). By leveraging the variance estimate $\hat\sigma^h(x, n)$, we can derive a predicted minimum sample size $N^p(x)$ for CMC, adhering to the adaptive sampling rule (7). Then the predicted computational cost of CMC at $x$ is represented as $w^h N^p(x)$, where
wh is the cost of calling HF oracle once. Our next step involves juxtaposing this against the
projected computational costs of BFMC. To predict the lowest costs with corresponding sample
sizes for BFMC, we solve (11) with variance estimates $\hat\sigma^h(x, n)$, $\hat\sigma^l(x, \max\{v, n\})$, and $\hat\sigma^{h,l}(x, n)$ for $\mathrm{Var}(\bar{F}^{bf}(x, \tilde{n}, \tilde{v}, \tilde{c}))$:
$$[N^*, V^*, C^*] \in \operatorname*{argmin}_{\tilde{n}, \tilde{v} \in \mathbb{N},\ \tilde{c} \in \Re}\ w^h\tilde{n} + w^l\tilde{v} \qquad (11)$$
where wl is the cost for calling LF oracle once. The first constraint is the adaptive sampling
rule which originated from (7), ensuring that BFMC achieves the required accuracy. We are now
poised to contrast the predicted computational costs between the crude MC and BFMC.
If wh N p ≤ wh N ∗ +wl V ∗ , employing the CMC method would be more cost-effective in achieving
the required accuracy of estimates at the current iteration, given the information available. Hence,
if n ≥ N p , the accuracy of the function estimate is already sufficient to proceed the optimization
and thus, the algorithm returns the function estimate with CMC (F̄ (x, n)). If not, we update
σ̂ h (x, n = n + sh ) by calling sh additional replications of the HF oracle. Then we proceed to Step
5 and continue with the algorithm using the newly updated n and variance estimates. It is worth
to emphasize that the variance estimate for the LF function and the covariance estimate are not
updated.
Following Step 5, if the cost of BFMC is lower than that of CMC, BFMC might offer greater
cost-effectiveness, prompting the algorithm to proceed to Step 6. We first set v = max{v, n + 1}
with the updated $n$ in Step 20 to ensure that $v > n$ (see Remark 1). Following this adjustment, additional replications of the LF oracle may become necessary due to the updated value of $v$. Consequently, it is imperative to update both $\hat\sigma^l(x, v)$ and $\hat\sigma^{h,l}(x, n)$ to reflect this change. Subsequently, if the variance of the function estimate by BFMC is small enough compared to the
optimality measure, i.e.,
$$\mathrm{Var}(\bar{F}^{bf}(x, n, v, c)) \leq \kappa^2\Delta_k^4\lambda_k^{-1}, \qquad (12)$$
the algorithm returns F̄ bf (x, n, v, c). If not, we need to decide whether to increase n or v. As
evident from (9), n has a more pronounced impact on the left-hand side of (12) compared to v.
Therefore, our initial assessment focuses on determining whether we need to increase n or not. If
n < N ∗ − 1, we set n = n + sh and update σ̂ h (x, n) and σ̂ h,l (x, n) by obtaining the necessary
number of additional LF and HF simulation results. Subsequently, we move on to Step 5 and
proceed with the algorithm utilizing the newly updated n. If n ≥ N ∗ − 1, we acquire only sl
additional replications from the LF oracle and update σ̂ l (x, v). Following this, we advance to
Step 5 and continue the algorithm with the newly updated v.
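The decision logic just described can be condensed as in the sketch below; the oracle interface, the coarse search used to approximate (11), and all constants are our assumptions, and only the control flow mirrors Steps 4-22 of Algorithm 1.

```python
# A condensed sketch of the BAS loop (Algorithm 1, Steps 4-22), not the
# paper's implementation. Oracles return iterables of replications at x.
import numpy as np

def var_bfmc(vh, vl, chl, n, v, c):
    """BFMC variance, formula (9) specialized to v >= n."""
    return vh / n + c * c * (1 / n - 1 / v) * vl - 2 * c * (1 / n - 1 / v) * chl

def approx_11(vh, vl, chl, w_h, w_l, tau, n_max=10000):
    """Coarsely solve (11): min w_h*n + w_l*v s.t. var_bfmc <= tau and v > n."""
    c = chl / vl                            # minimizer of (9) over c
    r2 = chl * chl / vl
    best = None
    for n in range(2, n_max):
        floor = (vh - r2) / n               # variance as v -> infinity
        if floor >= tau:
            continue                        # this n can never meet (12)
        v = max(n + 1, int(np.ceil(r2 / (tau - floor))) if r2 > 0 else n + 1)
        cost = w_h * n + w_l * v
        if best is None or cost < best[0]:
            best = (cost, n, v, c)
    return best

def bas(x, hf, lf, w_h, w_l, kappa, delta, lam, s_h=5, s_l=5):
    H, L = list(hf(x, 10)), list(lf(x, 10))   # paired initial replications
    tau = kappa**2 * delta**4 / lam           # accuracy target from (12)
    while True:
        n, v = len(H), len(L)
        m = min(n, v)
        vh, vl = np.var(H, ddof=1), np.var(L, ddof=1)
        chl = np.cov(H[:m], L[:m])[0, 1]
        n_cmc = int(np.ceil(vh / tau))        # predicted CMC size, rule (10)
        sol = approx_11(vh, vl, chl, w_h, w_l, tau)
        if sol is not None and sol[0] <= w_h * n_cmc:       # try BFMC
            _, n_star, v_star, c = sol
            if v <= n:                        # enforce v > n (Remark 1)
                L += list(lf(x, n + 1 - v))
                v = n + 1
            if var_bfmc(vh, np.var(L, ddof=1), chl, n, v, c) <= tau:
                est = np.mean(H) - c * (np.mean(L[:n]) - np.mean(L))
                return n, v, c, est           # BFMC estimate, as in (3)
            if n >= n_star - 1:
                L += list(lf(x, s_l))         # grow the LF sample only
            else:
                H += list(hf(x, s_h))         # grow both, keeping pairs
                L += list(lf(x, s_h))
        else:                                  # CMC branch
            if n >= n_cmc:
                return n, v, None, np.mean(H)
            H += list(hf(x, s_h))             # grow the HF sample only
```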
Since $n$, $v$, and $c$ are determined dynamically based on the realization of simulation results in Algorithm 1, these three outputs are stopping times with respect to the filtration. Hence, we refer to the output of Algorithm 1 as $[N_k(x), V_k(x), C_k(x), \tilde{F}_k(x)]$.
Remark 2. While employing CMC, the combined replications from the LF oracle can be reused
for constructing the local model using the LF oracle, detailed in Section 3.2.
Algorithm 2 ASTRO-BFDF
Input: Initial incumbent x0 ∈ ℜd , initial and maximum TR radius ∆l0 , ∆h0 , ∆max > 0, model
fitness thresholds 0 < η < 1 and certification threshold µ > 0, expansion and shrinkage
constants γ1 > 1 and γ2 ∈ (0, 1), sample size lower bound sequence {λk } = {O(log k)},
adaptive sampling constant κ > 0, correlation constant αk > 0, and lower bound of an initial
variance approximation σ0l > 0.
1: for k = 0, 1, 2, . . . do
2: Obtain Ikh , Xks , ∆lk , and ∆hk by calling Algorithm 3.
3: if Ikh is True then
4: Select $\mathcal{X}_k = \{X_k^i\}_{i=0}^{2d} \subset B(X_k; \Delta_k^h)$.
5: Estimate the HF function at $\{X_k^i\}_{i=0}^{2d}$ by calling Algorithm 1.
6: Estimate the LF function $\bar{F}^l(X_k^i, T_k^i)$ at $\{X_k^i\}_{i=0}^{2d}$, satisfying
$$T_k^i = \min\left\{t \in \mathbb{N} : \frac{\max\{\sigma_0^l, \hat\sigma^l(X_k^i, t)\}}{\sqrt{t}} \leq \frac{\kappa(\Delta_k^h)^2}{\sqrt{\lambda_k}}\right\}. \qquad (13)$$
14: else
15: Set (Xk+1 , ∆lk+1 ) = (Xks , γ1 ∆lk ), αk = min{γ1 αk , 1} and k = k + 1.
16: end if
17: end for
Algorithm 3 [Ikh , Xks , ∆lk , ∆hk ] = ASTRO-LFDF(Xk )
Input: Xk , ∆lk , model fitness thresholds 0 < η < 1 and certification threshold µ > 0, sufficient
reduction constant θ > 0, expansion and shrinkage constants γ1 > 1 and γ2 ∈ (0, 1), sample
size lower bound sequence {λk } = {O(log k)}, adaptive sampling constant κ > 0, correlation
constant αk > 0, correlation threshold αth > 0, lower bound of an initial variance approxima-
tion σ0l > 0, sufficient reduction constant ζ > 0, and gradient norm of the model lower bound
ϵ̂ > 0.
1: loop
2: if αk < αth then
3: Set Ikh = True and Xks,l = Xk
4: break
5: end if
6: Select $\mathcal{X}_k^l = \{X_k^i\}_{i=0}^{p} \subset B(X_k; \Delta_k^l)$.
7: Estimate $\bar{F}^l(X_k^i, T_k^i)$ at $\{X_k^i\}_{i=0}^{2d}$, satisfying (13) with $\Delta_k^l$ instead of $\Delta_k^h$.
8: Construct the local model $M_k^l(x)$.
9: Approximately compute the local model minimizer $X_k^{s,l}$.
10: Estimate $\tilde{F}_k(X_k^{s,l})$ and $\tilde{F}_k(X_k^0)$ by calling Algorithm 1 with $\Delta_k = \Delta_k^l$.
11: Compute the success ratio $\hat\rho_k$ as
increases. Additionally, ∆hk is adjusted to be larger than ∆lk at Step 18 in Algorithm 3, and we
proceed to the next iteration k + 1. If not, the candidate is rejected, leading to the contraction of
∆lk , a decrease in αk , and progression to Step 6 in Algorithm 3 to identify a superior candidate
within the shrunken TR. This process continues until the algorithm concludes that the LF oracle
cannot contribute to discovering a better solution, indicated by αk < αth .
Remark 3. In Algorithm 3, the sufficient reduction test (Steps 11 and 12) differs from the one in Algorithm 2. Firstly, for a successful iteration, the reduction in function estimates must be larger than $\zeta(\Delta_k^h)^2$ for some $\zeta > 0$ (see Step 11). Secondly, the norm of the local model gradient must be larger than $\hat\epsilon$ (see Step 12). These conditions prevent us from accepting a candidate due
to a very small reduction in the local model value in (14). Additionally, they ensure the almost
sure convergence of ASTRO-BFDF (See the proof of Theorem 2 in Section 4).
When Algorithm 3 fails to identify the next iterate, we construct the local model of the HF
function (Mkh ) in Algorithm 2. To select the design set Xk , we aim to reuse as many previously
visited design points from past iterations or those used while constructing Mkl in the current iter-
ation as possible. Subsequently, we estimate the value of the HF function by invoking Algorithm
1. This yields the HF function estimate $\tilde{F}(X_k^i)$ and the LF function estimate $\bar{F}^l(X_k^i, V_k^i)$ for any $i \in \{0, 1, 2, \dots, |\mathcal{X}_k|\}$. Then, we can additionally derive estimates for the LF function ($\bar{F}^l(X_k^i, T_k^i)$), aligning with the adaptive sampling rule (13). It is worth noting that when the estimates for the HF function are acquired through BFMC, $V_k^i$ from Algorithm 1 inherently adheres to (13), i.e., $V_k^i \geq T_k^i$, with high probability.
two distinct local models for the HF and LF functions are constructed, and two minimizers, Xks,l
and Xks,h , are derived, with one stemming from Mkl and the other from Mkh . We evaluate the HF
function values at two potential candidate points and designate the one with the lower objective
value as the candidate point to go forward with. Leveraging this candidate, we update the next
iterate and ∆hk . Finally, the adaptive correlation constant is adjusted based on the results of the
sufficient reduction test at Xks,l .
In Algorithm 2, the creation of the local model for the LF function occurs at various points,
each serving distinct purposes. Specifically, within Algorithm 3, which operates as the inner loop
within Algorithm 2, we construct this model to seek an improved solution for the HF function.
This decision stems from the belief that the LF function shares analogous gradient and curvature
information, deduced from the insights gathered from previous iterations, denoted by αk > αth .
Thus, the utilization of the HF oracle is minimized, being employed only for the sufficient reduction
test. In the outer loop, the primary objective is to update the adaptive correlation constant, even
in cases where the LF function has not proven beneficial in preceding iterations. In this case,
our aim is to minimize reliance on the LF function, a goal achievable through the adoption
of BFMC. When the HF function values are estimated at Step 5 in ASTRO-BFDF, an ample
number of independent replications of the LF oracle are already obtained by BFMC. This enables
the construction of Mkl without incurring any additional computational burden. Unfortunately,
in scenarios where the LF function fails to contribute meaningfully to the optimization process,
Algorithm 2 may consume more resources compared to alternative solvers that exclusively utilize
the HF function. However, discerning the utility of the LF function necessitates an additional
computational budget, albeit the impact may be marginal in practice, as we will elaborate on in
Section 5.
4 Convergence Analysis
In this section, we delve into demonstrating the convergence of ASTRO-BFDF. We first introduce
two additional assumptions concerning the local model. Firstly, we stipulate that the minimizer of
the local model must yield a certain degree of function reduction, known as the Cauchy reduction
(See Definition 3). Secondly, the Hessian of the local model should be uniformly bounded. Both
of these assumptions are essential to validate the quality of the candidate point for any given
iteration k.
Assumption 3 (Reduction in Subproblem). For some $\kappa_{fcd} \in (0, 1]$, $q \in \{h, l\}$, and all $k$, $M_k^q(X_k^0) - M_k^q(X_k^{s,q}) \geq \kappa_{fcd}\big(M_k^q(X_k^0) - M_k^q(X_k^0 + S_k^c)\big)$, where $S_k^c$ is the Cauchy step.
Assumption 4 (Bounded Hessian in Norm). In ASTRO-BFDF, the local model Hessians Hqk are
bounded by κqH for all k and q ∈ {h, l} with κqH ∈ (0, ∞) almost surely.
The convergence analysis of the adaptive sampling stochastic TR method for derivative-free op-
timization has received considerable attention in prior works such as [11, 15]. While our approach
to proving the convergence shares similarities with those, there are two crucial considerations we
must address.
(a) (Stochastic noise) BFMC ought to yield results consistent with those of Theorem 1, indicat-
ing that the stochastic error in BFMC will indeed be less than O((∆qk )2 ) for q ∈ {h, l} after
sufficiently large k. To achieve this, a crucial prerequisite (See Assumption 2) is ensuring
that F h (x, ξ) − cF l (x, ξ) exhibits similar properties to F h (x, ξ) for any ξ ∈ Ξ, c > 0, and
x ∈ ℜd .
(b) (Trust-region) The TR sizes for both HF and LF functions need to converge to zero. In the context of the convergence theory of most stochastic TR methods [2, 4, 11], it becomes imperative to demonstrate that the TR radius converges to zero as k approaches +∞ in
some probabilistic senses. This necessity arises because function estimate errors typically
remain bounded by the order of the TR size, given specific sampling rules and assumptions.
Consequently, the estimation errors will also converge to zero, ensuring the accuracy of the
estimates. Therefore, within bi-fidelity stochastic optimization, we also need the same result
for ∆hk . Furthermore, since ∆hk ≥ ∆lk for all k ∈ N, the convergence of ∆hk implies the
convergence of ∆lk as well.
Taking into account the aforementioned crucial considerations, we are now poised to present
the convergence theory of ASTRO-BFDF.
Theorem 2 (Almost Sure Convergence). Let Assumptions 1-4 hold. Then,
$$\lim_{k \to \infty} \|\nabla f^h(X_k)\| = 0 \quad \text{w.p.1}. \qquad (15)$$
Theorem 2 guarantees that a sequence {Xk (ω)} generated by Algorithm 2 converges to the
first-order stationary point for any sample path ω ∈ Ω.
Proof of Theorem 2. We start by demonstrating that the iid random variables $E_{k,j}^{i,h} - cE_{k,j}^{i,l}$ also fulfill Assumption 2 for any $k \in \mathbb{N}$, $c \in \mathbb{R}$, and $i \in \{0, 1, 2, \dots, p, s\}$, indicating their adherence to the sub-exponential distribution.
Lemma 3. Let Assumption 2 hold. Then there exist $\sigma^2 > 0$ and $b > 0$ such that for a fixed $n$ and $c \in \Re$,
$$\frac{1}{n}\sum_{j=1}^{n}\mathbb{E}\big[|E_{k,j}^{i,h} - cE_{k,j}^{i,l}|^m \mid \mathcal{F}_{k,j-1}\big] \leq \frac{m!}{2}b^{m-2}\sigma^2, \quad \forall m = 2, 3, \cdots, \ \forall k. \qquad (16)$$
Proof. We obtain from the Minkowski inequality and Assumption 2 that for any $k, j \in \mathbb{N}$, $c \in \mathbb{R}$, and any $m \in \{2, 3, \cdots\}$, there exist $b^h, b^l, (\sigma^h)^2, (\sigma^l)^2 > 0$ such that
$$\mathbb{E}\big[|E_{k,j}^{i,h} - cE_{k,j}^{i,l}|^m \mid \mathcal{F}_{k,j-1}\big] \leq \Big(\mathbb{E}\big[|E_{k,j}^{i,h}|^m \mid \mathcal{F}_{k,j-1}\big]^{\frac{1}{m}} + \mathbb{E}\big[c|E_{k,j}^{i,l}|^m \mid \mathcal{F}_{k,j-1}\big]^{\frac{1}{m}}\Big)^m \leq \Big(\big(\tfrac{m!}{2}(b^h)^{m-2}(\sigma^h)^2\big)^{\frac{1}{m}} + \big(\tfrac{m!}{2}(b^l)^{m-2}(\sigma^l)^2\big)^{\frac{1}{m}}\Big)^m. \qquad (17)$$
Without loss of generality, let us assume that $\sigma^h > \sigma^l > 0$ and $b^h > b^l$. Then there must exist some constants $\alpha_\sigma, \alpha_b > 1$ such that $\alpha_\sigma^2(\sigma^l)^2 = (\sigma^h)^2$ and $\alpha_b b^l = b^h$. Then the right-hand side of (17) becomes $\big((\alpha_\sigma^2\alpha_b^{m-2})^{1/m} + 1\big)^m\big(2^{-1}m!(b^l)^{m-2}(\sigma^l)^2\big)$. Since $(\alpha_\sigma^2\alpha_b^{m-2})^{1/m} + 1 \leq \alpha_\sigma\alpha_b + 1$ for all $m \in \{2, 3, \cdots\}$, we obtain
$$\frac{1}{n}\sum_{j=1}^{n}\mathbb{E}\big[|E_{k,j}^{i,h} - cE_{k,j}^{i,l}|^m \mid \mathcal{F}_{k,j-1}\big] \leq \frac{m!}{2}\big((\alpha_\sigma\alpha_b + 1)b^l\big)^{m-2}\big((\alpha_\sigma\alpha_b + 1)\sigma^l\big)^2.$$
Hence, the statement of the lemma holds with $\sigma = \sigma^l(\alpha_\sigma\alpha_b + 1)$ and $b = (\alpha_\sigma\alpha_b + 1)b^l$.
Now let us prove that the function estimate error from BAS is bounded by O((∆hk )2 ), aligning
with the outcome stated in Theorem 1. This finding not only enables us to attain the stochastic
fully linear model (See Definition 2) but also leads to the crucial observation that ∆hk converges
to 0 almost surely as k tends to infinity.
Lemma 4. Let Assumption 2 hold and let $X_k^i$ for $i \in \{0, 1, 2, \dots, p, s\}$ be the design points generated by Algorithm 2 at iteration $k$. Let $\tilde{F}(X_k^i) = f(X_k^i) + \tilde{E}_k^i$ be the HF function estimate obtained from Algorithm 1 with $\Delta_k = \Delta_k^q$ for $q \in \{h, l\}$. Then, given $c_f > 0$,
$$P\{|\tilde{E}_k^i| \geq c_f(\Delta_k^q)^2 \ \text{i.o.}\} = 0. \qquad (18)$$
Proof. Let ω ∈ Ω. Firstly, if the function estimate from BAS was obtained by CMC, we know
from Theorem 1 that the statement of the lemma is satisfied. Now, we assume that the function
estimate $\tilde{F}(X_k^i)$ is obtained using BFMC, implying that
$$|\tilde{E}_k^i(\omega)| = \big|\bar{E}_k^{i,h}(N_k^i(\omega)) - C_k(\omega)\bar{E}_k^{i,l}(N_k^i(\omega)) + C_k(\omega)\bar{E}_k^{i,l}(V_k^i(\omega))\big|,$$
where $N_k^i = N_k(X_k^i)$, $V_k^i = V_k(X_k^i)$, and $\bar{E}_k^{i,q}(N_k^i) = N(X_k^i)^{-1}\sum_{j=1}^{N(X_k^i)} E_{k,j}^{i,q}$ for $q \in \{h, l\}$. To simplify notation, we will omit $\omega$ from this point forward. Then we have
$$P\{|\tilde{E}_k^i| \geq c_f(\Delta_k^q)^2 \mid C_k = c\} \leq P\Big\{\big|\bar{E}_k^{i,h}(N_k^i) - c\bar{E}_k^{i,l}(N_k^i)\big| \geq \frac{c_f}{2}(\Delta_k^q)^2\Big\} + P\Big\{\big|c\bar{E}_k^{i,l}(V_k^i)\big| \geq \frac{c_f}{2}(\Delta_k^q)^2\Big\}. \qquad (19)$$
We know from Steps 1 and 8 in Algorithm 1, Lemma 2, and $N_k^i < V_k^i$ that we obtain $N_k^i$, $V_k^i$, and $c$ such that
$$\frac{\max\Big\{\sigma_0, \sqrt{(\hat\sigma^h(X_k^i, N_k^i))^2 + c^2(\hat\sigma^l(X_k^i, V_k^i))^2 - 2c\hat\sigma^{h,l}(X_k^i, N_k^i)}\Big\}}{\sqrt{N_k^i}} \leq \frac{\kappa(\Delta_k^q)^2}{\sqrt{\lambda_k}},$$
and
$$\frac{\max\Big\{\sigma_0, \sqrt{2c\hat\sigma^{h,l}(X_k^i, N_k^i) - c^2(\hat\sigma^l(X_k^i, V_k^i))^2}\Big\}}{\sqrt{V_k^i}} \leq \frac{\kappa(\Delta_k^q)^2}{\sqrt{\lambda_k}},$$
for some $\sigma_0 > 0$. We also know from Lemma 3 that Assumption 2 holds for $E_{k,0}^{i,h} - cE_{k,0}^{i,l}$, implying that Theorem 1 holds for $E_{k,0}^{i,h} - cE_{k,0}^{i,l}$. Hence, the right-hand side of (19) is summable, from which we obtain that $P\{|\tilde{E}_k^i| \geq c_f(\Delta_k^q)^2\}$ is also summable based on $P\{|\tilde{E}_k^i| \geq c_f(\Delta_k^q)^2\} = \mathbb{E}\big[P\{|\tilde{E}_k^i| \geq c_f(\Delta_k^q)^2 \mid C_k = c\}\big]$. As a result, the statement of the lemma holds.
Next, we demonstrate that as $k$ goes to infinity, both TR radii inevitably converge to zero almost surely. Although the main framework of our proof differs only trivially from the one presented in [4], we provide a comprehensive proof in Appendix A to facilitate understanding.
Relying on Lemma 5, we show through Lemma 6 that the gradient of the model for the HF
function converges to a true gradient almost surely. It is worth highlighting that the local model
for the HF function is not constructed at every iteration, as sometimes the local model for the LF
function can discover a better solution.
Lemma 6. Let Assumptions 1-4 hold. Let $\{k_j\}$ be the subsequence such that $I_{k_j}^h = \text{True}$. Then, $\|\nabla M_{k_j}^h(X_{k_j}^0) - \nabla f(X_{k_j}^0)\| \xrightarrow{\text{w.p.1}} 0$ as $j \to \infty$.
Proof. We know from Lemma 4 that given $c_f > 0$, there exists a sufficiently large $J$ such that $|\tilde{E}_{k_j}^i| < c_f(\Delta_{k_j}^h)^2$ for any $i \in \{0, 1, \cdots, p, s\}$ and $j > J$. Then from Lemma 1, we have
$$\begin{aligned} \|\nabla M_{k_j}^h(X_{k_j}^0) - \nabla f(X_{k_j}^0)\| &\leq \kappa_{eg1}\Delta_{k_j}^h + \kappa_{eg2}\frac{\sqrt{\sum_{i=1}^{p}(\tilde{E}_{k_j}^i - \tilde{E}_{k_j}^0)^2}}{\Delta_{k_j}^h} \\ &\leq \kappa_{eg1}\Delta_{k_j}^h + \kappa_{eg2}\frac{|\tilde{E}_{k_j}^i - \tilde{E}_{k_j}^0|}{\Delta_{k_j}^h} \\ &\leq (\kappa_{eg1} + 2\kappa_{eg2}c_f)\Delta_{k_j}^h. \end{aligned}$$
Given that Lemma 5 ensures $\Delta_{k_j}^h$ converges to 0 w.p.1, the statement of the lemma holds.
In the following lemma, we demonstrate that after a sufficient number of iterations, if the
TR for the HF function is relatively smaller than the model gradient, the iteration is successful
with probability one. Given Lemma 6, Lemma 7 suggests that in cases where the true gradient
is greater than zero, if the TR radius for the HF function is comparatively smaller than the true
gradient, the candidate solution is accepted and the TR is expanded. This ensures that the TR
for the HF function will not converge to zero before the true gradient does.
Lemma 7. Let Assumptions 1-4 hold. Then there exists $c_d > 0$ such that
$$P\Big\{\big\{\Delta_k^h \leq c_d\|\nabla M_k^h(X_k^0)\|\big\} \cap \{\hat\rho_k < \eta\} \ \text{i.o.}\Big\} = 0.$$
Proof. We first note that for any k ∈ N, when the minimizer of the low fidelity local model in
Algorithm 3 is accepted as a next iterate, i.e., Ikh is False, we already have ρ̂k ≥ η. Otherwise, the
HF local model is constructed in Algorithm 2. Then the rest of the proof trivially follows from
Lemma 4.4 with the adaptive sampling rule (A-0) in [15].
Lemma 8. Let Assumptions 1-4 hold. Then
$$\liminf_{k \to \infty} \|\nabla f^h(X_k)\| = 0 \quad \text{w.p.1}. \qquad (20)$$
Proof. Using Lemmas 6 and 7, the proof can be completed by straightforwardly following the steps outlined in Theorem 4.6 of [4].
We have now reached a point where we can confidently establish the almost sure convergence
of ASTRO-BFDF. The following proof solidifies our claim.
Proof of Theorem 2. For the sake of contradiction, we first assume that there is a subsequence of iterates whose gradients are bounded away from zero. Particularly, suppose that there exist a set $\hat{D}$ of positive measure, $\omega_1 \in \hat{D}$, $\epsilon_0 > 0$, and a subsequence of successful iterates $\{t_j(\omega_1)\}$ such that $\|\nabla f^h(X_{t_j(\omega_1)}(\omega_1))\| > 2\epsilon_0$ for all $j \in \mathbb{N}$. We denote $t_j = t_j(\omega_1)$ and suppress $\omega_1$ in the following statements for ease of notation. Due to the lim-inf type of convergence just proved in (20), for each $t_j$ there exists a first successful iteration $\ell_j := \ell(t_j) > t_j$ such that, for large enough $k$,

and
$$\|\nabla f^h(X_{\ell_j})\| < 1.5\epsilon_0. \qquad (22)$$
Define $\mathcal{A}_j^h := \{k \in \mathcal{H} : t_j \leq k < \ell_j\}$ and $\mathcal{A}_j^l := \{k \in \mathcal{L} : t_j \leq k < \ell_j\}$. Let $j$ be sufficiently large
Since k is a successful iteration, ρ̂k ≥ η. Furthermore, Assumption 3 and (23) then imply that
$$\begin{aligned} f^h(X_k) - f^h(X_{k+1}) + \tilde{E}_k^0 - \tilde{E}_k^s &\geq \eta\big[M_k^h(X_k) - M_k^h(X_{k+1})\big] \\ &\geq \frac{1}{2}\eta\kappa_{fcd}\|\nabla M_k^h(X_k)\|\min\left\{\frac{\|\nabla M_k^h(X_k)\|}{\|\mathsf{H}_k\|}, \Delta_k^h\right\} \qquad (24) \\ &> c_{fd}\Delta_k^h, \end{aligned}$$
where $c_{fd} = \frac{1}{2}\eta\kappa_{fcd}\min\{\epsilon_0, \hat\epsilon\}$. When $k \in \mathcal{A}_j^l$, we also obtain a result similar to (24) using $\|\nabla M_k^l(X_k)\| > \hat\epsilon$:
$$f^h(X_k) - f^h(X_{k+1}) + \tilde{E}_k^0 - \tilde{E}_k^s > c_{fd}\Delta_k^l. \qquad (25)$$
Since we know from Lemma 4 that
$$|\tilde{E}_k^0 - \tilde{E}_k^s| < 0.5c_{fd}\Delta_k^h \ \text{for } k \in \mathcal{A}_j^h \quad \text{and} \quad |\tilde{E}_k^0 - \tilde{E}_k^s| < 0.5c_{fd}\Delta_k^l \ \text{for } k \in \mathcal{A}_j^l, \qquad (26)$$
the sequence $\{f^h(X_k)\}_{k\in\mathcal{A}_j}$ is monotone decreasing for sufficiently large $j$. From (25), (26), and the fact that $\|X_k - X_{k+1}\| \leq \Delta_k$ for all $k$, we deduce that
$$\|X_{t_j} - X_{\ell_j}\| \leq \sum_{i\in\mathcal{A}_j}\|X_i - X_{i+1}\| \leq \sum_{i\in\mathcal{A}_j^h}\Delta_i^h + \sum_{i\in\mathcal{A}_j^l}\Delta_i^l \leq \frac{2\big(f^h(X_{t_j}) - f^h(X_{\ell_j})\big)}{c_{fd}}. \qquad (27)$$
Now define $\mathcal{C}_j := \{k \in \mathcal{K} : \ell_j \leq k < t_{j+1}\}$. Let $k \in \mathcal{C}_j$ for sufficiently large $j$. From (24)-(26), we then obtain
$$f^h(X_k) - f^h(X_{k+1}) \geq 0.5c_{fd}(\Delta_k^l)^2,$$
implying that the sequence $\{f^h(X_k)\}_{k\in\mathcal{A}_j\cup\mathcal{C}_j}$ is monotone decreasing for sufficiently large $j$. The boundedness of $f^h$ from below then implies that the right-hand side of (27) converges to 0 as $j$ goes to infinity, concluding that $\lim_{j\to\infty}\|X_{t_j} - X_{\ell_j}\| = 0$. Consequently, by continuity of the gradient, we obtain that $\lim_{j\to\infty}\|\nabla f^h(X_{t_j}) - \nabla f^h(X_{\ell_j})\| = 0$. However, this contradicts $\|\nabla f^h(X_{t_j}) - \nabla f^h(X_{\ell_j})\| > 0.5\epsilon_0$, obtained from (21) and (22). Thus, (15) must hold.
5 Numerical Experiments
We will now assess and compare ASTRO-BFDF with other simulation optimization solvers. Our
focus lies on testing across two distinct problem categories: synthetic problems and toy problems
with Discrete Event Simulation (DES).
Synthetic problems constitute deterministic problems embellished with artificial stochastic
Gaussian noise. Given our knowledge of the closed equation of f h , generating numerous problems
that adhere to predetermined assumptions becomes relatively straightforward. However, since
both the function f h and the stochastic noises are artificially generated, the performance of the
solvers on these problems might not be indicative of its efficacy in handling real-world problems.
In particular, when the same random number stream is used, the stochastic noises at different
design points will be identical, implying that F h (·, ξ) − f h (·) is a constant function given fixed
ξ ∈ Ξ. This setting satisfies a stricter assumption than the one posed in this paper. Hence, we
also evaluated the solvers on toy problems utilizing DES to ensure testing with more realistic
scenarios. DES simulates real-world conditions, generating multiple outputs utilized within the
objective function f . All experiments have been implemented using SimOpt [18].
We compare ASTRO-BFDF, ASTRO-DF, ADAM [19], and Nelder-Mead [20]. In implement-
ing the solvers, including ASTRO-BFDF, we applied Common Random Numbers (CRN), which
involves using the same random number stream to reduce variance when comparing function esti-
mates at different design points. To integrate CRN into ASTRO-BFDF, each time a local model
is constructed, the sample sizes and the coefficient at the center point, obtained through BAS, are
preserved and subsequently utilized for estimating the function values at other design points.
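The sketch below illustrates the CRN idea itself; the toy simulator and the seeding scheme are our illustrative assumptions, not SimOpt's API.

```python
# A minimal sketch of common random numbers: the same seed drives the noise
# at both design points, so the difference of estimates is far less noisy.
import numpy as np

def estimate(sim, x, n, seed):
    rng = np.random.default_rng(seed)   # fixed stream => common random numbers
    return np.mean([sim(x, rng) for _ in range(n)])

sim = lambda x, rng: (x - 1.0)**2 + rng.normal()   # hypothetical noisy oracle
d_crn = estimate(sim, 0.5, 100, seed=7) - estimate(sim, 0.6, 100, seed=7)
d_ind = estimate(sim, 0.5, 100, seed=7) - estimate(sim, 0.6, 100, seed=8)
print(abs(d_crn - 0.09), abs(d_ind - 0.09))  # CRN error is typically smaller
```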
which the stochastic noises for the HF and LF oracles adhere to the Gaussian distribution $N(0, c_{sd}^h)$ for $c_{sd}^h \in \{20, 30, 40\}$ and the Gaussian distribution $N(0, c_{sd}^l)$ for $c_{sd}^l \in \{20, 30, 40\}$, respectively.
Figure 2: Fraction of 108 synthetic problems solved to 0.01-optimality with 95% confidence
intervals from 20 runs of each algorithm shows a clear advantage in finite-time performance of
ASTRO-BFDF.
Solvability profiles are used to compare the solvers, as illustrated in Figure 2. These profiles
provide insights into how well a solver performs by showing the proportion of tested problems it
solves within a certain relative optimality gap. Calculating this gap requires the optimal solution,
which is determined as the best solution among all solvers for a given problem in practice. When
the cost ratio of calling HF and LF oracles stands at 1:0.1, ASTRO-BFDF emerges as a standout
performer, solving over 50% of the problems within a mere 15% of the budget. What is particularly
noteworthy is that even when the costs for the LF function match those for the HF function (cost
ratio 1:1), ASTRO-BFDF demonstrates faster convergence than ASTRO-DF. This suggests that
utilizing the LF function could be beneficial for optimization, even if optimizing it requires a larger
computational budget compared to optimizing the HF function. Hence, we will next delve deeper
into the specific scenarios where leveraging the LF function proves advantageous for optimization.
Usefulness of the LF function Most papers [16, 17] employ the correlation between the LF
and HF functions or the LF and the surrogate model to determine whether employing the LF
oracle can be helpful for the optimization or not. However, the correlation can be varied based
on the region we try to quantify. For instance, even though the LF function may exhibit a high
correlation with the HF function in specific feasible regions, its usefulness can vary based on the
optimization progress and setup, such as the initial design point. Hence, the correlation might not be a suitable metric to determine whether the LF function is helpful for the optimization. Indeed, instead of requiring high correlation within the entire feasible region, having an accurate gradient at the current iterate is sufficient to find a better solution, utilizing any available information
source. This rationale underpins the use of an adaptive correlation constant in ASTRO-BFDF,
which is updated based on whether the previous gradient estimates from the LF oracle have
improved the solution.
(a) Low correlation (κcor = 0.1) (b) High correlation (κcor = 0.9)
Figure 3: Solvability profiles of 36 problems with 95% confidence intervals from 20 runs of each
algorithm with two different correlation setting between the LF and HF functions.
Figure 4: The illustration depicts the scenario where the LF function is convex. Xh∗ and Xl∗
represent the global optima of the HF and LF functions, respectively, while X0 marks the initial
iterate. Depending on the step size, using only the HF function may lead Xk to converge to a
local optimum (X̂h∗ ). However, if the LF function is used until the iterate reaches the green area,
achieving the global optimum becomes possible.
The usefulness of the LF function in providing accurate gradient estimates can be maximized
when it possesses unique structural properties, such as convexity, which could aid in locating
the global optimum of the non-convex HF function (See Figure 4). In this case, the bi-fidelity
optimization still remains advantageous despite high variance and costs of the LF oracle. However,
it is important to note that the opposite scenario is also possible, where the optimum of the LF
function is located near the local optimum of the HF function, which is undesirable (See Figure
1).
Figure 5: The contour maps of the HF and LF function without stochastic noises of the BRANIN
problem with κcor = 1.
The Branin function is an example for which the bi-fidelity optimization is helpful due to the
structure of the LF function. In Figure 5, even though the LF function is non-convex, it possesses
a favorable structure that allows gradient-based methods to find the global optimum more easily
compared to the HF function. Hence, during the optimization, the solver can find the solution
near the global optimum of the HF function by leveraging only the LF function.
Remark 4. The trajectory of ASTRO-BFDF toward a local optimum closely hinges on two critical
factors: the initial design point and the parameter $\alpha_0$. Take, for instance, a scenario such as the BRANIN problem, where the LF function points to a neighborhood of the global optimum of the HF function. Yet, if we start ASTRO-BFDF from an initial solution like (7,8), it is quite likely that ASTRO-BFDF converges towards nearby points, perhaps around (4,10) or (9,10). Moreover,
when α0 < αth , the optimization at iteration 0 leans on the HF function, potentially leading the
iterates to converge to a distinct local optimum compared to the path followed when α0 > αth . In
our numerical experiments, we deliberately set α0 > αth to maximize the computational efficiency.
HF model’s behavior over a shorter duration, say 110 days. In this specific instance, the cost ratio
between the HF and LF models stands at 1 : 0.3. A notable aspect of this problem is that running
one replication of the HF model inherently produces one replication of the LF model without
incurring any additional computational expenses.
Figure 6: Fraction of 25 toy problems with DES solved to 0.1-optimality with 95% confidence
intervals from 20 runs of each algorithm. ASTRO-BFDF demonstrates not only a faster con-
vergence but also an enhanced ability to identify superior solutions by the end of the allocated
computational budget.
Before delving into the details of each problem, we provide the solvability profile with 25
instances (See Figure 6), specifically including 5 instances from MM1 and 20 instances from
SSCONT. The cost ratio between the HF and LF models for both problems is 1 : 0.3, indicating
that the LF oracle simulates the system for 0.3T days.
(a) Independent sampling and λ = 1    (b) CRN and λ = 1
(c) Independent sampling and λ = 5    (d) CRN and λ = 5
Figure 7: The trajectory of the objective function of the M/M/1 problem with and without
CRN. When employing CRN, the objective function exhibits smoothness, indicating that both
E h (·, ξ) and E l (·, ξ) are smooth functions for any ξ ∈ Ξ.
We have conducted testing on 5 instances of the M/M/1 problem, varying λ across the range
{1, 2, . . . , 5} in Figure 6. Figure 8 illustrates the optimization progress for two scenarios: one
where λ = 1 and another where λ = 5. In the scenario where λ = 1, as the incumbents approach
the optimal solution, it becomes essential for the TR to contract appropriately in order to achieve
an accurate gradient approximation. While contracting the TR, ASTRO-DF exhausts its budget
entirely, which explains its slower convergence in Figure 8a. In contrast, ASTRO-BFDF is capable
of rapidly identifying a near-optimal solution. The primary reason is that the gradient of the local
model for the LF function is inherently small, enabling us to sustain successful iterations before the
TR initiates sequential contraction. Conversely, when λ = 5, the gradient of the local model for the
LF function becomes exceedingly minuscule, prompting a cessation of LF function utilization after
just a few iterations. Hence, in Figure 8b, the optimization trajectory of ASTRO-BFDF appears
similar to that of ASTRO-DF, but ASTRO-BFDF demonstrates slightly faster convergence due
to the variance-reduced function estimates provided by BAS.
(a) λ = 1 (b) λ = 5
Figure 8: Fraction of the optimality gap with 95% confidence intervals from 20 runs of each
algorithm.
In the inventory problem, we consider the (s, S) inventory model with full backlogging. At each time step t, the demand Dt, which follows the exponential distribution with mean µD, is generated. At the end of each time step, the inventory level is calculated and, if it is below s, an order to get back up to S is placed. Lead times follow the Poisson distribution with mean µL time steps. The optimization goal is to find the best s and S for minimizing the average cost, which is composed of backorder costs, order costs, and holding costs. This problem is significantly more challenging than
the MM1 problem due to the inherent non-smoothness with CRN (see Figure 9). Therefore, it is
highly probable that the majority of incumbent sequences converges to local optima, regardless
of the solvers used.
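For concreteness, the following is a toy sketch of such an (s, S) simulation; it is our simplification rather than the SimOpt SSCONT model, and the cost coefficients and parameter values are illustrative assumptions.

```python
# A toy (s, S) inventory DES with full backlogging: exponential demand,
# Poisson lead times, per-period holding/backorder/order costs over T periods.
import numpy as np

def sscont_cost(s, S, T, mu_d, mu_l, rng,
                c_hold=1.0, c_back=4.0, c_order=32.0, c_unit=0.0):
    inv = float(S)               # on-hand inventory (negative means backlog)
    pipeline = []                # (arrival_period, quantity) of open orders
    total = 0.0
    for t in range(T):
        inv += sum(q for (a, q) in pipeline if a == t)      # receive arrivals
        pipeline = [(a, q) for (a, q) in pipeline if a > t]
        inv -= rng.exponential(mu_d)                        # satisfy demand
        position = inv + sum(q for (_, q) in pipeline)
        if position < s:                                    # order up to S
            q = S - position
            pipeline.append((t + 1 + rng.poisson(mu_l), q))
            total += c_order + c_unit * q
        total += c_hold * max(inv, 0.0) + c_back * max(-inv, 0.0)
    return total / T

rng = np.random.default_rng(0)
# HF: T=100 days; LF: T=30 days, i.e., a cost ratio of roughly 1:0.3
hf = sscont_cost(300, 600, 100, mu_d=25, mu_l=3, rng=rng)
lf = sscont_cost(300, 600, 30, mu_d=25, mu_l=3, rng=rng)
```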
We conducted tests on 20 instances of the SSCONT problem, with varied parameters µD =
{25, 50, 100, 200, 400} and µL = {1, 3, 6, 9}. In the majority of cases, ASTRO-BFDF demonstrated
at least similar performance to ASTRO-DF, and sometimes surpassed it by uncovering superior
solutions with a smaller budget. This success can be attributed to its ability to avoid getting
trapped in local optima too quickly by leveraging the LF function. Detailed numerical results can
be found in Appendix B.
6 Conclusion
This paper introduces ASTRO-BFDF, a novel stochastic TR algorithm tailored for addressing bi-
fidelity simulation optimization. ASTRO-BFDF stands out for two key features: Firstly, it utilizes
bi-fidelity Monte Carlo or crude Monte Carlo dynamically, adjusting sample sizes adaptively for
both fidelity oracles within BAS. This ensures accurate estimation of function values, with the
accuracy required for both function and gradient determined by the progress of optimization.
Secondly, it strategically guides incumbents towards the neighborhood of the stationary point
of the HF function by solely utilizing the LF function. These two features allow the algorithm to achieve faster convergence with enhanced computational efficiency, as demonstrated on several problems
Figure 9: The contour maps of the HF and LF function of the SSCONT problem with CRN.
The HF simulator operates for 100 days, while the LF simulator runs for 30 days, indicating a
cost ratio of 1:0.3.
including the synthetic problems and toy problems with DES. We also demonstrate the asymptotic behavior of the incumbents generated by ASTRO-BFDF, which converge to the stationary point almost surely.
Acknowledgments
This work was authored by the National Renewable Energy Laboratory, operated by Alliance
for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No.
DE-AC36-08GO28308. Funding for the algorithmic development and numerical experiment work
was provided by Laboratory Directed Research and Development investments. Funding for the
theoretical work (proofs) was provided by the Office of Science, Office of Advanced Scientific Com-
puting Research, Scientific Discovery through Advanced Computing (SciDAC) program through
the FASTMath Institute. The views expressed in the article do not necessarily represent the views
of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accept-
ing the article for publication, acknowledges that the U.S. Government retains a nonexclusive,
paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work,
or allow others to do so, for U.S. Government purposes.
A Proof of Lemma 5
Proof. Let us begin by noting that we have established from Step 13 in Algorithm 2 and Step 18
in Algorithm 3 that ∆hk ≥ ∆lk almost surely for any k ∈ N. Hence, if ∆hk converges to zero almost
surely, so does ∆lk . Let us define the following index sets,
We have, for any $k \in \mathcal{H}$,
$$\sum_{k\in\mathcal{K}}\kappa_R(\Delta_k^h)^2 \leq \sum_{k\in\mathcal{K}}\big(f^h(X_k) - f^h(X_{k+1}) + \tilde{E}_k^0 - \tilde{E}_k^s\big) \leq f^h(x_0) - f_*^h + \sum_{k=0}^{\infty}|\tilde{E}_k^0 - \tilde{E}_k^s|,$$
where $f_*^h$ is the optimal value of $f^h$. We note that $\mathcal{H}$ and $\mathcal{L}$ are disjoint sets and for any $k \notin \mathcal{K}$, $\Delta_{k+1}^h = \gamma_2\Delta_k^h$. Let $\mathcal{K} = \{k_1, k_2, \dots\}$, $k_0 = -1$, and $\Delta_{-1}^h = \Delta_0^h/\gamma_2$. Then from the fact that $\Delta_k^h \leq \gamma_1\gamma_2^{k-k_i-1}\Delta_{k_i}^h$ for $k = k_i + 1, \dots, k_{i+1}$ and each $i$, we obtain
$$\sum_{k=k_i+1}^{k_{i+1}}(\Delta_k^h)^2 \leq \gamma_1^2(\Delta_{k_i}^h)^2\sum_{k=k_i+1}^{k_{i+1}}\gamma_2^{2(k-k_i-1)} \leq \gamma_1^2(\Delta_{k_i}^h)^2\sum_{k=0}^{\infty}\gamma_2^{2k} = \frac{\gamma_1^2}{1-\gamma_2^2}(\Delta_{k_i}^h)^2.$$
By Lemma 4, there must exist a sufficiently large $K_\Delta$ such that $|\tilde{E}_k^0 - \tilde{E}_k^s| < c_\Delta(\Delta_k^h)^2$ for any given $c_\Delta > 0$ and any $k \geq K_\Delta$. Then, we have
$$\sum_{k=0}^{\infty}(\Delta_k^h)^2 \leq \frac{\gamma_1^2}{1-\gamma_2^2}\sum_{i=0}^{\infty}(\Delta_{k_i}^h)^2 < \frac{\gamma_1^2}{1-\gamma_2^2}\left(\frac{(\Delta_0^h)^2}{\gamma_2^2} + \frac{f^h(x_0) - f_*^h + E'_{0,\infty}}{\kappa_R}\right) < \frac{\gamma_1^2}{1-\gamma_2^2}\left(\frac{(\Delta_0^h)^2}{\gamma_2^2} + \frac{f^h(x_0) - f_*^h + E'_{0,K_\Delta-1} + E'_{K_\Delta,\infty}}{\kappa_R}\right),$$
where $E'_{i,j} = \sum_{k=i}^{j}|\tilde{E}_k^0 - \tilde{E}_k^s|$. Then we get from $E'_{K_\Delta,\infty} < c_\Delta\sum_{k=K_\Delta}^{\infty}(\Delta_k^h)^2$ that
$$\sum_{k=K_\Delta}^{\infty}(\Delta_k^h)^2 < \frac{\gamma_1^2}{1-\gamma_2^2}\left(\frac{(\Delta_0^h)^2}{\gamma_2^2} + \frac{f^h(x_0) - f_*^h + E'_{0,K_\Delta-1}}{\kappa_R}\right)\left(1 - \frac{\gamma_1^2 c_\Delta}{(1-\gamma_2^2)\kappa_R}\right)^{-1}.$$
Therefore, $\Delta_k^h \xrightarrow{\text{w.p.1}} 0$ as $k \to \infty$ and the statement of the lemma holds.
B Numerical Results (SSCONT)
In this section, we show the performance of ASTRO-DF and ASTRO-BFDF on 20 instances of the SSCONT problem, where µ is the mean demand parameter and θ is the mean lead-time parameter. See Figure 10.
(e) µ = 50 and θ = 1    (f) µ = 50 and θ = 3
(k) µ = 100 and θ = 6    (l) µ = 100 and θ = 9
(q) µ = 400 and θ = 1    (r) µ = 400 and θ = 3
Figure 10: Optimization progress with 95% confidence intervals from 10 runs of ASTRO-DF and ASTRO-BFDF on SSCONT.
C Implementation Details
All methods used the same parameters (e.g., TR radius ∆k, success ratio η) where possible. ADAM and Nelder-Mead used the default settings outlined in the SimOpt GitHub repository [14]. In terms of
the design set selection for the model construction, ASTRO-DF has used 2d + 1 design points with
the rotated coordinate basis (See history-informed ASTRO-DF [22]). In the bi-fidelity scenario,
we have employed two distinct design sets (Xk and Xkl ) at Step 4 in Algorithm 2 and Step 6 in
Algorithm 3 respectively. Xk is selected to construct the local model for the HF function, implying
that the computational costs for estimating the function value at Xk is relatively high. Hence,
the design set will be selected by reusing the design points within the TR and the corresponding
replications as much as possible. To achieve this, we first pick d + 1 design points to obtain
sufficiently affinely independent points by employing Algorithm 4.2 in [23]. After that, we pick
additional d design points following the opposite direction to construct the quadratic interpolation
model with a diagonal Hessian. $\mathcal{X}_k^l$ consists of 2d + 1 design points, selected using the coordinate basis to minimize deterministic error, owing to the lower cost of the LF oracle. In this scenario, the design set $\mathcal{X}_k^l$ is optimal among design sets of any size ranging from d + 2 to 2d + 1 (see [24]).
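The coordinate-basis design set just described is simple to generate; a sketch (our helper names) follows.

```python
# A sketch of the 2d+1 coordinate-basis design set used for the LF model:
# the TR center plus +/- delta along each coordinate axis.
import numpy as np

def coordinate_design(center, delta):
    """Return the 2d+1 points: center and center +/- delta * e_i for each i."""
    center = np.asarray(center, dtype=float)
    d = center.size
    pts = [center]
    for i in range(d):
        e = np.zeros(d)
        e[i] = delta
        pts += [center + e, center - e]
    return np.stack(pts)    # shape (2d+1, d), all within B(center; delta)
```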
Hyper-parameters                         ASTRO-BFDF
∆max (maximum TR radius)                 problem dependent
∆l0 and ∆h0 (initial TR radius)          10^⌈log(∆max)−1⌉/d²
γ1 (expansion constant)                  1.5
γ2 (shrinkage constant)                  0.75
λk (sample size lower bound)             5
κ (adaptive sampling constant)           F(X0)/(∆h0)²
η (model fitness threshold)              0.1
α0 (initial correlation constant)        0.5
ϵ̂ (model gradient threshold)             0.001
ζ (sufficient reduction constant)        0.01
References
[1] A. S. Berahas, L. Cao, K. Choromanski, and K. Scheinberg, “A theoretical and empirical comparison of
gradient approximations in derivative-free optimization,” Foundations of Computational Mathematics,
vol. 22, no. 2, pp. 507–560, 2022.
[2] R. Chen, M. Menickelly, and K. Scheinberg, “Stochastic optimization using a trust-region method and
random models,” Mathematical Programming, vol. 169, no. 2, pp. 447–487, 2018.
[3] K.-H. Chang, L. J. Hong, and H. Wan, “Stochastic trust-region response-surface method (strong)—a
new response-surface framework for simulation optimization,” INFORMS Journal on Computing,
vol. 25, no. 2, pp. 230–243, 2013.
[4] Y. Ha and S. Shashaani, “Iteration complexity and finite-time efficiency of adaptive sampling trust-
region methods for stochastic derivative-free optimization,” arXiv:2305.10650, 2023.
[5] S. Ghadimi and G. Lan, “Stochastic first- and zeroth-order methods for nonconvex stochastic pro-
gramming,” SIAM Journal on Optimization, vol. 23, no. 4, pp. 2341–2368, 2013.
[6] L. W. Ng and K. E. Willcox, “Multifidelity approaches for optimization under uncertainty,” Interna-
tional Journal for numerical methods in Engineering, vol. 100, no. 10, pp. 746–772, 2014.
[7] B. Peherstorfer, K. Willcox, and M. Gunzburger, “Survey of multifidelity methods in uncertainty
propagation, inference, and optimization,” Siam Review, vol. 60, no. 3, pp. 550–591, 2018.
[8] J. Xu, S. Zhang, E. Huang, C.-H. Chen, L. H. Lee, and N. Celik, “Efficient multi-fidelity simulation optimization,” in Proceedings of the Winter Simulation Conference 2014, pp. 3940–3951, IEEE, 2014.
[9] S. De, K. Maute, and A. Doostan, “Bi-fidelity stochastic gradient descent for structural optimization
under uncertainty,” Computational Mechanics, vol. 66, pp. 745–771, 2020.
[10] R. Bollapragada, R. Byrd, and J. Nocedal, “Adaptive sampling strategies for stochastic optimization,”
SIAM Journal on Optimization, vol. 28, no. 4, pp. 3312–3343, 2018.
[11] S. Shashaani, F. S. Hashemi, and R. Pasupathy, “ASTRO-DF: A class of adaptive sampling trust-region algorithms for derivative-free stochastic optimization,” SIAM Journal on Optimization, vol. 28, no. 4, pp. 3145–3176, 2018.
[12] R. Bollapragada, C. Karamanli, and S. M. Wild, “Derivative-free optimization via adaptive sampling
strategies,” arXiv preprint arXiv:2404.11893, 2024.
[13] B. Peherstorfer, K. Willcox, and M. Gunzburger, “Optimal model management for multifidelity monte
carlo estimation,” SIAM Journal on Scientific Computing, vol. 38, no. 5, pp. A3163–A3194, 2016.
[14] D. J. Eckman, S. G. Henderson, S. Shashaani, and R. Pasupathy, “SimOpt.” https://github.com/simopt-admin/simopt, 2023.
[15] Y. Ha, S. Shashaani, and R. Pasupathy, “Complexity of zeroth-and first-order stochastic trust-region
algorithms,” arXiv preprint arXiv:2405.20116, 2024.
[16] J. Müller, “An algorithmic framework for the optimization of computationally expensive bi-fidelity
black-box problems,” INFOR: Information Systems and Operational Research, vol. 58, no. 2, pp. 264–
289, 2020.
[17] X. Song, L. Lv, W. Sun, and J. Zhang, “A radial basis function-based multi-fidelity surrogate model:
exploring correlation between high-fidelity and low-fidelity models,” Structural and Multidisciplinary
Optimization, vol. 60, pp. 965–981, 2019.
[18] D. J. Eckman, S. G. Henderson, and S. Shashaani, “Diagnostic tools for evaluating and comparing
simulation-optimization algorithms,” INFORMS Journal on Computing, vol. 35, no. 2, pp. 350–367,
2023.
[19] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2017.
[20] R. R. Barton and J. S. Ivey Jr, “Nelder-mead simplex modifications for simulation optimization,”
Management Science, vol. 42, no. 7, pp. 954–973, 1996.
[21] L. Mainini, A. Serani, M. P. Rumpfkeil, E. Minisci, D. Quagliarella, H. Pehlivan, S. Yildiz, S. Ficini,
R. Pellegrini, F. Di Fiore, et al., “Analytical benchmark problems for multifidelity optimization meth-
ods,” arXiv preprint arXiv:2204.07867, 2022.
[22] Y. Ha and S. Shashaani, “Towards greener stochastic derivative-free optimization with trust regions
and adaptive sampling,” in 2023 Winter Simulation Conference (WSC), pp. 3508–3519, IEEE, 2023.
[23] S. M. Wild, R. G. Regis, and C. A. Shoemaker, “Orbit: Optimization by radial basis function in-
terpolation in trust-regions,” SIAM Journal on Scientific Computing, vol. 30, no. 6, pp. 3197–3219,
2008.
[24] T. M. Ragonneau and Z. Zhang, “An optimal interpolation set for model-based derivative-free opti-
mization methods,” arXiv preprint arXiv:2302.09992, 2023.