
AutoTandemML: Active Learning Enhanced Tandem Neural Networks for Inverse Design Problems

Luka Grbcic1*, Juliane Müller2 and Wibe Albert de Jong1*

1* Applied Mathematics and Computational Research, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, Berkeley, 94720, California, USA.
2 Computational Science Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, 80401, Colorado, USA.

arXiv:2502.15643v1 [cs.LG] 21 Feb 2025

*Corresponding author(s). E-mail(s): [email protected]; [email protected];
Contributing authors: [email protected];

Abstract
Inverse design in science and engineering involves determining optimal design
parameters that achieve desired performance outcomes, a process often hindered
by the complexity and high dimensionality of design spaces, leading to significant
computational costs. To tackle this challenge, we propose a novel hybrid approach
that combines active learning with Tandem Neural Networks to enhance the effi-
ciency and effectiveness of solving inverse design problems. Active learning enables
selective sampling of the most informative data points, reducing the required
dataset size without compromising accuracy. We investigate this approach using
three benchmark problems: airfoil inverse design, photonic surface inverse design,
and scalar boundary condition reconstruction in diffusion partial differential
equations. We demonstrate that integrating active learning with Tandem Neural
Networks outperforms standard approaches across the benchmark suite, achieving
better accuracy with fewer training samples.

Keywords: inverse design, tandem neural networks, active learning, machine learning

1 Introduction
Inverse design problems are inherently complex because they require determining
the necessary inputs or configurations to achieve a specific desired outcome, often
within systems governed by non-linear, multi-dimensional relationships. These chal-
lenges are prevalent across various engineering disciplines—for instance, designing an
airfoil shape to meet certain aerodynamic criteria [24], or designing photonic materi-
als based on target optical properties [14, 30]. Such problems are frequently ill-posed,
lacking unique solutions and being sensitive to data imperfections, which worsens their
complexity. Consequently, there is a critical need for efficient computational tools and
advanced methods—such as optimization algorithms and machine learning—to solve
inverse design problems effectively, enabling engineers to innovate and refine designs
by working backwards from desired performance specifications.
Deep Neural Networks (DNN) have been very successful as inverse design models in
science and engineering [23, 24, 30]. Generative Adversarial Networks (GANs) [48, 49],
Variational Autoencoders (VAEs) [21, 41], and Invertible Neural Networks (INNs)
[9, 10] are some of the most prominent DNN-based inverse design approaches. However,
both GANs and VAEs have drawbacks related to training difficulty, instability, and
limitations in capturing the full diversity of possible designs [20], while INNs introduce
complex neural architecture components in order to be effective, and thus are harder
to train.
Tandem Neural Networks (TNNs) offer advantages over VAEs, GANs, and INNs
in inverse design tasks by enabling direct optimization of input parameters to achieve
desired outputs, leading to more stable and straightforward training. They avoid issues
like training instability, mode collapse, and difficulty capturing multi-modal outputs
that commonly affect VAEs and GANs, resulting in more accurate, and interpretable
designs that better meet specific target properties. Furthermore, TNNs do not require
the complex neural architectures of INNs, making them more effective and efficient
for exploring complex design spaces in inverse design applications.
TNNs have emerged as a powerful tool for solving inverse design problems across
various fields, enabling the efficient optimization of complex systems by mapping
desired outputs back to input parameters. In the realm of photonics and nanopho-
tonics, they have been extensively employed to design metasurfaces and nanophotonic
devices with tailored optical properties. Studies have demonstrated the use of TNNs
for optimizing metasurfaces for specific functionalities such as polarization conversion,
absorption, and color filtering [7, 15, 25, 32, 42–44, 46, 47].
In addition to metasurfaces, TNNs have been applied to the inverse design of other
nanophotonic structures, including optical nanoantennas and multilayer thin films
[16, 26, 36, 40, 45, 50]. These studies showcase the versatility of TNNs in handling
both continuous and discrete design parameters, facilitating the creation of devices
with customized optical responses and advancing the capabilities in nanofabrication
and materials science.
Beyond photonics, TNNs have found applications in diverse areas such as porous
media transport, radar engineering, wind turbine airfoil design, electronic integrated
systems, and optical fiber design. In porous media transport, TNNs have been uti-
lized to model multicomponent reactive transport processes, enhancing the accuracy

and efficiency of simulations [5]. In radar engineering, they have facilitated the
design of composite radar-absorbing structures by enabling inverse design capabilities
[31]. Similarly, in wind turbine airfoil design, TNNs have been employed to achieve
desired aerodynamic properties by inversely mapping performance metrics to geomet-
ric parameters [2]. In electronic integrated systems, TNNs have assisted in channel
inverse design [27]. In the field of optical fibers, they have been applied to the inverse
design of hollow-core anti-resonant fibers [28].
Despite the demonstrated efficacy of TNNs in specific applications, there remains a
lack of comprehensive research evaluating their performance across a broad spectrum
of inverse design problems. Most existing studies focus on tailored solutions for indi-
vidual challenges, which limits the understanding of TNNs as a general-purpose tool.
In particular, the impact of training data characteristics on the efficacy of TNNs has
not been thoroughly investigated. Addressing this gap, the novelty and contributions
of our work are as follows:
1. We develop a hybrid framework, AutoTandemML, that combines sampling data
generation by active learning with TNNs specifically for inverse design problems.
2. We investigate whether datasets optimized for accurately predicting forward
relationships also perform well in capturing inverse relationships with the TNNs.
3. We compare AutoTandemML with TNNs trained on data generated by other
sampling algorithms—namely Random, Latin Hypercube, Best Candidate, and
GreedyFP—across three inverse design benchmark problems [19].
4. We introduce an inverse design benchmark suite comprising three significant prob-
lems in science and engineering: Airfoil Inverse Design (AID), Photonic Surfaces
Inverse Design (PSID), and Scalar Boundary Reconstruction (SBR) of a scalar dif-
fusion partial differential equation (PDE). We make these benchmarks available to
the research community through a public data repository.
5. We develop the AutoTandemML Python module available for everyone—an easy-
to-use software tool for the automated generation of TNNs.
The remainder of this article is organized as follows. Sec. 2 introduces the
AutoTandemML framework, including descriptions of inverse design models, and each
component that forms the framework—specifically, active learning for multi-output
regression and TNNs. Sec. 3 describes the inverse design benchmarks and the setup
of all numerical experiments. We outline the sampling methods used for compari-
son, as well as the accuracy metrics and inverse design validation methods employed
in our study. In Sec. 4, we present the performance results of AutoTandemML and
compare it with other sampling methods used to train the TNNs across all inverse
design benchmark problems. Finally, App. A, B, and C provide additional details on
the hyperparameters used for all machine learning models, specifics of the inverse
design benchmarks, and supplementary results that enhance the interpretation of
AutoTandemML’s performance, respectively. This organizational structure aims to
guide the reader through the development, implementation, and evaluation of the
AutoTandemML framework in a logical and coherent manner.

2 AutoTandemML Framework
In this section, we mathematically define inverse design models, the active learning
algorithm for multi-output regression and describe the components of the TNN. Fur-
thermore, we explain how active learning and TNNs can be integrated into a single
framework for efficiently solving inverse design problems.

2.1 Inverse Design Models


The main objective of inverse design models is to infer design parameters that yield
a known target value or desired property. Mathematically, the inverse design problem
is defined as:
x = f^{-1}(y) \qquad (1)

In Equation (1), x ∈ R^d is the design vector we aim to determine, and y ∈ R^N is
the target vector specified a priori. The function f : R^d → R^N represents the forward
model that maps design parameters to their resulting properties. In inverse design we
seek to invert this function to find the design parameters that produce the desired
target outcomes.
Inverse design problems are typically ill-posed, meaning that multiple values of
x can yield similar values of y. This lack of a unique solution necessitates the use
of computational methods and machine learning models to approximate the inverse
mapping or to identify suitable design parameters through optimization techniques.

2.2 Active Learning for Multi-output Regression


Active learning is a machine learning approach that operates as a form of opti-
mal experimental design, where computational methods are employed to identify and
obtain data instances that will most significantly enhance the model’s accuracy or per-
formance. By strategically selecting the most informative data points—typically those
about which the model is least certain—active learning seeks to maximize learning
efficiency ([37]).
More specifically, the active learning algorithm aims to select input design vectors
where the machine learning model M (also denoted as the forward model) exhibits the
highest predictive uncertainty. This is formally expressed as an optimization problem:
x_i^* \in \arg\max_{x} u(x) = \arg\max_{x} \sum_{j=1}^{d} \sigma_j(x; M) \qquad (2)
Here, u(x) represents the total uncertainty at input design vector x ∈ Rd ,
computed as the sum of standard deviations across all d output dimensions for a multi-
output regression problem. x∗i represents the ith optimal input design vector based on
the maximum value of u(x). The standard deviations are encapsulated in the vector:

\sigma(x_i^*; M) = \begin{bmatrix} \sigma_1(x_i^*; M) \\ \sigma_2(x_i^*; M) \\ \vdots \\ \sigma_d(x_i^*; M) \end{bmatrix} \in R^d \qquad (3)

The algorithm proceeds by iteratively selecting the most uncertain inputs, query-
ing the High fidelity (HF) surrogate H at these points, and updating the model
accordingly. The detailed steps are outlined in Alg. 1.
More specifically, the algorithm begins with an initial dataset D of size n0 , where
each design vector xi is evaluated using the HF surrogate H to obtain f (xi ). The
model M is initially trained on this dataset. In each iteration, while the total number
of evaluations is less than the pre-defined evaluation budget Nmax , the algorithm
performs an optimization step to find k input points x∗i that maximize the total
predictive uncertainty u(x) of M, as defined in Eq. 2, each subsequent optimization is
started with a new random seed. The k points form the batch B. Each design vector
in B is evaluated using the HF surrogate H to obtain new outputs, and the dataset D
is updated accordingly. The model M is retrained on this expanded dataset before the
next iteration begins, thereby incrementally improving its performance by focusing on
areas where it is most uncertain.

Algorithm 1 Active Learning for Multi-output Regression

Require: • HF surrogate H(x)
• Initial dataset size n0
• Maximum number of evaluations Nmax
• Batch size k
1: Initialize dataset D = {(x_i, f(x_i))}_{i=1}^{n0}, where f(x_i) = H(x_i)
2: Train initial model M on dataset D
3: while number of evaluations < Nmax do
4:   Optimization Step:
5:   for i = 1 to k do
6:     Find point x_i^* by solving, with a new random seed:
         x_i^* = \arg\max_{x} u(x) = \arg\max_{x} \sum_{j=1}^{d} \sigma_j(x; M)
7:   end for
8:   B = {x_1^*, x_2^*, ..., x_k^*}
9:   for i = 1 to k do
10:    Evaluate HF surrogate: f(x_i^*) = H(x_i^*)
11:    Update dataset: D ← D ∪ {(x_i^*, f(x_i^*))}
12:  end for
13:  Retrain model M on updated dataset D
14: end while
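
To make the loop concrete, the following is a minimal Python sketch of Algorithm 1. It assumes a callable HF surrogate hf_surrogate and box bounds lb/ub, uses the Random Forest uncertainty estimate described in App. A.2, and substitutes a simple random-candidate search for the PSO optimizer used in the paper (Sec. 3.2); it is a sketch under these assumptions, not the reference implementation.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def active_learning(hf_surrogate, lb, ub, n0=20, n_max=150, k=5, seed=0):
    """Sketch of Alg. 1: batch active learning for multi-output regression."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    X = rng.uniform(lb, ub, size=(n0, len(lb)))        # initial design points (LHS in the paper)
    Y = np.array([hf_surrogate(x) for x in X])         # HF evaluations f(x)
    model = RandomForestRegressor(n_estimators=150, random_state=seed).fit(X, Y)

    def u(x):
        # total uncertainty u(x): sum over outputs of the std. dev. across the trees
        preds = np.stack([t.predict(x.reshape(1, -1))[0] for t in model.estimators_])
        return preds.std(axis=0).sum()

    while len(X) < n_max:
        batch = []
        for _ in range(k):                              # one restart per batch point
            cands = rng.uniform(lb, ub, size=(100, len(lb)))
            batch.append(cands[np.argmax([u(c) for c in cands])])
        Y_new = np.array([hf_surrogate(x) for x in batch])
        X, Y = np.vstack([X, batch]), np.vstack([Y, Y_new])
        model.fit(X, Y)                                 # retrain on the expanded dataset
    return X, Y, model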

The input design vector (x) and the HF surrogate H both vary depending on the
specific benchmark problem under consideration. The specifics of the design vectors
and responses for each benchmark are provided in Sec. 3.1, while detailed descrip-
tions of the HF surrogates H are given in App. B. Details on the computational
implementation of the multi-output regression active learning algorithm—including

the hyperparameters used, the selection of the model M, and the optimizer employed
to determine the optimal design vectors (x∗i )—can be found in Sec. 3.2.

2.3 Active Learning Enhanced Tandem Neural Networks


We aim to utilize the dataset D = {(x, f (x))}, generated through active learning,
to train each component of the TNN. The complete AutoTandemML framework is
illustrated in Fig. 1. Specifically, Fig. 1a depicts the active learning algorithm. As
elaborated in the previous section, the goal of active learning is to employ the model
M to generate design vectors x that maximize uncertainty, and then evaluate them
using the HF surrogate H to form and update the dataset D = {(x, f (x))}.
Furthermore, Figs. 1b and 1c illustrate the two segments of the TNN—the forward
DNN (denoted as FDNN ) and the inverse DNN (denoted as IDNN ), respectively. The
forward DNN approximates the mapping from input parameters to outputs, while
the inverse DNN predicts the input parameters corresponding to desired outputs. By
training these networks using the active learning generated dataset, we enhance the
predictive accuracy and reliability of the inverse design process.
More specifically, in inverse design, the objective is to determine the input param-
eters x that produce a desired output through a complex system. We start by training
the FDNN , to approximate the mapping from inputs to outputs, x → f (x). We then
train the IDNN , which aims to predict the input x = IDNN (f (x)) corresponding to
a given output f (x). During the training of IDNN , we utilize a loss function L that
compares the original output f (x) with the output of the forward model when fed
with the inverse prediction:

L = L(f(x), FDNN(x)) = L(f(x), FDNN(IDNN(f(x)))) \qquad (4)

Minimizing this loss adjusts the weights of IDNN so that the predicted input generates
an output through FDNN that closely matches the desired output f (x).
This tandem training approach ensures that the inverse neural network IDNN not
only seeks inputs that reproduce the desired outputs but also aligns with the learned
mapping of the forward neural network FDNN . By incorporating FDNN into the loss
function, we effectively regularize IDNN , promoting reliable inverse solutions. This
method is particularly advantageous in scenarios where the inverse problem is ill-posed
or where multiple inputs can lead to the same output, as it leverages the forward
model’s representation to guide the inverse predictions toward feasible and meaningful
solutions.
The training details and the hyperparameters of both FDNN and the IDNN are
presented in App. A.1.
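
As an illustration, a minimal PyTorch sketch of this tandem training step is given below. It assumes f_dnn and i_dnn are the FDNN and IDNN modules (architectures in App. A.1), that the forward network is already trained and kept frozen, and that the RMSE loss of App. A.1 is used.

import torch

def tandem_loss(i_dnn, f_dnn, y_target):
    """Eq. (4): compare the target f(x) with FDNN applied to the IDNN prediction."""
    x_pred = i_dnn(y_target)                     # inverse prediction IDNN(f(x))
    y_recon = f_dnn(x_pred)                      # push through the frozen forward model
    return torch.sqrt(torch.mean((y_recon - y_target) ** 2))  # RMSE loss

# Usage sketch: freeze FDNN, then optimize only the IDNN weights.
# for p in f_dnn.parameters():
#     p.requires_grad_(False)
# optimizer = torch.optim.Adam(i_dnn.parameters(), lr=1e-3)
# loss = tandem_loss(i_dnn, f_dnn, y_batch)
# optimizer.zero_grad(); loss.backward(); optimizer.step()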


Fig. 1: AutoTandemML framework segments: (a) Active learning to generate a dataset (x, f(x)). (b) Forward deep neural network FDNN training with the active learning
generated dataset (x, f (x)). (c) Inverse deep neural network IDNN training with the
active learning generated dataset (f (x), x), and a modified loss function that utilizes
the FDNN predictions.

3 Benchmarks and Numerical Experiments


In this section, we define the inverse design benchmark problems used to analyze the
performance of the active learning sampling and evaluate the accuracy of the IDNN .
We provide detailed descriptions of the numerical experiments conducted, including
the sampling algorithms employed for comparison with the active learning approach.
Finally, we outline the accuracy metrics used to assess the performance of each method
and describe the inverse design validation procedure.

3.1 Inverse Design Benchmarks
In order to assess the performance of the active learning enhanced TNN, we develop a
benchmark suite of three inverse design problems. The outline of the three problems
is presented in Fig. 2.
The first problem is the Airfoil Inverse Design (AID), where the goal is to deter-
mine both the flow and the design parameters of an airfoil based on its pressure
coefficient distribution. In Fig. 2a the specifics of the AID problem are visualized. The
dimensionless pressure coefficient curves are denoted as Cp , while the flow and the
geometrical parameters include the Reynolds number (Re), the Angle of Attack (α),
the maximum camber distance (m), position of the maximum camber (p), and the
maximum thickness of the airfoil (t).
The second problem is the Photonic Surface Inverse Design (PSID), which aims to
obtain laser manufacturing parameters corresponding to a desired spectral emissivity
curve of the photonic surface (a metamaterial), visualized in Fig. 2b. The photonic
surfaces are created by texturing the surface of the plain material (in our case the
alloy Inconel) with ultra-fast lasers. The spectral emissivity curves are defined as ϵ,
and the laser manufacturing parameters are laser power (Lp ) (W), scanning speed (Ss )
(mm/s), and the spacing of the textures on the surface of the metamaterial (Sp ) (µm).
The third problem is the Scalar Boundary Reconstruction (SBR), where the objec-
tive is to recover the boundary conditions (cBC ) of a scalar diffusion PDE using
scattered measurements of the scalar field (c) within a two-dimensional domain, as
shown in Fig. 2c. The scalar diffusion PDE models phenomena such as heat transfer
or contaminant diffusion in a medium. The boundary conditions cBC represent the
values of the scalar field (e.g., temperature or concentration) along the top boundary
of the domain, which are unknown and need to be determined. The scattered mea-
surements c are obtained at 30 interior points within the domain and provide partial
information about the scalar field’s distribution. These measurements are strategically
placed to capture the behavior of the field within the domain. Both cBC and c are
dimensionless quantities, normalized to facilitate computational analysis.
A mathematical summary of the benchmarks—including the size of the design
spaces and formal definitions of the design vectors and objectives—is provided in Table
1. Comprehensive details of all benchmarks, such as the mathematical and numerical
background, simulation software used, experimental setup and datasets, as well as the
training and validation details of the HF surrogates H, are given in Appendix B.

Fig. 2: Inverse design benchmark problems: (a) Airfoil inverse design (AID). (b)
Photonic surfaces inverse design (PSID). (c) Scalar boundary reconstruction (SBR).

Table 1: Summary of Benchmark Problems

Aspect             Benchmark 1                    Benchmark 2                  Benchmark 3
Problem Name       AID                            PSID                         SBR
Design Vector x    x = (Re, α, m, p, t) ∈ R^5     x = (Lp, Ss, Sp) ∈ R^3       x = (cBC1, cBC2, ..., cBC20) ∈ R^20
Output y           Cp ∈ R^75                      ϵ ∈ R^822                    c ∈ R^30
Forward Mapping    f : R^5 → R^75                 f : R^3 → R^822              f : R^20 → R^30
Inverse Mapping    f^-1 : R^75 → R^5              f^-1 : R^822 → R^3           f^-1 : R^30 → R^20
Objective          Recover x from Cp              Recover x from ϵ             Recover x from c

3.2 Numerical Experiments Setup
In this section, we outline the setup of the numerical experiments conducted. Specifi-
cally, we detail the hyperparameters of the active learning process, and the sampling
algorithms employed for comparison with the active learning approach.
In the active learning procedure, we employed two different algorithms to efficiently
train the forward model M: the Random Forest (RF) algorithm and an ensemble of
DNNs, referred to as Deep Ensembles (DE). Both algorithms were chosen because
they provide uncertainty quantification capabilities essential for active learning. Imple-
mentation details and hyperparameters for both algorithms are provided in App. A.
We initially used only the RF algorithm to explore its performance across different
benchmark problems. However, we found that it failed to accurately model the for-
ward relationship in the SBR benchmark problem. Consequently, we selected the DE
algorithm for the SBR benchmark, as outlined in Tab. 2.
Furthermore, the maximum number of samples generated by the active learning
procedure varied per benchmark as outlined in Tab. 2. The number of maximum
samples corresponds to the number of times the HF surrogate H is queried for a
response in order to generate a dataset and train the TNN components FDNN and
IDNN . For all benchmarks, the active learning process was initialized with n0 = 20
samples generated using the Latin Hypercube Sampling (LHS) algorithm, and a batch
size k = 5 was used in each iteration of the process. To determine the optimal design
vectors (x∗i ) (defined in Alg. 1) we utilize the Particle Swarm Optimization (PSO)
algorithm (implemented in the Indago 0.5.1 Python module for optimization), where
the maximum number of evaluations is set to 100, while all other hyperparameters of
the PSO are set as the default recommended variables of the module ([17]).

Table 2: Active learning numerical experiments setup for each benchmark.
Benchmark Algorithm Nmax
AID RF 150
PSID RF 300
SBR DE 400

We employed four different sampling algorithms to generate efficient datasets for


training each component of the TNN and to compare their performance with the active
learning approach. The algorithms used are Random sampling (R), LHS, GreedyFP
(GFP) [8, 11], and Best Candidate (BC) [29] sampling.
The BC and GFP sampling algorithms were selected as they were determined to
perform efficiently for an array of tasks such as surrogate modeling, hyperparameter
optimization, and data analysis [19]. Both the GFP algorithm and the BC algorithm
generate sequences of samples by iteratively proposing candidate samples and selecting
the one that is farthest from the set of samples selected so far; however, the key
difference is that GFP uses a constant number of candidate samples at each iteration,
while the BC algorithm generates an increasing number of random candidate samples

with each iteration. Both sampling algorithms were implemented as outlined in the
work by [19].
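
As a rough illustration of the farthest-point idea, the sketch below implements Best Candidate sampling under the assumption of simple box bounds lb/ub; the growth rate of the candidate pool is an assumption here, and GreedyFP would be analogous with a fixed candidate count per iteration.

import numpy as np

def best_candidate_sampling(n_samples, lb, ub, seed=0):
    """Iteratively keep the random candidate farthest from the current sample set."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    samples = [rng.uniform(lb, ub)]
    for i in range(1, n_samples):
        cands = rng.uniform(lb, ub, size=(10 * (i + 1), len(lb)))  # growing candidate pool
        # distance from each candidate to its nearest already-selected sample
        d_min = np.min(np.linalg.norm(cands[:, None, :] - np.asarray(samples)[None, :, :], axis=-1), axis=1)
        samples.append(cands[np.argmax(d_min)])
    return np.asarray(samples)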
For each benchmark problem, the number of samples generated by these samplers
is equal to the maximum number of samples specified in Tab. 2 to ensure a fair
comparison with the active learning approach. The generated samples were evaluated
using the HF surrogate H to form the training dataset. To account for randomness,
the active learning procedure, all of the sampling algorithms used to generate the
datasets, and the training of the TNN components were each repeated 30 times. The
results from these repetitions were then statistically analyzed.

3.3 Accuracy Metrics


For the forward model (M) assessment, and the IDNN , we use the root mean square
error (RMSE) (Eq. 5), and the coefficient of determination (R2 ) (Eq. 6) as they
are defined in [3], for multioutput regression problems since all of our inverse design
benchmarks fit into this category. Moreover, we also utilize the normalized maximum
absolute prediction error (NMAE) [39] (Eq. 7) extended for multi-output regression.
Firstly, the RMSE is defined as:
\mathrm{RMSE} = \sqrt{\frac{1}{np} \sum_{i=1}^{n} \sum_{j=1}^{p} (y_{ij} - \hat{y}_{ij})^2} \qquad (5)
where n is the number of samples in the output test data (Ty ), p is the number of
output variables (i.e. dimension of the output vectors of each inverse design benchmark
problem), yij is the true value of the j-th output for the i-th sample, ŷij is the model
(either M or IDNN ) predicted value of the j-th output for the i-th sample. Secondly,
we define the R2 as:
R^2 = 1 - \frac{\frac{1}{p} \sum_{j=1}^{p} \sum_{i=1}^{n} (y_{ij} - \hat{y}_{ij})^2}{\frac{1}{p} \sum_{j=1}^{p} \sum_{i=1}^{n} (y_{ij} - \bar{y}_j)^2} \qquad (6)

where \bar{y}_j is the mean of the true values for output j, defined as \bar{y}_j = \frac{1}{n} \sum_{i=1}^{n} y_{ij}.
Finally, we define the NMAE as:

\mathrm{NMAE} = \frac{1}{p} \sum_{j=1}^{p} \frac{\max_{1 \le i \le n} |y_{ij} - \hat{y}_{ij}|}{\max_{1 \le i \le n} |y_{ij} - \bar{y}_j|} \qquad (7)
By employing these metrics, we can comprehensively evaluate the performance
of our models across all output variables, ensuring that both average performance
(through RMSE and R2 ) and worst-case scenarios (through NMAE) are adequately
assessed.
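
As a minimal sketch, Eqs. (5)-(7) can be implemented for multi-output arrays as follows; y_true and y_pred of shape (n_samples, n_outputs) are assumed.

import numpy as np

def rmse(y_true, y_pred):
    # Eq. (5): root mean square error over all samples and outputs
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    # Eq. (6): the 1/p factors cancel in the ratio
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res.sum() / ss_tot.sum()

def nmae(y_true, y_pred):
    # Eq. (7): per-output normalized maximum absolute error, averaged over outputs
    max_err = np.max(np.abs(y_true - y_pred), axis=0)
    max_dev = np.max(np.abs(y_true - y_true.mean(axis=0)), axis=0)
    return np.mean(max_err / max_dev)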

3.4 Inverse Design Validation


To assess the accuracy metrics (RMSE, R2 , and NMAE) of the IDNN on the test
data, we utilize the validation framework presented in Fig. 3. Specifically, since we

are interested in the inverse relationship, we use the output values Ty from the test
dataset as inputs to the IDNN . The IDNN then produces predictions of the original
inputs, which we denote as PIDN N .
Next, we feed these predicted inputs PIDN N into the HF surrogate H to generate
reconstructed outputs, denoted as Py . Essentially, Py are the outputs that the model
H would produce given the predicted inputs from the IDNN . We then compare these
reconstructed outputs Py with the actual output values Ty from the test dataset
using the accuracy metrics. This comparison allows us to evaluate how well the IDNN ,
in conjunction with the HF surrogate, can reproduce the original outputs, thereby
assessing the IDNN model’s performance.
In all inverse design benchmarks, we utilized a test dataset (Tx , Ty ) consisting of
1,000 randomly selected instances (also unseen by the HF surrogate H). The IDNN
was trained using the maximum number of samples specified in Tab. 2, which was
significantly fewer than 1,000 in each case. Consequently, the size of the training set
was much smaller than that of the test set, resulting in a train/test size ratio heavily
skewed towards the test set. This imbalance highlights the challenge for IDNN to
generalize effectively from a limited amount of training data to a larger, diverse test
set. Further details of the test dataset for each inverse design benchmark problem are
available in App. B.
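
A compact sketch of this validation loop is given below, reusing the metric helpers (rmse, r2, nmae) from the sketch in Sec. 3.3; i_dnn_predict and hf_surrogate are assumed callables wrapping the trained IDNN and the HF surrogate H.

import numpy as np

def validate_inverse_model(i_dnn_predict, hf_surrogate, T_y):
    P_idnn = i_dnn_predict(T_y)                          # step (2): predicted design vectors
    P_y = np.array([hf_surrogate(x) for x in P_idnn])    # step (3): reconstructed outputs
    return rmse(T_y, P_y), r2(T_y, P_y), nmae(T_y, P_y)  # step (4): compare T_y with P_y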


Fig. 3: Inverse design validation procedure aiming to assess the accuracy of the trained
IDNN using the test data from each inverse design benchmark problem. In step (1), we
define the test dataset comprising input-output pairs (Tx , Ty ). In step (2), we utilize
the output values Ty as inputs to the inverse model IDNN to obtain the predicted
inputs PIDNN . In step (3), we reconstruct the outputs Py by feeding the predicted
inputs PIDNN into the HF surrogate H. Finally, in step (4), we compare the original
output values Ty from the test dataset with the reconstructed outputs Py to evaluate
the inverse model’s accuracy using the specified metrics.

4 AutoTandemML Framework Results
In this section, we present the results of the AutoTandemML framework applied to
all three inverse design benchmark problems. To assess the performance of the active
learning approach, we also compare these results with those obtained when the dataset
used to train the TNN is generated using other sampling methods (R, LHS, GFP, and
BC).

4.1 Airfoil Inverse Design Results


Fig. 4 presents the results of the AID benchmark. Specifically, the R2 metric results
are shown in Fig. 4a, the RMSE results are displayed in Fig. 4b, and the NMAE results
are illustrated in Fig. 4c. All three metrics are represented as box plots as they provide
a concise visual summary of each metric’s distribution by displaying their medians,
quartiles, and outliers. The results are also summarized statistically in Tab. 3.
Fig. 4a indicates that all approaches perform similarly in terms of the R2 score.
However, Tab. 3 reveals that the active learning approach (IDNNAL ) outperforms other
inverse DNNs trained with different samplers when considering the mean R2 score
(R2 =0.93). Furthermore, the active learning approach exhibits the best performance
regarding outliers of the obtained R2 scores, as the minimum R2 from the 30 runs is
R2 = 0.87; the close second-best approach in terms of the R2 score is IDNNGFP. Some
models obtained a negative R2 in their worst-performing run, indicating that their
predictions did not explain any of the variability in the data and are inferior to just
using the mean of the target variable as a predictor.
In Fig. 4b, we observe that when considering RMSE, the IDNNAL also performs
the best among the other samplers. All other approaches exhibit outliers that increase
the overall RMSE. In Tab. 3, while the mean RMSE of the IDNNAL is the best,
the significant difference between the active learning approach and other sampling
approaches becomes apparent when examining the maximum RMSE values. Specifi-
cally, the maximum RMSE for IDNNAL is 0.1529, whereas, as a reference, the worst
maximum RMSE is 0.4993 for IDNNBC .
Finally, in Fig. 4c, the NMAE metric confirms that IDNNAL also outperforms the
other approaches, achieving the lowest maximum NMAE (0.0459). This significant
difference in maximum NMAE is further evident when comparing IDNNAL to other
approaches, as observed in Tab. 3, i.e. the maximum NMAE for IDNNAL is 0.0623,
which is close to the mean NMAE of the IDNNBC (0.0604). Moreover, not only does
IDNNAL perform best when considering the mean of all metrics, but it also exhibits
the least uncertainty across the 30 runs, achieving the lowest standard deviation for all
metrics (third column of Tab. 3). The details of the forward model (M) performance
on the AID benchmark can be found in App. C.1.


Fig. 4: Inverse DNN (IDNN ) performance on AID benchmark problem using different
dataset generation methods: (a) R2 (higher is better), (b) RMSE (lower is better), and
(c) NMAE (lower is better). Subscripts in IDNN denote the sampling method (e.g.,
IDNNR for random sampling).

Table 3: Statistical analysis of the IDNN performance on the AID benchmark problem using different dataset generation methods. Bold values indicate best performance in each metric. RMSE and NMAE values should be as low as possible, while R2 should be as high as possible.
Metric Method Mean Std Max Min
RMSE IDNNR 0.1261 0.0539 0.3770 0.0717
RMSE IDNNLHS 0.1341 0.0605 0.4194 0.0770
RMSE IDNNAL 0.1086 0.0239 0.1529 0.0693
RMSE IDNNBC 0.1313 0.0752 0.4993 0.0724
RMSE IDNNGFP 0.1244 0.0341 0.2247 0.0724
R2 IDNNR 0.9174 0.0915 0.9725 0.4592
R2 IDNNLHS 0.8878 0.2439 0.9702 -0.4185
R2 IDNNAL 0.9319 0.0254 0.9742 0.8719
R2 IDNNBC 0.8526 0.4169 0.9756 -1.3843
R2 IDNNGFP 0.9274 0.0386 0.9683 0.7920
NMAE IDNNR 0.0569 0.0150 0.1254 0.0380
NMAE IDNNLHS 0.0589 0.0142 0.1261 0.0426
NMAE IDNNAL 0.0459 0.0067 0.0623 0.0343
NMAE IDNNBC 0.0604 0.0180 0.1436 0.0411
NMAE IDNNGFP 0.0574 0.0102 0.0869 0.0439

4.2 Photonic Surface Inverse Design Results


Fig. 5 presents the results of the PSID benchmark. The R2 metric is shown in Fig.
5a, the RMSE results are displayed in Fig. 5b, and the NMAE results are illustrated
in Fig. 5c. The results are also statistically summarized in Tab. 4.
Fig. 5a indicates that IDNNAL outperforms all other sampling methods in terms
of the R2 score, and that it exhibits a very narrow interquartile range when com-
pared to other samplers. Tab. 4 reveals that the difference between IDNNAL and the
inverse DNNs trained with other samplers is substantial when considering the mean R2
scores. Specifically, the mean R2 for IDNNAL is 0.82, while the second-best performing
approach, IDNNBC , has an R2 of 0.60.
In Fig. 5b, we observe that when considering RMSE, IDNNAL also performs the
best among the sampling methods and exhibits the narrowest interquartile range. All
other approaches display increased RMSE values. In Tab. 4, it can be seen that both
the mean RMSE and the maximum RMSE of IDNNAL are the lowest compared to the
other sampling approaches. Specifically, the mean and maximum RMSE for IDNNAL
are 0.056 and 0.075, respectively. The second-best performing approach, IDNNBC ,
obtains mean and maximum RMSE values of 0.080 and 0.16, respectively.
Finally, in Fig. 5c, the NMAE metric confirms that IDNNAL outperforms the other
approaches, achieving the lowest mean NMAE of 0.324. This significant difference in
NMAE is further evident when comparing IDNNAL to other approaches, as shown
in Tab. 4; specifically, the maximum NMAE for IDNNAL is 0.42, while the worst-
performing approach, IDNNR , has a maximum NMAE of 0.92. Additionally, IDNNAL
exhibits the narrowest interquartile range in the NMAE metric, reflecting its consistent
performance across the 30 runs.

Moreover, IDNNAL performs the best when considering the mean of all metrics
and exhibits the least uncertainty across the 30 runs, achieving the lowest standard
deviation for all metrics (third column of Tab. 4). The details of the forward model
(M) performance on the PSID benchmark can be found in App. C.2.


Fig. 5: IDNN performance on PSID benchmark problem using different dataset gen-
eration methods: (a) R2 (higher is better), (b) RMSE (lower is better), and (c) NMAE
(lower is better). Subscripts in IDNN denote the sampling method (e.g., IDNNR for
random sampling).

Table 4: Statistical analysis of the IDNN performance on the PSID benchmark problem using different dataset generation methods. Bold values indicate best performance in each metric. RMSE and NMAE values should be as low as possible, while R2 should be as high as possible.
Metric Method Mean Std Max Min
RMSE IDNNR 0.0820 0.0426 0.1740 0.0381
RMSE IDNNLHS 0.0838 0.0339 0.1552 0.0369
RMSE IDNNAL 0.0560 0.0092 0.0750 0.0365
RMSE IDNNBC 0.0806 0.0380 0.1632 0.0362
RMSE IDNNGFP 0.0873 0.0386 0.1652 0.0413
R2 IDNNR 0.5747 0.4141 0.9214 -0.3864
R2 IDNNLHS 0.5898 0.3144 0.9309 -0.1665
R2 IDNNAL 0.8224 0.0562 0.9299 0.7042
R2 IDNNBC 0.6094 0.3455 0.9315 -0.1932
R2 IDNNGFP 0.5589 0.3519 0.9093 -0.2343
NMAE IDNNR 0.4701 0.2136 0.9209 0.2085
NMAE IDNNLHS 0.5000 0.1902 0.8957 0.2069
NMAE IDNNAL 0.3244 0.0500 0.4236 0.2197
NMAE IDNNBC 0.4823 0.1945 0.8957 0.2280
NMAE IDNNGFP 0.5136 0.2117 0.8957 0.2663

4.3 Scalar Boundary Reconstruction Results


Fig. 6 presents the results of the SBR benchmark. The R2 metric is shown in Fig. 6a,
the RMSE results are displayed in Fig. 6b, and the NMAE results are illustrated in
Fig. 6c. The results are also statistically summarized in Tab. 5.
Fig. 6a indicates that IDNNBC outperforms all other sampling methods in terms
of the R2 score, achieving the highest mean R2 of 0.885. The second-best performing
approach, IDNNAL , has a mean R2 of 0.860, as shown in Tab. 5. Although IDNNAL
has a slightly lower mean R2 , it exhibits the narrowest interquartile range among all
methods, suggesting more consistent performance across the 30 runs.
In Fig. 6b, we observe that when considering RMSE, IDNNBC again performs the
best among the sampling methods, obtaining the lowest mean RMSE of 0.905. The
second-best performing approach, IDNNAL , achieves a mean RMSE of 0.978. While
IDNNAL has a slightly higher mean RMSE than IDNNBC , it exhibits the narrow-
est interquartile range, indicating more consistent error rates across different runs.
Finally, in Fig. 6c, the NMAE metric reveals that IDNNAL slightly outperforms the
other approaches, achieving the lowest mean NMAE of 0.2002 compared to 0.2009 for
IDNNBC . This suggests that, in terms of normalized absolute errors, IDNNAL provides
marginally better performance.
Moreover, IDNNAL demonstrates comparable performance to IDNNBC when con-
sidering the mean of all metrics and exhibits the least uncertainty across the 30 runs,
achieving the lowest standard deviations for RMSE and NMAE (as shown in the third
column of Tab. 5). Additionally, IDNNAL exhibits the narrowest interquartile range
for all metrics, reflecting its consistent performance across the 30 runs. These observa-
tions highlight the effectiveness of the Active Learning sampling method in providing

reliable and consistent results. The details of the forward model (M) performance on
the SBR benchmark can be found in App. C.3.


Fig. 6: IDNN performance on SBR benchmark problem using different dataset gener-
ation methods: (a) R2 (higher is better), (b) RMSE (lower is better), and (c) NMAE
(lower is better). Subscripts in IDNN denote the sampling method (e.g., IDNNR for
random sampling).

Table 5: Statistical analysis of the Inverse DNN (IDNN) performance on the SBR benchmark problem using different dataset generation methods. Bold values indicate best performance in each metric. RMSE and NMAE values should be as low as possible, while R2 should be as high as possible.
Metric Method Mean Std Max Min
RMSE IDNNR 1.4125 0.7054 3.6755 0.5859
RMSE IDNNLHS 1.3290 0.5998 2.9147 0.6482
RMSE IDNNAL 0.9781 0.2176 1.5112 0.6016
RMSE IDNNBC 0.9054 0.2133 1.5668 0.6047
RMSE IDNNGFP 1.1048 0.6339 3.2841 0.5237
R2 IDNNR 0.7122 0.2880 0.9565 -0.4353
R2 IDNNLHS 0.7309 0.2726 0.9366 -0.1042
R2 IDNNAL 0.8601 0.0694 0.9367 0.6479
R2 IDNNBC 0.8852 0.0576 0.9513 0.6812
R2 IDNNGFP 0.7885 0.3206 0.9602 -0.7087
NMAE IDNNR 0.2951 0.0946 0.5554 0.1786
NMAE IDNNLHS 0.2726 0.0791 0.4739 0.1731
NMAE IDNNAL 0.2002 0.0275 0.2597 0.1575
NMAE IDNNBC 0.2009 0.0411 0.3239 0.1472
NMAE IDNNGFP 0.2230 0.0868 0.5094 0.1505

5 Conclusion
We introduced and investigated the AutoTandemML framework for inverse design
problems in science and engineering. AutoTandemML synergistically combines active
learning with TNNs to efficiently generate datasets for accurate inverse design solu-
tions. We evaluated the framework on three benchmarks—the airfoil inverse design,
photonic surfaces inverse design, and scalar boundary reconstruction—and demon-
strated excellent performance across all. Compared to other sampling algorithms, the
TNN trained with active learning outperformed others in two benchmarks and was
competitive in the third. Notably, AutoTandemML offers reliable performance with
low variability across repeated experiments, a significant advantage for inverse design
applications.
Future research could explore applying AutoTandemML to other inverse design
problems, enhancing the active learning component with more sophisticated uncer-
tainty quantification methods, and developing hybrid approaches that combine active
learning with best candidate sampling. Additionally, extending the TNN architecture
to other deep neural networks like Graph Neural Networks could enable the framework
to handle discrete and graph-structured datasets, opening new possibilities in areas
like molecular inverse design. Ultimately, a comprehensive scalability study should be
conducted to deepen our understanding of the problem dimensionality for which this
approach is most effective.

Acknowledgments
This work was supported by the Laboratory Directed Research and Development Pro-
gram of Lawrence Berkeley National Laboratory under U.S. Department of Energy
Contract No. DE-AC02-05CH11231. Müller’s time was supported under U.S. Depart-
ment of Energy Contract No. DE-AC36-08GO28308, U.S. Department of Energy Office
of Science, Office of Advanced Scientific Computing Research, Scientific Discovery
through Advanced Computing (SciDAC) program through the FASTMath Institute
to the National Renewable Energy Laboratory.

Author Contributions
L.G. wrote the manuscript, developed the methods, developed the code, designed the
numerical experiments, and analyzed the performance of the algorithms, J.M. and
W.A.J. supervised the research and edited the manuscript.

Declaration of Competing Interest


The authors declare that they have no known competing financial interests or personal
relationships that could have appeared to influence the work reported in this paper.

Data Availability
The AutoTandemML code needed to reproduce the study can be found on the follow-
ing repository: https://github.com/lukagrbcic/AutoTandemML. The inverse design
benchmark HF surrogates, and the train/test data can be found on the following
repository: https://github.com/lukagrbcic/InverseBench.

Appendix A Machine Learning Algorithms
Hyperparameters and Training
In this section we present the hyperparameter and training details of the forward and
inverse DNN models of the TNN. Moreover, we present all of the training details and
the hyperparameters of the algorithms used for training the active learning model M
and uncertainty quantification.

A.1 Forward DNN (FDNN ) and Inverse DNN (IDNN )


Both the forward DNN (FDNN ) and the inverse DNN (IDNN ) were configured with
identical hyperparameters and training settings, except for the loss functions used,
and the input and output layers reversed. Each network was implemented as a multi-
layer perceptron (MLP) comprising five hidden layers with neuron counts of 64, 128,
256, 128, and 64 neurons, respectively. The Rectified Linear Unit (ReLU) activation
function was employed in all hidden layers to introduce non-linearity into the models.
Training was conducted over 2,000 epochs with a batch size of 32 and a learning rate
of 0.001, ensuring efficient and stable convergence during optimization.
Both networks utilized the Adam optimizer for weight updates, a validation split of
10% to monitor performance on unseen data, and an early stopping mechanism with
a patience parameter of 10 epochs to prevent overfitting. The RMSE loss function was
used for both networks. All training procedures for the forward and inverse neural networks
were carried out using PyTorch 2.3.0 with Python 3.10 [34].
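
A minimal PyTorch sketch of this architecture is shown below; in_dim and out_dim would be the design- and output-vector dimensions of a given benchmark, and the inverse DNN uses the same stack with the two dimensions swapped.

import torch.nn as nn

def make_mlp(in_dim, out_dim, hidden=(64, 128, 256, 128, 64)):
    """Five hidden layers with ReLU activations, as described above."""
    layers, prev = [], in_dim
    for h in hidden:
        layers += [nn.Linear(prev, h), nn.ReLU()]
        prev = h
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)

# e.g., for the AID benchmark (Tab. 1): forward net make_mlp(5, 75), inverse net make_mlp(75, 5)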
Furthermore, prior to training, we applied Min-Max scaling to both the input
features and output targets using the MinMaxScaler function from scikit-learn version
1.2.2 [35]. This preprocessing step normalized both the inputs and outputs to a range
between 0 and 1, which helped improve the training process by ensuring that all
variables contribute equally to the learning process.
For the inverse DNN training, we did not rescale the data anew. Instead, we reused
the Min-Max scalers from the forward DNN training, but applied them in reverse
order. Specifically, the scaler that was used for the outputs in the forward DNN was
applied to the inputs in the inverse DNN, and vice versa for the scaler used on the
inputs. This approach maintains consistency between the forward and inverse models
and ensures that the scaling corresponds appropriately to the data being modeled.

A.2 Random Forests


RFs ([4]) were employed as the model M during the active learning procedure for
the AID and PSID benchmark problems. RFs function by constructing an ensemble
of decision trees, each trained on a bootstrap sample of the original dataset. At each
node in a tree, a random subset of input features is selected to determine the best
split, introducing additional randomness that helps to reduce overfitting and enhance
model robustness. Each individual decision tree generates its own prediction, and these
predictions are then aggregated—by averaging in regression tasks—to produce the
final output. To perform uncertainty quantification with the RF model, we calculate
the standard deviation of the predictions from all individual trees in the ensemble.

This standard deviation serves as an estimate of the predictive uncertainty, reflecting
the variance among the trees’ predictions.
We used the RF implementation from scikit-learn version 1.2.2 in Python 3.10.
For our active learning model, all hyperparameters were set to their default values in
scikit-learn, except for the number of estimators (trees), which we increased to 150.
This configuration provided us with 150 individual tree predictions, enabling us to
calculate the standard deviation of these predictions for uncertainty quantification.
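
The per-tree spread can be extracted directly from a fitted scikit-learn model; a small sketch of this uncertainty estimate is given below.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_uncertainty(rf, X):
    """Mean and per-output std. dev. across the individual trees of a fitted RF."""
    tree_preds = np.stack([tree.predict(X) for tree in rf.estimators_])  # (n_trees, n, d)
    return tree_preds.mean(axis=0), tree_preds.std(axis=0)

# usage sketch
# rf = RandomForestRegressor(n_estimators=150).fit(X_train, Y_train)
# mean, std = rf_uncertainty(rf, X_query)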

A.3 Deep Ensembles


DEs are an ensemble learning technique where multiple neural networks are trained
independently, each starting from different random initializations or using subsets of
the data. By averaging the predictions of these independently trained models, deep
ensembles can provide a natural way to estimate uncertainty [1]. DEs were utilized for
the active learning procedure in the SBR benchmark problem.
We constructed an ensemble of 10 neural network models, where each model is a
pipeline that first scales the input features using MinMaxScaler and then applies an
MLP with hidden layers consisting of 100, 200, and 100 neurons. To introduce diversity
among the models in the ensemble, each MLP was initialized with a different random
seed. The MinMax scaler and the MLP implemented in our active learning code were
from the scikit-learn 1.2.2 module. The MLP used the default hyperparameters: the
activation function was ReLU, the optimizer was Adam, and the learning rate was
constant with an initial value of 0.001. The number of epochs was 200, with the L2
regularization parameter alpha set to 0.0001.
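
A minimal sketch of this ensemble construction with scikit-learn is given below; the per-output standard deviation across members serves as the uncertainty estimate.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

def make_deep_ensemble(n_members=10):
    """Ten MinMaxScaler + MLP pipelines, each with a different random seed."""
    return [make_pipeline(MinMaxScaler(),
                          MLPRegressor(hidden_layer_sizes=(100, 200, 100),
                                       max_iter=200, alpha=1e-4, random_state=s))
            for s in range(n_members)]

# usage sketch
# ensemble = [m.fit(X_train, Y_train) for m in make_deep_ensemble()]
# preds = np.stack([m.predict(X_query) for m in ensemble])
# mean, std = preds.mean(axis=0), preds.std(axis=0)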

Appendix B Benchmark Models and Data


In this section, we provide comprehensive details on the construction of the inverse
design benchmark datasets for all problems, as well as the training procedures and
accuracy of the HF surrogates (H). For each benchmark, the HF surrogates were used
to evaluate the sampled design points (either with active learning or with samplers
used for comparison) in order to form the TNN training datasets.

B.1 Airfoil Inverse Design


To address the airfoil inverse design (AID) problem, we construct a dataset to train
the HF surrogate H by varying key parameters of NACA 4-digit airfoils and flow
conditions. Specifically, we vary the maximum camber m, the position of maximum
camber p, the maximum thickness t of the airfoil, the Reynolds number Re of the
fluid flow, and the angle of attack α. Note that m, p, and t are normalized by the
chord length of the airfoil and thus take values between 0 and 1. The details on how
to compute the geometrical coordinates of the NACA 4-digit airfoil shapes are given
in the work by [22].
For each combination of these parameters, we obtain the pressure coefficient curve
Cp by running simulations with XFOIL 6.99 software. XFOIL is a computational tool
for the design and analysis of subsonic isolated airfoils, combining a panel method for

potential flow with an integral boundary layer formulation. Tab. B1 shows the lower
and upper boundaries of the design vector that contains the flow and shape parameters.

Table B1: Lower and upper boundaries of the AID benchmark parameters used to generate the dataset.
Parameter Lower Bound Upper Bound
Maximum Camber, m 0.02 0.09
Position of Maximum Camber, p 0.2 0.7
Maximum Thickness, t 0.06 0.15
Reynolds Number, Re 4 × 10^6 6 × 10^6
Angle of Attack, α 0◦ 7◦

Moreover, the total number of simulations that formed the initial dataset for the
AID benchmark was 12,223. The dataset was divided into training, validation, and
testing sets. Specifically, 70% of the data was allocated to the training set, and the
remaining 30% was used as the testing set. Within the training set, 10% was reserved
as a validation set. Consequently, the final dataset comprised 63% for training, 7% for
validation, and 30% for testing. The Extreme Gradient Boosting (XGBoost) algorithm
by [6] was used to train the HF surrogate (H). The XGBoost algorithm was chosen as
it excels in modeling structured tabular data [38]. We used the xgboost 2.0.3 Python
module with default hyperparameters, except for the ones adjusted as shown in Tab.
B2.
Using the airfoil flow and shape parameters as inputs, the model was trained to
predict the pressure coefficient curves, Cp . Fig. B1 illustrates the overall performance
of the model on the testing dataset, which comprises 3,667 data instances. The model
achieved an RMSE of 0.022 and an R2 of 0.996, indicative of high predictive accuracy.
Furthermore, Fig. B1a presents the distribution of RMSE values for each prediction
compared to its corresponding test set curve. Fig. B1b displays the training and vali-
dation loss curves of the XGBoost model, demonstrating high stability and low error
throughout the training process. Finally, Fig. B1c shows an example of a predicted Cp
curve alongside the ground truth Cp curve from the test set; the two curves are nearly
indistinguishable.

Table B2: Hyperparameters of XGBoost algorithm used to train the AID HF surrogate.
Hyperparameter Value
Objective Function (objective) reg:squarederror
Evaluation Metric (eval metric) rmse
Learning Rate (eta) 0.1
Maximum Tree Depth (max depth) 3
Subsample Ratio (subsample) 0.8
Column Subsample Ratio (colsample bytree) 0.8
L2 Regularization Term (reg lambda) 10
Number of Estimators (n estimators) 2000
Early Stopping Rounds (early stopping rounds) 10
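
A sketch of instantiating the surrogate with the Tab. B2 settings through the xgboost scikit-learn interface could look as follows; the training data placeholders X_train/Y_train (flow and shape parameters, 75-point Cp curves) and the validation split X_val/Y_val are assumptions for illustration.

from xgboost import XGBRegressor

model = XGBRegressor(objective="reg:squarederror", eval_metric="rmse",
                     learning_rate=0.1, max_depth=3, subsample=0.8,
                     colsample_bytree=0.8, reg_lambda=10, n_estimators=2000,
                     early_stopping_rounds=10)
# model.fit(X_train, Y_train, eval_set=[(X_val, Y_val)])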


Fig. B1: Performance evaluation of the HF surrogate (H) on the AID benchmark:
(a) RMSE distribution between predicted and test set Cp curves (lower is better). (b)
Training and validation loss curves for H using XGBoost algorithm. (c) Representative
comparison between predicted and test set ground truth Cp curves with corresponding
RMSE value.

B.2 Photonic Surface Inverse Design


To train the HF surrogate (H) for the photonic surface inverse design problem, we uti-
lized experimentally obtained data from our prior work [13]. This data was generated
by varying the laser manufacturing parameters—Laser power (Lp ), Scanning speed
(Ss ), and Spacing of the textures (Sp )—to create photonic surfaces with Inconel alloy
as the base material. The experimental setup is thoroughly described in our previous
studies [13, 33]. The total experimental dataset comprised 11,759 instances; 72% of the
instances were used for training, while 28% were used for testing of the

model. The lower and upper boundaries of these laser manufacturing parameters are
provided in Table B3.

Table B3: Lower and upper boundaries of the PSID benchmark parameters used to generate the dataset.
Parameter Lower Bound Upper Bound
Laser power, Lp 0.2 W 1.3 W
Scanning speed, Ss 10 mm/s 700 mm/s
Spacing, Sp 0.02 µm 28 µm

As outlined in our previous work [14], we employed the RF algorithm in combi-


nation with Principal Component Analysis (PCA) to train the HF surrogate (H) for
this benchmark. PCA was used to reduce the dimensionality of the 822-dimensional
spectral emissivity curves to 10 principal components, thereby enhancing computa-
tional efficiency. The laser manufacturing parameters served as inputs to the model,
while the PCA-compressed spectral emissivity curves ϵ were the outputs. In the com-
bined RF-PCA model, the RF predicted sets of 10 principal components, which were
then inversely transformed using the PCA model to reconstruct the original spectral
emissivity space.
We utilized the RF and PCA implementations from the scikit-learn 1.2.2 Python
module [35]. All hyperparameters of the RF model were set to their default val-
ues except for the number of trees (n estimators), which was set to 450, and the
maximum depth (max depth), which was set to 10.
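
A minimal sketch of this RF-PCA construction with synthetic placeholder data is given below; in the actual surrogate, the placeholder arrays would be replaced by the laser parameters and the measured emissivity curves.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(500, 3))       # placeholder laser parameters (Lp, Ss, Sp)
E_train = rng.uniform(size=(500, 822))     # placeholder spectral emissivity curves

pca = PCA(n_components=10)
Z_train = pca.fit_transform(E_train)                             # compress curves to 10 principal components
rf = RandomForestRegressor(n_estimators=450, max_depth=10).fit(X_train, Z_train)
E_pred = pca.inverse_transform(rf.predict(X_train[:5]))          # back to the 822-dim emissivity space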
The RF-PCA model achieved an RMSE of 0.0212 and an R2 of 0.977, indicating
high predictive accuracy. Fig. B2 illustrates the performance of the RF-PCA model in
detail. Specifically, Fig. B2a shows the distribution of RMSE values for all predicted
spectral emissivity curves (ϵ) when compared to the experimental test set curves. Fig.
B2b presents a juxtaposition of a predicted emissivity curve and its corresponding
experimental ground truth curve, with the RMSE value indicated in the legend.


Fig. B2: Performance evaluation of the HF surrogate (H) on the PSID benchmark:
(a) RMSE distribution between predicted and test set ϵ curves (lower is better). (b)
Representative comparison between predicted and test set ϵ curves with corresponding
RMSE value.

B.3 Scalar Boundary Reconstruction


To generate the dataset for the HF surrogate H, we employed the open-source compu-
tational fluid dynamics (CFD) toolbox OpenFOAM version 9 [18]. OpenFOAM utilizes
the finite volume method to numerically solve partial differential equations (PDEs)
that arise in fluid dynamics and heat transfer. Specifically, we used the laplacianFoam
solver provided by OpenFOAM, which is designed to solve the unsteady diffusion
equation. This solver allowed us to simulate the temporal and spatial evolution of the
diffusion process with high fidelity, providing accurate data for training and validating
our model. The unsteady diffusion equation is defined as:
\frac{\partial c}{\partial t} = D \nabla^2 c \quad \text{in } \Omega,\; t \in [0, t_{max}] \qquad (B1)
where c is the dimensionless scalar value in the domain, D is the coefficient of
diffusion (defined as 1 m2 /s), t is the time (s), and tmax denotes the end time of the
simulation, and Ω is the 2D domain. The tmax is defined as 0.1 s and is treated as a
converged state. The scalar boundary values are a Dirichlet boundary condition type
along the top of the domain:

c = g(cBC , t) on ∂Ωtop , t ∈ [0, tmax ]. (B2)


The remaining parts of the 2D domain ∂Ωother (left, right, bottom) are the
Neumann boundary condition type:
\frac{\partial c}{\partial n} = 0 \quad \text{on } \partial\Omega_{other},\; t \in [0, t_{max}], \qquad (B3)

where n is the unit normal vector pointing outward from the domain. The finite vol-
ume mesh used in the simulations consisted of a total of 400 cells, providing adequate
spatial resolution for capturing the diffusion process. Specifically, the top boundary
of the domain was discretized into 20 cells, corresponding to the 20 elements of the
boundary condition array cBC .
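
The data themselves were generated with OpenFOAM's laplacianFoam solver; purely to illustrate the governing equation and boundary conditions above, the short explicit finite-difference sketch below evolves c on a 20 × 20 grid. The unit-square domain, time step, and first-order boundary treatment are assumptions made for this illustration and do not reproduce the finite volume setup.

import numpy as np

# Illustrative stand-in for the diffusion problem (Eqs. B1-B3); not the
# OpenFOAM finite volume solver used to generate the actual dataset.
nx, ny = 20, 20                               # 20 x 20 = 400 cells, matching the mesh size
D, dx, dt, t_max = 1.0, 1.0 / 20, 1e-4, 0.1   # unit-square domain assumed

c = np.zeros((ny, nx))
cBC = np.random.uniform(0.0, 30.0, size=nx)   # random Dirichlet values on the top boundary

for _ in range(int(t_max / dt)):
    cp = np.pad(c, 1, mode="edge")            # zero-gradient (Neumann) ghost cells ...
    cp[0, 1:-1] = cBC                         # ... except the Dirichlet top boundary
    lap = (cp[:-2, 1:-1] + cp[2:, 1:-1]
           + cp[1:-1, :-2] + cp[1:-1, 2:] - 4.0 * c) / dx**2
    c = c + dt * D * lap                      # explicit Euler step of dc/dt = D * laplacian(c)
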
For each simulation, we randomly varied the values of the boundary condition array
cBC ∈ R20 along the top boundary ∂Ωtop, ensuring that each component was within
the range of 0 to 30. This randomization introduced a diverse set of boundary
conditions, allowing the model to learn from a wide variety of scenarios and improving
its generalization capabilities. Upon completing each simulation, we obtained scalar
values from scattered measurement locations within the two-dimensional domain, as
illustrated in Fig. 2c. These collected values were compiled into the scalar measure-
ments vector c. Additional details on the construction of the random cBC generator
and the coordinates of the 2D scattered measurement locations are provided in our
previous work [12].
The generated dataset consisted of 10,000 instances. We allocated 70% of the data
to the training set and the remaining 30% to the testing set. Within the training set,
10% was reserved for validation purposes. Consequently, the final dataset comprised
63% for training, 7% for validation, and 30% for testing. We used the XGBoost algo-
rithm to train the HF surrogate (H) with the hyperparameters specified in Tab. B4.
All other hyperparameters were set to their default values as provided by the xgboost
2.0.3 Python module.
The XGBoost model achieved an overall RMSE of 0.270 and an R2 of 0.984, indi-
cating high performance. Fig. B3 shows the performance of the model. Specifically,
Fig. B3a presents the RMSE distribution of each predicted set of measurements (c)
when compared to the test set values. Furthermore, Fig. B3b displays the training and
validation curves of the XGBoost model, and Fig. B3c shows an example of the test
set ground truth values and the predicted set of measurements, with the RMSE value
provided in the legend.

Table B4: Hyperparameters of XGBoost algorithm used to
train the SBR HF surrogate.
Hyperparameter Value
Objective Function (objective) reg:squarederror
Evaluation Metric (eval_metric) rmse
Learning Rate (eta) 0.1
Maximum Tree Depth (max_depth) 3
L1 Regularization Term (reg_alpha) 20
L2 Regularization Term (reg_lambda) 50
Number of Estimators (n_estimators) 3000
Early Stopping Rounds (early_stopping_rounds) 5
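
For reference, the snippet below sketches how H could be fit with the xgboost scikit-learn interface using the Tab. B4 settings and the 63%/7%/30% split described above. The input/output arrays are placeholders (the number of measurement locations shown is illustrative), so this is a sketch rather than the exact training script.

import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Placeholder data: X are the 20-element boundary arrays cBC, Y are the
# corresponding scattered scalar measurements c (location count illustrative).
X = np.random.uniform(0.0, 30.0, size=(10_000, 20))
Y = np.random.rand(10_000, 32)

# 70/30 train/test split, then 10% of the training set held out for
# validation, i.e., 63% train / 7% validation / 30% test overall.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)
X_tr, X_val, Y_tr, Y_val = train_test_split(X_train, Y_train, test_size=0.1)

# Hyperparameters from Tab. B4 (learning_rate corresponds to eta);
# all remaining hyperparameters stay at their xgboost defaults.
model = xgb.XGBRegressor(
    objective="reg:squarederror",
    eval_metric="rmse",
    learning_rate=0.1,
    max_depth=3,
    reg_alpha=20,
    reg_lambda=50,
    n_estimators=3000,
    early_stopping_rounds=5,
)
model.fit(X_tr, Y_tr, eval_set=[(X_val, Y_val)], verbose=False)
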

[Figure B3 plot content: (a) histogram of RMSE values over the predicted c vectors; (b) XGBoost training and validation loss curves over epochs; (c) predicted vs. ground-truth c at the measurement locations, with prediction RMSE = 0.277.]

Fig. B3: Performance evaluation of the HF surrogate (H) on the SBR benchmark:
(a) RMSE distribution between predicted and test set c measured values (lower is
better). (b) Training and validation loss curves for H using XGBoost algorithm. (c)
Representative comparison between predicted and test set c values with corresponding
RMSE value.

Appendix C Forward Model Benchmark Results


In this section, we present the results of the forward model M for all three inverse
design benchmarks. Specifically, we compare the performance of M when trained using
active learning with its performance when trained using other sampling methods (Random
sampling, Latin Hypercube sampling, Best Candidate sampling, and GreedyFP sampling).
The comparison results are presented through box plots and statistical summaries.

C.1 Forward Model Airfoil Inverse Design Results
In Fig. C4 and Tab. C5, we present the results for the AID benchmark. Across all
three performance metrics—R2 , RMSE, and NMAE—the active learning approach
(MAL ) consistently outperforms the other sampling methods. Specifically, Fig. C4a
displays the R2 score, where MAL achieves the highest value of R2 =0.84, as reported
in Tab. C5. Similarly, Fig. C4b shows the RMSE results, with MAL attaining the
lowest RMSE of 0.147. Finally, Fig. C4c illustrates the NMAE scores, where MAL
again achieves the lowest value of NMAE=0.065. These results, summarized in Tab.
C5, confirm the superior performance of the active learning approach over the other
samplers in the AID benchmark.
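
For completeness, the three metrics reported throughout Appendix C can be computed as sketched below. The NMAE normalization shown (by the range of the test targets) is one common convention; the exact normalization used for the reported values is not restated here, so treat it as an assumption.

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate_forward_model(y_true, y_pred):
    # RMSE and R2 averaged over all outputs; NMAE normalized by the
    # target range (an assumed convention for this sketch).
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    r2 = r2_score(y_true, y_pred)
    nmae = mean_absolute_error(y_true, y_pred) / (y_true.max() - y_true.min())
    return r2, rmse, nmae
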


Fig. C4: Forward model (M) performance on AID benchmark problem using different
dataset generation methods: (a) R2 (higher is better), (b) RMSE (lower is better),
and (c) NMAE (lower is better). Subscripts in M denote the sampling method (e.g.,
MR for random sampling).

Table C5: Statistical analysis of the forward model
(M) performance on AID benchmark problem using dif-
ferent dataset generation methods. Bold values indicate
best performance in each metric. RMSE and NMAE
values should be as low as possible, while R2 should be
as high as possible.
Metric Method Mean Std Max Min
RMSE MR 0.1679 0.0105 0.1994 0.1469
RMSE MLHS 0.1658 0.0087 0.1884 0.1461
RMSE MAL 0.1470 0.0115 0.1740 0.1287
RMSE MBC 0.1691 0.0112 0.2031 0.1455
RMSE MGFP 0.1696 0.0155 0.2122 0.1520
R2 MR 0.8040 0.0197 0.8360 0.7601
R2 MLHS 0.8158 0.0120 0.8405 0.7921
R2 MAL 0.8444 0.0330 0.8911 0.7080
R2 MBC 0.8052 0.0188 0.8406 0.7595
R2 MGFP 0.8060 0.0160 0.8435 0.7612
NMAE MR 0.0796 0.0032 0.0858 0.0710
NMAE MLHS 0.0798 0.0033 0.0854 0.0704
NMAE MAL 0.0655 0.0059 0.0834 0.0565
NMAE MBC 0.0814 0.0035 0.0885 0.0746
NMAE MGFP 0.0801 0.0045 0.0950 0.0738

C.2 Forward Model Photonic Surface Inverse Design Results
In Fig. C5 and Tab. C6, we present the results for the PSID benchmark. Across all
three performance metrics—R2 , RMSE, and NMAE—the active learning approach
(MAL ) consistently outperforms the other sampling methods. Specifically, Fig. C5a
displays the R2 score, where MAL achieves the highest value of R2 =0.92, as reported
in Tab. C6. Similarly, Fig. C5b shows the RMSE results, with MAL attaining the
lowest RMSE of 0.04. Finally, Fig. C5c illustrates the NMAE scores, where MAL
again achieves the lowest value of NMAE=0.275. These results, summarized in Tab.
C6, confirm the superior performance of the active learning approach over the other
samplers in the PSID benchmark.


Fig. C5: Forward model (M) performance on PSID benchmark problem using differ-
ent dataset generation methods: (a) R2 (higher is better), (b) RMSE (lower is better),
and (c) NMAE (lower is better). Subscripts in M denote the sampling method (e.g.,
MR for random sampling).

Table C6: Statistical analysis of the forward model
(M) performance on PSID benchmark problem using
different dataset generation methods. Bold values indi-
cate best performance in each metric. RMSE and
NMAE values should be as low as possible, while R2
should be as high as possible.
Metric Method Mean Std Max Min
RMSE MR 0.0451 0.0026 0.0531 0.0413
RMSE MLHS 0.0454 0.0021 0.0505 0.0414
RMSE MAL 0.0402 0.0020 0.0445 0.0354
RMSE MBC 0.0444 0.0021 0.0492 0.0413
RMSE MGFP 0.0443 0.0020 0.0512 0.0407
R2 MR 0.8944 0.0147 0.9144 0.8475
R2 MLHS 0.8937 0.0107 0.9128 0.8664
R2 MAL 0.9251 0.0067 0.9401 0.9089
R2 MBC 0.8996 0.0102 0.9140 0.8781
R2 MGFP 0.8998 0.0097 0.9151 0.8667
NMAE MR 0.3856 0.0332 0.4667 0.3167
NMAE MLHS 0.4014 0.0384 0.5024 0.3376
NMAE MAL 0.2758 0.0172 0.3110 0.2367
NMAE MBC 0.3973 0.0341 0.4723 0.3273
NMAE MGFP 0.3961 0.0356 0.4729 0.3320

C.3 Forward Model Scalar Boundary Reconstruction Results
In Fig. C6 and Tab. C7, we present the results for the SBR benchmark. The perfor-
mance of MAL differs slightly from that in the AID and PSID benchmarks. In the
SBR benchmark, the best-performing algorithm for the forward model is MBC ; how-
ever, MAL comes as a close second in the R2 and RMSE metrics. Specifically, Fig.
C6a displays the R2 scores, where MBC achieves the highest value of R2 =0.97, as
reported in Tab. C7, while MAL obtains a score of 0.96. Similarly, Fig. C6b shows the
RMSE results, with MBC attaining the lowest RMSE of 0.40, while MAL achieves an
RMSE of 0.41. However, Fig. C6c illustrates the NMAE scores, where MAL and MBC
both achieve the lowest value of NMAE=0.09. These results, summarized in Tab. C7,
demonstrate that MAL can also perform competitively.


Fig. C6: Forward model (M) performance on SBR benchmark problem using different
dataset generation methods: (a) R2 (higher is better), (b) RMSE (lower is better),
and (c) NMAE (lower is better). Subscripts in M denote the sampling method (e.g.,
MR for random sampling).

Table C7: Statistical analysis of the forward model
(M) performance on SBR benchmark problem using
different dataset generation methods. Bold values indi-
cate best performance in each metric. RMSE and
NMAE values should be as low as possible, while R2
should be as high as possible.
Metric Method Mean Std Max Min
RMSE MR 0.5741 0.0259 0.6194 0.5138
RMSE MLHS 0.5793 0.0321 0.6423 0.5217
RMSE MAL 0.4143 0.0069 0.4317 0.4035
RMSE MBC 0.4047 0.0158 0.4393 0.3739
RMSE MGFP 0.4358 0.0239 0.4817 0.3898
R2 MR 0.9565 0.0028 0.9631 0.9525
R2 MLHS 0.9555 0.0040 0.9624 0.9477
R2 MAL 0.9684 0.0009 0.9698 0.9656
R2 MBC 0.9718 0.0016 0.9748 0.9689
R2 MGFP 0.9695 0.0024 0.9736 0.9656
NMAE MR 0.1230 0.0053 0.1309 0.1100
NMAE MLHS 0.1245 0.0057 0.1350 0.1133
NMAE MAL 0.0903 0.0020 0.0946 0.0866
NMAE MBC 0.0906 0.0035 0.0984 0.0827
NMAE MGFP 0.0981 0.0046 0.1078 0.0897

References
[1] Abdar M, Pourpanah F, Hussain S, et al (2021) A review of uncertainty quan-
tification in deep learning: Techniques, applications and challenges. Information
fusion 76:243–297

[2] Anand A, Marepally K, Muneeb Safdar M, et al (2024) A novel approach to inverse design of wind turbine airfoils using tandem neural networks. Wind Energy

[3] Borchani H, Varando G, Bielza C, et al (2015) A survey on multi-output regression. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 5(5):216–233

[4] Breiman L (2001) Random forests. Machine learning 45:5–32

[5] Chen J, Dai Z, Yang Z, et al (2021) An improved tandem neural network architec-
ture for inverse modeling of multicomponent reactive transport in porous media.
Water Resources Research 57(12):e2021WR030595

[6] Chen T, Guestrin C (2016) Xgboost: A scalable tree boosting system. In: Pro-
ceedings of the 22nd acm sigkdd international conference on knowledge discovery
and data mining, pp 785–794

[7] Chen W, Li R, Huang Z, et al (2023) Inverse design of polarization conversion metasurfaces by deep neural networks. Applied Optics 62(8):2048–2054

[8] Eldar Y, Lindenbaum M, Porat M, et al (1997) The farthest point strategy for
progressive image sampling. IEEE transactions on image processing 6(9):1305–
1315

[9] Frising M, Bravo-Abad J, Prins F (2023) Tackling multimodal device distributions in inverse photonic design using invertible neural networks. Machine Learning: Science and Technology 4(2):02LT02

[10] Glaws A, King RN, Vijayakumar G, et al (2022) Invertible neural networks for
airfoil design. AIAA journal 60(5):3035–3047

[11] Gonzalez TF (1985) Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38:293–306

[12] Grbcic L, Müller J, de Jong WA (2024) Efficient inverse design optimization through multi-fidelity simulations, machine learning, and boundary refinement strategies. Engineering with Computers pp 1–28

[13] Grbcic L, Park M, Elzouka M, et al (2024) Inverse design of photonic surfaces on Inconel via multi-fidelity machine learning ensemble framework and high throughput femtosecond laser processing. arXiv preprint arXiv:2406.01471

[14] Grbcic L, Park M, Müller J, et al (2025) Artificial intelligence driven laser param-
eter search: Inverse design of photonic surfaces using greedy surrogate-based
optimization. Engineering Applications of Artificial Intelligence 143:109971

[15] He X, Cui X, Chan CT (2023) Constrained tandem neural network assisted inverse
design of metasurfaces for microwave absorption. Optics Express 31(24):40969–
40979

[16] Head S, Keshavarz Hedayati M (2022) Inverse design of distributed Bragg reflectors using deep learning. Applied Sciences 12(10):4877

[17] Ivic S, Druzeta S, Grbcic L (2024) Indago v0.5.0. PyPI, URL https://pypi.org/project/Indago/

[18] Jasak H, Jemcov A, Tukovic Z, et al (2007) OpenFOAM: A C++ library for complex physics simulations. In: International Workshop on Coupled Methods in Numerical Dynamics, Dubrovnik, Croatia, pp 1–20

[19] Kamath C (2022) Intelligent sampling for surrogate modeling, hyperparameter optimization, and data analysis. Machine Learning with Applications 9:100373

[20] Kossale Y, Airaj M, Darouichi A (2022) Mode collapse in generative adversarial networks: An overview. In: 2022 8th International Conference on Optimization and Applications (ICOA), IEEE, pp 1–6

[21] Kudyshev ZA, Kildishev AV, Shalaev VM, et al (2020) Machine-learning-assisted
metasurface design for high-efficiency thermal emitter optimization. Applied
Physics Reviews 7(2)

[22] Ladson CL, Brooks Jr CW, Hill AS, et al (1996) Computer program to obtain
ordinates for NACA airfoils. Tech. rep.

[23] Lei R, Bai J, Wang H, et al (2021) Deep learning based multistage method
for inverse design of supercritical airfoil. Aerospace Science and Technology
119:107101

[24] Li J, Du X, Martins JR (2022) Machine learning in aerodynamic shape optimization. Progress in Aerospace Sciences 134:100849

[25] Liu E, Tan C, Gui L, et al (2023) Efficient design of structural parameters and
materials of plasmonic fano-resonant metasurfaces by a tandem neural network.
In: AOPC 2022: Optoelectronics and Nanophotonics, SPIE, pp 69–75

[26] Liu P, Zhao Y, Li N, et al (2024) Deep neural networks with adaptive solution
space for inverse design of multilayer deep-etched grating. Optics and Lasers in
Engineering 174:107933

[27] Ma H, Li EP, Wang Y, et al (2022) Channel inverse design using tandem neural
network. In: 2022 IEEE 26th Workshop on Signal and Power Integrity (SPI),
IEEE, pp 1–3

[28] Meng F, Ding J, Zhao Y, et al (2023) Artificial intelligence designer for optical
fibers: Inverse design of a hollow-core anti-resonant fiber based on a tandem neural
network. Results in Physics 46:106310

[29] Mitchell DP (1991) Spectrally optimal sampling for distribution ray tracing. In:
Proceedings of the 18th annual conference on Computer graphics and interactive
techniques, pp 157–164

[30] Molesky S, Lin Z, Piggott AY, et al (2018) Inverse design in nanophotonics. Nature Photonics 12(11):659–670

[31] Nielsen D, Lee J, Nam YW (2022) Design of composite double-slab radar absorb-
ing structures using forward, inverse, and tandem neural networks. American
Society for Composites Technical Conference

[32] Noureen S, Syed IH, Ijaz S, et al (2023) Physics-driven tandem inverse design
neural network for efficient optimization of uv–vis meta-devices. Applied Surface
Science Advances 18:100503

[33] Park M, Grbčić L, Motameni P, et al (2024) Inverse design of photonic surfaces via high throughput femtosecond laser processing and tandem neural networks. Advanced Science p 2401951

[34] Paszke A, Gross S, Massa F, et al (2019) PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32

[35] Pedregosa F, Varoquaux G, Gramfort A, et al (2011) Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12:2825–2830

[36] Qiu C, Wu X, Luo Z, et al (2021) Simultaneous inverse design continuous and dis-
crete parameters of nanophotonic structures via back-propagation inverse neural
network. Optics Communications 483:126641

[37] Settles B (2011) From theories to queries: Active learning in practice. In: Active
learning and experimental design workshop in conjunction with AISTATS 2010,
JMLR Workshop and Conference Proceedings, pp 1–18

[38] Shwartz-Ziv R, Armon A (2022) Tabular data: Deep learning is not all you need.
Information Fusion 81:84–90

[39] Surjanovic S, Welch WJ (2019) Adaptive partitioning design and analysis for
emulation of a complex computer code. arXiv preprint arXiv:1907.01181

[40] Swe SK, Noh H (2024) Inverse design of reflectionless thin-film multilayers with
optical absorption utilizing tandem neural network. In: Photonics, MDPI, p 964

[41] Tang Y, Kojima K, Koike-Akino T, et al (2020) Generative deep learning model for inverse design of integrated nanophotonic devices. Laser & Photonics Reviews 14(12):2000287

[42] Wang L, Dong J, Zhang W, et al (2024) Inverse design for laser-compatible infrared camouflage metasurface enabled by physics-driven neural network and genetic algorithm. Optical Materials p 115639

[43] Xie C, Li H, Cui C, et al (2023) Deep learning assisted inverse design of metamaterial microwave absorber. Applied Physics Letters 123(18)

[44] Xu P, Lou J, Li C, et al (2024) Inverse design of a metasurface based on a deep tandem neural network. JOSA B 41(2):A1–A5

[45] Xu X, Sun C, Li Y, et al (2021) An improved tandem neural network for the inverse design of nanophotonics devices. Optics Communications 481:126513

[46] Yeung C, Tsai JM, King B, et al (2021) Multiplexed supercell metasurface design
and optimization with tandem residual networks. Nanophotonics 10(3):1133–1143

[47] Yeung C, Tsai JM, King B, et al (2021) Designing multiplexed supercell metasurfaces with tandem neural networks. Nanophotonics 10:1133–1143

[48] Yilmaz E, German B (2020) Conditional generative adversarial network frame-
work for airfoil inverse design. In: AIAA aviation 2020 forum, p 3185

[49] Yonekura K, Wada K, Suzuki K (2022) Generating various airfoils with required
lift coefficients by combining NACA and Joukowski airfoils using conditional varia-
tional autoencoders. Engineering Applications of Artificial Intelligence 108:104560

[50] Yuan X, Gu L, Wei Z, et al (2024) Bootstrap sampling style ensemble neural network for inverse design of optical nanoantennas. Optics Communications 557:130296
