AutoTandemML: Active Learning Enhanced Tandem Neural Networks for Inverse Design Problems
Abstract
Inverse design in science and engineering involves determining optimal design
parameters that achieve desired performance outcomes, a process often hindered
by the complexity and high dimensionality of design spaces, leading to significant
computational costs. To tackle this challenge, we propose a novel hybrid approach
that combines active learning with Tandem Neural Networks to enhance the effi-
ciency and effectiveness of solving inverse design problems. Active learning selectively
samples the most informative data points, reducing the required dataset size without
compromising accuracy. We investigate this approach using
three benchmark problems: airfoil inverse design, photonic surface inverse design,
and scalar boundary condition reconstruction in diffusion partial differential
equations. We demonstrate that integrating active learning with Tandem Neural
Networks outperforms standard approaches across the benchmark suite, achieving
better accuracy with fewer training samples.
Keywords: inverse design, tandem neural networks, active learning, machine learning
1 Introduction
Inverse design problems are inherently complex because they require determining
the necessary inputs or configurations to achieve a specific desired outcome, often
within systems governed by non-linear, multi-dimensional relationships. These chal-
lenges are prevalent across various engineering disciplines—for instance, designing an
airfoil shape to meet certain aerodynamic criteria [24], or designing photonic materi-
als based on target optical properties [14, 30]. Such problems are frequently ill-posed,
lacking unique solutions and being sensitive to data imperfections, which worsens their
complexity. Consequently, there is a critical need for efficient computational tools and
advanced methods—such as optimization algorithms and machine learning—to solve
inverse design problems effectively, enabling engineers to innovate and refine designs
by working backwards from desired performance specifications.
Deep Neural Networks (DNNs) have been very successful as inverse design models in
science and engineering [23, 24, 30]. Generative Adversarial Networks (GANs) [48, 49],
Variational Autoencoders (VAEs) [21, 41], and Invertible Neural Networks (INNs)
[9, 10] are some of the most prominent DNN-based inverse design approaches. However,
both GANs and VAEs have drawbacks related to training difficulty, instability, and
limitations in capturing the full diversity of possible designs [20], while INNs require
complex architectural components to be effective and are therefore harder to train.
Tandem Neural Networks (TNNs) offer advantages over VAEs, GANs, and INNs
in inverse design tasks by enabling direct optimization of input parameters to achieve
desired outputs, leading to more stable and straightforward training. They avoid issues
like training instability, mode collapse, and difficulty capturing multi-modal outputs
that commonly affect VAEs and GANs, resulting in more accurate and interpretable
designs that better meet specific target properties. Furthermore, TNNs do not require
the complex neural architectures of INNs, making them more effective and efficient
for exploring complex design spaces in inverse design applications.
TNNs have emerged as a powerful tool for solving inverse design problems across
various fields, enabling the efficient optimization of complex systems by mapping
desired outputs back to input parameters. In the realm of photonics and nanopho-
tonics, they have been extensively employed to design metasurfaces and nanophotonic
devices with tailored optical properties. Studies have demonstrated the use of TNNs
for optimizing metasurfaces for specific functionalities such as polarization conversion,
absorption, and color filtering [7, 15, 25, 32, 42–44, 46, 47].
In addition to metasurfaces, TNNs have been applied to the inverse design of other
nanophotonic structures, including optical nanoantennas and multilayer thin films
[16, 26, 36, 40, 45, 50]. These studies showcase the versatility of TNNs in handling
both continuous and discrete design parameters, facilitating the creation of devices
with customized optical responses and advancing the capabilities in nanofabrication
and materials science.
Beyond photonics, TNNs have found applications in diverse areas such as porous
media transport, radar engineering, wind turbine airfoil design, electronic integrated
systems, and optical fiber design. In porous media transport, TNNs have been uti-
lized to model multicomponent reactive transport processes, enhancing the accuracy
and efficiency of simulations [5]. In radar engineering, they have facilitated the
design of composite radar-absorbing structures by enabling inverse design capabilities
[31]. Similarly, in wind turbine airfoil design, TNNs have been employed to achieve
desired aerodynamic properties by inversely mapping performance metrics to geomet-
ric parameters [2]. In electronic integrated systems, TNNs have assisted in channel
inverse design [27]. In the field of optical fibers, they have been applied to the inverse
design of hollow-core anti-resonant fibers [28].
Despite the demonstrated efficacy of TNNs in specific applications, there remains a
lack of comprehensive research evaluating their performance across a broad spectrum
of inverse design problems. Most existing studies focus on tailored solutions for indi-
vidual challenges, which limits the understanding of TNNs as a general-purpose tool.
In particular, the impact of training data characteristics on the efficacy of TNNs has
not been thoroughly investigated. Addressing this gap, the novelty and contributions
of our work are as follows:
1. We develop a hybrid framework, AutoTandemML, that combines sampling data
generation by active learning with TNNs specifically for inverse design problems.
2. We investigate whether datasets optimized for accurately predicting forward
relationships also perform well in capturing inverse relationships with the TNNs.
3. We compare AutoTandemML with TNNs trained on data generated by other
sampling algorithms—namely Random, Latin Hypercube, Best Candidate, and
GreedyFP—across three inverse design benchmark problems [19].
4. We introduce an inverse design benchmark suite comprising three significant prob-
lems in science and engineering: Airfoil Inverse Design (AID), Photonic Surfaces
Inverse Design (PSID), and Scalar Boundary Reconstruction (SBR) of a scalar dif-
fusion partial differential equation (PDE). We make these benchmarks available to
the research community through a public data repository.
5. We develop the open-source AutoTandemML Python module, an easy-to-use
software tool for the automated generation of TNNs, and make it available to everyone.
The remainder of this article is organized as follows. Sec. 2 introduces the
AutoTandemML framework, including descriptions of inverse design models, and each
component that forms the framework—specifically, active learning for multi-output
regression and TNNs. Sec. 3 describes the inverse design benchmarks and the setup
of all numerical experiments. We outline the sampling methods used for compari-
son, as well as the accuracy metrics and inverse design validation methods employed
in our study. In Sec. 4, we present the performance results of AutoTandemML and
compare it with other sampling methods used to train the TNNs across all inverse
design benchmark problems. Finally, App. A, B, and C provide additional details on
the hyperparameters used for all machine learning models, specifics of the inverse
design benchmarks, and supplementary results that enhance the interpretation of
AutoTandemML’s performance, respectively. This organizational structure aims to
guide the reader through the development, implementation, and evaluation of the
AutoTandemML framework in a logical and coherent manner.
2 AutoTandemML Framework
In this section, we mathematically define inverse design models and the active learning
algorithm for multi-output regression, and we describe the components of the TNN. Fur-
thermore, we explain how active learning and TNNs can be integrated into a single
framework for efficiently solving inverse design problems.
Here, y ∈ R^N denotes the target vector specified a priori. The function f : R^d → R^N represents the forward
model that maps design parameters to their resulting properties. In inverse design we
seek to invert this function to find the design parameters that produce the desired
target outcomes.
Inverse design problems are typically ill-posed, meaning that multiple values of
x can yield similar values of y. This lack of a unique solution necessitates the use
of computational methods and machine learning models to approximate the inverse
mapping or to identify suitable design parameters through optimization techniques.
$$\sigma(x_i^*; \mathcal{M}) = \begin{bmatrix} \sigma_1(x_i^*; \mathcal{M}) \\ \sigma_2(x_i^*; \mathcal{M}) \\ \vdots \\ \sigma_d(x_i^*; \mathcal{M}) \end{bmatrix} \in \mathbb{R}^{d} \qquad (3)$$
The algorithm proceeds by iteratively selecting the most uncertain inputs, querying
the high-fidelity (HF) surrogate H at these points, and updating the model
accordingly. The detailed steps are outlined in Alg. 1.
More specifically, the algorithm begins with an initial dataset D of size n0 , where
each design vector xi is evaluated using the HF surrogate H to obtain f (xi ). The
model M is initially trained on this dataset. In each iteration, while the total number
of evaluations is less than the pre-defined evaluation budget Nmax , the algorithm
performs an optimization step to find k input points x∗i that maximize the total
predictive uncertainty u(x) of M, as defined in Eq. 2; each of these optimization runs is
started with a new random seed. The k points form the batch B. Each design vector
in B is evaluated using the HF surrogate H to obtain new outputs, and the dataset D
is updated accordingly. The model M is retrained on this expanded dataset before the
next iteration begins, thereby incrementally improving its performance by focusing on
areas where it is most uncertain.
$$x_i^* = \arg\max_{x} u(x) = \arg\max_{x} \sum_{j=1}^{d} \sigma_j(x; \mathcal{M})$$

7: end for
8: B = {x∗(1) , x∗(2) , . . . , x∗(k) }
9: for i = 1 to k do
10: Evaluate HF surrogate: f (xi ) = H(xi )
11: Update dataset: D ← D ∪ {(xi , f (xi ))}
12: end for
13: Retrain model M on updated dataset D
14: end while
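As a concrete illustration, the following is a minimal Python sketch of Alg. 1. It assumes a multi-output scikit-learn RandomForestRegressor as the model M, with the per-tree standard deviation used as the uncertainty estimate (cf. App. A), and substitutes a simple seeded random search for the PSO acquisition optimizer described in Sec. 3.2; the helper names and all numeric settings are illustrative, not the framework's actual implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def predictive_uncertainty(model, X):
    # u(x): sum over the d outputs of the per-output std. dev. across trees
    per_tree = np.stack([t.predict(X) for t in model.estimators_])
    return per_tree.std(axis=0).sum(axis=-1)

def propose_batch(model, lb, ub, k=5, n_cand=200):
    # Stand-in acquisition step: each of the k points comes from a separately
    # seeded random search over the design space (the framework uses PSO).
    batch = []
    for seed in range(k):
        rng = np.random.default_rng(seed)
        cand = rng.uniform(lb, ub, size=(n_cand, len(lb)))
        batch.append(cand[np.argmax(predictive_uncertainty(model, cand))])
    return np.array(batch)

def active_learning(H, lb, ub, X0, n_max=150, k=5):
    # Alg. 1: grow the dataset D by querying the HF surrogate H where M is
    # most uncertain, retraining M after every batch.
    X = np.asarray(X0)
    Y = np.array([H(x) for x in X])
    model = RandomForestRegressor(n_estimators=150).fit(X, Y)
    while len(X) < n_max:
        B = propose_batch(model, lb, ub, k=k)
        X = np.vstack([X, B])
        Y = np.vstack([Y, np.array([H(x) for x in B])])
        model = model.fit(X, Y)
    return X, Y, model
```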
The input design vector (x) and the HF surrogate H both vary depending on the
specific benchmark problem under consideration. The specifics of the design vectors
and responses for each benchmark are provided in Sec. 3.1, while detailed descrip-
tions of the HF surrogates H are given in App. B. Details on the computational
implementation of the multi-output regression active learning algorithm—including
the hyperparameters used, the selection of the model M, and the optimizer employed
to determine the optimal design vectors (x∗i )—can be found in Sec. 3.2.
Minimizing this loss adjusts the weights of IDNN so that the predicted input generates
an output through FDNN that closely matches the desired output f (x).
This tandem training approach ensures that the inverse neural network IDNN not
only seeks inputs that reproduce the desired outputs but also aligns with the learned
mapping of the forward neural network FDNN . By incorporating FDNN into the loss
function, we effectively regularize IDNN , promoting reliable inverse solutions. This
method is particularly advantageous in scenarios where the inverse problem is ill-posed
or where multiple inputs can lead to the same output, as it leverages the forward
model’s representation to guide the inverse predictions toward feasible and meaningful
solutions.
The training details and the hyperparameters of both FDNN and the IDNN are
presented in App. A.1.
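To make the two-stage tandem procedure concrete, the following is a minimal sketch of the training loop. Since the architectural details are deferred to App. A.1, the sketch assumes both networks are small fully connected PyTorch models; the dimensions, layer sizes, and optimizer settings are illustrative.

```python
import torch
import torch.nn as nn

d_x, d_y = 5, 64  # illustrative: design vector in R^5, response f(x) in R^64
f_dnn = nn.Sequential(nn.Linear(d_x, 128), nn.ReLU(), nn.Linear(128, d_y))
i_dnn = nn.Sequential(nn.Linear(d_y, 128), nn.ReLU(), nn.Linear(128, d_x))

def train_tandem(X, Y, epochs=500, lr=1e-3):
    # X: (n, d_x) design vectors, Y: (n, d_y) responses, both torch tensors.
    mse = nn.MSELoss()
    # Stage 1: fit the forward network FDNN on the (x, f(x)) pairs.
    opt_f = torch.optim.Adam(f_dnn.parameters(), lr=lr)
    for _ in range(epochs):
        opt_f.zero_grad()
        mse(f_dnn(X), Y).backward()
        opt_f.step()
    # Stage 2: freeze FDNN and train IDNN through it, so the loss compares
    # the desired output f(x) with FDNN(IDNN(f(x))).
    for p in f_dnn.parameters():
        p.requires_grad_(False)
    opt_i = torch.optim.Adam(i_dnn.parameters(), lr=lr)
    for _ in range(epochs):
        opt_i.zero_grad()
        mse(f_dnn(i_dnn(Y)), Y).backward()
        opt_i.step()
```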
[Fig. 1: Overview of the AutoTandemML framework. (a) Active learning loop: design points x are evaluated with the ground truth model (G), the model M is trained on the (x, f(x)) pairs, points x of maximum uncertainty are selected, and the dataset is updated with the new (x, f(x)) evaluations. (b) Forward deep neural network FDNN(x) predicting f(x). (c) Inverse deep neural network IDNN(f(x)) predicting x, trained by computing the loss L(f(x), FDNN(x)).]
3.1 Inverse Design Benchmarks
In order to assess the performance of the active learning enhanced TNN, we develop a
benchmark suite of three inverse design problems. The outline of the three problems
is presented in Fig. 2.
The first problem is the Airfoil Inverse Design (AID), where the goal is to deter-
mine both the flow and the design parameters of an airfoil based on its pressure
coefficient distribution. In Fig. 2a the specifics of the AID problem are visualized. The
dimensionless pressure coefficient curves are denoted as Cp , while the flow and the
geometrical parameters include the Reynolds number (Re), the Angle of Attack (α),
the maximum camber distance (m), position of the maximum camber (p), and the
maximum thickness of the airfoil (t).
The second problem is the Photonic Surface Inverse Design (PSID), which aims to
obtain laser manufacturing parameters corresponding to a desired spectral emissivity
curve of the photonic surface (a metamaterial), visualized in Fig. 2b. The photonic
surfaces are created by texturing the surface of the plain material (in our case the
alloy Inconel) with ultra-fast lasers. The spectral emissivity curves are defined as ϵ,
and the laser manufacturing parameters are laser power (Lp ) (W), scanning speed (Ss )
(mm/s), and the spacing of the textures on the surface of the metamaterial (Sp ) (µm).
The third problem is the Scalar Boundary Reconstruction (SBR), where the objec-
tive is to recover the boundary conditions (cBC ) of a scalar diffusion PDE using
scattered measurements of the scalar field (c) within a two-dimensional domain, as
shown in Fig. 2c. The scalar diffusion PDE models phenomena such as heat transfer
or contaminant diffusion in a medium. The boundary conditions cBC represent the
values of the scalar field (e.g., temperature or concentration) along the top boundary
of the domain, which are unknown and need to be determined. The scattered mea-
surements c are obtained at 30 interior points within the domain and provide partial
information about the scalar field’s distribution. These measurements are strategically
placed to capture the behavior of the field within the domain. Both cBC and c are
dimensionless quantities, normalized to facilitate computational analysis.
A mathematical summary of the benchmarks—including the size of the design
spaces and formal definitions of the design vectors and objectives—is provided in Table
1. Comprehensive details of all benchmarks, such as the mathematical and numerical
background, simulation software used, experimental setup and datasets, as well as the
training and validation details of the HF surrogates H, are given in Appendix B.
Fig. 2: Inverse design benchmark problems: (a) Airfoil inverse design (AID). (b)
Photonic surfaces inverse design (PSID). (c) Scalar boundary reconstruction (SBR).
3.2 Numerical Experiments Setup
In this section, we outline the setup of the numerical experiments conducted. Specifi-
cally, we detail the hyperparameters of the active learning process, and the sampling
algorithms employed for comparison with the active learning approach.
In the active learning procedure, we employed two different algorithms to efficiently
train the forward model M: the Random Forest (RF) algorithm and an ensemble of
DNNs, referred to as Deep Ensembles (DE). Both algorithms were chosen because
they provide uncertainty quantification capabilities essential for active learning. Imple-
mentation details and hyperparameters for both algorithms are provided in App. A.
We initially used only the RF algorithm to explore its performance across the different
benchmark problems. However, we found that it failed to accurately model the forward
relationship in the SBR benchmark problem. Consequently, we selected the DE
algorithm for the SBR benchmark, as outlined in Tab. 2.
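The exact DE configuration is given in App. A; as a minimal stand-in, an ensemble of independently initialized scikit-learn MLPs already provides the mean prediction and the per-output standard deviation that the active learning loop requires. The class below and its settings are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

class DeepEnsemble:
    def __init__(self, n_members=5):
        # Independently initialized members; their disagreement acts as the
        # predictive uncertainty estimate.
        self.members = [MLPRegressor(hidden_layer_sizes=(128, 128),
                                     max_iter=1000, random_state=seed)
                        for seed in range(n_members)]

    def fit(self, X, Y):
        for m in self.members:
            m.fit(X, Y)
        return self

    def predict(self, X, return_std=False):
        preds = np.stack([m.predict(X) for m in self.members])
        mean, std = preds.mean(axis=0), preds.std(axis=0)
        return (mean, std) if return_std else mean
```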
Furthermore, the maximum number of samples generated by the active learning
procedure varied per benchmark as outlined in Tab. 2. The number of maximum
samples corresponds to the number of times the HF surrogate H is queried for a
response in order to generate a dataset and train the TNN components FDNN and
IDNN . For all benchmarks, the active learning process was initialized with n0 = 20
samples generated using the Latin Hypercube Sampling (LHS) algorithm, and a batch
size k = 5 was used in each iteration of the process. To determine the optimal design
vectors (x∗i) (defined in Alg. 1), we utilize the Particle Swarm Optimization (PSO)
algorithm implemented in the Indago 0.5.1 Python optimization module [17], with the
maximum number of evaluations set to 100 and all other PSO hyperparameters left
at the module's recommended defaults.
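Under the assumptions of the sketch in Sec. 2, these settings translate roughly into the following initialization; the bounds and the evaluation budget are placeholders for the benchmark-specific values in Tabs. B1, B3, and 2.

```python
import numpy as np
from scipy.stats import qmc

d = 3                              # illustrative design-space dimension
lb, ub = np.zeros(d), np.ones(d)   # placeholder bounds (see Tabs. B1/B3)

sampler = qmc.LatinHypercube(d=d, seed=0)
X0 = qmc.scale(sampler.random(n=20), lb, ub)   # n0 = 20 initial LHS samples

# X, Y, model = active_learning(H, lb, ub, X0, n_max=150, k=5)  # budget per Tab. 2
```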
Both the Best Candidate and GreedyFP sampling algorithms were implemented as
outlined in the work by [19].
For each benchmark problem, the number of samples generated by these samplers
is equal to the maximum number of samples specified in Tab. 2 to ensure a fair
comparison with the active learning approach. The generated samples were evaluated
using the HF surrogate H to form the training dataset. To account for randomness,
the active learning procedure, all of the sampling algorithms used to generate the
datasets, and the training of the TNN components were each repeated 30 times. The
results from these repetitions were then statistically analyzed.
$$\mathrm{NMAE} = \frac{1}{p} \sum_{j=1}^{p} \frac{\max_{1 \le i \le n} |y_{ij} - \hat{y}_{ij}|}{\max_{1 \le i \le n} |y_{ij} - \bar{y}_{j}|} \qquad (7)$$
By employing these metrics, we can comprehensively evaluate the performance
of our models across all output variables, ensuring that both average performance
(through RMSE and R2 ) and worst-case scenarios (through NMAE) are adequately
assessed.
Since we are interested in the inverse relationship, we use the output values Ty from the
test dataset as inputs to the IDNN. The IDNN then produces predictions of the original
inputs, which we denote as PIDNN.
Next, we feed these predicted inputs PIDNN into the HF surrogate H to generate
reconstructed outputs, denoted as Py . Essentially, Py are the outputs that the model
H would produce given the predicted inputs from the IDNN . We then compare these
reconstructed outputs Py with the actual output values Ty from the test dataset
using the accuracy metrics. This comparison allows us to evaluate how well the IDNN ,
in conjunction with the HF surrogate, can reproduce the original outputs, thereby
assessing the IDNN model’s performance.
In all inverse design benchmarks, we utilized a test dataset (Tx , Ty ) consisting of
1,000 randomly selected instances (also unseen by the HF surrogate H). The IDNN
was trained using the maximum number of samples specified in Tab. 2, which was
significantly fewer than 1,000 in each case. Consequently, the size of the training set
was much smaller than that of the test set, resulting in a train/test size ratio heavily
skewed towards the test set. This imbalance highlights the challenge for IDNN to
generalize effectively from a limited amount of training data to a larger, diverse test
set. Further details of the test dataset for each inverse design benchmark problem are
available in App. B.
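Given a trained IDNN and the HF surrogate H, the four steps of Fig. 3 reduce to a few lines; the sketch below reuses the illustrative evaluate helper defined earlier and treats i_dnn and H as callables.

```python
import numpy as np

def validate_inverse(i_dnn, H, T_x, T_y):
    # T_x is unused here; the comparison happens entirely in output space.
    # Step (2): predicted design vectors from the test outputs.
    P_idnn = i_dnn(T_y)
    # Step (3): reconstructed outputs from the HF surrogate.
    P_y = np.array([H(x) for x in P_idnn])
    # Step (4): compare original and reconstructed outputs.
    return evaluate(T_y, P_y)
```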
Fig. 3: Inverse design validation procedure aiming to assess the accuracy of the trained
IDNN using the test data from each inverse design benchmark problem. In step (1), we
define the test dataset comprising input-output pairs (Tx , Ty ). In step (2), we utilize
the output values Ty as inputs to the inverse model IDNN to obtain the predicted
inputs PIDNN . In step (3), we reconstruct the outputs Py by feeding the predicted
inputs PIDNN into the HF surrogate H. Finally, in step (4), we compare the original
output values Ty from the test dataset with the reconstructed outputs Py to evaluate
the inverse model’s accuracy using the specified metrics.
4 AutoTandemML Framework Results
In this section, we present the results of the AutoTandemML framework applied to
all three inverse design benchmark problems. To assess the performance of the active
learning approach, we also compare these results with those obtained when the dataset
used to train the TNN is generated using other sampling methods (R, LHS, GFP, and
BC).
Fig. 4: Inverse DNN (IDNN ) performance on AID benchmark problem using different
dataset generation methods: (a) R2 (higher is better), (b) RMSE (lower is better), and
(c) NMAE (lower is better). Subscripts in IDNN denote the sampling method (e.g.,
IDNNR for random sampling).
Table 3: Statistical analysis of the IDNN performance on
AID benchmark problem using different dataset genera-
tion methods. Bold values indicate best performance in
each metric. RMSE and NMAE values should be as low
as possible, while R2 should be as high as possible.
Metric Method Mean Std Max Min
RMSE IDNNR 0.1261 0.0539 0.3770 0.0717
RMSE IDNNLHS 0.1341 0.0605 0.4194 0.0770
RMSE IDNNAL 0.1086 0.0239 0.1529 0.0693
RMSE IDNNBC 0.1313 0.0752 0.4993 0.0724
RMSE IDNNGFP 0.1244 0.0341 0.2247 0.0724
R2 IDNNR 0.9174 0.0915 0.9725 0.4592
R2 IDNNLHS 0.8878 0.2439 0.9702 -0.4185
R2 IDNNAL 0.9319 0.0254 0.9742 0.8719
R2 IDNNBC 0.8526 0.4169 0.9756 -1.3843
R2 IDNNGFP 0.9274 0.0386 0.9683 0.7920
NMAE IDNNR 0.0569 0.0150 0.1254 0.0380
NMAE IDNNLHS 0.0589 0.0142 0.1261 0.0426
NMAE IDNNAL 0.0459 0.0067 0.0623 0.0343
NMAE IDNNBC 0.0604 0.0180 0.1436 0.0411
NMAE IDNNGFP 0.0574 0.0102 0.0869 0.0439
Moreover, IDNNAL performs the best when considering the mean of all metrics
and exhibits the least uncertainty across the 30 runs, achieving the lowest standard
deviation for all metrics (the Std column of Tab. 4). The details of the forward model
(M) performance on the PSID benchmark can be found in App. C.2.
Fig. 5: IDNN performance on PSID benchmark problem using different dataset gen-
eration methods: (a) R2 (higher is better), (b) RMSE (lower is better), and (c) NMAE
(lower is better). Subscripts in IDNN denote the sampling method (e.g., IDNNR for
random sampling).
Table 4: Statistical analysis of the IDNN performance on
PSID benchmark problem using different dataset gener-
ation methods. Bold values indicate best performance in
each metric. RMSE and NMAE values should be as low
as possible, while R2 should be as high as possible.
Metric Method Mean Std Max Min
RMSE IDNNR 0.0820 0.0426 0.1740 0.0381
RMSE IDNNLHS 0.0838 0.0339 0.1552 0.0369
RMSE IDNNAL 0.0560 0.0092 0.0750 0.0365
RMSE IDNNBC 0.0806 0.0380 0.1632 0.0362
RMSE IDNNGFP 0.0873 0.0386 0.1652 0.0413
R2 IDNNR 0.5747 0.4141 0.9214 -0.3864
R2 IDNNLHS 0.5898 0.3144 0.9309 -0.1665
R2 IDNNAL 0.8224 0.0562 0.9299 0.7042
R2 IDNNBC 0.6094 0.3455 0.9315 -0.1932
R2 IDNNGFP 0.5589 0.3519 0.9093 -0.2343
NMAE IDNNR 0.4701 0.2136 0.9209 0.2085
NMAE IDNNLHS 0.5000 0.1902 0.8957 0.2069
NMAE IDNNAL 0.3244 0.0500 0.4236 0.2197
NMAE IDNNBC 0.4823 0.1945 0.8957 0.2280
NMAE IDNNGFP 0.5136 0.2117 0.8957 0.2663
reliable and consistent results. The details of the forward model (M) performance on
the SBR benchmark can be found in App. C.3.
Fig. 6: IDNN performance on SBR benchmark problem using different dataset gener-
ation methods: (a) R2 (higher is better), (b) RMSE (lower is better), and (c) NMAE
(lower is better). Subscripts in IDNN denote the sampling method (e.g., IDNNR for
random sampling).
Table 5: Statistical analysis of the Inverse DNN (IDNN )
performance on SBR benchmark problem using differ-
ent dataset generation methods. Bold values indicate best
performance in each metric. RMSE and NMAE values
should be as low as possible, while R2 should be as high
as possible.
Metric Method Mean Std Max Min
RMSE IDNNR 1.4125 0.7054 3.6755 0.5859
RMSE IDNNLHS 1.3290 0.5998 2.9147 0.6482
RMSE IDNNAL 0.9781 0.2176 1.5112 0.6016
RMSE IDNNBC 0.9054 0.2133 1.5668 0.6047
RMSE IDNNGFP 1.1048 0.6339 3.2841 0.5237
R2 IDNNR 0.7122 0.2880 0.9565 -0.4353
R2 IDNNLHS 0.7309 0.2726 0.9366 -0.1042
R2 IDNNAL 0.8601 0.0694 0.9367 0.6479
R2 IDNNBC 0.8852 0.0576 0.9513 0.6812
R2 IDNNGFP 0.7885 0.3206 0.9602 -0.7087
NMAE IDNNR 0.2951 0.0946 0.5554 0.1786
NMAE IDNNLHS 0.2726 0.0791 0.4739 0.1731
NMAE IDNNAL 0.2002 0.0275 0.2597 0.1575
NMAE IDNNBC 0.2009 0.0411 0.3239 0.1472
NMAE IDNNGFP 0.2230 0.0868 0.5094 0.1505
5 Conclusion
We introduced and investigated the AutoTandemML framework for inverse design
problems in science and engineering. AutoTandemML synergistically combines active
learning with TNNs to efficiently generate datasets for accurate inverse design solu-
tions. We evaluated the framework on three benchmarks—the airfoil inverse design,
photonic surfaces inverse design, and scalar boundary reconstruction—and demon-
strated excellent performance across all. Compared to other sampling algorithms, the
TNN trained with active learning outperformed others in two benchmarks and was
competitive in the third. Notably, AutoTandemML offers reliable performance with
low variability across repeated experiments, a significant advantage for inverse design
applications.
Future research could explore applying AutoTandemML to other inverse design
problems, enhancing the active learning component with more sophisticated uncer-
tainty quantification methods, and developing hybrid approaches that combine active
learning with best candidate sampling. Additionally, extending the TNN architecture
to other deep neural networks like Graph Neural Networks could enable the framework
to handle discrete and graph-structured datasets, opening new possibilities in areas
like molecular inverse design. Ultimately, a comprehensive scalability study should be
conducted to deepen our understanding of the problem dimensionality for which this
approach is most effective.
Acknowledgments
This work was supported by the Laboratory Directed Research and Development Pro-
gram of Lawrence Berkeley National Laboratory under U.S. Department of Energy
Contract No. DE-AC02-05CH11231. Müller’s time was supported under U.S. Depart-
ment of Energy Contract No. DE-AC36-08GO28308, U.S. Department of Energy Office
of Science, Office of Advanced Scientific Computing Research, Scientific Discovery
through Advanced Computing (SciDAC) program through the FASTMath Institute
to the National Renewable Energy Laboratory.
Author Contributions
L.G. wrote the manuscript, developed the methods, developed the code, designed the
numerical experiments, and analyzed the performance of the algorithms, J.M. and
W.A.J. supervised the research and edited the manuscript.
Data Availability
The AutoTandemML code needed to reproduce the study can be found on the follow-
ing repository: https://github.com/lukagrbcic/AutoTandemML. The inverse design
benchmark HF surrogates, and the train/test data can be found on the following
repository: https://github.com/lukagrbcic/InverseBench.
Appendix A Machine Learning Algorithms
Hyperparameters and Training
In this section we present the hyperparameter and training details of the forward and
inverse DNN models of the TNN. Moreover, we present the training details and the
hyperparameters of the algorithms used to train the active learning model M and to
quantify its predictive uncertainty.
This standard deviation serves as an estimate of the predictive uncertainty, reflecting
the variance among the trees’ predictions.
We used the RF implementation from scikit-learn version 1.2.2 in Python 3.10.
For our active learning model, all hyperparameters were set to their default values in
scikit-learn, except for the number of estimators (trees), which we increased to 150.
This configuration provided us with 150 individual tree predictions, enabling us to
calculate the standard deviation of these predictions for uncertainty quantification.
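Concretely, this per-tree standard deviation can be read directly from the fitted forest in scikit-learn; the data in the snippet below is random and purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train, Y_train = rng.random((200, 3)), rng.random((200, 8))  # illustrative

rf = RandomForestRegressor(n_estimators=150).fit(X_train, Y_train)

X_query = rng.random((10, 3))
per_tree = np.stack([tree.predict(X_query) for tree in rf.estimators_])
sigma = per_tree.std(axis=0)  # shape (10, 8): std. dev. across the 150 trees
```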
potential flow with an integral boundary layer formulation. Tab. B1 shows the lower
and upper boundaries of the design vector that contains the flow and shape parameters.
Moreover, the total number of simulations that formed the initial dataset for the
AID benchmark was 12,223. The dataset was divided into training, validation, and
testing sets. Specifically, 70% of the data was allocated to the training set, and the
remaining 30% was used as the testing set. Within the training set, 10% was reserved
as a validation set. Consequently, the final dataset comprised 63% for training, 7% for
validation, and 30% for testing. The Extreme Gradient Boosting (XGBoost) algorithm
by [6] was used to train the HF surrogate (H). The XGBoost algorithm was chosen as
it excels in modeling structured tabular data [38]. We used the xgboost 2.0.3 Python
module with default hyperparameters, except for the ones adjusted as shown in Tab.
B2.
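A sketch of this training setup is given below. The arrays are random stand-ins for the 12,223 simulation results, the two splits reproduce the 63/7/30 proportions, and the booster settings are placeholders for the values in Tab. B2; how the multi-output Cp curves are handled is not stated here, so the sketch assumes XGBoost 2.0's native multi-output regression.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((12223, 5))    # flow and shape parameters
Y = rng.random((12223, 64))   # discretized Cp curves (illustrative length)

# 70/30 train/test, then 10% of the training set as validation -> 63/7/30.
X_tr, X_test, Y_tr, Y_test = train_test_split(X, Y, test_size=0.30, random_state=0)
X_fit, X_val, Y_fit, Y_val = train_test_split(X_tr, Y_tr, test_size=0.10, random_state=0)

surrogate = xgb.XGBRegressor(n_estimators=2000, early_stopping_rounds=50)
surrogate.fit(X_fit, Y_fit, eval_set=[(X_val, Y_val)], verbose=False)
```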
Using the airfoil flow and shape parameters as inputs, the model was trained to
predict the pressure coefficient curves, Cp . Fig. B1 illustrates the overall performance
of the model on the testing dataset, which comprises 3,667 data instances. The model
achieved an RMSE of 0.022 and an R2 of 0.996, indicative of high predictive accuracy.
Furthermore, Fig. B1a presents the distribution of RMSE values for each prediction
compared to its corresponding test set curve. Fig. B1b displays the training and vali-
dation loss curves of the XGBoost model, demonstrating high stability and low error
throughout the training process. Finally, Fig. B1c shows an example of a predicted Cp
curve alongside the ground truth Cp curve from the test set; the two curves are nearly
indistinguishable.
Fig. B1: Performance evaluation of the HF surrogate (H) on the AID benchmark:
(a) RMSE distribution between predicted and test set Cp curves (lower is better). (b)
Training and validation loss curves for H using XGBoost algorithm. (c) Representative
comparison between predicted and test set ground truth Cp curves with corresponding
RMSE value.
model. The lower and upper boundaries of these laser manufacturing parameters are
provided in Table B3.
Fig. B2: Performance evaluation of the HF surrogate (H) on the PSID benchmark:
(a) RMSE distribution between predicted and test set ϵ curves (lower is better). (b)
Representative comparison between predicted and test set ϵ curves with corresponding
RMSE value.
where n is the unit normal vector pointing outward from the domain. The finite vol-
ume mesh used in the simulations consisted of a total of 400 cells, providing adequate
spatial resolution for capturing the diffusion process. Specifically, the top boundary
of the domain was discretized into 20 cells, corresponding to the 20 elements of the
boundary condition array cBC .
For each simulation, we randomly varied the values of the boundary condition
array cBC along the top boundary ∂Ωtop, ensuring that each component of cBC ∈ R20
was within the range of 0 to 30. This randomization introduced a diverse set of boundary
conditions, allowing the model to learn from a wide variety of scenarios and improving
its generalization capabilities. Upon completing each simulation, we obtained scalar
values from scattered measurement locations within the two-dimensional domain, as
illustrated in Fig. 2c. These collected values were compiled into the scalar measure-
ments vector c. Additional details on the construction of the random cBC generator
and the coordinates of the 2D scattered measurement locations are provided in our
previous work ([12]).
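The exact generator is specified in [12]; a minimal stand-in that satisfies the constraints stated above (20 components, each uniform in [0, 30]) is:

```python
import numpy as np

def random_cbc(rng, n_cells=20, c_max=30.0):
    # One boundary condition array c_BC for the 20 cells on the top boundary.
    return rng.uniform(0.0, c_max, size=n_cells)

rng = np.random.default_rng(0)
c_bc = random_cbc(rng)  # e.g., the input for a single diffusion simulation
```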
The generated dataset consisted of 10,000 instances. We allocated 70% of the data
to the training set and the remaining 30% to the testing set. Within the training set,
10% was reserved for validation purposes. Consequently, the final dataset comprised
63% for training, 7% for validation, and 30% for testing. We used the XGBoost algo-
rithm to train the HF surrogate (H) with the hyperparameters specified in Tab. B4.
All other hyperparameters were set to their default values as provided by the xgboost
2.0.3 Python module.
The XGBoost model achieved an overall RMSE of 0.270 and an R2 of 0.984, indi-
cating high performance. Fig. B3 shows the performance of the model. Specifically,
Fig. B3a presents the RMSE distribution of each predicted set of measurements (c)
when compared to the test set values. Furthermore, Fig. B3b displays the training and
validation curves of the XGBoost model, and Fig. B3c shows an example of the test
set ground truth values and the predicted set of measurements, with the RMSE value
provided in the legend.
Fig. B3: Performance evaluation of the HF surrogate (H) on the SBR benchmark:
(a) RMSE distribution between predicted and test set c measured values (lower is
better). (b) Training and validation loss curves for H using XGBoost algorithm. (c)
Representative comparison between predicted and test set c values with corresponding
RMSE value.
C.1 Forward Model Airfoil Inverse Design Results
In Fig. C4 and Tab. C5, we present the results for the AID benchmark. Across all
three performance metrics—R2 , RMSE, and NMAE—the active learning approach
(MAL ) consistently outperforms the other sampling methods. Specifically, Fig. C4a
displays the R2 score, where MAL achieves the highest value of R2 =0.84, as reported
in Tab. C5. Similarly, Fig. C4b shows the RMSE results, with MAL attaining the
lowest RMSE of 0.147. Finally, Fig. C4c illustrates the NMAE scores, where MAL
again achieves the lowest value of NMAE=0.065. These results, summarized in Tab.
C5, confirm the superior performance of the active learning approach over the other
samplers in the AID benchmark.
Fig. C4: Forward model (M) performance on AID benchmark problem using different
dataset generation methods: (a) R2 (higher is better), (b) RMSE (lower is better),
and (c) NMAE (lower is better). Subscripts in MR denote the sampling method (e.g.,
MR for random sampling).
Table C5: Statistical analysis of the forward model
(M) performance on AID benchmark problem using dif-
ferent dataset generation methods. Bold values indicate
best performance in each metric. RMSE and NMAE
values should be as low as possible, while R2 should be
as high as possible.
Metric Method Mean Std Max Min
RMSE MR 0.1679 0.0105 0.1994 0.1469
RMSE MLHS 0.1658 0.0087 0.1884 0.1461
RMSE MAL 0.1470 0.0115 0.1740 0.1287
RMSE MBC 0.1691 0.0112 0.2031 0.1455
RMSE MGFP 0.1696 0.0155 0.2122 0.1520
R2 MR 0.8040 0.0197 0.8360 0.7601
R2 MLHS 0.8158 0.0120 0.8405 0.7921
R2 MAL 0.8444 0.0330 0.8911 0.7080
R2 MBC 0.8052 0.0188 0.8406 0.7595
R2 MGFP 0.8060 0.0160 0.8435 0.7612
NMAE MR 0.0796 0.0032 0.0858 0.0710
NMAE MLHS 0.0798 0.0033 0.0854 0.0704
NMAE MAL 0.0655 0.0059 0.0834 0.0565
NMAE MBC 0.0814 0.0035 0.0885 0.0746
NMAE MGFP 0.0801 0.0045 0.0950 0.0738
C.2 Forward Model Photonic Surface Inverse Design Results
In Fig. C5 and Tab. C6, we present the results for the PSID benchmark. Across all
three performance metrics—R2 , RMSE, and NMAE—the active learning approach
(MAL ) consistently outperforms the other sampling methods. Specifically, Fig. C5a
displays the R2 score, where MAL achieves the highest value of R2 =0.92, as reported
in Tab. C6. Similarly, Fig. C5b shows the RMSE results, with MAL attaining the
lowest RMSE of 0.04. Finally, Fig. C5c illustrates the NMAE scores, where MAL
again achieves the lowest value of NMAE=0.275. These results, summarized in Tab.
C6, confirm the superior performance of the active learning approach over the other
samplers in the PSID benchmark.
Fig. C5: Forward model (M) performance on PSID benchmark problem using differ-
ent dataset generation methods: (a) R2 (higher is better), (b) RMSE (lower is better),
and (c) NMAE (lower is better). Subscripts in MR denote the sampling method (e.g.,
MR for random sampling).
Table C6: Statistical analysis of the forward model
(M) performance on PSID benchmark problem using
different dataset generation methods. Bold values indi-
cate best performance in each metric. RMSE and
NMAE values should be as low as possible, while R2
should be as high as possible.
Metric Method Mean Std Max Min
RMSE MR 0.0451 0.0026 0.0531 0.0413
RMSE MLHS 0.0454 0.0021 0.0505 0.0414
RMSE MAL 0.0402 0.0020 0.0445 0.0354
RMSE MBC 0.0444 0.0021 0.0492 0.0413
RMSE MGFP 0.0443 0.0020 0.0512 0.0407
R2 MR 0.8944 0.0147 0.9144 0.8475
R2 MLHS 0.8937 0.0107 0.9128 0.8664
R2 MAL 0.9251 0.0067 0.9401 0.9089
R2 MBC 0.8996 0.0102 0.9140 0.8781
R2 MGFP 0.8998 0.0097 0.9151 0.8667
NMAE MR 0.3856 0.0332 0.4667 0.3167
NMAE MLHS 0.4014 0.0384 0.5024 0.3376
NMAE MAL 0.2758 0.0172 0.3110 0.2367
NMAE MBC 0.3973 0.0341 0.4723 0.3273
NMAE MGFP 0.3961 0.0356 0.4729 0.3320
C.3 Forward Model Scalar Field Reconstruction Results
In Fig. C6 and Tab. C7, we present the results for the SBR benchmark. The perfor-
mance of MAL differs slightly from that in the AID and PSID benchmarks. In the
SBR benchmark, the best-performing algorithm for the forward model is MBC ; how-
ever, MAL comes as a close second in the R2 and RMSE metrics. Specifically, Fig.
C6a displays the R2 scores, where MBC achieves the highest value of R2 =0.97, as
reported in Tab. C7, while MAL obtains a score of 0.96. Similarly, Fig. C6b shows the
RMSE results, with MBC attaining the lowest RMSE of 0.40, while MAL achieves an
RMSE of 0.41. However, Fig. C6c illustrates the NMAE scores, where MAL and MBC
both achieve the lowest value of NMAE=0.09. These results, summarized in Tab. C7,
demonstrate that MAL can also perform competitively.
Fig. C6: Forward model (M) performance on SBR benchmark problem using different
dataset generation methods: (a) R2 (higher is better), (b) RMSE (lower is better),
and (c) NMAE (lower is better). Subscripts in MR denote the sampling method (e.g.,
MR for random sampling).
Table C7: Statistical analysis of the forward model
(M) performance on SBR benchmark problem using
different dataset generation methods. Bold values indi-
cate best performance in each metric. RMSE and
NMAE values should be as low as possible, while R2
should be as high as possible.
Metric Method Mean Std Max Min
RMSE MR 0.5741 0.0259 0.6194 0.5138
RMSE MLHS 0.5793 0.0321 0.6423 0.5217
RMSE MAL 0.4143 0.0069 0.4317 0.4035
RMSE MBC 0.4047 0.0158 0.4393 0.3739
RMSE MGFP 0.4358 0.0239 0.4817 0.3898
R2 MR 0.9565 0.0028 0.9631 0.9525
R2 MLHS 0.9555 0.0040 0.9624 0.9477
R2 MAL 0.9684 0.0009 0.9698 0.9656
R2 MBC 0.9718 0.0016 0.9748 0.9689
R2 MGFP 0.9695 0.0024 0.9736 0.9656
NMAE MR 0.1230 0.0053 0.1309 0.1100
NMAE MLHS 0.1245 0.0057 0.1350 0.1133
NMAE MAL 0.0903 0.0020 0.0946 0.0866
NMAE MBC 0.0906 0.0035 0.0984 0.0827
NMAE MGFP 0.0981 0.0046 0.1078 0.0897
References
[1] Abdar M, Pourpanah F, Hussain S, et al (2021) A review of uncertainty quantification
in deep learning: Techniques, applications and challenges. Information Fusion
76:243–297
[5] Chen J, Dai Z, Yang Z, et al (2021) An improved tandem neural network architec-
ture for inverse modeling of multicomponent reactive transport in porous media.
Water Resources Research 57(12):e2021WR030595
[6] Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. In: Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, pp 785–794
[8] Eldar Y, Lindenbaum M, Porat M, et al (1997) The farthest point strategy for
progressive image sampling. IEEE transactions on image processing 6(9):1305–
1315
[10] Glaws A, King RN, Vijayakumar G, et al (2022) Invertible neural networks for
airfoil design. AIAA journal 60(5):3035–3047
[14] Grbcic L, Park M, Müller J, et al (2025) Artificial intelligence driven laser param-
eter search: Inverse design of photonic surfaces using greedy surrogate-based
optimization. Engineering Applications of Artificial Intelligence 143:109971
[15] He X, Cui X, Chan CT (2023) Constrained tandem neural network assisted inverse
design of metasurfaces for microwave absorption. Optics Express 31(24):40969–
40979
[17] Ivic S, Druzeta S, Grbcic L (2024) Indago v0.5.0. PyPI, URL https://pypi.org/
project/Indago/
[18] Jasak H, Jemcov A, Tukovic Z, et al (2007) OpenFOAM: A C++ library for complex
physics simulations. In: International Workshop on Coupled Methods in Numerical
Dynamics, Dubrovnik, Croatia, pp 1–20
[21] Kudyshev ZA, Kildishev AV, Shalaev VM, et al (2020) Machine-learning-assisted
metasurface design for high-efficiency thermal emitter optimization. Applied
Physics Reviews 7(2)
[22] Ladson CL, Brooks Jr CW, Hill AS, et al (1996) Computer program to obtain
ordinates for NACA airfoils. Tech. rep.
[23] Lei R, Bai J, Wang H, et al (2021) Deep learning based multistage method
for inverse design of supercritical airfoil. Aerospace Science and Technology
119:107101
[25] Liu E, Tan C, Gui L, et al (2023) Efficient design of structural parameters and
materials of plasmonic fano-resonant metasurfaces by a tandem neural network.
In: AOPC 2022: Optoelectronics and Nanophotonics, SPIE, pp 69–75
[26] Liu P, Zhao Y, Li N, et al (2024) Deep neural networks with adaptive solution
space for inverse design of multilayer deep-etched grating. Optics and Lasers in
Engineering 174:107933
[27] Ma H, Li EP, Wang Y, et al (2022) Channel inverse design using tandem neural
network. In: 2022 IEEE 26th Workshop on Signal and Power Integrity (SPI),
IEEE, pp 1–3
[28] Meng F, Ding J, Zhao Y, et al (2023) Artificial intelligence designer for optical
fibers: Inverse design of a hollow-core anti-resonant fiber based on a tandem neural
network. Results in Physics 46:106310
[29] Mitchell DP (1991) Spectrally optimal sampling for distribution ray tracing. In:
Proceedings of the 18th annual conference on Computer graphics and interactive
techniques, pp 157–164
[31] Nielsen D, Lee J, Nam YW (2022) Design of composite double-slab radar absorb-
ing structures using forward, inverse, and tandem neural networks. American
Society for Composites Technical Conference
[32] Noureen S, Syed IH, Ijaz S, et al (2023) Physics-driven tandem inverse design
neural network for efficient optimization of UV–Vis meta-devices. Applied Surface
Science Advances 18:100503
[36] Qiu C, Wu X, Luo Z, et al (2021) Simultaneous inverse design continuous and dis-
crete parameters of nanophotonic structures via back-propagation inverse neural
network. Optics Communications 483:126641
[37] Settles B (2011) From theories to queries: Active learning in practice. In: Active
learning and experimental design workshop in conjunction with AISTATS 2010,
JMLR Workshop and Conference Proceedings, pp 1–18
[38] Shwartz-Ziv R, Armon A (2022) Tabular data: Deep learning is not all you need.
Information Fusion 81:84–90
[39] Surjanovic S, Welch WJ (2019) Adaptive partitioning design and analysis for
emulation of a complex computer code. arXiv preprint arXiv:1907.01181
[40] Swe SK, Noh H (2024) Inverse design of reflectionless thin-film multilayers with
optical absorption utilizing tandem neural network. In: Photonics, MDPI, p 964
[46] Yeung C, Tsai JM, King B, et al (2021) Multiplexed supercell metasurface design
and optimization with tandem residual networks. Nanophotonics 10(3):1133–1143
[48] Yilmaz E, German B (2020) Conditional generative adversarial network frame-
work for airfoil inverse design. In: AIAA aviation 2020 forum, p 3185
[49] Yonekura K, Wada K, Suzuki K (2022) Generating various airfoils with required
lift coefficients by combining naca and joukowski airfoils using conditional varia-
tional autoencoders. Engineering Applications of Artificial Intelligence 108:104560