ARTICLE OPEN
Bayesian optimization with adaptive surrogate models for automated experimental design
Bayesian optimization (BO) is an indispensable tool to optimize objective functions that either do not have known functional forms
or are expensive to evaluate. Currently, optimal experimental design is increasingly conducted within the workflow of BO, leading to
more efficient exploration of the design space compared to traditional strategies. This can have a significant impact on modern
scientific discovery, in particular autonomous materials discovery, which can be viewed as an optimization problem aimed at
looking for the maximum (or minimum) point for the desired materials properties. The performance of BO-based experimental
design depends not only on the adopted acquisition function but also on the surrogate models that help to approximate
underlying objective functions. In this paper, we propose a fully autonomous experimental design framework that uses more
adaptive and flexible Bayesian surrogate models in a BO procedure, namely Bayesian multivariate adaptive regression splines and
Bayesian additive regression trees. They can overcome the weaknesses of widely used Gaussian process-based methods when
faced with relatively high-dimensional design space or non-smooth patterns of objective functions. Both simulation studies and real-world materials science case studies demonstrate their enhanced search efficiency and robustness.
npj Computational Materials (2021) 7:194; https://doi.org/10.1038/s41524-021-00662-x
1Department of Statistics, Texas A&M University, College Station, TX, USA. 2Department of Mechanical Engineering, Texas A&M University, College Station, TX, USA. 3Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX, USA. 4Department of Computer Science & Engineering, Texas A&M University, College Station, TX, USA. 5Department of Materials Science & Engineering, Texas A&M University, College Station, TX, USA. ✉email: [email protected]
Fig. 1 Plots of black-box functions. a The valley of a two-dimensional Rosenbrock function, which has the formula $y = 100(x_2 - x_1^2)^2 + (x_1 - 1)^2$. b The frequent and regularly distributed local minima of a two-dimensional Rastrigin function, which has the formula of Eq. (2) with d = 2.
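The caption's Rosenbrock formula can be sanity-checked directly; a minimal sketch (Python; ours, for illustration, not from the authors' code):

```python
import numpy as np

def rosenbrock_2d(x1, x2):
    """Two-dimensional Rosenbrock function from Fig. 1a:
    y = 100*(x2 - x1**2)**2 + (x1 - 1)**2."""
    return 100.0 * (x2 - x1**2) ** 2 + (x1 - 1.0) ** 2

assert rosenbrock_2d(1.0, 1.0) == 0.0  # global minimum at (1, 1)
```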
GP-based BO naturally provides uncertainty quantification, which makes it broadly applicable to many problems12,13,24.

Oftentimes, stationary or isotropic GP-based BO may be challenging when faced with (even moderately) high-dimensional design spaces, particularly when very little initial information about the design space is available—in some fields of science and engineering where BO is being used14,25,26, data sparsity is, in fact, the norm rather than the exception. In MD problems, data sparsity is exacerbated by the (apparent) high dimensionality of the design space, as, a priori, it is possible that many controllable features could be responsible for the materials' behavior/property of interest. In practice, however, the potentially high-dimensional MD space may actually be reduced, as in materials science it is often the case that a small subset of all available degrees of freedom actually controls the materials' behavior of interest. Searching over a high-dimensional space when only a small subspace is of interest may be highly computationally inefficient. A challenge is then how to discover the dominant degrees of freedom when very little data is available and no proper feature selection can be carried out at the outset of the discovery process. The problem may become more complex due to the existence of interaction effects among the covariates, since such interactions are extremely challenging to discover when the available data is very sparse.

We note that there are some more flexible GP-based models, like automatic relevance determination (ARD)27, which introduces a different scale parameter for each input variable inside the covariance function to facilitate removal of unimportant variables and may alleviate the problem. Recently, Talapatra et al.14 proposed a robust model for f, based on Gaussian mixtures and Bayesian model averaging, as a strategy to deal with the data dimension and sparsity problem. Their framework was capable of detecting the subspaces most correlated with optimization objectives by evaluating the Bayesian evidence of competing feature subsets. However, their covariance functions, and in general most commonly used covariance functions for GPs, usually induce a smoothness property and assume continuity of f, which may not necessarily be warranted and limits performance when f is non-smooth or has sudden transitions—this may be a common occurrence in many MD challenges. Also, GP-based methods may still not perform well when the dimension of the predictors is relatively high or the choice of the kernel is not suitable for the unknown function10,12. Apart from these solutions, there is a broad literature on flexible nonstationary covariance kernels28. The deep network kernel is a prominent recent example29, although its strength may be limited when faced with sparse datasets.

The focus of this paper is to replace GP-based machine learning models with other, potentially more adaptive and flexible, Bayesian models. More specifically, we explore Bayesian spline-based models and Bayesian ensemble-learning methods as surrogate models in a BO setting. Bayesian multivariate adaptive regression splines (BMARS)30,31 and Bayesian additive regression trees (BART)32 are used in this paper as they can potentially be superior alternatives to GP-based surrogates, particularly when the objective function, f, requires more flexible models. BMARS is a flexible nonparametric approach based on product spline basis functions. BART belongs to Bayesian ensemble-learning-based methods and fits unknown patterns through a sum of small trees. Both are well equipped with automatic feature selection techniques.

In this article, we present a fully automated experimental design framework that adopts BART and BMARS as the surrogate models used to predict the outcome(s) of yet-to-be-made observations/queries of/to the expensive "black-box" function. The surrogates are used to evaluate the acquisition policy within the context of BO. Automated algorithm-based experimental design is a growing technology used in many fields such as materials informatics and biosystems design22,33,34. It combines the principles of specific domains with machine learning to accelerate scientific discovery. We compare the performance of this BO approach using non-GP surrogate models against other GP-based BO methods on standard analytic functions, and then present results in which the framework is applied to realistic materials science discovery problems. We then discuss the possible underlying reasons for the remarkable improvements in performance associated with using more flexible surrogate models, particularly when the (unknown) objective function is very complex and does not follow the underlying assumptions motivating the use of GPs as surrogates.
[Fig. 2: four panels, a, b Rosenbrock (N = 10, 20) and c, d Rastrigin (N = 10, 20), each plotting the minimum y observed against iteration (20–80) for BART, BMA1, BMA2, BMARS, GP (RBK), GP (RBK ARD), GP (Dot), and GP (DKNet).]
Fig. 2 The average minimum y observed based on each model in each iteration. a Rosenbrock function with the initial set of sample size
N = 10. b Rosenbrock function with the initial set of sample size N = 20. c Rastrigin function with the initial set of sample size N = 10.
d Rastrigin function with the initial set of sample size N = 20.
Table 1. The mean value and interquartile range (IQR) of the number of experiments based on each model to find the maximum bulk modulus K in
MAX phases with the initial set of sample size N ∈ {2, 5, 10, 15, 20}.
Turning to the Rastrigin function36,40, it is a nonconvex function used to measure the performance of optimization workflows. The formula for a d-dimensional Rastrigin function reads as follows:

$$f(\mathbf{x}) = 10d + \sum_{i=1}^{d} \left[ x_i^2 - 10 \cos(2\pi x_i) \right]. \qquad (2)$$

It is based on a quadratic function with an added cosine modulation, which brings about frequent and regularly distributed local minima, as depicted in Fig. 1b. Similar to Rosenbrock's case, the search space is continuous and we focus on [−2, 2] in each direction. The test function is thus highly multimodal, making it a challenging task where algorithms easily get stuck in local minima. The global minimum point is x* = (0, …, 0) and f(x*) = 0.

For the simulated data, we set d = 10 and again we add five uninformative features following a standard normal distribution. With these five additional variables (or design degrees of freedom), we can assess whether these frameworks are capable of detecting the factors that are truly correlated with the objective function, enabling an efficient exploration of the design space.
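As a concrete illustration of this simulated test problem, a minimal sketch (Python; function and variable names are ours, not from the authors' code):

```python
import numpy as np

def rastrigin(x):
    """d-dimensional Rastrigin function of Eq. (2):
    f(x) = 10*d + sum_i [x_i**2 - 10*cos(2*pi*x_i)]."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

def initial_design(n, d=10, n_noise=5, seed=0):
    """Draw n initial samples: d informative coordinates uniform on [-2, 2],
    plus n_noise uninformative N(0, 1) features the surrogate should ignore."""
    rng = np.random.default_rng(seed)
    x_inf = rng.uniform(-2.0, 2.0, size=(n, d))
    x_noise = rng.standard_normal(size=(n, n_noise))
    X = np.hstack([x_inf, x_noise])
    y = np.array([rastrigin(row[:d]) for row in X])  # only the first d columns matter
    return X, y

X0, y0 = initial_design(n=10)  # e.g., the N = 10 setting of Fig. 2c
```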
As seen in Fig. 2c, d, the solid blue curves for BMARS again exhibit the fastest decline, indicating the best performance. The BART-based BO (solid red curves) follows and decreases at a rate similar to most of the GP-based methods. However, the dotted brown curve, corresponding to the baseline GP (RBK), appears to be the slowest. Considering the convergent stage, the blue curve reaches it between 50 and 60 iterations, and the minimum observed y is very close to the global optimum value f(x*) = 0. The other methods remain in a decreasing pattern with larger values of the minimum observed y. It is no surprise that GP-based methods suffer in this scenario, for which the Rastrigin function's quick switches between different local minima may be the reason, especially for GP (RBK). In contrast, with their flexibly constructed bases and multiple tree models, BMARS and BART are able to capture this complex trend of f. We note that BART might need a few more training samples to gain a competitive advantage over more flexible GPs like BMA1 and BMA2, due to the block patterns of the Rastrigin function.

Having established the better overall performance of our proposed non-GP base functions applied to complex BO problems, we now turn our attention to two materials science-motivated problems.

MD in the MAX phase space
MAX phases (ternary layered carbides/nitrides)14,41 constitute an adequate system to investigate the behavior of autonomous materials design frameworks, as a result of both their chemical richness and the wide range of their properties. The pure ternary MAX phase composition palette has so far been explored to a limited degree, so there is also significant potential to reveal promising chemistries with optimal property sets14,42,43. For these reasons, we compared different algorithms for searching among the Mn+1AXn phases, where M refers to a transition metal, A refers to group IV and VA elements, and X corresponds to carbon or nitrogen.

Specifically, the materials design space for this work consists of the conventional MAX phases M2AX and M3AX2, where M ∈ {Sc, Ti, V, Cr, Zr, Nb, Mo, Hf, Ta}, A ∈ {Al, Si, P, S, Ga, Ge, As, Cd, In, Sn, Tl, Pd}, and X ∈ {C, N}. The space is discrete and includes 403 stable MAX phases in total, aligned with Talapatra et al.14. More discussion about the discrete space in BO can be found in Supplementary Note 4. The goal of the automated algorithm is to provide a fast exploration of the material space, namely to find the most appropriate material design, which achieves either (i) the maximum bulk modulus K or (ii) the minimum shear modulus G. The results in the following sections are obtained with aim (i), while those for (ii) can be found in Supplementary Note 2. We point out that, while the material design space is small, knowledge of the ground truth can assist significantly in the verification of the solutions arrived at by different optimization algorithms.
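To make the size of this discrete design space concrete, the sketch below enumerates the composition palette (Python; the stability screen itself comes from ref. 14, so `is_stable` is only a hypothetical placeholder):

```python
from itertools import product

M = ["Sc", "Ti", "V", "Cr", "Zr", "Nb", "Mo", "Hf", "Ta"]
A = ["Al", "Si", "P", "S", "Ga", "Ge", "As", "Cd", "In", "Sn", "Tl", "Pd"]
X = ["C", "N"]

# Conventional MAX phases of order 1 (M2AX) and order 2 (M3AX2).
candidates = [(m, a, x, order) for order in (1, 2) for m, a, x in product(M, A, X)]
print(len(candidates))  # 432 compositions before the stability screen

# The design space used in the paper keeps only the 403 phases reported
# stable in Talapatra et al.14; `is_stable` stands in for that lookup.
# stable_space = [c for c in candidates if is_stable(c)]
```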
For the predictors, we follow the setting in Talapatra et al.14 and consider 13 possible features in the model: the empirical constants C and m, which link the elements of the material to its bulk modulus; the valence electron concentration Cv; the electron-to-atom ratio ae; the lattice parameters a and c; the atomic number Z; the interatomic distance Idist; the periodic-table groups of the M, A, and X elements, ColM, ColA, and ColX, respectively; the order O of the MAX phase (order 1 for M2AX or order 2 for M3AX2); and the atomic packing factor (APF). We note that the features above can potentially be correlated with the intrinsic mechanical properties of MAX phases, although a priori we assume that we have no knowledge of how such features are correlated. In practice, as was found in ref. 14, only a small subset of the feature space is correlated with the target properties. We note that in ref. 14 the motivation for using Bayesian model averaging was precisely to be able to detect the subsets within the larger feature set most effectively correlated with the target properties to optimize.

For the probabilistic model, we align with the simulation study above and compare our suggested framework, which uses BART32 and BMARS30, to the widely used baselines, including GP (RBK)38, GP (RBK ARD)27, GP (Dot)39, Bayesian model averaging using GPs (BMA1 and BMA2)14, and GP (DKNet)29. For the acquisition function, we choose EI for each of them to ensure a fair comparison.
To get a comprehensive picture, we follow the structure of the previous section (where we studied the benchmark Rosenbrock and Rastrigin functions) and start the above models with five different sizes of initial samples (N = 2, 5, 10, 15, 20), randomly chosen from the design space. For each N, the results are based on 100 replicates. To avoid an excessive number of iterations, we add two materials at a time to the platform. The stopping criterion is either successfully locating the material with the ideal properties or exhausting the budget, which is set at 80 experiments, roughly 20% of the available space. For those replicates that do not converge within the budget, we follow Talapatra et al.14 and record their number of calculations as 100 to avoid an excessive number of evaluations.

Due to the high cost per experiment, the framework performs better if it needs fewer experiments before finding the candidate with the desired properties. Therefore, we use this number as a vital criterion for evaluating model capabilities. Table 1 shows the mean value and interquartile range (IQR) of the total number of evaluations when searching for the maximum bulk modulus K within the MAX phase design space. Smaller values of the mean and IQR indicate a more efficient and stable platform.
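The summary statistics reported in Table 1 can be computed as follows (a minimal sketch in Python; function and variable names are ours):

```python
import numpy as np

def summarize_runs(n_experiments, budget=80, censor_at=100):
    """Summarize the replicates of a search campaign: runs that fail to find
    the target within the budget are recorded as `censor_at` evaluations,
    following Talapatra et al.14; report the mean and interquartile range."""
    runs = np.asarray(n_experiments, dtype=float)
    runs = np.where(runs > budget, censor_at, runs)  # censor non-converged runs
    q1, q3 = np.percentile(runs, [25, 75])
    return runs.mean(), q3 - q1                      # smaller is better for both
```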
As depicted in Table 1, while GP (Dot), BMA1, and BMA2 are more efficient than GP (RBK) and GP (DKNet) when looking for the maximum bulk modulus K, BART and BMARS further greatly reduce the number of experiments and maintain a more stable performance compared to the GP-based models. GP (RBK ARD) achieves good speed when N is larger than 10, but shows poor and unstable performance for small N. Also, considering the interquartile range of each model, BART and BMARS tend to be more robust under each setting and can achieve the goal within 80 iterations, while the other five are more likely to run out of the budget without achieving the objective.

Two possible reasons could explain why BMARS and BART improve the search speed much more efficiently than competing strategies. On the one hand, BMARS and BART are known to be more flexible surrogates than GP-based methods and are more powerful when faced with unknown and complex mechanisms in real-world data. On the other hand, BMARS and BART usually scale better with dimension and can be more robust when handling high- or even moderately dimensional design spaces.

In MD problems, beyond the identification of optimal regions in the materials design space, it is also desirable to understand the factors/features most correlated with the properties of interest. By taking these predictor and interaction rankings into account, researchers can gain a deeper understanding of the connection between features and material properties. We present the relevant results for the maximum bulk modulus K; those for the minimum shear modulus G are in Supplementary Note 2. BART and BMARS are endowed with automatic feature selection based on the appearances of features in the corresponding surrogate models, while the baselines GP (RBK) and GP (Dot) cannot identify feature importance relative to the BO objective. Although BMA1 and BMA2 can utilize the coefficient of each component to provide some information about feature importance, they cannot directly give the exact ranking of individual variables and interactions.

Under five different scenarios, N ∈ {2, 5, 10, 15, 20}, Tables 2 and 3 list the top 5 important factors for maximizing K using BART and BMARS, respectively. The rankings are based on the median inclusion counts over the 100 replicates, taken from the last model when the workflow stops. When using BART, ColA, ae, ColM, APF, and Idist are the most useful. Turning to BMARS, ColA, ae, c, a, APF, and Idist always play a key role. We see a similar pattern in the top-ranked features between the two models for different N, although some differences exist in their order. Regarding the interactions among features, we measure their importance by counting the coexistence of two features within each basis function: the more frequently two features are used in the same basis function, the greater their influence on material improvement. The detailed results for the interaction selection can be found in Supplementary Note 2.

Table 2. The top 5 important factors selected by BART for the maximum bulk modulus K in MAX phases with the initial set of sample size N ∈ {2, 5, 10, 15, 20}.

Setting | Top 1 | Top 2 | Top 3 | Top 4 | Top 5
N = 2 | ColA | ae | APF | ColM | c
N = 5 | ColA | ae | ColM | APF | Idist
N = 10 | ColA | APF | ColM | ae | Idist
N = 15 | ColA | ae | APF | ColM | Idist
N = 20 | ColA | APF | ae | ColM | Idist

Table 3. The top 5 important factors selected by BMARS for the maximum bulk modulus K in MAX phases with the initial set of sample size N ∈ {2, 5, 10, 15, 20}.

Setting | Top 1 | Top 2 | Top 3 | Top 4 | Top 5
N = 2 | ae | ColA | APF | Idist | c
N = 5 | ColA | Idist | APF | ae | c
N = 10 | ColA | ae | APF | Idist | a
N = 15 | ColA | ae | APF | Idist | c
N = 20 | ColA | APF | ae | Idist | a
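The bookkeeping behind these rankings is straightforward; a minimal sketch (Python; representing each sampled basis function or splitting rule as a set of feature indices is our illustrative assumption):

```python
import numpy as np
from collections import Counter
from itertools import combinations

def rank_features(posterior_draws, n_features, top=5):
    """posterior_draws: for each posterior sample, a list of basis functions
    (or tree splitting rules), each given as the set of feature indices it
    uses. Features are ranked by median inclusion count across draws, and
    interactions by how often two features co-occur in one basis function."""
    counts = np.zeros((len(posterior_draws), n_features))
    pair_counts = Counter()
    for s, draw in enumerate(posterior_draws):
        for basis in draw:
            for f in basis:
                counts[s, f] += 1
            for f1, f2 in combinations(sorted(basis), 2):
                pair_counts[(f1, f2)] += 1
    median_inclusion = np.median(counts, axis=0)
    top_features = np.argsort(median_inclusion)[::-1][:top]
    return top_features, pair_counts.most_common(top)
```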
During the material development process, we may not know in advance which features we should add to the model. In light of this, it is usually the case that one considers all possible features during training and optimization to avoid missing important ones. This brings an important challenge, because it is often not possible to carry out any sort of feature selection ahead of the experimental campaign. Moreover, GP-based BO frameworks tend to become less efficient as the dimension of the design space increases, since the coverage required to ensure adequate learning of the response surface grows exponentially with the number of features11. Moreover, the sparse nature of the sampling scheme—BO, after all, is used when there are stringent resource constraints on querying the problem space—makes the (learned) response surface very flat over wide regions of the design space, with some interspersed, locally highly nonconvex landscapes44. These issues make high-dimensional BO very hard. In materials science problems, a key challenge is that many of the potential dimensions of the problem space are uninformative, i.e., they are not correlated with the objective of interest.

It is thus desirable to develop frameworks that are robust against the existence of possibly many uninformative or redundant features. To further check the platform's ability to distill useful information and maintain its speed, we simulate 16 random predictors following the standard normal distribution and mix them with the 13 predictors described above. With these new non-informative features, we use the same automated framework and explore the space for the materials with the ideal properties.

As shown in Table 4, BART's performance is not degraded by the newly added unhelpful information and it remains the most efficient choice, indicating its robustness. At the same time, although BMARS is slower than the best, it is still competitive compared to other GP-based approaches like BMA1 and BMA2. BART-based BO is clearly capable of detecting non-informative features in a very effective manner.

We also find the top 5 features as well as the interaction effects for both BART and BMARS. For the 16 newly added unimportant features, we denote them by n1, …, n16. Tables 5 and 6 summarize the most significant features. We can see that the results do not include n1, …, n16, indicating a good ability to filter out useless information. Compared with Table 2, we can also notice that the
Table 4. The mean value and interquartile range of the number of experiments based on each model to find the maximum bulk modulus K in MAX
phases with additional non-informative features with the initial set of sample size N ∈ {2, 5, 10, 15, 20}.
Fig. 3 The average maximum or minimum stacking fault energy (SFE) [mJ⋅m−2] observed on each model in each iteration. a The
maximum SFE with the initial set of sample size N = 10. b The maximum SFE with the initial set of sample size N = 20. c The minimum SFE with
the initial set of sample size N = 10. d The minimum SFE with the initial set of sample size N = 20.
Table 7. The top five important factors selected by BART for the maximum stacking fault energy (SFE) with the initial set of sample size N ∈ {2, 5, 10,
15, 20}.
The figure shows that BART- and BMARS-based BO is capable of finding the materials with SFE values close to the ground-truth maximum in the dataset in ~80 iterations, corresponding to just 0.25% of the total materials design space that could be explored. This impressive performance is eclipsed by that of BART/BMARS-BO in the minimization problem, as shown in Fig. 3c, d. There, the blue curves for BMARS and the red curves for BART drop much faster than the other curves, which confirms the more efficient search ability of our methods. In this case, by about ~40 iterations the optimizer has converged to points extremely close to the ground-truth minimum in the dataset. This corresponds to about 0.125% of the total materials design space, and the performance of the proposed frameworks is much better than most of the alternatives. We note that, although GP (Dot) performs better than BMARS or BART in a few settings, an additional advantage of the latter methods is the automatic detection of important features detailed below.

In this case study, not only has the design space become much larger, but the number of candidate design features has also increased. Using other approaches, it would be more difficult to evaluate the significance of the different features (or degrees of freedom) as well as their interactions. Here, we present the corresponding results for finding the maximum SFE; those for the minimum SFE are in Supplementary Note 3.

Under five different scenarios, N ∈ {2, 5, 10, 15, 20}, Tables 7 and 8 list the top five factors most correlated with the maximum SFE using BART and BMARS, respectively. The rankings are based on the median inclusion counts over the 100 replicates, taken from the last model when the workflow stops. For BART, Specific.Heat_Avg, Pauling.EN_Var, Mn, and Ni are the most important features. Meanwhile, turning to BMARS, Specific.Heat_Avg, Pauling.EN_Var, C_11_Avg, and Mn always play vital roles. Comparing the top-ranked features for different N, we observe similar patterns, but with a few differences in order. Immediately, one can see that only a few chemical elements are detected to be strongly correlated with the SFE in this HEA system and that, instead, other (atomically averaged) intrinsic properties may be more informative when attempting to predict this important quantity. This implies that focusing exclusively on chemistry, as opposed to derived features, may not have been an optimal strategy for BO-based exploration of this space. Notably, Ni figures as the feature most highly correlated with SFE in almost all scenarios considered. This is not surprising, as Ni is also highly correlated with the stability of FCC over competing phases (such as HCP), and thus a higher Ni content in an alloy should be correlated with a higher stability of FCC and a higher SFE49. Co and Mn also appear as important covariates. In the case of Co, limited experimental studies have shown that increased Co tends to result in lower SFEs in FCC-based HEAs50. While trying to understand the underlying reasons why other covariates (specific heat, Pauling electronegativity, etc.) seem to be highly correlated with SFE is beyond the scope of this work, what is notable is that in this framework such insights can be gleaned at the same time that the materials problem space is
being explored. Thus, in an admittedly limited manner, the BART/BMARS-BO framework not only assists in the (very) efficient exploration of materials design spaces but also enhances our understanding of the underpinnings of material behavior. More results about the interactions among features are presented in Supplementary Note 3.

Table 8. The top five important factors selected by BMARS for the maximum stacking fault energy (SFE) with the initial set of sample size N ∈ {2, 5, 10, 15, 20}.

DISCUSSION
In general, there are two major categories of BO: (i) acquisition-based BO (ABO) and (ii) partitioning-based BO (PBO). ABO10,12,36 is the most traditional and broadly used form of BO. The key idea is to pick an acquisition function, derived from the posterior, that is then optimized at each iteration to specify the next experiment. PBO36,51, on the other hand, avoids the optimization of acquisition functions by intelligently partitioning the space based on observed experiments and exploring promising areas, greatly reducing computation. Compared to PBO, ABO usually makes better use of the available knowledge and makes higher-quality decisions, leading to fewer needed experiments. In this study, we focused on ABO to construct the autonomous workflow for material discovery.

GP-based BO has been widely used in a number of areas and has gradually become a benchmark method12,13,24 for the optimization of expensive "black-box" functions. However, its power can be limited by the intrinsic weaknesses of GPs10,12. Isotropic covariance functions such as the Matérn and Gaussian kernels commonly employed in the literature have continuous sample paths, which is undesirable in many problems, including material discovery, as it is well known that the behavior of materials often changes abruptly with minute changes in chemical make-up or (multiscale) microstructural arrangements. Moreover, such isotropic kernels are provably suboptimal52 in function estimation when there are spurious covariates or anisotropic smoothness. While remedies have been proposed in the literature involving more flexible kernel functions with additional hyperparameters53 and sparse additive GPs54,55, tuning and computing such models can be significantly challenging, especially given a modest amount of data. Thus, in complex materials science problems such as ours, Bayesian approaches based on additive regression trees or multivariate splines constitute an attractive alternative to GPs. Attractive theoretical properties of BART, including adaptivity to the underlying anisotropy and roughness, have recently appeared56.

In this paper, we proposed a fully automated experimental design pipeline where we took advantage of more adaptive and flexible Bayesian models, including BMARS30,31 and BART32, within an otherwise conventional BO procedure. A wide range of problems in scientific studies, including MD, can be handled with this algorithm-based workflow. Both the simulation studies and the real data analysis applied to scientifically relevant materials problems demonstrate that using BO with BMARS and BART outperforms GP-based methods in terms of search speed for the optimal design and automatic feature importance determination. To be more specific, due to its well-designed spline basis, BMARS is able to capture challenging patterns like sudden transitions in the response surface. At the same time, BART ensembles multiple individual trees, leading to a strong regression algorithm. As a result of their recursive partitioning structures, both are equipped with a model-free variable selection based on feature inclusion frequencies in their basis functions and trees. This enables them to more accurately recognize trends and correctly reveal the true factors.

We would like to close by briefly discussing potential applications of the framework in the context of autonomous materials research (AMR). Recently, the concept of autonomous experimentation for MD57 has quickly emerged as an active area of research58–60. Going beyond traditional high-throughput approaches to MD61–63, AMR aims to deploy robotic-assisted platforms capable of the automated exploration of complex materials spaces. Autonomy, in the context of AMR, can be achieved by developing systems capable of automatically selecting the experimental points to explore in a principled manner, with as little human intervention as possible. Our proposed non-GP BO methods seem to have robust performance against a wide range of problems. It is thus conceivable that the experimental design engines of AMR platforms could benefit from algorithms such as those proposed here.

METHODS
Bayesian optimization
BO10 is a procedure intended to determine, sequentially and optimally, the global minimum (or maximum, with a similar procedure) x* of an unknown objective function f, where $\mathcal{X}$ denotes the search space:

$$\mathbf{x}^{*} = \operatorname*{argmin}_{\mathbf{x} \in \mathcal{X}} f(\mathbf{x}). \qquad (3)$$

In the common setting of BO, the target function f can be either "black box" or expensive to evaluate, as such a function may represent a resource-intensive experiment or a very complex set of numerical simulations. Thus, we would like to reduce the number of function evaluations as we explore the design space and search for the optimal point. BO mainly includes two steps: (i) fitting the hidden pattern of the target function f, given the data D observed so far, based on some surrogate model, and (ii) optimizing a selected utility or acquisition function u(x∣D) based on the posterior distribution of the surrogate estimates of f, in order to decide the next sample point to evaluate in the design space $\mathcal{X}$. More specifically, it generally follows Algorithm 1.

Algorithm 1. Bayesian optimization (BO)
Input: initial observed dataset D = {(y_i, x_i), i = 1, …, N}.
Output: candidate with desired properties.
1: Begin with s = 1.
2: while the stopping criteria are not satisfied do (steps 3 to 7)
3: Train the chosen probabilistic model on the data D.
4: Calculate the selected acquisition function u(x∣D).
5: Choose the next experiment point by $\mathbf{x}_{s+1} = \operatorname*{argmax}_{\mathbf{x} \in \mathcal{X}} u(\mathbf{x} \mid \mathcal{D})$.
6: Obtain the new point (y_{s+1}, x_{s+1}) and add it to the observed dataset D.
7: s = s + 1.
8: return the candidate with the desired properties.
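A minimal sketch of Algorithm 1 over a discrete candidate pool (Python; the surrogate/acquisition/oracle interfaces and all names are our illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def bayesian_optimization(surrogate, acquisition, pool, oracle, X0, y0,
                          budget=80, batch=2):
    """Sketch of Algorithm 1 for minimization over a discrete pool (an (n, p)
    array). `surrogate` implements fit(X, y); `acquisition(surrogate, X)`
    returns one score per candidate; `oracle(x)` runs the expensive experiment."""
    X_obs, y_obs = list(X0), list(y0)
    remaining = list(range(len(pool)))
    spent = 0
    while spent < budget and remaining:                           # step 2
        surrogate.fit(np.asarray(X_obs), np.asarray(y_obs))       # step 3
        scores = acquisition(surrogate, pool[remaining])          # step 4
        picks = np.argsort(scores)[-batch:]                       # step 5: top scores
        for k in sorted(picks, reverse=True):                     # pop high indices first
            i = remaining.pop(int(k))
            X_obs.append(pool[i]); y_obs.append(oracle(pool[i]))  # step 6
        spent += batch                                            # step 7
    best = int(np.argmin(y_obs))
    return X_obs[best], y_obs[best]                               # step 8
```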
Fig. 4 Schematic illustration of Bayesian optimization (BO). Four subplots (a–d) give an example of the sequential automated experimental
design using BO. They describe the true unknown function (blue curve), expected improvement (red curve), fitted values using GP (orange
curve), 95% confidence interval (orange area), observed samples (black points), and the next experiment design (gray triangle) in each
iteration.
A schematic illustration of BO is shown in Fig. 4—we note that such an algorithm can be implemented in autonomous experimental design platforms. Each of the subplots presents the state after one BO iteration; they include the true unknown function (blue curve), the utility function—in this case EI—(red curve), the fitted values using a GP (orange curve), the 95% confidence interval (orange shaded area), the observed samples (black points), and the next experiment recommended by the utility/acquisition function (gray triangle).

In this sequential optimization strategy, one of the key components is the Bayesian surrogate model for f, which is used to fit the available data34 and to predict the outcome—with a measure of uncertainty—of experiments yet to be carried out. Another important determinant of BO efficiency is the choice of the acquisition function34. It can assist in setting our expectations regarding how much we can learn and gain from a new candidate design. The next design structure to be tested is usually the one that maximizes the acquisition function, balancing the trade-off between exploration and exploitation of the design space. There are many commonly used acquisition functions, such as EI, probability of improvement, upper confidence bound, and Thompson sampling10,11. Here, we choose EI as the acquisition function, which finds the point that, in expectation, improves the most on $f_n$, the best value observed so far:

$$\mathrm{EI}_n(\mathbf{x}) = \mathbb{E}_n\left[\max\left(f_n - f(\mathbf{x}),\, 0\right)\right], \qquad (4)$$

where the expectation is taken with respect to the posterior distribution of the surrogate model.

When a GP is adopted as the surrogate model, the function values f at the observed inputs are assumed to follow a joint Gaussian distribution:

$$p(\mathbf{f}) \sim \mathcal{N}\left(\mathbf{f} \mid m(\mathbf{x}), \mathbf{K}\right), \quad [\mathbf{K}]_{ij} = k(\mathbf{x}_i, \mathbf{x}_j), \qquad (5)$$

where m(⋅) is the mean function and k(⋅, ⋅) is the kernel function. A common choice for m(⋅) is a constant mean function. For k(⋅, ⋅), there are various candidates and we can decide based on the corresponding task. The radial-basis function (RBF) kernel is popular for capturing stationary and isotropic patterns. RBF kernels with ARD27 assign a different scale parameter to each feature instead of using a common value, which can help to identify the key covariates determining f. There are also nonstationary kernels, such as dot-product kernels39 and more flexible deep network kernels29. For simplicity, we use $\mathcal{D} = \{\mathbf{x}_{1:n}, y_{1:n}\}$ to denote the data we have collected. For a new input $\mathbf{x}^{*}$, the predictive distribution of the response $y^{*}$ is:

$$p(y^{*} \mid \mathbf{x}^{*}, \mathcal{D}) = \mathcal{N}(\mu^{*}, \sigma^{*2}), \qquad (6)$$

$$\mu^{*} = m(\mathbf{x}^{*}) + k(\mathbf{x}^{*}, \mathbf{x}_{1:n})\left(\mathbf{K} + \sigma^{2}\mathbf{I}\right)^{-1}\left(\mathbf{y}_{1:n} - m(\mathbf{x}_{1:n})\right), \qquad (7)$$

$$\sigma^{*2} = k(\mathbf{x}^{*}, \mathbf{x}^{*}) + \sigma^{2} - k(\mathbf{x}^{*}, \mathbf{x}_{1:n})\left(\mathbf{K} + \sigma^{2}\mathbf{I}\right)^{-1} k(\mathbf{x}_{1:n}, \mathbf{x}^{*}). \qquad (8)$$
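Given the Gaussian predictive distribution of Eqs. (6)–(8), the EI of Eq. (4) admits a well-known closed form; a minimal sketch (Python; the `gp.predict` interface in the usage comment is an assumption for illustration):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for minimization: E[max(f_best - f(x), 0)] when
    f(x) ~ N(mu, sigma^2), i.e., Eq. (4) under the GP posterior."""
    sigma = np.maximum(sigma, 1e-12)               # guard against zero variance
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Hypothetical usage with a GP surrogate:
# mu, sigma = gp.predict(X_candidates)             # posterior mean and std
# scores = expected_improvement(mu, sigma, f_best=np.min(y_obs))
```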
Fig. 5 Workflow of the automated experimental design framework. The overall workflow of the automated experimental design framework
is based on Bayesian optimization with adaptive Bayesian learning.
BMARS
Bayesian multivariate adaptive regression splines (BMARS)30,31 approximates the unknown function by a linear combination of (products of) spline basis functions,

$$\hat{f}(\mathbf{x}_i) = \sum_{j=1}^{l} \alpha_j B_j(\mathbf{x}_i), \qquad (9)$$

where $\alpha_j$ denotes the relevant coefficient for the basis function $B_j$, taking the form

$$B_j(\mathbf{x}_i) = \begin{cases} 1, & j = 1, \\ \prod_{q=1}^{Q_j} \left[ s_{qj}\left( x_{i,v(q,j)} - t_{qj} \right) \right]_{+}, & j \in \{2, \ldots, l\}, \end{cases} \qquad (10)$$

with $s_{qj} \in \{-1, 1\}$, $v(q, j)$ denoting the index of the variables, and the set $\{v(q, j);\, q = 1, \ldots, Q_j\}$ not repeated. Here $t_{qj}$ gives the partition location, $(\cdot)_{+} = \max(0, \cdot)$, and $Q_j$ is the polynomial degree of the basis $B_j$, which also corresponds to the number of predictors involved in $B_j$. The number of parameters is O(l), and we set the maximum value of l to 500.

To obtain samples from the joint posterior distribution, the computation is mainly based on reversible jump Metropolis–Hastings algorithms66. The sampling scheme only draws the important covariates, hence automatic feature selection is naturally done within this procedure.
model on the observed dataset and collect the relevant posterior samples.
Using these samples, the acquisition function for each potential experiment to
Ensemble learning and BART perform is calculated. After obtaining the values of the acquisition function, we
Apart from model mixing, ensemble learning67 provides an alternative way select the candidates with top scores and do experiments at these points. With
of combining models, which is a popular procedure that constructs the new outcomes, the observed dataset is augmented and the stopping
multiple weak learners and aggregates them into a stronger learner68–70. In criteria are checked. If the criteria are fulfilled, we stop the workflow and return
several circumstances, it is challenging for an individual model to capture the candidate with the desired properties. Otherwise, we update the surrogate
the unknown complex mechanism connecting inputs to the output(s) by model by making use of the augmented dataset and use the updated belief to
itself. Therefore, it is a better strategy to use a divide-and-conquer method guide the next round of experiments.
in the ensemble-learning framework, which allows each of the models to Within this fully automated framework, what we need to provide is the
fit a small part of the function. This is the key difference of our adopted initial sample and the stopping criteria. The beginning dataset can be
Bayesian ensemble learning from the GP-based model mixing strategy in some available data before this project. If we do not have this kind of
Talapatra et al.14. Ensemble learning’s robust performance to handle information, we can randomly conduct a small number of experiments to
complex data makes it a great candidate for BO71. However, it has not populate the database and initialize the surrogate models used in the
been explored to its full potential in the context of optimal experimental sequential experimental protocol. For the stopping criteria, it can be
design yet. Hence, we choose to combine BO with the Bayesian ensemble arriving at the desired properties or running out of the experimental
learning72, in particular, BART32. As BART is a tree-based model without budget14.
inherent smoothness assumptions, it is also a more flexible surrogate
model when modeling objective functions that are non-smooth, often
encountered in MD. This strategy is effective and efficient due to its ability DATA AVAILABILITY
to take advantage of both the ensemble-learning procedure and the The data files for materials discovery in the MAX phase space and optimal design for
Bayesian paradigm. stacking fault energy in high entropy alloy space are available upon reasonable
BART32 is a nonparametric regression method utilizing the Bayesian request.
ensemble-learning technique. Many simulations and real-world applica-
tions confirmed its flexible fitting capabilities73–75. Given xi 2 Rp and yi(i =
1, …, n), where it approximates the target function f by aggregating a set Received: 2 July 2021; Accepted: 3 November 2021;
of regression trees:
X
l
i.i.d.
y i ¼ f ðxi Þ þ ϵi ; ^f ðxi Þ ¼ gj ðxi ; T j ; Mj Þ; ϵi N ð0; σ 2 Þ; (11)
j¼1
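Because BART (like BMARS) yields posterior draws rather than a closed-form Gaussian predictive, the acquisition in step 4 of Algorithm 1 can be estimated by Monte Carlo; a minimal sketch (Python; `posterior_samples` is an assumed interface, not the authors' API):

```python
import numpy as np

def monte_carlo_ei(surrogate, X_candidates, f_best):
    """Estimate EI(x) = E[max(f_best - f(x), 0)] (minimization) by averaging
    over posterior draws of f, e.g., BART's sampled tree ensembles of Eq. (11)
    or BMARS's sampled spline expansions of Eqs. (9)-(10)."""
    draws = surrogate.posterior_samples(X_candidates)  # assumed shape: (S, n_candidates)
    return np.maximum(f_best - draws, 0.0).mean(axis=0)
```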
Automated experimental design framework
With BO using BMARS or BART, we propose an autonomous platform for efficient experimental design, aiming to significantly reduce the number of required trials and the total expense of finding the best candidate in MD. The framework is depicted in Fig. 5 and described in detail below.

In this workflow, we begin with an initially observed dataset whose sample size can be as small as two. Then, we train our surrogate Bayesian learning model on the observed dataset and collect the relevant posterior samples. Using these samples, the acquisition function for each potential experiment is calculated. After obtaining the values of the acquisition function, we select the candidates with the top scores and run experiments at these points. With the new outcomes, the observed dataset is augmented and the stopping criteria are checked. If the criteria are fulfilled, we stop the workflow and return the candidate with the desired properties. Otherwise, we update the surrogate model using the augmented dataset and use the updated belief to guide the next round of experiments.

Within this fully automated framework, all we need to provide is the initial sample and the stopping criteria. The initial dataset can be data already available before the project. If we do not have this kind of information, we can randomly conduct a small number of experiments to populate the database and initialize the surrogate models used in the sequential experimental protocol. The stopping criterion can be arriving at the desired properties or running out of the experimental budget14.

DATA AVAILABILITY
The data files for materials discovery in the MAX phase space and optimal design for stacking fault energy in high entropy alloy space are available upon reasonable request.

Received: 2 July 2021; Accepted: 3 November 2021;

REFERENCES
1. Mockus, J. In Bayesian Approach to Global Optimization, 125–156 (Springer, Dordrecht, 1989).
2. Kushner, H. J. A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. J. Basic Eng. 86, 97–106 (1964).
3. Jones, D. R., Schonlau, M. & Welch, W. J. Efficient global optimization of expensive black-box functions. J. Glob. Optim. 13, 455–492 (1998).
4. Kaufmann, E., Cappé, O. & Garivier, A. On Bayesian upper confidence bounds for bandit problems. In Proc. 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 592–600 (JMLR, 2012).
5. Garivier, A. & Cappé, O. The kl-ucb algorithm for bounded stochastic bandits and beyond. In Proc. 24th Annual Conference on Learning Theory, 359–376 (JMLR Workshop and Conference Proceedings, 2011).
6. Maillard, O.-A., Munos, R. & Stoltz, G. A finite-time analysis of multi-armed bandits problems with kullback-leibler divergences. In Proc. 24th Annual Conference on Learning Theory, 497–514 (JMLR Workshop and Conference Proceedings, 2011).
7. Auer, P., Cesa-Bianchi, N. & Fischer, P. Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47, 235–256 (2002).
8. Negoescu, D. M., Frazier, P. I. & Powell, W. B. The knowledge-gradient algorithm for sequencing experiments in drug discovery. INFORMS J. Comput. 23, 346–363 (2011).
9. Lizotte, D. J., Wang, T., Bowling, M. H. & Schuurmans, D. Automatic gait optimization with Gaussian process regression. In Proc. Int. Joint Conf. on Artificial Intelligence, 7, 944–949 (2007).
10. Frazier, P. I. Bayesian optimization. In Recent Advances in Optimization and Modeling of Contemporary Problems, 255–278 (INFORMS, 2018).
11. Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & De Freitas, N. Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104, 148–175 (2015).
12. Snoek, J., Larochelle, H. & Adams, R. P. Practical Bayesian optimization of machine learning algorithms. Adv. Neural Inform. Process. Syst. 25, 2960–2968 (2012).
13. Iyer, A. et al. Data-centric mixed-variable Bayesian optimization for materials design. In International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Vol. 59186, V02AT03A066 (American Society of Mechanical Engineers, 2019).
14. Talapatra, A. et al. Autonomous efficient experiment design for materials discovery with Bayesian model averaging. Phys. Rev. Mater. 2, 113803 (2018).
15. Ju, S. et al. Designing nanostructures for phonon transport via Bayesian optimization. Phys. Rev. X 7, 021024 (2017).
16. Ghoreishi, S. F., Molkeri, A., Srivastava, A., Arroyave, R. & Allaire, D. Multi-information source fusion and optimization to realize icme: application to dual-phase materials. J. Mech. Des. 140, 111409 (2018).
17. Khatamsaz, D. et al. Efficiently exploiting process-structure-property relationships in material design by multi-information source fusion. Acta Mater. 206, 116619 (2021).
18. Ghoreishi, S. F., Molkeri, A., Arróyave, R., Allaire, D. & Srivastava, A. Efficient use of multiple information sources in material design. Acta Mater. 180, 260–271 (2019).
19. Frazier, P. I. & Wang, J. Bayesian optimization for materials design. In Information Science for Materials Discovery and Design, 45–75 (Springer, 2016).
20. Liu, Y., Wu, J.-M., Avdeev, M. & Shi, S.-Q. Multi-layer feature selection incorporating weighted score-based expert knowledge toward modeling materials with targeted properties. Adv. Theory Simul. 3, 1900215 (2020).
21. Janet, J. P. & Kulik, H. J. Resolving transition metal chemical space: feature selection for machine learning and structure–property relationships. J. Phys. Chem. A 121, 8939–8954 (2017).
22. Ramprasad, R., Batra, R., Pilania, G., Mannodi-Kanakkithodi, A. & Kim, C. Machine learning in materials informatics: recent applications and prospects. npj Comput. Mater. 3, 1–13 (2017).
23. Honarmandi, P., Hossain, M., Arroyave, R. & Baxevanis, T. A top-down characterization of NiTi single-crystal inelastic properties within confidence bounds through Bayesian inference. Shap. Mem. Superelasticity 7, 50–64 (2021).
24. Ceylan, Z. Estimation of municipal waste generation of turkey using socio-economic indicators by Bayesian optimization tuned Gaussian process regression. Waste Manag. Res. 38, 840–850 (2020).
25. Moriconi, R., Deisenroth, M. P. & Kumar, K. S. High-dimensional Bayesian optimization using low-dimensional feature spaces. Mach. Learn. 109, 1925–1943 (2020).
26. Wang, Z., Hutter, F., Zoghi, M., Matheson, D. & de Feitas, N. Bayesian optimization in a billion dimensions via random embeddings. J. Artif. Intell. Res. 55, 361–387 (2016).
27. Aye, S. A. & Heyns, P. An integrated Gaussian process regression for prediction of remaining useful life of slow speed bearings based on acoustic emission. Mech. Syst. Signal Process. 84, 485–498 (2017).
28. Paciorek, C. J. & Schervish, M. J. Nonstationary covariance functions for Gaussian process regression. In Advances in Neural Information Processing Systems, 273–280 (Citeseer, 2003).
29. Wilson, A. G., Hu, Z., Salakhutdinov, R. & Xing, E. P. Deep kernel learning. In Artificial Intelligence and Statistics, 370–378 (PMLR, 2016).
30. Denison, D. G., Mallick, B. K. & Smith, A. F. Bayesian MARS. Stat. Comput. 8, 337–346 (1998).
31. Friedman, J. H. Multivariate adaptive regression splines. Ann. Statist. 19, 1–67 (1991).
32. Chipman, H. A., George, E. I. & McCulloch, R. E. Bart: Bayesian additive regression trees. Ann. Appl. Stat. 4, 266–298 (2010).
33. HamediRad, M. et al. Towards a fully automated algorithm driven platform for biosystems design. Nat. Commun. 10, 1–10 (2019).
34. Mateos, C., Nieves-Remacha, M. J. & Rincón, J. A. Automated platforms for reaction self-optimization in flow. React. Chem. Eng. 4, 1536–1544 (2019).
35. Bashir, L. Z. & Hasan, R. S. M. Solving banana (rosenbrock) function based on fitness function. World Sci. News 12, 41–56 (2015).
36. Merrill, E., Fern, A., Fern, X. & Dolatnia, N. An empirical study of Bayesian optimization: acquisition versus partition. J. Mach. Learn. Res. 22, 1–25 (2021).
37. Pohlheim, H. GEATbx: Genetic and Evolutionary Algorithm Toolbox for use with MATLAB Documentation. http://www.geatbx.com/docu/algindex-03.html (2008).
38. Vert, J.-P., Tsuda, K. & Schölkopf, B. A primer on kernel methods. Kernel Methods Comput. Biol. 47, 35–70 (2004).
39. Williams, C. K. & Rasmussen, C. E. Gaussian Processes for Machine Learning, Vol. 2 (MIT Press, 2006).
40. Molga, M. & Smutnicki, C. Test functions for optimization needs. Test. Funct. Optim. Needs 101, 48 (2005).
41. Barsoum, M. W. MAX Phases: Properties of Machinable Ternary Carbides and Nitrides (Wiley, 2013).
42. Aryal, S., Sakidja, R., Barsoum, M. W. & Ching, W.-Y. A genomic approach to the stability, elastic, and electronic properties of the max phases. Phys. Stat. Sol. 251, 1480–1497 (2014).
43. Barsoum, M. W. & Radovic, M. Elastic and mechanical properties of the max phases. Annu. Rev. Mater. Res. 41, 195–227 (2011).
44. Rana, S., Li, C., Gupta, S., Nguyen, V. & Venkatesh, S. High dimensional Bayesian optimization with elastic Gaussian process. In International Conference on Machine Learning, 2883–2891 (PMLR, 2017).
45. Chaudhary, N., Abu-Odeh, A., Karaman, I. & Arróyave, R. A data-driven machine learning approach to predicting stacking faulting energy in austenitic steels. J. Mater. Sci. 52, 11048–11076 (2017).
46. Hu, Y.-J., Sundar, A., Ogata, S. & Qi, L. Screening of generalized stacking fault energies, surface energies and intrinsic ductile potency of refractory multicomponent alloys. Acta Mater. 210, 116800 (2021).
47. Denteneer, P. & Soler, J. Energetics of point and planar defects in aluminium from first-principles calculations. Solid State Commun. 78, 857–861 (1991).
48. Denteneer, P. & Van Haeringen, W. Stacking-fault energies in semiconductors from first-principles calculations. J. Phys. C 20, L883 (1987).
49. Cockayne, D., Jenkins, M. & Ray, I. The measurement of stacking-fault energies of pure face-centred cubic metals. Philos. Mag. 24, 1383–1392 (1971).
50. Liu, S. et al. Transformation-reinforced high-entropy alloys with superior mechanical properties via tailoring stacking fault energy. J. Alloys Compd. 792, 444–455 (2019).
51. Wang, S. & Ng, S. H. Partition-based Bayesian optimization for stochastic simulations. In 2020 Winter Simulation Conference (WSC), 2832–2843 (IEEE, 2020).
52. Bhattacharya, A., Pati, D. & Dunson, D. Anisotropic function estimation using multi-bandwidth Gaussian processes. Ann. Stat. 42, 352 (2014).
53. Cheng, L. et al. An additive Gaussian process regression model for interpretable non-parametric analysis of longitudinal data. Nat. Commun. 10, 1–11 (2019).
54. Qamar, S. & Tokdar, S. T. Additive Gaussian process regression. Preprint at https://arxiv.org/abs/1411.7009 (2014).
55. Vo, G. & Pati, D. Sparse additive Gaussian process with soft interactions. Open J. Stat. 7, 567 (2017).
56. Ročková, V. & van der Pas, S. Posterior concentration for Bayesian regression trees and forests. Ann. Stat. 48, 2108–2131 (2020).
57. Nikolaev, P. et al. Autonomy in materials research: a case study in carbon nanotube growth. npj Comput. Mater. 2, 1–6 (2016).
58. Kusne, A. G. et al. On-the-fly closed-loop materials discovery via Bayesian active learning. Nat. Commun. 11, 5966 (2020).
59. Aldeghi, M., Häse, F., Hickman, R. J., Tamblyn, I. & Aspuru-Guzik, A. Golem: an algorithm for robust experiment and process optimization. Chem. Sci. 12, 14792–14807 (2021).
60. Häse, F. et al. Olympus: a benchmarking framework for noisy optimization and experiment planning. Mach. Learn. Sci. Technol. 2, 035021 (2021).
61. Liu, P. et al. High throughput materials research and development for lithium ion batteries. High-throughput Exp. Model. Res. Adv. Batter. 3, 202–208 (2017).
62. Melia, M. A. et al. High-throughput additive manufacturing and characterization of refractory high entropy alloys. Appl. Mater. Today 19, 100560 (2020).
63. Potyrailo, R. et al. Combinatorial and high-throughput screening of materials libraries: review of state of the art. ACS Comb. Sci. 13, 579–633 (2011).
64. Schulz, E., Speekenbrink, M. & Krause, A. A tutorial on Gaussian process regression: modelling, exploring, and exploiting functions. J. Math. Psychol. 85, 1–16 (2018).
65. Denison, D. G., Holmes, C. C., Mallick, B. K. & Smith, A. F. Bayesian Methods for Nonlinear Classification and Regression, Vol. 386 (John Wiley & Sons, 2002).
66. Green, P. J. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732 (1995).
67. Sagi, O. & Rokach, L. Ensemble learning: a survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 8, e1249 (2018).
68. Krawczyk, B., Minku, L. L., Gama, J., Stefanowski, J. & Woźniak, M. Ensemble learning for data stream analysis: a survey. Inf. Fusion 37, 132–156 (2017).
69. Laradji, I. H., Alshayeb, M. & Ghouti, L. Software defect prediction using ensemble learning on selected features. Inf. Softw. Technol. 58, 388–402 (2015).
70. Chen, X. M., Zahiri, M. & Zhang, S. Understanding ridesplitting behavior of on-demand ride services: an ensemble learning approach. Transp. Res. Part C 76, 51–70 (2017).
71. Zhang, W., Wu, C., Zhong, H., Li, Y. & Wang, L. Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization. Geosci. Front. 12, 469–477 (2021).
72. Fersini, E., Messina, E. & Pozzi, F. A. Sentiment analysis: Bayesian ensemble learning. Decis. Support Syst. 68, 26–38 (2014).
73. Hill, J., Linero, A. & Murray, J. Bayesian additive regression trees: a review and look forward. Annu. Rev. Stat. Appl. 7, 251–278 (2020).
74. McCord, S. E., Buenemann, M., Karl, J. W., Browning, D. M. & Hadley, B. C. Integrating remotely sensed imagery and existing multiscale field data to derive rangeland indicators: application of Bayesian additive regression trees. Rangel. Ecol. Manag. 70, 644–655 (2017).
75. Sparapani, R. A., Logan, B. R., McCulloch, R. E. & Laud, P. W. Nonparametric survival analysis using Bayesian additive regression trees (BART). Stat. Med. 35, 2741–2753 (2016).
76. Bleich, J., Kapelner, A., George, E. I. & Jensen, S. T. Variable selection for BART: an application to gene regulation. Ann. Appl. Stat. 8, 1750–1781 (2014).

ACKNOWLEDGEMENTS
B.K.M., A.B., and D.P. acknowledge support by NSF through Grant No. NSF CCF-1934904 (TRIPODS). T.Q.K. acknowledges the NSF through Grant No. NSF-DGE-1545403. X.Q. and R.A. acknowledge NSF through Grants Nos. 1835690 and 2119103 (DMREF). The authors also acknowledge Texas A&M's Vice President for Research for partial support through the X-Grants program. Dr. Prashant Singh (Ames Laboratory) is acknowledged for his DFT calculations of SFE in FCC HEAs. Dr. Anjana Talapatra and Dr. Shahin Boluki are acknowledged for facilitating the BMA code. DFT calculations of the SFEs were conducted with the computing resources provided by Texas A&M High Performance Research Computing.

AUTHOR CONTRIBUTIONS
B.L. and B.K.M. conceived of the concept for non-GP BO. B.L. implemented the algorithms and carried out the experiments. T.Q.K. provided the model for SFE as a function of composition. X.Q., A.B., and D.P. contributed to the discussion on the ML/BO aspects of the work. R.A. and T.Q.K. provided the materials science context and designed the materials science examples. All authors analyzed the results, contributed to the manuscript, and edited it. All authors reviewed the final version of the manuscript.

COMPETING INTERESTS
The authors declare no competing interests.

ADDITIONAL INFORMATION
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41524-021-00662-x.

Correspondence and requests for materials should be addressed to Raymundo Arroyave.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2021