Prop To Composition Cgan

Inferring composition required to achieve target properties of a material, using conditional GAN


Debnath et al.

J Mater Inf 2021;1:3


DOI: 10.20517/jmi.2021.05
Journal of
Materials Informatics

Perspective Open Access

Generative deep learning as a tool for inverse design of high entropy refractory alloys
Arindam Debnath¹, Adam M. Krajewski¹, Hui Sun¹, Shuang Lin¹, Marcia Ahn¹, Wenjie Li¹, Shanshank Priya¹, Jogender Singh², Shunli Shang¹, Allison M. Beese¹, Zi-Kui Liu¹, Wesley F. Reinhart¹,³
¹ Department of Materials Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA.
² Applied Research Laboratory, Pennsylvania State University, University Park, PA 16802, USA.
³ Institute for Computational and Data Sciences, Pennsylvania State University, University Park, PA 16802, USA.

Correspondence to: Prof. Wesley Reinhart, Department of Materials Science and Engineering, Pennsylvania State University,
Steidle Building, University Park, PA 16802, USA. E-mail: [email protected]

How to cite this article: Debnath A, Krajewski AM, Sun H, Lin S, Ahn M, Li W, Priya S, Singh J, Shang S, Beese AM, Liu ZK,
Reinhart WF. Generative deep learning as a tool for inverse design of high entropy refractory alloys. J Mater Inf 2021;1:3.
https://dx.doi.org/10.20517/jmi.2021.05

Received: 10 Jul 2021 First Decision: 16 Aug 2021 Revised: 23 Aug 2021 Accepted: 27 Aug 2021 First online: 3 Sep 2021

Academic Editor: Xing-Jun Liu Copy Editor: Xi-Jun Chen Production Editor: Xi-Jun Chen

Abstract
Generative deep learning is powering a wave of innovation in materials design. This article discusses the
basic operating principles of these methods and their advantages over rational design through the lens of a case
study on refractory high-entropy alloys for ultra-high-temperature applications. We present our computational
infrastructure and workflow for the inverse design of new alloys powered by these methods. Our preliminary
results show that generative models can learn complex relationships to generate novelty on demand, making them
a valuable tool for materials informatics.

Keywords: High entropy alloys, databases, machine learning, inverse design

INTRODUCTION
More than half of the National Academy of Engineering’s 14 Grand Challenges for the 21st Century[1]
involve the design, manufacture, and maintenance of advanced materials whose functions and properties
will be derived from their internal structures. The relationship between structure and function is
challenging to understand and even harder to predict because it is nonlinear, high-dimensional, and results
from physical phenomena at many scales. Traditional materials design has relied on human intuition to
interpret patterns in known structure-property relationships and infer new materials with similar or
improved properties. However, as materials chemistry and processing become more and more complex,
these strategies become increasingly challenging, and progress is stymied by an overwhelming design space.

© The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0
International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing,
adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as
long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and
indicate if changes were made. www.jmijournal.com

Fortunately, new mathematical frameworks and powerful hardware to implement them have been
developed to handle such difficult scientific problems. For example, deep neural networks (DNNs) can learn
incredibly complex nonlinear functions on text, images, and graphs[2]. DNNs extract the so-called latent
features from high-dimensional input data to make meaningful transformations on them. For example, a
DNN trained to generate realistic images of human faces may learn latent features describing hair color and
facial expression[3]. Thus, the model can not only be asked to generate an image with precisely the desired
characteristics, expression, and lighting, but it can also “explain” the image to some degree. The idea of
latent spaces is not unique to machine learning; the highly influential Materials Genome Initiative (MGI)
has made use of a very similar concept to revolutionize the way researchers approach rational materials
design. In the language of MGI, a material genome is a quantitative description of the underlying features of
a material that govern its properties. Likewise, the latent space of the model is a learned representation that
captures the dominant modes of the variation in the observed data, which leads to the variation in the
properties.

While predictions about material properties can be made using traditional computational methods, an
exciting and powerful new capability afforded by DNNs is the ability to approximate inverse functions. A
generative model is produced by training a DNN to invert random noise from a prescribed distribution to
approximate an observed distribution. Once trained, such a model can draw novel samples from random
noise, creating entirely new observations that approximately match the general rules from the training data
without exactly matching them. Generative models have recently been applied to a variety of materials,
including organics and inorganics[4,5]. For instance, they were recently used to design composite materials
with toughness exceeding 20% of what has been achieved through other optimization methods (e.g.,
topology optimization)[6]. Similar approaches have been demonstrated for optical meta-materials[7] and
bulk[8] and thin-film[9] inorganic materials. Aside from the design of new materials, generative models are
also becoming a popular method for reconstructing high-resolution images from partial or noisy
microscopy data[10].

Here we will consider a case study on a particular class of materials, high entropy refractory alloys[11]. First,
we discuss the challenges in using traditional design schemes, even those accelerated by recent machine
learning approaches, and how generative deep learning can provide solutions. Next, we describe the data
ecosystem that enables our approach and provide preliminary results from the generative models trained on
those data. Finally, we conclude with brief remarks on the future challenges of applying these techniques to
materials design.

DESIGN OF HIGH-ENTROPY REFRACTORY ALLOYS


Ni-based superalloys have been a popular material system for high-temperature applications like turbines
due to their exceptional properties at elevated temperatures. However, the current generation of Ni-based
components are operating at close to their melting point (1100 °C)[11], and additional thermal management
strategies such as internal cooling channels and conventional thermal barrier coatings have also been
pushed to their limits. The ability to operate at even higher temperatures will increase the efficiency of these
systems and lead to a reduction in carbon emissions and an increase in fuel and energy savings. Therefore,
there has been an increase in the demand for new materials that display superior mechanical properties at
temperatures as high as 1600 °C.

Refractory alloys are promising candidates as they exhibit desirable properties at elevated temperatures.
However, traditional refractory alloys also exhibit low ductility at room temperature and are prone to
oxidation[12]. A variety of processing techniques have been employed in attempts to address these
drawbacks[12,13]. A different route is to produce high-entropy alloys (HEAs) from the refractory
elements[11,14]. However, a very limited number of HEAs that surpass the performance of Ni-based
superalloys have been discovered so far. Designing new HEAs that meet these requirements using the
conventional trial-and-error approach is, therefore, a challenging task that requires domain knowledge and
depends on fortuitous discovery.
Data-driven rational design
Computational tools for prediction and evaluation of stable phases based on thermodynamics using the
CALculation of PHAse Diagram (CALPHAD) approach and first-principles in terms of the density
functional theory (DFT) have matured in the last decade and continue to contribute to an increasingly rich
ecosystem of data[15]. Well-populated databases of alloy phase stability can enable rational design through
expert intuition or more sophisticated numerical techniques[16,17]. The quantity and span of these
computational methods have the potential to greatly reduce the barrier to the rational, forward design of
improved materials. Furthermore, these datasets can guide experimental synthesis to the most promising
candidates, leading to substantially better materials from only a handful of experiments[18]. However, there is
more work to be done on making these data accessible to the general scientific community through software
for data mining and predictive modeling.

Based on these plentiful datasets, machine learning approaches such as deep learning can be deployed to
rapidly predict the properties of hypothetical compounds[19-24]. In addition, targeted alloy design can be
achieved by surrogate models for specific material properties[25-27]. While such methods have been
successfully employed, for instance, to synthesize new Co-based alloys[28,29], they still have to rely on a
human designer to utilize the forward-mode surrogate models properly. This human can help introduce
some valuable expert knowledge into the workflow, but at the same time, slows down the overall process
and can introduce unintended bias.

HEA design specifically has benefited from data-driven modeling in recent years. In this case, data-driven
design refers to optimization or improvement of material properties such as stability, hardness, or
manufacturability with the help of surrogate models[30,31]. The most straightforward of these approaches take
advantage of the availability of historical experimental and computational data, while more sophisticated
implementations include the design of experiments and simulation in the loop. For instance, a variety of
data-driven methods have been used to predict the stable phases of HEAs in recent years[27,32-34], with
particular attention on single-phase HEAs. Unfortunately, even with the success of these forward models,
the conventional combinatorial approach to candidate selection leaves a design space discouragingly large
to probe in the case of equiatomic HEAs[34], or physically impossible to investigate completely in the case of
non-equiatomic HEAs.
Generative modeling
We aim to build on recent success in end-to-end DNN architectures used in other material design contexts
which rely on implicit feature learning[35,36]. A core advantage of these models is the ability to learn
meaningful representations of complex design spaces. Furthermore, the learned spaces are low-dimensional
and smooth by construction (i.e., using a normal random vector), whereas the original design spaces may be
jagged and discontinuous in many dimensions.

The most popular variety of these models is the Generative Adversarial Network (GAN)[37]. A GAN model
consists of two DNNs: a generator that learns a mapping between a random normal latent space and the
target distribution (effectively generating new data), and a discriminator (also called a critic) that learns to
distinguish between the real observations and generated data from its adversary. The term “adversarial” refers
to the training procedure in which the two networks compete with each other, the generator trying to produce
increasingly realistic examples and the discriminator trying to catch the generator in the act. This scheme allows the generator to
learn very high-quality representations without much training data.
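
The adversarial game described above is commonly written as the minimax objective of the original GAN formulation. The symbols below (G for the generator, D for the discriminator, x for a real observation, and z for a latent vector drawn from a standard normal distribution) are notation introduced here for illustration and do not appear in the article itself.

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
+ \mathbb{E}_{z \sim \mathcal{N}(0, I)}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

The discriminator raises this value by scoring real samples high and generated samples low, while the generator lowers it by producing samples that the discriminator scores as real.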
Towards inverse design
In a vanilla GAN, there is no way to control the output produced by the generator, meaning that many
samples must be drawn before a suitable candidate is found. However, this can be controlled in the
conditional GAN (cGAN) architecture, in which the generator is provided with an additional conditioning
vector that enforces a mapping between the latent space and the desired figure of merit[38]. In this way, the
generator learns the probability distribution of alloy compositions conditioned on the
alloy properties, and therefore, samples drawn from the multi-dimensional distribution will represent
viable compositions with predictable properties. The scheme is illustrated in Figure 1.
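
As a minimal sketch of how conditioning can enter a generator, the example below concatenates a latent vector with a conditioning vector and maps the result to atomic fractions. Everything in it is an assumption made for illustration: the element palette, the latent and conditioning sizes, the single linear layer, the softmax output that forces the fractions to sum to one, and the random (untrained) weights. It is not the architecture used in the case study.

```python
import math
import random

ELEMENTS = ["Mo", "Nb", "Ta", "W"]  # hypothetical element palette
LATENT_DIM = 4                       # illustrative latent size
COND_DIM = 2                         # e.g., two normalized target properties


def softmax(values):
    """Turn arbitrary scores into non-negative fractions that sum to one."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_weights(n_in, n_out, rng):
    """Random, untrained weights; a real model would learn these adversarially."""
    return [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def generator(z, c, weights):
    """Map latent vector z and conditioning vector c to atomic fractions."""
    x = list(z) + list(c)  # the conditioning enters by concatenation
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return dict(zip(ELEMENTS, softmax(logits)))


rng = random.Random(0)
weights = init_weights(LATENT_DIM + COND_DIM, len(ELEMENTS), rng)
z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

# Same latent vector, two different conditioning vectors.
low = generator(z, [0.1, 0.1], weights)
high = generator(z, [0.9, 0.9], weights)
```

Holding `z` fixed and changing only the conditioning vector changes the output composition, which is the mechanism that lets a trained conditional generator be steered towards a requested property value.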

The cGAN approach has been demonstrated on the design of Al alloys with validation by computational
methods[39]. In that case, the use of conditional density estimation in the inverse problem enables extremely
efficient exploration of a high-dimensional design space resulting in the design of dozens of new stable
alloys. The success of these models for solving design problems relies heavily on the property of invertibility,
which means that promising points in the latent space can be sent through the model in reverse to yield
candidates in the original design space. Access to an invertible latent space enables rapid candidate material
generation with the ability to interpolate continuously between desirable structures, as demonstrated with
metal-organic frameworks[40], rather than the more rudimentary combinatorial high-throughput screening
associated with forward design methods.

There are a variety of alternative approaches which could be considered for this problem. Without
generative architectures, the design process would typically proceed in two stages. First, supervised learning
could be used to train predictive models for the properties of interest. Second, optimization (e.g., gradient
descent) could then be performed to identify an input composition to yield the desired properties using this
fast surrogate model. This is generally not preferred since generative models can produce suitable
compositions in a single step.
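
The two-stage alternative can be sketched as follows, under stated assumptions. The surrogate here is a simple linear combination of elemental shear moduli (using the elemental values quoted in this article for W, Re, Hf, and Zr) standing in for a trained predictive model. The softmax parameterization, the scaling constant, the learning rate, and the function names are choices made for this illustration only.

```python
import math

# Elemental shear moduli in GPa, as quoted in this article for these elements.
ELEMENT_G = {"W": 173.0, "Re": 150.0, "Hf": 30.4, "Zr": 32.7}
SCALE = 100.0  # GPa; keeps the loss and its gradients of order one


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def surrogate_shear_modulus(fractions):
    """Stage 1 stand-in: linear-combination forward model, in GPa."""
    return sum(x * g for x, g in zip(fractions, ELEMENT_G.values()))


def optimize_composition(target_gpa, steps=2000, learning_rate=0.5):
    """Stage 2: gradient descent on softmax logits to match a target value."""
    g_scaled = [g / SCALE for g in ELEMENT_G.values()]
    target = target_gpa / SCALE
    logits = [0.0] * len(g_scaled)  # start from the equiatomic composition
    for _ in range(steps):
        x = softmax(logits)
        pred = sum(xi * gi for xi, gi in zip(x, g_scaled))
        error = pred - target
        # d(pred)/d(logit_j) = x_j * (g_j - pred) for a softmax composition
        grads = [2.0 * error * xi * (gi - pred) for xi, gi in zip(x, g_scaled)]
        logits = [l - learning_rate * g for l, g in zip(logits, grads)]
    return dict(zip(ELEMENT_G, softmax(logits)))


result = optimize_composition(120.0)
predicted = surrogate_shear_modulus(result.values())
```

The optimizer returns one composition per run and needs a fresh optimization for every new target, whereas a trained conditional generator returns a candidate from a single forward pass. The target must lie between the smallest and largest elemental values for this surrogate to reach it.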

It is noted that there are other generative architectures besides GAN that are viable for this problem, such as
the conditional variational autoencoder[41]. VAEs minimize a reconstruction loss to learn a suitable latent
space instead of relying on adversarial training to learn the mapping from a reference distribution to the
distribution of interest as GANs do. However, VAEs have been shown to produce inferior results to GANs
due to the noise injection inherent to the training procedure and the requirement of a predefined metric for
reconstruction error[42].

Despite their advantages, it is known that cGANs are difficult to work with and require significant tuning to
obtain good results. A suitable distribution for the conditioning vector must be provided in the training
procedure to ensure that both the generator and discriminator have opportunities to explore the joint
distribution. These models can also suffer from vanishing gradients, convergence problems, and mode
collapse[37]. While strategies such as Wasserstein GAN[43] offer piecemeal solutions, ultimately, GAN remains
a convenient approximation rather than a cure-all solution to implicit data modeling[44].

Figure 1. Schematic illustration of generative modeling for inverse design of materials using a conditional Generative Adversarial
Network. (A) Adversarial training procedure in which the Generator and Discriminator compete for superior performance. (B) Inverse
design using the trained Generator.

CASE STUDY: INVERSE DESIGN OF REFRACTORY HEAS


Data ecosystem
Any generative material design effort requires close integration with existing literature data and scientific
techniques to validate generated samples beyond the known set. We accomplish this by creating an
advanced data ecosystem in this case study, presented in Figure 2. It seamlessly merges literature, validation,
and generated data by retaining their independence at the single data point level, yet ensuring a coherent
JavaScript Object Notation-like data representation and combining them at the single unique material level,
as shown in the gray section of Figure 2.

This arrangement, centered on automated identification of unique materials, allows an efficient and fully
automated identification of voids in the current state of database knowledge. These voids can then be dealt
with dynamically by the appropriate component of the ecosystem every time a change in the database is
detected, e.g., whenever a new alloy is designed by a GAN. This is accomplished by a constantly running
cloud Virtual Machine server linked to the database through a high-throughput application programming
interface in this case study. Identified missing literature data is passed to natural language processing based
search algorithms and researchers, who attempt to fill it (green loop in Figure 2). Data identified as missing
a necessary validation is passed to computational techniques and researchers responsible for experiments
(red loop). At the same time, predictive models attempt to rapidly fill in any void with approximations
(orange loop) based on all defined empirical models from the literature and data-driven predictions based
on already known data. In this case study, the structure-aware linear combination of elemental properties
was found to be particularly useful. A void-free dataset of materials with various properties is then
employed to create generative models, with materials used as samples and associated properties used for
conditioning the model. With trained GANs, new candidates are generated and uploaded back to the low-
level dataset as novel materials in need of validation. We describe this generation process in detail in the
following sections. This ecosystem design inherently leads to a data flow within independent yet interacting
loops, shown in Figure 3, providing many benefits to the design process. Foremost, it allows interaction
between literature, inverse design, and validation to be fully automated, making sure that at any given time,
GANs are trained on all available data and validations are run on the most recent candidate selection. Once
running, it eliminates any wait stages, maximizing the discovery rate for the available resources.
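
The idea of keeping individual data points independent while combining them at the level of a unique material, and then detecting voids, can be sketched as below. The record fields, the key construction, the property names, and the numeric values are all hypothetical placeholders chosen for this illustration; they are not the schema or the data of the ecosystem described here.

```python
from collections import defaultdict


def material_key(composition):
    """Identify a unique material by its normalized, order-independent composition."""
    total = sum(composition.values())
    return tuple(sorted((el, round(amount / total, 4))
                        for el, amount in composition.items()))


def merge_by_material(data_points):
    """Group independent data points under the unique material they describe."""
    materials = defaultdict(lambda: defaultdict(list))
    for point in data_points:
        key = material_key(point["composition"])
        materials[key][point["property"]].append(point)
    return materials


def find_voids(materials, required_properties):
    """List (material, property) pairs for which no data point exists yet."""
    return [(key, prop)
            for key, props in materials.items()
            for prop in required_properties
            if prop not in props]


# Placeholder records: the values are made up and carry no physical meaning.
data_points = [
    {"composition": {"Mo": 1, "Nb": 1, "Ta": 1, "W": 1},
     "property": "hardness", "value": 1.0, "source": "literature"},
    {"composition": {"W": 0.25, "Ta": 0.25, "Nb": 0.25, "Mo": 0.25},
     "property": "shear_modulus", "value": 2.0, "source": "prediction"},
    {"composition": {"Hf": 1, "Nb": 1, "Zr": 1},
     "property": "hardness", "value": 3.0, "source": "literature"},
]

materials = merge_by_material(data_points)
voids = find_voids(materials, ["hardness", "shear_modulus"])
```

The first two records describe the same equiatomic alloy written in two ways, so they merge into one material with both properties present. The third material lacks a shear modulus entry, and that gap is what a prediction or validation loop would be asked to fill.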

Figure 2. A schematic of the data ecosystem that enables the inverse design.

Figure 3. Four main data flow paths in the ecosystem.

Building a generative model
Once a sufficient dataset was collected in the literature loop shown in Figure 3, we began to fuel the inverse
design component of the data ecosystem. To demonstrate novel refractory HEAs with the desired
properties, a cGAN model based on a simple feedforward NN architecture with four fully connected layers
was trained using 529 HEA literature-derived compositions from our database[45]. The cGAN was
conditioned on the shear modulus and fracture toughness values to generate new compositions that should
exhibit specific values of these properties. The values of these properties were normalized to ensure that the
importance of each feature is reflected equally in the model. The conditioning values were sampled
using the probability distribution of the property values. Batches of normally distributed sixteen-
dimensional latent vectors and the sampled conditioning vectors were then provided as input to the
generator. One advantage of the adversarial loss of GANs over other competing methods like reconstructive
loss of VAEs is the simplicity of the objective function - here the generator receives the negative critic score
as its loss, such that it maximizes the “realism” of the generated samples. Because the critic is trained in
tandem with the generator, there is no need to define a metric for this “realism”, which is learned directly
from the observed distribution. We used the Wasserstein GAN[43] loss to avoid vanishing gradients and the
unrolled GAN[46] strategy to avoid mode collapse. Training the model took about one hour on an NVIDIA
Tesla P100 GPU.
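
The input preparation and the loss described in this paragraph can be sketched as follows. Several details are assumptions, because the text does not specify them: min-max scaling is used for the normalization, the conditioning vectors are drawn by resampling observed property pairs with replacement, and the property values are made-up placeholders. The critic scores passed to the loss functions would come from a trained critic network, which is not implemented here, and the Lipschitz constraint required by Wasserstein GAN training is omitted.

```python
import random

LATENT_DIM = 16  # sixteen-dimensional latent vectors, as stated in the text


def min_max_normalize(values):
    """Assumed normalization: rescale a property to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def sample_conditioning(shear_norm, tough_norm, batch_size, rng):
    """Assumed sampling: draw observed property pairs with replacement."""
    picks = [rng.randrange(len(shear_norm)) for _ in range(batch_size)]
    return [[shear_norm[i], tough_norm[i]] for i in picks]


def sample_latent(batch_size, rng):
    """Batch of normally distributed latent vectors."""
    return [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
            for _ in range(batch_size)]


def critic_loss(real_scores, fake_scores):
    """Wasserstein critic loss: mean score on fakes minus mean score on reals."""
    return sum(fake_scores) / len(fake_scores) - sum(real_scores) / len(real_scores)


def generator_loss(fake_scores):
    """Generator loss: the negative mean critic score of generated samples."""
    return -sum(fake_scores) / len(fake_scores)


# Placeholder property values (not data from the study).
shear = [40.0, 60.0, 80.0, 100.0]
toughness = [0.8, 1.0, 1.4, 1.6]

rng = random.Random(0)
shear_norm = min_max_normalize(shear)
tough_norm = min_max_normalize(toughness)
conditioning = sample_conditioning(shear_norm, tough_norm, batch_size=8, rng=rng)
latent = sample_latent(batch_size=8, rng=rng)

# Each generator input row is a latent vector followed by its conditioning vector.
generator_inputs = [z + c for z, c in zip(latent, conditioning)]
```

Resampling observed pairs keeps the conditioning vectors on the joint distribution of the two properties, so the generator is not asked during training for combinations that never occur in the data.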

The properties of the generated material compositions will next be verified experimentally or through other
computational approaches such as ab-initio DFT-based calculations combined with CALPHAD models[47]
and fed back into the data ecosystem to serve as a new training dataset for the cGAN, as illustrated in
Figure 3. This cycle will ensure the continuous generation of novel candidate alloys, with each iteration
increasing the probability of arriving at the targeted properties.

We first show that the cGAN can learn the underlying distribution of refractory HEAs; in effect, the
adversarial training teaches the generator a set of design rules for what an HEA looks like. When generating new
samples, an observer should be convinced that these are legitimate alloys. Thus, to evaluate the generator,
we consider some different measures of the generated ensemble of alloy compositions in Figure 4. While
some minor differences can be observed, the generator appears to have largely captured the fundamental
definition of a refractory HEA - such as the correlation between different elements and the number of
different constituent elements - without requiring us to provide any guidance to the model (e.g., design
rule) aside from a collection of raw data of compositions of alloys.
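
Ensemble-level checks of this kind can be sketched in a few lines. The example below counts the number of constituent elements per alloy and how often each pair of elements appears together. The pair count is a simplified stand-in for the element-pair correlation discussed here, and the three alloys are hypothetical inputs used only to exercise the functions.

```python
from collections import Counter
from itertools import combinations


def present_elements(composition, threshold=0.0):
    """Elements whose atomic fraction exceeds the threshold, in sorted order."""
    return sorted(el for el, frac in composition.items() if frac > threshold)


def element_count_histogram(compositions):
    """How many alloys contain 2, 3, 4, ... distinct elements."""
    return Counter(len(present_elements(c)) for c in compositions)


def pair_cooccurrence(compositions):
    """How many alloys contain each unordered pair of elements."""
    counts = Counter()
    for c in compositions:
        counts.update(combinations(present_elements(c), 2))
    return counts


# Hypothetical ensemble; apply the same functions to real and generated sets.
alloys = [
    {"Mo": 0.25, "Nb": 0.25, "Ta": 0.25, "W": 0.25},
    {"Hf": 0.2, "Nb": 0.2, "Ta": 0.2, "Ti": 0.2, "Zr": 0.2},
    {"Mo": 0.5, "Nb": 0.5, "W": 0.0},
]

histogram = element_count_histogram(alloys)
pairs = pair_cooccurrence(alloys)
```

Computing both summaries for the training set and for a generated ensemble, and comparing them side by side, gives a quick check that the generator reproduces the overall statistics of the observed alloys.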

In addition to generating valid compositions, we also want to learn the joint distribution between
compositions and material properties. To evaluate this, we plot the conditioning supplied to the generator
against the reference property value in Figure 5, which serves as the ground truth. As most reports of HEAs in
the literature do not include shear modulus G and fracture toughness K_IC, reference values were derived
based on a linear combination (LC) of the pure elemental properties from DFT calculations[48]. The shear
modulus was approximated as a simple LC of elemental shear modulus values, while fracture toughness was
obtained using Rice’s model[49] given by the equation,

where E_USF is the unstable stacking fault energy, G the shear modulus for sliding along the slip plane, and ν
the Poisson’s ratio for the stable element reference structure. There is good agreement in regions with more
prevalent training data (40 GPa < G < 100 GPa), while peripheral regions with fewer observations (G >
100 GPa) show a weaker fit. Overall, both the shear modulus and fracture toughness values are well
captured by the cGAN model over a majority of the data domain.
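
A minimal sketch of the two reference-value calculations is given below. The linear combination uses elemental shear moduli quoted elsewhere in this article. The closed form used for Rice's criterion, K = sqrt(2 G E_USF / (1 - ν)), is an assumption: it is the commonly cited in-plane shear form of the criterion expressed in the three quantities defined above, and it may differ from the exact expression used by the authors. The numerical inputs to it are arbitrary illustrative values, not data from the study.

```python
import math

# Elemental shear moduli in GPa, as quoted in this article for these elements.
ELEMENT_SHEAR_GPA = {"W": 173.0, "Re": 150.0, "Ru": 149.0, "Hf": 30.4, "Zr": 32.7}


def linear_combination(composition, elemental_values):
    """Composition-weighted average of an elemental property."""
    total = sum(composition.values())
    return sum(amount / total * elemental_values[el]
               for el, amount in composition.items())


def rice_toughness(e_usf, shear_modulus, poisson_ratio):
    """ASSUMED form of Rice's criterion: sqrt(2 * G * E_USF / (1 - nu)).

    Inputs in SI units (E_USF in J/m^2, G in Pa); result in Pa*m^0.5.
    """
    return math.sqrt(2.0 * shear_modulus * e_usf / (1.0 - poisson_ratio))


# Equiatomic W-Re: the average of 173 and 150 GPa.
g_wre = linear_combination({"W": 1, "Re": 1}, ELEMENT_SHEAR_GPA)

# Arbitrary illustrative inputs: G = 100 GPa, E_USF = 1 J/m^2, nu = 0.3.
k_example = rice_toughness(e_usf=1.0, shear_modulus=100e9, poisson_ratio=0.3)
```

For the equiatomic W-Re example the linear combination gives 161.5 GPa. In this sketch the same weighting function could also be applied to the elemental stacking fault energy and Poisson's ratio before the assumed criterion is evaluated.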
Inverse design
We next demonstrate how the trained model can be used to perform the inverse design of HEA
compositions with targeted shear modulus and fracture toughness values. By supplying a conditioning vector with desired
property values, the generator can be biased towards compositions likely to exhibit those properties. As seen
in Figure 6, even though the generated compositions do not produce the exact desired value of shear
modulus, they do appear to come from regions of the latent space which are better aligned with the desired
outcome. This effect can be observed from the sample compositions in Figure 6. With the increasing value
of shear modulus, the frequency of elements like W, Re, and Ru with high elemental shear modulus (173,
150, and 149 GPa, respectively) increases, while that of elements like Hf, Mo, and Zr with low elemental shear
modulus (30.4, 19.7, and 32.7 GPa, respectively) decreases. Thus, the cGAN model chooses appropriate elements to
generate compositions that best approach the target properties.

While targets (A-C) in Figure 6 appear reasonably well-matched, the generator struggles with (D),
corresponding to a shear modulus of 120 GPa. As shown in Figure 5A, there are not many compositions in
our training data that exhibit approximated shear modulus in excess of 100 GPa. As a consequence, the
generator is biased against creating valid compositions that match the imposed condition. Thus, the
generator resorts to creating compositions with a broad range of shear modulus values above and below the
target to compensate.

Figure 4. Comparison of real (top row) and generated (bottom row) compositions. (A) Correlation between pairs of elements.
Increasing value of red indicates element pair more likely to appear in HEA composition, increasing value of blue indicates element pair
less likely to appear in HEA composition. (B) Number of different elements present in each alloy. (C) Some sample compositions. Each
column represents an alloy, according to the number density of each element. The intensity of blue indicates the atomic fraction of the
element in the composition.

Figure 5. Comparison of reference and cGAN (A) shear modulus and (B) fracture toughness values for the compositions in our
database.

Moreover, when specific values of fracture toughness are not requested from the generator, increasing the
value of shear modulus naturally leads to increased fracture toughness in the generated compositions, as
seen in Figure 6. This is a result of the general correlation between these two properties shown in Figure 7.
Therefore, the cGAN model implicitly learns the correlation between the shear modulus and fracture
toughness values and tends to generate compositions with accordant values of shear modulus and fracture
toughness (as shown by points b and c in Figure 7).

Figure 6. Histograms of shear modulus and fracture toughness (top) and sample compositions (bottom) generated by fixing the shear
modulus values at (A) 30 GPa, (B) 60 GPa, (C) 90 GPa, and (D) 120 GPa. Each column represents an alloy, according to the number
density of each element. The intensity of blue indicates a greater number of compositions with the corresponding values of shear
modulus and fracture toughness in the top plots and the atomic fraction of the element in the composition in the bottom plots.

Figure 7. (A) Correlation between shear modulus and fracture toughness values of the real compositions. a, b, c and d represent four
conditioning cases of interest. (B) Histograms of shear modulus and fracture toughness for compositions generated using the
conditions shown in panel (A). The intensity of blue in the histograms indicates a greater number of compositions with the
corresponding values of shear modulus and fracture toughness.

Figure 8. Sample compositions generated using conditions specified in Figure 7. Each column represents an alloy, according to the
number density of each element. The intensity of blue indicates the atomic fraction of the element in the composition.

Discovering novel alloys rather than simply sampling from known compositions often requires that the
cGAN model be able to generate compositions that have opposing values of these properties (e.g., high
shear modulus with low fracture toughness). We generated an ensemble of compositions (shown in
Figure 8) to evaluate this capability with both properties specified in the conditioning vector. This results in
interesting trends, such as more varied elemental compositions for case c and W-dominant compositions in
case b. In addition, compositions generated using opposing conditions a and d tend to rely on a few
elements like Nb and Ta in both cases, while elements like Mo/Cr and Ir/Re appear exclusively in cases a
and d, respectively. The predominance of a single element in these cases shows that the generator is relying
on some particular elements with unusual properties to achieve these opposing objectives.

CONCLUSIONS AND OUTLOOK


Generative deep learning is impacting a range of scientific fields, and materials informatics is no exception.
The complex relationships and high-dimensional design spaces intrinsic to materials make this a
compelling domain for testing the efficacy of generative models in solving real-world problems. For
example, we have shown preliminary progress towards the inverse design of refractory HEAs using a cGAN.
With only a few hundred observed HEA compositions from the literature, our model was able to capture
important trends in the data and reproduce realistic-looking compositions.

We demonstrated the ability of the trained model to design new alloys with targeted properties based on a
learned correlation between approximated mechanical properties and the latent code used by the generator.
While it does not produce a perfect match, this conditioning strongly biases the types of compositions
generated by the model. Notably, the generator struggled when pushed to the limits of the training data
domain and when the conditioning reflected rare corner cases, pointing to the need for new
computational or experimental data. This is an important obstacle to address if the model is to be used to
explore new alloy compositions with exceptional properties and points to a promising avenue of “hybrid
methods” which use both generative deep learning models and conventional physics-based models to
maximize new information gained in each iteration of computation and synthesis.

Overall, we believe these generative models are a promising new approach to materials design that will be
put to best use in conjunction with more conventional computational techniques. In our case study of
HEA design, we employ them as an inexpensive, low-fidelity approach to generate new and interesting
samples automatically paired with more expensive, high-fidelity validation steps. As innovation in deep
learning has been incredibly fast-paced in recent years, in part due to large investments by industry, a key
challenge to making the most of these technologies is modifying architectures developed for other problems
like computer vision to work for materials design. Ultimately this presents more opportunities than
obstacles since it should allow for constantly improving models as researchers learn general strategies for
model adaptation and use them to guide other well-established techniques.

DECLARATIONS
Authors’ contributions
Conception and design of the study: Debnath A, Krajewski AM, Sun H, Lin S, Ahn M, Li W, Priya S,
Singh J, Shang S, Beese AM, Liu ZK, Reinhart WF
Data analysis, visualization, and interpretation: Debnath A, Krajewski AM
Generative modeling and inverse design: Debnath A
Data ecosystem software and curation: Krajewski AM
Fracture toughness modeling: Sun H, Shang S
Data collection: Debnath A, Krajewski AM, Sun H, Lin S, Ahn M, Li W
Writing: Debnath A, Krajewski AM, Liu ZK, Reinhart WF
Review and editing: Debnath A, Krajewski AM, Sun H, Lin S, Ahn M, Li W, Priya S, Singh J, Shang S,
Beese AM, Liu ZK, Reinhart WF
Resources, supervision, and project administration: Liu ZK, Shang S, Priya S, Singh J, Beese AM, Reinhart
WF
Availability of data and materials
The data used to generate the results presented in this paper have not been published at the time of writing
because the research is still ongoing. However, the authors intend to make the data publicly available in the
future through a purpose-built database of refractory HEAs, currently being built following the FAIR principles (
https://www.go-fair.org/fair-principles/). The database, built around the ecosystem presented in Figure 2, will
feature highly available data and an application programming interface to access it, in order to allow rapid
ML model development by other researchers.
Financial support and sponsorship
The present work is based upon work supported by the Department of Energy/Advanced Research Projects
Agency-Energy (ARPA-E) under Award No. DE-AR0001435.
Conflicts of interest
All authors declared that there are no conflicts of interest.
Ethical approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Copyright
© The Author(s) 2021.

REFERENCES
1. NAE Grand Challenges for Engineering. 2017. Available from: http://www.engineeringchallenges.org/. [Last accessed on 31 Aug
2021].
2. Goodfellow I, Bengio Y, Courville A. Deep learning. Cambridge: MIT press; 2016.
3. Tian Y, Peng X, Zhao L, Zhang S, Metaxas DN. Cr-gan: learning complete representations for multi-view generation. Proceedings of
the Twenty-Seventh International Joint Conference on Artificial Intelligence; Main track; 2018. p. 942-8. DOI
4. Sanchez-Lengeling B, Aspuru-Guzik A. Inverse molecular design using machine learning: Generative models for matter engineering.
Science 2018;361:360-65. DOI PubMed
5. Bhowmik A, Castelli IE, Garcia-lastra JM, Jørgensen PB, Winther O, Vegge T. A perspective on inverse design of battery interphases
using multi-scale modelling, experiments and generative deep learning. Energy Storage Materials 2019;21:446-56. DOI
6. Chen CT, Gu GX. Generative deep neural networks for inverse materials design using backpropagation and active learning. Adv Sci
(Weinh) 2020;7:1902607. DOI PubMed PMC
7. Yeung C, Tsai R, Pham B, et al. Global inverse design across multiple photonic structure classes using generative deep learning. Adv
Optical Mater 2021. DOI
8. Dan Y, Zhao Y, Li X, Li S, Hu M, Hu J. Generative adversarial networks (GAN) based efficient sampling of chemical composition
space for inverse design of inorganic materials. npj Comput Mater 2020;6:84. DOI
9. Dong Y, Li D, Zhang C, et al. Inverse design of two-dimensional graphene/h-BN hybrids by a regressional and conditional GAN.
Carbon 2020;169:9-16. DOI
10. Iyer A, Dey B, Dasgupta A, Chen W, Chakraborty A. A conditional generative model for predicting material microstructures from
processing methods. arXiv preprint arXiv:1910.02133 2019.
11. Senkov ON, Miracle DB, Chaput KJ, Couzinie J. Development and exploration of refractory high entropy alloys-A review. J Mater
Res 2018;33:3092-128. DOI
12. Philips NR, Carl M, Cunningham NJ. New opportunities in refractory alloys. Metall Mater Trans A 2020;51:3299-310. DOI
13. Melia MA, Whetten SR, Puckett R, et al. High-throughput additive manufacturing and characterization of refractory high entropy
alloys. Applied Materials Today 2020;19:100560. DOI
14. Chen J, Zhou X, Wang W, et al. A review on fundamental of high entropy alloys with promising high-temperature properties. J Alloys
Compd 2018;760:15-30. DOI
15. Liu Z. Ocean of Data: Integrating first-principles calculations and CALPHAD modeling with machine learning. J Phase Equilib Diffus
2018;39:635-49. DOI
16. Li Q, Chen W, Zhong J, Zhang L, Chen Q, Liu Z. On sluggish diffusion in Fcc Al-Co-Cr-Fe-Ni high-entropy alloys: an experimental
and numerical study. Metals 2018;8:16. DOI
17. Wu Y, Si J, Lin D, et al. Phase stability and mechanical properties of AlHfNbTiZr high-entropy alloys. Mater Sci Eng A Struct Mater
2018;724:249-59. DOI
18. Wen C, Zhang Y, Wang C, et al. Machine learning assisted design of high entropy alloys with desired property. Acta Materialia
2019;170:109-17. DOI
19. Krajewski AM, Siegel JW, Xu J, Liu ZK. Extensible structure-informed prediction of formation energy with improved accuracy and
usability employing neural networks. arXiv preprint arXiv:2008.13654 2020. DOI
20. Tawfik SA, Isayev O, Spencer MJS, Winkler DA. Predicting thermal properties of crystals using machine learning. Adv Theory Simul
2020;3:1900208. DOI
21. Chibani S, Coudert F. Machine learning approaches for the prediction of materials properties. APL Materials 2020;8:080701. DOI
22. Goodall REA, Lee AA. Predicting materials properties without crystal structure: deep representation learning from stoichiometry. Nat
Commun 2020;11:6280. DOI PubMed PMC
23. Schleder GR, Padilha ACM, Acosta CM, Costa M, Fazzio A. From DFT to machine learning: recent approaches to materials science-a
review. J Phys Mater 2019;2:032001. DOI
24. Schmidt J, Marques MRG, Botti S, Marques MAL. Recent advances and applications of machine learning in solid-state materials
science. npj Comput Mater 2019:5. DOI
25. Dai D, Xu T, Wei X, et al. Using machine learning and feature engineering to characterize limited material datasets of high-entropy
alloys. Computational Materials Science 2020;175:109618. DOI
26. Kim G, Diao H, Lee C, et al. First-principles and machine learning predictions of elasticity in severely lattice-distorted high-entropy
alloys with experimental validation. Acta Materialia 2019;181:124-38. DOI
27. Qu N, Liu Y, Liao M, et al. Ultra-high temperature ceramics melting temperature prediction via machine learning. Ceramics
International 2019;45:18551-5. DOI
28. Yu J, Guo S, Chen Y, et al. A two-stage predicting model for γ′ solvus temperature of L12-strengthened Co-base superalloys based on
machine learning. Intermetallics 2019;110:106466. DOI
29. Ruan J, Xu W, Yang T, et al. Accelerated design of novel W-free high-strength Co-base superalloys with extremely wide γ/γʹ region
by machine learning and CALPHAD methods. Acta Materialia 2020;186:425-33. DOI
30. Jha R, Chakraborti N, Diercks DR, Stebner AP, Ciobanu CV. Combined machine learning and CALPHAD approach for discovering
processing-structure relationships in soft magnetic alloys. Computational Materials Science 2018;150:202-11. DOI
31. Nomoto S, Segawa M, Wakameda H. Non-equilibrium phase field model using thermodynamics data estimated by machine learning
for additive manufacturing solidification. Solid Freeform Fabrication 2018: Proceedings of the 29th Annual International Solid
Freeform Fabrication Symposium - An Additive Manufacturing Conference; Austin, TX, USA. 2020. p. 1875-86.
32. Huang W, Martin P, Zhuang HL. Machine-learning phase prediction of high-entropy alloys. Acta Materialia 2019;169:225-36. DOI
33. Li Y, Guo W. Machine-learning model for predicting phase formations of high-entropy alloys. Phys Rev Materials 2019;3:95005.
DOI
34. Kaufmann K, Maryanovsky D, Mellor WM, et al. Discovery of high-entropy ceramics via machine learning. npj Comput Mater
2020;6:42. DOI
35. Flam-Shepherd D, Wu T, Aspuru-Guzik A. Graph deconvolutional generation. eprint arXiv:2002.07087 2020.
36. Kim S, Noh J, Gu GH, Aspuru-Guzik A, Jung Y. Generative adversarial networks for crystal structure prediction. ACS Cent Sci
2020;6:1412-20. DOI PubMed PMC
37. Goodfellow I. NIPS 2016 tutorial: generative adversarial networks. eprint arXiv:1701.00160 2016.
38. Aggarwal K, Kirchmeyer M, Yadav P, Keerthi SS, Gallinari P. Regression with Conditional GAN. arXiv preprint arXiv:1905.12868
2019.
39. Nguyen P, Tran T, Gupta S, Rana S, Venkatesh S. Hybrid generative-discriminative models for inverse materials design. eprint
arXiv:1811.06060 2018.
40. Yao Z, Sanchez-Lengeling B, Bobbitt NS, et al. Inverse design of nanoporous crystalline reticular materials with deep generative
models. ChemRxiv 2020. DOI
41. Lim J, Ryu S, Kim JW, Kim WY. Molecular generative model based on conditional variational autoencoder for de novo molecular
design. J Cheminform 2018;10:31. DOI PubMed PMC
42. Bao J, Chen D, Wen F, Li H, Hua G. CVAE-GAN: fine-grained image generation through asymmetric training. In: Proceedings of the
IEEE international conference on computer vision; 2017. p. 2745-54.
43. Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial networks. Proceedings of the 34th International Conference on
Machine Learning; PMLR; 2017. p. 214-23.
44. Li K, Malik J. On the implicit assumptions of gans. eprint arXiv:1811.12402 2018.
45. ULTERA MongoDB. Available from: https://phaseslab.com/ultera/. [Last accessed on 31 Aug 2021].
46. Metz L, Poole B, Pfau D, Sohl-Dickstein J. Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163 2016.
47. Liu Z. First-principles calculations and CALPHAD modeling of thermodynamics. J Phase Equilib Diffus 2009;30:517-34. DOI
48. Chong X, Shang SL, Krajewski AM, et al. Correlation analysis of materials properties by machine learning: illustrated with stacking
fault energy from first-principles calculations in dilute fcc-based alloys. J Phys Condens Matter 2021;33:295702. DOI PubMed
49. Rice JR. Dislocation nucleation from a crack tip: an analysis based on the Peierls concept. J Mech Phys Solids 1992;40:239-71. DOI
