Wu et al. - 2020 - iQSPR in XenonPy: A Bayesian Molecular Design Algorithm
DOI: 10.1002/minf.201900107
Abstract: iQSPR is an inverse molecular design algorithm based on Bayesian inference that was developed in our previous study. Here, the algorithm is integrated in Python as a new module called iQSPR-X in the all-in-one materials informatics platform XenonPy. Our new software provides a flexible, easy-to-use, and extensible platform for users to build customized molecular design algorithms using pre-set modules and a pre-trained model library in XenonPy. In this paper, we describe key features of iQSPR-X and provide guidance on its use, illustrated by an application to a polymer design that targets a specific range of bandgap and dielectric constant.
Keywords: molecular design · machine learning · Bayesian inference · open source · polymer
© 2019 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA Mol. Inf. 2020, 39, 1900107 (1 of 9) 1900107
18681751, 2020, 1-2, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/minf.201900107 by Kanazawa University, Wiley Online Library on [29/05/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Full Paper www.molinf.com
…variational autoencoders,[12–15] generative adversarial networks,[16] recurrent neural networks,[17,18] and so on. These methods have been able to produce diverse chemical structures; however, they often require large training datasets to obtain a DNN-based generator that can produce chemically realistic molecules with grammatically valid SMILES. Datasets this large are unavailable in many applications. Furthermore, many of these methods generate chemically or grammatically invalid representations of molecules at relatively high rates, unless their hyperparameters are carefully tuned.[19–21]

Some previous works considered simpler generative models to avoid the need to train the model with a large dataset. Yoshikawa et al.[22] exploited a grammatical evolution method with parallel computation to generate a diverse set of candidate molecules conditional on arbitrarily given design targets. Ikebata et al.[23] combined a simple probabilistic language model based on an n-gram representation of SMILES sequences with a Bayesian inference framework to sequentially modify a population of molecules into promising candidate molecules that would exhibit desired properties. For a more complete review of the above methods, Schwalbe-Koda and Gómez-Bombarelli[24] have provided a detailed overview of recent developments in inverse molecular design.

In this paper, we introduce iQSPR-X, a flexible software package constructed to implement the Bayesian molecular design algorithm iQSPR, which was developed in our previous work.[23] The algorithm was implemented in XenonPy, a Python package with an integrated platform of materials informatics.[25] In contrast to the original iQSPR algorithm developed in R, the new version allows users to exploit various features of XenonPy as described below. The basic computational workflow consists of a two-step iteration: (1) current chemical structures are modified to new ones using a generator, and (2) candidate molecules that show promise for desired properties are selected using an evaluator, which is a set of ML models for predicting material properties. The generator and the evaluator can be pre-trained separately with given training instances. Users can either train new models from scratch or reuse relevant pre-trained models from a model library in XenonPy, which covers a broad array of material properties for small molecules and polymers. In addition, when the available data on the structure-property relationship for a target task are limited, directly obtaining a reliable prediction model is difficult. However, an ML technique called transfer learning can be used to extract knowledge relevant to the target task from a large set of pre-trained models to help train new models more efficiently.[26] Successful application of the iQSPR method, in conjunction with transfer learning to overcome limited polymeric property data, was demonstrated in our previous study, which achieved the discovery of new polymers with high thermal conductivity.[27] A set of tutorials distributed as Jupyter notebooks is available at the website of XenonPy,[25] and these include detailed explanations and sample codes for building customized generators and evaluators, performing the inverse design calculations, and using some of the convenient modules in XenonPy. In this paper, we highlight some key features of iQSPR-X and describe its application to the task of designing polymers using data from Polymer Genome (PG).[28,29]

2 Computational Methods

2.1 Bayesian Molecular Design

The primary task of the Bayesian molecular design is to draw a set of samples from the posterior distribution P(S | Y ∈ U), which represents the conditional probability of observing a chemical structure S, given material properties Y = {Yi | i = 1,…,m}, that lies in a target region U. In the iQSPR-X implementation, S is encoded as a SMILES string; i.e., S = s1s2…sn, where si is any valid character in SMILES. For example, phenol (C6H6O) can be represented by the SMILES string "C1=CC=C(C=C1)O", where C and O denote the carbon and oxygen atoms, respectively; "=" denotes a double bond; the two "1" digits denote the opening and closing of the ring structure; and the parentheses denote the beginning and ending of the branching component.

According to Bayes' theorem, a posterior distribution is proportional to the product of a likelihood function and a prior distribution:

P(S | Y ∈ U) ∝ P(Y ∈ U | S) P(S)

where P(Y ∈ U | S) represents the likelihood function that evaluates the goodness-of-fit of S with respect to the given property requirement Y ∈ U, and P(S) represents the prior probability that S belongs to a predefined search space of SMILES strings. Thus, P(S) will deliver a small or even zero probability when presented with an unfavorable or chemically unrealistic structure, thereby acting as a filter for such out-of-scope or invalid structures. In iQSPR-X, a sequential Monte Carlo algorithm proposed by Ikebata et al.[23] is implemented. This algorithm is somewhat similar to a genetic algorithm. With a given set of initial samples S0 = {Si0 | i = 1,…,N} of size N, the pre-trained prior is used as a generator to propose a new set of samples S0'. A fitness score is then assigned to each sample in S0' using the likelihood, which is the evaluator in iQSPR-X. By resampling N samples from S0' in proportion to the fitness scores, a refined set S1 is obtained and once again modified by the generator. This cycle is repeated T times to obtain a final sample set ST.

There are three important building blocks in this algorithm: the generator (prior), the evaluator (likelihood), and the descriptor φ(S). When building models for the evaluator, we encode a chemical structure into a descriptor vector φ(S) using, for example, a molecular fingerprinting algorithm. Using training instances {(Yk, Sk) | k = 1,…,Ndata} on
the structure-property relationships, we then derive a model that describes the material properties Y as a function of the descriptor φ(S), defining Ŷ = μ(φ(S)) with the trained model μ. Although iQSPR-X allows users to plug in customized functions for each building block, we also provide some commonly used functions internally, and these can be directly called from the package. For the descriptor, all available fingerprint types in RDKit[30] and the Python descriptor package Mordred[31] are available by default. Users can alternatively use a set of features extracted from pre-trained neural networks in the XenonPy model library, as described in the next section. For the evaluator, a Gaussian likelihood is given as a choice with any user-defined model μi(φ(S)) and the standard deviation σi(φ(S)), which represents the uncertainty of predicted properties:

P(Y ∈ U | S) = ∏_{i=1}^{m} ∫_{Ui} N(yi | μi(φ(S)), σi(φ(S))²) dyi

where U is the target region in the m-dimensional property space, and μi(φ(S)) and σi(φ(S)) are the mean and standard deviation for the ith property, respectively, obtained from ML models with input φ(S). For the generator, the extended n-gram model developed by Ikebata et al.[23] can be used by training it with any chemical structures given in SMILES. The model takes the form P(S) = P(s1) ∏_{i=2}^{n} P(si | si-1, …, s1). Figure 1 summarizes the computational workflow of iQSPR-X.

2.2 Generator: Extended n-gram Model

The role of the generator is to propose new candidate molecules modified from a set of initial molecules. We implemented the extended n-gram model as an internally available function in iQSPR-X. This model consists of two components: (1) a table that records the probability of observing a subsequent character given a substring and (2) a function that modifies a given SMILES string based on the stored n-gram probability table. The table can be trained by supplying a set of SMILES strings sampled from the desired search space. The maximum length of a substring to be considered and stored in the table is controlled by the "order" parameter. In the extended n-gram model, SMILES strings are internally tokenized into a list of characters. For example, "=O" and "%10" are considered as one character, and a terminal character is automatically added at the end of each string. When proposing a new candidate molecule, the modifier function deletes a random number of characters from the end of the SMILES string, and then elongates the shortened string based on the n-gram table. Because the representation of a molecule in SMILES is not unique, a reordering of the SMILES string is probabilistically performed to avoid constantly modifying the same part of the chemical structure.

In short, the most important parameters in modelling the generator include the probability required to trigger reordering, the range of the number of letters to be deleted, and the order parameter controlling the maximum length of a substring in training and sampling the n-gram model. Users can adjust these parameters based on the expected molecule size in the targeted search space. Although SMILES is a powerful representation of chemical structures, as exemplified by its ability to handle chirality using the "@" symbol, the non-uniqueness of SMILES representations may lead to subtle effects in certain usages. For example, the aromatic ring in phenol can be represented as "C1=CC=CC=C1" or "c1ccccc1". We recommend that users not mix different representations of the same molecular structure when training the extended n-gram model.
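To make the mechanics concrete, the following is a deliberately minimal, self-contained Python sketch of a character-level n-gram generator together with the sequential Monte Carlo loop of Section 2.1. It is an illustration of the idea only, not the XenonPy implementation: the real iQSPR-X model tokenizes multi-character SMILES units such as "=O" and "%10", performs the probabilistic reordering described above, and returns chemically meaningful candidates; the training alphabet, fitness function, and parameter names below are placeholders.

```python
import random
from collections import defaultdict

def train_ngram(smiles_list, order=3):
    """Count next-character frequencies for every context of length 1..order.
    "$" plays the role of the terminal character appended to each string."""
    table = defaultdict(lambda: defaultdict(int))
    for s in smiles_list:
        chars = list(s) + ["$"]
        for i, c in enumerate(chars):
            for n in range(1, order + 1):
                if i - n < 0:
                    break
                table["".join(chars[i - n:i])][c] += 1
    return table

def elongate(prefix, table, order=3, max_len=40):
    """Regrow a shortened SMILES string, backing off to shorter contexts."""
    s = list(prefix)
    while len(s) < max_len:
        nxt = None
        for n in range(min(order, len(s)), 0, -1):
            context = "".join(s[-n:])
            if context in table:
                cands, weights = zip(*table[context].items())
                nxt = random.choices(cands, weights=weights)[0]
                break
        if nxt is None or nxt == "$":  # unknown context, or terminal drawn
            break
        s.append(nxt)
    return "".join(s)

def propose(s, table, order=3, del_range=(1, 5)):
    """Modifier: delete a random number of trailing characters, then regrow.
    (The probabilistic SMILES-reordering step of iQSPR-X is omitted here.)"""
    n_del = random.randint(*del_range)
    return elongate(s[: max(1, len(s) - n_del)], table, order)

def smc_design(init_pop, table, fitness, steps=5):
    """Sequential Monte Carlo: propose with the generator, score with the
    likelihood (the evaluator), and resample in proportion to fitness."""
    pop = list(init_pop)
    for _ in range(steps):
        proposals = [propose(s, table) for s in pop]
        scores = [fitness(s) for s in proposals]
        if sum(scores) == 0:
            continue  # all proposals scored zero; keep the current population
        pop = random.choices(proposals, weights=scores, k=len(pop))
    return pop
```

In actual use, the fitness function would be the Gaussian likelihood of a trained evaluator, and the training corpus would be a set of SMILES strings sampled from the desired search space.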
Figure 1. Computational workflow in iQSPR-X with three main building blocks that users can flexibly construct: the generator, the evaluator, and the converter that translates an input chemical structure into a descriptor vector.

2.4 Pre-trained Neural Descriptors in XenonPy

One of the most distinctive features of our software is the availability in XenonPy of a comprehensive set of pre-trained neural features for use as the descriptor φ(S). The sampling efficiency of iQSPR-X is highly influenced by the reliability of the evaluator that predicts the material properties for any given chemical structure. Building such models from scratch is often time-consuming and requires a large set of training data, which is not available in many applications. XenonPy currently provides 140,000 pre-trained neural networks for the prediction of physical, chemical, electronic, thermodynamic, and mechanical properties of small organic molecules, polymers, and inorganic crystalline materials, with models for 15, 18, and 12 properties of these material types, respectively. The models are distributed as MXNet[32] (R) and/or PyTorch[33] (Python) model objects. The distributed API (application programming interface) allows users to query the XenonPy.MDL database. Users can directly use a retrieved model relevant to the target task, if available, or can re-train a pre-trained model on the target task using a transfer learning technique as described below. Transfer learning has significant potential to overcome the problem of limited materials property data, as demonstrated in our previous study[26] for various materials science tasks. Other studies have also shown promising applications of transfer learning in materials informatics.[18,34–41]

In this study, we applied a specific type of transfer learning using pre-trained neural networks. For a target property, a neural network pre-trained on proxy properties is available in the library, where the source datasets are sufficiently large. If the two properties are physically or chemically interrelated, the pre-trained models can be expected to autonomously acquire common features relevant to the proxy properties. The features learned by solving the related tasks are partially transferable to the descriptor φ(S) in a model constructed for the target task. In general, earlier or shallower layers in a neural network tend to acquire general features to form the basis of the material descriptions, and only the last one or two layers identify specific features for the prediction of a source property. In iQSPR-X, we freeze the shallower layers for use as a feature extractor. A subnetwork φ(S) of such a pre-trained model can be reused in the supervised learning of the target property. To simplify the implementation of the repetitious tasks of neural descriptor extraction, XenonPy provides users with an internal function to extract values from any hidden layer in a pre-trained neural network. With its large library of pre-trained models and wide range of built-in descriptors, XenonPy provides a strong foundation for flexibly arranging the necessary building blocks of the iQSPR algorithm.

3 Results and Discussion

3.1 Data

We used data from PG to illustrate the use of iQSPR-X based on an example motivated by a previous study on polymer design.[28] PG is an open database for polymeric properties that currently contains 854 polymers composed of nine types of atoms (H, C, O, N, S, F, Cl, Br, and I) with experimental data for three material properties (glass transition temperature, density, and solubility parameter) and computational data from density functional theory (DFT) for four material properties (bandgap (Egap), refractive index, dielectric constant (ɛtot), and atomization energy). Using a subset of the data (4-block polymers composed of CH2, NH, CO, C6H4, C4H2S, CS, and O), Mannodi-Kanakkithodi et al.[28] designed 6- to 12-block polymers with high ɛtot for insulator applications using ML models and a genetic algorithm. They were specifically interested in polymers with higher ɛtot and Egap, and this goal was adopted in our example. The given data of the chemical structures S and their material properties were used to train the generator and the evaluator. Here, we considered S to be the SMILES strings of the repeating polymer units. The connection points, i.e., the head and tail of a monomer, were denoted as "*".

In PG, the lowest-energy crystal structures of the polymers were used for the DFT calculation. For each polymer, Egap was computed using a hybrid Heyd-Scuseria-Ernzerhof (HSE06) electronic exchange-correlation functional, and ɛtot, which is the sum of the electronic and ionic dielectric constants, was computed using density functional perturbation theory (DFPT). Mannodi-Kanakkithodi et al.[28] have detailed this computational procedure. As shown in Figure 2a, we observed an inverse relation between ɛtot and Egap. Polymers containing thiophene (C4H2S) tended to reach high ɛtot, but generally had low Egap. In contrast, polymers containing fluorine (F) atoms tended to reach high Egap, but generally had low ɛtot. However, in contrast to the enrichment offered by either C4H2S or F atoms, polymers exhibiting high ɛtot and high Egap tended to be composed of CH2, NH, CO, C6H4, and O.[28] The design objective was to solve this nontrivial trade-off problem.

3.2 Training the Generator

In this study, we considered two ways to train the extended n-gram model. First, we used all 854 polymers in PG as a training set, which covered a wide variety of polymers. Second, we focused only on specific types of chemical structures that shared some common features, taken from other data sources. In practice, users may often be interested in designing a specific class of molecules. Here, we explored F-containing polymers with high ɛtot and Egap. In particular, we focused on a training set containing the …
…constructed neural networks, each having six fully connected layers, were trained with either the ɛtot or Egap datasets; the number of epochs was 2,000 and the dropout rate was 0.1. The dataset was randomly separated into training and validation sets at a ratio of 8:2. Figure 4 shows …

Figure 4. Box-plots of the MAEs across different fingerprint descriptors evaluated on the validation datasets of either ɛtot or Egap. APFP denotes the atom pair fingerprints, ECFP denotes the non-feature-based radius-3 Morgan fingerprints, FCFP denotes the feature-based radius-3 Morgan fingerprints, TTFP denotes the topological torsion fingerprints, RDKit denotes the basic fingerprints in RDKit, and +M denotes the addition of the MACCS keys.

… regressor under the default setting in scikit-learn. The mean and standard deviation of the predicted values from the 10 trained models were taken as μ and σ, respectively.

For the random forest approach, the forestci package was used along with the random forest method in scikit-learn to calculate μ and σ. The number of trees was set to 500 and the "max_features" option was selected to be "sqrt".

For the Bayesian linear regression with neural descriptors, we began by selecting a pre-trained model from the model library in XenonPy for each of the two target properties. The 100 pre-trained neural networks of ɛtot and Egap were modified such that the last hidden layers were connected to Bayesian linear regressors, and the prediction performances of the models were then evaluated by the 10-fold CV applied to the training data within each fold of the five-fold CV. Each of the models of ɛtot and Egap that achieved the overall lowest MAE was selected, and their last hidden layers were concatenated to form a new neural descriptor. This descriptor was used to replace the originally selected descriptor in the default Gaussian likelihood function. Finally, this evaluator was trained with the full training data within each fold of the five-fold CV.

Figure 5 shows the performance of each model on the five-fold CV for the ɛtot and Egap datasets. The bagging approach with the gradient boosting model achieved the best overall performance and was therefore selected for the inverse design calculation.
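As a schematic of how ensemble predictions feed the Gaussian fitness score, the snippet below computes μ and σ from a list of per-property prediction functions and then scores the probability mass that a Normal(μ, σ²) prediction places inside a target window [lo, hi]. The tiny linear "models" in the usage example are placeholders standing in for, e.g., the 10 bagged gradient boosting regressors or the forestci-based random forest discussed above; this is a sketch of the principle, not the iQSPR-X code.

```python
import math
from statistics import mean, stdev

def ensemble_mu_sigma(models, x):
    """Mean and spread of ensemble predictions: the evaluator's mu and sigma."""
    preds = [m(x) for m in models]
    return mean(preds), (stdev(preds) if len(preds) > 1 else 0.0)

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def target_region_likelihood(mu, sigma, lo, hi):
    """P(lo <= Y <= hi) under N(mu, sigma^2): the per-property fitness."""
    if sigma <= 0:
        return 1.0 if lo <= mu <= hi else 0.0
    return normal_cdf((hi - mu) / sigma) - normal_cdf((lo - mu) / sigma)

def fitness(models_per_property, targets, x):
    """Product over properties, mirroring P(Y in U | S) = prod_i P(Y_i in U_i | S)."""
    score = 1.0
    for models, (lo, hi) in zip(models_per_property, targets):
        mu, sigma = ensemble_mu_sigma(models, x)
        score *= target_region_likelihood(mu, sigma, lo, hi)
    return score
```

In the sequential Monte Carlo loop, `fitness` plays the role of the likelihood P(Y ∈ U | S) used to resample candidate structures.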
[10] B. Sanchez-Lengeling, A. Aspuru-Guzik, Science 2018, 361, 360–365.
[11] D. Weininger, J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
[12] M. J. Kusner, B. Paige, J. M. Hernández-Lobato, PMLR 2017, 70, 1945–1954.
[13] A. Kadurin, S. Nikolenko, K. Khrabrov, A. Aliper, A. Zhavoronkov, Mol. Pharmaceutics 2017, 14, 3098–3104.
[14] J. Lim, S. Ryu, J. W. Kim, W. Y. Kim, J. Cheminf. 2018, 10, 31.
[15] R. Gómez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams, A. Aspuru-Guzik, ACS Cent. Sci. 2018, 4, 268–276.
[16] B. Sanchez-Lengeling, C. Outeiral, G. L. Guimaraes, A. Aspuru-Guzik, ChemRxiv 2017, 10.26434/chemrxiv.5309668.v3.
[17] X. Yang, J. Zhang, K. Yoshizoe, K. Terayama, K. Tsuda, Sci. Technol. Adv. Mater. 2017, 18, 972–976.
[18] M. H. S. Segler, T. Kogej, C. Tyrchan, M. P. Waller, ACS Cent. Sci. 2018, 4, 120–131.
[19] W. Jin, R. Barzilay, T. Jaakkola, PMLR 2018, 80, 2323–2332.
[20] H. Kajino, arXiv 2018, 1809.02745.
[21] N. Brown, M. Fiscato, M. H. S. Segler, A. C. Vaucher, J. Chem. Inf. Model. 2019, 59, 1096–1108.
[22] N. Yoshikawa, K. Terayama, M. Sumita, T. Homma, K. Oono, K. Tsuda, Chem. Lett. 2018, 47, 1431–1434.
[23] H. Ikebata, K. Hongo, T. Isomura, R. Maezono, R. Yoshida, J. Comput.-Aided Mol. Des. 2017, 31, 379–391.
[24] D. Schwalbe-Koda, R. Gómez-Bombarelli, arXiv 2019, 1907.01632.
[25] XenonPy, https://xenonpy.readthedocs.io/en/latest/, last accessed on July 20, 2019.
[26] H. Yamada, C. Liu, S. Wu, Y. Koyama, S. Ju, J. Shiomi, J. Morikawa, R. Yoshida, ACS Cent. Sci. 2019, online pre-release.
[27] S. Wu, Y. Kondo, M.-A. Kakimoto, B. Yan, H. Yamada, I. Kuwajima, G. Lambard, K. Hongo, Y. Xu, J. Shiomi, C. Schick, J. Morikawa, R. Yoshida, npj Comput. Mater. 2019, 5, 66.
[28] A. Mannodi-Kanakkithodi, G. Pilania, T. D. Huan, T. Lookman, R. Ramprasad, Sci. Rep. 2016, 6, 20952.
[29] C. Kim, A. Chandrasekaran, T. D. Huan, D. Das, R. Ramprasad, J. Phys. Chem. C 2018, 122, 17575–17585.
[30] G. Landrum, http://www.rdkit.org, last accessed on July 20, 2019.
[31] H. Moriwaki, Y.-S. Tian, N. Kawashita, T. Takagi, J. Cheminf. 2018, 10, 4.
[32] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, Z. Zhang, arXiv 2015, 1512.01274.
[33] PyTorch, https://pytorch.org/, last accessed on July 20, 2019.
[34] M. L. Hutchinson, E. Antono, B. M. Gibbons, S. Paradiso, J. Ling, B. Meredig, arXiv 2017, 1711.05099.
[35] H. Oda, S. Kiyohara, K. Tsuda, T. Mizoguchi, J. Phys. Soc. Jpn. 2017, 86, 123601.
[36] R. Jalem, K. Kanamori, I. Takeuchi, M. Nakayama, H. Yamasaki, T. Saito, Sci. Rep. 2018, 8, 5845.
[37] T. Yonezu, T. Tamura, I. Takeuchi, M. Karasuyama, Phys. Rev. Mater. 2018, 2, 113802.
[38] B. Kailkhura, B. Gallagher, S. Kim, A. Hiszpanski, T. Y.-J. Han, arXiv 2019, 1901.02717.
[39] E. D. Cubuk, A. D. Sendek, E. J. Reed, J. Chem. Phys. 2019, 150, 214701.
[40] X. Li, Y. Zhang, H. Zhao, C. Burkhart, L. C. Brinson, W. Chen, Sci. Rep. 2018, 8, 13461.
[41] M. Kaya, S. Hajimirza, Sci. Rep. 2019, 9, 5034.
[42] S. Kim, J. Chen, T. Cheng, A. Gindulyte, J. He, S. He, Q. Li, B. A. Shoemaker, P. A. Thiessen, B. Yu, L. Zaslavsky, J. Zhang, E. E. Bolton, Nucleic Acids Res. 2019, 47, D1102–D1109.
[43] S. Wager, T. Hastie, B. Efron, J. Mach. Learn. Res. 2014, 15, 1625–1651.
[44] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, J. Mach. Learn. Res. 2011, 12, 2825–2830.

Received: August 16, 2019
Accepted: October 14, 2019
Published online on November 5, 2019