Liu Et Al. - 2022 - Data-driven Multi-objective Molecular Design of Io
Article info

Article history:
Received 8 March 2022
Revised 7 June 2022
Accepted 22 June 2022
Available online 26 June 2022

Keywords:
Machine learning
Ionic liquid
Molecular design
Generative model

Abstract

Ionic liquids (ILs) are promising electrolytes or solvents for numerous applications owing to their unique properties. However, designing an IL with the required properties is a challenge. Variational autoencoders (VAEs) trained on very large datasets have shown good performance in drug discovery, but low generation efficiency and small, sparse datasets prevent their application to ILs. In this work, we propose a molecular design model for ILs with high generation efficiency, which realizes multi-objective optimization on a small dataset. The model combines a VAE, a multilayer perceptron, and particle swarm optimization for property prediction and molecule optimization. The thermal conductivity and heat capacity of ILs are chosen as a case study to verify the advantages of our model. The results show that, by adding a molecular validity judgment to the optimization target, 98% of the outputs of our method are valid molecules. In addition, the heat capacity and thermal conductivity are improved by 39% and 15%, respectively. Our model improves the applicability of VAE-like generative models to small sparse datasets and their generation efficiency. By designing ILs for given properties in a multi-objective manner, our model can provide guidance for the design and application of ILs.

© 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
X. Liu, J. Chu, Z. Zhang et al. Materials & Design 220 (2022) 110888
* Corresponding author. E-mail address: [email protected] (M. He).
https://doi.org/10.1016/j.matdes.2022.110888
0264-1275/© 2022 The Authors. Published by Elsevier Ltd.

1. Introduction

Ionic liquids (ILs) are salts that are liquid at or near room temperature and are composed entirely of cations and anions. Owing to their negligible vapor pressure and high thermal and chemical stability [1], ILs are considered excellent electrolytes for lithium-ion batteries [2,3] and new solvents for reactions, catalysis, separation, and absorption refrigeration [4-6]. Because of these benefits, ILs have also been proposed as optically responsive, optically sensitive [7], energetic [8], and other functional materials [9].

To screen ILs with ideal properties for specific applications, previous studies have focused on measuring the properties of ILs or simulating them by molecular dynamics (MD) [10,11]. Artificial intelligence has provided a new way to predict the properties of ILs that is much faster than measurement or simulation [12,13]. Can et al. predicted water solubility in ILs by association rule mining and decision trees [14]. The input of their model is a series of IL descriptors, including molecular weight, energy value of the highest/lowest occupied molecular orbital, dipole, molecular area and ovality, polarizability, hydrogen bond donor/acceptor counts, and zero-point energy. Li et al. reported that the energies of the anions' reacting orbitals are linearly correlated with their SO2 binding energies [15]. Wang et al. proposed an IL molecular screening model based on active learning [16], which screens a dataset for ILs with a desired property. Because a molecular structure cannot be deduced from its properties, these methods cannot guide molecular design. The National Institute of Standards and Technology (USA) developed an IL property database (ILThermo) [17], which covers only 1300 ILs composed of 130 anions and 780 cations. However, 10^18 possible combinations of ions could lead to potential ILs [18,19]. It is impossible to traverse all possible structures by experiment, molecular dynamics simulation, or machine learning (ML) prediction. Compared with the above methods, which can only screen ILs from a dataset, the best approach to obtain ideal ILs is an ML generative model, which is not limited to existing datasets and designs IL molecular structures automatically.

Recently, ML generative models have been widely reported and have expanded from natural language and images [20,21] to molecular design [22-24], using architectures such as the recurrent neural network (RNN) [25-27], convolutional neural network (CNN) [22], graph CNN (GCN) [28], and variational autoencoder (VAE). Among them, variations of the VAE have been widely reported for the design of drugs and high polymer materials [22,29]. The VAE contains an encoder and a decoder. To generate molecules, a VAE is usually used in conjunction with a multilayer perceptron (MLP) and gradient-based optimization. The encoder encodes the molecular structure into coordinates in a latent space, and the decoder decodes a coordinate back into a molecular structure. The aim of the VAE is to create a continuous latent space in which each coordinate represents a molecule and the distance between two coordinates represents the similarity between the two molecules. The MLP predicts the properties of the molecules and provides the gradient of the properties in the latent space. The generating process searches the latent space for the coordinate with the best property by gradient-based optimization; this search determines whether a model can design ideal molecules on demand.

The conditional VAE (CVAE) is an improved VAE that adds the property as a dimension of the latent space. The CVAE is better suited to multi-objective molecular design than gradient-based optimization on a VAE [30-32]. The optimization process of the CVAE requires the desired property value as input. The CVAE was proposed to avoid a problem of gradient-based optimization on the VAE: adjusting one target property may cause an undesired change in other properties [32]. Joo et al. used a CVAE to improve the quality of the latent space [30]: the property is added directly to the input of the encoder, and the input of the decoder also contains the desired property value. Kang et al. demonstrated the applicability of the CVAE to multi-objective optimization [31].

However, several issues must be addressed before a VAE can be applied to IL design. Firstly, the IL dataset is sparse, and most ILs do not have a one-to-one correspondence with each property. For example, in ILThermo, only 364 ILs have heat capacity data and 81 ILs have thermal conductivity data. The search in the generating process of a VAE is commonly carried out by gradient-based optimization, which has several branches: Bayesian optimization [29], Gaussian process (GP) [22], and gradient descent [33]. The MLP in these VAE models provides a gradient of the property, which enables gradient descent in the latent space to find a coordinate with good properties. However, gradient-based optimization has disadvantages. As a greedy algorithm, it stops when it finds a local optimum and has difficulty finding the global optimum. Besides, in these models the VAE and MLP are trained simultaneously on a large dataset, which makes the MLP force the distribution of coordinates to follow the property values.
Molecules with similar properties are then close to each other in the latent space, which facilitates the optimization process. If the dataset is sparse, the MLP cannot be trained together with the VAE for gradient-based optimization. In this situation, the distribution of the property has no clear relationship with the coordinates in the latent space, and it is difficult to obtain a molecule with optimal properties. The problem the CVAE faces is even more serious: it cannot be trained on a sparse dataset and can easily generate invalid molecules when the input property is significantly better than that of the existing molecules.

Secondly, the VAE and CVAE have low generation efficiency, a disadvantage that is not unique to IL design. Trained on a dataset of fewer than 2000 molecules, the valid molecules generated by a CVAE account for only about 1%: only 32 567 of the 2 884 000 attempted outputs of the CVAE model by Lim et al. are valid aspirin-like drugs [32]. For gradient descent on a VAE, usually less than 5% of the decoded results are valid molecules [34]. To overcome this defect, Blaschke et al. extended the VAE to an adversarial autoencoder (AAE) by adding a discriminator neural network, which forces the output of the encoder to follow a specific target distribution [23,29]. Kadurin et al. improved the structure and training process of the AAE to reduce reconstruction errors [35]. The AAE makes the arrangement of molecules in the latent space more compact, which makes it more likely that a random point decodes to a valid molecule. However, the AAE is not easy to train and does not always perform better than the VAE, because the adversarial process sometimes leads to mode collapse [35].

Thirdly, the dataset is too small. The dataset used in this work is collected from ILThermo, which contains 1300 ILs. By contrast, 750 million molecules are collected in ZINC [36], 0.13 million in QM [37], and 1.96 million in ChEMBL [38]. These are the most popular datasets used in molecular design [39]. The generation efficiency can be defined as the fraction of valid molecules in the output of the model. Skinnider et al. pointed out that, for a drug generation model, the dataset size has a significant influence on the generation efficiency: with random sampling in the latent space, a dataset of 1000 molecules yields only 6.7% generation efficiency, whereas a dataset of 25 000 molecules yields 69.1% [40]. Thus, previous studies have typically used more than 100 000 data points. Moret et al. proposed data augmentation, sampling temperature, and transfer learning to generate drug-like molecules in low data regimes [41]. However, our results show that data augmentation and sampling temperature have no positive effect for our model, and there is no additional IL-like dataset for transfer learning.

In this work, we developed a VAE-MLP-particle swarm optimization (PSO) [42] model to extend the application of the VAE to ILs. Our model overcomes the disadvantages of previous molecular design models: it enables multi-objective optimization, has high molecule generation efficiency, and, most importantly, can be trained on a small sparse dataset. Compared with previous IL design methods, our model saves much of the time required for trial and error and helps to explore new IL structures.

In general, for the properties to be optimized, the first step is to train the VAE with all ILs. The MLP is then trained with the encoded latent coordinates and the properties. Finally, PSO is used to optimize the latent coordinate, and the optimized coordinate is decoded to obtain new ILs.

The task of designing an IL for use as a heat-transfer fluid [43] and solvent [44] is chosen as a case study to evaluate the model performance. The target of this task is to find ILs with high thermal conductivity and high heat capacity at 303 K and 363 K. These two thermophysical properties are also significant for nearly all applications of ILs, especially microemulsions, CO2 capture, and electrolytes in rechargeable batteries [45-47]. It should be mentioned that our model is not limited to thermal conductivity and heat capacity; it can also be applied to optimize other properties, such as extractive properties and thermal stability, for the application of ILs as solvents in extraction and absorption.

2. Machine learning method

2.1. The molecule structure representation

The combination of SMILES [48] (simplified molecular-input line-entry system) and one-hot encoding shows wide applicability in chemical reaction prediction and chemical design [22,41,49]. We use SMILES to transform a molecule into a string and then employ a one-hot encoder to transform the string into a matrix, because a CNN is well suited to matrix processing. We add the decoration letters ‘A’ and ‘Z’ to mark the start and end of a SMILES string. Taking 1-butyl-3-methylimidazolium (BMIM) as an example, the transformation process is shown in Fig. 1.

2.2. VAE model

The structure of the VAE used [50] is shown in Fig. 2. The input of the encoder is the one-hot matrix of the SMILES string presented in Section 2.1. The output of the encoder is divided into the mean value $\mu$ and the variance value $\sigma$ of a Gaussian distribution. The input of the decoder is a reparametrized value $z$ obtained with a random noise $\varepsilon$. The reparameterization process is as follows:

$$ z = \mu + \exp(\sigma)\,\varepsilon \tag{1} $$

After the encoding process, a molecule is represented by a point whose coordinate is $\mu$ in the latent space. The latent space can be described by a Gaussian mixture model, and every point in this space can be decoded to a SMILES-like string.

The loss function is divided into two parts: the reconstruction loss and the Kullback-Leibler divergence (KLD) loss. The reconstruction loss measures the difference between the generated molecule and the input molecule and is calculated as the mean square error (MSE) given by Eq. (2):

$$ \mathrm{MSE} = \frac{1}{NM} \sum_{i=1}^{N} \sum_{c=1}^{M} \left( y_{ic} - p_{ic} \right)^2 \tag{2} $$

where $N$ is the length of a SMILES string, $M$ is the number of character categories, $y_{ic}$ (0 or 1) is the real probability in the one-hot matrix, and $p_{ic}$ is the calculated probability. The KLD loss in Eq. (3) measures the difference between the coordinate distribution and the unit Gaussian distribution:

$$ \mathrm{KLD} = -0.5 \sum_{i} \left( 1 + \sigma_i - \mu_i^2 - e^{\sigma_i} \right) \tag{3} $$

Finally, the loss function is the sum of the KLD loss and the reconstruction loss:

$$ \mathrm{Loss} = \mathrm{MSE} + \mathrm{KLD} \tag{4} $$

Fig. 4. Diagram of VAE-MLP model (a) Training process (b) Optimization process.

2.3. Particle swarm optimization (PSO)

Each particle in the PSO can be regarded as an individual searching the latent space. The current coordinate of a particle is a candidate solution of the optimization problem, and the movement of the particle is the search process. The speed of a particle is dynamically adjusted according to the historical optimal position of the particle and the historical optimal position of the population. The optimal solution found by each particle individually is called the individual extreme value, and the best individual extreme value is regarded as the current global optimal solution. For a $d$-dimensional latent space with $n$ particles, the coordinate of the $i$th particle at the $t$th iteration is $X_i(t) = (x_{i1}(t), x_{i2}(t), \ldots, x_{id}(t))$, in which $x_{ik}(t)$ is the coordinate of the $i$th particle in the $k$th dimension at the $t$th iteration. The speed of the $i$th particle at the $t$th iteration is $V_i(t) = (v_{i1}(t), v_{i2}(t), \ldots, v_{id}(t))$, where $v_{ik}(t)$ is the speed of the $i$th particle in the $k$th dimension at the $t$th iteration. The coordinate with the best result obtained by the $i$th particle after the $t$th iteration is $P_i(t) = (p_{i1}(t), p_{i2}(t), \ldots, p_{id}(t))$. The coordinate with the best result obtained by all the particles after the $t$th iteration is $P_g(t) = (p_{g1}(t), p_{g2}(t), \ldots, p_{gd}(t))$. For the $(t+1)$th iteration, the speed and coordinate of the $i$th particle in the $d$th dimension, $v_{id}(t+1)$ and $x_{id}(t+1)$, are calculated by

$$ v_{id}(t+1) = w\,v_{id}(t) + c_1\,\mathrm{rand}(0,1)\,\left(p_{id}(t) - x_{id}(t)\right) + c_2\,\mathrm{rand}(0,1)\,\left(p_{gd}(t) - x_{id}(t)\right) \tag{5} $$

$$ x_{id}(t+1) = x_{id}(t) + v_{id}(t+1) \tag{6} $$

where $w$ is the inertia weight, $\mathrm{rand}(0,1)$ indicates a random number between zero and one, and $c_1$ and $c_2$ are acceleration factors that indicate the level of confidence the current particle has in itself ($c_1$) and in the swarm ($c_2$), respectively.

As shown in Fig. 3, the optimization target of PSO in this work is to maximize the total score, which consists of two parts: the property score and the normative score. The property score, an accumulation of weighted properties, is a comprehensive measure of IL performance. To calculate the property score, we set i to 0.00004 and j to 0.18 in this work as an example. Because at 303 K and 363 K the heat capacity and thermal conductivity of the ILs in the database almost never exceed 1250 J mol$^{-1}$ K$^{-1}$ and 0.28 W m$^{-1}$ K$^{-1}$, respectively, the property score ranges from 0 to 0.1. The normative score ensures that the string is an IL.
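The update rules in Eqs. (5) and (6) can be sketched in a few lines of NumPy. The following is a minimal illustration, not the authors' implementation. The function names `pso_maximize`, `property_score`, and `toy_score`, the inertia weight and acceleration factors (w = 0.7, c1 = c2 = 1.5), the swarm size, the iteration count, and the search bound are all assumed values chosen for the example. The quadratic `toy_score` merely stands in for the total score (MLP-predicted property score plus normative score), and the linear form of the property score is an assumption based on the description "accumulation of weighted properties".

```python
import numpy as np


def property_score(heat_capacity, thermal_conductivity, i=0.00004, j=0.18):
    """Weighted sum of the two target properties (linear form is an assumption).

    heat_capacity in J mol^-1 K^-1, thermal_conductivity in W m^-1 K^-1.
    """
    return i * heat_capacity + j * thermal_conductivity


def pso_maximize(score, dim, n_particles=30, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, bound=3.0, seed=0):
    """Maximize score(x) over a dim-dimensional space with basic PSO."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-bound, bound, size=(n_particles, dim))   # coordinates X_i
    v = np.zeros_like(x)                                       # speeds V_i
    p_best = x.copy()                                          # individual bests P_i
    p_best_score = np.array([score(xi) for xi in x])
    g = int(np.argmax(p_best_score))
    g_best, g_best_score = p_best[g].copy(), p_best_score[g]   # swarm best P_g

    for _ in range(n_iter):
        r1 = rng.random(x.shape)  # rand(0, 1) for the individual term
        r2 = rng.random(x.shape)  # rand(0, 1) for the swarm term
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (5)
        x = x + v                                                    # Eq. (6)
        s = np.array([score(xi) for xi in x])
        improved = s > p_best_score
        p_best[improved] = x[improved]
        p_best_score[improved] = s[improved]
        g = int(np.argmax(p_best_score))
        if p_best_score[g] > g_best_score:
            g_best, g_best_score = p_best[g].copy(), p_best_score[g]
    return g_best, float(g_best_score)


def toy_score(z):
    """Hypothetical stand-in for the total score; its maximum (0) is at z = 1."""
    return -float(np.sum((z - 1.0) ** 2))


best_z, best_score = pso_maximize(toy_score, dim=2)
```

In the framework described above, `toy_score` would be replaced by a function that decodes or evaluates a latent coordinate and returns the total score. Because the swarm keeps both individual and global bests, the search does not rely on a gradient of the score.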
Fig. 5. Molecular structure evolution process of [BMIM][Tf2N]. Green, red and yellow represent high, middle and low similarity, respectively. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version of this article.)
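The encoding of Section 2.1 and Eqs. (1)-(4) of Section 2.2 can be illustrated with NumPy arrays. This is a sketch under stated assumptions rather than the authors' code: the function names, the zero padding of rows after 'Z', the character set built from the example string, and the SMILES string written here for the BMIM cation are illustrative choices. The symbol sigma is used exactly as it appears in Eqs. (1) and (3).

```python
import numpy as np


def one_hot_smiles(smiles, charset, max_len):
    """One-hot encode a SMILES string wrapped in 'A' (start) and 'Z' (end).

    Rows after 'Z' are left as all zeros (a padding choice assumed here).
    """
    wrapped = "A" + smiles + "Z"
    if len(wrapped) > max_len:
        raise ValueError("wrapped SMILES string is longer than max_len")
    index = {ch: k for k, ch in enumerate(charset)}
    matrix = np.zeros((max_len, len(charset)))
    for pos, ch in enumerate(wrapped):
        matrix[pos, index[ch]] = 1.0
    return matrix


def reparameterize(mu, sigma, rng):
    """Eq. (1): z = mu + exp(sigma) * eps, with eps drawn from N(0, 1)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(sigma) * eps


def mse_loss(y, p):
    """Eq. (2): mean squared error over all N x M entries."""
    return float(np.mean((y - p) ** 2))


def kld_loss(mu, sigma):
    """Eq. (3): KLD = -0.5 * sum(1 + sigma - mu^2 - exp(sigma))."""
    return float(-0.5 * np.sum(1.0 + sigma - mu ** 2 - np.exp(sigma)))


def vae_loss(y, p, mu, sigma):
    """Eq. (4): total loss = MSE + KLD."""
    return mse_loss(y, p) + kld_loss(mu, sigma)


# Example: the BMIM cation written as SMILES (string assumed for illustration).
bmim = "CCCCn1cc[n+](C)c1"
charset = sorted(set("A" + bmim + "Z"))
y = one_hot_smiles(bmim, charset, max_len=24)

rng = np.random.default_rng(0)
mu = np.zeros(4)      # hypothetical 4-dimensional latent mean
sigma = np.zeros(4)   # hypothetical encoder output for sigma
z = reparameterize(mu, sigma, rng)
```

A perfect reconstruction (`p` equal to `y`) with `mu` and `sigma` at zero gives a total loss of zero, which is a quick sanity check on the signs in Eq. (3).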
Table 1
Average absolute relative deviation (AARD) (%) of various prediction methods.
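The AARD is commonly defined as the mean of |predicted - experimental| / |experimental|, expressed in percent. The formula is not printed in this text, so the definition and the function name `aard_percent` below are assumptions based on that common convention, and the numbers are made up for illustration.

```python
import numpy as np


def aard_percent(predicted, experimental):
    """Average absolute relative deviation in percent (common definition)."""
    predicted = np.asarray(predicted, dtype=float)
    experimental = np.asarray(experimental, dtype=float)
    if predicted.shape != experimental.shape:
        raise ValueError("predicted and experimental must have the same shape")
    if np.any(experimental == 0):
        raise ValueError("experimental values must be non-zero")
    relative_error = np.abs(predicted - experimental) / np.abs(experimental)
    return float(100.0 * np.mean(relative_error))


# Made-up heat capacities (J mol^-1 K^-1), for illustration only.
example = aard_percent([510.0, 390.0], [500.0, 400.0])
```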
3. Results and discussion

The training data used for model building are available from Ionic Liquids Database - ILthermo (https://ilthermo.boulder.nist.-

In a good latent space, the distance between two points should roughly reflect IL similarity. Taking [BMIM][Tf2N] (1-butyl-3-methylimidazolium bis(trifluoromethanesulfonyl)imide) as an example, Fig. 5 shows that, as the distance increases, the generated IL becomes more different from [BMIM][Tf2N]. Moreover, the anion does not change when the distance is short (less than 0.5, as shown in the first column of Fig. 5). This is because the original anions are distributed more densely.

3.2. Property prediction ability of MLP

Table 1 compares the prediction accuracies of different MLPs. For the MLP trained independently after the training of the VAE, the inputs are the latent space coordinate and temperature, and the latent space dimension is 64. For the MLP trained together with the VAE, the inputs are the latent space coordinate and temperature, and the latent space dimension is 256. The latent dimension is chosen to balance prediction accuracy and molecular representation ability. For MLP (one-hot), the inputs are the one-hot code of the IL and temperature. The convolution layers convolute the one-hot matrix from 1 × 124 × 32 to 128 × 20 × 5, and a fully connected layer with 2048 neurons outputs the prediction result. The MLP trained independently after the VAE has better prediction accuracy than the MLP trained together with the VAE. The reason is that the loss functions suppress each other when the MLP and VAE are trained together, which makes the reconstruction accuracy significantly lower (the VAE trained together with the MLP has only 22.1% reconstruction accuracy, compared with 58.7% for the VAE trained independently). For the comparison between MLP (independent) and MLP (one-hot), we consider the encoder of the VAE as a pre-training process that improves the prediction performance. Specifically, the encoder provides features extracted by unsupervised learning, which is better than letting the MLP obtain the features by itself. Fig. 6 compares the predicted results with the experimental data for heat capacity and thermal conductivity. In Fig. 6, the abscissa is the experimental result and the ordinate is the predicted result. The closer a data point is to the diagonal, the more accurate the prediction.

3.3. Molecular generation ability of VAE-MLP-PSO framework

Because of the low reconstruction accuracy, only 4 of the strings decoded from 5000 points sampled from a Gaussian distribution in the latent space are valid ILs. This is because SMILES strings are not robust: changing a single character is likely to invalidate the string. In addition, because the MLP is trained independently after the VAE, the distribution of molecules in the latent space is not related to the property, as shown in Fig. 7. By contrast, if the VAE and MLP are trained together, the colors are generally distributed in order. Taking heat capacity as an example, we can observe a color gradient from the top left (blue, low heat capacity) to the bottom right (purple, high heat capacity). However, if the VAE and MLP are trained independently, the color (property) distribution becomes completely chaotic.

Fig. 7. Distribution of molecules in dataset in latent space and their property. The color of points indicates the properties’ value.

The above problems are solved by the VAE-MLP-PSO framework. We tested the molecular generation ability of our model and several previous models (all of which use SMILES as the molecular descriptor) with three indexes from GuacaMol [53]: validity (whether the generated molecules are actually valid), uniqueness (whether the generated molecules are distinct from one another), and novelty (whether the generated molecules are absent from the dataset). The validity of the VAE-Gaussian process (GP) model, the CVAE, and the AAE is 0 in every case, which indicates that these models generate no valid molecules. For this reason, their uniqueness and novelty cannot be compared. Our model achieves an excellent validity of 98%. In addition, our model has a uniqueness of up to 80.3% and a novelty of up to 99.4%. The comparison results show that the VAE-MLP-PSO
4. Conclusion
6% at 363 K, respectively. The maximum heat capacity and thermal conductivity of the ILs are increased by 39% and 4% at 303 K, and by 27% and 13% at 363 K, respectively. The average optimization target is increased by 25% at 303 K and 18% at 363 K. The maximum optimization target is increased by 14% at 303 K and 21% at 363 K. Compared with the gradient-based optimization VAE, CVAE, and AAE models, our model can work on small sparse datasets and complex multi-objective optimization with high generation efficiency. We expect that our model can be extended to design ILs with ideal properties required in a variety of fields such as extraction, absorption, and batteries.

Supporting Information

The training data used for model building are available from Ionic Liquids Database - ILthermo (https://ilthermo.boulder.nist.gov), and the code at https://github.com/hicsail/materials helps to batch download the data. The VAE-MLP-PSO framework, along with data analysis tools, is available as a GitHub repository at https://github.com/CHUJianchun/VAE_MLP_PSO.

CRediT authorship contribution statement

Xiangyang Liu: Conceptualization, Writing – original draft, Writing – review & editing, Project administration. Jianchun Chu: Methodology, Software, Validation, Formal analysis, Visualization. Ziwen Zhang: Investigation, Resources, Data curation. Maogang He: Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This study was supported by the National Natural Science Foundation of China (Nos. 51976167, 51936009, and 51721004). We would like to thank Editage (www.editage.cn) for English language editing.

References

[1] M. Callsen, K. Sodeyama, Z. Futera, Y. Tateyama, I. Hamada, The solvation structure of lithium ions in an ether based electrolyte solution from first-principles molecular dynamics, J. Phys. Chem. B 121 (1) (2017) 180–188.
[2] P. Shi, D. Wang, T. Yu, R. Xing, Z. Wu, S. Yan, L. Wei, Y. Chen, H. Ren, C. Yu, F. Li, Solid-state electrolyte gated synaptic transistor based on SrFeO2.5 film channel, Mater. Des. 210 (2021) 110022.
[3] G. Yang, Y. Song, Q. Wang, L. Zhang, L. Deng, Review of ionic liquids containing, polymer/inorganic hybrid electrolytes for lithium metal batteries, Mater. Des. 190 (2020) 108563.
[4] T. Itoh, Ionic liquids as tool to improve enzymatic organic synthesis, Chem. Rev. 117 (15) (2017) 10567–10607.
[5] Y. Sang, J. Huang, Benzimidazole-based hyper-cross-linked poly(ionic liquid)s for efficient CO2 capture and conversion, Chem. Eng. J. 385 (2020) 123973, https://doi.org/10.1016/j.cej.2019.123973.
[6] M.L. Ding, H.L. Jiang, Incorporation of imidazolium-based poly(ionic liquid)s into a metal-organic framework for CO2 capture and conversion, ACS Catal. 8 (4) (2018) 3194–3201.
[7] S.-Y. Fan, Y.-N. Hao, W.-X. Zhang, A. Kapasi, Y. Shu, J.-H. Wang, W. Chen, Poly(ionic liquid)-gated CuCo2S4 for pH-/thermo-triggered drug release and photoacoustic imaging, ACS Appl. Mater. Interfaces 12 (8) (2020) 9000–9007.
[8] C. Park, M. Han, J. Kim, W. Lee, E. Kim, Effect of ionic composition on thermal properties of energetic ionic liquids, npj Comput. Mater. 3 (6) (2019) 41–50.
[9] H. Ruan, Q. Zhang, W. Liao, Y. Li, X. Huang, X. Xu, S. Lu, Enhancing tribological, mechanical, and thermal properties of polyimide composites by the synergistic effect between graphene and ionic liquid, Mater. Des. 189 (2020) 108527.
[10] Y.Y. Shen, Z.R. Chen, H.Q. Qi, Z.Y. Ma, Y.S. Dai, Q. Zhao, Z.Y. Zhu, Y.X. Ma, Y.L. Wang, Mechanism analysis of extractive distillation for separation of acetic acid and water based on quantum chemical calculation and molecular dynamics simulation, J. Mol. Liq. 332 (2021) 10.
[11] X. Liu, P. Pan, F. Yang, M. He, Solubilities and diffusivities of R227ea, R236fa and R245fa in 1-hexyl-3-methylimidazolium bis(trifluoromethylsulfonyl)imide, J. Chem. Thermodyn. 123 (2018) 158–164.
[12] L. Jiang, K.e. Mei, K. Chen, R. Dao, H. Li, C. Wang, Design and prediction for highly efficient SO2 capture from flue gas by imidazolium ionic liquids, Green Energy Environ. 7 (1) (2022) 130–136.
[13] A. Silva-Beard, A. Flores-Tlacuahuac, M. Rivera-Toledo, Optimal computer-aided molecular design of ionic liquid mixtures for post-combustion carbon dioxide capture, Comput. Chem. Eng. 157 (2022) 107622.
[14] E. Can, A. Jalal, I.G. Zirhlioglu, A. Uzun, R. Yildirim, Predicting water solubility in ionic liquids using machine learning towards design of hydro-philic/phobic ionic liquids, J. Mol. Liq. 332 (2021) 13.
[15] C. Li, D. Lu, C. Wu, A theoretical study on screening ionic liquids for SO2 capture under low SO2 partial pressure and high temperature, J. Ind. Eng. Chem. 98 (2021) 161–167.
[16] W. Wang, T. Yang, W.H. Harris, R. Gómez-Bombarelli, Active learning and neural network potentials accelerate molecular screening of ether-based solvate ionic liquids, Chem. Commun. (Camb.) 56 (63) (2020) 8920–8923.
[17] Q. Dong, C.D. Muzny, A. Kazakov, V. Diky, J.W. Magee, J.A. Widegren, R.D. Chirico, K.N. Marsh, M. Frenkel, ILThermo: A free-access web database for thermodynamic properties of ionic liquids, J. Chem. Eng. Data 52 (4) (2007) 1151–1159.
[18] A.R. Katritzky, R. Jain, A. Lomaka, R. Petrukhin, M. Karelson, A.E. Visser, R.D. Rogers, Correlation of the melting points of potential ionic liquids (imidazolium bromides and benzimidazolium bromides) using the CODESSA program, J. Chem. Inf. Comput. Sci. 42 (2) (2002) 225–231.
[19] D.M. Makarov, Y.A. Fadeeva, L.E. Shmukler, I.V. Tetko, Beware of proper validation of models for ionic Liquids!, J. Mol. Liq. 344 (2021) 117722.
[20] L. Peng, Y. Yang, Z. Wang, Z. Huang, H.T. Shen, MRA-net: improving VQA via multi-modal relation attention network, IEEE Trans. Pattern Anal. Mach. Intell. 44 (1) (2022) 318–329.
[21] Z.-J. Zha, D. Liu, H. Zhang, Y. Zhang, F. Wu, Context-aware visual policy network for fine-grained image captioning, IEEE Trans. Pattern Anal. Mach. Intell. 44 (2) (2022) 710–722.
[22] R. Gómez-Bombarelli, J.N. Wei, D. Duvenaud, J.M. Hernández-Lobato, B. Sánchez-Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T.D. Hirzel, R.P. Adams, A. Aspuru-Guzik, Automatic chemical design using a data-driven continuous representation of molecules, ACS Cent. Sci. 4 (2) (2018) 268–276.
[23] A.A. Artur Kadurin, A. Kazennov, P. Mamoshina, Q. Vanhaelen, K. Khrabrov, A. Zhavoronkov, The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology, Oncotarget 8 (2017) 10883–10890.
[24] S.Y. Lee, S. Byeon, H.S. Kim, H. Jin, S. Lee, Deep learning-based phase prediction of high-entropy alloys: Optimization, generation, and explanation, Mater. Des. 197 (2021) 109260.
[25] N. Jaques, S. Gu, D. Bahdanau, J. Hernández-Lobato, R.E. Turner, D. Eck, Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control, 2016.
[26] M. Popova, O. Isayev, A. Tropsha, Deep reinforcement learning for de novo drug design, Sci. Adv. 4 (7) (2018) eaap7885-eaap7885.
[27] M.H.S. Segler, T. Kogej, C. Tyrchan, M.P. Waller, Generating focused molecule libraries for drug discovery with recurrent neural networks, ACS Cent. Sci. 4 (1) (2018) 120–131.
[28] M. Simonovsky, N. Komodakis, GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders, in: V. Kůrková, Y. Manolopoulos, B. Hammer, L. Iliadis, I. Maglogiannis (Eds.), Artificial Neural Networks and Machine Learning – ICANN 2018, Springer International Publishing, Cham, 2018, pp. 412–422.
[29] T. Blaschke, M. Olivecrona, O. Engkvist, J. Bajorath, H. Chen, Application of generative autoencoder in de novo molecular design, Mol. Inform. 37 (1-2) (2018) 1700123, https://doi.org/10.1002/minf.201700123.
[30] S. Joo, M.S. Kim, J. Yang, J. Park, Generative model for proposing drug candidates satisfying anticancer properties using a conditional variational autoencoder, ACS Omega 5 (30) (2020) 18642–18650.
[31] S. Kang, K. Cho, Conditional molecular design with deep generative models, J. Chem. Inf. Model. 59 (1) (2019) 43–52.
[32] J. Lim, S. Ryu, J.W. Kim, W.Y. Kim, Molecular generative model based on conditional variational autoencoder for de novo molecular design, J. Cheminform. 10 (1) (2018) 31.
[33] Z. Yao, B. Sánchez-Lengeling, N.S. Bobbitt, B.J. Bucior, S.G.H. Kumar, S.P. Collins, T. Burns, T.K. Woo, O.K. Farha, R.Q. Snurr, A. Aspuru-Guzik, Inverse design of nanoporous crystalline reticular materials with deep generative models, Nature Machine Intelligence 3 (1) (2021) 76–86.
[34] H. Dai, Y. Tian, B. Dai, S. Skiena, L. Song, Syntax-Directed Variational Autoencoder for Structured Data, ArXiv abs/1802.08786 (2018).
[35] A. Kadurin, S. Nikolenko, K. Khrabrov, A. Aliper, A. Zhavoronkov, druGAN: An advanced generative adversarial autoencoder model for de novo generation of new molecules with desired molecular properties in silico, Mol. Pharm. 14 (9) (2017) 3098–3104.
[36] J.J. Irwin, B.K. Shoichet, ZINC A free database of commercially available compounds for virtual screening, J. Chem. Inf. Model. 45 (1) (2005) 177–182.
[37] J. Hoja, L. Medrano Sandonas, B.G. Ernst, A. Vazquez-Mayagoitia, R.A. DiStasio Jr, A. Tkatchenko, QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules, Sci. Data 8 (1) (2021) 43.
[38] D. Mendez, A. Gaulton, A.P. Bento, J. Chambers, M. De Veij, E. Félix, M.P. Magariños, J.F. Mosquera, P. Mutowo, M. Nowotka, M. Gordillo-Marañón, F. Hunter, L. Junco, G. Mugumbate, M. Rodriguez-Lopez, F. Atkinson, N. Bosc, C.J. Radoux, A. Segura-Cabrera, A. Hersey, A.R. Leach, ChEMBL: towards direct deposition of bioassay data, Nucleic Acids Res. 47 (D1) (2018) D930–D940.
[39] D.C. Elton, Z. Boukouvalas, M.D. Fuge, P.W. Chung, Deep learning for molecular design - a review of the state of the art, Mol. Syst. Des. Eng. 4 (4) (2019) 828–849.
[40] M.A. Skinnider, R.G. Stacey, D.S. Wishart, L.J. Foster, Chemical language models enable navigation in sparsely populated chemical space, Nature Machine Intelligence (2021).
[41] M. Moret, L. Friedrich, F. Grisoni, D. Merk, G. Schneider, Generative molecular design in low data regimes, Nature Machine Intelligence 2 (3) (2020) 171–180.
[42] G. Venter, J. Sobieszczanski-Sobieski, Particle Swarm Optimization, AIAA J. 41 (8) (2003) 1583–1589.
[43] K.-H. Liang, Z.-W. Lu, C.-X. Ren, J. Wei, D.-W. Fang, Feasibility of 1-ethyl-4-butyl-1,2,4-triazolium acetyl amino acid ionic liquids as sustainable heat-transfer fluids, ACS Sustain. Chem. Eng. 10 (11) (2022) 3417–3429.
[44] C. Zhu, M. He, X. Liu, G.M. Kontogeorgis, X. Liang, Quantification of dipolar contribution and modeling of green polar fluids with the polar cubic-plus-association equation of state, ACS Sustain. Chem. Eng. 9 (22) (2021) 7602–7619.
[45] M. Hejazifar, O. Lanaridi, K. Bica-Schröder, Ionic liquid based microemulsions: A review, J. Mol. Liq. 303 (2020) 112264, https://doi.org/10.1016/j.molliq.2019.112264.
[46] S. Lian, C. Song, Q. Liu, E. Duan, H. Ren, Y. Kitamura, Recent advances in ionic liquids-based hybrid processes for CO2 capture and utilization, J. Environ. Sci. 99 (2021) 281–295.
[47] W. Zhou, M. Zhang, X. Kong, W. Huang, Q. Zhang, Recent advance in ionic-liquid-based electrolytes for rechargeable metal-ion batteries, Adv. Sci. 8 (13) (2021) 2004490, https://doi.org/10.1002/advs.v8.1310.1002/advs.202004490.
[48] D. Weininger, SMILES: A chemical language and information system, J. Chem. Inf. Comput. Sci. 28 (1) (1988) 31–36.
[49] H. Altae-Tran, B. Ramsundar, A.S. Pappu, V. Pande, Low data drug discovery with one-shot learning, ACS Cent. Sci. 3 (4) (2017) 283–293.
[50] D.P. Kingma, M. Welling, Auto-Encoding Variational Bayes, CoRR abs/1312.6114 (2014).
[51] G. Landrum, B. Kelley, P. Tosco, sriniker, gedeck, NadineSchneider, R. Vianello, A. Dalke, AlexanderSavelyev, S. Turk, rdkit/rdkit: 2018_03_4 (Q1 2018) Release, (2018).
[52] T. Wang, X. Liu, J. Chu, Y. Shi, J. Li, M. He, Molecular dynamics simulation of diffusion and interaction of [bmim][Tf2N] + HFO-1234yf mixture, J. Mol. Liq. 312 (2020) 113390.
[53] N. Brown, M. Fiscato, M.H.S. Segler, A.C. Vaucher, GuacaMol: benchmarking models for de novo molecular design, J. Chem. Inf. Model. 59 (3) (2019) 1096–1108.
[54] S.A. Shojaee, S. Farzam, A.Z. Hezave, M. Lashkarbolooki, S. Ayatollahi, A new correlation for estimating thermal conductivity of pure ionic liquids, Fluid Phase Equilib. 354 (2013) 199–206.