Transformer-generated atomic embeddings
https://doi.org/10.1038/s41467-025-56481-x
The development of deep learning (DL) and machine learning (ML) has created new research methods for many research fields1–4. In materials science, this development is leading to discoveries of material properties that may be challenging for traditional methods5–8. Many DL algorithms and models have been proposed, such as the Crystal Graph Convolutional Neural Network (CGCNN)9, MatErials Graph Network (MEGNET)10, Atomistic Line Graph Neural Network (ALIGNN)11, the improved Crystal Graph Convolutional Neural Network (iCGCNN)12, OrbNet13, and so on14–24. They have achieved success in many kinds of applications, such as learning properties from multi-fidelity data25, discovering stable lead-free hybrid organic-inorganic perovskites26, mapping crystal-structure phases27, and designing material microstructures28.
1School of Information Science and Technology, Fudan University, Shanghai, China. 2Department of Physics, Fudan University, Shanghai, China. 3School of
Science, Shandong Jianzhu University, Jinan, Shandong, China. 4Department of Materials, Fudan University, Shanghai, China. 5Department of Optical Science
and Engineering and Key Laboratory of Micro and Nano Photonic Structures (Ministry of Education), Fudan University, Shanghai, China. 6State Key Laboratory
of Photovoltaic Science and Technology, Fudan University, Shanghai, China. 7These authors contributed equally: Luozhijie Jin, Zijian Du.
e-mail: [email protected]; [email protected]
[Fig. 1 schematic: (a) workflow from a large database of CIF files through an atomic-embedding front-end model into back-end models that predict crystal properties (formation energy, bandgap, etc.); (b) deep-learning Methods I (trained by CrystalTransformer) and II (trained by GNN models), which produce atomic embeddings from one-hot input matrices; (c) mainstream Method III, artificially 0–1 vectorized embeddings gathered from various element properties indexed by atomic number.]
Fig. 1 | Workflow of the model with front- and back-end parts to predict properties, and the different working principles of atomic embeddings. a The workflow of the front-end and back-end models using atomic embeddings. The atomic embedding is derived from the front-end models, while graph neural networks (GNN) serve as back-end models trained for different properties. b The process and principles of Methods (I, II), which use deep learning trained on a large database to generate atomic embeddings. Method I uses CrystalTransformer to produce universal atomic embeddings (UAE), while Method II uses a traditional GNN model to produce ordinary atomic embeddings. c The process and principles of Method III, which artificially constructs atomic embeddings by querying databases or by mapping known atomic properties to a 0–1 vector or, in most cases, a one-hot vector.
In the solid-state theory, the features and spatially topological arrangements of the constituent atoms in crystals or other condensed systems determine their properties, which are intricately encapsulated into the entity of the "atomic embedding"29,30 in DL algorithms. Specifically, atomic embedding is the process of digitally encoding the properties of atoms for a crystal model, an idea that originates from natural language processing, where word embeddings transformed the way textual data is represented31–33. An appropriate atomic embedding can accelerate model training, improve prediction accuracy, and yield explainable information34–38. Currently, most attention in the field of materials informatics has been focused on the design of crystal model architectures to improve the accuracy of property prediction, while studies on the atomic embedding itself are rare. Typically, a simple 0–1 embedding is adopted as the atomic embedding algorithm9,10, which generally generates a sparse embedding matrix that is not conducive to information extraction by the models.

In recent years, a large number of Transformer-based training methods39 and predictive models, such as OrbNet40, 3D-Transformer41, and others, have been developed in the field of chemical molecular property and structure prediction. They are believed to fully leverage the advantages of the Transformer architecture in processing atomic interactions and capturing three-dimensional structures, enabling efficient representation of the complex interactions between atoms. Motivated by these advancements, we developed the home-made CrystalTransformer model to generate universal atomic embeddings, called ct-UAEs, based on the Transformer architecture; it learns a unique "fingerprint" for each atom, capturing the essence of their roles and interactions within materials. The obtained embeddings are then transferred to different DL models. Using the Uniform Manifold Approximation and Projection (UMAP) clustering method42, we categorized atoms into different groups, analyzing the connection between the embeddings and the real atoms.

Results and discussions
Universal atomic embeddings
Generally, when predicting properties such as the formation energy and bandgap of a material in deep-learning models, each atom is first embedded as features. This embedding process is intrinsic to GNN models such as CGCNN, ALIGNN, and MEGNET. Then the deeper feature extraction processes, including information transmission and aggregation, node feature updating, etc., are conducted to predict the crystal properties. In this context, these GNNs are denoted as back-end models, while the methods for obtaining atomic embeddings are denoted as front-end models. Essentially, the parameters of the atomic embeddings can be transferred using pretrained parameters or constructed based on predefined properties, which is realized in the front-end model of Methods (I, II, III), as shown in Fig. 1b, c.

As shown in Fig. 1a, for the front-end model we used our proposed CrystalTransformer to generate atomic embeddings (Method I). Other pretrained atomic embeddings use GNN models (Method II, shown in Fig. 1b), while some use artificially constructed features based on known atomic properties, like the autoencoder-based approach43 (Method III, shown in Fig. 1c). The CrystalTransformer model learns atomic embeddings directly from chemical information in crystal databases. Compared with Method III, which generates atomic embeddings by processing a predefined set of atomic properties, our proposed ct-UAE can adapt to any desired material property without relying on predefined atomic attributes.

To examine the atomic embedding tensors obtained from different models, we used the MP and MP* datasets for formation energy (Ef) and PBE bandgap (Eg), which are key properties for evaluating chemical stability and electronic performance. MP stands for the 2018.6.1 version of the Materials Project (MP) dataset44, which contains 69,239 materials with properties; MP* denotes the 2023.6.23 version, which contains 134,243 materials. For the training, validation, and testing splits, we followed the distribution of 60,000 (training), 5000 (validation), and 4239 (testing) for the MP dataset, as used in previous works, while the MP* dataset44 and its properties are split into 80% training, 10% validation, and 10% testing sets. It is worth noting that, as discussed in Supplementary 1, the gaps in the band structures of solids in materials databases such as the MP, which are defined as the difference between the eigenvalues of the conduction-band minimum (CBM) and the valence-band maximum (VBM), were obtained by solving the Kohn-Sham (KS) equation with exchange-correlation (xc) in the Perdew-Burke-Ernzerhof (PBE) parametrization45.
Table 1 | Performance comparison (MAE) of various models on different datasets and different pretrained models
Model \ Target MP-Ef MP-Eg MP*-Ef MP*-Eg JARVIS-Ef JARVIS-Eg MC3D-E
None-CrystalTransformer 0.097 0.563 0.152 0.395 - - -
None-CGCNN9 0.083 0.384 0.085 0.342 0.080 0.531 5.558
None-MEGNET10 0.051 0.324 0.054 0.291 0.070 0.493 5.029
None-ALIGNN11 0.022 0.276 0.056 0.152 0.044 0.562 3.706
CT-CGCNN9 0.071 0.359 - - 0.066 0.463 5.341
CT-MEGNET10 0.049 0.304 - - 0.068 0.443 4.687
CT-ALIGNN11 0.018 0.256 - - 0.043 0.536 3.705
CG-CGCNN9 0.074 0.378 - - - - -
MEG-CGCNN9 0.082 0.457 - - - - -
ALI-CGCNN9 0.077 0.386 - - - - -
Ef (eV/atom), Eg (eV) and E (eV) denote the formation energy, bandgap, and total energy of the materials. A-B implies the front-end (A) and back-end (B) models, and None means trained from scratch with no front-end model. CT indicates CrystalTransformer as the front-end model, and all front-end models are pretrained on the MP* dataset.
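As a concrete illustration of the front-end/back-end transfer evaluated in Table 1, the sketch below shows one way a pretrained ct-UAE matrix could be copied into (and optionally frozen in) the atom-feature layer of a back-end GNN. It is a minimal PyTorch sketch, not the released ct-UAE code; the stand-in BackEndGNN class and the 89 × 128 shape only mirror the embedding dimensions quoted later in the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch of the front-end -> back-end transfer (not the released code).
# Assumption: the ct-UAE is available as an 89 x 128 tensor (one row per element),
# and the back-end GNN exposes its atom featurizer as an nn.Embedding layer.
N_ELEMENTS, EMB_DIM = 89, 128

class BackEndGNN(nn.Module):
    """Stand-in for a back-end model such as CGCNN/MEGNET/ALIGNN."""
    def __init__(self, n_elements: int, emb_dim: int):
        super().__init__()
        self.atom_embedding = nn.Embedding(n_elements, emb_dim)
        self.readout = nn.Linear(emb_dim, 1)   # placeholder for the real GNN body

    def forward(self, atomic_indices: torch.Tensor) -> torch.Tensor:
        feats = self.atom_embedding(atomic_indices)   # (n_atoms, emb_dim)
        return self.readout(feats.mean(dim=0))        # crude pooled property

model = BackEndGNN(N_ELEMENTS, EMB_DIM)

# Load the pretrained universal atomic embeddings and copy them in.
ct_uae = torch.randn(N_ELEMENTS, EMB_DIM)   # stand-in for torch.load("ct_uae.pt")
with torch.no_grad():
    model.atom_embedding.weight.copy_(ct_uae)

# Optionally freeze them (the "CTfreeze" setting compared in Table 3).
model.atom_embedding.weight.requires_grad = False
```

Whether the transferred embeddings are frozen or fine-tuned is a design choice; Table 3 compares both settings.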
[Fig. 2 | Predicted versus target formation energy (eV/atom) on the MP dataset for CT-CGCNN vs. None-CGCNN, CT-MEGNET vs. None-MEGNET, and CT-ALIGNN vs. None-ALIGNN (panels a–c); inset legend: CT-CGCNN MAE 0.073, R2 0.986; None-CGCNN MAE 0.083, R2 0.982.]
In semiconductors and insulators, these PBE bandgaps Eg^PBE are not equal to the fundamental gaps E_G, but differ by a term called the derivative discontinuity of the xc energy, Δxc46, leading to a substantial underestimation of Eg^PBE compared with E_G, as large as 40–50%47,48. However, since the KS equation is constructed from the kinetic energy and the Coulomb potentials between charged particles (electrons and ions), when specific exchange-correlation functionals are used, the eigenvalues of the KS equation should capture the major physical interactions within the interacting system. Therefore, if the PBE bandgaps are used as the target in the deep learning model, the derived atomic embedding should involve the atomic properties and the structural information, since the Eg^PBE values already incorporate such information through the KS Hamiltonian constructed with the PBE-type xc functional.

Front-end models such as CrystalTransformer, CGCNN, ALIGNN, and MEGNET are first pre-trained on the expanded MP* dataset, focusing on the bandgap Eg and formation energy Ef predictive tasks. Subsequently, the extracted atomic embeddings are integrated into a CGCNN back-end model and trained on the original MP dataset, which results in CT-CGCNN, CG-CGCNN, ALI-CGCNN, and so on. Table 1 shows a comparative MAE analysis to evaluate the relative performance enhancements attributable to the front-end atomic embeddings, denoted as N-CGCNN in Table 1 (N indicating the front-end model described above). As listed in Table 1, among the atomic embeddings pre-trained by different models, those using ct-UAEs (CT-CGCNN) perform the best, with 14% and 7% reductions in MAE for Ef and Eg, also outperforming the best GNN front-end embeddings (CG-CGCNN in this context) by 4% and 5% for the two properties, respectively. The predicted versus target formation energies for those models are shown in Fig. 2a–c.

Furthermore, as listed in Table 1, the performances of GNN models like CGCNN, MEGNET, and ALIGNN were enhanced by using the CrystalTransformer-generated atomic embeddings (ct-UAEs) evaluated on the MP dataset. The CGCNN model transferred with CrystalTransformer-generated embeddings (ct-UAEs), denoted by CT-CGCNN in Table 1, shows a significant reduction in MAE: for formation energy Ef, from 0.083 eV/atom to 0.071 eV/atom, a reduction of 14%, and for bandgap Eg, from 0.384 eV to 0.359 eV, a reduction of 7%.
A similar reduction can be observed for MEGNET, denoted by CT-MEGNET in Table 1, with Ef decreasing from 0.051 eV/atom to 0.049 eV/atom, a 4% reduction, and bandgap Eg decreasing from 0.324 eV to 0.304 eV, a reduction of 6%. ALIGNN also exhibits an improvement in Ef prediction accuracy, denoted by CT-ALIGNN in Table 1, decreasing from 0.022 eV/atom to 0.018 eV/atom, a reduction of 18%, and for bandgap Eg, decreasing from 0.276 eV to 0.256 eV, a 7% reduction.

Transferability of ct-UAEs
To further investigate the performance of the ct-UAEs on different properties, task-generated embeddings are transferred to different tasks. For example, Ef-task-generated atomic embeddings are applied to bandgap prediction and Eg-task-generated embeddings to the formation-energy task. The results are listed in Table 2, denoted as CTEf-CG and CTEg-CG. Embeddings trained on the bandgap task, when transferred to the formation-energy task, lead to a measurable improvement in accuracy, with the MAE decreasing from 0.083 to 0.078 eV/atom, a 6% reduction. Conversely, although trained on a simpler task such as formation energy, the embedding still reduces the MAE on the more challenging bandgap prediction by 0.2%.

Further experiments focus on multi-task-generated embeddings (MT). As listed in Table 2, the embeddings trained on two properties (formation energy and bandgap), denoted as MT@2p, yield better performance than single-task-generated embeddings. When transferred to the CGCNN model (CTMT@2p-CGCNN), the model achieves an MAE of 0.068 eV/atom for Ef and 0.357 eV for Eg, outperforming the baseline CGCNN (an 18% reduction in Ef and a 7% reduction in Eg) as well as the CGCNN variants using single-task embeddings (a 4% reduction in Ef and 0.5% for the bandgap).

Additional multi-task variants (MT@3p and MT@4p) incorporating total energy and total magnetization are introduced. Introducing MT@3p, with total energy as an additional property, yields a further 0.2% reduction in bandgap MAE, with the formation-energy MAE almost unchanged. However, the introduction of magnetization in MT@4p leads to a slight increase in the MAE for bandgap prediction, from 0.357 to 0.367 eV, which is probably due to the physical differences between these two properties.

Table 2 | Single-task versus multi-task embeddings on mean absolute error (MAE) for formation energy (eV/atom) and bandgap (eV) and R2
Target | None-CG | CTEf-CG | CTEg-CG | CTMT@2p-CG | CTMT@3p-CG | CTMT@4p-CG
MAE(Ef) | 0.083 | 0.071 | 0.078 | 0.068 | 0.069 | 0.068
R2(Ef) | 0.984 | 0.987 | 0.983 | 0.987 | 0.987 | 0.986
MAE(Eg) | 0.384 | 0.383 | 0.359 | 0.357 | 0.356 | 0.367
R2(Eg) | 0.845 | 0.845 | 0.850 | 0.849 | 0.851 | 0.847
CG indicates CGCNN. None means no embeddings are used; Ef and Eg denote embeddings trained on the corresponding target. MT@np denotes embeddings trained with multi-task learning on n properties.

Then, different training strategies are used to evaluate the performance of the model, and the results are listed in Table 3. CTfreeze-CGCNN, which employs frozen pre-trained embeddings from the CrystalTransformer (the ct-UAEs), achieves an MAE of 0.073 eV/atom for formation energy Ef and 0.358 eV for bandgap Eg. However, when integrating the coordinate embeddings together with the ct-UAEs (chemistry information) into the CGCNN framework (CTchem+coords-CGCNN), the MAE increases from 0.071 eV/atom in the atom-embedding-only model to 0.085 eV/atom. Similarly, the MAE worsens from 0.359 eV to 0.395 eV for bandgap Eg.

Table 3 | Various embedding approaches comparison on mean absolute error (MAE) for formation energy (eV/atom) and bandgap (eV) and R2
Target | CT-CGCNN | CTchem+coords-CGCNN | CTfreeze-CGCNN
MAE(Ef) | 0.071 | 0.085 | 0.073
R2(Ef) | 0.987 | 0.983 | 0.986
MAE(Eg) | 0.359 | 0.395 | 0.358
R2(Eg) | 0.850 | 0.834 | 0.851
CT denotes embeddings trained on the corresponding properties. CTchem+coords denotes atom plus coordinate embeddings, while CTfreeze denotes embeddings kept frozen (zero gradient) when training the back-end model.

The ability and transferability of the universal atomic embeddings are further tested on different databases and tasks. Each dataset is split 8:1:1 for training, validation, and testing; details on the datasets can be found in Supplementary 2A. For the JARVIS dataset49, the results are shown in Table 1. The CT-CGCNN model demonstrates an improvement in predicting both formation energy Ef and bandgap Eg: the MAEs are reduced from 0.080 eV/atom to 0.066 eV/atom (17.5%) and from 0.531 eV to 0.463 eV (12.8%), respectively.

The embedding is further evaluated on the MC3D dataset, with the total energy (E) chosen as the task; the results are shown in Table 1. The MAE of CGCNN is reduced from 5.558 eV to 5.341 eV, a 3.9% improvement. For the ALIGNN model the MAE remains nearly unchanged, while for the MEGNET model the MAE decreases from 5.029 eV to 4.687 eV, a 6.8% improvement.

Additionally, we also investigated the suitability of ct-UAE for energy-conserving interatomic potential (IAP) models, which are trained on the MPtrj dataset50. As demonstrated in Supplementary 4, we trained ct-UAE on vectorial and scalar targets, i.e., force, stress, and energy. To benchmark, we re-trained the CHGNet50, M3GNet51, and MACE52 models on the MP-RELAX dataset proposed by M3GNet51. Remarkably, adding ct-UAEs to CHGNet resulted in a significant reduction in force loss, from 0.284 to 0.242 (a 14.8% decrease), along with a reduction in stress loss from 1.496 to 1.437 and a slight decrease in energy loss from 0.460 to 0.457. For M3GNet, ct-UAE led to a slight reduction in total loss (energy, force, stress) from 2.1236 to 2.1234 and in energy loss from 0.3597 to 0.3595, indicating a minor performance improvement. However, for MACE, the ct-UAE did not lead to a reduction in loss.

Interpretability
This investigation leverages straightforward clustering algorithms to conduct an in-depth analysis of ct-UAEs. Here, the UMAP clustering method42 is employed to project the ct-UAEs into a two-dimensional space, offering a means to intuitively understand atomic characteristics in a reduced-dimensional setting. Consequently, the dimensionality of the ct-UAEs is reduced from the original 89 × 128 to 89 × 2, and through the application of the K-means clustering method53 in the two-dimensional space, atoms are further categorized into three distinct groups, as shown in Fig. 3a. The t-SNE clustering method54 is used as an additional supplementary comparison, as shown in Fig. S1. Furthermore, the community detection method55 is also used to cluster the ct-UAEs directly into three categories without dimension reduction, as an additional way to investigate the interpretability of ct-UAEs, shown in Fig. S2.

To determine the best number of clusters for the atoms, elbow plots56 and silhouette coefficient graphs56 are needed, with the quantitative analysis shown in Fig. 3b, c. Both the elbow plot and the silhouette coefficient graph demonstrate that a 3- or 4-cluster solution is the best choice for the classification of atoms. In this work, the CrystalTransformer model can be trained with 2, 3, or 4 different properties, but these atomic embeddings all show the same best number of clusters of 3 or 4.
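The cluster-number analysis described above (UMAP projection of the 89 × 128 ct-UAEs, K-means clustering, and the elbow/silhouette criteria) can be sketched as follows. This is a minimal illustration using the umap-learn and scikit-learn packages, with a random stand-in array in place of the actual embeddings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import umap  # provided by the umap-learn package

# Stand-in for the 89 x 128 ct-UAE matrix (one row per element).
embeddings = np.random.rand(89, 128)

# Project to 2D, as done before K-means clustering in Fig. 3a.
reducer = umap.UMAP(n_components=2, random_state=0)
coords_2d = reducer.fit_transform(embeddings)          # shape (89, 2)

# Scan the number of clusters with the elbow (SSE) and silhouette criteria.
sse, silhouette = {}, {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(coords_2d)
    sse[k] = km.inertia_                                # input to the elbow plot
    silhouette[k] = silhouette_score(coords_2d, km.labels_)

best_k = max(silhouette, key=silhouette.get)            # 3 or 4 in the paper
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(coords_2d)
```

In the paper the scan is repeated over five random seeds and averaged before reading off the optimal cluster number.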
[Fig. 3 panels: (a) UMAP Component 1 versus Component 2 scatter of the ct-UAEs with three K-means clusters; (b) silhouette score and (c) SSE (elbow) versus number of clusters (2–10); (d–f) violin plots of formation energy (eV/atom), band gap (eV), and total magnetization (μB) for Classes A, B, and C.]
Fig. 3 | Interpretability for CrystalTransformer-generated universal atomic embeddings (ct-UAE), including clustering of elements and statistical validation of the clustering results. a UMAP (Uniform Manifold Approximation and Projection) maps ct-UAEs into two dimensions, denoted Component 1 and Component 2, while the K-means method clusters them into three categories denoted by three colors. The shaded background reflects the number of elements in the cluster in that region; a darker shade indicates a higher number of elements in that cluster region. b, c Elbow plot and silhouette score graph for the optimal cluster number. The dashed line in (b) is located at 3, representing a silhouette score at a relatively high level; so is the dashed line in (c), indicating that the slope of the Sum of Squared Error (SSE) curve is relatively steep when the number of clusters is 3. Five random seeds are used to obtain averaged results. d–f Violin plots of the formation energy, bandgap, and total magnetization of oxide compounds and oxygen allotropes from the Materials Project dataset, categorized into Classes A, B, and C using the MT@4p embedding with UMAP. The total numbers of samples for Class A, Class B, and Class C shown in (d–f) are 2197, 2719, and 7752, respectively. Parameters such as outliers and centers for the violin plots are listed in the Source Data. Source data are provided as a Source Data file.
Based on the clustering results, most of the elements in the periodic table can be categorized, as shown in Fig. 3a. This example divides all elements into three classes, called Class A (the green cluster), Class B (the yellow cluster), and Class C (the blue cluster); individual elements that do not appear in the datasets are colored gray. Essentially, this clustering scheme based on ct-UAEs differs from the traditional classification rules of the periodic table, which is arranged by the atomic number of the elements, but for clarity and convenience we also present the results using the periodic-table scheme. The detailed result of the UMAP clustering shown in the periodic-table scheme is given in Fig. S3 in Supplementary 5.

To further interpret the element classification, and without loss of generality, we chose oxide compounds from the Materials Project, which yielded a total of 62,068 retrieved materials. From these, we filtered for those that contain data on formation energy, bandgap, and total magnetization. We then categorized the filtered materials into three groups according to the previously determined element classification, Classes A, B, and C, clustered by the MT@4p embedding using UMAP. Each group contains only elements from the corresponding class and oxygen, with no inclusion of elements from other classes. This analysis resulted in 2197 compounds containing oxygen and Class A elements, 2719 compounds containing oxygen and Class B elements, and 7752 compounds containing oxygen and Class C elements. Violin plots for each of the three properties are shown in Fig. 3d–f.
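A minimal sketch of this oxide-compound grouping is given below. It assumes the retrieved Materials Project entries are already in a local table with hypothetical column names, and the three element sets are illustrative placeholders only; the actual Class A/B/C memberships come from the MT@4p clustering, not from this listing.

```python
import pandas as pd

# Placeholder class memberships (illustrative only; the real ones come from UMAP/K-means).
class_a = {"Mg", "Ca", "Sc", "Ti", "Zr"}
class_b = {"V", "Cr", "Mn", "Fe", "Co", "Ni"}
class_c = {"Li", "Na", "Cu", "Zn", "Al", "Si"}

def assign_class(elements):
    """Return A/B/C if all non-oxygen elements of an oxide fall in a single class."""
    others = set(elements) - {"O"}
    if "O" not in elements or not others:
        return None
    for name, members in (("A", class_a), ("B", class_b), ("C", class_c)):
        if others <= members:
            return name
    return None

# Toy stand-in for the retrieved MP entries and their three target properties.
entries = pd.DataFrame({
    "elements": [["Mg", "O"], ["Fe", "O"], ["Cu", "O"], ["Fe", "Cu", "O"]],
    "formation_energy_per_atom": [-3.0, -1.6, -0.8, -1.1],
    "band_gap": [4.5, 1.2, 1.5, 0.0],
    "total_magnetization": [0.0, 4.0, 0.5, 3.2],
})
entries["class"] = entries["elements"].apply(assign_class)
grouped = entries.dropna(subset=["class"]).groupby("class")[
    ["formation_energy_per_atom", "band_gap", "total_magnetization"]].describe()
```

The per-class property distributions obtained this way are what feed the violin plots in Fig. 3d–f.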
As illustrated in Fig. 3d, the formation energies of oxide compounds for the three classes of elements show significant differences. The formation energy of Class A is concentrated between −2.5 eV/atom and −4.0 eV/atom, indicating a relatively high chemical stability for oxides containing A-class elements. Class B exhibits the widest range of formation energies, from near 0 eV/atom to −4 eV/atom. The formation energy of oxides containing C-class elements is concentrated between −1.0 eV/atom and −2.5 eV/atom, also indicating relatively good chemical stability, albeit generally lower than that of Class A. Specifically, Class A includes Group IIA, IIIB, and IVB elements; their shared characteristic is the tendency of the valence electrons to participate in metallic bonding, contributing to more compact lattice structures57–59. Class B includes most of Groups VB to VIIIB, whose d-orbital electrons possess closely spaced energy levels, which distinguishes them from the main-group elements dominated by s-orbital electrons and from the lanthanides and actinides influenced by f-orbital electrons. Previous studies reported that these elements can participate in the formation of crystals with unique electrical and thermal conductivity properties, as well as distinctive catalytic capabilities60–64. Class C includes Group IA, IB, and IIB elements, along with main-group metals and nonmetals. These elements show electron exchange and sharing abilities in the solid state. Among them, the alkali metals and halogens tend to participate in electron sharing to form the most stable structures. Group IB and IIB metals have relatively stable d-orbital electrons, but can provide additional electron density during the formation of crystals, resulting in high melting points and good electrical conductivity65.

As illustrated in Fig. 3e, the bandgap distribution of oxide compounds for the three classes of elements also reveals distinct behaviors. The bandgaps of oxides containing A-class elements are concentrated between 3 eV and 6 eV, indicating that they are primarily wide-bandgap semiconductors. The bandgap for Class B is concentrated between 0.5 eV and 2.5 eV, reflecting narrow-bandgap semiconducting behavior. The bandgap for Class C is concentrated between 1 eV and 4 eV, falling within the typical semiconductor range.

Lastly, Fig. 3f shows the distribution of magnetization across the three classes of elements. The magnetization of most elements across all three classes is concentrated near 0 μB, indicating that most oxides exhibit very low net magnetic moments, characteristic of paramagnetic or diamagnetic materials. Specifically, the magnetization of Class A is almost entirely centered at 0 μB. For Class B, the distribution of magnetization is broader, and a substantial number of elements have magnetization values greater than 5 μB, demonstrating notable ferromagnetic behavior; Fe and Co are in Class B. The magnetization of Class C is primarily distributed between 0 μB and 5 μB.

Table 4 | Most important feature dimensions for various properties
Properties | Important Feature Dimensions | R2
Radius | 98, 109 | 0.784
Boiling Temperature | 63, 11 | 0.864
Melting Temperature | 45, 91 | 0.856
Electrical Conductivity | 126, 9 | 0.831
First Ionization Energy | 85, 20 | 0.907
R2 is the R-squared value in predicting each property. The bold type indicates the most important dimension.

To further investigate the intrinsic information of the embeddings, we conduct reverse-training experiments, which involve taking a series of important elemental properties, including atomic radius, boiling temperature, melting temperature, electrical conductivity, and first ionization energy, as training targets to train a CatBoost model66. 80% of the atomic embeddings are selected randomly as training data, while the rest serve as validation to calculate R2 (the coefficient of determination). The R2 of the model in predicting each property is calculated, with the best results listed in Table 4, which reveals that the CatBoost model achieves R2 values larger than 0.78. So even with small-set data, the ct-UAEs are able to establish a robust connection with the physical and chemical properties of atoms. We further employed the SHAP67 algorithm to determine the most important dimensions contributing to the final results. The outcome is averaged over multiple random seeds to maintain stability. The results shown in Table 4 reveal that certain properties correspond to specific dimensions, which act like genes. The calculated SHAP values are shown in Fig. S4.

To further understand the differences between embeddings derived from different multi-task properties, we use the Dynamic Time Warping (DTW)68 method to measure the similarity between the mean embeddings from MT@2p, MT@3p, and MT@4p. Averaging and window smoothing of size 5 are first applied to reduce noise and uncover the underlying trends. A threshold of 0.013 is used to distinguish regions of high similarity from those with divergence, which is further shown by the inverse DTW distance in Fig. 4b, where a reference line at 0.26 × 10³ serves as a benchmark to underscore the distinction between embeddings. Our analysis revealed a pronounced alignment, as shown in Fig. 4, suggesting similar feature evolution despite the introduction of additional tasks related to total energy and total magnetization.

As shown in Fig. 4a, the blue regions indicate that the corresponding embeddings share some similarity, which also corresponds to the values in Fig. 4b being above the threshold. This observation indicates that although the target properties are diverse, the basic trends of the embeddings remain largely the same. Of particular note is the high similarity between the embeddings of the MT@3p and MT@2p models, which differ by only one total-energy task. In contrast, the introduction of magnetic properties in the MT@4p model led to some disagreements but still retained relatively sufficient similarity. The fact that the average standard deviations of the embeddings are 0.0358, 0.0397, and 0.0481 shows that the average variances of the embeddings are close to each other. Further, we calculate the variance of each embedding's standard deviation (std); the results are 0.016, 0.014, and 0.017, which implies that the std across the 128 dimensions is stable.
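The DTW comparison of the mean embedding curves can be sketched with a plain dynamic-programming implementation, as below. This is a simplified, global-distance version of the analysis (the region-wise, thresholded comparison in Fig. 4 is more detailed), and the input curves here are random stand-ins rather than the actual MT@np means.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) dynamic-time-warping distance between 1D curves."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def smooth(x: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving-average smoothing of size 5, as applied before the comparison."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

# Stand-ins for the 128-dimensional mean curves of the MT@2p/3p/4p embeddings
# (each is the mean over the 89 element embeddings, per dimension).
mean_2p, mean_3p, mean_4p = (np.random.rand(128) for _ in range(3))

d_23 = dtw_distance(smooth(mean_2p), smooth(mean_3p))
d_24 = dtw_distance(smooth(mean_2p), smooth(mean_4p))
similar = d_23 < 0.013   # threshold used in the paper to flag high-similarity regions
```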
[Fig. 4 panels: (a) mean embedding value versus dimension index (0–128) for MT@2p, MT@3p, and MT@4p; (b) inverse DTW distance versus dimension index, with a reference line at 0.26 (×10³).]
Fig. 4 | Analysis of the similarity of CrystalTransformer-generated universal atomic embeddings (ct-UAE) obtained from multi-task training with different numbers of properties. MT@np denotes embeddings trained with multi-task learning on n properties. MT@2p is trained using formation energy and bandgap, while MT@3p adds total energy, and MT@4p further includes total magnetization. DTW is the Dynamic Time Warping method. a Multi-task embedding comparison for MT@2p, MT@3p, and MT@4p, highlighting DTW similarity regions (max distance < 0.013). The average standard deviations of MT@2p, MT@3p, and MT@4p across different dimensions are 0.0358, 0.0397, and 0.0481, respectively. Similar standard deviations denote a similar magnitude of variance per atom. b Inverse DTW distances, with notable variance and a reference line at 0.26 (scaled by 10³). This value is determined to cover the curves with similar trends in (a) and to distinguish between similar and dissimilar trend regions. Source data are provided as a Source Data file.
Fig. 5 | Flowcharts and results comparison for perovskite property prediction with and without ct-UAE trained on different tasks. Ef is the formation energy. MAE is the mean absolute error. R2 is the R-squared value in predicting each property. The prefix None- denotes models that do not use ct-UAE; the prefix CT- indicates models that use ct-UAE; the prefix CTMT@np denotes models that use ct-UAE trained on n properties. a Schematic representation of the workflow for applying ct-UAEs to predict properties of perovskite materials. When the back-end model is MEGNET10, the MAE for the UAE-free case is 0.032 eV/atom; using the transfer learning strategy with ct-UAE results in an MAE of 0.030 eV/atom, while the MAE for the transfer learning strategy with ct-UAE trained using multi-task learning is 0.021 eV/atom. b, c Predicted formation energy versus target formation energy for the MEGNET10 and CGCNN9 models. The upper part and the right part denote the target and prediction data distributions, respectively.
Application in hybrid organic-inorganic perovskite crystals
Hybrid organic-inorganic perovskite (HOIP) materials are gaining tremendous attention for their outstanding optoelectronic properties. However, the study of HOIP materials is hindered by the scarcity of training data69,70. In contrast to other materials, HOIP crystals lack a large, high-quality database because of their complex synthesis. Such scarcity of data presents a significant challenge for traditional deep learning models.

After merging two distinct datasets of HOIP materials69,70, we created a more diverse dataset containing 2103 HOIP crystals, which is still small compared with other material databases such as the MP dataset44. Figure 5a shows the workflow for applying ct-UAEs to predict properties of HOIP materials, with the results presented in Fig. 5. The MAE of the CGCNN model for predicting the formation energy (Ef) of HOIP materials was reduced significantly, from 0.054 eV/atom to 0.046 eV/atom, a 16% improvement. Similarly, the MAE of MEGNET is reduced from 0.032 eV/atom to 0.021 eV/atom, by nearly 34.38%. For ALIGNN, the MAE values are not of the same magnitude as those of the aforementioned models. Figure 5b, c shows the predicted formation energy versus the target formation energy for the MEGNET and CGCNN models.

Methods
The CrystalTransformer model
Given the atom input with batch × N × L size (where N is the number of atoms and L is the number of atomic species) and the coordinate input with batch × N × D size (here D indicates the spatial dimension, which equals 3 in this context), the model first topologically augments the coordinate input using translation and rotation transformations. The details of the augmentation are described in Supplementary 6. After that, a linear transformation is applied to both inputs to embed the features to a dimension of C,

A′ = A W_A + b_A,   (1)

X′ = X W_X + b_X,   (2)

where A denotes the one-hot initialization of the atom input features, with dimension batch × N × L, and the tensor X denotes the atom position coordinates, with dimension batch × N × D. W_A and W_X denote the weight matrices for the atom features and position coordinates, respectively, while b_A and b_X denote their corresponding biases. A′ and X′ have the same dimension of batch × N × C. It should be noted that the W_A and b_A matrices, or equivalently the A W_A + b_A output (A being the one-hot input), constitute the embedding matrix of the atomic information, which is the most important part of the CrystalTransformer model. The transformed atom and position features are then concatenated along the feature dimension,

M = Concat(A′, X′).   (3)
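A minimal PyTorch sketch of the input-embedding stage in Eqs. (1)–(3) is given below; the layer and variable names are illustrative and not taken from the released code.

```python
import torch
import torch.nn as nn

# Sketch of Eqs. (1)-(3): linear embedding of the one-hot atom input A and the
# (augmented) coordinate input X, then concatenation along the feature dimension.
BATCH, N_ATOMS, L_SPECIES, D_SPACE, C = 4, 32, 89, 3, 64

class InputEmbedding(nn.Module):
    def __init__(self, n_species: int, d_space: int, c: int):
        super().__init__()
        self.embed_atoms = nn.Linear(n_species, c)    # A' = A W_A + b_A   (Eq. 1)
        self.embed_coords = nn.Linear(d_space, c)     # X' = X W_X + b_X   (Eq. 2)

    def forward(self, A: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
        A_prime = self.embed_atoms(A)                 # (batch, N, C)
        X_prime = self.embed_coords(X)                # (batch, N, C)
        return torch.cat([A_prime, X_prime], dim=-1)  # M = Concat(A', X')  (Eq. 3)

A = nn.functional.one_hot(torch.randint(0, L_SPECIES, (BATCH, N_ATOMS)),
                          num_classes=L_SPECIES).float()
X = torch.rand(BATCH, N_ATOMS, D_SPACE)               # stand-in atomic coordinates
M = InputEmbedding(L_SPECIES, D_SPACE, C)(A, X)        # shape (batch, N, 2C)
```

In the full model, X would first pass through the translation/rotation augmentation described above (Supplementary 6) before this embedding step.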
[Fig. 6 schematic: (a) chemical-information and coordinate-information extraction layers feeding concatenated features into multi-head attention, feed-forward, and Add & Norm blocks, followed by a linear layer that outputs the crystal properties; (b) the chemical extraction layer (atomic embedding matrix followed by a linear layer); (c) the coordinate extraction layer (topological data augmentation followed by a linear layer).]
Fig. 6 | The structure of the CrystalTransformer model. a The main part of the CrystalTransformer model. InputA and InputX denote atom (chemistry) and structure (coordinates) information, respectively. After passing through the information extraction layers, the inputs are transformed into the A matrix and the X matrix. These two matrices are then concatenated and processed through the Transformer layers, which include multi-head self-attention, feedforward layers, and other components, to produce the output target. b Chemical information extraction layer. InputA is first passed through an embedding layer, followed by a linear transformation. c Coordinates information extraction layer. InputX undergoes data augmentation followed by a linear transformation.
The concatenated features are then processed by a stack of Transformer encoder layers,

Z^(l) = TransformerEncoderLayer(Z^(l−1)),  l = 1, …, L,   (4)

where Z^(0) = M and l indexes the layer of the encoder. Each multi-head Transformer encoder layer processes the input sequence and updates it through multi-head self-attention mechanisms and point-wise feed-forward networks, as described in Supplementary 2B. After processing the crystal structure features through the Transformer encoder, the CrystalTransformer model selects the first token of the output sequence for downstream prediction tasks, which is passed through a linear layer to produce the network's predicted material property,

y_pred = Linear(Z_1^(L)),   (5)

where Z_1^(L) denotes the first token of the final encoder layer's output and y_pred is the material property predicted by the network.

The Transformer's multi-head self-attention mechanism allows the model to learn representations that capture the underlying mechanisms behind material properties. It not only processes the chemical part, but also incorporates the coordinate part. To further investigate the role of the coordinate part of CrystalTransformer in model performance, an ablation study and qualitative analysis are conducted as described in Supplementary 7, which show that the coordinate part encapsulates important geometric characteristics of crystal systems and is important for training the embeddings. Without the coordinate part, the training MAE increases from 0.395 eV to 0.458 eV when trained on the MP bandgap dataset. By comparing the definition of the attention weights αij in the Transformer, shown in Eq. (S16), with the general expression describing the physical interaction between atoms V(rij), shown in Eq. (S17), it is straightforward to see that the attention weight αij is analogous to the physical interaction coefficient between atoms V(rij), which suggests that the attention mechanism can learn spatial relationships and interaction properties in real physical systems.

The CrystalTransformer method exhibits a theoretical complexity primarily driven by the self-attention mechanism in its Transformer layers. For a crystal with n atoms, the self-attention mechanism computes pairwise correlations with a complexity of O(n²d), where d is the dimensionality of the atomic features. Traditional graph neural networks (GNNs), by contrast, typically operate with a lower theoretical complexity of O(nd²) due to their localized edge-based interactions. Despite this, the manageable scale of n in conventional crystal structures results in a feasible runtime for both approaches. Real runtime experiments show that CrystalTransformer required 21 seconds for 100 batches with 512 crystals per batch, while CGCNN needed 10 seconds, which is of the same order. Further details are available in Supplementary 3.
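For completeness, the encoder stack and first-token readout of Eqs. (4) and (5) can be sketched with the standard PyTorch Transformer encoder as follows; the hyper-parameters here are illustrative and not the paper's settings.

```python
import torch
import torch.nn as nn

# Sketch of Eqs. (4)-(5): stacked Transformer encoder layers applied to the fused
# feature sequence M, with the first output token fed to a linear prediction head.
BATCH, N_TOKENS, FEATURE_DIM, N_LAYERS = 4, 32, 128, 4

encoder_layer = nn.TransformerEncoderLayer(d_model=FEATURE_DIM, nhead=8,
                                           batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=N_LAYERS)
head = nn.Linear(FEATURE_DIM, 1)                  # y_pred = Linear(Z_1^(L))

M = torch.rand(BATCH, N_TOKENS, FEATURE_DIM)      # fused atom/coordinate features
Z = encoder(M)                                    # Z^(l) updated layer by layer
y_pred = head(Z[:, 0, :])                         # read out the first token
```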
In order to test the CrystalTransformer model's performance on crystal datasets, we conducted performance assessments against established graph neural network models. These models were evaluated on the MP and MP* datasets for formation energy (Ef) and PBE bandgap (Eg). As listed in Table 1, None-CrystalTransformer, None-CGCNN, None-MEGNET, and None-ALIGNN denote the models trained from scratch without any front-end model, which is the traditional method. It is clear that, despite lacking the prior inputs of atomic features and edge information of crystals, the None-CrystalTransformer demonstrates competitive accuracy in predicting material properties, i.e., its MAE is only 1–4 times larger for Ef and 1–3 times larger for Eg compared with the traditional GNN models on the MP/MP* datasets. The increase in MAE is partly because it does not strictly rely on predefined graph structures and inductive bias. The lack of certain inductive biases compels the model to acquire this knowledge independently. Although this diminishes its predictive capability, it encourages the model's parameters to assimilate additional information, leading to more informative embeddings, as described in Table 1.

Crystal-symmetry restrictions and data augmentation
The ct-UAE method accounts for rotational and translational invariance through its architecture and data augmentation strategy, as described in Supplementary 6. While the ct-UAE front-end indeed does not explicitly enforce rotational and translational invariance, the back-end GNN model is designed to ensure this symmetry restriction. In practice, the front-end model can readily learn and maintain the symmetries through data augmentation. To validate this assertion, we first used a stronger data augmentation method to train the MT@3p model on the MP* dataset. A group of crystals was then randomly selected and subjected to random augmentations through rotations and translations. The consistency of the output vectors from these augmented samples was assessed using pairwise cosine similarity and Euclidean distance. The trained MT@3p model achieved an average cosine similarity of 0.998 and an average Euclidean distance of 0.275, indicating that the output vectors were nearly identical across augmentations. Notably, a recent study employing a similar data augmentation method demonstrated that unconstrained model architectures like transformers can be trained to achieve a high degree of invariance, such as rotational invariance, by learning these symmetries from the data71, and that this unconstrained architecture can, in fact, lead to improved performance, which is essentially consistent with the rationale behind our proposed front-end model, CrystalTransformer.

Multi-task learning method
Multi-task learning (MTL)72–74 trains the model simultaneously on different tasks, with the parameters optimized so that all tasks improve. This training method enhances generalization. In the context of CrystalTransformer, the tasks in MTL correspond to different material properties. The loss function is a weighted sum of the loss for each task:

L_MTL = Σ_i w_i · Loss_i(y_pred,i, y_target,i),   (6)

where Loss_i can be the MSE or MAE, w_i are the task weights, and i indexes the task. MTL ensures the universality of the atomic embeddings, rather than producing a UAE that is specially optimized for a single task.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The embeddings generated by our ct-UAEs are available on GitHub (https://github.com/fduabinitio/ct-UAE) under the MIT license. ct-UAE v1.075 (https://doi.org/10.5281/zenodo.14557908) contains all the embeddings used in this work. Source data are provided with this paper as a Source Data file.

Code availability
The ct-UAE source code used in this study is publicly available on GitHub (https://github.com/fduabinitio/ct-UAE) under the MIT license. ct-UAE v1.075 (https://doi.org/10.5281/zenodo.14557908) was used to generate all embeddings in this work.

References
1. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
2. Davies, A. et al. Advancing mathematics by guiding human intuition with AI. Nature 600, 70–74 (2021).
3. Kirkpatrick, J. et al. Pushing the frontiers of density functionals by solving the fractional electron problem. Science 374, 1385–1389 (2021).
4. Zhou, J. et al. Graph neural networks: a review of methods and applications. AI Open 1, 57–81 (2020).
5. Zhuo, Y., Mansouri Tehrani, A. & Brgoch, J. Predicting the band gaps of inorganic solids by machine learning. J. Phys. Chem. Lett. 9, 1668–1673 (2018).
6. Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature 559, 547–555 (2018).
7. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
8. Wu, Z. et al. A comprehensive survey on graph neural networks. IEEE Transact. Neural Netw. Learn. Syst. 32, 4–24 (2020).
9. Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
10. Chen, C., Ye, W., Zuo, Y., Zheng, C. & Ong, S. P. Graph networks as a universal machine learning framework for molecules and crystals. Chem. Mater. 31, 3564–3572 (2019).
11. Choudhary, K. & DeCost, B. Atomistic line graph neural network for improved materials property predictions. npj Comput. Mater. 7, 185 (2021).
12. Park, C. W. & Wolverton, C. Developing an improved crystal graph convolutional neural network framework for accelerated materials discovery. Phys. Rev. Mater. 4, 063801 (2020).
13. Qiao, Z., Welborn, M., Anandkumar, A., Manby, F. R. & Miller, T. F. OrbNet: deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. J. Chem. Phys. 153, 124111 (2020).
14. Gasteiger, J., Groß, J. & Günnemann, S. Directional message passing for molecular graphs. In Proc. International Conference on Learning Representations (ICLR, 2020).
15. Gasteiger, J., Giri, S., Margraf, J. T. & Günnemann, S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. In Machine Learning for Molecules Workshop at NeurIPS (NIPS, 2020).
16. Shui, Z. & Karypis, G. Heterogeneous molecular graph neural networks for predicting molecule properties. In 2020 IEEE International Conference on Data Mining (ICDM) 492–500 (IEEE, 2020).
17. Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R. & Tkatchenko, A. Quantum-chemical insights from deep tensor neural networks. Nat. Commun. 8, 13890 (2017).
18. Anderson, B., Hy, T.-S. & Kondor, R. Cormorant: covariant molecular neural networks. In Proc. 33rd International Conference on Neural Information Processing Systems (NIPS, 2019).
19. Zhang, S., Liu, Y. & Xie, L. Molecular mechanics-driven graph neural network with multiplex graph for molecular structures. In Machine Learning for Molecules Workshop at NeurIPS (NIPS, 2020).
20. Schütt, K. T. et al. SchNetPack: a deep learning toolbox for atomistic systems. J. Chem. Theory Comput. 15, 448–455 (2018).
21. Jha, D. et al. ElemNet: deep learning the chemistry of materials from only elemental composition. Sci. Rep. 8, 17593 (2018).
22. Westermayr, J., Gastegger, M. & Marquetand, P. Combining SchNet and SHARC: the SchNarc machine learning approach for excited-state dynamics. J. Phys. Chem. Lett. 11, 3828–3834 (2020).
23. Wen, M., Blau, S. M., Spotte-Smith, E. W. C., Dwaraknath, S. & Persson, K. A. BonDNet: a graph neural network for the prediction of bond dissociation energies for charged molecules. Chem. Sci. 12, 1858–1868 (2021).
24. Isayev, O. et al. Universal fragment descriptors for predicting properties of inorganic crystals. Nat. Commun. 8, 15679 (2017).
25. Chen, C., Zuo, Y., Ye, W., Li, X. & Ong, S. P. Learning properties of ordered and disordered materials from multi-fidelity data. Nat. Comput. Sci. 1, 46–53 (2021).
26. Lu, S. et al. Accelerated discovery of stable lead-free hybrid organic-inorganic perovskites via machine learning. Nat. Commun. 9, 3405 (2018).
27. Chen, D. et al. Automating crystal-structure phase mapping by combining deep learning with constraint reasoning. Nat. Mach. Intell. 3, 812–822 (2021).
28. Lee, X. Y. et al. Fast inverse design of microstructures via generative invariance networks. Nat. Comput. Sci. 1, 229–238 (2021).
29. Zhang, X., Zhou, J., Lu, J. & Shen, L. Interpretable learning of voltage for electrode design of multivalent metal-ion batteries. npj Comput. Mater. 8, 175 (2022).
30. Ju, S. et al. Exploring diamondlike lattice thermal conductivity crystals via feature-based transfer learning. Phys. Rev. Mater. 5, 053801 (2021).
31. Devlin, J., Chang, M. W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Long and Short Papers) 4171–4186 (ACL, 2019).
32. Kim, D., Saito, K., Saenko, K., Sclaroff, S. & Plummer, B. MULE: multimodal universal language embedding. Proc. AAAI Conference on Artificial Intelligence 34, 11254–11261 (2020).
33. Li, Y. & Yang, T. Word embedding for understanding natural language: a survey. Guide Big Data Appl. 26, 83–104 (2018).
34. Lee, J. & Asahi, R. Transfer learning for materials informatics using crystal graph convolutional neural network. Comput. Mater. Sci. 190, 110314 (2021).
35. Feng, S., Zhou, H. & Dong, H. Application of deep transfer learning to predicting crystal structures of inorganic substances. Comput. Mater. Sci. 195, 110476 (2021).
36. Yamada, H. et al. Predicting materials properties with little data using shotgun transfer learning. ACS Central Sci. 5, 1717–1730 (2019).
37. Kim, J., Jung, J., Kim, S. & Han, S. Predicting melting temperature of inorganic crystals via crystal graph neural network enhanced by transfer learning. Comput. Mater. Sci. 234, 112783 (2024).
38. Jha, D. et al. Enhancing materials property prediction by leveraging computational and experimental data using deep transfer learning. Nat. Commun. 10, 5316 (2019).
39. Choukroun, Y. & Wolf, L. Geometric transformer for end-to-end molecule properties prediction. In Proc. Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22 2895–2901 (IJCAI, 2022).
40. Qiao, Z., Welborn, M., Anandkumar, A., Manby, F. R. & Miller, T. F. OrbNet: deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. J. Chem. Phys. 153, 124111 (2020).
41. Wu, F. et al. 3D-Transformer: molecular representation with transformer in 3D space. (2021).
42. Healy, J. & McInnes, L. Uniform manifold approximation and projection. Nat. Rev. Methods Primers 4, 82 (2024).
43. Herr, J. E., Koh, K., Yao, K. & Parkhill, J. Compressing physics with an autoencoder: creating an atomic species representation to improve machine learning models in the chemical sciences. J. Chem. Phys. 151, 084103 (2019).
44. Jain, A. et al. Commentary: The Materials Project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
45. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865 (1996).
46. Perdew, J. P. & Levy, M. Physical content of the exact Kohn-Sham orbital energies: band gaps and derivative discontinuities. Phys. Rev. Lett. 51, 1884 (1983).
47. Borlido, P. et al. Large-scale benchmark of exchange–correlation functionals for the determination of electronic band gaps of solids. J. Chem. Theory Comput. 15, 5069–5079 (2019).
48. Borlido, P. et al. Exchange-correlation functionals for band gaps of solids: benchmark, reparametrization and machine learning. npj Comput. Mater. 6, 1–17 (2020).
49. Choudhary, K. et al. The joint automated repository for various integrated simulations (JARVIS) for data-driven materials design. npj Comput. Mater. 6, 173 (2020).
50. Deng, B. et al. CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling. Nat. Mach. Intell. 5, 1031–1041 (2023).
51. Chen, C. & Ong, S. P. A universal graph deep learning interatomic potential for the periodic table. Nat. Comput. Sci. 2, 718–728 (2022).
52. Batatia, I., Kovacs, D. P., Simm, G., Ortner, C. & Csányi, G. MACE: higher order equivariant message passing neural networks for fast and accurate force fields. Adv. Neural Inform. Process. Syst. 35, 11423–11436 (2022).
53. Ahmed, M., Seraj, R. & Islam, S. M. S. The k-means algorithm: a comprehensive survey and performance evaluation. Electronics 9, 1295 (2020).
54. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 2579–2605 (2008).
55. Traag, V. A., Waltman, L. & Van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 1–12 (2019).
56. Saputra, D. M., Saputra, D. & Oswari, L. D. Effect of distance metrics in determining k-value in k-means clustering using elbow and silhouette method. In Sriwijaya International Conference on Information Technology and Its Applications (SICONIAN 2019) 341–346 (Atlantis Press, 2020).
57. Heaven, M. C., Bondybey, V. E., Merritt, J. M. & Kaledin, A. L. The unique bonding characteristics of beryllium and the Group IIA metals. Chem. Phys. Lett. 506, 1–14 (2011).
58. Zhang, Y., Liu, W. & Niu, H. Native defect properties and p-type doping efficiency in Group-IIA doped wurtzite AlN. Phys. Rev. B 77, 035201 (2008).
59. Ri, S.-R., Ri, J.-E., Ri, N.-C. & Hong, S.-I. One way for thermoelectric performance enhancement of Group IIIB monochalcogenides. Solid State Commun. 339, 114485 (2021).
60. Liu, W.-S., Zhang, B.-P., Zhao, L.-D. & Li, J.-F. Improvement of thermoelectric performance of CoSb3−xTex skutterudite compounds by additional substitution of IVB-group elements for Sb. Chem. Mater. 20, 7526–7531 (2008).
61. Yin, Y., Yi, M. & Guo, W. High and anomalous thermal conductivity in monolayer MSi2Z4 semiconductors. ACS Appl. Mater. Interfaces 13, 45907–45915 (2021).
62. Song, J. et al. Performance enhancement of perovskite solar cells by doping TiO2 blocking layer with Group VB elements. J. Alloys Compounds 694, 1232–1238 (2017).
63. Patsalas, P. et al. Conductive nitrides: growth principles, optical and electronic properties, and their perspectives in photonics and plasmonics. Mater. Sci. Eng. R Rep. 123, 1–55 (2018).
64. Awadallah, A. E., Aboul-Enein, A. A., El-Desouki, D. S. & Aboul-Gheit, A. K. Catalytic thermal decomposition of methane to COx-free hydrogen and carbon nanotubes over MgO supported bimetallic Group VIII catalysts. Appl. Surf. Sci. 296, 100–107 (2014).
65. Chattopadhyay, S., Mani, B. K. & Angom, D. Triple excitations in perturbed relativistic coupled-cluster theory and electric dipole polarizability of Group-IIB elements. Phys. Rev. A 91, 052504 (2015).
66. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: unbiased boosting with categorical features. In Proc. 32nd International Conference on Neural Information Processing Systems 6639–6649 (Curran Associates Inc., 2018).
67. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems 4768–4777 (2017).
68. Müller, M. Dynamic time warping. In Information Retrieval for Music and Motion 69–84 (Springer Berlin Heidelberg, 2007).
69. Kim, C., Huan, T. D., Krishnan, S. & Ramprasad, R. A hybrid organic-inorganic perovskite dataset. Sci. Data 4, 1–11 (2017).
70. Nakajima, T. & Sawada, K. Discovery of Pb-free perovskite solar cells via high-throughput simulation on the K computer. J. Phys. Chem. Lett. 8, 4826–4831 (2017).
71. Langer, M. F., Pozdnyakov, S. N. & Ceriotti, M. Probing the effects of broken symmetries in machine learning. Mach. Learn. Sci. Technol. 5, 04LT01 (2024).
72. Sanyal, S. et al. MT-CGCNN: integrating crystal graph convolutional neural network with multitask learning for material property prediction. arXiv:1811.05660 (2018).
73. Thung, K.-H. & Wee, C.-Y. A brief review on multi-task learning. Multimedia Tools Appl. 77, 29705–29725 (2018).
74. Zhang, Y. & Yang, Q. A survey on multi-task learning. IEEE Transact. Knowledge Data Eng. 34, 5586–5609 (2022).
75. Luo, J. ct-UAE (GitHub, 2025). https://doi.org/10.5281/zenodo.14557909 (2025).

Acknowledgements
The authors thank G. F. Zheng and H. Y. Yu for helpful discussions. This work is supported by the National Key R&D Program of China (2023YFA1608501), Shanghai Municipal Natural Science Foundation under Grant No. 24ZR1406600, and Natural Science Foundation of Shandong Province under grant no. ZR2021MA041. L.J. and Z.D. also want to acknowledge the support of FDUROP (Fudan's Undergraduate Research Opportunities Program) (24052, 23908).

Author contributions
H.Z. conceived the project and contributed to securing funding. H.Z. and Y.C. supervised the research. L.J. and Z.D. developed and trained the neural networks and analyzed the results. L.J. and Z.D. wrote the original manuscript. L.J., Z.D., L.S., Y.X., Y.M., Y.C., and H.Z. contributed to the discussion of results and manuscript preparation and revision.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41467-025-56481-x.

Correspondence and requests for materials should be addressed to Yan Cen or Hao Zhang.

Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Reprints and permissions information is available at http://www.nature.com/reprints

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

© The Author(s) 2025