Transformer-generated atomic embeddings
https://doi.org/10.1038/s41467-025-56481-x
The development of deep learning (DL) and machine learning (ML) has created new research methods for many research fields1–4. In materials science, this development is leading to discoveries of material properties that may be challenging for traditional methods5–8. Many DL algorithms and models have been proposed, such as the Crystal Graph Convolutional Neural Network (CGCNN)9, MatErials Graph Network (MEGNET)10, Atomistic Line Graph Neural Network (ALIGNN)11, the improved Crystal Graph Convolutional Neural Network (iCGCNN)12, OrbNet13, and so on14–24. They have achieved success in many kinds of applications, such as learning properties from multi-fidelity data25, discovering stable lead-free hybrid organic-inorganic perovskites26, mapping crystal-structure phases27, and designing material microstructures28.
1School of Information Science and Technology, Fudan University, Shanghai, China. 2Department of Physics, Fudan University, Shanghai, China. 3School of
Science, Shandong Jianzhu University, Jinan, Shandong, China. 4Department of Materials, Fudan University, Shanghai, China. 5Department of Optical Science
and Engineering and Key Laboratory of Micro and Nano Photonic Structures (Ministry of Education), Fudan University, Shanghai, China. 6State Key Laboratory
of Photovoltaic Science and Technology, Fudan University, Shanghai, China. 7These authors contributed equally: Luozhijie Jin, Zijian Du.
e-mail: [email protected]; [email protected]
[Fig. 1 schematic: (a) workflow from a large database of CIF files through an atomic-embedding front-end model into back-end models that predict crystal properties (formation energy, bandgap, etc.); (b) deep-learning Methods I (trained by CrystalTransformer) and II (trained by GNN models), which produce atomic embeddings from one-hot input matrices; (c) mainstream Method III, artificially 0–1 vectorized embeddings gathered from various element properties indexed by atomic number.]
Fig. 1 | Workflow of the model with front- and back-end parts to predict properties, and the different working principles of atomic embeddings. a The workflow of the front-end and back-end models using atomic embeddings. The atomic embedding is derived from the front-end models, while graph neural networks (GNN) serve as back-end models trained for different properties. b The process and principles of Methods (I, II), which use deep learning trained on a large database to generate atomic embeddings. Method I uses CrystalTransformer to produce universal atomic embeddings (UAE), while Method II uses a traditional GNN model to produce ordinary atomic embeddings. c The process and principles of Method III, which artificially constructs atomic embeddings by querying databases or by mapping known atomic properties to a 0–1 vector or, in most cases, a one-hot vector.
In the solid-state theory, the features and spatially topological arrangements of the constituent atoms in crystals or other condensed systems determine their properties, which are intricately encapsulated into the entity of the "atomic embedding"29,30 in DL algorithms. Specifically, atomic embedding is the process of digitally encoding the properties of atoms for a crystal model, an idea that originates from natural language processing, where word embeddings transformed the way textual data is represented31–33. An appropriate atomic embedding can accelerate model training, improve prediction accuracy, and yield explainable information34–38. Currently, most attention in the field of materials informatics has been focused on the design of crystal model architectures to improve the accuracy of property prediction, while studies on the atomic embedding itself are rare. Typically, a simple 0–1 embedding is adopted as the atomic embedding algorithm9,10, which generally generates a sparse embedding matrix that is not conducive to information extraction by the models.

In recent years, a large number of Transformer-based training methods39 and predictive models, such as OrbNet40, 3D-Transformer41, and others, have been developed in the field of chemical molecular property and structure prediction. They are believed to fully leverage the advantages of the Transformer architecture in processing atomic interactions and capturing three-dimensional structures, enabling efficient representation of the complex interactions between atoms. Motivated by these advancements, we developed the home-made CrystalTransformer model to generate universal atomic embeddings, called ct-UAEs, based on the Transformer architecture; it learns a unique "fingerprint" for each atom, capturing the essence of their roles and interactions within materials. The obtained embeddings are then transferred to different DL models. Using the Uniform Manifold Approximation and Projection (UMAP) clustering method42, we categorized atoms into different groups, analyzing the connection between the embeddings and the real atoms.

Results and discussions
Universal atomic embeddings
Generally, when predicting properties such as the formation energy and bandgap of a material in deep-learning models, each atom is first embedded as features. This embedding process is intrinsic to GNN models such as CGCNN, ALIGNN, and MEGNET. Then the deeper feature extraction processes, including information transmission and aggregation, node feature updating, etc., are conducted to predict the crystal properties. In this context, these GNNs are denoted as back-end models, while the methods for obtaining atomic embeddings are denoted as front-end models. Essentially, the parameters of the atomic embeddings can be transferred using pretrained parameters or constructed based on predefined properties, which is realized in the front-end model of Methods (I, II, III), as shown in Fig. 1b, c.

As shown in Fig. 1a, for the front-end model we used our proposed CrystalTransformer to generate atomic embeddings (Method I). Other pretrained atomic embeddings use GNN models (Method II, shown in Fig. 1b), while some use artificially constructed features based on known atomic properties, like the autoencoder-based approach43 (Method III, shown in Fig. 1c). The CrystalTransformer model learns atomic embeddings directly from chemical information in crystal databases. Compared with Method III, which generates atomic embeddings by processing a predefined set of atomic properties, our proposed ct-UAE can adapt to any desired material property without relying on predefined atomic attributes.

To examine the atomic embedding tensors obtained from different models, we used the MP and MP* datasets for formation energy (Ef) and PBE bandgap (Eg), which are key properties for evaluating chemical stability and electronic performance. MP stands for the 2018.6.1 version of the Materials Project (MP) dataset44, which contains 69,239 materials with properties; MP* denotes the 2023.6.23 version, which contains 134,243 materials. For the training, validation, and testing splits, we followed the distribution of 60,000 (training), 5000 (validation), and 4239 (testing) for the MP dataset, as used in previous works, while the MP* dataset44 and its properties are split into 80% training, 10% validation, and 10% testing sets. It is worth noting that, as discussed in Supplementary 1, the gaps in the band structures of solids in materials databases such as the MP, which are defined as the difference between the eigenvalues of the conduction-band minimum (CBM) and the valence-band maximum (VBM), were obtained by solving the Kohn-Sham (KS) equation with exchange-correlation (xc) in the Perdew-Burke-Ernzerhof (PBE) parametrization45.
Table 1 | Performance comparison (MAE) of various models on different datasets and different pretrained models
Model \ Target MP-Ef MP-Eg MP*-Ef MP*-Eg JARVIS-Ef JARVIS-Eg MC3D-E
None-CrystalTransformer 0.097 0.563 0.152 0.395 - - -
None-CGCNN9 0.083 0.384 0.085 0.342 0.080 0.531 5.558
None-MEGNET10 0.051 0.324 0.054 0.291 0.070 0.493 5.029
None-ALIGNN11 0.022 0.276 0.056 0.152 0.044 0.562 3.706
CT-CGCNN9 0.071 0.359 - - 0.066 0.463 5.341
CT-MEGNET10 0.049 0.304 - - 0.068 0.443 4.687
CT-ALIGNN11 0.018 0.256 - - 0.043 0.536 3.705
CG-CGCNN9 0.074 0.378 - - - - -
MEG-CGCNN9 0.082 0.457 - - - - -
ALI-CGCNN9 0.077 0.386 - - - - -
Ef (eV/atom), Eg (eV) and E (eV) denote the formation energy, bandgap, and total energy of the materials. A-B implies the front-end (A) and back-end (B) models, and None means trained from scratch with no front-end model. CT indicates CrystalTransformer as the front-end model, and all front-end models are pretrained on the MP* dataset.
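As a concrete illustration of the front-end/back-end transfer evaluated in Table 1, the sketch below shows one way a pretrained ct-UAE matrix could be copied into (and optionally frozen in) the atom-feature layer of a back-end GNN. It is a minimal PyTorch sketch, not the released ct-UAE code; the stand-in BackEndGNN class and the 89 × 128 shape only mirror the embedding dimensions quoted later in the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch of the front-end -> back-end transfer (not the released code).
# Assumption: the ct-UAE is available as an 89 x 128 tensor (one row per element),
# and the back-end GNN exposes its atom featurizer as an nn.Embedding layer.
N_ELEMENTS, EMB_DIM = 89, 128

class BackEndGNN(nn.Module):
    """Stand-in for a back-end model such as CGCNN/MEGNET/ALIGNN."""
    def __init__(self, n_elements: int, emb_dim: int):
        super().__init__()
        self.atom_embedding = nn.Embedding(n_elements, emb_dim)
        self.readout = nn.Linear(emb_dim, 1)   # placeholder for the real GNN body

    def forward(self, atomic_indices: torch.Tensor) -> torch.Tensor:
        feats = self.atom_embedding(atomic_indices)   # (n_atoms, emb_dim)
        return self.readout(feats.mean(dim=0))        # crude pooled property

model = BackEndGNN(N_ELEMENTS, EMB_DIM)

# Load the pretrained universal atomic embeddings and copy them in.
ct_uae = torch.randn(N_ELEMENTS, EMB_DIM)   # stand-in for torch.load("ct_uae.pt")
with torch.no_grad():
    model.atom_embedding.weight.copy_(ct_uae)

# Optionally freeze them (the "CTfreeze" setting compared in Table 3).
model.atom_embedding.weight.requires_grad = False
```

Whether the transferred embeddings are frozen or fine-tuned is a design choice; Table 3 compares both settings.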
[Fig. 2 | Predicted versus target formation energy (eV/atom) on the MP dataset for CT-CGCNN vs. None-CGCNN, CT-MEGNET vs. None-MEGNET, and CT-ALIGNN vs. None-ALIGNN (panels a–c); inset legend: CT-CGCNN MAE 0.073, R2 0.986; None-CGCNN MAE 0.083, R2 0.982.]
In semiconductors and insulators, these PBE bandgaps Eg^PBE are not equal to the fundamental gaps E_G, but differ by a term called the derivative discontinuity of the xc energy, Δxc46, leading to a substantial underestimation of Eg^PBE compared with E_G, as large as 40–50%47,48. However, since the KS equation is constructed from the kinetic energy and the Coulomb potentials between charged particles (electrons and ions), when specific exchange-correlation functionals are used, the eigenvalues of the KS equation should capture the major physical interactions within the interacting system. Therefore, if the PBE bandgaps are used as the target in the deep learning model, the derived atomic embedding should involve the atomic properties and the structural information, since the Eg^PBE values already incorporate such information through the KS Hamiltonian constructed with the PBE-type xc functional.

Front-end models such as CrystalTransformer, CGCNN, ALIGNN, and MEGNET are first pre-trained on the expanded MP* dataset, focusing on the bandgap Eg and formation energy Ef predictive tasks. Subsequently, the extracted atomic embeddings are integrated into a CGCNN back-end model and trained on the original MP dataset, which results in CT-CGCNN, CG-CGCNN, ALI-CGCNN, and so on. Table 1 shows a comparative MAE analysis to evaluate the relative performance enhancements attributable to the front-end atomic embeddings, denoted as N-CGCNN in Table 1 (N indicating the front-end model described above). As listed in Table 1, among the atomic embeddings pre-trained by different models, those using ct-UAEs (CT-CGCNN) perform the best, with 14% and 7% reductions in MAE for Ef and Eg, also outperforming the best GNN front-end embeddings (CG-CGCNN in this context) by 4% and 5% for the two properties, respectively. The predicted versus target formation energies for those models are shown in Fig. 2a–c.

Furthermore, as listed in Table 1, the performances of GNN models like CGCNN, MEGNET, and ALIGNN were enhanced by using the CrystalTransformer-generated atomic embeddings (ct-UAEs) evaluated on the MP dataset. The CGCNN model transferred with CrystalTransformer-generated embeddings (ct-UAEs), denoted by CT-CGCNN in Table 1, shows a significant reduction in MAE: for formation energy Ef, from 0.083 eV/atom to 0.071 eV/atom, a reduction of 14%, and for bandgap Eg, from 0.384 eV to 0.359 eV, a reduction of 7%.
A similar reduction can be observed for MEGNET, denoted by CT-MEGNET in Table 1, with Ef decreasing from 0.051 eV/atom to 0.049 eV/atom, a 4% reduction, and bandgap Eg decreasing from 0.324 eV to 0.304 eV, a reduction of 6%. ALIGNN also exhibits an improvement in Ef prediction accuracy, denoted by CT-ALIGNN in Table 1, decreasing from 0.022 eV/atom to 0.018 eV/atom, a reduction of 18%, and for bandgap Eg, decreasing from 0.276 eV to 0.256 eV, a 7% reduction.

Transferability of ct-UAEs
To further investigate the performance of the ct-UAEs on different properties, task-generated embeddings are transferred to different tasks. For example, Ef-task-generated atomic embeddings are applied to bandgap prediction and Eg-task-generated embeddings to the formation-energy task. The results are listed in Table 2, denoted as CTEf-CG and CTEg-CG. Embeddings trained on the bandgap task, when transferred to the formation-energy task, lead to a measurable improvement in accuracy, with the MAE decreasing from 0.083 to 0.078 eV/atom, a 6% reduction. Conversely, although trained on a simpler task such as formation energy, the embedding still reduces the MAE on the more challenging bandgap prediction by 0.2%.

Further experiments focus on multi-task-generated embeddings (MT). As listed in Table 2, the embeddings trained on two properties (formation energy and bandgap), denoted as MT@2p, yield better performance than single-task-generated embeddings. When transferred to the CGCNN model (CTMT@2p-CGCNN), the model achieves an MAE of 0.068 eV/atom for Ef and 0.357 eV for Eg, outperforming the baseline CGCNN (an 18% reduction in Ef and a 7% reduction in Eg) as well as the CGCNN variants using single-task embeddings (a 4% reduction in Ef and 0.5% for the bandgap).

Additional multi-task variants (MT@3p and MT@4p) incorporating total energy and total magnetization are introduced. Introducing MT@3p, with total energy as an additional property, yields a further 0.2% reduction in bandgap MAE, with the formation-energy MAE almost unchanged. However, the introduction of magnetization in MT@4p leads to a slight increase in the MAE for bandgap prediction, from 0.357 to 0.367 eV, which is probably due to the physical differences between these two properties.

Table 2 | Single-task versus multi-task embeddings on mean absolute error (MAE) for formation energy (eV/atom) and bandgap (eV) and R2
Target | None-CG | CTEf-CG | CTEg-CG | CTMT@2p-CG | CTMT@3p-CG | CTMT@4p-CG
MAE(Ef) | 0.083 | 0.071 | 0.078 | 0.068 | 0.069 | 0.068
R2(Ef) | 0.984 | 0.987 | 0.983 | 0.987 | 0.987 | 0.986
MAE(Eg) | 0.384 | 0.383 | 0.359 | 0.357 | 0.356 | 0.367
R2(Eg) | 0.845 | 0.845 | 0.850 | 0.849 | 0.851 | 0.847
CG indicates CGCNN. None means no embeddings are used; Ef and Eg denote embeddings trained on the corresponding target. MT@np denotes embeddings trained with multi-task learning on n properties.

Then, different training strategies are used to evaluate the performance of the model, and the results are listed in Table 3. CTfreeze-CGCNN, which employs frozen pre-trained embeddings from the CrystalTransformer (the ct-UAEs), achieves an MAE of 0.073 eV/atom for formation energy Ef and 0.358 eV for bandgap Eg. However, when integrating the coordinate embeddings together with the ct-UAEs (chemistry information) into the CGCNN framework (CTchem+coords-CGCNN), the MAE increases from 0.071 eV/atom in the atom-embedding-only model to 0.085 eV/atom. Similarly, the MAE worsens from 0.359 eV to 0.395 eV for bandgap Eg.

Table 3 | Various embedding approaches comparison on mean absolute error (MAE) for formation energy (eV/atom) and bandgap (eV) and R2
Target | CT-CGCNN | CTchem+coords-CGCNN | CTfreeze-CGCNN
MAE(Ef) | 0.071 | 0.085 | 0.073
R2(Ef) | 0.987 | 0.983 | 0.986
MAE(Eg) | 0.359 | 0.395 | 0.358
R2(Eg) | 0.850 | 0.834 | 0.851
CT denotes embeddings trained on the corresponding properties. CTchem+coords denotes atom plus coordinate embeddings, while CTfreeze denotes embeddings kept frozen (zero gradient) when training the back-end model.

The ability and transferability of the universal atomic embeddings are further tested on different databases and tasks. Each dataset is split 8:1:1 for training, validation, and testing; details on the datasets can be found in Supplementary 2A. For the JARVIS dataset49, the results are shown in Table 1. The CT-CGCNN model demonstrates an improvement in predicting both formation energy Ef and bandgap Eg: the MAEs are reduced from 0.080 eV/atom to 0.066 eV/atom (17.5%) and from 0.531 eV to 0.463 eV (12.8%), respectively.

The embedding is further evaluated on the MC3D dataset, with the total energy (E) chosen as the task; the results are shown in Table 1. The MAE of CGCNN is reduced from 5.558 eV to 5.341 eV, a 3.9% improvement. For the ALIGNN model the MAE remains nearly unchanged, while for the MEGNET model the MAE decreases from 5.029 eV to 4.687 eV, a 6.8% improvement.

Additionally, we also investigated the suitability of ct-UAE for energy-conserving interatomic potential (IAP) models, which are trained on the MPtrj dataset50. As demonstrated in Supplementary 4, we trained ct-UAE on vectorial and scalar targets, i.e., force, stress, and energy. To benchmark, we re-trained the CHGNet50, M3GNet51, and MACE52 models on the MP-RELAX dataset proposed by M3GNet51. Remarkably, adding ct-UAEs to CHGNet resulted in a significant reduction in force loss, from 0.284 to 0.242 (a 14.8% decrease), along with a reduction in stress loss from 1.496 to 1.437 and a slight decrease in energy loss from 0.460 to 0.457. For M3GNet, ct-UAE led to a slight reduction in total loss (energy, force, stress) from 2.1236 to 2.1234 and in energy loss from 0.3597 to 0.3595, indicating a minor performance improvement. However, for MACE, the ct-UAE did not lead to a reduction in loss.

Interpretability
This investigation leverages straightforward clustering algorithms to conduct an in-depth analysis of ct-UAEs. Here, the UMAP clustering method42 is employed to project the ct-UAEs into a two-dimensional space, offering a means to intuitively understand atomic characteristics in a reduced-dimensional setting. Consequently, the dimensionality of the ct-UAEs is reduced from the original 89 × 128 to 89 × 2, and through the application of the K-means clustering method53 in the two-dimensional space, atoms are further categorized into three distinct groups, as shown in Fig. 3a. The t-SNE clustering method54 is used as an additional supplementary comparison, as shown in Fig. S1. Furthermore, the community detection method55 is also used to cluster the ct-UAEs directly into three categories without dimension reduction, as an additional way to investigate the interpretability of ct-UAEs, shown in Fig. S2.

To determine the best number of clusters for the atoms, elbow plots56 and silhouette coefficient graphs56 are needed, with the quantitative analysis shown in Fig. 3b, c. Both the elbow plot and the silhouette coefficient graph demonstrate that a 3- or 4-cluster solution is the best choice for the classification of atoms. In this work, the CrystalTransformer model can be trained with 2, 3, or 4 different properties, but these atomic embeddings all show the same best number of clusters of 3 or 4.
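The cluster-number analysis described above (UMAP projection of the 89 × 128 ct-UAEs, K-means clustering, and the elbow/silhouette criteria) can be sketched as follows. This is a minimal illustration using the umap-learn and scikit-learn packages, with a random stand-in array in place of the actual embeddings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import umap  # provided by the umap-learn package

# Stand-in for the 89 x 128 ct-UAE matrix (one row per element).
embeddings = np.random.rand(89, 128)

# Project to 2D, as done before K-means clustering in Fig. 3a.
reducer = umap.UMAP(n_components=2, random_state=0)
coords_2d = reducer.fit_transform(embeddings)          # shape (89, 2)

# Scan the number of clusters with the elbow (SSE) and silhouette criteria.
sse, silhouette = {}, {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(coords_2d)
    sse[k] = km.inertia_                                # input to the elbow plot
    silhouette[k] = silhouette_score(coords_2d, km.labels_)

best_k = max(silhouette, key=silhouette.get)            # 3 or 4 in the paper
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(coords_2d)
```

In the paper the scan is repeated over five random seeds and averaged before reading off the optimal cluster number.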
[Fig. 3 panels: (a) UMAP Component 1 versus Component 2 scatter of the ct-UAEs with three K-means clusters; (b) silhouette score and (c) SSE (elbow) versus number of clusters (2–10); (d–f) violin plots of formation energy (eV/atom), band gap (eV), and total magnetization (μB) for Classes A, B, and C.]
Fig. 3 | Interpretability for CrystalTransformer-generated universal atomic embeddings (ct-UAE), including clustering of elements and statistical validation of the clustering results. a UMAP (Uniform Manifold Approximation and Projection) maps ct-UAEs into two dimensions, denoted Component 1 and Component 2, while the K-means method clusters them into three categories denoted by three colors. The shaded background reflects the number of elements in the cluster in that region; a darker shade indicates a higher number of elements in that cluster region. b, c Elbow plot and silhouette score graph for the optimal cluster number. The dashed line in (b) is located at 3, representing a silhouette score at a relatively high level; so is the dashed line in (c), indicating that the slope of the Sum of Squared Error (SSE) curve is relatively steep when the number of clusters is 3. Five random seeds are used to obtain averaged results. d–f Violin plots of the formation energy, bandgap, and total magnetization of oxide compounds and oxygen allotropes from the Materials Project dataset, categorized into Classes A, B, and C using the MT@4p embedding with UMAP. The total numbers of samples for Class A, Class B, and Class C shown in (d–f) are 2197, 2719, and 7752, respectively. Parameters such as outliers and centers for the violin plots are listed in the Source Data. Source data are provided as a Source Data file.
Based on the clustering results, most of the elements in the periodic table can be categorized, as shown in Fig. 3a. This example divides all elements into three classes, called Class A (the green cluster), Class B (the yellow cluster), and Class C (the blue cluster); individual elements that do not appear in the datasets are colored gray. Essentially, this clustering scheme based on ct-UAEs differs from the traditional classification rules of the periodic table, which is arranged by the atomic number of the elements, but for clarity and convenience we also present the results using the periodic-table scheme. The detailed result of the UMAP clustering shown in the periodic-table scheme is given in Fig. S3 in Supplementary 5.

To further interpret the element classification, and without loss of generality, we chose oxide compounds from the Materials Project, which yielded a total of 62,068 retrieved materials. From these, we filtered for those that contain data on formation energy, bandgap, and total magnetization. We then categorized the filtered materials into three groups according to the previously determined element classification, Classes A, B, and C, clustered by the MT@4p embedding using UMAP. Each group contains only elements from the corresponding class and oxygen, with no inclusion of elements from other classes. This analysis resulted in 2197 compounds containing oxygen and Class A elements, 2719 compounds containing oxygen and Class B elements, and 7752 compounds containing oxygen and Class C elements. Violin plots for each of the three properties are shown in Fig. 3d–f.
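A minimal sketch of this oxide-compound grouping is given below. It assumes the retrieved Materials Project entries are already in a local table with hypothetical column names, and the three element sets are illustrative placeholders only; the actual Class A/B/C memberships come from the MT@4p clustering, not from this listing.

```python
import pandas as pd

# Placeholder class memberships (illustrative only; the real ones come from UMAP/K-means).
class_a = {"Mg", "Ca", "Sc", "Ti", "Zr"}
class_b = {"V", "Cr", "Mn", "Fe", "Co", "Ni"}
class_c = {"Li", "Na", "Cu", "Zn", "Al", "Si"}

def assign_class(elements):
    """Return A/B/C if all non-oxygen elements of an oxide fall in a single class."""
    others = set(elements) - {"O"}
    if "O" not in elements or not others:
        return None
    for name, members in (("A", class_a), ("B", class_b), ("C", class_c)):
        if others <= members:
            return name
    return None

# Toy stand-in for the retrieved MP entries and their three target properties.
entries = pd.DataFrame({
    "elements": [["Mg", "O"], ["Fe", "O"], ["Cu", "O"], ["Fe", "Cu", "O"]],
    "formation_energy_per_atom": [-3.0, -1.6, -0.8, -1.1],
    "band_gap": [4.5, 1.2, 1.5, 0.0],
    "total_magnetization": [0.0, 4.0, 0.5, 3.2],
})
entries["class"] = entries["elements"].apply(assign_class)
grouped = entries.dropna(subset=["class"]).groupby("class")[
    ["formation_energy_per_atom", "band_gap", "total_magnetization"]].describe()
```

The per-class property distributions obtained this way are what feed the violin plots in Fig. 3d–f.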
As illustrated in Fig. 3d, the formation energies of oxide compounds for the three classes of elements show significant differences. The formation energy of Class A is concentrated between −2.5 eV/atom and −4.0 eV/atom, indicating a relatively high chemical stability for oxides containing A-class elements. Class B exhibits the widest range of formation energies, from near 0 eV/atom to −4 eV/atom. The formation energy of oxides containing C-class elements is concentrated between −1.0 eV/atom and −2.5 eV/atom, also indicating relatively good chemical stability, albeit generally lower than that of Class A. Specifically, Class A includes Group IIA, IIIB, and IVB elements; their shared characteristic is the tendency of the valence electrons to participate in metallic bonding, contributing to more compact lattice structures57–59. Class B includes most of Groups VB to VIIIB, whose d-orbital electrons possess closely spaced energy levels, which distinguishes them from the main-group elements dominated by s-orbital electrons and from the lanthanides and actinides influenced by f-orbital electrons. Previous studies reported that these elements can participate in the formation of crystals with unique electrical and thermal conductivity properties, as well as distinctive catalytic capabilities60–64. Class C includes Group IA, IB, and IIB elements, along with main-group metals and nonmetals. These elements show electron exchange and sharing abilities in the solid state. Among them, the alkali metals and halogens tend to participate in electron sharing to form the most stable structures. Group IB and IIB metals have relatively stable d-orbital electrons, but can provide additional electron density during the formation of crystals, resulting in high melting points and good electrical conductivity65.

As illustrated in Fig. 3e, the bandgap distribution of oxide compounds for the three classes of elements also reveals distinct behaviors. The bandgaps of oxides containing A-class elements are concentrated between 3 eV and 6 eV, indicating that they are primarily wide-bandgap semiconductors. The bandgap for Class B is concentrated between 0.5 eV and 2.5 eV, reflecting narrow-bandgap semiconducting behavior. The bandgap for Class C is concentrated between 1 eV and 4 eV, falling within the typical semiconductor range.

Lastly, Fig. 3f shows the distribution of magnetization across the three classes of elements. The magnetization of most elements across all three classes is concentrated near 0 μB, indicating that most oxides exhibit very low net magnetic moments, characteristic of paramagnetic or diamagnetic materials. Specifically, the magnetization of Class A is almost entirely centered at 0 μB. For Class B, the distribution of magnetization is broader, and a substantial number of elements have magnetization values greater than 5 μB, demonstrating notable ferromagnetic behavior; Fe and Co are in Class B. The magnetization of Class C is primarily distributed between 0 μB and 5 μB.

Table 4 | Most important feature dimensions for various properties
Properties | Important Feature Dimensions | R2
Radius | 98, 109 | 0.784
Boiling Temperature | 63, 11 | 0.864
Melting Temperature | 45, 91 | 0.856
Electrical Conductivity | 126, 9 | 0.831
First Ionization Energy | 85, 20 | 0.907
R2 is the R-squared value in predicting each property. The bold type indicates the most important dimension.

To further investigate the intrinsic information of the embeddings, we conduct reverse-training experiments, which involve taking a series of important elemental properties, including atomic radius, boiling temperature, melting temperature, electrical conductivity, and first ionization energy, as training targets to train a CatBoost model66. 80% of the atomic embeddings are selected randomly as training data, while the rest serve as validation to calculate R2 (the coefficient of determination). The R2 of the model in predicting each property is calculated, with the best results listed in Table 4, which reveals that the CatBoost model achieves R2 values larger than 0.78. So even with small-set data, the ct-UAEs are able to establish a robust connection with the physical and chemical properties of atoms. We further employed the SHAP67 algorithm to determine the most important dimensions contributing to the final results. The outcome is averaged over multiple random seeds to maintain stability. The results shown in Table 4 reveal that certain properties correspond to specific dimensions, which act like genes. The calculated SHAP values are shown in Fig. S4.

To further understand the differences between embeddings derived from different multi-task properties, we use the Dynamic Time Warping (DTW)68 method to measure the similarity between the mean embeddings from MT@2p, MT@3p, and MT@4p. Averaging and window smoothing of size 5 are first applied to reduce noise and uncover the underlying trends. A threshold of 0.013 is used to distinguish regions of high similarity from those with divergence, which is further shown by the inverse DTW distance in Fig. 4b, where a reference line at 0.26 × 10³ serves as a benchmark to underscore the distinction between embeddings. Our analysis revealed a pronounced alignment, as shown in Fig. 4, suggesting similar feature evolution despite the introduction of additional tasks related to total energy and total magnetization.

As shown in Fig. 4a, the blue regions indicate that the corresponding embeddings share some similarity, which also corresponds to the values in Fig. 4b being above the threshold. This observation indicates that although the target properties are diverse, the basic trends of the embeddings remain largely the same. Of particular note is the high similarity between the embeddings of the MT@3p and MT@2p models, which differ by only one total-energy task. In contrast, the introduction of magnetic properties in the MT@4p model led to some disagreements but still retained relatively sufficient similarity. The fact that the average standard deviations of the embeddings are 0.0358, 0.0397, and 0.0481 shows that the average variances of the embeddings are close to each other. Further, we calculate the variance of each embedding's standard deviation (std); the results are 0.016, 0.014, and 0.017, which implies that the std across the 128 dimensions is stable.
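The DTW comparison of the mean embedding curves can be sketched with a plain dynamic-programming implementation, as below. This is a simplified, global-distance version of the analysis (the region-wise, thresholded comparison in Fig. 4 is more detailed), and the input curves here are random stand-ins rather than the actual MT@np means.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) dynamic-time-warping distance between 1D curves."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def smooth(x: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving-average smoothing of size 5, as applied before the comparison."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

# Stand-ins for the 128-dimensional mean curves of the MT@2p/3p/4p embeddings
# (each is the mean over the 89 element embeddings, per dimension).
mean_2p, mean_3p, mean_4p = (np.random.rand(128) for _ in range(3))

d_23 = dtw_distance(smooth(mean_2p), smooth(mean_3p))
d_24 = dtw_distance(smooth(mean_2p), smooth(mean_4p))
similar = d_23 < 0.013   # threshold used in the paper to flag high-similarity regions
```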
[Fig. 4 panels: (a) mean embedding value versus dimension index (0–128) for MT@2p, MT@3p, and MT@4p; (b) inverse DTW distance versus dimension index, with a reference line at 0.26 (×10³).]
Fig. 4 | Analysis of the similarity of CrystalTransformer-generated universal atomic embeddings (ct-UAE) obtained from multi-task training with different numbers of properties. MT@np denotes embeddings trained with multi-task learning on n properties. MT@2p is trained using formation energy and bandgap, while MT@3p adds total energy, and MT@4p further includes total magnetization. DTW is the Dynamic Time Warping method. a Multi-task embedding comparison for MT@2p, MT@3p, and MT@4p, highlighting DTW similarity regions (max distance < 0.013). The average standard deviations of MT@2p, MT@3p, and MT@4p across different dimensions are 0.0358, 0.0397, and 0.0481, respectively. Similar standard deviations denote a similar magnitude of variance per atom. b Inverse DTW distances, with notable variance and a reference line at 0.26 (scaled by 10³). This value is determined to cover the curves with similar trends in (a) and to distinguish between similar and dissimilar trend regions. Source data are provided as a Source Data file.
Fig. 5 | Flowcharts and results comparison for perovskite property prediction with and without ct-UAE trained on different tasks. Ef is the formation energy. MAE is the mean absolute error. R2 is the R-squared value in predicting each property. The prefix None- denotes models that do not use ct-UAE; the prefix CT- indicates models that use ct-UAE; the prefix CTMT@np denotes models that use ct-UAE trained on n properties. a Schematic representation of the workflow for applying ct-UAEs to predict properties of perovskite materials. When the back-end model is MEGNET10, the MAE for the UAE-free case is 0.032 eV/atom; using the transfer learning strategy with ct-UAE results in an MAE of 0.030 eV/atom, while the MAE for the transfer learning strategy with ct-UAE trained using multi-task learning is 0.021 eV/atom. b, c Predicted formation energy versus target formation energy for the MEGNET10 and CGCNN9 models. The upper part and the right part denote the target and prediction data distributions, respectively.
Application in hybrid organic-inorganic perovskite crystals
Hybrid organic-inorganic perovskite (HOIP) materials are gaining tremendous attention for their outstanding optoelectronic properties. However, the study of HOIP materials is hindered by the scarcity of training data69,70. In contrast to other materials, HOIP crystals lack a large, high-quality database because of their complex synthesis. Such scarcity of data presents a significant challenge for traditional deep learning models.

After merging two distinct datasets of HOIP materials69,70, we created a more diverse dataset containing 2103 HOIP crystals, which is still small compared with other material databases such as the MP dataset44. Figure 5a shows the workflow for applying ct-UAEs to predict properties of HOIP materials, with the results presented in Fig. 5. The MAE of the CGCNN model for predicting the formation energy (Ef) of HOIP materials was reduced significantly, from 0.054 eV/atom to 0.046 eV/atom, a 16% improvement. Similarly, the MAE of MEGNET is reduced from 0.032 eV/atom to 0.021 eV/atom, by nearly 34.38%. For ALIGNN, the MAE values are not of the same magnitude as those of the aforementioned models. Figure 5b, c shows the predicted formation energy versus the target formation energy for the MEGNET and CGCNN models.

Methods
The CrystalTransformer model
Given the atom input with batch × N × L size (where N is the number of atoms and L is the number of atomic species) and the coordinate input with batch × N × D size (here D indicates the spatial dimension, which equals 3 in this context), the model first topologically augments the coordinate input using translation and rotation transformations. The details of the augmentation are described in Supplementary 6. After that, a linear transformation is applied to both inputs to embed the features to a dimension of C,

A′ = A W_A + b_A,   (1)

X′ = X W_X + b_X,   (2)

where A denotes the one-hot initialization of the atom input features, with dimension batch × N × L, and the tensor X denotes the atom position coordinates, with dimension batch × N × D. W_A and W_X denote the weight matrices for the atom features and position coordinates, respectively, while b_A and b_X denote their corresponding biases. A′ and X′ have the same dimension of batch × N × C. It should be noted that the W_A and b_A matrices, or equivalently the A W_A + b_A output (A being the one-hot input), constitute the embedding matrix of the atomic information, which is the most important part of the CrystalTransformer model. The transformed atom and position features are then concatenated along the feature dimension,

M = Concat(A′, X′).   (3)
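A minimal PyTorch sketch of the input-embedding stage in Eqs. (1)–(3) is given below; the layer and variable names are illustrative and not taken from the released code.

```python
import torch
import torch.nn as nn

# Sketch of Eqs. (1)-(3): linear embedding of the one-hot atom input A and the
# (augmented) coordinate input X, then concatenation along the feature dimension.
BATCH, N_ATOMS, L_SPECIES, D_SPACE, C = 4, 32, 89, 3, 64

class InputEmbedding(nn.Module):
    def __init__(self, n_species: int, d_space: int, c: int):
        super().__init__()
        self.embed_atoms = nn.Linear(n_species, c)    # A' = A W_A + b_A   (Eq. 1)
        self.embed_coords = nn.Linear(d_space, c)     # X' = X W_X + b_X   (Eq. 2)

    def forward(self, A: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
        A_prime = self.embed_atoms(A)                 # (batch, N, C)
        X_prime = self.embed_coords(X)                # (batch, N, C)
        return torch.cat([A_prime, X_prime], dim=-1)  # M = Concat(A', X')  (Eq. 3)

A = nn.functional.one_hot(torch.randint(0, L_SPECIES, (BATCH, N_ATOMS)),
                          num_classes=L_SPECIES).float()
X = torch.rand(BATCH, N_ATOMS, D_SPACE)               # stand-in atomic coordinates
M = InputEmbedding(L_SPECIES, D_SPACE, C)(A, X)        # shape (batch, N, 2C)
```

In the full model, X would first pass through the translation/rotation augmentation described above (Supplementary 6) before this embedding step.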
[Fig. 6 schematic: (a) chemical-information and coordinate-information extraction layers feeding concatenated features into multi-head attention, feed-forward, and Add & Norm blocks, followed by a linear layer that outputs the crystal properties; (b) the chemical extraction layer (atomic embedding matrix followed by a linear layer); (c) the coordinate extraction layer (topological data augmentation followed by a linear layer).]
Fig. 6 | The structure of the CrystalTransformer model. a The main part of the CrystalTransformer model. InputA and InputX denote atom (chemistry) and structure (coordinates) information, respectively. After passing through the information extraction layers, the inputs are transformed into the A matrix and the X matrix. These two matrices are then concatenated and processed through the Transformer layers, which include multi-head self-attention, feedforward layers, and other components, to produce the output target. b Chemical information extraction layer. InputA is first passed through an embedding layer, followed by a linear transformation. c Coordinates information extraction layer. InputX undergoes data augmentation followed by a linear transformation.
The concatenated features are then processed by a stack of Transformer encoder layers,

Z^(l) = TransformerEncoderLayer(Z^(l−1)),  l = 1, …, L,   (4)

where Z^(0) = M and l indexes the layer of the encoder. Each multi-head Transformer encoder layer processes the input sequence and updates it through multi-head self-attention mechanisms and point-wise feed-forward networks, as described in Supplementary 2B. After processing the crystal structure features through the Transformer encoder, the CrystalTransformer model selects the first token of the output sequence for downstream prediction tasks, which is passed through a linear layer to produce the network's predicted material property,

y_pred = Linear(Z_1^(L)),   (5)

where Z_1^(L) denotes the first token of the final encoder layer's output and y_pred is the material property predicted by the network.

The Transformer's multi-head self-attention mechanism allows the model to learn representations that capture the underlying mechanisms behind material properties. It not only processes the chemical part, but also incorporates the coordinate part. To further investigate the role of the coordinate part of CrystalTransformer in model performance, an ablation study and qualitative analysis are conducted as described in Supplementary 7, which show that the coordinate part encapsulates important geometric characteristics of crystal systems and is important for training the embeddings. Without the coordinate part, the training MAE increases from 0.395 eV to 0.458 eV when trained on the MP bandgap dataset. By comparing the definition of the attention weights αij in the Transformer, shown in Eq. (S16), with the general expression describing the physical interaction between atoms V(rij), shown in Eq. (S17), it is straightforward to see that the attention weight αij is analogous to the physical interaction coefficient between atoms V(rij), which suggests that the attention mechanism can learn spatial relationships and interaction properties in real physical systems.

The CrystalTransformer method exhibits a theoretical complexity primarily driven by the self-attention mechanism in its Transformer layers. For a crystal with n atoms, the self-attention mechanism computes pairwise correlations with a complexity of O(n²d), where d is the dimensionality of the atomic features. Traditional graph neural networks (GNNs), by contrast, typically operate with a lower theoretical complexity of O(nd²) due to their localized edge-based interactions. Despite this, the manageable scale of n in conventional crystal structures results in a feasible runtime for both approaches. Real runtime experiments show that CrystalTransformer required 21 seconds for 100 batches with 512 crystals per batch, while CGCNN needed 10 seconds, which is of the same order. Further details are available in Supplementary 3.
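For completeness, the encoder stack and first-token readout of Eqs. (4) and (5) can be sketched with the standard PyTorch Transformer encoder as follows; the hyper-parameters here are illustrative and not the paper's settings.

```python
import torch
import torch.nn as nn

# Sketch of Eqs. (4)-(5): stacked Transformer encoder layers applied to the fused
# feature sequence M, with the first output token fed to a linear prediction head.
BATCH, N_TOKENS, FEATURE_DIM, N_LAYERS = 4, 32, 128, 4

encoder_layer = nn.TransformerEncoderLayer(d_model=FEATURE_DIM, nhead=8,
                                           batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=N_LAYERS)
head = nn.Linear(FEATURE_DIM, 1)                  # y_pred = Linear(Z_1^(L))

M = torch.rand(BATCH, N_TOKENS, FEATURE_DIM)      # fused atom/coordinate features
Z = encoder(M)                                    # Z^(l) updated layer by layer
y_pred = head(Z[:, 0, :])                         # read out the first token
```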
In order to test the CrystalTransformer model's performance on crystal datasets, we conducted performance assessments against established graph neural network models. These models were evaluated on the MP and MP* datasets for formation energy (Ef) and PBE bandgap (Eg). As listed in Table 1, None-CrystalTransformer, None-CGCNN, None-MEGNET, and None-ALIGNN denote the models trained from scratch without any front-end model, which is the traditional method. It is clear that, despite lacking the prior inputs of atomic features and edge information of crystals, the None-CrystalTransformer demonstrates competitive accuracy in predicting material properties, i.e., its MAE is only 1–4 times larger for Ef and 1–3 times larger for Eg compared with the traditional GNN models on the MP/MP* datasets. The increase in MAE is partly because it does not strictly rely on predefined graph structures and inductive bias. The lack of certain inductive biases compels the model to acquire this knowledge independently. Although this diminishes its predictive capability, it encourages the model's parameters to assimilate additional information, leading to more informative embeddings, as described in Table 1.

Crystal-symmetry restrictions and data augmentation
The ct-UAE method accounts for rotational and translational invariance through its architecture and data augmentation strategy, as described in Supplementary 6. While the ct-UAE front-end indeed does not explicitly enforce rotational and translational invariance, the back-end GNN model is designed to ensure this symmetry restriction. In practice, the front-end model can readily learn and maintain the symmetries through data augmentation. To validate this assertion, we first used a stronger data augmentation method to train the MT@3p model on the MP* dataset. A group of crystals was then randomly selected and subjected to random augmentations through rotations and translations. The consistency of the output vectors from these augmented samples was assessed using pairwise cosine similarity and Euclidean distance. The trained MT@3p model achieved an average cosine similarity of 0.998 and an average Euclidean distance of 0.275, indicating that the output vectors were nearly identical across augmentations. Notably, a recent study employing a similar data augmentation method demonstrated that unconstrained model architectures like transformers can be trained to achieve a high degree of invariance, such as rotational invariance, by learning these symmetries from the data71, and that this unconstrained architecture can, in fact, lead to improved performance, which is essentially consistent with the rationale behind our proposed front-end model, CrystalTransformer.

Multi-task learning method
Multi-task learning (MTL)72–74 trains the model simultaneously on different tasks, with the parameters optimized so that all tasks improve. This training method enhances generalization. In the context of CrystalTransformer, the tasks in MTL correspond to different material properties. The loss function is a weighted sum of the loss for each task:

L_MTL = Σ_i w_i · Loss_i(y_pred,i, y_target,i),   (6)

where Loss_i can be the MSE or MAE, w_i are the task weights, and i indexes the task. MTL ensures the universality of the atomic embeddings, rather than producing a UAE that is specially optimized for a single task.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The embeddings generated by our ct-UAEs are available on GitHub (https://github.com/fduabinitio/ct-UAE) under the MIT license. ct-UAE v1.075 (https://doi.org/10.5281/zenodo.14557908) contains all the embeddings used in this work. Source data are provided with this paper as a Source Data file.

Code availability
The ct-UAE source code used in this study is publicly available on GitHub (https://github.com/fduabinitio/ct-UAE) under the MIT license. ct-UAE v1.075 (https://doi.org/10.5281/zenodo.14557908) was used to generate all embeddings in this work.

References
1. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
2. Davies, A. et al. Advancing mathematics by guiding human intuition with AI. Nature 600, 70–74 (2021).
3. Kirkpatrick, J. et al. Pushing the frontiers of density functionals by solving the fractional electron problem. Science 374, 1385–1389 (2021).
4. Zhou, J. et al. Graph neural networks: a review of methods and applications. AI Open 1, 57–81 (2020).
5. Zhuo, Y., Mansouri Tehrani, A. & Brgoch, J. Predicting the band gaps of inorganic solids by machine learning. J. Phys. Chem. Lett. 9, 1668–1673 (2018).
6. Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature 559, 547–555 (2018).
7. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
8. Wu, Z. et al. A comprehensive survey on graph neural networks. IEEE Transact. Neural Netw. Learn. Syst. 32, 4–24 (2020).
9. Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
10. Chen, C., Ye, W., Zuo, Y., Zheng, C. & Ong, S. P. Graph networks as a universal machine learning framework for molecules and crystals. Chem. Mater. 31, 3564–3572 (2019).
11. Choudhary, K. & DeCost, B. Atomistic line graph neural network for improved materials property predictions. npj Comput. Mater. 7, 185 (2021).
12. Park, C. W. & Wolverton, C. Developing an improved crystal graph convolutional neural network framework for accelerated materials discovery. Phys. Rev. Mater. 4, 063801 (2020).
13. Qiao, Z., Welborn, M., Anandkumar, A., Manby, F. R. & Miller, T. F. OrbNet: deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. J. Chem. Phys. 153, 124111 (2020).
14. Gasteiger, J., Groß, J. & Günnemann, S. Directional message passing for molecular graphs. In Proc. International Conference on Learning Representations (ICLR, 2020).
15. Gasteiger, J., Giri, S., Margraf, J. T. & Günnemann, S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. In Machine Learning for Molecules Workshop at NeurIPS (NIPS, 2020).
16. Shui, Z. & Karypis, G. Heterogeneous molecular graph neural networks for predicting molecule properties. In 2020 IEEE International Conference on Data Mining (ICDM) 492–500 (IEEE, 2020).
17. Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R. & Tkatchenko, A. Quantum-chemical insights from deep tensor neural networks. Nat. Commun. 8, 13890 (2017).
18. Anderson, B., Hy, T.-S. & Kondor, R. Cormorant: covariant molecular neural networks. In Proc. 33rd International Conference on Neural Information Processing Systems (NIPS, 2019).
19. Zhang, S., Liu, Y. & Xie, L. Molecular mechanics-driven graph neural network with multiplex graph for molecular structures. In Machine Learning for Molecules Workshop at NeurIPS (NIPS, 2020).
20. Schütt, K. T. et al. SchNetPack: a deep learning toolbox for atomistic systems. J. Chem. Theory Comput. 15, 448–455 (2018).
21. Jha, D. et al. ElemNet: deep learning the chemistry of materials from only elemental composition. Sci. Rep. 8, 17593 (2018).
22. Westermayr, J., Gastegger, M. & Marquetand, P. Combining SchNet and SHARC: the SchNarc machine learning approach for excited-state dynamics. J. Phys. Chem. Lett. 11, 3828–3834 (2020).
23. Wen, M., Blau, S. M., Spotte-Smith, E. W. C., Dwaraknath, S. & Persson, K. A. BonDNet: a graph neural network for the prediction of bond dissociation energies for charged molecules. Chem. Sci. 12, 1858–1868 (2021).
24. Isayev, O. et al. Universal fragment descriptors for predicting properties of inorganic crystals. Nat. Commun. 8, 15679 (2017).
25. Chen, C., Zuo, Y., Ye, W., Li, X. & Ong, S. P. Learning properties of ordered and disordered materials from multi-fidelity data. Nat. Comput. Sci. 1, 46–53 (2021).
26. Lu, S. et al. Accelerated discovery of stable lead-free hybrid organic-inorganic perovskites via machine learning. Nat. Commun. 9, 3405 (2018).
27. Chen, D. et al. Automating crystal-structure phase mapping by combining deep learning with constraint reasoning. Nat. Mach. Intell. 3, 812–822 (2021).
28. Lee, X. Y. et al. Fast inverse design of microstructures via generative invariance networks. Nat. Comput. Sci. 1, 229–238 (2021).
29. Zhang, X., Zhou, J., Lu, J. & Shen, L. Interpretable learning of voltage for electrode design of multivalent metal-ion batteries. npj Comput. Mater. 8, 175 (2022).
30. Ju, S. et al. Exploring diamondlike lattice thermal conductivity crystals via feature-based transfer learning. Phys. Rev. Mater. 5, 053801 (2021).
31. Devlin, J., Chang, M. W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Long and Short Papers) 4171–4186 (ACL, 2019).
32. Kim, D., Saito, K., Saenko, K., Sclaroff, S. & Plummer, B. MULE: multimodal universal language embedding. Proc. AAAI Conference on Artificial Intelligence 34, 11254–11261 (2020).
33. Li, Y. & Yang, T. Word embedding for understanding natural language: a survey. Guide Big Data Appl. 26, 83–104 (2018).
34. Lee, J. & Asahi, R. Transfer learning for materials informatics using crystal graph convolutional neural network. Comput. Mater. Sci. 190, 110314 (2021).
35. Feng, S., Zhou, H. & Dong, H. Application of deep transfer learning to predicting crystal structures of inorganic substances. Comput. Mater. Sci. 195, 110476 (2021).
36. Yamada, H. et al. Predicting materials properties with little data using shotgun transfer learning. ACS Central Sci. 5, 1717–1730 (2019).
37. Kim, J., Jung, J., Kim, S. & Han, S. Predicting melting temperature of inorganic crystals via crystal graph neural network enhanced by transfer learning. Comput. Mater. Sci. 234, 112783 (2024).
38. Jha, D. et al. Enhancing materials property prediction by leveraging computational and experimental data using deep transfer learning. Nat. Commun. 10, 5316 (2019).
39. Choukroun, Y. & Wolf, L. Geometric transformer for end-to-end molecule properties prediction. In Proc. Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22 2895–2901 (IJCAI, 2022).
40. Qiao, Z., Welborn, M., Anandkumar, A., Manby, F. R. & Miller, T. F. OrbNet: deep learning for quantum chemistry using symmetry-adapted atomic-orbital features. J. Chem. Phys. 153, 124111 (2020).
41. Wu, F. et al. 3D-Transformer: molecular representation with transformer in 3D space. (2021).
42. Healy, J. & McInnes, L. Uniform manifold approximation and projection. Nat. Rev. Methods Primers 4, 82 (2024).
43. Herr, J. E., Koh, K., Yao, K. & Parkhill, J. Compressing physics with an autoencoder: creating an atomic species representation to improve machine learning models in the chemical sciences. J. Chem. Phys. 151, 084103 (2019).
44. Jain, A. et al. Commentary: The Materials Project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
45. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865 (1996).
46. Perdew, J. P. & Levy, M. Physical content of the exact Kohn-Sham orbital energies: band gaps and derivative discontinuities. Phys. Rev. Lett. 51, 1884 (1983).
47. Borlido, P. et al. Large-scale benchmark of exchange–correlation functionals for the determination of electronic band gaps of solids. J. Chem. Theory Comput. 15, 5069–5079 (2019).
48. Borlido, P. et al. Exchange-correlation functionals for band gaps of solids: benchmark, reparametrization and machine learning. npj Comput. Mater. 6, 1–17 (2020).
49. Choudhary, K. et al. The joint automated repository for various integrated simulations (JARVIS) for data-driven materials design. npj Comput. Mater. 6, 173 (2020).
50. Deng, B. et al. CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling. Nat. Mach. Intell. 5, 1031–1041 (2023).
51. Chen, C. & Ong, S. P. A universal graph deep learning interatomic potential for the periodic table. Nat. Comput. Sci. 2, 718–728 (2022).
52. Batatia, I., Kovacs, D. P., Simm, G., Ortner, C. & Csányi, G. MACE: higher order equivariant message passing neural networks for fast and accurate force fields. Adv. Neural Inform. Process. Syst. 35, 11423–11436 (2022).
53. Ahmed, M., Seraj, R. & Islam, S. M. S. The k-means algorithm: a comprehensive survey and performance evaluation. Electronics 9, 1295 (2020).
54. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 2579–2605 (2008).
55. Traag, V. A., Waltman, L. & Van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 1–12 (2019).
56. Saputra, D. M., Saputra, D. & Oswari, L. D. Effect of distance metrics in determining k-value in k-means clustering using elbow and silhouette method. In Sriwijaya International Conference on Information Technology and Its Applications (SICONIAN 2019) 341–346 (Atlantis Press, 2020).
57. Heaven, M. C., Bondybey, V. E., Merritt, J. M. & Kaledin, A. L. The unique bonding characteristics of beryllium and the Group IIA metals. Chem. Phys. Lett. 506, 1–14 (2011).
58. Zhang, Y., Liu, W. & Niu, H. Native defect properties and p-type doping efficiency in Group-IIA doped wurtzite AlN. Phys. Rev. B 77, 035201 (2008).
59. Ri, S.-R., Ri, J.-E., Ri, N.-C. & Hong, S.-I. One way for thermoelectric performance enhancement of Group IIIB monochalcogenides. Solid State Commun. 339, 114485 (2021).
60. Liu, W.-S., Zhang, B.-P., Zhao, L.-D. & Li, J.-F. Improvement of thermoelectric performance of CoSb3−xTex skutterudite compounds by additional substitution of IVB-group elements for Sb. Chem. Mater. 20, 7526–7531 (2008).
61. Yin, Y., Yi, M. & Guo, W. High and anomalous thermal conductivity in monolayer MSi2Z4 semiconductors. ACS Appl. Mater. Interfaces 13, 45907–45915 (2021).
62. Song, J. et al. Performance enhancement of perovskite solar cells by doping TiO2 blocking layer with Group VB elements. J. Alloys Compounds 694, 1232–1238 (2017).
63. Patsalas, P. et al. Conductive nitrides: growth principles, optical and electronic properties, and their perspectives in photonics and plasmonics. Mater. Sci. Eng. R Rep. 123, 1–55 (2018).
64. Awadallah, A. E., Aboul-Enein, A. A., El-Desouki, D. S. & Aboul-Gheit, A. K. Catalytic thermal decomposition of methane to COx-free hydrogen and carbon nanotubes over MgO supported bimetallic Group VIII catalysts. Appl. Surf. Sci. 296, 100–107 (2014).
65. Chattopadhyay, S., Mani, B. K. & Angom, D. Triple excitations in perturbed relativistic coupled-cluster theory and electric dipole polarizability of Group-IIB elements. Phys. Rev. A 91, 052504 (2015).
66. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: unbiased boosting with categorical features. In Proc. 32nd International Conference on Neural Information Processing Systems 6639–6649 (Curran Associates Inc., 2018).
67. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems 4768–4777 (2017).
68. Müller, M. Dynamic time warping. In Information Retrieval for Music and Motion 69–84 (Springer Berlin Heidelberg, 2007).
69. Kim, C., Huan, T. D., Krishnan, S. & Ramprasad, R. A hybrid organic-inorganic perovskite dataset. Sci. Data 4, 1–11 (2017).
70. Nakajima, T. & Sawada, K. Discovery of Pb-free perovskite solar cells via high-throughput simulation on the K computer. J. Phys. Chem. Lett. 8, 4826–4831 (2017).
71. Langer, M. F., Pozdnyakov, S. N. & Ceriotti, M. Probing the effects of broken symmetries in machine learning. Mach. Learn. Sci. Technol. 5, 04LT01 (2024).
72. Sanyal, S. et al. MT-CGCNN: integrating crystal graph convolutional neural network with multitask learning for material property prediction. arXiv:1811.05660 (2018).
73. Thung, K.-H. & Wee, C.-Y. A brief review on multi-task learning. Multimedia Tools Appl. 77, 29705–29725 (2018).
74. Zhang, Y. & Yang, Q. A survey on multi-task learning. IEEE Transact. Knowledge Data Eng. 34, 5586–5609 (2022).
75. Luo, J. ct-UAE (GitHub, 2025). https://doi.org/10.5281/zenodo.14557909 (2025).

Acknowledgements
The authors thank G. F. Zheng and H. Y. Yu for helpful discussions. This work is supported by the National Key R&D Program of China (2023YFA1608501), Shanghai Municipal Natural Science Foundation under Grant No. 24ZR1406600, and Natural Science Foundation of Shandong Province under grant no. ZR2021MA041. L.J. and Z.D. also want to acknowledge the support of FDUROP (Fudan's Undergraduate Research Opportunities Program) (24052, 23908).

Author contributions
H.Z. conceived the project and contributed to securing funding. H.Z. and Y.C. supervised the research. L.J. and Z.D. developed and trained the neural networks and analyzed the results. L.J. and Z.D. wrote the original manuscript. L.J., Z.D., L.S., Y.X., Y.M., Y.C., and H.Z. contributed to the discussion of results and manuscript preparation and revision.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41467-025-56481-x.

Correspondence and requests for materials should be addressed to Yan Cen or Hao Zhang.

Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Reprints and permissions information is available at http://www.nature.com/reprints

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

© The Author(s) 2025