2502.04984v1

The document presents FF7, a code package designed for high-throughput calculations and constructing material databases, facilitating data-intensive material discovery. It offers a command-line interface for customizable calculations, database creation, and machine learning model integration, specifically targeting the prediction of material properties. The package supports popular DFT software and includes modules for various functionalities, enhancing the efficiency of material research and discovery processes.

FF7: A Code Package for High-throughput Calculations and Constructing Materials Database

Tiancheng Ma (a), Zihan Zhang (a), Shuting Wu (a), Defang Duan (a,∗), Tian Cui (b,a)
(a) Key Laboratory of Material Simulation Methods & Software of Ministry of Education and State Key Laboratory of Superhard Materials, College of Physics, Jilin University, Changchun 130012, China
(b) Institute of High Pressure Physics, School of Physical Science and Technology, Ningbo University, Ningbo 315211, China
arXiv:2502.04984v1 [cond-mat.mtrl-sci] 7 Feb 2025

Abstract
Decades of accumulated theoretical simulations have led to a boom in materials databases, which, combined with machine learning methods, have become a valuable driver of data-intensive materials discovery, i.e., the fourth research paradigm. However, easy-to-use code tools are still lacking for constructing specialized databases and for reusing data from general-purpose databases with uniform parameters. We have therefore developed a code package named FF7 (Fast Funnel with 7 modules) that provides a command-line based interactive interface for performing customized high-throughput calculations and building your own handy databases. Data correlation studies and material property prediction are supported by a built-in, installation-free artificial neural network module, and various post-processing functions are provided by an auxiliary module. This paper describes the usage of the FF7 code package and demonstrates its usefulness with two examples: a database-driven high-throughput calculation of thermodynamic stability, and a machine learning model for predicting the superconducting critical temperature of clathrate hydrides.
Keywords: high-throughput calculation; material database; machine learning.

PROGRAM SUMMARY

Program Title: FF7
CPC Library link to program files: (to be added by Technical Editor)
Code Ocean capsule: (to be added by Technical Editor)
Licensing provisions: MIT
Programming language: Python
Nature of problem: Since data-intensive materials discovery under the fourth paradigm is driven by the boom in databases and machine learning, a handy code tool is highly desired for performing flexible high-throughput theoretical simulations, building your own database for a specific research interest, and constructing artificial neural network models for material property prediction. Comprehensive post-processing and plotting functions are also required.
Solution method: The first-principles density functional theory (DFT) simulations are performed by the VASP and Quantum Espresso code packages. For flexibility in high-throughput calculations and database construction, the calculation tasks are abstracted into a “calculation card”. It contains the DFT software name, the computational parameters, the variable (or file) to be written to the database, etc., all of which can be customized by users. The functions of the FF7 code package are accessed via the Linux command line for ease of use.
Additional comments including restrictions and unusual features: This program works on the Linux operating system with the VASP and Quantum Espresso code packages installed.

∗ Corresponding author.
E-mail address: [email protected]
Preprint submitted to Computer Physics Communications February 10, 2025

1. Introduction

Data-intensive scientific discovery was first proposed by Jim Gray in 2007 as the fourth research paradigm, following the traditional empirical trial-and-error method, theoretical modelling approaches and software simulation[1]. Its research idea is analogous to the discovery of the laws of planetary motion by Tycho Brahe and his assistant Johannes Kepler, in which the creation of theories was driven by the mining and analysis of massive, carefully archived astronomical observation data. It emphasizes that big data and statistical learning methods, or machine learning methods, are the primary basis of the fourth research paradigm[2]. Thanks to the development of material simulation algorithms and first-principles DFT calculation software (e.g. VASP[3], CASTEP[4] and Quantum Espresso (QE)[5]), vast volumes of raw materials science data have been accumulated by high-performance computing resources running on a 24/7 basis, which has brought us to a stage of transformation in materials research methods. At the same time, the boom in machine learning models such as CGCNN[6] and ALIGNN[7], which target crystal representation, has become another push forward. The stage is set for the fourth paradigm to be applied in the field of materials design. In 2011, the Materials Genome Initiative (MGI) was proposed and took the lead in bringing related research to the fore[8]. It shifts our emphasis to targeted materials discovery via high-throughput identification of the key factors (i.e., “genes”) and via showing how these factors can be quantitatively integrated by statistical learning methods into design rules (i.e., “gene sequencing”) governing targeted materials
functionality. The MGI generally involves three basic data activities: capture, curation, and analysis, which correspond to three infrastructures: high-throughput DFT calculation tools, databases, and machine learning models, respectively. Relying on the Python materials analysis library pymatgen[9], the largest inorganic crystal materials database, the Materials Project, was created and became one of the key infrastructures in the materials discovery field[10]. It contains on the order of a hundred thousand entries with structural information (e.g. lattice parameters, space group) and material properties (e.g. electronic band structure, phonon dispersion curves and elastic tensors). Building on the rise of databases such as Materials Project[10], ICSD[11] and OQMD[12], and on the rapid development of machine learning algorithms represented by the introduction of graph convolutional networks, a series of machine learning models based on crystal geometric features or simple descriptors have been developed for predicting material properties (e.g. formation enthalpy[6, 7], bandgap[6, 13], hardness[14] and superconducting transition temperature (Tc)[15–18]). These models have significantly reduced computational costs and facilitated the subsequent discovery of materials with target properties.

In addition to using large-scale general-purpose databases, the development of dedicated databases by individuals to accelerate the discovery of materials with specific properties is also a way forward. Although such a database may not be as large as the general ones, its focus and accuracy allow it to play an important role in supporting the data-intensive discovery of targeted materials. Successful examples in bandgap prediction[6, 13], high entropy alloy design[19] and Tc prediction of superconducting materials validate the feasibility of this idea. The need for fine-grained domain databases, spawned by the MGI and unmet by general-purpose databases, creates an urgent requirement for user-friendly, full-featured code tools that can assist in building one’s own database. Furthermore, reuse of data from general-purpose databases is a major problem, especially for formation enthalpy convex hull calculations, where uniform parameters are essential, and this requires code tools for high-throughput calculations and database construction. Finally, the stock of data dispersed among small laboratories or individuals cannot be ignored. Based on the MGI’s principle of data inclusiveness, code tools are needed for database construction with a unified interface, allowing anyone to contribute to the accumulation of data.

There are many excellent tools that assist DFT high-throughput computation, such as VASPKIT[20], qvasp[21], JAMIP[13] and VASPMATE[22], but there is still no code tool that is tightly connected to high-throughput computation, creates databases seamlessly, and provides powerful database access and management functions. We herein report a code package named FF7 (Fast Funnels with 7 modules) that fully meets the requirements of constructing a private database through high-throughput (HTP) calculations and supports the workflow of data-intensive materials discovery. The mainstream density functional theory (DFT) calculation software VASP and QE are both supported, and notably the high-throughput computational tools for the latter are developed for the first time. Several built-in HTP calculation functions help to build the basic database, including stoichiometry, structure, energy, etc., and the “calculation card” architecture allows users to run highly customized HTP calculations and to construct databases for niche areas. Furthermore, full-featured post-processing tools and an artificial neural network module for analyzing and predicting material properties are also integrated. All functions of the FF7 code package are available through user-friendly Linux commands with a uniform command style for the two DFT software packages. The code architecture (section 2), the detailed usage of each module (section 3) and an example of aiding data-driven materials discovery (section 4) are discussed in the following sections.

Fig. 1. Architecture of FF7 code package.

2. Architecture and design strategy

The FF7 code package consists of seven main modules, which are chained together in a workflow that fits the fourth research paradigm, as shown in Fig. 1. Firstly, the “init” module initializes the FF7 code package by loading basic parameters such as the DFT software path, database path, pseudopotential path, etc. Then, a structure pool, consisting of a series of crystal files in POSCAR format with the “.vasp” suffix, is provided by the user or generated by the “gen” module based on different strategies. The “htp” module performs HTP calculations that traverse the structure pool, with the DFT calculations carried out by the mainstream software VASP and QE. In the “htp” module, the high-throughput calculation workflow is abstracted into a “calculation card” in a cyclic framework, as shown in the
third step in Fig. 1. The “calculation card” controls the DFT calculations and the rules for creating the database, e.g. which variable or file will be stored in the database. Apart from the wealth of built-in calculation cards in FF7, users can customize “calculation cards” to perform task-specific high-throughput calculations and create databases that match their own research interests, which ensures the flexibility of the FF7 code package. The “post” module provides post-processing functions for most of the built-in HTP calculations, including extracting and summarizing data from output files, plotting, and, as a highlight, database-driven high-throughput calculations of thermodynamic stability (see the fourth step in Fig. 1). The “db” module allows users to print the constructed database on the screen and supplies command-line based functions for adding, deleting and extracting data. Finally, the “nn” module and self-built databases are used to construct and train artificial neural networks for predicting material properties, which in turn facilitates material discovery and completes the research life cycle of the fourth research paradigm. Overall, the design strategy of the FF7 code package targets ease of use and flexibility. The database occupies the most important place, and all the modules are highly interconnected with it, which is one of the strengths of the FF7 code package.

3. Functions and usage

We show the functions and usage of the FF7 code package in this section. In general, the functions in FF7 are invoked from the command line on a Linux system with a uniform style of
$ ff7 module func -para1 xx -para2 xx ... ,
where “module”, “func” and “para” denote the module name, the function of the module and the parameters of the function, respectively. Each module is discussed in detail below.

3.1. init
The “init” module is executed automatically at the beginning of the FF7 program life cycle, and users have to complete the initialization file before running the FF7 code package. It loads the default variables, including the paths of the DFT calculation software VASP and QE, the pseudopotential path and the server configuration, from the initialization file “/.../installation/init/BASE.ini” (“installation” denotes the installation path of the FF7 code package). As VASP provides a complete pseudopotential package with different pseudopotential versions for each element, users can specify the pseudopotential names by editing the file “/.../installation/init/POTCAR.ini”. For QE, the pseudopotential directory needs to be created by the user by collecting a pseudopotential file for each element, named “element.UPF”. Users can specify three pseudopotential paths for QE, one for each of the pseudopotential types US, PAW and NC.

3.2. gen
The structure pool is a directory containing structure files with the suffix “.vasp” in POSCAR format, and it is the main working path of the FF7 code package. Users can easily build their structure pools by copying in the structures of interest for high-throughput calculations. The “gen” module can also construct a structure pool through element substitution based on a POSCAR-formatted structure file named “seed”. For binary compounds, the first element in the “seed” file can be replaced with the command
$ ff7 gen -1 [Li,Na,K,Rb,Cs].
The “gen” module also supports the generation of ternary compounds from the “seed” file. Commands for replacing two elements that are spatially inequivalent and spatially equivalent are shown in line 1 and line 11 of Fig. 2, respectively, and FF7 prints a brief summary of the generated compounds on the screen for checking (see lines 2-9 and 12-27 in Fig. 2).

 1  $ ff7 gen -1 [Li,Na,K,Rb] -2 [Be,Si,P]
 2  M_1 X_1 H_8
 3      | Be Si P
 4    -----------
 5    Li| @  @  @
 6  X Na| @  @  @
 7    K | @  @  @
 8    Rb| @  @  @
 9         M
10
11  $ ff7 gen -1 -2 [Mg,Ca,Sc,Ti,Sr,Y,Zr,Ba,La,Ce,Hf,Th]
12  M_1 X_1 H_12
13      | Mg Ca Sc Ti Sr Y  Zr Ba La Ce Hf Th
14    --+------------------------------------
15    Mg|
16    Ca| @
17    Sc| @  @
18    Ti| @  @  @
19    Sr| @  @  @  @
20  X Y | @  @  @  @  @
21    Zr| @  @  @  @  @  @
22    Ba| @  @  @  @  @  @  @
23    La| @  @  @  @  @  @  @  @
24    Ce| @  @  @  @  @  @  @  @  @
25    Hf| @  @  @  @  @  @  @  @  @  @
26    Th| @  @  @  @  @  @  @  @  @  @  @
27                      M

Fig. 2. Commands and output of the “gen” module.

3.3. htp
The “htp” module runs the high-throughput DFT calculations for the compounds in the structure pool and transfers the resulting data or files, which vary depending on the calculation task, to the database. The workflow of the “htp” module is shown in Fig. 3. First, it renames the compounds in the structure pool in the form of “ele1 x1 ele2 x2 sg.vasp”, where “ele”, “x” and “sg” denote the element, stoichiometric number and space group number respectively, and lists them in the file “jobs.txt” as a queue for high-throughput calculations. The calculation task for each compound can be abstracted as a calculation card (see Fig. 3) that records how the input files
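The element substitution summarized in Fig. 2 can be illustrated with a short Python sketch. This is not FF7's implementation: the function names `pairs_inequivalent` and `pairs_equivalent` are hypothetical, and the sketch assumes that two inequivalent sites yield every (M, X) combination while two equivalent sites yield only unordered pairs of distinct elements, which matches the full grid and the lower triangle printed in Fig. 2.

```python
from itertools import combinations, product


def pairs_inequivalent(first, second):
    """All (M, X) pairs when the two substituted sites are inequivalent."""
    return list(product(first, second))


def pairs_equivalent(elements):
    """Unordered pairs of distinct elements for two equivalent sites."""
    return list(combinations(elements, 2))


# Element lists copied from the two commands in Fig. 2.
alkali = ["Li", "Na", "K", "Rb"]
others = ["Be", "Si", "P"]
metals = ["Mg", "Ca", "Sc", "Ti", "Sr", "Y", "Zr", "Ba", "La", "Ce", "Hf", "Th"]

print(len(pairs_inequivalent(alkali, others)))  # 4 x 3 grid of "@" marks
print(len(pairs_equivalent(metals)))            # lower triangle of the 12 x 12 grid
```

Under these assumptions the first command corresponds to 12 candidate compounds and the second to 66, which agrees with the number of “@” marks in each grid.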
are generated, the commands to perform the DFT calculation, the post-processing code for the output files and how the calculated results are transferred to the database.

Fig. 3. The workflow of the “htp” module and the calculation card.

The FF7 code package provides several calculation cards for various DFT calculations by VASP and QE. For example, users can perform structural optimizations for all compounds with VASP by the command
$ ff7 htp opt -p 200 -e 600 -k 0.03 ,
where “-p”, “-e” and “-k” denote the pressure (GPa, default 0.001), the energy cutoff (eV) and the k-point mesh spacing (2π/Å, default 0.03), respectively. The command only needs to be changed slightly, to
$ ff7 htp opt_qe -p 200 -e 80 -pps us
for DFT calculations with QE. Note that the unit of the energy cutoff here is Ry and that users need to specify the pseudopotential type with the parameter “-pps” (default “us”). After the structural optimization, entries containing the stoichiometry, crystal structure, space group symbol and energy are automatically transferred to the database, forming its basic framework. We therefore recommend running the “opt” function first to build the initial database. The FF7 code package also provides the built-in calculation cards “scf” for static self-consistent calculations, “bandos” for electronic band structures and density of states, and “elf” for electron localization functions. The DFT calculations for each compound are performed under the path “/.../structure_pool/compound_name/function”, and the calculated values or files are automatically stored in the database. Numeric results are stored directly in the database, while file results are copied to a subfolder and their path is recorded in the database. For every function, the “-db” parameter (default “/.../installation/db”) specifies the database path. Users can also tell FF7 to perform the high-throughput calculations without any connection to the database by using the flag “-nodb”. All of these functions support DFT calculations by VASP and QE, with the only difference being the suffix “_qe” on the function name. Taking the function “bandos” as an example, the commands are
$ ff7 htp bandos -e 600 -k 0.03
and
$ ff7 htp bandos_qe -e 80 -k 0.03
for VASP and QE, respectively. A structural optimization can be combined with the electronic band structure calculation by adding the flag “-dopt”:
$ ff7 htp bandos -e 600 -dopt -p 200.
The FF7 code package supports phonon calculations by density functional perturbation theory and the finite displacement method using VASP+phonopy and QE, respectively, and the example commands for a phonon calculation combined with structural optimization are
$ ff7 htp phonon -e 600 -k 0.02 -dim 2 2 2 -dopt -p 200
and
$ ff7 htp phonon_qe -e 80 -k 3 3 3 -q 12 12 12 -pps us -dopt -p 200.
The “-dim” parameter sets the rule for creating supercells from the unit cell and corresponds to the “-dim” tag in phonopy. When the phonon spectra are calculated with QE, the electron-phonon coupling is calculated at the same time, which does not take much extra time.

 1  Software : vasp
 2  Dirname : fermi
 3  InFile : INCAR.self
 4  RunCommand : mpirun -np 8 /../vasp/bin/vasp_std
 5  KPOINTS : 0.03
 6  # KPOINTS : HIGH_SYMMETRY_PATH
 7  DataType : file
 8  DataLabel : Efermi_file
 9  GrepDataCommand : grep 'E-fermi' OUTCAR | awk '{print $3}' > Efermi_file.txt
10  # GrepDatafile : grep.py

Fig. 4. Input file template for customizing calculation cards.

In practice, the built-in functionality cannot satisfy every user’s requirements. As a solution, users can perform high-throughput calculations with calculation cards customized for their own research interests by the command
$ ff7 htp self -file self.in.
The template of the file “self.ini”, which governs the high-throughput DFT calculation and the database construction, is shown in Fig. 4, and each component of this file is described as follows:

(1) Software name, either “vasp” or “qe”;
(2) Directory name where calculations are performed;
(3) Input file name;
(4) Command to run DFT calculations;
(5) K-point mesh spacing value;
(6) Generating k-points along a high-symmetry path;
(7) Type of the result, either “file” or “value”;
(8) The label of the result to be stored in the database;
(9) The command to extract the result data, which must be
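To make the structure of the calculation card in Fig. 4 concrete, the following Python sketch reads such a card into a dictionary. It is an illustration only and not FF7's parser: the function name `parse_calculation_card` is hypothetical, and the sketch assumes that each entry has the form “Key : value” and that lines starting with “#” are commented-out alternatives to be skipped.

```python
def parse_calculation_card(text):
    """Parse 'Key : value' lines into a dict, skipping blanks and '#' lines."""
    card = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumed: '#' marks a disabled entry
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue  # not a 'Key : value' line
        card[key.strip()] = value.strip()
    return card


# Card content copied from Fig. 4 (without the figure's line numbers).
EXAMPLE_CARD = """\
Software : vasp
Dirname : fermi
InFile : INCAR.self
RunCommand : mpirun -np 8 /../vasp/bin/vasp_std
KPOINTS : 0.03
# KPOINTS : HIGH_SYMMETRY_PATH
DataType : file
DataLabel : Efermi_file
GrepDataCommand : grep 'E-fermi' OUTCAR | awk '{print $3}' > Efermi_file.txt
# GrepDatafile : grep.py
"""

card = parse_calculation_card(EXAMPLE_CARD)
print(card["Software"], card["DataType"], card["DataLabel"])
```

Splitting at the first colon keeps shell commands intact even if they contain further colons, which is why `str.partition` is used here instead of `str.split`.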
stored in a file named “DataLabel.txt”;
(10) Python code is also supported for grabbing the result data, which must be stored in a file named “DataLabel.txt”.

Overall, the design strategy of highly customizable calculation cards gives users more flexibility in running high-throughput calculations and constructing their own databases.

3.4. post
The “post” module mainly implements post-processing functions such as drawing diagrams and generating formation enthalpy convex hulls. After high-throughput calculations of the electronic band structures and density of states by the “htp” module, users can run the command
$ ff7 post bandos -lim -27 8
in the path “/.../structure_pool/compound_name” to draw an electronic band structure diagram for a certain compound, where the parameter “-lim” controls the energy limits. In addition, the command
$ ff7 post bandos -byjobs
run in the “/structure_pool” directory draws diagrams in batch for the compounds listed in the “jobs.txt” file. In keeping with the uniform command style, adding “_qe” to the function name processes the results calculated by QE. Similar drawing commands also support the visualization of phonon spectra, except that the function “bandos” is replaced by “phonon”. Taking H3S[23–25] as an example, the electronic band structures and phonon spectra drawn by FF7 are shown in Fig. 5a-b. The “post” module provides full functions for generating formation enthalpy convex hulls based on the self-built database. Users can use the command
$ ff7 post ch -path /your/path -nodb
to generate the convex hull for the compounds in the path declared by the “-path” parameter (default “./”), where the format of the compound files should follow that used in “opt” calculations. To generate an enthalpy convex hull based on a self-constructed database, users need to specify the path of the database:
$ ff7 post ch -dbpath /path/to/self.db .
In addition, users can create formation enthalpy convex hull diagrams for a given set of elements directly from a self-built database:
$ ff7 post ch -dbpath self.db -sys H S .
The convex hull diagrams for the Ca-H[26–28] binary and La-Be-H[29, 30] ternary systems drawn by FF7 are shown in Fig. 5c-d. Although FF7 does not support the drawing of higher-dimensional convex hulls, it can create them and print them on the screen. For high-throughput calculations of ternary compounds generated by elemental substitution and run with the “opt” function, the “post” module can create convex hulls for the compounds in the file “jobs.txt” and summarize the thermodynamic stability information in a heatmap diagram, as shown in Fig. 5e. The command is
$ ff7 post heatmap -M 1 -X 2,
where “-M” and “-X” denote the indices of the substituted elements in the file “seed”. Additionally, the FF7 code package provides an interface to the Materials Project (MP) database. Users need to construct a database of single substances from MP and run the command
$ ff7 post ch -path ./ -MPsingleDb /path/to/MPsingle.db -MPID xxx,
where “MPID” is the ID of the Materials Project account used to access the online database API. Before that, you need to build an MP single-substance database and declare it with the “-MPsingleDb” parameter. The single-substance database is used to calculate the formation enthalpy, while the data for binary and ternary compounds are fetched directly through the Materials Project API. This strategy for creating convex hulls saves a lot of computational effort while guaranteeing accuracy.

3.5. db
The “db” module is a main module of the FF7 code package, with the self-contained and highly reliable sqlite3 Python package as its database engine. It provides a full-featured command-line interface for browsing, manipulating and exporting self-built databases, which correspond to the “show”, “add” (or “delete”) and “save” (or “catch”) functions, respectively. The database accepts numeric data and file data. Numeric data are stored directly in the database, while file data are copied to a subfolder in the database path and the database records the storage path. Users can use the command
$ ff7 db show -summary
to preview the key information of the database, including the database path, the column names and the number of database entries. Also, the command
$ ff7 db show -summary -dbpath /your/db/self.db
prints this information for the specific database declared by the “-dbpath” parameter. The “-compound” and “-cols” parameters together locate a single value or file in the database, and users can take full control of the database via the “add” (or “update”) and “delete” functions. The example commands for deleting and adding data are
$ ff7 db delete -compound H3S -cols bands
and
$ ff7 db add -compound H3S -cols energy -data 10.68,
respectively. The “db” module also supports data retrieval by element system and column name, which are passed with the “-system” and “-cols” parameters respectively. The two parameters can be used together to specify the data to be manipulated. For example, to retrieve the energy of the compounds in the S-H system, users can print the retrieved information on the screen with the command
$ ff7 db show -system S H -cols energy.
Anything printed on the screen can be stored in a file by replacing the “show” function with the “save” function. Furthermore, to obtain file data, users need to use the “catch” function with the command
$ ff7 db catch -system S H -cols bands
to copy the files to the current folder. By covering the creation and modification of the database, the “db” module greatly lowers the threshold for constructing a database and provides a rich set of interfaces to the database, making
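For a binary system, the quantities behind a formation enthalpy convex hull can be written down in a few lines of Python. The sketch below is not taken from FF7: the function names and all numerical values are hypothetical. It assumes the usual definitions, i.e. the formation enthalpy per atom of A_a B_b is (H_total - a*H_A - b*H_B)/(a + b) with H_A and H_B the per-atom enthalpies of the elemental references, and the enthalpy above the hull is the distance to the lower convex hull in the (composition, formation enthalpy) plane.

```python
def formation_enthalpy(h_total, n_a, n_b, h_a, h_b):
    """Formation enthalpy per atom of A_a B_b relative to elemental A and B."""
    return (h_total - n_a * h_a - n_b * h_b) / (n_a + n_b)


def lower_hull(points):
    """Lower convex hull of (x, y) points (Andrew's monotone chain)."""
    best = {}
    for x, y in points:            # keep only the lowest y for each x
        best[x] = min(y, best.get(x, y))
    hull = []
    for p in sorted(best.items()):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:         # middle point is on or above the chord
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def hull_energy(x, hull):
    """Linear interpolation of the hull at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("composition outside the hull range")


# Made-up formation enthalpies (eV/atom) versus fraction x of element B;
# the elemental end points sit at zero by definition.
points = [(0.0, 0.0), (0.25, -0.3), (0.5, -0.4), (0.75, -0.1), (1.0, 0.0)]
hull = lower_hull(points)
above = {x: y - hull_energy(x, hull) for x, y in points}
print(hull)
print(above[0.75])
```

In this made-up data set the compound at x = 0.75 lies above the line connecting x = 0.5 and x = 1.0, so it is not on the hull, whereas the compounds at x = 0.25 and x = 0.5 are.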
it accessible to users, allowing interaction with the other modules of the FF7 code package, and making it easier to merge into users’ familiar workflows.

Fig. 5. (a) Electronic band structures and density of states and (b) phonon spectrum of H3S drawn by the “post” module. (c) Formation enthalpy convex hull of the Ca-H system and (d) the La-Be-H system at 200 GPa generated by the “post” module. (e) Thermodynamic stability heatmap diagram of hydrides MXH8 (M=Be, B, Al, Si, P, S, X=Ca, Sr, Y, Ba, La, Ce, Th) calculated and drawn by the FF7 code package combined with a self-built high-pressure database at 200 GPa. Colors in (d) and (e) denote formation enthalpy and enthalpy above the convex hull, respectively. Red crosses in (e) represent compounds with thermodynamic stability.

3.6. nn
The fully connected neural network is an important algorithm in the machine learning field that directly contributes to deep learning. It is theoretically possible to fit arbitrary functions with an appropriate number of layers and nodes and a sufficiently large training set, making it ideal for uncovering deeper relationships between factors and material properties and for abstracting design strategies for materials with target functionality. However, implementing a neural network algorithm usually depends on machine learning frameworks such as TensorFlow or PyTorch, which set a high programming threshold. The FF7 code package is equipped with an artificial-intelligence module “nn” that natively supports a command-line based interface for building and training artificial neural networks, which greatly reduces the operating difficulty. There is no need to install additional complex modules; users only need to make sure that the NumPy library, which is the only third-party library that the “nn” module depends on, works properly. Users can easily build and train a neural network with three hidden layers of 8, 16 and 8 nodes by the command
$ ff7 nn train -hidden 8 16 8 -inX X.txt -label label.txt -trainrate 0.8 -batch 50 -epoch 20 -lr 0.01,
where the parameters “-inX” and “-label” receive the files containing the descriptors and the target material properties. The “-trainrate” parameter declares the fraction of the data that is randomly selected as the training set, and the remaining parameters control the training of the neural network, i.e. the number of batches, the number of epochs and the learning rate, respectively. The prediction error of each epoch and the final training result are summarized in the figure “out.svg”. After training, the neural network model is saved in the file “model.npy”, and the command
$ ff7 nn predict -model model.npy -predict predictX.txt
makes property predictions with the trained model. All the input files of the “nn” module share the same format, with the first two lines and columns being the comment region that will
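The storage scheme described for the “db” module (numeric results stored directly, file results stored as a path, retrieval by element system and column) can be illustrated with Python's standard sqlite3 package. The sketch below is not FF7's schema: the table name `entries`, its columns, the file paths and the CaH6 energy are hypothetical placeholders; only the H3S energy value 10.68 is taken from the example command in the text.

```python
import sqlite3


def element_system(elements):
    """Normalize an element set to one key, e.g. ['S', 'H'] -> 'H-S'."""
    return "-".join(sorted(elements))


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE entries ("
    " compound TEXT PRIMARY KEY,"
    " system TEXT,"    # element system used for retrieval
    " energy REAL,"    # numeric result, stored directly
    " bands TEXT)"     # file result, stored as a path
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        ("H3S", element_system(["H", "S"]), 10.68, "files/H3S/bands.dat"),
        ("CaH6", element_system(["Ca", "H"]), 1.0, None),  # placeholder values
    ],
)

# Retrieve one column for all compounds of one element system,
# in the spirit of "ff7 db show -system S H -cols energy".
rows = conn.execute(
    "SELECT compound, energy FROM entries WHERE system = ? ORDER BY compound",
    (element_system(["S", "H"]),),
).fetchall()
print(rows)
```

Sorting the element symbols before joining them makes the lookup independent of the order in which the user lists the elements.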
be ignored when reading, which is fully compatible with the data files saved by the “save” function of the “db” module.

Fig. 6. (a) The formation enthalpy above the convex hull of compounds MXH12 at 200 GPa. (b) The “Hdos.cc” file and “Catch_Hdos.sh” file. (c) The loss function value (MAE) for predicting Tc of clathrate hydrides as a function of training epoch. (d) The machine learning model for predicting Tc of clathrate hydrides.

4. Example

The realization of room-temperature superconductivity is a long-sought goal of researchers. In this section, we demonstrate the power, user-friendliness and flexibility of FF7 by using it to assist the high-throughput calculations and the analysis of superconducting properties of compounds MXH12 (M, X= Mg, Ca, Sc, Ti, Sr, Y, Zr, Ba, La, Ce, Hf, Th) at high pressure[31]. Firstly, we ran the command
$ ff7 gen -1 -2 [Mg,Ca,Sc,Ti,Sr,Y,Zr,Ba,La,Ce,Th]
in an empty directory to construct a structure pool (i.e. the main working path in which the following commands were executed) containing compounds with stoichiometry MXH12 generated by element substitution. Then, the high-throughput calculations for the structural optimization and electronic band structures of all compounds were performed by the combined command
$ ff7 htp bandos -e 600 -k 0.03 -dopt -p 200 -dbpath ./db_dir/my.db,
and the stoichiometries, structure files and electronic band structure files were summarized and saved in a database in the specified path (i.e. “./db_dir”). This is the complete process of building a database through high-throughput calculations using the FF7 code package: only two command lines are required, which is quite intuitive and user-friendly. We can further build up a high-pressure structure pool by collecting stable compounds from previous high-pressure work and structure searching methods and construct a high-pressure database
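The input convention of the “nn” module and the role of “-trainrate” can be sketched in plain Python. This is an illustration under stated assumptions rather than FF7's code: the function names are hypothetical, the columns are assumed to be whitespace-separated, “the first two lines and columns” is read as two header lines plus two leading columns per row, and the sample values are made up.

```python
import random


def read_nn_table(text):
    """Return numeric rows, skipping two comment lines and two comment columns."""
    rows = []
    for line in text.splitlines()[2:]:   # first two lines: comment region
        fields = line.split()
        if len(fields) > 2:              # first two columns: comment region
            rows.append([float(v) for v in fields[2:]])
    return rows


def split_train_test(samples, train_rate, seed=0):
    """Randomly assign a fraction train_rate of the samples to the training set."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(round(train_rate * len(samples)))
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test


EXAMPLE = """\
# descriptors exported for training
# index compound d1 d2
1 H3S 0.10 0.20
2 CaH6 0.30 0.40
"""

table = read_nn_table(EXAMPLE)
train, test = split_train_test(list(range(10)), train_rate=0.8)
print(table)
print(len(train), len(test))
```

With ten samples and a training ratio of 0.8, eight samples go to the training set and the remaining two are held out.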
with a similar command. Users can browse the entire database with the command “$ ff7 db show -dbpath ./db_dir” or print brief database information by adding the “-summary” flag. More commands for operating the database can be found in Section 3.5. For the HTP calculation results of the MXH12 compounds, we used the commands
$ ff7 post ch -dbpath /home/HighPressure.db -byjobs
and
$ ff7 post heatmap -M 1 -X 2
to calculate the formation enthalpy convex hulls based on the self-built high-pressure database and to summarize them in a thermodynamic stability heatmap, as shown in Fig. 6a. We then performed HTP calculations of their dynamical stability and electron-phonon coupling constant λ using the QE software with the command
$ ff7 phonon qe -e 80 -q 3 3 3 -k 12 12 12 -pps nc -dopt True
Finally, we obtained the phonon spectra and superconducting critical temperatures (Tc) of the stable compounds.
The HTP calculation and database construction functions shown above are controlled by the corresponding built-in “calculation cards”. Although the FF7 code package provides a comprehensive set of “calculation cards”, and the calculation tasks for these preset scenarios run robustly, the preset functions cannot cover every requirement that arises in real research. For example, to gain a deeper understanding of the superconductivity of clathrate hydrides such as MXH12 and to summarize design rules for high-Tc hydrides, we may consider related material features such as the contribution of H electrons to the total density of states at the Fermi level (HDos) and the average H-H bond length (l). The FF7 code package provides an interface for customizing the “calculation card”. This design makes it a solid secondary development platform, allowing us to greatly extend FF7’s HTP calculation and database construction functionalities with minimal programming. An example of a customized “calculation card” and the post-processing script that extracts the variable HDos and stores it in the database are illustrated in Fig. 6b (those for the variable l are supplied in the SM). In the “calculation card” shown in the upper panel of Fig. 6b, we assigned the variable “RunCommand” the value “sleep 1” so that no DFT calculations are performed, and declared “Dirname” as “dos” to ensure that the post-processing script is run in the “.../structure pool/compound/dos” directory. After running the command
$ ff7 htp self -file Hdos.cc
the database gained a “Hdos” column storing the HDos values of all the compounds in the structure pool. All in all, the programmable design of the “calculation card” can accommodate a wide range of specific requirements.
The built-in machine learning module “nn” allows us to gain a deeper understanding of the relationships between Tc and other material features and to train a model for predicting Tc that facilitates further materials design. We used the command
$ ff7 db save -cols Hdos l -filename inX.txt
to extract the descriptors of variable RHdos, l, Hdos per H atom into a file “inX.txt” and the command
$ ff7 db save -cols Tc -filename label.txt
to obtain a “label.txt” file containing the target property Tc. Finally, we trained a fully connected neural network model with two hidden layers of 4 nodes each for predicting the Tc of clathrate hydrides with the command
$ ff7 nn train -hidden 4 4 -inX X.txt -label label.txt -trainrate 0.8 -batch 6 -epoch 2500 -lr 0.0001
The loss function and the performance of the model are shown in Fig. 6c and 6d, respectively. There is a strong correlation between them, and this machine learning model can be used to predict the Tc of other clathrate hydrides.

5. Conclusion

We have introduced a self-developed code package named FF7 that assists high-throughput DFT calculations and the construction of a user’s own database through a Linux command-line interface. The mainstream DFT packages VASP and QE are both supported while the interface remains fairly uniform for ease of use, and the high-throughput functions for the latter are groundbreaking. The “calculation card” design strategy ensures the flexibility of high-throughput calculations, making the FF7 code package an easy-to-use programmable infrastructure that facilitates secondary development, and builds a robust and flexible connection between HTP calculations and the database, enabling users to easily build databases that match their research interests. Full-featured post-processing modules with strong database connectivity are integrated into the FF7 code package to process data from high-throughput calculations, visualize data and generate formation enthalpy convex hulls, with the last being the highlight. As the heart of the FF7 code, the database module exposes a fully interactive interface that gives the user complete control of the database. In particular, we developed a command-line-based machine learning module that makes building and training artificial neural networks as easy as stacking “building blocks”, based on which we reveal the relationships between the Tc of clathrate hydrides and other properties at low computational cost.

References

[1] A. Agrawal, A. Choudhary, Perspective: Materials informatics and big data: Realization of the “fourth paradigm” of science in materials science, APL Materials 4 (5) (2016) 053208. doi:10.1063/1.4946894.
[2] L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, M. Scheffler, Big data of materials science: Critical role of the descriptor, Phys. Rev. Lett. 114 (2015) 105503. doi:10.1103/PhysRevLett.114.105503.
[3] G. Kresse, J. Furthmüller, Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set, Phys. Rev. B 54 (1996) 11169–11186. doi:10.1103/PhysRevB.54.11169.
[4] M. D. Segall, P. J. D. Lindan, M. J. Probert, C. J. Pickard, P. J. Hasnip, S. J. Clark, M. C. Payne, First-principles simulation: ideas, illustrations and the CASTEP code, Journal of Physics: Condensed Matter 14 (11) (2002) 2717. doi:10.1088/0953-8984/14/11/301.

[5] P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni, D. Ceresoli, G. L. Chiarotti, M. Cococcioni, I. Dabo, A. Dal Corso, S. de Gironcoli, S. Fabris, G. Fratesi, R. Gebauer, U. Gerstmann, C. Gougoussis, A. Kokalj, M. Lazzeri, L. Martin-Samos, N. Marzari, F. Mauri, R. Mazzarello, S. Paolini, A. Pasquarello, L. Paulatto, C. Sbraccia, S. Scandolo, G. Sclauzero, A. P. Seitsonen, A. Smogunov, P. Umari, R. M. Wentzcovitch, Quantum ESPRESSO: a modular and open-source software project for quantum simulations of materials, Journal of Physics: Condensed Matter 21 (39) (2009) 395502. doi:10.1088/0953-8984/21/39/395502.
[6] T. Xie, J. C. Grossman, Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties, Phys. Rev. Lett. 120 (2018) 145301. doi:10.1103/PhysRevLett.120.145301.
[7] K. Choudhary, B. DeCost, Atomistic line graph neural network for improved materials property predictions, npj Computational Materials 8 (2022) 221. doi:10.1038/s41524-022-00913-5.
[8] Materials genome initiative for global competitiveness, OSTP (June 2011).
[9] A. Jain, G. Hautier, C. J. Moore, S. Ping Ong, C. C. Fischer, T. Mueller, K. A. Persson, G. Ceder, A high-throughput infrastructure for density functional theory calculations, Computational Materials Science 50 (8) (2011) 2295–2310. doi:10.1016/j.commatsci.2011.02.023.
[10] A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, K. A. Persson, Commentary: The Materials Project: A materials genome approach to accelerating materials innovation, APL Materials 1 (1) (2013) 011002. doi:10.1063/1.4812323.
[11] D. Zagorac, H. Müller, S. Ruehl, J. Zagorac, S. Rehme, Recent developments in the Inorganic Crystal Structure Database: theoretical crystal structure data and related features, Journal of Applied Crystallography 52 (5) (2019) 918–925. doi:10.1107/S160057671900997X.
[12] S. Kirklin, J. E. Saal, B. Meredig, A. Thompson, J. W. Doak, M. Aykol, S. Rühl, C. Wolverton, The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies, npj Computational Materials 1 (2015) 15010. doi:10.1038/npjcompumats.2015.10.
[13] X.-G. Zhao, K. Zhou, B. Xing, R. Zhao, S. Luo, T. Li, Y. Sun, G. Na, J. Xie, X. Yang, X. Wang, X. Wang, X. He, J. Lv, Y. Fu, L. Zhang, JAMIP: an artificial-intelligence aided data-driven infrastructure for computational materials informatics, Science Bulletin 66 (19) (2021) 1973–1985. doi:10.1016/j.scib.2021.06.011.
[14] Y.-J. Chang, C.-Y. Jui, W.-J. Lee, A.-C. Yeh, Prediction of the composition and hardness of high-entropy alloys by machine learning, JOM 71 (2019) 3433–3442. doi:10.1007/s11837-019-03704-4.
[15] H. Tran, T. N. Vu, Machine-learning approach for discovery of conventional superconductors, Phys. Rev. Mater. 7 (2023) 054805. doi:10.1103/PhysRevMaterials.7.054805.
[16] A. D. Smith, S. B. Harris, R. P. Camata, D. Yan, C.-C. Chen, Machine learning the relationship between Debye temperature and superconducting transition temperature, Phys. Rev. B 108 (2023) 174514. doi:10.1103/PhysRevB.108.174514.
[17] M. J. Hutcheon, A. M. Shipley, R. J. Needs, Predicting novel superconducting hydrides using machine learning approaches, Phys. Rev. B 101 (2020) 144505. doi:10.1103/PhysRevB.101.144505.
[18] A. M. Shipley, M. J. Hutcheon, R. J. Needs, C. J. Pickard, High-throughput discovery of high-temperature conventional superconductors, Phys. Rev. B 104 (2021) 054501. doi:10.1103/PhysRevB.104.054501.
[19] Z. Luo, W. Gao, Q. Jiang, Determinants of vacancy formation and migration in high-entropy alloys, Science Advances 11 (1) (2025) eadr4697. doi:10.1126/sciadv.adr4697.
[20] V. Wang, N. Xu, J.-C. Liu, G. Tang, W.-T. Geng, VASPKIT: A user-friendly interface facilitating high-throughput computing and analysis using VASP code, Computer Physics Communications 267 (2021) 108033. doi:10.1016/j.cpc.2021.108033.
[21] W. Yi, G. Tang, X. Chen, B. Yang, X. Liu, qvasp: A flexible toolkit for VASP users in materials simulations, Computer Physics Communications 257 (2020) 107535. doi:10.1016/j.cpc.2020.107535.
[22] Z. Pan, Z. Liu, T. Xu, D. Legut, R. Zhang, VASPMATE: An integrated user-interface program for high-throughput first principles computations through VASP code, Computational Materials Science 233 (2024) 112707. doi:10.1016/j.commatsci.2023.112707.
[23] M. Einaga, M. Sakata, T. Ishikawa, K. Shimizu, M. I. Eremets, A. P. Drozdov, I. A. Troyan, N. Hirao, Y. Ohishi, Crystal structure of the superconducting phase of sulfur hydride, Nature Physics 12 (2016) 835–838. doi:10.1038/nphys3760.
[24] A. P. Drozdov, M. I. Eremets, I. A. Troyan, V. Ksenofontov, S. I. Shylin, Conventional superconductivity at 203 kelvin at high pressures in the sulfur hydride system, Nature 525 (7567) (2015) 73–76. doi:10.1038/nature14964.
[25] D. Duan, Y. Liu, F. Tian, D. Li, X. Huang, Z. Zhao, H. Yu, B. Liu, W. Tian, T. Cui, Pressure-induced metallization of dense (H2S)2H2 with high-Tc superconductivity, Scientific Reports 4 (1) (2014) 6968. doi:10.1038/srep06968.
[26] H. Wang, J. S. Tse, K. Tanaka, T. Iitaka, Y. Ma, Superconductive sodalite-like clathrate calcium hydride at high pressures, Proceedings of the National Academy of Sciences 109 (17) (2012) 6463–6466. doi:10.1073/pnas.1118168109.
[27] D. An, D. Duan, Z. Zhang, Q. Jiang, T. Ma, Z. Huo, H. Song, T. Cui, Type-I clathrate calcium hydride and its hydrogen-vacancy structures at high pressure, Phys. Rev. B 110 (2024) 054505. doi:10.1103/PhysRevB.110.054505.
[28] L. Ma, K. Wang, Y. Xie, X. Yang, Y. Wang, M. Zhou, H. Liu, X. Yu, Y. Zhao, H. Wang, G. Liu, Y. Ma, High-temperature superconducting phase in clathrate calcium hydride CaH6 up to 215 K at a pressure of 172 GPa, Phys. Rev. Lett. 128 (2022) 167001. doi:10.1103/PhysRevLett.128.167001.
[29] Y. Song, J. Bi, Y. Nakamoto, K. Shimizu, H. Liu, B. Zou, G. Liu, H. Wang, Y. Ma, Stoichiometric ternary superhydride LaBeH8 as a new template for high-temperature superconductivity at 110 K under 80 GPa, Phys. Rev. Lett. 130 (2023) 266001. doi:10.1103/PhysRevLett.130.266001.
[30] Z. Zhang, T. Cui, M. J. Hutcheon, A. M. Shipley, H. Song, M. Du, V. Z. Kresin, D. Duan, C. J. Pickard, Y. Yao, Design principles for high-temperature superconductors with a hydrogen-based alloy backbone at moderate pressure, Phys. Rev. Lett. 128 (2022) 047001. doi:10.1103/PhysRevLett.128.047001.
[31] T. Ma, Z. Zhang, M. Du, Z. Huo, W. Chen, F. Tian, D. Duan, T. Cui, High-throughput calculation for superconductivity of sodalite-like clathrate ternary hydrides MXH12 at high pressure, Materials Today Physics 38 (2023) 101233. doi:10.1016/j.mtphys.2023.101233.
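As a supplementary illustration of the network configuration quoted in Section 4 (two hidden layers of 4 nodes, a training fraction of 0.8 and a batch size of 6), the following is a minimal NumPy sketch of a fully connected regression network trained by mini-batch gradient descent. It is not the FF7 implementation: the tanh activation, the mean-squared-error loss, the plain SGD update, all function names and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np

def init_params(sizes, rng):
    # One (weights, bias) pair per layer, with small random weights.
    return [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    # Activations of every layer: tanh on hidden layers (assumed), linear output.
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else np.tanh(z))
    return acts

def mse(params, X, y):
    return float(np.mean((forward(params, X)[-1][:, 0] - y) ** 2))

def sgd_step(params, X, y, lr):
    # Backpropagate the mean-squared error and apply one gradient step.
    acts = forward(params, X)
    delta = 2.0 * (acts[-1] - y[:, None]) / len(y)
    new_params = list(params)
    for i in reversed(range(len(params))):
        W, b = params[i]
        grad_W = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:
            delta = (delta @ W.T) * (1.0 - acts[i] ** 2)  # tanh derivative
        new_params[i] = (W - lr * grad_W, b - lr * grad_b)
    return new_params

def train(X, y, hidden=(4, 4), train_rate=0.8, batch=6,
          epochs=2500, lr=1e-4, seed=0):
    # Defaults mirror the values quoted in the text; everything else is assumed.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(train_rate * len(X))
    tr, te = order[:n_train], order[n_train:]
    params = init_params([X.shape[1], *hidden, 1], rng)
    history = []
    for _ in range(epochs):
        shuffled = rng.permutation(tr)
        for start in range(0, n_train, batch):
            idx = shuffled[start:start + batch]
            params = sgd_step(params, X[idx], y[idx], lr)
        history.append(mse(params, X[tr], y[tr]))  # training loss per epoch
    return params, history, mse(params, X[te], y[te])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(0.0, 1.0, (40, 2))  # made-up stand-ins for two descriptors
    y = 2.0 * X[:, 0] - X[:, 1]         # made-up target, not real Tc data
    _, history, test_loss = train(X, y, epochs=300, lr=0.05)
    print(f"train MSE: {history[0]:.4f} -> {history[-1]:.4f}, "
          f"test MSE: {test_loss:.4f}")
```

The per-epoch training loss returned in `history` plays the role of a loss curve, and the held-out error gives a rough measure of model performance; real descriptor and label files would replace the synthetic arrays.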
