Latent Neural Operator Pretraining for Solving Time-Dependent PDEs
1 Introduction
Partial Differential Equations (PDEs) describe the underlying principles of numerous phenomena in the real world, with broad applications ranging from weather forecasting[2,26] and pollution detection[27] to industrial design[42]. Consequently, solving PDEs accurately and efficiently remains a pivotal research area. Traditional numerical methods such as the finite element method and spectral methods solve PDEs by transforming continuous differential equations into discrete difference equations, which requires specialized knowledge and substantial computational resources. With the advent of deep learning, surrogate models based on neural networks offer a fresh alternative to traditional numerical methods, with lower computational cost and potentially better generalizability for characterizing real-world dynamics even beyond the scope of PDE-based models.
The rest of the paper is organized as follows. First, we review existing work on neural operators and PDE pretraining in Section 2. We illustrate the working principles of the LNO backbone and our proposed LNOP framework in Section 3.
We describe the dataset details in Section 4. Subsequently, we present a series of
experiments and corresponding analysis to study the performance of our LNOP
framework in terms of solution precision, transfer capability and data efficiency
in Section 5. Finally, we provide concluding remarks of our work in Section 6.
2 Related Work
2.1 Neural Operator
Neural operator methods aim to solve PDEs by learning mappings between
functions. For instance, they map the coefficients, initial conditions or boundary
conditions of PDEs, which serve as input functions, to the solution, which serves
as the output function. DeepONet[24] designs trunk and branch structures for
encoding query positions of the output function and observed values of the
input function respectively, and the results from these two parts are combined
to predict the output function. FNO[21] utilizes Fourier transform to learn
transformations between functions in the frequency domain and derives a series
of variants including Geo-FNO[20], U-FNO[39], and F-FNO[35], which extend
the applicability or enhance the precision and efficiency of FNO.
With the tremendous success of Transformer[36] structures in fields of com-
puter vision and natural language processing, Transformer-based neural operator
methods have also been proposed. Galerkin-Transformer[5] first introduces Galerkin-type and Fourier-type attention mechanisms as kernels in neural operators. OFormer[18] extends Galerkin-type attention to the case of cross-attention.
GNOT[11] further proposes Heterogeneous Normalized Cross-Attention to ac-
commodate multiple input functions.
To address the significant computational cost of quadratic-complexity attention mechanisms applied to PDE problems with large spatial grids, several approaches have been proposed. FactFormer[19] projects high-
dimensional PDEs into multiple single-dimensional functions. Transolver[41]
uses physical attention to allocate geometric features to a constant number
of physical slices in each Transformer block. LNO[38] employs Physics-Cross-
Attention (PhCA) to solve PDEs in the latent space. Our LNOP framework
follows the idea of LNO: we train the encoder and decoder to learn a universal transformation that extracts the common representations of multiple PDEs in a shared latent space.
Despite their strong nonlinear approximation capability, neural operators face the challenge of insufficient training data. Simulated data generated with traditional numerical methods are often computationally expensive to produce, while real-world data are difficult to collect. Some approaches such as PINO[22] and PI-DeepONet[37] incorporate physical priors into neural operators to alleviate
data scarcity. Other works attempt to construct pretrained foundation models
for various downstream PDE tasks involving scarce data.
3 Method
We first provide the formal definition of solving time-dependent PDEs, and then
introduce the Latent Neural Operator (LNO) backbone. Finally, we present the
framework of our Latent Neural Operator Pretraining (LNOP) approach.
The LNO backbone consists of an input projector to lift the dimension of the input data, a PhCA encoder to transform the embeddings from the real geometric space into the latent space, a sequence of Transformer layers for modeling the operator in the latent space, a decoder to recover the latent representation back to the real-world space, and an output projector to map the lifted embeddings back to the output function values.
Physics-Cross-Attention (PhCA) is the core of LNO, used for transforming
between N embeddings in the real geometric space and M representation tokens
in the latent space. Since the latent space is much more compact than the large
geometric space, PhCA significantly reduces the computational load when solving
PDEs.
The input projector contains two parts, a branch projector and a trunk
projector, following the convention of DeepONet [24]. In the encoding phase: i)
the branch projector converts the N observation positions and corresponding values of the input function into N embeddings in the real geometric space, which serve as the value matrix; ii) the trunk projector lifts the dimension of each observation position of the input function, which is then converted into M attention scores through an MLP in the PhCA encoder, so the N observation positions yield an M × N attention score matrix; iii) the row-normalized attention score matrix is multiplied by the value matrix to obtain the representation tokens in the latent space.
Conversely, the decoding phase is an inverse process of the encoding. Specifically, i) the representation tokens transformed through multiple Transformer layers serve as the value matrix; ii) the trunk projector lifts the dimension of each query position of the output function, which is then converted into M attention scores through an MLP in the PhCA decoder, so the N query positions yield an M × N attention score matrix; iii) the column-normalized attention score matrix is transposed and multiplied by the value matrix to map the representation tokens in the latent space back to the real geometric space, where they are finally converted into output function values by the output projector.
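To make the encoding and decoding steps concrete, the following sketch implements the two normalized cross-attention directions described above in PyTorch. It is only an illustration under our own naming and simplifications (the class names, MLP widths and the use of softmax for normalization are assumptions), not the actual LNO implementation.

```python
import torch
import torch.nn as nn


class PhCAEncoder(nn.Module):
    """Sketch: map N real-space embeddings to M latent representation tokens."""

    def __init__(self, hidden: int, n_tokens: int):
        super().__init__()
        # Trunk MLP: lifted observation position -> M attention scores per position.
        self.score_mlp = nn.Sequential(
            nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, n_tokens))

    def forward(self, pos_emb: torch.Tensor, val_emb: torch.Tensor) -> torch.Tensor:
        # pos_emb: (B, N, hidden) lifted observation positions (trunk projector output)
        # val_emb: (B, N, hidden) embeddings of positions and values (branch projector output)
        scores = self.score_mlp(pos_emb)                     # transpose of the M x N score matrix
        attn = scores.softmax(dim=1)                         # row-normalize over the N positions
        return torch.einsum("bnm,bnd->bmd", attn, val_emb)   # (B, M, hidden) latent tokens


class PhCADecoder(nn.Module):
    """Sketch: map M latent tokens back to embeddings at N query positions."""

    def __init__(self, hidden: int, n_tokens: int):
        super().__init__()
        self.score_mlp = nn.Sequential(
            nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, n_tokens))

    def forward(self, query_pos_emb: torch.Tensor, latent: torch.Tensor) -> torch.Tensor:
        # query_pos_emb: (B, N, hidden) lifted query positions, latent: (B, M, hidden)
        scores = self.score_mlp(query_pos_emb)               # transpose of the M x N score matrix
        attn = scores.softmax(dim=-1)                        # column-normalize over the M tokens
        return torch.einsum("bnm,bmd->bnd", attn, latent)    # (B, N, hidden) real-space embeddings
```

Wiring these two modules together with the input/output projectors and a stack of Transformer layers between them yields the overall LNO pipeline at sketch level.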
[Figure: overview of the LNOP framework, showing the trunk projector, observation/query positions, and the M representation tokens in the latent space during finetuning; figure content omitted.]
4 Dataset
We consider a hybrid dataset containing multiple physical systems, all of which are time-dependent PDEs in 2D space, including the Navier-Stokes equation, the Shallow-Water equation, Burgers' equation and the Reaction-Diffusion equation. All these PDEs describe time-varying systems whose responses are determined by interactions among different spatial locations. Therefore, we can solve them with a neural network that extracts representations of the PDE spatial states and approximates the temporal evolution of these representations.
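The following generic rollout loop illustrates this solution strategy for a trained surrogate; the step-model interface (a window of past frames mapped to the next frame) is an assumption made for illustration and not necessarily the exact protocol used by LNOP.

```python
import torch


def autoregressive_rollout(step_model, init_frames: torch.Tensor, n_future: int) -> torch.Tensor:
    """Roll a surrogate forward in time from a few observed frames.

    step_model  : callable mapping a window of past frames (B, T_in, H, W)
                  to the next frame (B, H, W) -- an assumed interface.
    init_frames : observed frames used to start the rollout, shape (B, T_in, H, W).
    """
    window = init_frames
    preds = []
    for _ in range(n_future):
        nxt = step_model(window)                                  # predict the next spatial state
        preds.append(nxt)
        window = torch.cat([window[:, 1:], nxt.unsqueeze(1)], 1)  # slide the input window forward
    return torch.stack(preds, dim=1)                              # (B, n_future, H, W)
```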
Navier-Stokes Equation We use the Navier-Stokes equation from the FNO dataset[21], given by
$$
\begin{aligned}
\partial_t u(x,t) + w(x,t)\cdot\nabla u(x,t) &= \nu\,\Delta u(x,t) + f(x),\\
\nabla\cdot w(x,t) &= 0, \qquad x\in\Omega,\; t\in[0,T],
\end{aligned}
$$
where w is the velocity, u = ∇ × w is the vorticity, ν is the viscosity coefficient and f(x) is the forcing term. This is the fundamental equation used to explain and predict the behavior of fluids under various conditions.
We set Ω = [0, 1]^2, T = 20 and f(x) = 0.1(sin(2π(x1 + x2)) + cos(2π(x1 + x2))). The initial condition is generated according to u(x, 0) ∼ N(0, 7^{3/2}(−∆ + 49I)^{−2.5}).
Periodic boundary conditions are applied. We use the data under three different viscosity coefficient values ν = 10^{-5}, 10^{-4}, 10^{-3}. For ν = 10^{-5}, there are 1200 trajectories, each containing 20 frames on 64 × 64 spatial grids. We use 1100 trajectories for training and the remaining 100 for testing. For ν = 10^{-4} and ν = 10^{-3}, there are 1100 trajectories each, containing 25 frames on 64 × 64 spatial grids. We use 1000 trajectories for training and the remaining 100 for testing. The data with viscosity coefficient ν = 10^{-5} is used during pretraining, while the data with viscosity coefficients ν = 10^{-4} and 10^{-3} are used to evaluate the model's transfer capability.
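For reference, the snippet below sketches how such an initial condition can be sampled as a Gaussian random field in Fourier space. It follows the usual recipe associated with the FNO Navier-Stokes data, but the exact normalization conventions of the dataset generation may differ; treat it as an assumption-laden illustration.

```python
import numpy as np


def sample_initial_vorticity(n: int = 64, tau: float = 7.0, alpha: float = 2.5, seed: int = 0):
    """Sample u(x, 0) ~ N(0, tau^{3/2} (-Delta + tau^2 I)^{-alpha}) on a periodic
    n x n grid over [0, 1]^2 via a Fourier-space spectral factorization (sketch)."""
    rng = np.random.default_rng(seed)
    k = np.fft.fftfreq(n, d=1.0 / n)                             # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    lam = (2.0 * np.pi) ** 2 * (kx ** 2 + ky ** 2) + tau ** 2    # eigenvalues of (-Delta + tau^2 I)
    sqrt_spectrum = tau ** 0.75 * lam ** (-alpha / 2.0)          # square root of the covariance spectrum
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    u_hat = n * sqrt_spectrum * noise                            # scaling convention may differ
    u_hat[0, 0] = 0.0                                            # enforce a zero-mean field
    return np.real(np.fft.ifft2(u_hat))
```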
5 Experiment
We conduct a series of experiments on our dataset and compare the results with both classical and recently proposed neural operator methods. We demonstrate that our LNOP framework effectively improves the solution precision for time-dependent PDEs and exhibits strong transfer capability and data efficiency.
Implementation All models are trained for 500 epochs using the AdamW[23] optimizer and the OneCycleLR[32] scheduler with an initial learning rate of 0.001. We choose the relative L2 error as the loss function. For FNO, we set the number of modes to 12. For Transolver, we set the number of slices to 32. We construct both small-scale and large-scale versions of LNO. The small-scale version (marked with the suffix -S) consists of 4 Transformer layers and has 64 representation tokens of dimension 128 each. The large-scale version (marked with the suffix -L) consists of 8 Transformer layers and has 256 representation tokens of dimension 256 each. All experiments are conducted on a single RTX 3090 GPU, with batch sizes adjusted from 4 to 16 based on memory usage. FNO and Transolver have about 0.9 and 1.6 million model parameters respectively, while the two LNO versions have 0.8 and 5.0 million model parameters respectively.
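The training configuration can be summarized by the following sketch; the model and data loader are placeholders, and any hyperparameter not stated above (e.g. weight decay) is an assumption.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR


def relative_l2(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Relative L2 error, averaged over the batch."""
    diff = (pred - target).flatten(1).norm(dim=1)
    ref = target.flatten(1).norm(dim=1)
    return (diff / ref).mean()


def train(model, train_loader, epochs: int = 500, lr: float = 1e-3, device: str = "cuda"):
    """Training loop matching the stated setup: 500 epochs, AdamW, OneCycleLR,
    relative L2 loss. Other details (weight decay, gradient handling) are assumptions."""
    model = model.to(device)
    optimizer = AdamW(model.parameters(), lr=lr)
    scheduler = OneCycleLR(optimizer, max_lr=lr, total_steps=epochs * len(train_loader))
    for _ in range(epochs):
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            loss = relative_l2(model(inputs), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()
    return model
```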
Table 1. Solution precision of different models on the various PDE problems. Relative L2 error is recorded. The best result in each group is in bold. Values in parentheses indicate the change in error relative to the LNO model of the same scale, where '-' denotes a reduction and '+' denotes an increase.
5.3 Scaling
We conduct scaling experiments to study how the solution precision of our proposed LNOP framework on all time-dependent PDE problems changes as the number and dimension of representation tokens vary. The results in Figure 2(a) indicate that the solution precision on the Navier-Stokes equation consistently improves as the token dimension increases from 32 up to 256, while that on the other three PDEs gradually saturates once the token dimension reaches 192. The results in Figure 2(b) show that, aside from the Shallow-Water equation, which consistently maintains a precise solution, the solution precision on the other three PDEs improves continuously with an increasing number of representation tokens.
5.4 Ablation Study
We conduct an ablation study to investigate how finetuning different components of the LNOP framework affects the solution precision. Specifically, we compare the following three finetuning scenarios: i) all parameters; ii) only the PhCA encoder/decoder; iii) only the components other than the PhCA encoder/decoder, namely the input and output projectors and the propagator.
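These three scenarios can be realized by freezing the corresponding parameter groups, as sketched below; the module attribute names are assumptions made for illustration.

```python
import torch.nn as nn


def set_finetune_mode(model: nn.Module, mode: str) -> None:
    """Freeze parameters according to the three finetuning scenarios.
    Module attribute names (phca_enc, phca_dec) are assumptions for this sketch."""
    phca_prefixes = ("phca_enc", "phca_dec")
    for name, param in model.named_parameters():
        in_phca = name.startswith(phca_prefixes)
        if mode == "all":            # i) finetune all parameters
            param.requires_grad = True
        elif mode == "phca":         # ii) finetune only the PhCA encoder/decoder
            param.requires_grad = in_phca
        elif mode == "others":       # iii) finetune projectors and propagator only
            param.requires_grad = not in_phca
        else:
            raise ValueError(f"unknown finetuning mode: {mode}")
```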
Table 4. Solution precision of LNOP pretrained with different approaches or finetuned with different components on the various PDE problems. Relative L2 error is recorded. The best result is in bold.
Model Navier-Stokes Shallow-Water Burgers’ Reaction-Diffusion
LNOP-S(pretrain) 0.0730 0.0014 0.0153 0.0419
LNOP-S(finetune-All) 0.0456 0.0004 0.0112 0.0236
LNOP-S(finetune-PhCA) 0.0722 0.0014 0.0151 0.0411
LNOP-S(finetune-Others) 0.0526 0.0010 0.0117 0.0263
LNOP-S(two-stage) 0.2236 0.0122 0.0817 0.1274
[Figure 2 plots: relative L2 errors versus representation dimension in the latent space (panel a) and representation quantity in the latent space (panel b), with curves for the Navier-Stokes, Shallow-Water, Burgers' and Reaction-Diffusion equations.]
Fig. 2. Results of scaling experiments. (a) Impact of representation token dimension on
solution precision of various PDE problems. (b) Impact of representation token quantity
on solution precision of various PDE problems.
The results in Table 4 show that, although finetuning all parameters achieves the highest solution precision, finetuning the components other than the PhCA encoder/decoder yields higher precision than finetuning only the PhCA encoder/decoder. This indicates that the PhCA encoder/decoder effectively learns a universal transformation for extracting representations from multiple physical systems.
We also try modifying the LNOP framework into a two-stage approach. In the first stage, we pretrain a PhCA-based autoencoder on the hybrid dataset with a reconstruction task. The autoencoder takes several frames of time-dependent PDEs as input, extracts representations in the latent space, and reconstructs them back into the PDE information. In the second stage, we train propagators with an autoregressive task on each single PDE problem to predict the temporal evolution. The propagator iterates the PDE representations from the initial state to the final state in the latent space. This implies that the PhCA encoder/decoder is only used at the initial and final moments of the PDE system, with the intermediate temporal evolution predicted solely by the propagator.
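The sketch below illustrates the losses of this two-stage variant; the encoder, decoder and propagator interfaces are assumptions, and only one possible reading of the autoregressive training is shown.

```python
import torch


def two_stage_losses(encoder, decoder, propagator, frames, pos):
    """Illustrative losses for the two-stage variant (all interfaces are assumptions).

    frames: (B, T, N, c) trajectory values at N grid points; pos: (N, d) coordinates.
    """
    B, T = frames.shape[0], frames.shape[1]
    pos_b = pos.expand(B, -1, -1)

    # Stage 1: reconstruction task for the PhCA-based autoencoder.
    z0 = encoder(pos_b, frames[:, 0])                      # latent representation tokens
    recon_loss = torch.mean((decoder(pos_b, z0) - frames[:, 0]) ** 2)

    # Stage 2: iterate in the latent space from the initial to the final state and
    # decode only at the end (one possible reading of the autoregressive task).
    z = z0.detach()                                        # encoder is frozen in this stage
    for _ in range(T - 1):
        z = propagator(z)                                  # temporal evolution in the latent space
    prop_loss = torch.mean((decoder(pos_b, z) - frames[:, -1]) ** 2)
    return recon_loss, prop_loss
```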
The result in the last row of Table 4 indicates that the two-stage LNOP approach does not perform as well as the end-to-end one. This may be due to the discrepancy between the representation spaces required by the reconstruction task and the PDE solution task, which introduces a mismatch between the autoencoder and the propagator.
6 Conclusion
We propose the Latent Neural Operator Pretraining (LNOP) framework to learn a universal transformation for extracting representations of different PDEs in a shared latent space across multiple physical systems. We pretrain the LNO backbone on a hybrid dataset comprising multiple time-dependent PDE problems and compare its solution precision under different finetuning conditions for both in-distribution and out-of-distribution time-dependent PDE problems. Through a series of experiments, we verify the precision improvement gained from learning shared representations through pretraining, and also validate the transfer capability and data efficiency brought by the learned universal transformation.
Our work also has some limitations. First, our purely data-driven method
does not leverage prior knowledge from different PDEs, which may compromise
the solution precision. Additionally, our method does not completely separate
PDE representation learning from PDE time evolution prediction, which slows
down the pretraining process.
Future work should focus on how to incorporate physical prior knowledge
as constraints or additional modalities into the PDE solving process, how to
improve PDE representation learning, and how to achieve PDE time evolution
estimation entirely in the latent space.
References
1. Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lučić, M., Schmid, C.: ViViT: A
video vision transformer. In: Proceedings of the IEEE/CVF International Conference
on Computer Vision (ICCV). pp. 6836–6846 (2021)
2. Bi, K., Xie, L., Zhang, H., Chen, X., Gu, X., Tian, Q.: Accurate medium-range
global weather forecasting with 3d neural networks. Nature 619(7970), 533–538
(2023)
3. Bonev, B., Kurth, T., Hundt, C., Pathak, J., Baust, M., Kashinath, K., Anandkumar,
A.: Spherical Fourier neural operators: Learning stable dynamics on the sphere.
In: Proceedings of the International Conference on Machine Learning (ICML). pp.
2806–2823 (2023)
4. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Nee-
lakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot
learners. In: Advances in Neural Information Processing Systems (NeurIPS) (2020)
5. Cao, S.: Choose a transformer: Fourier or Galerkin. In: Advances in Neural Infor-
mation Processing Systems (NeurIPS) (2021)
6. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive
learning of visual representations. In: Proceedings of the International Conference
on Machine Learning (ICML) (2020)
7. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidi-
rectional Transformers for language understanding. arXiv preprint arXiv:1810.04805
(2018)
8. Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive
Fourier neural operators: efficient token mixers for Transformers. arXiv preprint
arXiv:2111.13587 (2021)
9. Hang, Z., Ma, Y., Wu, H., Wang, H., Long, M.: Unisolver: PDE-conditional Trans-
formers are universal PDE solvers. arXiv preprint arXiv:2405.17527 (2024)
10. Hao, Z., Su, C., Liu, S., Berner, J., Ying, C., Su, H., Anandkumar, A., Song, J.,
Zhu, J.: DPOT: Auto-regressive denoising operator transformer for large-scale pde
pre-training. arXiv preprint arXiv:2403.03542 (2024)
11. Hao, Z., Wang, Z., Su, H., Ying, C., Dong, Y., Liu, S., Cheng, Z., Song, J., Zhu, J.:
GNOT: a general neural operator Transformer for operator learning. In: Proceedings
of the International Conference on Machine Learning (ICML) (2023)
12. He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are
scalable vision learners. In: Proceedings of the IEEE/CVF conference on Computer
Vision and Pattern Recognition (CVPR). pp. 16000–16009 (2022)
13. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised
visual representation learning. In: Proceedings of the IEEE/CVF conference on
Computer Vision and Pattern Recognition (CVPR) (2020)
14. Karlbauer, M., Praditia, T., Otte, S., Oladyshkin, S., Nowak, W., Butz, M.V.:
Composing partial differential equations with physics-aware neural networks. In:
Proceedings of the International Conference on Machine Learning (ICML) (2022)
15. Kovachki, N., Li, Z., Liu, B., Azizzadenesheli, K., Bhattacharya, K., Stuart, A.,
Anandkumar, A.: Neural operator: learning maps between function spaces with
applications to PDEs. Journal of Machine Learning Research 24(89), 1–97 (2023)
16. Lee-Thorp, J., Ainslie, J., Eckstein, I., Ontanon, S.: FNet: mixing tokens with
Fourier transforms. arXiv preprint arXiv:2105.03824 (2021)
17. Li, J., Li, D., Savarese, S., Hoi, S.: Blip-2: Bootstrapping language-image pre-
training with frozen image encoders and large language models. In: Proceedings
of the International Conference on Machine Learning (ICML). pp. 19730–19742
(2023)
18. Li, Z., Meidani, K., Farimani, A.B.: Transformer for partial differential equations’
operator learning. arXiv preprint arXiv:2205.13671 (2022)
19. Li, Z., Shu, D., Barati Farimani, A.: Scalable Transformer for PDE surrogate
modeling. In: Advances in Neural Information Processing Systems (NeurIPS) (2024)
20. Li, Z., Huang, D.Z., Liu, B., Anandkumar, A.: Fourier neural operator with learned
deformations for PDEs on general geometries. Journal of Machine Learning Research
24(388), 1–26 (2023)
21. Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A.,
Anandkumar, A.: Fourier neural operator for parametric partial differential equa-
tions. arXiv preprint arXiv:2010.08895 (2020)
22. Li, Z., Zheng, H., Kovachki, N., Jin, D., Chen, H., Liu, B., Azizzadenesheli, K.,
Anandkumar, A.: Physics-informed neural operator for learning partial differential
equations. ACM/JMS Journal of Data Science 1(3), 1–27 (2024)
23. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: Proceedings
of the International Conference on Learning Representations (ICLR) (2018)
24. Lu, L., Jin, P., Karniadakis, G.E.: DeepONet: learning nonlinear operators for
identifying differential equations based on the universal approximation theorem of
operators. arXiv preprint arXiv:1910.03193 (2019)
25. McCabe, M., Blancard, B.R.S., Parker, L.H., Ohana, R., Cranmer, M., Bietti, A.,
Eickenberg, M., Golkar, S., Krawezik, G., Lanusse, F., et al.: Multiple physics
pretraining for physical surrogate models. arXiv preprint arXiv:2310.02994 (2023)
26. Pathak, J., Subramanian, S., Harrington, P., Raja, S., Chattopadhyay, A., Mardani,
M., Kurth, T., Hall, D., Li, Z., Azizzadenesheli, K., et al.: FourCastNet: a global
data-driven high-resolution weather model using adaptive Fourier neural operators.
arXiv preprint arXiv:2202.11214 (2022)
27. Praditia, T., Karlbauer, M., Otte, S., Oladyshkin, S., Butz, M.V., Nowak, W.:
Learning groundwater contaminant diffusion-sorption processes with a finite volume
neural network. Water Resources Research 58(12), e2022WR033149 (2022)
28. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G.,
Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from
natural language supervision. In: Proceedings of the International Conference on
Machine Learning (ICML) (2021)
29. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks:
a deep learning framework for solving forward and inverse problems involving
nonlinear partial differential equations. Journal of Computational Physics 378,
686–707 (2019)
30. Rao, C., Ren, P., Wang, Q., Buyukozturk, O., Sun, H., Liu, Y.: Encoding physics
to learn reaction–diffusion processes. Nature Machine Intelligence 5(7), 765–779
(2023)
31. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution
image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
32. Smith, L.N., Topin, N.: Super-convergence: very fast training of neural networks
using large learning rates. In: Artificial Intelligence and Machine Learning for
Multi-Domain Operations Applications (2019)
33. Takamoto, M., Praditia, T., Leiteritz, R., MacKinlay, D., Alesiani, F., Pflüger, D.,
Niepert, M.: PDEBench: an extensive benchmark for scientific machine learning.
Advances in Neural Information Processing Systems (NeurIPS) 35, 1596–1611
(2022)
34. Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T.,
Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al.: LLaMA: open and efficient
foundation language models. arXiv preprint arXiv:2302.13971 (2023)
35. Tran, A., Mathews, A., Xie, L., Ong, C.S.: Factorized Fourier neural operators.
arXiv preprint arXiv:2111.13802 (2021)
36. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser,
Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information
Processing Systems (NeurIPS) (2017)
37. Wang, S., Wang, H., Perdikaris, P.: Learning the solution operator of parametric
partial differential equations with physics-informed DeepONets. Science Advances
7(40), eabi8605 (2021)
38. Wang, T., Wang, C.: Latent neural operator for solving forward and inverse PDE
problems. arXiv preprint arXiv:2406.03923 (2024)
39. Wen, G., Li, Z., Azizzadenesheli, K., Anandkumar, A., Benson, S.M.: U-FNO–an
enhanced Fourier neural operator-based deep-learning model for multiphase flow.
Advances in Water Resources 163, 104180 (2022)
40. Wen, G., Li, Z., Long, Q., Azizzadenesheli, K., Anandkumar, A., Benson, S.M.:
Real-time high-resolution CO2 geological storage prediction using nested Fourier
neural operators. Energy & Environmental Science 16(4), 1732–1741 (2023)
41. Wu, H., Luo, H., Wang, H., Wang, J., Long, M.: Transolver: a fast Transformer
solver for PDEs on general geometries. arXiv preprint arXiv:2402.02366 (2024)
42. Yang, Z., Yu, C.H., Buehler, M.J.: Deep learning model to predict complex stress
and strain fields in hierarchical composites. Science Advances 7(15), eabd7416
(2021)
43. Ye, Z., Huang, X., Chen, L., Liu, H., Wang, Z., Dong, B.: PDEformer: Towards a
foundation model for one-dimensional partial differential equations. arXiv preprint
arXiv:2402.12652 (2024)