DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training (2403.03542v4)

Published 6 Mar 2024 in cs.LG, cs.NA, and math.NA

Abstract: Pre-training has been investigated to improve the efficiency and performance of training neural operators in data-scarce settings. However, it is largely in its infancy due to the inherent complexity and diversity, such as long trajectories, multiple scales and varying dimensions of partial differential equations (PDEs) data. In this paper, we present a new auto-regressive denoising pre-training strategy, which allows for more stable and efficient pre-training on PDE data and generalizes to various downstream tasks. Moreover, by designing a flexible and scalable model architecture based on Fourier attention, we can easily scale up the model for large-scale pre-training. We train our PDE foundation model with up to 0.5B parameters on 10+ PDE datasets with more than 100k trajectories. Extensive experiments show that we achieve SOTA on these benchmarks and validate the strong generalizability of our model to significantly enhance performance on diverse downstream PDE tasks like 3D data. Code is available at https://github.com/thu-ml/DPOT.


Summary

  • The paper presents an auto-regressive denoising pre-training strategy that significantly enhances PDE solution operator generalization.
  • It leverages Fourier attention and temporal aggregation to efficiently handle diverse, large-scale PDE datasets.
  • DPOT outperforms existing models by reducing prediction errors by up to 52% while scaling to 0.5 billion parameters.

Overview of DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training

The paper "DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training" introduces a novel framework for pre-training neural operators on large-scale data derived from partial differential equations (PDEs). This approach targets the foundational task of learning solution operators for PDEs in scientific machine learning, addressing the complexities inherent in handling PDE datasets that exhibit significant heterogeneity in terms of dimensions, scales, and numerical ranges.

Key Contributions and Model Architecture

DPOT employs an auto-regressive denoising pre-training strategy designed to enhance the model's generalization across multiple PDE tasks. The approach corrupts training inputs with Gaussian noise and predicts future timesteps from these noisy inputs, which improves robustness and transferability to downstream PDE scenarios. The model uses a flexible transformer architecture based on Fourier attention, allowing it to scale to 0.5 billion parameters when pre-trained on more than ten PDE datasets comprising over 100,000 trajectories.
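
As a rough illustration, a single pre-training step under this objective could look like the PyTorch sketch below. The model interface, tensor shapes, noise level, and relative L2 loss are illustrative assumptions; the paper's exact noise schedule and loss weighting may differ.

```python
import torch

def denoising_autoregressive_step(model, u_window, u_next, noise_std=1e-3):
    """One illustrative pre-training step of the auto-regressive denoising
    objective: corrupt the input window with Gaussian noise, predict the
    next frame, and penalize the relative L2 error.

    Assumed (hypothetical) shapes: u_window is (batch, T, H, W, C) past
    frames and u_next is (batch, H, W, C); `model` maps the window to the
    next frame. noise_std is an illustrative value, not the paper's setting.
    """
    noisy_input = u_window + noise_std * torch.randn_like(u_window)
    pred = model(noisy_input)  # predicted next frame, (batch, H, W, C)
    err = torch.linalg.vector_norm(pred - u_next, dim=(1, 2, 3))
    ref = torch.linalg.vector_norm(u_next, dim=(1, 2, 3)).clamp_min(1e-8)
    return (err / ref).mean()  # mean relative L2 loss over the batch
```

At inference time the same model can be rolled out auto-regressively, feeding its own predictions back as inputs; one motivation for injecting noise during pre-training is robustness to the distribution shift such rollouts introduce.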

The architecture of DPOT is distinguished by several components (the first two are sketched in code after this list):

  • Temporal Aggregation Layer: This layer is designed to extract PDE properties by aggregating information from adjacent timesteps, which helps infer the specifics of the PDE governing the data sample.
  • Fourier Attention Layer: By utilizing a kernel integral transformation in the frequency domain, this layer circumvents the quadratic scaling issues associated with traditional attention mechanisms, aiding in approximating PDE solutions efficiently.
  • Multi-head structure: Facilitating learning in distinct representation subspaces, it enhances parameter efficiency and scalability.
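
To make the first two components concrete, the following hedged PyTorch sketch shows one plausible form of each: temporal aggregation as a learned weighted sum over the input window, and Fourier attention as a spectral token mixer that applies learned channel mixing to a truncated set of low-frequency modes. The module names, shapes, and parameterizations are illustrative assumptions, not DPOT's actual implementation.

```python
import torch
import torch.nn as nn

class TemporalAggregation(nn.Module):
    """Illustrative temporal aggregation: collapse a window of T past frames
    into one feature map with learned per-timestep weights (a sketch only;
    DPOT's layer may be parameterized differently)."""
    def __init__(self, window: int):
        super().__init__()
        self.weights = nn.Parameter(torch.full((window,), 1.0 / window))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, H, W, C) -> (batch, H, W, C)
        return torch.einsum("bthwc,t->bhwc", x, self.weights)


class SpectralMixingLayer(nn.Module):
    """Illustrative Fourier-domain token mixer in the spirit of Fourier/AFNO-
    style attention: FFT the spatial tokens, mix channels on a retained set
    of low-frequency modes, and inverse-FFT back. DPOT's Fourier attention
    layer differs in detail (e.g. nonlinearities, multi-head splits)."""
    def __init__(self, channels: int, modes: int):
        super().__init__()
        # `modes` must not exceed the number of available frequencies per axis
        self.modes = modes
        scale = 1.0 / channels
        self.weight = nn.Parameter(
            scale * torch.randn(modes, modes, channels, channels,
                                dtype=torch.cfloat)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, H, W, C) in physical space
        b, h, w, c = x.shape
        x_ft = torch.fft.rfft2(x, dim=(1, 2))      # frequency space, complex
        out_ft = torch.zeros_like(x_ft)
        m = self.modes
        # learned channel mixing on the retained low-frequency modes only
        out_ft[:, :m, :m] = torch.einsum(
            "bxyc,xycd->bxyd", x_ft[:, :m, :m], self.weight
        )
        return torch.fft.irfft2(out_ft, s=(h, w), dim=(1, 2))
```

Because the mixing acts on a fixed number of retained frequency modes, its cost grows roughly linearly with the number of spatial tokens (dominated by the FFT) rather than quadratically as in standard attention, which is the scaling property the Fourier attention layer is designed to exploit.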

Numerical Results and Implications

Empirically, DPOT delivers state-of-the-art (SOTA) performance across numerous benchmarks. The auto-regressive denoising approach outperforms existing architectures such as MPP and FNO, reducing errors by up to 52% on some tasks. Its scalability is evidenced by consistent performance improvements as model size increases from 7 million to 0.5 billion parameters.

Fine-tuning results on diverse downstream tasks, including high-resolution turbulent flow prediction, 3D Navier-Stokes equations, and steady-state PDEs, underscore the versatility and adaptability of DPOT. It transfers learned representations effectively to higher-dimensional and data-scarce tasks, supporting its use as a foundation model in scientific machine learning.

Theoretical Insights

The theoretical underpinnings of DPOT are reinforced by a universal approximation theorem for its Fourier attention layers, which guarantees that they can approximate any continuous operator to a desired degree of accuracy. This positions DPOT as a framework capable of capturing the intricate dependencies present in PDE data.
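
Schematically, and omitting the precise hypotheses on the function spaces and input sets (which are stated in the paper), such a universal approximation result takes the form

```latex
% For any continuous target operator G, compact set of inputs K, and
% tolerance eps > 0, some parameter setting theta of the Fourier-attention
% model N_theta approximates G uniformly on K:
\forall \varepsilon > 0 \;\; \exists \theta : \quad
\sup_{a \in K} \bigl\| \mathcal{N}_\theta(a) - \mathcal{G}(a) \bigr\| \le \varepsilon .
```

Here $\mathcal{G}$ denotes the target solution operator, $K$ a compact set of input functions, and $\mathcal{N}_\theta$ the Fourier-attention network; the exact statement and its assumptions should be taken from the paper.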

Future Directions

The research opens several avenues for further exploration. Incorporating more refined noise-injection schemes and adaptive learning-rate schedules could further improve pre-training stability and efficiency. Moreover, extending the DPOT framework to real-time simulation and integrating it with reinforcement learning environments could broaden its applicability.

In conclusion, the DPOT framework represents a significant step forward in the domain of large-scale PDE pre-training, providing a scalable solution capable of generalizing across diverse PDE tasks. Its contributions to robustness, flexibility, and computational efficiency mark it as a promising candidate for future research and application in scientific computing.
