DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training (2403.03542v4)
Abstract: Pre-training has been investigated to improve the efficiency and performance of training neural operators in data-scarce settings. However, it remains largely in its infancy because partial differential equation (PDE) data are inherently complex and diverse, featuring long trajectories, multiple scales, and varying dimensions. In this paper, we present a new auto-regressive denoising pre-training strategy that enables more stable and efficient pre-training on PDE data and generalizes to various downstream tasks. Moreover, by designing a flexible and scalable model architecture based on Fourier attention, we can easily scale up the model for large-scale pre-training. We train our PDE foundation model with up to 0.5B parameters on 10+ PDE datasets comprising more than 100k trajectories. Extensive experiments show that we achieve state-of-the-art results on these benchmarks and validate that our model generalizes strongly, significantly enhancing performance on diverse downstream PDE tasks such as 3D data. Code is available at \url{https://github.com/thu-ml/DPOT}.
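The abstract's two core ingredients, noise-injected auto-regressive training and Fourier-domain token mixing, can be illustrated with a minimal sketch. The snippet below is an assumption-laden toy, not the repository's implementation: `FourierMixer`, `TinyDPOT`, `denoising_autoregressive_step`, and `noise_std` are all hypothetical names, and the mixer shown is a plain spectral multiplication (closer to a single FNO layer than to the paper's full Fourier attention).

```python
import torch
import torch.nn as nn

class FourierMixer(nn.Module):
    """Token mixing in the Fourier domain: a simplified stand-in for the
    paper's Fourier attention. `modes` truncates high frequencies."""
    def __init__(self, channels: int, modes: int = 16):
        super().__init__()
        # Complex-valued pointwise weights on the retained low-frequency modes.
        self.weight = nn.Parameter(
            torch.randn(channels, modes, modes, dtype=torch.cfloat) * 0.02
        )
        self.modes = modes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W)
        x_ft = torch.fft.rfft2(x)                  # to spectral space
        out_ft = torch.zeros_like(x_ft)
        m = self.modes
        out_ft[:, :, :m, :m] = x_ft[:, :, :m, :m] * self.weight
        return torch.fft.irfft2(out_ft, s=x.shape[-2:])  # back to physical space


class TinyDPOT(nn.Module):
    """Illustrative operator: lift -> Fourier mixing -> nonlinearity -> project."""
    def __init__(self, in_ch: int, width: int = 32):
        super().__init__()
        self.lift = nn.Conv2d(in_ch, width, 1)
        self.mixer = FourierMixer(width)
        self.act = nn.GELU()
        self.proj = nn.Conv2d(width, in_ch, 1)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        return self.proj(self.act(self.mixer(self.lift(u))))


def denoising_autoregressive_step(model, u_t, u_next, optimizer, noise_std=1e-2):
    """One pre-training step: corrupt the current frame with Gaussian noise,
    then regress the next (clean) frame."""
    optimizer.zero_grad()
    u_noisy = u_t + noise_std * torch.randn_like(u_t)
    loss = nn.functional.mse_loss(model(u_noisy), u_next)
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage on a toy batch of 2D scalar fields (batch=4, channels=1, 64x64 grid).
model = TinyDPOT(in_ch=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
u_t, u_next = torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64)
print(denoising_autoregressive_step(model, u_t, u_next, opt))
```

The intended design point, as the abstract states it, is that perturbing the input during training resembles the model's own prediction errors, which is what makes long auto-regressive rollouts more stable at inference time.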