UPS: Efficiently Building Foundation Models for PDE Solving via Cross-Modal Adaptation
Abstract: We present Unified PDE Solvers (UPS), a data- and compute-efficient approach to developing unified neural operators for diverse families of spatiotemporal PDEs spanning various domains, dimensions, and resolutions. UPS embeds different PDEs into a shared representation space and processes them using an FNO-transformer architecture. Rather than training the network from scratch, which is data-demanding and computationally expensive, we warm-start the transformer from pretrained LLMs and perform explicit alignment to reduce the modality gap while improving data and compute efficiency. The cross-modal UPS achieves state-of-the-art results on a wide range of 1D and 2D PDE families from PDEBench, outperforming existing unified models while using 4 times less data and 26 times less compute. It is also capable of few-shot transfer to unseen PDE families and coefficients.
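The abstract outlines the high-level recipe: embed heterogeneous PDE states into a shared representation, process them with FNO-style spectral layers, and feed the result to a transformer warm-started from a pretrained LLM. The sketch below is a minimal illustration of that recipe, not the authors' implementation: it assumes PyTorch and HuggingFace `transformers`, restricts itself to 1D inputs, and uses `roberta-base` as an illustrative warm-start backbone. Class names such as `SpectralConv1d` and `UPSSketch`, all shapes, and the readout head are assumptions; the paper's actual embedding network, alignment procedure, and backbone differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import RobertaModel  # illustrative warm-start backbone


class SpectralConv1d(nn.Module):
    """Standard 1D Fourier layer: FFT, keep low modes, mix channels, inverse FFT."""

    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (channels * channels)
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, grid)
        x_ft = torch.fft.rfft(x)
        out_ft = torch.zeros_like(x_ft)
        out_ft[..., : self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[..., : self.modes], self.weight
        )
        return torch.fft.irfft(out_ft, n=x.size(-1))


class UPSSketch(nn.Module):
    """Hypothetical pipeline: shared embedder -> FNO layer -> LLM-initialized transformer."""

    def __init__(self, in_channels=3, hidden=768, modes=16, n_tokens=64, out_channels=3):
        super().__init__()
        # Lift heterogeneous PDE states to a common channel width (shared representation).
        self.lift = nn.Conv1d(in_channels, hidden, kernel_size=1)
        self.spectral = SpectralConv1d(hidden, modes)
        self.pool = nn.AdaptiveAvgPool1d(n_tokens)  # fixed-length token sequence
        # Warm-start the sequence model from a pretrained LM instead of training from
        # scratch (roberta-base is an assumed choice; its hidden size is 768).
        self.backbone = RobertaModel.from_pretrained("roberta-base")
        self.readout = nn.Conv1d(hidden, out_channels, kernel_size=1)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, in_channels, grid) -- one PDE snapshot on a 1D grid
        grid = u.size(-1)
        h = F.gelu(self.spectral(self.lift(u)))   # (batch, hidden, grid)
        tokens = self.pool(h).transpose(1, 2)     # (batch, n_tokens, hidden)
        z = self.backbone(inputs_embeds=tokens).last_hidden_state
        z = self.readout(z.transpose(1, 2))       # (batch, out_channels, n_tokens)
        # Interpolate back to the input resolution to predict the next state.
        return F.interpolate(z, size=grid, mode="linear", align_corners=False)
```

As a quick shape check under these assumptions, `UPSSketch()(torch.randn(8, 3, 256))` returns a tensor of shape `(8, 3, 256)`, i.e., a next-state prediction at the input resolution; warm-starting the backbone is what the abstract credits for the reported data and compute savings.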
- Learning data-driven discretizations for partial differential equations. Proceedings of the National Academy of Sciences, 116(31):15344–15349, 2019.
- Boyd, J. P. Chebyshev and Fourier spectral methods. Courier Corporation, 2001.
- ChemCrow: Augmenting large-language models with chemistry tools. arXiv preprint arXiv:2304.05376, 2023.
- Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
- Neural Galerkin schemes with active learning for high-dimensional evolution equations. Journal of Computational Physics, 496:112588, 2024.
- Cao, S. Choose a transformer: Fourier or Galerkin. Advances in neural information processing systems, 34:24924–24940, 2021.
- Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks, 6(4):911–917, 1995.
- Scaling instruction-finetuned language models, 2022.
- An image is worth 16x16 words: Transformers for image recognition at scale. International Conference on Learning Representations, 2021.
- GNOT: A general neural operator transformer for operator learning. In International Conference on Machine Learning, pp. 12556–12569. PMLR, 2023.
- Axial attention in multidimensional transformers. International Conference on Learning Representations, 2020.
- Learning neural PDE solvers with convergence guarantees. arXiv preprint arXiv:1906.01200, 2019.
- Gene set summarization using large language models. arXiv preprint, 2023.
- Solving parametric PDE problems with artificial neural networks. European Journal of Applied Mathematics, 32(3):421–435, 2021.
- Machine learning–accelerated computational fluid dynamics. Proceedings of the National Academy of Sciences, 118(21):e2101784118, 2021.
- Kopriva, D. A. Implementing spectral methods for partial differential equations: Algorithms for scientists and engineers. Springer Science & Business Media, 2009.
- LeVeque, R. J. Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM, 2007.
- Solving quantitative reasoning problems with language models. Advances in Neural Information Processing Systems, 35:3843–3857, 2022.
- Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895, 2020a.
- Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485, 2020b.
- Transformer for partial differential equations’ operator learning. arXiv preprint arXiv:2205.13671, 2022.
- PDE-Refiner: Achieving accurate long rollouts with neural PDE solvers. arXiv preprint arXiv:2308.05732, 2023.
- RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
- Unified-IO 2: Scaling autoregressive multimodal models with vision, language, audio, and action. arXiv preprint arXiv:2312.17172, 2023.
- Frozen pretrained transformers as universal computation engines. Proceedings of the AAAI Conference on Artificial Intelligence, 36(7):7628–7636, Jun. 2022.
- DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv preprint arXiv:1910.03193, 2019.
- Deep equilibrium based neural operators for steady-state PDEs. arXiv preprint arXiv:2312.00234, 2023.
- Multiple physics pretraining for physical surrogate models. arXiv preprint arXiv:2310.02994, 2023.
- The finite volume method. Springer, 2016.
- Learning transferable visual models from natural language supervision. In International conference on machine learning, pp. 8748–8763. PMLR, 2021.
- Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020.
- Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.
- U-Net: Convolutional networks for biomedical image segmentation. arXiv preprint arXiv:1505.04597, 2015.
- Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Conference on Computational Natural Language Learning, 2003.
- Cross-modal fine-tuning: Align then refine. arXiv preprint arXiv:2302.05738, 2023.
- Tag-LLM: Repurposing general-purpose LLMs for specialized domains, 2024.
- A deep learning algorithm for solving partial differential equations. arXiv preprint, 2017.
- Towards foundation models for scientific machine learning: Characterizing scaling and transfer behavior. arXiv preprint arXiv:2306.00258, 2023.
- PDEBench: An extensive benchmark for scientific machine learning. Advances in Neural Information Processing Systems, 35:1596–1611, 2022.
- LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
- NAS-Bench-360: Benchmarking neural architecture search on diverse tasks. In Advances in Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2022.
- Reprogramming pretrained language models for protein sequence representation learning. arXiv preprint arXiv:2301.02120, 2023.
- Yu, B. et al. The deep ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.