- The paper introduces the scalable Operator Transformer (scOT), an architecture with time-conditioned layer normalization that enables continuous-in-time PDE evaluation.
- The paper employs an all2all training strategy that exploits the semi-group property of time-dependent PDEs, greatly increasing the number of training pairs extracted from each trajectory.
- The paper demonstrates robust generalization across 15 diverse downstream tasks, outperforming traditional baselines in both accuracy and sample efficiency.
Poseidon: Efficient Foundation Models for PDEs
The paper introduces Poseidon, a foundation model for learning the solution operators of partial differential equations (PDEs). Poseidon is built on a scalable Operator Transformer (scOT) architecture with time-conditioned layer normalization that supports continuous-in-time evaluation. Training exploits the semi-group property of time-dependent PDEs, substantially increasing the effective size of the training set. Pretrained on a large dataset of fluid dynamics equations, Poseidon performs strongly across 15 diverse downstream tasks, generalizing to physics unseen during pretraining while remaining sample efficient and accurate. The pretraining and downstream datasets, as well as the Poseidon model itself, are publicly released for further research.
Introduction
PDEs are fundamental to modeling physical phenomena across many domains. Traditional numerical methods such as finite difference, finite element, and spectral methods, though effective, often incur high computational costs, especially in many-query settings. This cost has driven the development of data-driven machine learning methods for simulating PDEs, among which operator learning algorithms have shown particular promise. These algorithms map function-space inputs (such as initial and boundary conditions) to PDE solutions, building on tools like convolutions, graph neural networks, and transformers.
Model Architecture
Poseidon is underpinned by scOT, a hierarchical multiscale vision transformer with SwinV2 attention. Inputs are converted to patch embeddings and transformed by a sequence of windowed multi-head self-attention layers and MLPs, with shifted windows allowing information to propagate across the entire domain. Layer normalization is conditioned on the lead time, so the model can be evaluated at arbitrary, continuous times. The architecture follows a U-Net style encoder-decoder design and uses ConvNeXt layers for efficient high-dimensional feature mapping.
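To make the time conditioning concrete, the following is a minimal PyTorch-style sketch of a time-conditioned layer normalization: the normalization itself is standard, while per-channel scale and shift are predicted from the lead time so the same weights serve any continuous evaluation time. The module name and the linear conditioning scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TimeConditionedLayerNorm(nn.Module):
    """Hypothetical sketch of layer norm modulated by the lead time t."""

    def __init__(self, dim: int):
        super().__init__()
        # Plain LayerNorm without its own affine parameters.
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        # Map the scalar lead time to per-channel scale and shift.
        self.to_scale = nn.Linear(1, dim)
        self.to_shift = nn.Linear(1, dim)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); t: (batch,) lead times.
        t = t.view(-1, 1)
        scale = self.to_scale(t).unsqueeze(1)  # (batch, 1, dim)
        shift = self.to_shift(t).unsqueeze(1)
        return self.norm(x) * (1.0 + scale) + shift
```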
Training and Inference Strategy
A notable contribution is the all2all training strategy, which exploits the semi-group property of time-dependent PDEs: any earlier snapshot of a trajectory can serve as the input for predicting any later one. This turns a trajectory with K+1 snapshots into on the order of K^2 training pairs rather than K consecutive ones, markedly increasing the effective data volume and improving training efficiency and robustness. At inference time, Poseidon can produce full solution trajectories either by direct evaluation at any lead time or via autoregressive rollouts.
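As a concrete sketch of the idea, the code below enumerates all2all training pairs from a single trajectory and performs an autoregressive rollout; `model(u, dt)` is a hypothetical callable that advances a state by a lead time, and the data layout is assumed for illustration.

```python
from itertools import combinations

def all2all_pairs(trajectory):
    """Build (input snapshot, lead time, target snapshot) training triples.

    trajectory: chronologically ordered list of (t_k, u_k) snapshots from one
    PDE solution. By the semi-group property, u(t_j) can be predicted from
    u(t_i) for any i < j, so K + 1 snapshots yield K * (K + 1) / 2 pairs
    instead of the K consecutive pairs of standard one-step training.
    """
    return [
        (u_i, t_j - t_i, u_j)
        for (t_i, u_i), (t_j, u_j) in combinations(trajectory, 2)
    ]

def rollout(model, u0, lead_times):
    """Autoregressive rollout: repeatedly advance the most recent state."""
    states = [u0]
    for dt in lead_times:
        states.append(model(states[-1], dt))
    return states
```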
Pretraining and Finetuning
Poseidon is pretrained on a dataset covering the compressible Euler and incompressible Navier-Stokes equations, chosen for the diversity of physics they exhibit, including shocks, turbulence, and mixing layers. The pretraining trajectories are sampled at uniform time intervals, providing a broad basis for generalization to downstream tasks. Finetuning on a downstream task updates only a subset of the model parameters, letting Poseidon adapt efficiently to new data distributions while reusing the representations learned during pretraining.
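As an illustration of this kind of parameter-efficient adaptation, the hypothetical helper below freezes the pretrained trunk and leaves trainable only the parameters whose names match a few keyword substrings; the actual subset of parameters Poseidon updates during finetuning may differ.

```python
import torch

def prepare_for_finetuning(model, trainable_keywords=("embed", "recovery", "norm")):
    """Freeze most pretrained weights and return an optimizer for the rest.

    Hypothetical sketch: parameter names are matched against keyword
    substrings (e.g. new embedding/recovery layers and normalization terms).
    """
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name for k in trainable_keywords)
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=1e-4)
```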
Experimental Evaluations
Poseidon’s performance is evaluated on 15 downstream tasks spanning a wide range of PDE types and complexities, including linear and nonlinear equations, elliptic, parabolic, hyperbolic, and mixed-type problems, and physical phenomena across diverse spatio-temporal scales. Poseidon consistently outperforms baselines such as FNO and CNO, with substantial gains in both accuracy and sample efficiency. Performance also remains strong on tasks governed by PDEs not seen during pretraining, indicating robust generalization.
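Accuracy on such tasks is typically reported as an error relative to a reference solution; the snippet below shows one illustrative choice (per-sample relative L1 error aggregated by the median over the test set), which is an assumption for clarity rather than the paper's exact metric definition.

```python
import numpy as np

def relative_l1_error(pred, target, eps=1e-10):
    """Relative L1 error of a predicted field against the reference field."""
    return np.sum(np.abs(pred - target)) / (np.sum(np.abs(target)) + eps)

def median_test_error(preds, targets):
    """Aggregate per-sample relative errors over a test set by the median."""
    return float(np.median([relative_l1_error(p, t) for p, t in zip(preds, targets)]))
```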
Scaling and Dataset Quality
Poseidon scales favorably with model size: larger models achieve lower training and evaluation losses. Performance likewise improves with the size and diversity of the pretraining dataset, with larger and more varied datasets yielding better downstream results. These findings underscore the importance of extensive, diverse pretraining data for foundation models in PDE learning.
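Scaling behavior of this kind is often summarized by fitting a power law of the form error ≈ C · size^(−α) to the measured points; the helper below is a minimal log-log regression for that purpose, offered as an illustrative assumption rather than the paper's own analysis.

```python
import numpy as np

def fit_power_law(sizes, errors):
    """Fit error ≈ C * size**(-alpha) via linear regression in log-log space.

    sizes: model parameter counts or numbers of pretraining trajectories.
    errors: corresponding test errors. Returns (C, alpha).
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
    return float(np.exp(intercept)), float(-slope)
```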
Case Studies
Three case studies illustrate how Poseidon reuses pretrained representations on new tasks. In the CE-RPUI task, Poseidon learns shock propagation and vortex formation efficiently by combining features acquired from different pretraining operators. In the ACE task, the model adapts rapidly to reaction-diffusion dynamics, demonstrating the flexibility of the learned representations. Together, these studies show how Poseidon synthesizes multiple pretrained features for effective downstream adaptation.
Conclusion
Poseidon sets a new benchmark for PDE foundation models. Through its scOT backbone and all2all training strategy, it delivers strong accuracy and efficiency while generalizing robustly to varied and complex physical phenomena. The open-source release of Poseidon and its datasets further supports broad applicability and future work in this area. Together, these results affirm the feasibility of general-purpose PDE foundation models capable of addressing diverse and challenging tasks in computational physics and beyond.