Poseidon: Efficient Foundation Models for PDEs (2405.19101v2)

Published 29 May 2024 in cs.LG

Abstract: We introduce Poseidon, a foundation model for learning the solution operators of PDEs. It is based on a multiscale operator transformer, with time-conditioned layer norms that enable continuous-in-time evaluations. A novel training strategy leveraging the semi-group property of time-dependent PDEs to allow for significant scaling-up of the training data is also proposed. Poseidon is pretrained on a diverse, large scale dataset for the governing equations of fluid dynamics. It is then evaluated on a suite of 15 challenging downstream tasks that include a wide variety of PDE types and operators. We show that Poseidon exhibits excellent performance across the board by outperforming baselines significantly, both in terms of sample efficiency and accuracy. Poseidon also generalizes very well to new physics that is not seen during pretraining. Moreover, Poseidon scales with respect to model and data size, both for pretraining and for downstream tasks. Taken together, our results showcase the surprising ability of Poseidon to learn effective representations from a very small set of PDEs during pretraining in order to generalize well to unseen and unrelated PDEs downstream, demonstrating its potential as an effective, general purpose PDE foundation model. Finally, the Poseidon model as well as underlying pretraining and downstream datasets are open sourced, with code being available at https://github.com/camlab-ethz/poseidon and pretrained models and datasets at https://huggingface.co/camlab-ethz.

Summary

  • The paper introduces a novel scOT architecture with time-conditioned normalization that enables continuous-in-time PDE evaluations.
  • The paper employs an innovative all2all training strategy that leverages the semi-group property, significantly expanding training data volume.
  • The paper achieves robust generalization across 15 diverse downstream tasks, outperforming traditional baselines in accuracy and sample efficiency.

Poseidon: Efficient Foundation Models for PDEs

The paper introduces Poseidon, a novel foundation model for learning solution operators of Partial Differential Equations (PDEs). Poseidon is built upon a scalable Operator Transformer (scOT) architecture, enriched with time-conditioned layer normalization to support continuous-in-time evaluations. It also employs a distinct training strategy that exploits the semi-group property of time-dependent PDEs to substantially enlarge the effective training set. Pretrained on a comprehensive dataset of fluid dynamics equations, Poseidon demonstrates superior performance across 15 diverse downstream tasks, generalizing to unseen physics with strong sample efficiency and accuracy. Importantly, the pretraining and downstream datasets, as well as the Poseidon model itself, are made publicly accessible for further research.

Introduction

PDEs are fundamental in modeling various physical phenomena across multiple domains. Traditional numerical methods such as finite difference, finite element, and spectral methods, though effective, often incur high computational costs, especially for many-query problems. This complexity has driven the development of data-driven ML methods for simulating PDEs, among which operator learning algorithms have shown significant promise. These algorithms aim to map function space inputs (like initial and boundary conditions) to PDE solutions, leveraging methods like convolutions, graph neural networks, and transformers.

Model Architecture

Poseidon is underpinned by scOT, a hierarchical multiscale vision transformer enhanced with SwinV2 attention. Inputs are processed as patch embeddings and transformed through a sequence of windowed multi-head self-attention layers and MLPs, with shifted windows ensuring that attention eventually covers the entire domain. Layer normalization is modulated by the lead time, enabling continuous-in-time evaluation. The architecture follows a U-Net style encoder-decoder design, with ConvNeXt layers used for efficient high-dimensional feature mapping.
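
A minimal sketch of the time-conditioned layer normalization is given below, assuming the common conditioning pattern in which a small MLP maps the scalar lead time to per-channel scale and shift. The module name, hidden width, and MLP structure are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class TimeConditionedLayerNorm(nn.Module):
    """Layer norm whose affine parameters are predicted from the lead time.

    Minimal sketch: a scalar time t is mapped by a small MLP to a per-channel
    scale and shift that modulate a parameter-free LayerNorm. Details such as
    the hidden width are illustrative assumptions, not the paper's exact choice.
    """

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.time_mlp = nn.Sequential(
            nn.Linear(1, hidden),
            nn.GELU(),
            nn.Linear(hidden, 2 * dim),  # predicts [scale, shift]
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim), t: (batch,) lead times
        scale, shift = self.time_mlp(t[:, None]).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale[:, None, :]) + shift[:, None, :]


# Usage: normalize a batch of 64 patch tokens with 96 channels at lead time 0.5.
x = torch.randn(4, 64, 96)
t = torch.full((4,), 0.5)
out = TimeConditionedLayerNorm(96)(x, t)
```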

Training and Inference Strategy

A notable contribution is the all2all training strategy, which maximizes the use of training data by exploiting the semi-group property of time-dependent PDEs: every ordered pair of snapshots along a trajectory (earlier to later) can serve as an input-target example. This greatly increases the number of training pairs extracted from each trajectory, improving training efficiency and model robustness. For inference, Poseidon can generate full solution trajectories either by direct evaluation at arbitrary, continuous lead times or via autoregressive rollouts.
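
The pair-extraction step at the heart of all2all training can be sketched as follows: since the solution operator satisfies the semi-group property, each ordered pair of snapshots in a trajectory yields a valid (input, lead time, target) example. The function name, data layout, and the inclusion of zero-lead-time pairs are assumptions of this sketch, not the released pipeline.

```python
import itertools
import numpy as np

def all2all_pairs(trajectory: np.ndarray, times: np.ndarray):
    """Extract every (input, lead_time, target) pair from one trajectory.

    trajectory: array of shape (K+1, ...) with snapshots u(t_0), ..., u(t_K).
    times:      array of shape (K+1,) with the corresponding time stamps.

    By the semi-group property of the solution operator, each pair (i, j)
    with i <= j yields a valid training example mapping u(t_i) to u(t_j)
    with lead time t_j - t_i, so K+1 snapshots give (K+1)(K+2)/2 examples
    instead of the K examples of fixed-step, next-snapshot training.
    """
    pairs = []
    for i, j in itertools.combinations_with_replacement(range(len(times)), 2):
        pairs.append((trajectory[i], times[j] - times[i], trajectory[j]))
    return pairs


# Usage: a toy trajectory with 11 snapshots yields 66 training pairs.
traj = np.random.rand(11, 128, 128, 4)   # (snapshots, x, y, channels)
ts = np.linspace(0.0, 1.0, 11)
print(len(all2all_pairs(traj, ts)))      # 66
```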

Pretraining and Finetuning

Poseidon is pretrained on a dataset encompassing the compressible Euler and incompressible Navier-Stokes equations, selected for their diverse physical characteristics like shocks, turbulence, and mixing layers. The pretraining data includes trajectories sampled at uniform intervals, forming a comprehensive base for downstream task generalization. Finetuning on downstream tasks involves updating only a subset of model parameters, allowing Poseidon to efficiently adapt to new data distributions while leveraging pre-learned representations.
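
One generic way to realize subset finetuning is to freeze the pretrained trunk and leave only selected submodules trainable. The sketch below illustrates this pattern; the chosen prefixes (embedding/recovery heads and time-conditioning MLPs) are assumptions for illustration, and the exact subset Poseidon updates should be taken from the released code.

```python
import torch.nn as nn

def prepare_for_finetuning(model: nn.Module,
                           trainable_prefixes=("embed", "recovery", "time_mlp")):
    """Freeze the pretrained trunk; keep only selected submodules trainable.

    Minimal sketch of subset finetuning. Which parameters Poseidon actually
    updates (task-specific embedding/recovery layers, time-conditioned norms,
    or the full trunk at a reduced learning rate) is an assumption here.
    """
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)
    # Return only the trainable parameters for the optimizer.
    return [p for p in model.parameters() if p.requires_grad]


# Usage (illustrative):
# optimizer = torch.optim.AdamW(prepare_for_finetuning(model), lr=1e-4)
```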

Experimental Evaluations

Poseidon’s performance is thoroughly evaluated on 15 downstream tasks spanning various PDE types and complexities. These tasks cover different PDE classifications such as linear/nonlinear, elliptic/parabolic/hyperbolic/mixed types, and diverse physical phenomena across spatio-temporal scales. Poseidon consistently outperforms traditional baselines such as FNO and CNO, showcasing significant gains in accuracy and sample efficiency. The model's performance is also robust across tasks involving PDEs unseen during pretraining, indicating strong generalization capabilities.
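
These accuracy comparisons rest on a relative error metric. The snippet below shows a generic relative L1 error of the kind commonly used in operator-learning benchmarks; the exact aggregation Poseidon reports (mean vs. median over test samples, per-channel handling) is an assumption of this sketch.

```python
import numpy as np

def relative_l1_error(pred: np.ndarray, target: np.ndarray) -> float:
    """Relative L1 error ||pred - target||_1 / ||target||_1 for one sample."""
    return float(np.abs(pred - target).sum() / np.abs(target).sum())


# Usage: aggregate over a toy test set, e.g. by taking the median across samples.
preds = [np.random.rand(128, 128) for _ in range(8)]
targets = [np.random.rand(128, 128) for _ in range(8)]
errors = [relative_l1_error(p, t) for p, t in zip(preds, targets)]
print(np.median(errors))
```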

Scaling and Dataset Quality

Poseidon exhibits scalable performance with respect to model size, demonstrating that larger models yield lower training and evaluation losses. The model also scales with the size and diversity of the pretraining dataset, with larger and more diverse datasets resulting in better performance on downstream tasks. These findings highlight the importance of extensive and varied pretraining data for foundational models in PDE learning.

Case Studies

Three case studies elucidate Poseidon's ability to leverage pre-learned representations for new tasks. For instance, in the CE-RPUI task, Poseidon efficiently learns shock propagation and vortices by combining features from different pretraining operators. In the ACE task, the model rapidly adapts to reaction-diffusion dynamics, showcasing the flexibility and adaptability of the learned representations. These case studies highlight how Poseidon synthesizes features acquired during pretraining to adapt effectively to downstream tasks.

Conclusion

Poseidon sets a new benchmark in the field of PDE foundation models. Through its scOT architecture and all2all training strategy, Poseidon demonstrates excellent accuracy and efficiency while generalizing robustly to varied and complex physical phenomena. The open-source release of Poseidon and its datasets further underscores its potential for broad applicability and future advancements in the field. These findings affirm the feasibility of general-purpose PDE foundation models capable of addressing diverse and challenging tasks in computational physics and beyond.
