General Physics Transformer (GPₕyT)
- GPₕyT is a deep learning architecture that fuses transformer-based neural differentiation with classical numerical integration to simulate a wide range of physical phenomena without explicit equations.
- The model processes simulation frames using unified spatiotemporal self-attention and incorporates local gradient information to accurately capture sharp features like shock fronts.
- Benchmark evaluations show up to a 29-fold reduction in median mean squared error compared to Fourier Neural Operator baselines (and up to 5-fold over UNet), highlighting its zero-shot generalization across multiple physics domains.
The General Physics Transformer (GPₕyT) is a hybrid deep learning architecture that establishes a foundation model paradigm for computational physics. GPₕyT leverages the unified context-aware sequence modeling power of transformers, in combination with classical numerical integration schemes, to simulate a vast range of physical phenomena—including fluid-solid interactions, shock waves, thermal convection, and multi-phase flows—in a zero-shot setting, without explicit knowledge of the governing equations (Wiesner et al., 17 Sep 2025).
1. Architectural Foundations
GPₕyT is constructed as a dual-component system consisting of a transformer-based neural differentiator and an explicit numerical integrator. Physical state sequences, typically fields such as pressure, velocity, or temperature, are ingested as a stack of simulation frames. The model partitions each stack into non-overlapping, tubelet-like spatiotemporal patches, projects them linearly into tokens, and augments the tokens with absolute positional encodings. These token sequences are processed by transformer blocks with unified spatiotemporal self-attention, producing a context-aware representation of the recent trajectory.
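A minimal PyTorch sketch of this tokenization step is given below. The patch sizes, embedding dimension, grid size, and the use of a strided 3D convolution (equivalent to non-overlapping tubelet slicing followed by a shared linear projection) are illustrative assumptions rather than the published GPₕyT configuration.

```python
# Minimal sketch of tubelet tokenization with absolute positional encodings.
# All hyperparameters (patch sizes, embedding dimension, grid size) are
# illustrative assumptions, not the published GPhyT configuration.
import torch
import torch.nn as nn

class TubeletEmbedding(nn.Module):
    def __init__(self, in_channels=3, embed_dim=256,
                 t_patch=2, patch=16, frames=4, height=128, width=128):
        super().__init__()
        # A strided 3D convolution is equivalent to slicing the input into
        # non-overlapping spatiotemporal tubelets and applying a shared
        # linear projection to each one.
        self.proj = nn.Conv3d(in_channels, embed_dim,
                              kernel_size=(t_patch, patch, patch),
                              stride=(t_patch, patch, patch))
        n_tokens = (frames // t_patch) * (height // patch) * (width // patch)
        # Learnable absolute positional encodings over all spatiotemporal tokens.
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, embed_dim))

    def forward(self, x):
        # x: (batch, channels, frames, height, width) stack of simulation frames
        tokens = self.proj(x)                        # (B, D, T', H', W')
        tokens = tokens.flatten(2).transpose(1, 2)   # (B, N, D) token sequence
        return tokens + self.pos                     # ready for self-attention
```

The resulting token sequence would then pass through standard transformer encoder blocks applying full (unified) spatiotemporal self-attention.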
The model concatenates first-order spatial and temporal derivatives—computed via central differences—along the channel dimension. This explicit inclusion of local gradient information facilitates the accurate handling of sharp solution features, such as shock fronts. The transformer then predicts the time derivative of the physical state, ∂X/∂t, over the prescribed spatial domain. The subsequent state is then computed using a classical time integration scheme, most commonly the Forward Euler method:

X(t + Δt) = X(t) + Δt · ∂X/∂t
Despite the simplicity of this first-order integrator, empirical ablations indicate no significant advantage in moving to higher-order schemes (e.g. RK4) for the bulk of tasks addressed.
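The sketch below illustrates this predict-then-integrate loop under simplifying assumptions: unit grid spacing, periodic spatial padding, an assumed (B, T, C, H, W) tensor layout, and a `model` callable that stands in for the transformer differentiator.

```python
# Sketch of the predict-then-integrate loop: central-difference gradient channels
# are concatenated to the input, a neural network predicts dX/dt, and a Forward
# Euler step advances the state. `model` is a stand-in for the transformer
# differentiator; unit grid spacing and periodic padding are simplifying assumptions.
import torch

def with_gradient_channels(x, dt):
    """x: (B, T, C, H, W) frame sequence; append first-order derivative channels."""
    dx = (torch.roll(x, -1, dims=-1) - torch.roll(x, 1, dims=-1)) / 2.0  # d/dx
    dy = (torch.roll(x, -1, dims=-2) - torch.roll(x, 1, dims=-2)) / 2.0  # d/dy
    dt_grad = torch.zeros_like(x)
    dt_grad[:, 1:-1] = (x[:, 2:] - x[:, :-2]) / (2.0 * dt)               # d/dt (edges left zero)
    return torch.cat([x, dx, dy, dt_grad], dim=2)    # stack along channel axis

def euler_step(model, x_seq, dt):
    """Advance the most recent frame by one step of size dt (Forward Euler)."""
    features = with_gradient_channels(x_seq, dt)
    dxdt = model(features)            # transformer predicts dX/dt, shape (B, C, H, W)
    return x_seq[:, -1] + dt * dxdt   # X(t + dt) = X(t) + dt * dX/dt
```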
2. Data Regime and Training Strategies
GPₕyT's training data regime spans over 1.8 TB of simulation data extracted from eight diverse public and private datasets, amounting to more than 2.4 million unique snapshots. This corpus includes incompressible and compressible flow, shock dynamics (Euler equations), obstacle flow, Rayleigh–Bénard convection, and multiphase flow through porous media.
Each training example consists of a sequence of consecutive state snapshots—between four and sixteen, depending on the dataset—which serve as the model's “prompt,” conditioning it on the recent trajectory and enabling in-context learning of underlying dynamics. Crucially, the time increments Δt are randomized, compelling the model to infer temporal scales from context rather than relying on fixed step size, and each dataset is individually normalized to accentuate learning of relative, rather than absolute, physical magnitudes.
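A hypothetical sampling routine along these lines is sketched below; the array layout, stride range, and normalization statistics are assumptions made for illustration, not details from the paper.

```python
# Hypothetical sampler illustrating randomized time strides and per-dataset
# normalization; names and defaults are assumptions for the sake of the example.
import numpy as np

def sample_prompt(snapshots, mean, std, n_frames=4, max_stride=4, rng=np.random):
    """snapshots: (T, C, H, W) array of stored states for one dataset.

    Returns a prompt of n_frames snapshots plus the next-state target, drawn
    with a random temporal stride so the effective dt must be inferred from
    the prompt itself rather than from a fixed step size.
    """
    stride = rng.randint(1, max_stride + 1)
    start = rng.randint(0, len(snapshots) - stride * n_frames)
    idx = start + stride * np.arange(n_frames + 1)
    frames = (snapshots[idx] - mean) / std           # per-dataset normalization
    return frames[:-1], frames[-1]                   # (prompt, next-state target)
```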
A standard mean squared error loss between the predicted and ground-truth physical state at the next step is employed:

ℒ = (1/N) Σ ‖X̂(t + Δt) − X(t + Δt)‖²

where the sum runs over all grid points and channels, X̂ denotes the model prediction, and X the ground-truth state.
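Tying the pieces together, a minimal training step under the same assumptions as the sketches above might look as follows; `euler_step` is the illustrative helper defined earlier, and the `model` and `optimizer` objects are assumed PyTorch components.

```python
# Minimal training step: one Euler-integrated prediction is compared to the
# ground-truth next state with a standard MSE objective.
import torch.nn.functional as F

def training_step(model, optimizer, prompt, target, dt):
    prediction = euler_step(model, prompt, dt)   # predicted X(t + dt)
    loss = F.mse_loss(prediction, target)        # MSE against ground truth
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```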
3. Unified Multi-Domain Performance and Comparative Analysis
GPₕyT demonstrates the ability to infer and apply governing physical dynamics across markedly heterogeneous physics domains, including:
- Incompressible shear and obstacle flows (Navier–Stokes)
- Compressible shock phenomena (Euler equations)
- Buoyancy-driven convection
- Multiphase flows (e.g., drainage and imbibition in porous substrates)
When benchmarked on one-step prediction tasks, GPₕyT achieves up to a 5-fold reduction in median MSE compared to standard UNet baselines and up to a 29-fold improvement over Fourier Neural Operator (FNO) models of similar scale. For both smooth and discontinuous systems, GPₕyT maintains sharp interfaces (e.g., shock fronts) and fine-scale coherent structures, exhibiting resilience against over-smoothing and loss of high-frequency detail over long prediction horizons.
4. Generalization, Zero-shot and In-Context Learning
A defining characteristic of GPₕyT is its foundation-model ability for zero-shot generalization. The transformer architecture enables in-context learning: previous state sequences act as prompts, allowing the model to infer system-specific dynamics at inference time without explicit access to governing equations.
Zero-shot experiments demonstrate that GPₕyT:
- Accurately simulates systems with novel boundary conditions (e.g., open rather than periodic/symmetric)
- Produces physically plausible rollouts (e.g., bow shock formation) for flow types (supersonic shock, turbulent radiative layer) absent from the training corpus
- Maintains global invariances and produces plausible field evolution, despite increases in local error on tasks furthest from the training distribution
Notably, even with error accumulation in high-frequency components over 50-timestep rollouts, global flow structures and vortex coherence are preserved.
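A hedged sketch of such an autoregressive rollout, reusing the illustrative `euler_step` helper from above and assuming a fixed sliding context window, is shown below.

```python
# Sketch of a zero-shot autoregressive rollout: the sliding window of recent
# frames acts as the prompt, and each predicted state is appended as the
# conditioning context for the next step.
import torch

@torch.no_grad()
def rollout(model, prompt, dt, n_steps=50):
    """prompt: (B, T, C, H, W) initial frames; returns (B, n_steps, C, H, W)."""
    window = prompt.clone()
    states = []
    for _ in range(n_steps):
        next_state = euler_step(model, window, dt)               # one Euler step
        states.append(next_state)
        # Slide the context: drop the oldest frame, append the new prediction.
        window = torch.cat([window[:, 1:], next_state.unsqueeze(1)], dim=1)
    return torch.stack(states, dim=1)
```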
5. Governing Equations, Input Targeting, and Implicit Modeling
The GPₕyT paradigm departs from classical simulation in that explicit knowledge or imposition of partial differential equations (PDEs) is unnecessary. Instead, the model infers temporal evolution via data-driven approximation of the time derivative. For reference, training data include systems governed by equations such as:
- Incompressible Navier–Stokes: ∂u/∂t + (u · ∇)u = −(1/ρ)∇p + ν∇²u, with ∇ · u = 0
- Compressible Euler equations for shocks: ∂ρ/∂t + ∇ · (ρu) = 0, ∂(ρu)/∂t + ∇ · (ρu ⊗ u + pI) = 0, ∂E/∂t + ∇ · [(E + p)u] = 0
The model is agnostic to these equations: only state sequences are provided.
6. Implications and the Path Toward Universal Physics Foundation Models
GPₕyT represents a paradigm shift relative to narrow, equation-specific neural surrogates. By training a single model on vast, multi-domain, multi-scale data, and relying on context-aware neural architectures, GPₕyT demonstrates that foundation model behavior—train once, deploy across domains—is achievable in physics.
This has implications for:
- Democratizing access to scientific simulation by reducing the need for domain-specific solver development and tuning
- The possibility of universal physics foundation models (PFMs) that can be extended to 3D, additional physical domains (e.g., mechanics, chemistry), and arbitrary boundary conditions
- The acceleration of computational science via direct, high-fidelity surrogate modeling deployable in diverse scientific and engineering environments
Key directions identified include improving stability for long-horizon rollouts, extending to higher-dimensional and multi-resolution systems, and further scaling model and data.
7. Contextual Relationship to Related Work
GPₕyT advances beyond approaches constrained to a single physics domain or equation family (e.g., PINNsFormer (Zhao et al., 2023), PDE-Transformer (Holzschuh et al., 30 May 2025)) by demonstrating that transformer models can infer physical processes directly from data, aided by local gradient features, across a broad range of heterogeneous phenomena. In contrast to approaches that require equation knowledge and tailored model architectures, GPₕyT achieves accurate, physically plausible extrapolation, establishing the feasibility of physics foundation models analogous to those proven effective in language and vision domains (Wiesner et al., 17 Sep 2025).