A Robust Foundation Model for Conservation Laws: Injecting Context into Flux Neural Operators via Recurrent Vision Transformers

Published 6 May 2026 in cs.LG | (2605.05488v1)

Abstract: We propose an architecture that augments the Flux Neural Operator (Flux NO), which combines the classical finite volume method (FVM) with neural operators, with ViT-based context injection. Our model is formulated as a hypernetwork: it extracts solution dynamics over a finite temporal window, encodes them with a recurrent Vision Transformer, and generates the parameters of a context-conditioned neural operator. This enables the model to infer and solve conservation laws without explicit access to the governing equation or PDE coefficients. Experimentally, we show that the proposed method preserves the robustness, generalization ability, and long-time prediction advantages of Flux NO over standard neural operators, while delivering reliable numerical solutions across a broad range of conservative systems, including previously unseen fluxes. Our code is available at https://github.com/xx257xx/CONTEXT_FLUX_NO.


Authors (2)

Summary

The paper introduces HFluxNO, a context-conditioned neural operator that integrates recurrent Vision Transformers with finite volume-inspired conservative updates.
It achieves lower relative ℓ₂ and ℓ∞ errors than the baselines, with stable long-time rollouts and robust performance in out-of-distribution scenarios.
The model adapts its flux operator parameters dynamically from trajectory data, preserving conservation without requiring explicit PDE coefficients.

Injecting Context into Flux Neural Operators via Recurrent Vision Transformers: A Robust Foundation Model for Conservation Laws

Architectural Motivation and Formulation

This paper addresses the challenge of robust, generalizable neural operator modeling for conservation-law dynamics. The authors propose an architecture that merges the inductive bias of the classical finite volume method (FVM) with context-adaptive neural operator learning. The central innovation is the integration of a recurrent Vision Transformer (ViT) into a hypernetwork framework for the Flux Neural Operator (Flux NO). The model takes the solution dynamics over a finite temporal window, compresses them into a compact context vector with a temporally recurrent ViT encoder, and then generates the parameters of a context-conditioned Flux NO from that vector. As a result, the architecture can infer and solve conservation laws without explicit access to the underlying PDE coefficients or analytical flux functions.

Crucially, the update rule is written in conservative (flux-difference) form, which enforces discrete conservation and supports autoregressive stability, particularly for nonlinear hyperbolic problems. Unlike standard neural operators, which predict the next solution field directly, the design constrains solution evolution through flux-difference updates, embedding the physical structure needed to solve conservation laws robustly.
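The flux-difference structure can be made concrete in standard finite-volume notation. The symbols below ($u_i^n$ for the cell average in cell $i$ at step $n$, $\hat{F}_{i+1/2}^n$ for the numerical flux at the interface between cells $i$ and $i+1$, and the step sizes $\Delta t$, $\Delta x$) are conventional choices assumed here for illustration, not notation taken from the paper:

```latex
u_i^{n+1} = u_i^n - \frac{\Delta t}{\Delta x}
            \left( \hat{F}_{i+1/2}^n - \hat{F}_{i-1/2}^n \right)

% Summing over all cells, the interface fluxes telescope; with periodic
% boundaries every flux appears once with each sign and cancels:
\sum_i u_i^{n+1}
  = \sum_i u_i^n
    - \frac{\Delta t}{\Delta x} \sum_i \left( \hat{F}_{i+1/2}^n - \hat{F}_{i-1/2}^n \right)
  = \sum_i u_i^n
```

The cancellation holds for any choice of $\hat{F}$, learned or classical, which is why conservation follows from the form of the update and not from how accurate the flux is.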

Context Injection via Vision Transformers

The context encoder operates over a short trajectory segment, combining temporally recurrent mixing (gated linear recurrent units) with spatial self-attention (transformer blocks). This recurrent ViT, inspired by TRecViT, alternates temporal blocks and spatial transformer blocks across layers, and ends by producing a context code via layer normalization and spatial averaging. A hypernetwork MLP then maps the context vector to the target-network parameters.
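The encoder pipeline described above (temporal recurrence, spatial attention, layer normalization, spatial averaging) can be sketched in a few lines of plain Python. This is a minimal single-layer illustration under strong simplifying assumptions, not the paper's implementation: the gate is one constant scalar instead of a learned, input-dependent gate, the attention uses identity projections with no learned weights, and all names (`temporal_mix`, `spatial_attention`, `encode_context`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Frame = List[Vector]  # one feature vector per spatial token


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def temporal_mix(frames: List[Frame], gate_logit: float) -> Frame:
    """Linear recurrence over time, applied independently to each token.

    Simplification: a single constant gate stands in for learned gates.
    """
    a = sigmoid(gate_logit)
    state = [[0.0] * len(token) for token in frames[0]]
    for frame in frames:
        state = [
            [a * h + (1.0 - a) * x for h, x in zip(h_tok, x_tok)]
            for h_tok, x_tok in zip(state, frame)
        ]
    return state


def spatial_attention(tokens: Frame) -> Frame:
    """Single-head self-attention with identity projections (assumption)."""
    d = len(tokens[0])
    mixed = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in tokens
        ]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        mixed.append([
            sum(w / total * tokens[j][c] for j, w in enumerate(weights))
            for c in range(d)
        ])
    return mixed


def layer_norm(v: Vector, eps: float = 1e-5) -> Vector:
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def encode_context(frames: List[Frame], gate_logit: float = 0.0) -> Vector:
    """Temporal mixing, then spatial attention, then norm and spatial mean."""
    tokens = spatial_attention(temporal_mix(frames, gate_logit))
    normed = [layer_norm(t) for t in tokens]
    d = len(normed[0])
    return [sum(t[c] for t in normed) / len(normed) for c in range(d)]


# Toy window: 3 time steps, 4 spatial tokens, 3 features per token.
window = [
    [[math.sin(t + s + c) for c in range(3)] for s in range(4)]
    for t in range(3)
]
context = encode_context(window)
```

The output has one entry per feature channel regardless of how many tokens or time steps the window contains, which is what lets a fixed-size hypernetwork consume it.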

This context injection approach allows the model to flexibly adapt its numerical flux operator for unseen dynamics, relying only on trajectory observations. This is in contrast to prior approaches, which either (1) use generic sequence modeling architectures not tailored to preserve conservative structure, or (2) instantiate fixed flux operators incapable of in-context adaptation. The proposed method combines the two: it adapts neural operator behavior in context while enforcing the flux-difference structure required by conservation laws.

Flux Neural Operator Target and Conservative Update

The target Flux NO instantiates its parameters from the context vector, producing numerical fluxes at cell interfaces. These fluxes are then used in finite-volume conservative updates. By constructing left- and right-stencil representations, the model predicts flux differences explicitly, ensuring that solution changes adhere to conservation principles.
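A minimal sketch of such a conservative update follows, assuming a 1D periodic grid and a two-cell stencil per interface. The learned flux is replaced by a classical Rusanov flux for the Burgers flux $f(u) = u^2/2$, purely as a stand-in; the function names and the stencil width are illustrative assumptions and are not taken from the paper.

```python
import math
from typing import Callable, List, Sequence


def rusanov_burgers_flux(stencil: Sequence[float]) -> float:
    """Stand-in for the learned interface flux: Rusanov flux for f(u) = u^2/2."""
    u_left, u_right = stencil
    speed = max(abs(u_left), abs(u_right))
    return 0.25 * (u_left ** 2 + u_right ** 2) - 0.5 * speed * (u_right - u_left)


def conservative_step(
    u: List[float],
    flux: Callable[[Sequence[float]], float],
    dt: float,
    dx: float,
) -> List[float]:
    """One flux-difference update on a periodic grid.

    The same flux function is evaluated on the left and right stencils of
    each cell, so the flux leaving cell i equals the flux entering cell i+1.
    """
    n = len(u)
    updated = []
    for i in range(n):
        left_stencil = (u[(i - 1) % n], u[i])    # interface i - 1/2
        right_stencil = (u[i], u[(i + 1) % n])   # interface i + 1/2
        updated.append(u[i] - dt / dx * (flux(right_stencil) - flux(left_stencil)))
    return updated


# Roll out 100 steps from a sine wave and measure the drift in total mass.
n_cells = 64
dx = 1.0 / n_cells
dt = 0.005
u0 = [math.sin(2.0 * math.pi * (i + 0.5) * dx) for i in range(n_cells)]
u = u0
for _ in range(100):
    u = conservative_step(u, rusanov_burgers_flux, dt, dx)
mass_drift = abs(sum(u) - sum(u0)) * dx
```

Because each interface flux is added to one cell and subtracted from its neighbor, `mass_drift` stays at the level of floating-point round-off even after a shock has formed. Swapping `rusanov_burgers_flux` for any other function of the stencil preserves this property.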

The neural operator within the target network can utilize various forms (e.g., depth-L neural operator with kernel integral transforms), but all weights and kernel functions are generated from the encoded context. This means the architecture adapts to the latent dynamics inferred from trajectory data, preserving physical structure rather than globally approximating solution fields.
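The idea of generating all target weights from the context can be illustrated with a toy hypernetwork. In this sketch, which is an assumption-laden simplification and not the paper's architecture, the hypernetwork is a single linear map instead of an MLP, and the target is a tiny two-layer perceptron acting on a two-cell stencil in place of a neural operator with kernel integral transforms. All names and sizes are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

STENCIL, HIDDEN = 2, 4
# Target parameters: W1 (HIDDEN x STENCIL), b1 (HIDDEN), w2 (HIDDEN), b2 (1).
N_PARAMS = STENCIL * HIDDEN + HIDDEN + HIDDEN + 1

Hyper = Tuple[List[List[float]], List[float]]


def make_hypernetwork(context_dim: int, seed: int = 0) -> Hyper:
    """Random linear hypernetwork; in practice these weights would be trained."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 0.5) for _ in range(context_dim)]
              for _ in range(N_PARAMS)]
    bias = [rng.gauss(0.0, 0.5) for _ in range(N_PARAMS)]
    return weight, bias


def generate_params(hyper: Hyper, context: Sequence[float]) -> List[float]:
    """Map a context vector to the flat parameter vector of the target net."""
    weight, bias = hyper
    return [sum(w * c for w, c in zip(row, context)) + b
            for row, b in zip(weight, bias)]


def target_flux(params: Sequence[float], stencil: Sequence[float]) -> float:
    """Tiny MLP flux whose weights come entirely from `params`."""
    w1 = [params[h * STENCIL:(h + 1) * STENCIL] for h in range(HIDDEN)]
    offset = STENCIL * HIDDEN
    b1 = params[offset:offset + HIDDEN]
    w2 = params[offset + HIDDEN:offset + 2 * HIDDEN]
    b2 = params[offset + 2 * HIDDEN]
    hidden = [math.tanh(sum(w * u for w, u in zip(row, stencil)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Two different contexts yield two different flux functions on the same stencil.
hyper = make_hypernetwork(context_dim=3)
params_a = generate_params(hyper, [1.0, 0.0, -1.0])
params_b = generate_params(hyper, [0.0, 1.0, 0.5])
flux_a = target_flux(params_a, (0.2, 0.7))
flux_b = target_flux(params_b, (0.2, 0.7))
```

The target network has no trainable parameters of its own here: only the hypernetwork would be trained, and the flux changes solely because the context changes.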

Empirical Results

Evaluated against strong recent baselines (DPOT, DISCO, and ICON) on benchmark datasets including 1D cubic conservation laws, parametric shallow-water equations, and viscous Burgers-type equations, HFluxNO shows higher predictive accuracy and stability. Notable findings include:

Single-step and long-horizon prediction: HFluxNO achieves consistently lower relative $\ell_2$ and $\ell^\infty$ errors for both single-step and autoregressive rollouts compared with DPOT and DISCO. DISCO's dynamical priors help with longer rollouts, but HFluxNO's conservative inductive bias delivers stronger overall performance.
Long-time stability: HFluxNO avoids the accumulation of high-frequency artifacts found in DPOT and DISCO during extended rollouts, with errors primarily arising from minor wave speed mispredictions rather than instability.
Out-of-distribution generalization: In tests with shock-dominated initial conditions and unseen equation forms (e.g., sine-flux dynamics), HFluxNO exhibits robust OOD performance, maintaining lower prediction errors than the baselines.
Generalization beyond strictly conservative settings: The model shows strong results for the viscous Burgers equation, despite the presence of a dissipative term outside the conservation-law regime.

The numerical evidence consists of consistently lower mean relative $\ell_2$ errors and competitive or improved $\ell^\infty$ errors across datasets, both within distribution and under OOD shifts.
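For readers unfamiliar with these metrics, the sketch below shows one common way to compute them for a single snapshot. The normalization (dividing by the corresponding norm of the reference solution) is an assumption; the paper's exact definition and averaging procedure may differ.

```python
import math
from typing import Sequence


def relative_l2_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """||pred - ref||_2 / ||ref||_2 over the grid points of one snapshot."""
    num = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)))
    den = math.sqrt(sum(r ** 2 for r in ref))
    return num / den


def relative_linf_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """max|pred - ref| / max|ref| over the grid points of one snapshot."""
    num = max(abs(p - r) for p, r in zip(pred, ref))
    den = max(abs(r) for r in ref)
    return num / den


reference = [3.0, 4.0]
prediction = [3.0, 0.0]
l2 = relative_l2_error(prediction, reference)      # 4 / 5
linf = relative_linf_error(prediction, reference)  # 4 / 4
```

The $\ell_2$ metric averages the error over the domain, while the $\ell^\infty$ metric is dominated by the worst grid point, so it is more sensitive to localized errors such as a slightly misplaced shock.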

Implications and Future Directions

Practically, this architecture offers enhanced robustness and adaptability for scientific machine learning tasks involving conservation laws, particularly in regimes where explicit equation coefficients or flux forms are unavailable. The conservative backbone promotes stability and physically consistent generalization, while the context injection enables adaptation to unseen regimes, a requirement for multiphysics and real-world scenarios.

Theoretically, the study supports the view that an inductive bias aligned with conservation structure improves neural operator performance, especially over long temporal horizons and under OOD conditions. The hypernetwork approach, wherein context encoding directly generates operator parameters, bridges the gap between generic context-conditioned models and physics-informed numerical solvers.

Future research directions include extending this framework to higher-dimensional systems, more diverse equation families, multiphysics couplings, and real-world noisy observations. The scalability of context-conditioned foundation models for arbitrary PDEs remains a promising avenue, and the results presented here provide evidence that combining in-context adaptation with conservative numerical structure is a viable path forward.

Conclusion

This work introduces HFluxNO, a context-conditioned foundation model for conservation laws that uses a recurrent Vision Transformer as its context encoder and conservative numerical updates in the Flux Neural Operator. The empirical results support its advantages in both predictive accuracy and robust generalization across a range of PDEs. The architecture carries strong physical inductive priors while remaining flexible enough for in-context adaptation, which positions it as a basis for further work in neural operator-based scientific computing (2605.05488).
