CoDA-NO: Codomain Attention Neural Operator

Updated 6 October 2025
  • CoDA-NO is a neural operator that uses codomain tokenization and functional self-attention to model complex, coupled multiphysics interactions.
  • The architecture integrates graph neural operator blocks and variable-specific positional encodings to manage irregular geometries and enhance scalability.
  • It achieves data-efficient pretraining and robust generalization, reducing simulation errors by over 36% on challenging multiphysics benchmarks.

The Codomain Attention Neural Operator (CoDA-NO) is a neural operator architecture designed to efficiently solve multiphysics partial differential equations (PDEs) characterized by coupled physical fields, complex geometries, and limited high-resolution training data. CoDA-NO introduces codomain ("channel-wise") attention, extending the transformer paradigm from sequence and image processing to learning mappings between function spaces with a focus on the codomain variables. The architecture’s core design enables flexible, data-efficient pretraining and superior generalization properties in challenging multiphysics simulation settings.

1. Concept and Mathematical Foundation

CoDA-NO tokenizes input functions along the codomain or channel space. Each physical variable, such as velocity, pressure, or displacement, is treated as an individual token, enabling the model to explicitly capture dependencies among variables that co-evolve within multiphysics systems. Let $a : D \to \mathbb{R}^{d_{\text{in}}}$ be the input function on a spatial domain $D \subseteq \mathbb{R}^p$; it is decomposed into $d_{\text{in}}$ component functions $a_1, \ldots, a_{d_{\text{in}}}$, each a distinct codomain token.

The attention mechanism is rigorously extended to infinite-dimensional function spaces. The operators acting on tokens $w_i$ are formulated as:

  • Key operator: $K : w_i \mapsto \{k_i : D \to \mathbb{R}^{d_k}\}$,
  • Query operator: $Q : w_i \mapsto \{q_i : D \to \mathbb{R}^{d_q}\}$,
  • Value operator: $V : w_i \mapsto \{v_i : D \to \mathbb{R}^{d_v}\}$.

Self-attention is then defined via an integral inner product:

(q_i, k_i) = \int_D q_i(x) \cdot k_i(x) \, dx,

producing the output token function:

o_i = \text{Softmax}\left(\frac{(q_i, k_i)}{T}\right) \cdot v_i,

where $T$ is a temperature parameter. This functional extension ensures discretization invariance: as the discretization is refined, the computation converges to the underlying operator-valued mapping.
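
The following is a minimal discretized sketch of this codomain (channel-wise) functional self-attention, assuming pointwise linear maps as the key/query/value operators and quadrature weights to approximate the integral inner product between every pair of tokens; class and parameter names are illustrative rather than the reference CoDA-NO implementation.

```python
# Discretized codomain self-attention: function-valued tokens are stored as
# their values on a quadrature mesh, and (q_i, k_j) = ∫_D q_i(x)·k_j(x) dx is
# approximated by a weighted sum over mesh points.
import torch
import torch.nn as nn

class CodomainSelfAttention(nn.Module):
    def __init__(self, d_token: int, d_head: int, temperature: float = 1.0):
        super().__init__()
        # Pointwise (per mesh point) linear maps as a simple choice of
        # key/query/value operators acting on each token function.
        self.to_q = nn.Linear(d_token, d_head, bias=False)
        self.to_k = nn.Linear(d_token, d_head, bias=False)
        self.to_v = nn.Linear(d_token, d_token, bias=False)
        self.temperature = temperature

    def forward(self, tokens: torch.Tensor, quad_w: torch.Tensor) -> torch.Tensor:
        # tokens: (n_tokens, n_points, d_token), one function per physical variable
        # quad_w: (n_points,) quadrature weights approximating ∫_D dx
        q = self.to_q(tokens)                       # (n_tokens, n_points, d_head)
        k = self.to_k(tokens)
        v = self.to_v(tokens)                       # (n_tokens, n_points, d_token)
        # scores[i, j] ≈ Σ_x w(x) q_i(x)·k_j(x): integral inner product of tokens
        scores = torch.einsum("ipd,jpd,p->ij", q, k, quad_w)
        attn = torch.softmax(scores / self.temperature, dim=-1)
        # o_i(x) = Σ_j attn[i, j] v_j(x): mix value functions across the codomain
        return torch.einsum("ij,jpd->ipd", attn, v)

# Example: 3 physical variables (e.g., u, v, p) on a mesh of 64 points
quad_w = torch.full((64,), 1.0 / 64)                # uniform quadrature weights
tokens = torch.randn(3, 64, 16)                     # lifted token functions
out = CodomainSelfAttention(d_token=16, d_head=8)(tokens, quad_w)
print(out.shape)  # torch.Size([3, 64, 16])
```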

2. Addressed Challenges in Multiphysics Operator Learning

CoDA-NO addresses several limitations inherent to classical neural operator frameworks:

  • Multiphysics Coupling: Standard operators often neglect explicit modeling of dependencies among physical fields. Codomain tokenization and attention allow CoDA-NO to capture variable interactions across different physical subsystems.
  • Irregular Geometries and Grids: Many real-world PDE systems are posed on non-uniform meshes or domains. CoDA-NO layers utilize graph neural operator (GNO) encoders and decoders, mapping variable functions from arbitrary mesh discretizations onto a uniform latent grid for further processing.
  • Sparse High-Resolution Data: High-fidelity data acquisition is computationally and experimentally expensive. Through self-supervised pretraining on partially masked or unlabeled data, CoDA-NO efficiently extracts regularities in the underlying physics.
  • Architectural Flexibility: The codomain-wise design allows seamless adaptation to PDE systems with varying numbers or types of variables, avoiding costly retraining or bespoke network surgery.

3. Architectural Methodology

The CoDA-NO architecture incorporates several novel strategies:

Codomain Tokenization

Each input channel (physical variable) is processed as an independent codomain token $a_j$, facilitating targeted modeling of inter-variable couplings. These tokens are processed in parallel, with per-variable architectural flexibility.
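
As a simple illustration, a discretized multiphysics state with several variables sampled on a mesh can be split channel-wise into one function-valued token per variable; the shapes and variable names below are hypothetical.

```python
import torch

# Hypothetical discretized state: 2048 mesh samples of 3 variables (e.g., u, v, p)
state = torch.randn(2048, 3)
# Codomain tokenization: one token per physical variable, each a function on the mesh
tokens = [state[:, j:j + 1] for j in range(state.shape[1])]
print(len(tokens), tokens[0].shape)  # 3 tokens, each of shape (2048, 1)
```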

Variable Specific Positional Encoding (VSPE)

Each token $a_j$ is augmented by a learnable positional encoding $e_j : D \to \mathbb{R}^{d_{\text{en}}}$, producing the extended representation:

w_j = P([a_j, e_j])

where $P$ is a lifting operator and $[a_j, e_j]$ denotes concatenation. This encodes spatial context per variable.
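
A minimal sketch of this step is shown below, assuming each $e_j$ is realized by a small per-variable MLP over mesh coordinates and $P$ is a shared pointwise linear lifting; all module and dimension names are illustrative.

```python
# Variable-specific positional encodings (VSPE) followed by lifting w_j = P([a_j, e_j]).
import torch
import torch.nn as nn

class VSPELift(nn.Module):
    def __init__(self, n_vars: int, d_coord: int, d_enc: int, d_token: int):
        super().__init__()
        # One learnable positional-encoding network e_j per physical variable
        self.encoders = nn.ModuleList([
            nn.Sequential(nn.Linear(d_coord, 32), nn.GELU(), nn.Linear(32, d_enc))
            for _ in range(n_vars)
        ])
        # Shared pointwise lifting operator P applied to the concatenation [a_j, e_j]
        self.lift = nn.Linear(1 + d_enc, d_token)

    def forward(self, a: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # a: (n_points, n_vars) variable values; coords: (n_points, d_coord)
        tokens = []
        for j, enc in enumerate(self.encoders):
            e_j = enc(coords)                                   # (n_points, d_enc)
            w_j = self.lift(torch.cat([a[:, j:j + 1], e_j], dim=-1))
            tokens.append(w_j)
        return torch.stack(tokens, dim=0)                       # (n_vars, n_points, d_token)

coords = torch.rand(2048, 2)                                    # 2D mesh coordinates
a = torch.randn(2048, 3)                                        # u, v, p sampled on the mesh
tokens = VSPELift(n_vars=3, d_coord=2, d_enc=8, d_token=16)(a, coords)
print(tokens.shape)  # torch.Size([3, 2048, 16])
```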

Functional Self-Attention Mechanism

Key, query, and value operators defined over function spaces yield attention weights via integral dot products. Multi-head attention is realized by independent projection and concatenation over the codomain tokens, enabling rich cross-variable interaction.

Normalization in Function Spaces

Instance normalization is extended: for any $w$, the mean $p$ and standard deviation $\sigma$ are computed as

p = \frac{1}{|D|} \int_D w(x) \, dx, \quad \sigma = \sqrt{\frac{1}{|D|} \int_D (w(x) - p)^2 \, dx},

with normalization

\text{Norm}[w](x) = g \odot \frac{w(x) - p}{\sigma + \epsilon} + b,

where $g$ and $b$ are trainable parameters and $\epsilon$ is a small constant for numerical stability.
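
A minimal discretized sketch of this function-space normalization, assuming quadrature weights that sum to $|D|$ and per-channel scale and shift parameters; the names are illustrative.

```python
# Instance normalization in function space: mean and standard deviation are
# integrals over the domain, approximated with quadrature weights on the mesh.
import torch
import torch.nn as nn

class FunctionInstanceNorm(nn.Module):
    def __init__(self, d_token: int, eps: float = 1e-5):
        super().__init__()
        self.g = nn.Parameter(torch.ones(d_token))    # trainable scale g
        self.b = nn.Parameter(torch.zeros(d_token))   # trainable shift b
        self.eps = eps

    def forward(self, w: torch.Tensor, quad_w: torch.Tensor) -> torch.Tensor:
        # w: (n_points, d_token) token function values; quad_w sums to |D|
        vol = quad_w.sum()
        mean = torch.einsum("p,pd->d", quad_w, w) / vol               # p = (1/|D|) ∫ w dx
        var = torch.einsum("p,pd->d", quad_w, (w - mean) ** 2) / vol  # σ² over the domain
        std = torch.sqrt(var)
        return self.g * (w - mean) / (std + self.eps) + self.b

w = torch.randn(2048, 16)
quad_w = torch.full((2048,), 1.0 / 2048)   # uniform weights with |D| normalized to 1
w_norm = FunctionInstanceNorm(d_token=16)(w, quad_w)
```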

Graph Neural Operator (GNO) Blocks

Irregular geometric inputs and outputs are encoded and decoded through GNO layers, which act as canonical function interpolators between unstructured domains and uniform latent grids.
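
The sketch below illustrates one plausible form of such a GNO encoder: values on an irregular mesh are aggregated onto a uniform latent grid through a learned kernel evaluated over radius neighborhoods. The kernel network, radius, and feature layout are assumptions for illustration, not the reference implementation.

```python
# GNO-style encoder: kernel integral from an irregular mesh onto a uniform latent grid.
import torch
import torch.nn as nn

class GNOEncoder(nn.Module):
    def __init__(self, d_in: int, d_out: int, radius: float):
        super().__init__()
        self.radius = radius
        # Kernel κ(y, x, a(x)) acting on target point, source point, and source value
        self.kernel = nn.Sequential(
            nn.Linear(2 + 2 + d_in, 64), nn.GELU(), nn.Linear(64, d_out)
        )

    def forward(self, mesh_xy: torch.Tensor, values: torch.Tensor,
                grid_xy: torch.Tensor) -> torch.Tensor:
        # mesh_xy: (n_mesh, 2), values: (n_mesh, d_in), grid_xy: (n_grid, 2)
        dists = torch.cdist(grid_xy, mesh_xy)             # (n_grid, n_mesh)
        mask = (dists < self.radius).float()
        n_grid, n_mesh = mask.shape
        # Kernel evaluated for every (grid point, mesh point) pair
        pair_feats = torch.cat(
            [grid_xy.unsqueeze(1).expand(-1, n_mesh, -1),
             mesh_xy.unsqueeze(0).expand(n_grid, -1, -1),
             values.unsqueeze(0).expand(n_grid, -1, -1)], dim=-1)
        k = self.kernel(pair_feats)                       # (n_grid, n_mesh, d_out)
        # Average kernel outputs over each grid point's neighborhood
        weights = mask / mask.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.einsum("gm,gmd->gd", weights, k)

mesh_xy = torch.rand(500, 2)                              # irregular mesh points
values = torch.randn(500, 3)                              # u, v, p on the mesh
gx, gy = torch.meshgrid(torch.linspace(0, 1, 16), torch.linspace(0, 1, 16), indexing="ij")
grid_xy = torch.stack([gx.reshape(-1), gy.reshape(-1)], dim=-1)  # uniform 16x16 latent grid
latent = GNOEncoder(d_in=3, d_out=16, radius=0.15)(mesh_xy, values, grid_xy)
print(latent.shape)  # torch.Size([256, 16])
```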

4. Pretraining and Learning Strategies

CoDA-NO employs domain-agnostic, self-supervised learning, which is crucial for generalizing representations across families of PDEs:

  • Masked Reconstruction Pretraining: The model reconstructs original functions from masked inputs (randomly zeroed evaluations or variables), compelling the encoding of mutual dependencies and intrinsic structure among variables.
  • Domain-Agnostic Transfer: Pretraining does not require knowledge of target tasks; learned encodings are reused for downstream PDEs, with only minimal fine-tuning required on a small number of supervised examples. Additional output variables at test time (e.g., displacements in FSI scenarios) are supported by incorporating new VSPEs into the system without architectural modifications.
  • Two-Stage Training: The encoder and VSPE are pretrained with a functional "Reconstructor," which is later replaced by a "Predictor" fine-tuned on supervised task data; a schematic sketch of this flow follows the list.
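
The schematic below sketches the two-stage flow, assuming `encoder`, `reconstructor`, `predictor`, optimizers, and data loaders are defined elsewhere; it only illustrates masked-reconstruction pretraining followed by supervised fine-tuning, not the exact training recipe.

```python
# Stage 1: self-supervised masked reconstruction; Stage 2: supervised fine-tuning.
import torch

def pretrain_masked(encoder, reconstructor, loader, optimizer, mask_prob=0.3):
    for tokens, quad_w in loader:                      # tokens: (n_vars, n_points, d)
        mask = (torch.rand(tokens.shape[:2]) < mask_prob).unsqueeze(-1)
        corrupted = tokens.masked_fill(mask, 0.0)      # randomly zero evaluations/variables
        recon = reconstructor(encoder(corrupted, quad_w), quad_w)
        loss = torch.mean((recon - tokens) ** 2)       # reconstruct the unmasked functions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def finetune_supervised(encoder, predictor, loader, optimizer):
    for tokens, quad_w, target in loader:              # small number of labeled examples
        pred = predictor(encoder(tokens, quad_w), quad_w)
        loss = torch.mean((pred - target) ** 2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```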

5. Empirical Performance and Evaluation

CoDA-NO has been extensively validated on canonical and multiphysics PDE benchmarks:

  • Downstream Tasks: Includes Navier–Stokes fluid simulations, fluid–structure interaction with elastic coupling, and Rayleigh-Bénard convection.
  • Supervised Few-Shot Evaluation: In regimes with restricted labeled data, CoDA-NO demonstrates superior performance, reducing task errors by over 36% compared to baselines such as GINO, DeepONet, GNN, UNet, and vision transformers.
  • Cross-System Adaptability: A pretrained model generalizes robustly to systems with new variables, with the mere addition of corresponding VSPEs yielding strong adaptation—no retraining of the core architecture is required.
  • Mesh Robustness: Through GNO layers, the approach is demonstrated on variable and irregular meshes, confirming discretization-invariant capabilities in simulation settings that are challenging for other architectures.
  • Ablation and Physical Consistency: Component ablations highlight the critical contributions of codomain-wise tokenization, VSPE, and function-space normalization. Spectral analysis shows that predictions maintain physical energy distributions consistent with the true underlying systems.

6. Theoretical and Practical Implications

  • Unified Operator Framework: By structuring neural operators along the codomain and leveraging function-space attention, CoDA-NO unifies modeling for multiphysics systems with arbitrary variable composition.
  • Generalization and Scalability: The architecture provides a path to sample-efficient, scalable neural operators, benefiting from both theoretical universal approximation (by functional attention mechanisms) and empirical robustness to mesh, geometry, and data-scarce settings.
  • Foundation Model for Scientific Computing: A plausible implication is that CoDA-NO can serve as a reusable backbone for PDE simulation, inference, and analysis, possibly filling the role of a "foundation model" in computational science analogous to those in natural language processing.
  • Future Extensions: Owing to the flexibility of its codomain-attention design, CoDA-NO can be extended towards interpretable physical modeling or integrated with physically motivated regularization, as suggested by connections with operator-theoretic, kernel-based, and nonlocal attention frameworks (Calvello et al., 10 Jun 2024, Yu et al., 14 Aug 2024, Kissas et al., 2022).

7. Context within Attention-Based and Operator Learning Paradigms

CoDA-NO builds upon, and is complementary to, several directions in operator learning:

  • Coupled Output Attention: LOCA (Kissas et al., 2022) shares the codomain-centric attention idea, coupling output query locations with kernel-based softmax weighting to encode output correlations and ensure expressivity and generalization in operator learning.
  • Continuum Attention in Function Spaces: Transformer architectures extended to infinite-dimensional contexts (functionals, integral operators) generalize standard attention paradigms, allowing direct operator approximation in discretization-invariant fashion (Calvello et al., 10 Jun 2024).
  • Regularization and Interpretability: Orthogonal attention mechanisms impose spectral regularization, while nonlocal and double-integral attention (NAO) architectures enable generalizability and physically meaningful kernel representations (Xiao et al., 2023, Yu et al., 14 Aug 2024). CoDA-NO’s codomain focus further enhances interpretability and adaptability.
  • Spatio-Temporal Operator Decomposition: Architectures such as ASNO (Karkaria et al., 12 Jun 2025) demonstrate how modular attention designs can be tailored to separable physical processes (temporal evolution, external spatial forcing), suggesting a broader design space for operator decomposition where codomain attention is integral.

In conclusion, CoDA-NO establishes a new paradigm for learning, adapting, and generalizing mappings between function spaces in multiphysics PDE settings, directly exploiting codomain structure via advanced attention mechanisms. The architecture’s functional design, pretraining strategy, and empirical efficacy render it a significant and extensible contribution to the field of neural operators.
