Geometry-Aware Operator Transformer
- GAOT is a neural operator architecture that integrates geometric context into transformer attention mechanisms to solve PDEs on arbitrary domains.
- It employs customized self- and cross-attention with geometry embeddings and multiscale locality to achieve superior accuracy and transferability.
- GAOT frameworks overcome limitations in traditional surrogate models by enhancing geometric generalization, query flexibility, and computational efficiency.
A Geometry-Aware Operator Transformer (GAOT) is a class of neural operator architectures that integrate geometric information directly into transformer-based attention mechanisms for learning solution operators of partial differential equations (PDEs) and related physics-based problems on arbitrary domains. GAOTs address longstanding limitations of neural PDE surrogates regarding geometric generalization, query flexibility, and computational efficiency by systematically encoding local and global geometric structure into every layer, yielding strong accuracy and transferability across complex, unstructured, and out-of-distribution (OOD) domains (Chen et al., 12 Feb 2026, Koh et al., 18 Apr 2025, Liu et al., 28 Apr 2025, Wen et al., 24 May 2025, Adams et al., 23 Dec 2025, Zhang et al., 29 Dec 2025, Versano et al., 2024, Lin et al., 2023). The GAOT paradigm aligns attention mechanisms with inductive biases for geometric locality, boundary/structure awareness, and multiscale coupling, attaining performance that purely grid-based, MLP, or canonical transformer operator learners do not reach.
1. Core Architectural Principles and Variants
GAOT frameworks encode geometry—often as point clouds, signed distance functions (SDFs), or graph neighborhoods—alongside PDE parameters and queries, and integrate this information through customized attention schemes and geometry embeddings:
- Self-attention-based GAOTs: Input tokens correspond to query points in the domain, optionally augmented with SDF values. A shared MLP projects each token to feature space. Standard multi-head self-attention is applied, usually with geometry-aware positional encodings (e.g., block-diagonal Rotary Position Embeddings) that ensure the attention scores reflect spatial relationships. Output tokens encode both query location and neighborhood geometry (Chen et al., 12 Feb 2026).
- Cross-attention-based GAOTs: Geometry is sampled as a point cloud and projected into key/value embeddings; queries are independently projected. Query-to-geometry cross-attention (with geometry in keys/values) fuses spatial and geometric context, enabling independent sampling of geometry and query points (out-of-sample spatial generalization); a minimal sketch follows this list (Chen et al., 12 Feb 2026, Liu et al., 28 Apr 2025).
- Hybrid attention: Combines a cross-attention pass from queries to geometry with a self-attention step among updated query tokens (Chen et al., 12 Feb 2026).
- Multiscale- and locality-aware GAOTs: KNN or ball-query patchifying dynamically partitions the domain around each point, enabling local attention within spatial neighborhoods alongside concurrent global attention. Linear attention (e.g., kernelized or dot-product) is used to maintain scalability, while KNN neighborhoods inject locality and facilitate geometric inductive bias (Koh et al., 18 Apr 2025, Adams et al., 23 Dec 2025). Multi-scale radius partitions are also used to handle disparate length scales, e.g., in boundary layers (Wen et al., 24 May 2025).
- Graph neural operator (GNO) fusion: MAGNO-style GAOTs aggregate graph node features at multiple scales using per-neighborhood attention-based quadrature weights, fusing outputs via learned softmax and adding per-patch geometric descriptors (eigenvalues, anisotropy statistics). ViT or Transformer processors act globally on geometry-encoded latent tokens (Wen et al., 24 May 2025).
- Physics-geometry operator transformers (e.g. PGOT): Explicit separation of global, geometry-aware aggregation (spectrum-preserving geometric attention) and adaptive local modeling via dynamic routing in feed-forward submodules. Multi-scale geometric encodings are injected into every attention and FFN layer (Zhang et al., 29 Dec 2025).
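As referenced in the cross-attention bullet above, a minimal PyTorch sketch of a query-to-geometry cross-attention block is given below. The module name, feature sizes, single-head formulation, and the use of SDF-augmented geometry tokens are illustrative assumptions rather than any published implementation.

```python
import torch
import torch.nn as nn

class GeometryCrossAttention(nn.Module):
    """Minimal single-head cross-attention: query points attend to geometry tokens."""

    def __init__(self, coord_dim: int = 3, d_model: int = 128):
        super().__init__()
        # Independent MLP lifts for query coordinates and geometry points (+ SDF channel).
        self.query_lift = nn.Sequential(nn.Linear(coord_dim, d_model), nn.GELU(),
                                        nn.Linear(d_model, d_model))
        self.geom_lift = nn.Sequential(nn.Linear(coord_dim + 1, d_model), nn.GELU(),
                                       nn.Linear(d_model, d_model))
        self.to_q = nn.Linear(d_model, d_model)
        self.to_k = nn.Linear(d_model, d_model)
        self.to_v = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, query_xyz, geom_xyz, geom_sdf):
        # query_xyz: (B, Nq, 3) arbitrary evaluation points
        # geom_xyz:  (B, Ng, 3) sampled geometry point cloud
        # geom_sdf:  (B, Ng, 1) signed-distance values at the geometry points
        q_tok = self.query_lift(query_xyz)                               # (B, Nq, d)
        g_tok = self.geom_lift(torch.cat([geom_xyz, geom_sdf], dim=-1))  # (B, Ng, d)

        q = self.to_q(q_tok)
        k = self.to_k(g_tok)
        v = self.to_v(g_tok)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)  # (B, Nq, Ng)
        return attn @ v   # (B, Nq, d) geometry-conditioned query features
```

Because the geometry tokens are constructed independently of the query set, the same geometry encoding can be attended to from arbitrary query locations, which is the out-of-sample property the cross-attention variant exploits.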
2. Mathematical Formulation of Attention with Geometry
Across GAOT variants, the attention mechanism is systematically augmented with positional encodings, geometric relationships, or direct geometry embeddings:
For query tokens $h_i$ (associated with query location $x_i$) and geometry tokens $g_j$ (associated with geometry point $y_j$), queries, keys, and values are formed as
$$\tilde{q}_i = R(x_i)\,W_Q h_i, \qquad \tilde{k}_j = R(y_j)\,W_K g_j, \qquad v_j = W_V g_j,$$
where $R(\cdot)$ applies a relative-position encoding (e.g., RoPE).
The attention is
$$\mathrm{Attn}(x_i) = \sum_{j} \operatorname{softmax}_j\!\left(\frac{\tilde{q}_i^{\top}\tilde{k}_j}{\sqrt{d}}\right) v_j.$$
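The block-diagonal rotary encoding $R(\cdot)$ can be sketched as follows. This is a minimal illustration assuming interleaved channel pairs, an equal split of channels across spatial axes, and an arbitrary frequency base, not the exact scheme of any cited architecture.

```python
import torch

def rotary_encode(x, coords, base: float = 100.0):
    """Apply RoPE-style rotations to features x using continuous coordinates.

    x:      (..., N, d) query or key features, with d divisible by 2 * coord_dim
    coords: (..., N, coord_dim) spatial positions of the tokens
    """
    d = x.shape[-1]
    coord_dim = coords.shape[-1]
    pairs_per_axis = d // (2 * coord_dim)  # channel pairs assigned to each spatial axis
    freqs = base ** (-torch.arange(pairs_per_axis, dtype=x.dtype, device=x.device)
                     / pairs_per_axis)
    # angle per token, axis, and frequency: coordinate * frequency
    angles = coords.unsqueeze(-1) * freqs                 # (..., N, coord_dim, pairs_per_axis)
    angles = angles.reshape(*coords.shape[:-1], -1)       # (..., N, d/2)
    cos, sin = angles.cos(), angles.sin()

    x1, x2 = x[..., 0::2], x[..., 1::2]                   # interleaved channel pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin                  # 2x2 rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because each 2x2 rotation depends linearly on one coordinate, the dot product between encoded queries and keys depends only on coordinate differences, which is the relative-position property the formulation above relies on.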
Locality-aware GAOTs further define for each point a set of k nearest neighbors and compute local attention within those patches. Multiscale variants aggregate outputs from neighborhoods of varying radii, fusing scale-specific outputs via learned softmax weights (Wen et al., 24 May 2025).
Spectrum-preserving attention variants in PGOT compute, per point $x_i$ and scale $s$,
$$\mathrm{Attn}_s(x_i) = \sum_{j} \operatorname{softmax}_j\!\left(\frac{(q_i + e_i^{s})^{\top}(k_j + e_j^{s})}{\sqrt{d}}\right) v_j,$$
where $e^{s}$ are scale-specific geometric encodings, injected additively into queries and keys (Zhang et al., 29 Dec 2025).
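A compact sketch of the multiscale radius aggregation with learned softmax fusion described above is given below. The mean-pooling neighborhood reduction and the per-point gating layer are simplified stand-ins for the attention-based quadrature of the cited architectures, and the radii are placeholder values.

```python
import torch
import torch.nn as nn

class MultiscaleFusion(nn.Module):
    """Aggregate per-point features over neighborhoods of several radii, then fuse."""

    def __init__(self, d_model: int, radii=(0.05, 0.1, 0.2)):
        super().__init__()
        self.radii = radii
        self.scale_weights = nn.Linear(d_model, len(radii))  # per-point fusion logits

    def forward(self, xyz, feats):
        # xyz: (N, 3) point coordinates, feats: (N, d) point features
        dist = torch.cdist(xyz, xyz)                          # (N, N) pairwise distances
        per_scale = []
        for r in self.radii:
            mask = (dist <= r).float()                        # ball-query neighborhood at radius r
            mask = mask / mask.sum(dim=-1, keepdim=True).clamp(min=1.0)
            per_scale.append(mask @ feats)                    # mean over each neighborhood
        stacked = torch.stack(per_scale, dim=1)               # (N, S, d)
        w = torch.softmax(self.scale_weights(feats), dim=-1)  # (N, S) learned scale weights
        return (w.unsqueeze(-1) * stacked).sum(dim=1)         # (N, d) fused multiscale features
```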
3. Integration with Operator Learning Frameworks
GAOTs are typically embedded within neural operator architectures to allow query flexibility and high-order field approximation over arbitrary domains:
- DeepONet integration: A geometry-aware transformer serves as the trunk network, producing feature vectors $\tau_k(x)$ at each query location $x$; the branch network produces coefficients $b_k(a)$ from the non-geometric parameterization $a$. The operator is then $\mathcal{G}(a)(x) \approx \sum_{k=1}^{p} b_k(a)\,\tau_k(x)$ (a minimal sketch follows this list) (Chen et al., 12 Feb 2026, Versano et al., 2024).
- Two-stage neural operator architectures: In architectures such as GINOT, the geometry encoder processes point clouds to form geometry tokens (keys/values); the solution decoder attends from arbitrary query locations to these tokens via cross-attention, enabling geometry-conditioned field prediction at arbitrary points, including those not observed during training (Liu et al., 28 Apr 2025).
- Encoder–processor–decoder/ViT structures: Multiscale attentional GNO encoders/decoders process domain and query information over graphs; a global vision Transformer (with patch/token grouping) performs efficient global mixing (Wen et al., 24 May 2025).
- Dynamic operator fusion: Adaptive gates (e.g., in PGOT’s Taylor-decomposed FFN) route computation between linear and nonlinear paths based on the geometry embedding, yielding spatially adaptive operator evaluation (Zhang et al., 29 Dec 2025).
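As noted in the DeepONet bullet above, the operator evaluation reduces to an inner product between branch coefficients and trunk basis features. The sketch below assumes a generic MLP branch, a scalar output field, and an arbitrary trunk module standing in for the geometry-aware transformer.

```python
import torch
import torch.nn as nn

class GeometryAwareDeepONet(nn.Module):
    """DeepONet-style operator: the branch encodes input parameters, a geometry-aware
    trunk (any query encoder, e.g. a transformer) encodes the query points."""

    def __init__(self, param_dim: int, trunk: nn.Module, p: int = 64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(param_dim, 128), nn.GELU(), nn.Linear(128, p))
        self.trunk = trunk  # maps (B, Nq, coord_dim) query points -> (B, Nq, p) basis features
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, params, query_xyz):
        b = self.branch(params)    # (B, p)      coefficients b_k(a)
        t = self.trunk(query_xyz)  # (B, Nq, p)  basis features tau_k(x)
        return torch.einsum("bp,bnp->bn", b, t) + self.bias  # sum_k b_k(a) * tau_k(x)
```

For a quick test, `trunk = nn.Sequential(nn.Linear(3, 128), nn.GELU(), nn.Linear(128, 64))` can stand in for the geometry-aware trunk, since it maps `(B, Nq, 3)` query coordinates to `(B, Nq, 64)` basis features.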
4. Training, Implementation Strategies, and Efficiency
GAOTs displace the traditional reliance on grid-structured input, enabling direct training and inference on point cloud, mesh, or graph representations:
- Point clouds with grouping/sampling (FPS, KNN, ball queries) support invariance to permutation and sampling density; a small precomputation sketch follows this list.
- Precomputed graph neighborhoods and edge dropping support scalability to very large 3D industrial domains (Wen et al., 24 May 2025).
- Mini-batch sampling of queries/geometry points decouples geometry from evaluation points, supporting OOD generalization.
- Optimization is typically performed with Adam/AdamW, moderate learning rates, and early stopping. No special regularization is generally required beyond standard techniques such as dropout.
- For transformer-based processors, attention cost is managed by leveraging patching or local window attention, keeping complexity near-linear in number of points (as opposed to quadratic for full attention) (Koh et al., 18 Apr 2025, Zhang et al., 29 Dec 2025, Wen et al., 24 May 2025).
- End-to-end loss functions are standard (e.g., mean squared error over field variables), with additional objectives (e.g., Chamfer distance in point set completion) for domain-specific tasks (Yu et al., 2021).
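As referenced above, neighborhood precomputation and query mini-batching are straightforward. The brute-force distance computation below is a sketch that assumes the point cloud fits in memory (industrial-scale meshes would use a spatial index instead), and the function names are illustrative.

```python
import torch

def precompute_knn(xyz, k: int = 16):
    """Precompute k-nearest-neighbor indices once per mesh/point cloud.

    xyz: (N, 3) node coordinates; returns (N, k) neighbor indices (including the point itself)."""
    dist = torch.cdist(xyz, xyz)                   # (N, N) pairwise distances
    return dist.topk(k, largest=False).indices     # indices of the k closest points

def sample_query_batch(query_xyz, targets, batch_size: int = 4096):
    """Draw a random mini-batch of query points, decoupled from the geometry sampling."""
    idx = torch.randperm(query_xyz.shape[0])[:batch_size]
    return query_xyz[idx], targets[idx]
```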
5. Empirical Performance, Generalization, and Limitations
GAOTs yield state-of-the-art accuracy across a wide spectrum of PDE surrogate and geometric machine learning tasks. Notable empirical observations include:
- Surrogate modeling: On complex fluid, solid, and electrochemical benchmarks, the cross-attention GAOT (ArGEnT) attains orders-of-magnitude lower error than DeepONet, Point-DeepONet, GraphSAGE, and NURBS-based surrogates, for instance on turbulent airfoil field prediction versus MLP/graph baselines (Chen et al., 12 Feb 2026).
- Mesh and geometry generalization: Cross-attention and geometry-token architectures generalize to new shapes (curved boundaries, internal voids, arrangement-shifted objects) that are not parameterized at training time (Liu et al., 28 Apr 2025, Wen et al., 24 May 2025, Chen et al., 12 Feb 2026).
- Point cloud completion and pose estimation: GAOTs with geometry-aware blocks in the transformer encoder yield lower Chamfer distances and higher accuracy on shape completion and 6D pose estimation, outperforming vanilla or even DGCNN-only baselines (Yu et al., 2021, Lin et al., 2023).
- Industrial 3D fluid dynamics: On DrivAerNet++ and SHIFT, GAOTs improve pressure/wall-shear MAE by up to 30% and maintain inference efficiency on 500k-point test meshes (Wen et al., 24 May 2025, Adams et al., 23 Dec 2025).
- Computational efficiency: Patchwise attention, precomputed graph adjacency, and multiscale locality-aware architectures yield an order-of-magnitude acceleration relative to full transformers and sustain near-constant throughput as the number of points grows (Koh et al., 18 Apr 2025, Wen et al., 24 May 2025, Zhang et al., 29 Dec 2025).
Limitations cited in leading references include the need for careful graph preprocessing, incompleteness of theoretical approximation guarantees, and the lack (in some variants) of fully physics-informed loss integration (Wen et al., 24 May 2025). This suggests the field is moving toward greater theoretical analysis and foundation-model pretraining as future directions.
6. Variants, Extensions, and Related Approaches
GAOT’s core design is instantiated in several notable recent architectures:
| Name | Key Features | Primary References |
|---|---|---|
| ArGEnT | Cross/self/hybrid attention; DeepONet trunk | (Chen et al., 12 Feb 2026) |
| LA2Former | Global linear + KNN-local attention, soft masks | (Koh et al., 18 Apr 2025) |
| GINOT | Geometry token encoding + cross-attention decoder | (Liu et al., 28 Apr 2025) |
| MAGNO-GAOT | Multiscale graph operator, geometry embeddings, ViT | (Wen et al., 24 May 2025) |
| GeoTransolver | Multiscale ball query, GALE cross-attention | (Adams et al., 23 Dec 2025) |
| PGOT | Physics-geometry spectrum attention, dynamic routing | (Zhang et al., 29 Dec 2025) |
Editor's term: "GAOT" is often used generically for transformers with explicit geometric inductive bias at the attention and token level, although architectures differ in their exact implementation details—ranging from KNN-patchifying to spectral geometry encoding.
Comparisons with full-attention (unstructured transformer), FNO/Geo-FNO, PointNet-derived, and GNO/RIGNO baselines consistently show that GAOTs generalize better to arbitrary/unseen domains, are more data-efficient, achieve lower error on fine-scale phenomena, and scale more favorably.
7. Applications and Outlook
GAOTs are now employed for:
- Rapid surrogate modeling and uncertainty quantification in industrial PDE workflows (fluid, solid, and multiscale porous systems) (Chen et al., 12 Feb 2026, Wen et al., 24 May 2025, Zhang et al., 29 Dec 2025, Liu et al., 28 Apr 2025).
- Robust operator inversion and as learned preconditioners for PDE solvers on irregular domains (Versano et al., 2024).
- Accurate and efficient 6D pose estimation and 3D point cloud reasoning (Lin et al., 2023, Yu et al., 2021).
- High-fidelity modeling in aerodynamic simulation and CAE, including drag/lift data-driven design (Adams et al., 23 Dec 2025).
Future work is directed toward: (a) foundation-model pretraining over compositional geometry/physics task suites (Wen et al., 24 May 2025); (b) incorporation of physics-based constraints and loss terms; (c) further architectural innovation for ultra-large problems (million-scale point clouds); (d) theoretical investigations into approximation and generalization properties.
In summary, GAOTs embody the paradigm shift from fixed-grid, mesh-dependent surrogates to scalable, geometry-conditioned operator learners capable of generalizing across families of PDEs and domains, with demonstrated superiority over previous baselines in accuracy, flexibility, and computational tractability.