Physics-Geometry Operator Transformer (PGOT)
- PGOT is a neural operator that integrates multi-scale geometric encoding with self-attention to preserve critical boundary details in PDE models.
- The SpecGeo-Attention module fuses physical feature maps and explicit geometric encodings, enabling high-fidelity modeling with linear computational scaling.
- PGOT sets new benchmarks on standard and industrial PDE tasks by effectively addressing geometric aliasing in complex, unstructured mesh environments.
The Physics-Geometry Operator Transformer (PGOT) is a neural operator framework designed for modeling complex partial differential equations (PDEs) on unstructured meshes with intricate geometries. PGOT addresses the challenge of geometric aliasing—loss of essential boundary information—that arises when standard dimensionality-reduction and linear-attention methods are applied to physics problems with high geometric complexity. Central to PGOT is the Spectrum-Preserving Geometric Attention (SpecGeo-Attention) module, which injects explicit, multi-scale geometric encodings into the attention mechanism, enabling high-fidelity and adaptive physical field modeling with linear computational scaling in the number of mesh points. PGOT achieves state-of-the-art results across standard PDE benchmarks and large-scale industrial tasks, including aerodynamic design problems (Zhang et al., 29 Dec 2025).
1. Architecture and Formal Definition
SpecGeo-Attention operates on two primary inputs: a physical feature map $X \in \mathbb{R}^{n \times c}$ (for $n$ mesh points and $c$ channels) and the associated spatial coordinates $G \in \mathbb{R}^{n \times d}$. The attention mechanism consists of four stages:
- Multi-scale geometry encoding: For each scale $s = 1, \dots, S$, compute $H_s = \Phi_s(\lambda_s G)$, where $\Phi_s$ is a two-layer MLP and $\lambda_s = 10^{-(s-1)}$ provides scale-specific geometric modulation. Combine all scales by concatenation and a linear transformation:
$$P_{\text{geo}} = \sigma\big([H_1 \,\|\, H_2 \,\|\, \cdots \,\|\, H_S]\, W_{\text{fuse}}\big),$$
where $\sigma$ is typically ReLU.
- Geometry-informed slicing: Query features are constructed by augmenting linearly projected features with the multi-scale geometry encoding:
$$X_q = X W_q + P_{\text{geo}}.$$
Generate a soft assignment $A \in \mathbb{R}^{n \times m}$ of mesh points to $m$ latent tokens using learned slice prototypes $w_1, \dots, w_m$ and temperature $T$:
$$A = \operatorname{softmax}_{\text{row}}\!\big(X_q\,[w_1 \cdots w_m]^{\top} / T\big).$$
- Global self-attention on tokens: Aggregate physical states into latent token features via
$$Z = D^{-1} A^{\top} (X W_v),$$
where $D = \operatorname{diag}(A^{\top}\mathbf{1}_n)$ normalizes each token by its total assignment mass. Standard MHSA (multi-head self-attention) is applied to $Z$ to produce $Z'$.
- Geometry-guided reconstruction: Distribute the attended token features back to mesh points using the (geometry-informed) assignments:
$$X_{\text{out}} = A Z'.$$
All linear maps ($W_q$, $W_v$, $W_{\text{fuse}}$), slice prototypes ($w_1, \dots, w_m$), and MLPs ($\Phi_1, \dots, \Phi_S$) are learned end to end (Zhang et al., 29 Dec 2025).
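Composing the four stages, a single SpecGeo-Attention layer can be summarized in one closed form (a restatement of the equations above, using the same symbols):
$$\mathrm{SpecGeoAttn}(X, G) = A\,\operatorname{MHSA}\!\big(D^{-1} A^{\top} X W_v\big), \qquad A = \operatorname{softmax}_{\text{row}}\!\big((X W_q + P_{\text{geo}})\,[w_1 \cdots w_m]^{\top} / T\big),$$
$$P_{\text{geo}} = \sigma\big([\Phi_1(\lambda_1 G) \,\|\, \cdots \,\|\, \Phi_S(\lambda_S G)]\, W_{\text{fuse}}\big), \qquad D = \operatorname{diag}(A^{\top}\mathbf{1}_n).$$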
2. Multi-scale Geometric Encoding and Spectral Preservation
Traditional attention mechanisms or clustering-based slicing can act as spatial low-pass filters, leading to loss of high-frequency (fine geometric) modes. SpecGeo-Attention addresses this by encoding and injecting geometry at multiple, exponentially spaced scales through the family of functions $\{\Phi_s(\lambda_s G)\}_{s=1}^{S}$ with $\lambda_s = 10^{-(s-1)}$. These are fused such that $P_{\text{geo}}$ retains geometric features at all scales, from global shape to local boundaries.
Mathematically, for a geometric boundary function $g$, standard aggregation operates as a convolution with a smoothing kernel $\kappa$, $(\kappa * g)(x)$, losing modes $\omega$ where $\hat{\kappa}(\omega) \approx 0$. SpecGeo-Attention, by ensuring $P_{\text{geo}}$ includes all frequency bands spanned by the scales $\lambda_1, \dots, \lambda_S$, guarantees that no geometric mode is irreversibly suppressed at any layer, even when the mesh, features, and boundaries are highly irregular. Assignment thus depends on both the physical state and a full spectrum of geometric context (Zhang et al., 29 Dec 2025).
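One way to make this argument concrete, assuming each scale $s$ behaves like aggregation with a kernel $\kappa_{\lambda_s}$ whose passband is set by $\lambda_s$ (an illustrative reading, not a formula from the source):
$$\widehat{\kappa * g}(\omega) = \hat{\kappa}(\omega)\,\hat{g}(\omega) \;\Longrightarrow\; \hat{\kappa}(\omega) \approx 0 \text{ makes mode } \omega \text{ of } g \text{ unrecoverable after a single-scale aggregation},$$
$$\text{whereas } \max_{1 \le s \le S} \big|\hat{\kappa}_{\lambda_s}(\omega)\big| > 0 \text{ for every relevant } \omega \;\Longrightarrow\; \text{each band is carried into } P_{\text{geo}} \text{ by at least one scale}.$$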
3. Physics Slicing and Geometry Injection Workflow
A compact procedure for a single SpecGeo-Attention layer is summarized below:
// Input: X ∈ ℝ[n×c], G ∈ ℝ[n×d], S, m, {Φₛ}, W_fuse, W_q, W_v, {w_j}, T
for s in 1…S:                                      // multi-scale geometry encoding
    λₛ = 10^(−(s−1))
    Hₛ = Φₛ(λₛ · G)
P_geo = σ( [H₁ ‖ H₂ ‖ … ‖ H_S] · W_fuse )           // fused multi-scale encoding
X_q   = X · W_q + P_geo                             // geometry-informed queries
A     = softmax_rowwise( X_q · [w₁ … w_m]ᵀ / T )    // soft assignment, n×m
D     = diag(colsum(A))                             // Dⱼⱼ = Σᵢ Aᵢⱼ (token mass)
Z     = D⁻¹ · Aᵀ · (X · W_v)                        // m latent token features
Z′    = MHSA( Z )                                   // global self-attention on tokens
X_out = A · Z′                                      // geometry-guided reconstruction
return X_out
The computational cost is dominated by the products involving the $n \times m$ assignment matrix $A$, i.e., $O(nmc)$ operations; for fixed $m$ and $c$ with $m \ll n$, this yields overall $O(n)$ scaling per layer, a critical property for industrial-scale PDE solvers (Zhang et al., 29 Dec 2025).
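Below is a minimal PyTorch rendering of the procedure above. Tensor shapes follow the pseudocode, while the layer widths, the two-layer MLP design, and the use of torch.nn.MultiheadAttention for the token-level MHSA are illustrative assumptions rather than the paper's exact choices.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpecGeoAttention(nn.Module):
    def __init__(self, c: int, d: int, m: int = 32, S: int = 4, heads: int = 4, T: float = 1.0):
        super().__init__()
        self.T = T
        # Stage 1: one two-layer MLP Φ_s per scale, plus the fusion map W_fuse.
        self.phis = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, c), nn.GELU(), nn.Linear(c, c)) for _ in range(S)]
        )
        self.w_fuse = nn.Linear(S * c, c)
        # Stage 2: query/value projections and slice prototypes w_1..w_m.
        self.w_q = nn.Linear(c, c, bias=False)
        self.w_v = nn.Linear(c, c, bias=False)
        self.prototypes = nn.Parameter(torch.randn(m, c) / c ** 0.5)
        # Stage 3: standard multi-head self-attention over the m latent tokens.
        self.mhsa = nn.MultiheadAttention(c, heads, batch_first=True)

    def forward(self, X: torch.Tensor, G: torch.Tensor) -> torch.Tensor:
        # X: (n, c) physical features; G: (n, d) spatial coordinates.
        H = [phi(10.0 ** (-s) * G) for s, phi in enumerate(self.phis)]  # λ_s = 10^-(s-1)
        P_geo = F.relu(self.w_fuse(torch.cat(H, dim=-1)))               # (n, c)

        X_q = self.w_q(X) + P_geo                                       # geometry-informed queries
        A = F.softmax(X_q @ self.prototypes.t() / self.T, dim=-1)       # (n, m) soft assignment

        mass = A.sum(dim=0).clamp_min(1e-6)                             # D_jj = Σ_i A_ij
        Z = (A.t() @ self.w_v(X)) / mass[:, None]                       # (m, c) latent tokens
        Zp, _ = self.mhsa(Z[None], Z[None], Z[None])                    # global token mixing
        return A @ Zp[0]                                                # (n, c) back to mesh points

# Shape check on random data (n mesh points, c channels, d spatial dimensions).
layer = SpecGeoAttention(c=64, d=3)
X, G = torch.randn(10_000, 64), torch.rand(10_000, 3)
print(layer(X, G).shape)  # torch.Size([10000, 64])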
4. Computational Complexity, Scaling, and Guarantees
PGOT achieves linear complexity in the number of mesh points $n$ per attention layer, as the dominant operations involve $O(nmc)$ arithmetic, where $m$ is the number of latent tokens and typically much smaller than $n$. All other practical constants (channels $c$, scales $S$) are fixed. This is a marked improvement over standard self-attention, which scales as $O(n^2)$. Each layer's operations—geometry encoding, feature projection, tokenization, self-attention, and reconstruction—have been analyzed in detail and confirm the $O(n)$ complexity for fixed $m$, $c$, and $S$ (Zhang et al., 29 Dec 2025).
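To make the scaling concrete, the short sketch below tallies the multiply-accumulate counts of the dominant matrix products for fixed $m$ and $c$ (values chosen arbitrarily); the total grows linearly in $n$, in contrast with the quadratic term of dense self-attention.

# Rough multiply-accumulate counts of the dominant SpecGeo-Attention products
# (illustrative shapes; m and c fixed, n varying). Dense self-attention shown for contrast.
def specgeo_macs(n: int, c: int = 64, m: int = 32) -> int:
    slicing = n * c * m        # X_q · [w_1 … w_m]^T            -> (n, m)
    gather = n * m * c         # A^T · (X W_v)                  -> (m, c)
    token_attn = m * m * c     # MHSA over the m latent tokens  -> (m, c)
    scatter = n * m * c        # A · Z'                         -> (n, c)
    return slicing + gather + token_attn + scatter

def dense_attention_macs(n: int, c: int = 64) -> int:
    return 2 * n * n * c       # Q K^T and (softmax) · V

for n in (10_000, 100_000, 1_000_000):
    print(f"n={n:>9,d}  SpecGeo≈{specgeo_macs(n):,d}  dense≈{dense_attention_macs(n):,d}")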
Spectrally, the multi-scale geometric injection prevents high-frequency attenuation introduced by soft clustering and low-rank methods, ensuring critical boundary and interface information is retained, with particular benefit for solutions with sharp gradients or complex physical interfaces (Zhang et al., 29 Dec 2025).
5. Behavior on Complex, Unstructured Meshes
PGOT adapts its computations depending on local physical and geometric structure. In regions of laminar flow or smooth geometry, the geometry encodings are uniform; corresponding assignments cluster points into a small number of latent tokens, allowing the attention mechanism to capture low-order global modes efficiently. For domains with shocks, discontinuities, or high-curvature boundaries, multi-scale encodings detect and preserve detailed structures. The assignments subdivide the mesh into finer clusters aligned with physical boundaries, ensuring that the attention mechanism respects discontinuities—preserving, for example, sharp shock fronts near airfoil surfaces. By contrast, vanilla linear attention architectures tend to blur such discontinuities (Zhang et al., 29 Dec 2025).
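The following toy NumPy example illustrates the assignment mechanism with hand-picked features and prototypes (in PGOT both are learned): points on either side of a discontinuity receive different dominant latent tokens, and a lower temperature $T$ makes the soft assignment nearly one-hot.

import numpy as np

# Toy illustration of the soft assignment A = softmax(X_q · W^T / T) around a jump.
# Features and prototypes are hand-picked for clarity; in PGOT both are learned.
x = np.linspace(0.0, 1.0, 8)                       # 1D "mesh" coordinates
field = np.where(x < 0.5, -0.5, 0.5)               # centered step (shock-like) state
X_q = np.stack([field, x - 0.5], axis=1)           # toy geometry-informed queries
W = np.array([[-1.0, 0.0],                         # prototype for the pre-jump region
              [ 1.0, 0.0]])                        # prototype for the post-jump region

def soft_assign(X_q, W, T):
    logits = X_q @ W.T / T
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

for T in (1.0, 0.1):
    A = soft_assign(X_q, W, T)
    print(f"T={T}: max assignment weight per point =", A.max(axis=1).round(3))
    print("       dominant token per point         =", A.argmax(axis=1))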
6. Integration in PGOT and Application Scope
A full PGOT model comprises a stack of PhysGeoBlocks, each containing a SpecGeo-Attention layer and a TaylorDecomp-FFN. The architecture explicitly decouples global physical aggregation (via MHSA) from local geometry reconstruction (via geometry-informed assignments and encodings). This mitigates geometric aliasing—a major failure mode of other efficient attention and low-rank operator methods.
Additionally, the TaylorDecomp-FFN adaptively routes computation: a low-order, linear expert dominates in smooth regions, while a high-order, nonlinear expert is activated near discontinuities. The resultant model enables high-resolution field prediction for industrial-scale PDEs, such as elasticity on point clouds and flows around complex geometries (e.g., automotive or aerodynamic shapes with large unstructured meshes). Ablation studies confirm the centrality of SpecGeo-Attention for preserving boundary fidelity and overall accuracy. PGOT establishes new state-of-the-art results across multiple standard and industrial PDE benchmarks (Zhang et al., 29 Dec 2025).
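The internals of the TaylorDecomp-FFN are not spelled out in this summary; as one plausible reading of "adaptive routing between a low-order linear expert and a high-order nonlinear expert", the sketch below blends the two experts with a learned per-point sigmoid gate. The class name, widths, and gating form are assumptions for illustration, not the paper's design.

import torch
import torch.nn as nn

class TwoExpertFFN(nn.Module):
    """Hypothetical gated two-expert FFN in the spirit of TaylorDecomp-FFN:
    a cheap linear expert for smooth regions and a deeper nonlinear expert for
    sharp features, blended by a learned per-point gate (all details assumed)."""

    def __init__(self, c: int, hidden: int = 256):
        super().__init__()
        self.linear_expert = nn.Linear(c, c)                       # low-order expert
        self.nonlinear_expert = nn.Sequential(                     # high-order expert
            nn.Linear(c, hidden), nn.GELU(), nn.Linear(hidden, c)
        )
        self.gate = nn.Sequential(nn.Linear(c, 1), nn.Sigmoid())   # routing weight in (0, 1)

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        g = self.gate(X)                                           # (n, 1): ~0 smooth, ~1 sharp
        return (1.0 - g) * self.linear_expert(X) + g * self.nonlinear_expert(X)

ffn = TwoExpertFFN(c=64)
print(ffn(torch.randn(1024, 64)).shape)  # torch.Size([1024, 64])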