
Linear Attention Neural Operator (LANO)

Updated 23 October 2025
  • LANO is a neural operator that employs an agent-based attention mechanism to efficiently approximate mappings in parameterized PDEs with linear complexity.
  • The architecture reduces computational complexity from quadratic to linear while retaining high predictive accuracy, as demonstrated on multiple PDE benchmarks.
  • LANO offers universal approximation guarantees, enhanced stability, and flexible integration for real-time solvers and inverse problem applications in scientific computing.

The Linear Attention Neural Operator (LANO) is a neural architecture designed to efficiently learn mappings between function spaces, such as those arising in parameterized partial differential equations (PDEs), while overcoming the fundamental scalability–accuracy trade-off faced by transformer-based neural operators. Standard attention mechanisms provide excellent fidelity but incur quadratic complexity $\mathcal{O}(N^2 d)$ in the number of mesh points $N$ and hidden dimension $d$. Linear attention variants reduce this cost, but frequently degrade predictive accuracy. LANO introduces an agent-based attention strategy that achieves linear complexity $\mathcal{O}(MNd)$ (with $M \ll N$), yet retains the expressive power of softmax attention. This design is supported by theoretical guarantees of universal approximation and empirically demonstrates state-of-the-art accuracy across practical PDE benchmarks (Zhong et al., 19 Oct 2025).

1. Agent-Based Attention Mechanism

LANO reformulates attention via a compact set of agent tokens. Instead of direct full $N \times N$ attention, the architecture inserts $M$ agent tokens ($M \ll N$) that mediate global information exchange. The process has two stages:

  • Agent Aggregation: Agent tokens $\mathcal{A} \in \mathbb{R}^{M \times C}$ are constructed by pooling features from the full set of queries $\mathsf{Q}$. These agent tokens attend to the keys and values:

$$\mathsf{Y}_{\text{agg}} = \mathrm{softmax}\left(\frac{\mathcal{A} \mathsf{K}^\top}{\sqrt{d}}\right) \mathsf{V}$$

This summarizes global feature information. Cost: $\mathcal{O}(MNd)$.

  • Agent-Mediated Attention: Each original token then attends to the agents:

$$\mathsf{O}_{\text{agent}} = \mathrm{softmax}\left(\frac{\mathsf{Q} \mathcal{A}^\top}{\sqrt{d}}\right) \mathsf{Y}_{\text{agg}}$$

The agents distribute global context back to the original tokens, again at $\mathcal{O}(MNd)$ cost.

The agent mechanism provides an effective bottleneck for global interaction, essentially mirroring the effect of full attention but at much lower cost.
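The two-stage procedure above can be sketched in NumPy. This is an illustrative reconstruction, not the paper's implementation: in particular, the pooling operator that builds agent tokens from the queries (here, mean-pooling over contiguous index groups) is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(Q, K, V, M):
    """Two-stage agent attention: O(MNd) instead of O(N^2 d).

    Q, K, V: (N, d) token features; M: number of agent tokens (M << N).
    The agent construction (mean-pooling Q over M contiguous groups) is
    a simple stand-in for the paper's pooling operator.
    """
    N, d = Q.shape
    # Build agent tokens by pooling queries into M groups.
    groups = np.array_split(np.arange(N), M)
    A = np.stack([Q[g].mean(axis=0) for g in groups])   # (M, d)
    # Stage 1: agent aggregation -- agents attend to keys/values.
    Y_agg = softmax(A @ K.T / np.sqrt(d)) @ V           # (M, N) @ (N, d) -> (M, d)
    # Stage 2: agent-mediated attention -- tokens attend to agents.
    return softmax(Q @ A.T / np.sqrt(d)) @ Y_agg        # (N, M) @ (M, d) -> (N, d)
```

Both matrix products involve only $N \times M$ score matrices, so memory and compute stay linear in the number of tokens.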

2. Computational Complexity and Scalability

Traditional full attention computes the similarity matrix over $N^2$ token pairs, scaling as $\mathcal{O}(N^2 d)$. Previous linear attention designs reduce costs to $\mathcal{O}(N d^2)$ using kernel approximations, but typically at the expense of accuracy.

LANO’s agent-based attention operates via two sequential $\mathcal{O}(MNd)$ steps ($M \ll N$). This scaling allows models to handle large, finely discretized domains (e.g., high-resolution PDE meshes) that would otherwise be intractable.

Mechanism            Complexity             Accuracy
Softmax attention    $\mathcal{O}(N^2 d)$   High
Linear kernel-based  $\mathcal{O}(N d^2)$   Lower
LANO (agents)        $\mathcal{O}(MNd)$     High

LANO bridges the gap, achieving softmax-level accuracy with linear scaling in $N$.
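The scaling difference can be made concrete with back-of-the-envelope operation counts (score matmul plus value matmul per mechanism); these are rough estimates for illustration, not measurements from the paper.

```python
def attention_cost(N, d, M=64):
    """Approximate multiply-accumulate counts for one attention layer.

    Rough estimates only: each mechanism is charged for its score
    product and its value product, ignoring constants and pooling.
    """
    return {
        "softmax":       2 * N * N * d,   # O(N^2 d): Q K^T, then scores @ V
        "linear_kernel": 2 * N * d * d,   # O(N d^2): phi(K)^T V, then phi(Q) @ (.)
        "agent":         4 * M * N * d,   # O(MNd): two N-by-M score/value stages
    }

# Example: a fine mesh with N = 100,000 points, d = 64, M = 64 agents.
costs = attention_cost(N=100_000, d=64, M=64)
```

Doubling $N$ doubles the agent cost but quadruples the softmax cost, which is the practical content of the table above.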

3. Universal Approximation and Theoretical Properties

LANO is proven to be universally approximating: for any continuous operator between Sobolev spaces (e.g., $W^{s_1,p_1}(\Omega) \to W^{s_2,p_2}(\Omega)$), there exists a LANO parameterization $G_\theta$ such that

$$\sup_{a \in K} \|G^\dagger(a) - G_\theta(a)\|_{W^{s_2, p_2}} \leq \varepsilon$$

for any compact $K$ and $\varepsilon > 0$. The agent tokens can be interpreted as a Monte Carlo approximation of nonlocal kernel integrals, reflecting global structure and improving conditioning and stability in operator learning (Zhong et al., 19 Oct 2025).
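The Monte Carlo interpretation can be illustrated with a toy nonlocal integral: estimating $(Kv)(x) = \int k(x,y)\,v(y)\,dy$ from $M$ sampled points rather than the full $N$-point mesh, analogous to agents summarizing all tokens. The kernel and input function below are arbitrary examples chosen for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 4000, 64                          # full mesh size vs. number of "agents"

y = rng.uniform(0.0, 1.0, N)             # mesh points in the domain [0, 1]
v = np.sin(2 * np.pi * y)                # example input function on the mesh
kernel = lambda x, yy: np.exp(-np.abs(x - yy))   # example nonlocal kernel

x0 = 0.3
# N-point estimate of the nonlocal integral (K v)(x0).
full_est = np.mean(kernel(x0, y) * v)

# M-point "agent" estimate: the same integral from a small random subset.
idx = rng.choice(N, size=M, replace=False)
agent_est = np.mean(kernel(x0, y[idx]) * v[idx])
```

With $M \ll N$ samples the estimate carries the usual $\mathcal{O}(1/\sqrt{M})$ Monte Carlo error, which is the sense in which a small agent set can still capture the global kernel structure.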

4. Empirical Performance

Empirical tests on standard PDE benchmarks reveal strong performance improvements:

  • Elasticity: On a point cloud of 972 points, LANO achieves 37.5% lower error than Transolver.
  • Transonic Airfoil: 24.5% error reduction compared to prior state-of-the-art.
  • Pipe Flow (Navier–Stokes): 6–7% improvement.
  • Darcy Flow: 21.1% error reduction.

Across several benchmarks, the average improvement is reported as 19.5%. LANO demonstrates consistent accuracy and reduced inference cost on both structured grids and irregular geometries.

5. Architectural and Practical Implications

The agent-based reformulation makes LANO robust to problem scale and flexible for irregular domain applications:

  • Real-Time PDE Solvers: Universal operator inference allows near-instant solution prediction for new parameters.
  • Inverse Problems: Stable conditioning and nonlocal expressivity lend themselves to parameter inference under uncertainty.
  • Complex Geometry: The agent mechanism adapts to point cloud, mesh, or unstructured domain layouts, broadening applicability.
  • Scientific Simulation: Enables rapid surrogate modeling for engineering design, control, and optimization.

LANO’s flexibility enables integration into pipelines for uncertainty quantification and sequential experimental design.

6. Relation to Other Attention-Based Operators

LANO advances the paradigm of attention-based neural operators by overcoming traditional accuracy–scalability limitations. Compared to kernel-based linear attention (Li et al., 2020), low-rank projections in turbulence simulation (Peng et al., 2022), and coupled/Fourier attention operator designs, LANO’s agent tokens deliver both efficient global mixing and theoretical universality (Zhong et al., 19 Oct 2025).

The agent framework is compatible with ideas from continuum attention operator theory (Calvello et al., 2024) and RKHS-based regularization (Yu et al., 2024), and can be extended or hybridized with orthogonal attention (Xiao et al., 2023), latent space designs (Wang et al., 2024), or derivative-informed reduction (Go et al., 2024) as required by problem specifics.

7. Summary

The Linear Attention Neural Operator (LANO) is a neural operator architecture that achieves both linear-scalable complexity and softmax-level accuracy by mediating global attention through a compact set of agent tokens. It possesses universal approximation guarantees, exhibits enhanced stability and conditioning, and outperforms current state-of-the-art scientific machine learning models in predictive accuracy on a variety of PDE benchmarks. LANO’s architectural strategy and empirical results position it as a scalable foundation for scientific and engineering computation (Zhong et al., 19 Oct 2025).
