
Linear Attention Neural Operator (LANO)

Updated 23 October 2025
  • LANO is a neural operator that employs an agent-based attention mechanism to efficiently approximate mappings in parameterized PDEs with linear complexity.
  • The architecture reduces computational complexity from quadratic to linear while retaining high predictive accuracy, as demonstrated on multiple PDE benchmarks.
  • LANO offers universal approximation guarantees, enhanced stability, and flexible integration for real-time solvers and inverse problem applications in scientific computing.

The Linear Attention Neural Operator (LANO) is a neural architecture designed to efficiently learn mappings between function spaces, such as those arising in parameterized partial differential equations (PDEs), while overcoming the fundamental scalability–accuracy trade-off faced by transformer-based neural operators. Standard attention mechanisms provide excellent fidelity but incur quadratic complexity $\mathcal{O}(N^2 d)$ in the number of mesh points $N$ and hidden dimension $d$. Linear attention variants reduce this cost, but frequently degrade predictive accuracy. LANO introduces an agent-based attention strategy that achieves linear complexity $\mathcal{O}(MNd)$ (with $M \ll N$), yet retains the expressive power of softmax attention. This design is supported by theoretical guarantees of universal approximation and empirically demonstrates state-of-the-art accuracy across practical PDE benchmarks (Zhong et al., 19 Oct 2025).

1. Agent-Based Attention Mechanism

LANO reformulates attention via a compact set of agent tokens. Instead of direct full $N \times N$ attention, the architecture inserts $M$ agent tokens ($M \ll N$) that mediate global information exchange. The process has two stages:

  • Agent Aggregation: Agent tokens $\mathcal{A} \in \mathbb{R}^{M \times C}$ are constructed by pooling features from the full set of queries $\mathsf{Q}$. These agent tokens attend to the keys and values:

$$\mathsf{Y}_{\text{agg}} = \mathrm{softmax}\left(\frac{\mathcal{A} \mathsf{K}^\top}{\sqrt{d}}\right) \mathsf{V}$$

This summarizes global feature information. Cost: $\mathcal{O}(MNd)$.

  • Agent-Mediated Attention: Each original token then attends to the agents:

$$\mathsf{O}_{\text{agent}} = \mathrm{softmax}\left(\frac{\mathsf{Q} \mathcal{A}^\top}{\sqrt{d}}\right) \mathsf{Y}_{\text{agg}}$$

The agents distribute global context back to the original tokens, again at $\mathcal{O}(MNd)$ cost.

The agent mechanism provides an effective bottleneck for global interaction, essentially mirroring the effect of full attention but at much lower cost.
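The two-stage procedure above can be sketched in NumPy. This is an illustrative reconstruction, not the paper's implementation: in particular, the pooling operator that builds agent tokens from the queries (here, mean-pooling over contiguous index groups) is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(Q, K, V, M):
    """Two-stage agent attention: O(MNd) instead of O(N^2 d).

    Q, K, V: (N, d) token features; M: number of agent tokens (M << N).
    The agent construction (mean-pooling Q over M contiguous groups) is
    a simple stand-in for the paper's pooling operator.
    """
    N, d = Q.shape
    # Build agent tokens by pooling queries into M groups.
    groups = np.array_split(np.arange(N), M)
    A = np.stack([Q[g].mean(axis=0) for g in groups])   # (M, d)
    # Stage 1: agent aggregation -- agents attend to keys/values.
    Y_agg = softmax(A @ K.T / np.sqrt(d)) @ V           # (M, N) @ (N, d) -> (M, d)
    # Stage 2: agent-mediated attention -- tokens attend to agents.
    return softmax(Q @ A.T / np.sqrt(d)) @ Y_agg        # (N, M) @ (M, d) -> (N, d)
```

Both matrix products involve only $N \times M$ score matrices, so memory and compute stay linear in the number of tokens.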

2. Computational Complexity and Scalability

Traditional full attention computes the similarity matrix over $N^2$ token pairs, scaling as $\mathcal{O}(N^2 d)$. Previous linear attention designs reduce costs to $\mathcal{O}(N d^2)$ using kernel approximations, but typically at the expense of accuracy.

LANO’s agent-based attention operates via two sequential $\mathcal{O}(MNd)$ steps ($M \ll N$). This scaling allows models to handle large, finely discretized domains (e.g., high-resolution PDE meshes) that would otherwise be intractable.

Mechanism            Complexity             Accuracy
Softmax attention    $\mathcal{O}(N^2 d)$   High
Linear kernel-based  $\mathcal{O}(N d^2)$   Lower
LANO (agents)        $\mathcal{O}(MNd)$     High

LANO bridges the gap, achieving softmax-level accuracy with linear scaling in $N$.
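The scaling difference can be made concrete with back-of-the-envelope operation counts (score matmul plus value matmul per mechanism); these are rough estimates for illustration, not measurements from the paper.

```python
def attention_cost(N, d, M=64):
    """Approximate multiply-accumulate counts for one attention layer.

    Rough estimates only: each mechanism is charged for its score
    product and its value product, ignoring constants and pooling.
    """
    return {
        "softmax":       2 * N * N * d,   # O(N^2 d): Q K^T, then scores @ V
        "linear_kernel": 2 * N * d * d,   # O(N d^2): phi(K)^T V, then phi(Q) @ (.)
        "agent":         4 * M * N * d,   # O(MNd): two N-by-M score/value stages
    }

# Example: a fine mesh with N = 100,000 points, d = 64, M = 64 agents.
costs = attention_cost(N=100_000, d=64, M=64)
```

Doubling $N$ doubles the agent cost but quadruples the softmax cost, which is the practical content of the table above.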

3. Universal Approximation and Theoretical Properties

LANO is proven to be universally approximating: for any continuous operator between Sobolev spaces (e.g., $W^{s_1,p_1}(\Omega) \to W^{s_2,p_2}(\Omega)$), there exists a LANO parameterization $G_\theta$ such that

$$\sup_{a \in K} \|G^\dagger(a) - G_\theta(a)\|_{W^{s_2, p_2}} \leq \varepsilon$$

for any compact $K$ and $\varepsilon > 0$. The agent tokens can be interpreted as a Monte Carlo approximation of nonlocal kernel integrals, reflecting global structure and improving conditioning and stability in operator learning (Zhong et al., 19 Oct 2025).
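The Monte Carlo interpretation can be illustrated with a toy nonlocal integral: estimating $(Kv)(x) = \int k(x,y)\,v(y)\,dy$ from $M$ sampled points rather than the full $N$-point mesh, analogous to agents summarizing all tokens. The kernel and input function below are arbitrary examples chosen for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 4000, 64                          # full mesh size vs. number of "agents"

y = rng.uniform(0.0, 1.0, N)             # mesh points in the domain [0, 1]
v = np.sin(2 * np.pi * y)                # example input function on the mesh
kernel = lambda x, yy: np.exp(-np.abs(x - yy))   # example nonlocal kernel

x0 = 0.3
# N-point estimate of the nonlocal integral (K v)(x0).
full_est = np.mean(kernel(x0, y) * v)

# M-point "agent" estimate: the same integral from a small random subset.
idx = rng.choice(N, size=M, replace=False)
agent_est = np.mean(kernel(x0, y[idx]) * v[idx])
```

With $M \ll N$ samples the estimate carries the usual $\mathcal{O}(1/\sqrt{M})$ Monte Carlo error, which is the sense in which a small agent set can still capture the global kernel structure.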

4. Empirical Performance

Empirical tests on standard PDE benchmarks reveal strong performance improvements:

  • Elasticity: On a point cloud of 972 points, LANO achieves 37.5% lower error than Transolver.
  • Transonic Airfoil: 24.5% error reduction compared to prior state-of-the-art.
  • Pipe Flow (Navier–Stokes): 6–7% improvement.
  • Darcy Flow: 21.1% error reduction.

Across several benchmarks, the average improvement is reported as 19.5%. LANO demonstrates consistent accuracy and reduced inference cost on both structured grids and irregular geometries.

5. Architectural and Practical Implications

The agent-based reformulation makes LANO robust to problem scale and flexible for irregular domain applications:

  • Real-Time PDE Solvers: Universal operator inference allows near-instant solution prediction for new parameters.
  • Inverse Problems: Stable conditioning and nonlocal expressivity lend themselves to parameter inference under uncertainty.
  • Complex Geometry: The agent mechanism adapts to point cloud, mesh, or unstructured domain layouts, broadening applicability.
  • Scientific Simulation: Enables rapid surrogate modeling for engineering design, control, and optimization.

LANO’s flexibility enables integration into pipelines for uncertainty quantification and sequential experimental design.

6. Relation to Other Attention-Based Operators

LANO advances the paradigm of attention-based neural operators by overcoming traditional accuracy–scalability limitations. Compared to kernel-based linear attention (Li et al., 2020), low-rank projections in turbulence simulation (Peng et al., 2022), and coupled/Fourier attention operator designs, LANO’s agent tokens deliver both efficient global mixing and theoretical universality (Zhong et al., 19 Oct 2025).

The agent framework is compatible with ideas from continuum attention operator theory (Calvello et al., 2024) and RKHS-based regularization (Yu et al., 2024), and can be extended or hybridized with orthogonal attention (Xiao et al., 2023), latent space designs (Wang et al., 2024), or derivative-informed reduction (Go et al., 2024) as required by problem specifics.

7. Summary

The Linear Attention Neural Operator (LANO) is a neural operator architecture that achieves both linear-scalable complexity and softmax-level accuracy by mediating global attention through a compact set of agent tokens. It possesses universal approximation guarantees, exhibits enhanced stability and conditioning, and outperforms current state-of-the-art scientific machine learning models in predictive accuracy on a variety of PDE benchmarks. LANO’s architectural strategy and empirical results position it as a scalable foundation for scientific and engineering computation (Zhong et al., 19 Oct 2025).
