
Multipole Attention Neural Operator (MANO)

Updated 21 April 2026
  • The paper introduces MANO as a novel neural architecture that reformulates self-attention using multipole expansions and FMM principles to achieve linear computational complexity.
  • The methodology employs a hierarchical structure combining fine-grained local windowed attention with coarse far-field approximations for scalable global context integration.
  • Empirical evaluations demonstrate MANO’s superior accuracy and efficiency in tasks such as image classification and Darcy Flow operator learning compared to traditional models.

The Multipole Attention Neural Operator (MANO) is a neural architecture that reformulates self-attention using principles from multipole expansions and the fast multipole method (FMM), achieving both linear computational complexity and a global receptive field. MANO enables scalable modeling of high-dimensional data and integral operator learning, particularly in image analysis and scientific computing, while drawing explicit inspiration from n-body interactions and classical FMM methodologies (Colagrande et al., 3 Jul 2025, Fognini et al., 24 Sep 2025).

1. Theoretical Foundations

MANO is grounded in the analogy between self-attention and n-body potential interactions. Standard self-attention computes all-pair interactions A_{ij} between N tokens using a kernel κ(Q_i, K_j) = exp(Q_i^T K_j / √d), resulting in quadratic O(N^2) scaling. FMM, by contrast, achieves O(N) or O(N log N) scaling for long-range interaction computations in physics via multipole expansions and hierarchical grouping.
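As a concrete baseline, here is a minimal NumPy sketch of this dense attention (names and sizes are illustrative, not the paper's code); it materializes the full N×N interaction matrix A_{ij}, which is exactly the source of the O(N^2) cost.

```python
import numpy as np

def dense_attention(Q, K, V):
    """Standard softmax attention: time and memory scale as O(N^2)."""
    N, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # full N x N matrix A_ij
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)                  # row-wise softmax
    return A @ V                                       # every token sees every token

rng = np.random.default_rng(0)
N, d = 64, 8
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = dense_attention(Q, K, V)
print(out.shape)  # (64, 8)
```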

By recasting attention mechanisms as equivalent to multipole expansions, MANO approximates the sum of all pairwise interactions through a hierarchical decomposition of near-field and far-field effects. At each scale ℓ in the hierarchy, grid locations or tokens are clustered; local interactions are computed at full resolution (near-field), while interactions at increasing distance (far-field) are handled on progressively coarser grids. This mirrors the FMM computation pattern in which local direct sums and low-rank multipole approximations are combined (Colagrande et al., 3 Jul 2025, Fognini et al., 24 Sep 2025).
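The near-/far-field split can be illustrated with a toy 1-D interaction sum (the kernel, cluster size, and positions below are our illustrative choices, not the paper's): interactions with adjacent clusters are computed exactly at full resolution, while distant clusters are summarized by a single monopole term at their center.

```python
import numpy as np

rng = np.random.default_rng(1)
N, C = 256, 16                             # tokens and cluster size
x = np.sort(rng.uniform(0, 1, N))          # 1-D "token" positions
q = rng.uniform(0.5, 1.0, N)               # per-token "charges"
kernel = lambda r: 1.0 / (1.0 + 10.0 * r)  # smooth, decaying interaction

# Exact O(N^2) potentials
R = np.abs(x[:, None] - x[None, :])
exact = (kernel(R) * q[None, :]).sum(axis=1)

# Hierarchical approximation: full resolution within a local neighbourhood
# of clusters, one cluster-level ("monopole") term beyond it.
centers = x.reshape(-1, C).mean(axis=1)
charges = q.reshape(-1, C).sum(axis=1)
approx = np.zeros(N)
for ci in range(N // C):
    idx = slice(ci * C, (ci + 1) * C)
    for cj in range(N // C):
        jdx = slice(cj * C, (cj + 1) * C)
        if abs(ci - cj) <= 1:              # near field: direct sum
            approx[idx] += (kernel(np.abs(x[idx, None] - x[None, jdx]))
                            * q[jdx]).sum(axis=1)
        else:                              # far field: coarse approximation
            approx[idx] += kernel(np.abs(x[idx] - centers[cj])) * charges[cj]

rel_err = np.abs(approx - exact).max() / np.abs(exact).max()
print(f"max relative error: {rel_err:.4f}")
```

The far-field error stays small because the kernel varies slowly across well-separated clusters, which is the same smoothness assumption FMM exploits.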

2. Mathematical and Algorithmic Formulation

The MANO attention mechanism introduces L levels of dyadic downsampling via a shared strided convolution Conv↓. For each level ℓ = 0, …, L−1:

  • Feature maps X^ℓ are computed recursively via X^{ℓ+1} = Conv↓(X^ℓ).
  • Query, key, and value projections Q^ℓ, K^ℓ, V^ℓ are formed by learned linear mappings.
  • Windowed self-attention is performed within local windows of size w on X^ℓ; the attention matrix A^ℓ is block-sparse, covering only tokens within the same window.
  • For far-field levels ℓ ≥ 1, the outputs are upsampled with a shared transposed convolution Conv↑.
  • The final representation is the sum of all upsampled level outputs:

Y = Σ_{ℓ=0}^{L−1} (Conv↑)^ℓ( Attention(Q^ℓ, K^ℓ, V^ℓ) )

The full algorithmic listing is given as pseudocode in Colagrande et al. (3 Jul 2025).

This structure ensures every input token receives global context via far-field communication at coarse levels while preserving local detail through fine-level interactions.
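A minimal 1-D NumPy sketch of this forward pass is given below. It is a schematic of the scheme described above, not the paper's implementation: average pooling and nearest-neighbour repetition stand in for the learned shared (transposed) convolutions, and a single shared set of projection matrices is used for brevity.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def windowed_attention(X, W, w):
    """Block-sparse attention within non-overlapping windows of size w."""
    Wq, Wk, Wv = W
    N, d = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    out = np.zeros_like(X)
    for s in range(0, N, w):
        q, k, v = Q[s:s+w], K[s:s+w], V[s:s+w]
        out[s:s+w] = softmax(q @ k.T / np.sqrt(d)) @ v
    return out

def mano_forward(X, W, L=3, w=8):
    """Hierarchical attention: near field at full resolution, far field on
    dyadically coarsened copies, summed after upsampling to input length.
    Pooling/repetition stand in for the learned shared convolutions."""
    N = X.shape[0]
    Y = np.zeros_like(X)
    for level in range(L):
        O = windowed_attention(X, W, w)
        Y += np.repeat(O, 2 ** level, axis=0)[:N]  # upsample level output
        X = 0.5 * (X[0::2] + X[1::2])              # dyadic downsampling
    return Y

rng = np.random.default_rng(0)
N, d = 64, 16
X = rng.standard_normal((N, d))
W = tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
Y = mano_forward(X, W)
print(Y.shape)  # (64, 16)
```

Level 0 supplies the fine-grained near field, while levels 1 and 2 contribute coarse far-field context that reaches every output token after upsampling.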

3. Complexity and Scaling Properties

MANO achieves linear time and memory complexity with respect to the input sequence length N, a significant improvement over traditional self-attention:

  • At each scale ℓ, the number of tokens is N_ℓ = N / 2^{dℓ} on a d-dimensional grid, so the number of windows is N_ℓ / w^d for window size w.
  • Windowed attention per level costs O(N_ℓ · w^d).
  • Summing the geometric series over all L levels yields O(N · w^d) = O(N) operations (with w constant).

In contrast, dense self-attention requires O(N^2) operations and memory. MANO therefore addresses the scalability bottleneck of attention-based models and extends to large inputs and high-resolution scientific fields (Colagrande et al., 3 Jul 2025).
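The scaling argument can be checked with a simple operation count in 1-D (the window size and stopping rule below are illustrative): doubling N roughly doubles the hierarchical cost but quadruples the dense cost.

```python
def dense_ops(N):
    """Pairwise interactions in standard self-attention: N^2."""
    return N * N

def mano_ops(N, w=8):
    """Windowed attention at every dyadic level until one window remains:
    level l has N / 2**l tokens, each attending within a window of size w,
    so the per-level cost is (tokens / w) windows * w*w ops = tokens * w."""
    ops, tokens = 0, N
    while tokens >= w:
        ops += tokens * w
        tokens //= 2
    return ops

for N in (1024, 2048, 4096):
    print(N, dense_ops(N), mano_ops(N))
```

The geometric series N + N/2 + N/4 + … sums to less than 2N, which is why the total stays O(N · w) despite attending at every level.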

4. Network Architectures and Applications

MANO has been integrated into both vision and scientific neural operator benchmarks:

A. Vision (Image Classification)

  • Four-stage layout and embedding dimensions aligned with SwinV2-Tiny, with windowed attention at a fixed window size.
  • Replacement of every Swin attention block with a hierarchical, multiscale MANO attention block, with the number of hierarchy levels set per stage.
  • Maintains plug-and-play compatibility, adding only the shared down- and upsampling convolutions as extra parameters.

B. Scientific Computing (Darcy Flow Operator)

  • Operates on regular grids; the grid sizes, embedding dimension, number of attention heads, hierarchical depth L, and window size are fixed hyperparameters (exact settings in Colagrande et al., 3 Jul 2025).
  • Input features include spatial coordinates and physical coefficients; the final output is produced by a pointwise MLP.

Across both domains, MANO preserves both fine-grained detail and global correlations and is robust to increases in input resolution (Colagrande et al., 3 Jul 2025).

5. Empirical Evaluation

MANO shows strong empirical performance, with competitive-to-superior accuracy and efficiency on the reported benchmarks:

Image classification (top-1 accuracy, %):

Model        Tiny-IN-200   CIFAR-100
ViT-base     73.1          80.6
SwinV2-T     80.5          75.5
MANO-tiny    87.5          85.1

Darcy Flow operator learning (relative MSE, lower is better):

Model           rel. MSE
FNO             0.0035
ViT (patch=4)   0.0019
LocalAttn       0.0431
MANO            0.0013 (at both evaluated resolutions)
  • In image classification, MANO-tiny achieves top-1 accuracies of 87.5% on Tiny-IN-200 and 85.1% on CIFAR-100.
  • For Darcy Flow, MANO attains a relative MSE of 0.0013 at both evaluated resolutions, consistently outperforming FNO and ViT (Colagrande et al., 3 Jul 2025).
  • Runtime and memory benchmarks show MANO-tiny achieving higher throughput and lower peak memory than SwinV2-T and ViT-base.

These results underscore the architecture's efficiency and competitive accuracy across tasks.

6. Relationship to Fast Multipole Methods and Neural FMM

MANO is situated within a broader trend of incorporating multipole expansions into neural architectures, exemplified by the Neural FMM approach. The Neural FMM replaces FMM's linear translation operators with learned MLPs operating within a hierarchical tree structure. The mapping between FMM passes—upward (P2M, M2M), interaction (M2L), and downward (L2L, L2P, near-field)—and corresponding neural modules is explicit. MANO generalizes this by interpreting M2L translations as multi-head cross-attention over box-level tokens, enabling learned, data-driven kernels for non-analytic or heterogeneous domains (Fognini et al., 24 Sep 2025).
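The M2L-as-cross-attention view can be sketched as follows (single-head for brevity; all shapes and names are our illustrative choices, not the papers' code): each target-box token queries the multipole tokens of its interaction list, and the attended value is a learned local expansion.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def m2l_cross_attention(targets, multipoles, Wq, Wk, Wv):
    """Illustrative M2L pass as cross-attention over box-level tokens.
    targets: (B, d) target-box tokens; multipoles: (B, I, d) multipole
    tokens of each target's interaction list of size I."""
    d = targets.shape[-1]
    Q = targets @ Wq                               # (B, d)
    K = multipoles @ Wk                            # (B, I, d)
    V = multipoles @ Wv                            # (B, I, d)
    scores = np.einsum('bd,bid->bi', Q, K) / np.sqrt(d)
    A = softmax(scores)                            # weights over interaction list
    return np.einsum('bi,bid->bd', A, V)           # learned local expansion

rng = np.random.default_rng(0)
B, I, d = 32, 27, 16                               # boxes, list size, feature dim
targets = rng.standard_normal((B, d))
multipoles = rng.standard_normal((B, I, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
local = m2l_cross_attention(targets, multipoles, Wq, Wk, Wv)
print(local.shape)  # (32, 16)
```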

A distinguishing feature of MANO in this context is its support for regularizing attention weights to match the known decay of the Green's function with distance, pretraining on analytic kernels, and extending to continuous, point-cloud-indexed tokens for discretization invariance.
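One such decay regularizer can be sketched as an MSE penalty pulling attention rows toward a normalized decaying target kernel (the power-law form and the parameters below are our illustrative choices, not the paper's exact kernel):

```python
import numpy as np

def decay_regularizer(attn, dist, alpha=1.0, eps=1e-3):
    """MSE penalty encouraging attention rows to follow a target decay
    g(r) = 1/(r + eps)**alpha, normalized like an attention distribution.
    The power-law target and alpha are illustrative stand-ins for a known
    Green's-function decay."""
    target = 1.0 / (dist + eps) ** alpha
    target = target / target.sum(axis=-1, keepdims=True)
    return np.mean((attn - target) ** 2)

# Pairwise distances between 1-D token positions
pos = np.linspace(0.0, 1.0, 16)
dist = np.abs(pos[:, None] - pos[None, :])

target_like = 1.0 / (dist + 1e-3)
target_like /= target_like.sum(axis=-1, keepdims=True)
uniform = np.full((16, 16), 1.0 / 16)

print(decay_regularizer(target_like, dist))  # ~0: already matches the decay
print(decay_regularizer(uniform, dist))      # > 0: uniform attention penalized
```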

7. Limitations and Prospective Developments

MANO's main constraints are:

  • Fixed Structural Hierarchy: The hierarchy of downsampling/upsampling is fixed and parameter-sharing is enforced, potentially limiting adaptability to non-uniform or highly anisotropic data.
  • Grid Dependence: The architecture is naturally suited to regular grids; adaptation to irregular meshes or graphs remains an open challenge.
  • No Explicit Inter-Level Attention: All inter-scale communication is via additive upsampled features rather than cross-level attention.

Research directions include adaptive hierarchical clustering, extensions to unstructured domains (e.g., via MGNO), embedding in recurrent/temporal PDE solution schemes, and adaptation for dense vision tasks such as semantic segmentation (Colagrande et al., 3 Jul 2025, Fognini et al., 24 Sep 2025).

In summary, the Multipole Attention Neural Operator constitutes a principled synthesis of attention mechanisms and multipole expansions, enabling scalable neural architectures with physical interpretability and wide applicability to vision and operator learning.
