Multipole Attention Neural Operator (MANO)
- The paper introduces MANO as a novel neural architecture that reformulates self-attention using multipole expansions and FMM principles to achieve linear computational complexity.
- The methodology employs a hierarchical structure combining fine-grained local windowed attention with coarse far-field approximations for scalable global context integration.
- Empirical evaluations demonstrate MANO’s superior accuracy and efficiency relative to traditional models on tasks such as image classification and Darcy Flow operator learning.
The Multipole Attention Neural Operator (MANO) is a neural architecture that reformulates self-attention using principles from multipole expansions and the fast multipole method (FMM), achieving both linear computational complexity and a global receptive field. MANO enables scalable modeling of high-dimensional data and integral operator learning, particularly in image analysis and scientific computing, while drawing explicit inspiration from $N$-body interactions and classical FMM methodologies (Colagrande et al., 3 Jul 2025, Fognini et al., 24 Sep 2025).
1. Theoretical Foundations
MANO is grounded in the analogy between self-attention and $N$-body potential interactions. Standard self-attention computes all-pair interactions between tokens through a softmax similarity kernel, resulting in quadratic $O(N^2)$ scaling in the number of tokens $N$. FMM, by contrast, achieves $O(N)$ or $O(N \log N)$ scaling for long-range interaction computations in physics via multipole expansions and hierarchical grouping.
By recasting attention mechanisms as equivalent to multipole expansions, MANO approximates the sum of all pairwise interactions through a hierarchical decomposition of near-field and far-field effects. At each scale in the hierarchy, grid locations or tokens are clustered; local interactions are computed at full resolution (near-field), while interactions at increasing distance (far-field) are handled on progressively coarser grids. This mirrors the FMM computation pattern in which local direct sums and low-rank multipole approximations are combined (Colagrande et al., 3 Jul 2025, Fognini et al., 24 Sep 2025).
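To make the near-field/far-field decomposition concrete, the following minimal NumPy sketch (an illustrative toy, not the paper's implementation; `kernel`, `hierarchical_sum`, and the cell size are hypothetical choices) splits a dense 1-D pairwise interaction sum into an exact sum over neighbouring cells plus a single coarse interaction per well-separated cell:

```python
import numpy as np

def kernel(xi, xj):
    # Smooth, slowly decaying interaction kernel (illustrative choice).
    return 1.0 / (1.0 + np.abs(xi - xj))

def hierarchical_sum(x, v, cell=8):
    # Split y_i = sum_j kernel(x_i, x_j) * v_j into an exact near-field part
    # (the cell containing i plus its neighbours) and a far-field part where
    # each well-separated cell contributes one coarse interaction.
    n = len(x)
    n_cells = n // cell
    x_coarse = x.reshape(n_cells, cell).mean(axis=1)  # cell "centres"
    v_coarse = v.reshape(n_cells, cell).sum(axis=1)   # aggregated "charges"
    y = np.zeros(n)
    for i in range(n):
        ci = i // cell
        for c in range(n_cells):
            lo, hi = c * cell, (c + 1) * cell
            if abs(c - ci) <= 1:
                # Near field: exact pairwise sum over neighbouring cells.
                y[i] += np.sum(kernel(x[i], x[lo:hi]) * v[lo:hi])
            else:
                # Far field: one interaction with the cell's coarse summary.
                y[i] += kernel(x[i], x_coarse[c]) * v_coarse[c]
    return y

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64)
v = rng.normal(size=64)
dense = np.array([np.sum(kernel(xi, x) * v) for xi in x])
approx = hierarchical_sum(x, v)
print("max abs error of the hierarchical approximation:", np.abs(dense - approx).max())
```

Applied recursively over multiple levels, and with the interactions computed by learned attention rather than a fixed kernel, this is the pattern MANO's hierarchy follows.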
2. Mathematical and Algorithmic Formulation
The MANO attention mechanism introduces $L$ levels of dyadic downsampling via a shared strided convolution $D$. For each level $\ell = 0, 1, \dots, L-1$:
- Feature maps $x_\ell$ are computed recursively via $x_\ell = D(x_{\ell-1})$, with $x_0$ the embedded input.
- Query, key, and value projections $Q_\ell, K_\ell, V_\ell$ are formed by learned linear mappings of $x_\ell$.
- Windowed self-attention is performed within local windows of size $w \times w$ on $x_\ell$; the attention matrix $A_\ell$ is block-sparse, covering only tokens within the same window.
- For the far-field levels $\ell \geq 1$, the outputs are upsampled back to the input resolution with a shared transposed convolution $U$, applied $\ell$ times.
- The final representation is the sum of all upsampled level outputs:

$$y \;=\; \sum_{\ell = 0}^{L-1} U^{\ell}\!\bigl(\mathrm{Attn}_w(Q_\ell, K_\ell, V_\ell)\bigr).$$
The algorithmic structure combines, at every level, shared downsampling, windowed attention, and shared upsampling, with all level outputs summed (Colagrande et al., 3 Jul 2025).
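A minimal PyTorch-style sketch of this block (a reconstruction from the description above, not the authors' pseudocode; the class name, hyperparameters, and the use of `torch.nn.MultiheadAttention` are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MANOAttention(nn.Module):
    """Sketch of a MANO attention block: shared strided downsampling, windowed
    attention at every level, shared transposed-conv upsampling, additive sum."""

    def __init__(self, dim, num_levels=3, window=8, num_heads=4):
        super().__init__()
        self.num_levels = num_levels
        self.window = window
        self.down = nn.Conv2d(dim, dim, kernel_size=2, stride=2)          # shared D
        self.up = nn.ConvTranspose2d(dim, dim, kernel_size=2, stride=2)   # shared U
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def window_attention(self, x):
        # x: (B, C, H, W); block-sparse attention within w x w windows.
        B, C, H, W = x.shape
        w = self.window
        x = x.view(B, C, H // w, w, W // w, w)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)   # windows as batch
        x, _ = self.attn(x, x, x)
        x = x.view(B, H // w, W // w, w, w, C)
        return x.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

    def forward(self, x):
        outputs, feats = [], x
        for level in range(self.num_levels):
            z = self.window_attention(feats)
            for _ in range(level):            # far-field levels: upsample back
                z = self.up(z)
            outputs.append(z)
            if level < self.num_levels - 1:
                feats = self.down(feats)      # coarsen for the next level
        return sum(outputs)                   # near-field + far-field contributions
```

For example, with a 32x32 feature map, `dim=64`, `window=8`, and `num_levels=3`, the three levels operate on 32-, 16-, and 8-resolution grids, and all outputs are upsampled back to 32x32 before summation.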
This structure ensures every input token receives global context via far-field communication at coarse levels while preserving local detail through fine-level interactions.
3. Complexity and Scaling Properties
MANO achieves linear time and memory complexity with respect to the input sequence length $N$, a significant improvement over traditional self-attention:
- At each scale $\ell$, the number of tokens is $N_\ell = N / 4^{\ell}$ (for 2D inputs coarsened by a factor of two per level), partitioned into $N_\ell / w^2$ windows of size $w \times w$.
- Windowed attention per level costs $O(N_\ell\, w^2)$.
- Summing over all levels yields $O(N w^2)$ operations, i.e., $O(N)$ with $w$ constant.
In contrast, dense self-attention requires $O(N^2)$ operations and memory. MANO, therefore, addresses the scalability bottleneck of attention-based models and extends to large inputs and high-resolution scientific fields (Colagrande et al., 3 Jul 2025).
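The linear bound follows from a geometric series over the per-level costs (a short check using the notation above):

$$\sum_{\ell=0}^{L-1} O\!\left(\frac{N}{4^{\ell}}\, w^{2}\right) \;\le\; O\!\left(w^{2}\, N \sum_{\ell=0}^{\infty} 4^{-\ell}\right) \;=\; O\!\left(\tfrac{4}{3}\, w^{2}\, N\right) \;=\; O(N) \quad \text{for fixed } w.$$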
4. Network Architectures and Applications
MANO has been integrated into both vision and scientific neural operator benchmarks:
A. Vision (Image Classification)
- Embedding dimension, stage layout, and window size follow SwinV2-Tiny, with four aligned stages.
- Replacement of every Swin attention block with a hierarchical, multiscale MANO attention block, with a fixed number of hierarchy levels per stage.
- Maintains plug-and-play compatibility, adding only the shared down- and upsampling convolutions as extra parameters.
B. Scientific Computing (Darcy Flow Operator)
- Operates on regular 2D grids at the benchmark resolutions.
- Uses a compact configuration of embedding dimension, attention heads, hierarchical depth, and window size.
- Input features include spatial coordinates and physical coefficients; final output via pointwise MLP.
Across both domains, MANO preserves both fine-grained detail and global correlations and is robust to increases in input resolution (Colagrande et al., 3 Jul 2025).
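To illustrate how such a configuration can be wired together for the Darcy Flow operator, here is a hypothetical end-to-end sketch (reusing the `MANOAttention` sketch from Section 2; the `DarcyMANO` name, channel counts, and pointwise head are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class DarcyMANO(nn.Module):
    """Hypothetical Darcy-Flow operator: coordinates + coefficient field are
    lifted pointwise, processed by a MANO encoder, and decoded by a pointwise MLP."""

    def __init__(self, dim=64, num_levels=3, window=8):
        super().__init__()
        self.lift = nn.Conv2d(3, dim, kernel_size=1)            # (a, x, y) -> embedding
        self.encoder = MANOAttention(dim, num_levels, window)   # sketch from Section 2
        self.head = nn.Sequential(                               # pointwise MLP output
            nn.Conv2d(dim, dim, kernel_size=1), nn.GELU(),
            nn.Conv2d(dim, 1, kernel_size=1),
        )

    def forward(self, coeff):
        # coeff: (B, 1, H, W) permeability field on a regular grid
        # (H, W assumed divisible by window * 2**(num_levels - 1)).
        B, _, H, W = coeff.shape
        ys, xs = torch.meshgrid(
            torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij"
        )
        grid = torch.stack([xs, ys]).expand(B, 2, H, W).to(coeff)
        feats = self.lift(torch.cat([coeff, grid], dim=1))
        return self.head(self.encoder(feats))                   # predicted solution field
```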
5. Empirical Evaluation
MANO exhibits state-of-the-art empirical performance on key benchmarks:
| Model | Tiny-IN-200 top-1 (%) | CIFAR-100 top-1 (%) | Darcy rel. MSE (base res.) | Darcy rel. MSE (higher res.) |
|---|---|---|---|---|
| ViT-base | 73.1 | 80.6 | – | – |
| SwinV2-T | 80.5 | 75.5 | – | – |
| MANO-tiny | 87.5 | 85.1 | – | – |
| FNO | – | – | 0.0035 | – |
| ViT (patch=4) | – | – | 0.0019 | – |
| LocalAttn | – | – | 0.0431 | – |
| MANO | – | – | 0.0013 | 0.0013 |
- In image classification, MANO-tiny achieves top-1 accuracies of 87.5% on Tiny-IN-200 and 85.1% on CIFAR-100.
- For Darcy Flow, MANO attains a relative MSE of 0.0013, consistently outperforming FNO and ViT (Colagrande et al., 3 Jul 2025).
- Runtime and memory benchmarks show MANO-tiny delivering higher image throughput and lower peak memory than SwinV2-T and ViT-base.
These results underscore the architecture's efficiency and competitive accuracy across tasks.
6. Relationship to Fast Multipole Methods and Neural FMM
MANO is situated within a broader trend of incorporating multipole expansions into neural architectures, exemplified by the Neural FMM approach. The Neural FMM replaces FMM's linear translation operators with learned MLPs operating within a hierarchical tree structure. The mapping between FMM passes—upward (P2M, M2M), interaction (M2L), and downward (L2L, L2P, near-field)—and corresponding neural modules is explicit. MANO generalizes this by interpreting M2L translations as multi-head cross-attention over box-level tokens, enabling learned, data-driven kernels for non-analytic or heterogeneous domains (Fognini et al., 24 Sep 2025).
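A minimal sketch of this reinterpretation (illustrative only; `M2LCrossAttention` and its interface are assumptions, not the Neural FMM reference implementation): each target box's local-expansion token attends, via multi-head cross-attention, to the multipole tokens of its well-separated source boxes, with a mask playing the role of the FMM interaction list:

```python
import torch
import torch.nn as nn

class M2LCrossAttention(nn.Module):
    """M2L pass viewed as multi-head cross-attention over box-level tokens."""

    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, local_tokens, multipole_tokens, interaction_mask):
        # local_tokens:     (B, n_boxes, dim)  target-box queries (local expansions)
        # multipole_tokens: (B, n_boxes, dim)  source-box keys/values (multipoles)
        # interaction_mask: (n_boxes, n_boxes) bool, True where a source box is
        #                   NOT in the target box's interaction list (masked out);
        #                   every row should keep at least one unmasked entry.
        out, _ = self.attn(
            local_tokens, multipole_tokens, multipole_tokens,
            attn_mask=interaction_mask,
        )
        return out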
A distinguishing feature of MANO in this context is its support for regularization of attention weights to match known Green's function decay (e.g., power-law decay with distance), pretraining on analytic kernels, and extension to continuous, point-cloud-indexed tokens for discretization invariance.
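One possible form of the decay-matching regularizer mentioned above (a hedged sketch; the functional form and the `decay_regularizer` interface are assumptions) penalizes the divergence between each row of box-level attention and a normalized power-law target in the distance between box centres:

```python
import torch

def decay_regularizer(attn, centres, power=1.0, min_dist=1e-2, eps=1e-8):
    # attn:    (B, n_boxes, n_boxes) row-normalised attention over source boxes
    # centres: (n_boxes, d) box-centre coordinates
    dist = torch.cdist(centres, centres).clamp_min(min_dist)  # avoid r = 0 on the diagonal
    target = dist.pow(-power)                                 # ~ 1/r**power decay
    target = target / target.sum(dim=-1, keepdim=True)        # normalise each row
    # KL(attn || target), averaged over batch and target boxes.
    return (attn * (attn.clamp_min(eps).log() - target.log())).sum(-1).mean()
```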
7. Limitations and Prospective Developments
MANO's main constraints are:
- Fixed Structural Hierarchy: The hierarchy of downsampling/upsampling is fixed and parameter-sharing is enforced, potentially limiting adaptability to non-uniform or highly anisotropic data.
- Grid Dependence: The architecture is naturally suited to regular grids; adaptation to irregular meshes or graphs remains challenging and an open direction.
- No Explicit Inter-Level Attention: All inter-scale communication is via additive upsampled features rather than cross-level attention.
Research directions include adaptive hierarchical clustering, extensions to unstructured domains (e.g., via MGNO), embedding in recurrent/temporal PDE solution schemes, and adaptation for dense vision tasks such as semantic segmentation (Colagrande et al., 3 Jul 2025, Fognini et al., 24 Sep 2025).
In summary, the Multipole Attention Neural Operator constitutes a principled synthesis of attention mechanisms and multipole expansions, enabling scalable neural architectures with physical interpretability and wide applicability to vision and operator learning.