Multipole Attention Neural Operator (MANO)
- The paper introduces MANO as a novel neural architecture that reformulates self-attention using multipole expansions and FMM principles to achieve linear computational complexity.
- The methodology employs a hierarchical structure combining fine-grained local windowed attention with coarse far-field approximations for scalable global context integration.
- Empirical evaluations demonstrate MANO’s superior accuracy and efficiency relative to traditional models on tasks such as image classification and Darcy Flow operator learning.
The Multipole Attention Neural Operator (MANO) is a neural architecture that reformulates self-attention using principles from multipole expansions and the fast multipole method (FMM), achieving both linear computational complexity and a global receptive field. MANO enables scalable modeling of high-dimensional data and integral operator learning, particularly in image analysis and scientific computing, while drawing explicit inspiration from $N$-body interactions and classical FMM methodologies (Colagrande et al., 3 Jul 2025, Fognini et al., 24 Sep 2025).
1. Theoretical Foundations
MANO is grounded in the analogy between self-attention and $N$-body potential interactions. Standard self-attention computes all-pair interactions between tokens through a softmax similarity kernel, resulting in quadratic $O(N^2)$ scaling in the number of tokens $N$. FMM, by contrast, achieves $O(N)$ or $O(N \log N)$ scaling for long-range interaction computations in physics via multipole expansions and hierarchical grouping.
By recasting attention mechanisms as equivalent to multipole expansions, MANO approximates the sum of all pairwise interactions through a hierarchical decomposition of near-field and far-field effects. At each scale in the hierarchy, grid locations or tokens are clustered; local interactions are computed at full resolution (near-field), while interactions at increasing distance (far-field) are handled on progressively coarser grids. This mirrors the FMM computation pattern in which local direct sums and low-rank multipole approximations are combined (Colagrande et al., 3 Jul 2025, Fognini et al., 24 Sep 2025).
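To make the near-field/far-field decomposition concrete, the following minimal NumPy sketch (an illustrative toy, not the paper's implementation; `kernel`, `hierarchical_sum`, and the cell size are hypothetical choices) splits a dense 1-D pairwise interaction sum into an exact sum over neighbouring cells plus a single coarse interaction per well-separated cell:

```python
import numpy as np

def kernel(xi, xj):
    # Smooth, slowly decaying interaction kernel (illustrative choice).
    return 1.0 / (1.0 + np.abs(xi - xj))

def hierarchical_sum(x, v, cell=8):
    # Split y_i = sum_j kernel(x_i, x_j) * v_j into an exact near-field part
    # (the cell containing i plus its neighbours) and a far-field part where
    # each well-separated cell contributes one coarse interaction.
    n = len(x)
    n_cells = n // cell
    x_coarse = x.reshape(n_cells, cell).mean(axis=1)  # cell "centres"
    v_coarse = v.reshape(n_cells, cell).sum(axis=1)   # aggregated "charges"
    y = np.zeros(n)
    for i in range(n):
        ci = i // cell
        for c in range(n_cells):
            lo, hi = c * cell, (c + 1) * cell
            if abs(c - ci) <= 1:
                # Near field: exact pairwise sum over neighbouring cells.
                y[i] += np.sum(kernel(x[i], x[lo:hi]) * v[lo:hi])
            else:
                # Far field: one interaction with the cell's coarse summary.
                y[i] += kernel(x[i], x_coarse[c]) * v_coarse[c]
    return y

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64)
v = rng.normal(size=64)
dense = np.array([np.sum(kernel(xi, x) * v) for xi in x])
approx = hierarchical_sum(x, v)
print("max abs error of the hierarchical approximation:", np.abs(dense - approx).max())
```

Applied recursively over multiple levels, and with the interactions computed by learned attention rather than a fixed kernel, this is the pattern MANO's hierarchy follows.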
2. Mathematical and Algorithmic Formulation
The MANO attention mechanism introduces $L$ levels of dyadic downsampling via a shared strided convolution $D$. For each level $\ell = 0, 1, \dots, L-1$:
- Feature maps $x_\ell$ are computed recursively via $x_\ell = D(x_{\ell-1})$, with $x_0$ the embedded input.
- Query, key, and value projections $Q_\ell, K_\ell, V_\ell$ are formed by learned linear mappings of $x_\ell$.
- Windowed self-attention is performed within local windows of size $w \times w$ on $x_\ell$; the attention matrix $A_\ell$ is block-sparse, covering only tokens within the same window.
- For the far-field levels $\ell \geq 1$, the outputs are upsampled back to the input resolution with a shared transposed convolution $U$, applied $\ell$ times.
- The final representation is the sum of all upsampled level outputs:

$$y \;=\; \sum_{\ell = 0}^{L-1} U^{\ell}\!\bigl(\mathrm{Attn}_w(Q_\ell, K_\ell, V_\ell)\bigr).$$
The algorithmic structure combines, at every level, shared downsampling, windowed attention, and shared upsampling, with all level outputs summed (Colagrande et al., 3 Jul 2025).
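A minimal PyTorch-style sketch of this block (a reconstruction from the description above, not the authors' pseudocode; the class name, hyperparameters, and the use of `torch.nn.MultiheadAttention` are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MANOAttention(nn.Module):
    """Sketch of a MANO attention block: shared strided downsampling, windowed
    attention at every level, shared transposed-conv upsampling, additive sum."""

    def __init__(self, dim, num_levels=3, window=8, num_heads=4):
        super().__init__()
        self.num_levels = num_levels
        self.window = window
        self.down = nn.Conv2d(dim, dim, kernel_size=2, stride=2)          # shared D
        self.up = nn.ConvTranspose2d(dim, dim, kernel_size=2, stride=2)   # shared U
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def window_attention(self, x):
        # x: (B, C, H, W); block-sparse attention within w x w windows.
        B, C, H, W = x.shape
        w = self.window
        x = x.view(B, C, H // w, w, W // w, w)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)   # windows as batch
        x, _ = self.attn(x, x, x)
        x = x.view(B, H // w, W // w, w, w, C)
        return x.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

    def forward(self, x):
        outputs, feats = [], x
        for level in range(self.num_levels):
            z = self.window_attention(feats)
            for _ in range(level):            # far-field levels: upsample back
                z = self.up(z)
            outputs.append(z)
            if level < self.num_levels - 1:
                feats = self.down(feats)      # coarsen for the next level
        return sum(outputs)                   # near-field + far-field contributions
```

For example, with a 32x32 feature map, `dim=64`, `window=8`, and `num_levels=3`, the three levels operate on 32-, 16-, and 8-resolution grids, and all outputs are upsampled back to 32x32 before summation.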
This structure ensures every input token receives global context via far-field communication at coarse levels while preserving local detail through fine-level interactions.
3. Complexity and Scaling Properties
MANO achieves linear time and memory complexity with respect to the input sequence length $N$, a significant improvement over traditional self-attention:
- At each scale $\ell$, the number of tokens is $N_\ell = N / 4^{\ell}$ (for 2D inputs coarsened by a factor of two per level), partitioned into $N_\ell / w^2$ windows of size $w \times w$.
- Windowed attention per level costs $O(N_\ell\, w^2)$.
- Summing over all levels yields $O(N w^2)$ operations, i.e., $O(N)$ with $w$ constant.
In contrast, dense self-attention requires $O(N^2)$ operations and memory. MANO, therefore, addresses the scalability bottleneck of attention-based models and extends to large inputs and high-resolution scientific fields (Colagrande et al., 3 Jul 2025).
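The linear bound follows from a geometric series over the per-level costs (a short check using the notation above):

$$\sum_{\ell=0}^{L-1} O\!\left(\frac{N}{4^{\ell}}\, w^{2}\right) \;\le\; O\!\left(w^{2}\, N \sum_{\ell=0}^{\infty} 4^{-\ell}\right) \;=\; O\!\left(\tfrac{4}{3}\, w^{2}\, N\right) \;=\; O(N) \quad \text{for fixed } w.$$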
4. Network Architectures and Applications
MANO has been integrated into both vision and scientific neural operator benchmarks:
A. Vision (Image Classification)
- Embedding dimension, stage layout, and window size follow SwinV2-Tiny, with four aligned stages.
- Replacement of every Swin attention block with a hierarchical, multiscale MANO attention block, with a fixed number of hierarchy levels per stage.
- Maintains plug-and-play compatibility, adding only the shared down- and upsampling convolutions as extra parameters.
B. Scientific Computing (Darcy Flow Operator)
- Operates on regular 2D grids at the benchmark resolutions.
- Uses a compact configuration of embedding dimension, attention heads, hierarchical depth, and window size.
- Input features include spatial coordinates and physical coefficients; final output via pointwise MLP.
Across both domains, MANO preserves both fine-grained detail and global correlations and is robust to increases in input resolution (Colagrande et al., 3 Jul 2025).
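To illustrate how such a configuration can be wired together for the Darcy Flow operator, here is a hypothetical end-to-end sketch (reusing the `MANOAttention` sketch from Section 2; the `DarcyMANO` name, channel counts, and pointwise head are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class DarcyMANO(nn.Module):
    """Hypothetical Darcy-Flow operator: coordinates + coefficient field are
    lifted pointwise, processed by a MANO encoder, and decoded by a pointwise MLP."""

    def __init__(self, dim=64, num_levels=3, window=8):
        super().__init__()
        self.lift = nn.Conv2d(3, dim, kernel_size=1)            # (a, x, y) -> embedding
        self.encoder = MANOAttention(dim, num_levels, window)   # sketch from Section 2
        self.head = nn.Sequential(                               # pointwise MLP output
            nn.Conv2d(dim, dim, kernel_size=1), nn.GELU(),
            nn.Conv2d(dim, 1, kernel_size=1),
        )

    def forward(self, coeff):
        # coeff: (B, 1, H, W) permeability field on a regular grid
        # (H, W assumed divisible by window * 2**(num_levels - 1)).
        B, _, H, W = coeff.shape
        ys, xs = torch.meshgrid(
            torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij"
        )
        grid = torch.stack([xs, ys]).expand(B, 2, H, W).to(coeff)
        feats = self.lift(torch.cat([coeff, grid], dim=1))
        return self.head(self.encoder(feats))                   # predicted solution field
```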
5. Empirical Evaluation
MANO exhibits state-of-the-art empirical performance on key benchmarks:
| Model | Tiny-IN-200 top-1 (%) | CIFAR-100 top-1 (%) | Darcy rel. MSE (base res.) | Darcy rel. MSE (higher res.) |
|---|---|---|---|---|
| ViT-base | 73.1 | 80.6 | – | – |
| SwinV2-T | 80.5 | 75.5 | – | – |
| MANO-tiny | 87.5 | 85.1 | – | – |
| FNO | – | – | 0.0035 | – |
| ViT (patch=4) | – | – | 0.0019 | – |
| LocalAttn | – | – | 0.0431 | – |
| MANO | – | – | 0.0013 | 0.0013 |
- In image classification, MANO-tiny achieves top-1 accuracies of 87.5% on Tiny-IN-200 and 85.1% on CIFAR-100.
- For Darcy Flow, MANO attains a relative MSE of 0.0013, consistently outperforming FNO and ViT (Colagrande et al., 3 Jul 2025).
- Runtime and memory benchmarks show MANO-tiny delivering higher image throughput and lower peak memory than SwinV2-T and ViT-base.
These results underscore the architecture's efficiency and competitive accuracy across tasks.
6. Relationship to Fast Multipole Methods and Neural FMM
MANO is situated within a broader trend of incorporating multipole expansions into neural architectures, exemplified by the Neural FMM approach. The Neural FMM replaces FMM's linear translation operators with learned MLPs operating within a hierarchical tree structure. The mapping between FMM passes—upward (P2M, M2M), interaction (M2L), and downward (L2L, L2P, near-field)—and corresponding neural modules is explicit. MANO generalizes this by interpreting M2L translations as multi-head cross-attention over box-level tokens, enabling learned, data-driven kernels for non-analytic or heterogeneous domains (Fognini et al., 24 Sep 2025).
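A minimal sketch of this reinterpretation (illustrative only; `M2LCrossAttention` and its interface are assumptions, not the Neural FMM reference implementation): each target box's local-expansion token attends, via multi-head cross-attention, to the multipole tokens of its well-separated source boxes, with a mask playing the role of the FMM interaction list:

```python
import torch
import torch.nn as nn

class M2LCrossAttention(nn.Module):
    """M2L pass viewed as multi-head cross-attention over box-level tokens."""

    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, local_tokens, multipole_tokens, interaction_mask):
        # local_tokens:     (B, n_boxes, dim)  target-box queries (local expansions)
        # multipole_tokens: (B, n_boxes, dim)  source-box keys/values (multipoles)
        # interaction_mask: (n_boxes, n_boxes) bool, True where a source box is
        #                   NOT in the target box's interaction list (masked out);
        #                   every row should keep at least one unmasked entry.
        out, _ = self.attn(
            local_tokens, multipole_tokens, multipole_tokens,
            attn_mask=interaction_mask,
        )
        return out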
A distinguishing feature of MANO in this context is its support for regularization of attention weights to match known Green's function decay (e.g., power-law decay with distance), pretraining on analytic kernels, and extension to continuous, point-cloud-indexed tokens for discretization invariance.
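One possible form of the decay-matching regularizer mentioned above (a hedged sketch; the functional form and the `decay_regularizer` interface are assumptions) penalizes the divergence between each row of box-level attention and a normalized power-law target in the distance between box centres:

```python
import torch

def decay_regularizer(attn, centres, power=1.0, min_dist=1e-2, eps=1e-8):
    # attn:    (B, n_boxes, n_boxes) row-normalised attention over source boxes
    # centres: (n_boxes, d) box-centre coordinates
    dist = torch.cdist(centres, centres).clamp_min(min_dist)  # avoid r = 0 on the diagonal
    target = dist.pow(-power)                                 # ~ 1/r**power decay
    target = target / target.sum(dim=-1, keepdim=True)        # normalise each row
    # KL(attn || target), averaged over batch and target boxes.
    return (attn * (attn.clamp_min(eps).log() - target.log())).sum(-1).mean()
```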
7. Limitations and Prospective Developments
MANO's main constraints are:
- Fixed Structural Hierarchy: The hierarchy of downsampling/upsampling is fixed and parameter-sharing is enforced, potentially limiting adaptability to non-uniform or highly anisotropic data.
- Grid Dependence: The architecture is naturally suited to regular grids; adaptation to irregular meshes or graphs remains challenging and an open direction.
- No Explicit Inter-Level Attention: All inter-scale communication is via additive upsampled features rather than cross-level attention.
Research directions include adaptive hierarchical clustering, extensions to unstructured domains (e.g., via MGNO), embedding in recurrent/temporal PDE solution schemes, and adaptation for dense vision tasks such as semantic segmentation (Colagrande et al., 3 Jul 2025, Fognini et al., 24 Sep 2025).
In summary, the Multipole Attention Neural Operator constitutes a principled synthesis of attention mechanisms and multipole expansions, enabling scalable neural architectures with physical interpretability and wide applicability to vision and operator learning.