Multiscale Butterfly Algorithm
- The multiscale butterfly algorithm is a hierarchical method that exploits complementary low-rank structures to approximate high-dimensional oscillatory kernels with near-linear complexity.
- It uses corona-based decomposition to partition the frequency domain, effectively isolating singularities and enabling efficient recursive factorization.
- Recent extensions apply tensor decompositions to achieve linear scaling, reducing memory and computation in large-scale 3D and high-frequency problems.
The multiscale butterfly algorithm is a class of hierarchical, data-sparse algorithms designed for efficient approximation and application of high-dimensional oscillatory integral operators and kernel matrices, especially those arising in the discretization of Fourier integral operators (FIOs), Green’s functions, and related transforms. The multiscale butterfly approach achieves near-linear or subquadratic complexity for problems that would otherwise require operations, where is the number of degrees of freedom per dimension and the dimensionality. The algorithm exploits special low-rank structure, notably the complementary low-rank property, present in off-diagonal blocks away from kernel singularities, and uses a recursive multiscale decomposition of the frequency or spatial domain to localize and factorize interactions efficiently (Li et al., 2015, Li et al., 2014, Kielstra et al., 2024).
1. Mathematical Foundation: Complementary Low-Rank Structure and Corona Decomposition
The foundational principle of the multiscale butterfly algorithm is the complementary low-rank property. Let be an kernel matrix with and discrete sets in . The complementary low-rank property states that for two quadtrees and of depth 0, each submatrix 1, indexed by blocks 2 (at level 3 in 4) and 5 (at level 6 in 7), has numerical rank 8 independent of 9 (with polynomial dependence on 0 for accuracy 1):
2
This property is leveraged only away from singularities (e.g., 3 in FIOs). For domains with singularities, the frequency (or analogous) domain is decomposed into a central low-frequency region and a sequence of “coronas” (dyadic annuli or axis-aligned cubes) 4, achieving low-rank structure within each corona (Li et al., 2015, Li et al., 2014).
2. Algorithmic Structure and Hierarchical Factorization
The algorithm builds two hierarchical trees (typically quadtrees in 2D, octrees in 3D) over the spatial and frequency domains. At the “middle” scale (level 5, the kernel matrix is block-diagonalized:
6
where 7 and 8 are block-diagonal matrices with blocks of size 9, and 0 is block-sparse. Recursive compression is then performed up and down the tree structures:
1
Iteratively, this forms an overall factorization:
2
Each factor is data-sparse with 3 non-zeros, and the total number of factors is 4 (Li et al., 2015).
In the multiscale butterfly algorithm, this process is repeated independently for each corona. The central small region is treated directly, and each corona’s contribution is handled using the above recursive butterfly factorization (Li et al., 2014).
3. Extension to Fourier Integral Operators and Corona-Based Partitioning
For Fourier integral operators (FIOs), where the kernel 5 exhibits singular behavior near 6, the frequency domain is decomposed as
7
Here, 8 is the central cube, and 9 are successive dyadic coronas. On each 0, the phase is sufficiently regular to admit a rank-1 separated form over appropriately chosen spatial/frequency boxes (see Theorem 3.1 in (Li et al., 2014)). The full operator action is then written as:
2
with 3 restriction operators. Each 4 is then factorized independently using the butterfly scheme (Li et al., 2014, Li et al., 2015).
This cartesian corona-based multiscale decomposition allows the butterfly algorithm to efficiently bypass the singular origin and to maintain low application and factorization costs.
4. Construction Methods and Implementation Considerations
Two principal construction methods are employed for the kernel factorizations at the middle levels:
- Black-box matvec (fast apply): Requires access to routines that compute 5 and 6 at 7 cost. Block SVDs at the middle level are built via randomized algorithms using Gaussian test matrices, with costs scaling as 8 (Li et al., 2015).
- Entrywise sampling (random sampling): Assumes access to individual 9 in 0. Randomized sampling and pivoted QR/ID are used to compute the basis matrices, with costs 1 (Li et al., 2015).
Recursive upward/downward compression of the hierarchical block factors is applied after the middle-level SVDs. Chebyshev interpolation in both the spatial and frequency directions is typically employed for numerical stability and error control (Börm et al., 2017, Li et al., 2014).
5. Complexity, Accuracy, and Numerical Performance
The multiscale butterfly algorithm achieves, for 2 degrees of freedom:
| Step | Complexity (Sampling) | Complexity (Fast Matvec) |
|---|---|---|
| Factorization | 3 | 4 |
| Application | 5 | same |
| Memory | 6 | same (can be reduced to 7 with interleaving) |
Accuracy depends on the prescribed butterfly rank 8, which scales polynomially in 9. For example, for 0, 1, direct and factorized applications are accelerated by factors of 2 at errors 3 (Li et al., 2015, Li et al., 2014).
The corona-based multiscale version exhibits substantially reduced prefactors and eliminates the need for polar (coordinate) transformations. Empirical results indicate speedups of 4 to 5 over polar-butterfly and direct FIO evaluation in both 2D and 3D (Li et al., 2014).
6. High-Dimensional and Tensor Extensions
Recent algorithmic developments generalize the multiscale butterfly structure to tensors, yielding linear 6 scaling for 7-mode tensors in 8 dimensions (Kielstra et al., 2024). The tensor butterfly algorithm partitions modes along each dimension and applies mode-wise interpolative decomposition (Tucker-ID), maintaining small ranks and achieving superior constants relative to quantized tensor train (QTT) and matrix-based butterfly algorithms. In large-scale 3D Helmholtz problems (up to 9 per side), this approach enables 0 speedup and 1 memory reduction versus matrix butterfly (Kielstra et al., 2024).
This tensor extension is the first to achieve linear (in 2) factorization and contraction costs for high-dimensional oscillatory integral operators, outperforming existing FFT, matrix butterfly, and QTT-based schemes in large and high-frequency regimes (Kielstra et al., 2024).
7. Analytical Stability and Theoretical Guarantees
Refined stability analysis demonstrates that the use of Chebyshev polynomial interpolation and carefully controlled nested reinterpolation keeps operator norms and error amplification under control. The required interpolation degree 3 scales as 4 in practical regimes, which further reduces storage and computational complexity. For analytic kernels (e.g., those with phases and amplitudes analytic in neighborhoods of the real domain), the overall error exhibits exponential decay in 5 at all hierarchical levels (Börm et al., 2017). The stability of the nested butterfly representation guarantees that complexity remains near-optimal, and that catastrophic error growth is avoided (Börm et al., 2017).
References:
(Li et al., 2014) "A Multiscale Butterfly Algorithm for Multidimensional Fourier Integral Operators" (Li et al., 2015) "Multidimensional Butterfly Factorization" (Börm et al., 2017) "An analysis of a butterfly algorithm" (Kielstra et al., 2024) "A Linear-complexity Tensor Butterfly Algorithm for Compressing High-dimensional Oscillatory Integral Operators"