
Multi-Level Low Rank (MLR) Matrices Explained

Updated 10 September 2025
  • MLR matrices are hierarchically structured as a sum of block-diagonal low-rank submatrices that enable nearly linear storage and efficient matrix operations.
  • They leverage recursive factorization and local low-rank updates to perform additions, multiplications, and inversions with reduced computational cost.
  • Applications include PDE solvers, covariance estimation, and attention mechanisms in deep learning, showcasing scalability and improved performance.

A multi-level low rank (MLR) matrix is a structured matrix representation in which the matrix is expressed as a sum of hierarchically organized, block-diagonal, low-rank submatrices. Each level corresponds to a refinement of the block partition from the previous level, with all blocks at all levels stored in factored form. This framework generalizes classical low-rank and hierarchical (e.g., $\mathcal{H}^2$ or HSS) formats, provides optimal or nearly-optimal storage and arithmetic complexity, and has become foundational in large-scale scientific computing, structured covariance estimation, fast transforms, and modern deep learning attention mechanisms.

1. Mathematical Structure and Formal Definition

An MLR matrix $A \in \mathbb{R}^{m \times n}$ is defined as a sum of block-diagonal matrices with hierarchical refinement, possibly up to a permutation:
$$A = P^\top \Big[\sum_{l=1}^L \bigoplus_{k=1}^{p_l} B_{l,k} C_{l,k}^\top \Big] Q,$$
where $P, Q$ are permutation matrices, each $l$ indexes a refinement level with $p_l$ blocks, and $B_{l,k}\in \mathbb{R}^{m_{l,k}\times r_l}$, $C_{l,k}\in \mathbb{R}^{n_{l,k}\times r_l}$ are low-rank factors for the $k$th block at level $l$. The row and column partitions at level $l$ refine those at level $l-1$, producing a hierarchical clustering.

Properties:

  • The total storage is $O((m + n) r)$, matching that of a pure rank-$r$ factorization.
  • Matrix-vector multiplication costs $O((m + n) r)$ FLOPs (a construction and matrix-vector sketch follows this list).
  • The effective rank can greatly exceed $r$ due to the sum over levels and blocks.
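
The following is a minimal NumPy sketch of these two properties. The dimensions, block sizes, and per-level ranks are illustrative choices (not taken from any cited paper), the permutations $P, Q$ are taken to be the identity, and blocks are contiguous and equal-sized for simplicity. It assembles a small square MLR matrix from random factors and applies it to a vector while touching only the factored blocks.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                  # square case, m = n
levels = [1, 2, 4]                      # p_l: number of diagonal blocks at each level
ranks = [4, 3, 2]                       # r_l: rank shared by the blocks at level l

# B[l][k], C[l][k]: factors of the k-th diagonal block at level l.
B, C = [], []
for p_l, r_l in zip(levels, ranks):
    sizes = [n // p_l] * p_l            # contiguous, equal-sized blocks for simplicity
    B.append([rng.standard_normal((s, r_l)) for s in sizes])
    C.append([rng.standard_normal((s, r_l)) for s in sizes])

def mlr_dense(B, C, n):
    """Assemble the dense matrix (for checking only; defeats the purpose at scale)."""
    A = np.zeros((n, n))
    for Bl, Cl in zip(B, C):
        offset = 0
        for Bk, Ck in zip(Bl, Cl):
            s = Bk.shape[0]
            A[offset:offset + s, offset:offset + s] += Bk @ Ck.T
            offset += s
    return A

def mlr_matvec(B, C, x):
    """y = A x using only the stored factors: O((m + n) r) flops in total."""
    y = np.zeros_like(x)
    for Bl, Cl in zip(B, C):
        offset = 0
        for Bk, Ck in zip(Bl, Cl):
            s = Bk.shape[0]
            y[offset:offset + s] += Bk @ (Ck.T @ x[offset:offset + s])
            offset += s
    return y

x = rng.standard_normal(n)
A = mlr_dense(B, C, n)
print(np.allclose(A @ x, mlr_matvec(B, C, x)))      # True
stored = sum(Bk.size + Ck.size for Bl, Cl in zip(B, C) for Bk, Ck in zip(Bl, Cl))
print(stored, "stored entries vs", n * n, "dense")  # 2n(r_1+r_2+r_3) = 1152 vs 4096
```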

Relation to other formats:

MLR extends both classic low rank and hierarchical ($\mathcal{H}^2$, HODLR, HSS) representations but typically avoids extra logarithmic storage overhead if the partitions are exploited recursively (Parshakova et al., 2023).

2. Algorithms and Arithmetic Operations

MLR and closely related $\mathcal{H}^2$-matrix algorithms perform core matrix operations—addition, multiplication, inversion, and factorizations such as LR or Cholesky—in nearly linear time by hierarchically exploiting nested low-rank structure (Börm et al., 2014).

Key Components:

  • Local Low-Rank Update: For a block $(t, s)$, adding a low-rank update $X Y^*$ to the current ($\mathcal{H}^2$ or MLR) block $V_t S_b W_s^*$ leads to rank doubling, necessitating recompression at each node via QR/SVD on small matrices (a recompression sketch follows this list).
  • Recursive Factorization: Factorizations are computed by recursively splitting the index tree, solving on leaf blocks, and updating Schur complements with low-rank operations.
  • Rank Truncation and Recompression: Maintains bounded rank throughout iterations, keeping operation counts optimal.
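
As a concrete illustration of the recompression step, the following sketch treats the generic dense low-rank case rather than the nested cluster bases of the $\mathcal{H}^2$ algorithm: adding a rank-$k$ update $X Y^*$ to a block stored as $U V^*$ doubles the stored rank, and a thin QR of each factor followed by an SVD of the small core truncates it back.

```python
import numpy as np

def lowrank_update_recompress(U, V, X, Y, tol=1e-10):
    """Given a block A = U @ V.T plus an update X @ Y.T, return factors of the sum
    with the rank truncated at relative tolerance tol. The cost is dominated by QRs
    of tall-skinny matrices and an SVD of a small (2k x 2k) core."""
    # Concatenate factors: A + X Y^T = [U X] [V Y]^T, rank at most doubled.
    L = np.hstack([U, X])
    R = np.hstack([V, Y])
    # Thin QR of both sides, then SVD of the small core.
    Ql, Rl = np.linalg.qr(L)
    Qr, Rr = np.linalg.qr(R)
    Uc, s, Vct = np.linalg.svd(Rl @ Rr.T)
    # Truncate: keep singular values above tol relative to the largest one.
    keep = s > tol * s[0]
    U_new = Ql @ Uc[:, keep] * s[keep]     # absorb singular values on the left
    V_new = Qr @ Vct[keep].T
    return U_new, V_new

rng = np.random.default_rng(1)
m, n, k = 200, 150, 5
U, V = rng.standard_normal((m, k)), rng.standard_normal((n, k))
X, Y = rng.standard_normal((m, k)), rng.standard_normal((n, k))
U2, V2 = lowrank_update_recompress(U, V, X, Y)
print(U2.shape[1])                                  # at most 2k after truncation
print(np.allclose(U2 @ V2.T, U @ V.T + X @ Y.T))    # True
```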

Operation Complexity (for $n$ degrees of freedom, rank $k$):

  • One block update: $O((|t_0| + |s_0|)\, k^2)$.
  • Full matrix multiplications, inversions, or factorizations: $O(n \log n)$.
  • Storage: $O(n)$ (Börm et al., 2014).

3. Fitting and Learning MLR Structure

Fitting an MLR matrix to data involves several intertwined subproblems (Parshakova et al., 2023):

  • Factor Fitting: Given the block partition and rank allocation, adjust the factors $B_{l,k}, C_{l,k}$ to minimize $\|A - \hat{A}_{\mathrm{MLR}}\|_F^2$. This is solved via block coordinate descent or alternating least squares, with each update often performed exactly via local SVD (a simplified fitting sketch follows this list).
  • Rank Allocation: Distribute the total allowed rank $r$ among blocks and levels to optimize the approximation. Predicted error reductions for allocating one unit of rank to a block are computed from singular values of the residuals; rank is exchanged between levels accordingly.
  • Hierarchical Partitioning: The row/column partitions can be selected using spectral graph bi-clustering (for symmetric or non-symmetric matrices), greedy swapping, or prior problem structure, to maximize blockwise low-rankness.
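
The simplified fitting pass below assumes a fixed two-level partition with contiguous equal-sized blocks and a fixed rank per level (illustrative choices; the rank-allocation and partition-selection steps are omitted). Each level is refit in turn by truncated SVDs of the residual restricted to its blocks, which is the exact blockwise minimizer used inside block coordinate descent.

```python
import numpy as np

def truncated_svd(M, r):
    """Best rank-r factors of M in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r].T

def mlr_eval(factors, levels, n, skip=None):
    """Dense evaluation of sum_l blkdiag_k(B_{l,k} C_{l,k}^T), optionally skipping one level."""
    A_hat = np.zeros((n, n))
    for l, (p, blocks) in enumerate(zip(levels, factors)):
        if l == skip:
            continue
        s = n // p
        for k, (Bk, Ck) in enumerate(blocks):
            A_hat[k * s:(k + 1) * s, k * s:(k + 1) * s] += Bk @ Ck.T
    return A_hat

def fit_mlr(A, levels, ranks, sweeps=10):
    """Block coordinate descent over levels, with contiguous equal blocks per level."""
    n = A.shape[0]
    factors = [[(np.zeros((n // p, r)), np.zeros((n // p, r))) for _ in range(p)]
               for p, r in zip(levels, ranks)]
    for _ in range(sweeps):
        for l, (p, r) in enumerate(zip(levels, ranks)):
            R = A - mlr_eval(factors, levels, n, skip=l)   # residual without level l
            s = n // p
            factors[l] = [truncated_svd(R[k * s:(k + 1) * s, k * s:(k + 1) * s], r)
                          for k in range(p)]
    return factors

rng = np.random.default_rng(2)
n, levels, ranks = 64, [1, 4], [3, 2]
A = rng.standard_normal((n, n))
F = fit_mlr(A, levels, ranks)
err = np.linalg.norm(A - mlr_eval(F, levels, n)) / np.linalg.norm(A)
print(f"relative Frobenius error after 10 sweeps: {err:.3f}")
```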

A summary of these fitting steps is in the following table:

Step            | Method                                     | Complexity
Factor fitting  | Block coordinate descent, ALS, SVD         | per block, poly($r$)
Rank allocation | Greedy exchange, singular value statistics | per exchange, $O(r^2)$
Partitioning    | Spectral methods, bi-clustering            | $O(n^2)$ (initialization)

4. Applications in Scientific Computing and Machine Learning

MLR matrices and nested hierarchical low-rank approaches are prominent in several domains:

  • PDE and Integral Equation Solvers: In finite/boundary element discretizations, system matrices exhibit off-diagonal low-rank blocks due to Green's function decay; $\mathcal{H}^2$/hierarchical low-rank preconditioners yield $O(n)$ storage and $O(n \log n)$ setup with uniform iteration counts (Börm et al., 2014).
  • Covariance Estimation and Factor Models: Large covariance matrices with multiscale factor components (e.g., in finance or genomics) are fit as MLR matrices, enabling linear-time inference, MLE, inversion, and Cholesky computations (Parshakova et al., 18 Sep 2024). The inverse of an MLR matrix is shown to be MLR with the same partition and ranks (Parshakova et al., 18 Sep 2024); a single-level illustration follows this list.
  • Large-Scale Attention and Deep Learning: In transformer attention, MLR matrices serve as attention scoring matrices enabling full-rank or distance-dependent biasing, surpassing standard low-rank attention under compute constraints and providing improved scaling laws in language modeling and long-range time-series forecasting (Kuang et al., 9 Sep 2025).
  • Data Compression and Fast Transforms: MLR-based compressions efficiently factorize images, neural network layers, and other data, combining randomized low-rank sketching with quantization for aggressive storage savings (Saha et al., 2023).
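
As a single-level illustration of the inversion claim above (diagonal plus low rank, i.e., a flat factor model rather than the full recursive algorithm of Parshakova et al.), the Woodbury identity expresses the inverse in the same diagonal-plus-rank-$r$ form, which the sketch below checks numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 300, 5
d = rng.uniform(1.0, 2.0, n)                  # diagonal D (positive, hence invertible)
B = rng.standard_normal((n, r))
C = rng.standard_normal((n, r))
A = np.diag(d) + B @ C.T                      # single-level case: diagonal + rank r

# Woodbury: (D + B C^T)^{-1} = D^{-1} - D^{-1} B (I + C^T D^{-1} B)^{-1} C^T D^{-1}
Dinv_B = B / d[:, None]
Dinv_C = C / d[:, None]
core = np.linalg.inv(np.eye(r) + C.T @ Dinv_B)
B_new = -Dinv_B @ core                        # inverse = diag(1/d) + B_new @ C_new^T
C_new = Dinv_C

A_inv = np.diag(1.0 / d) + B_new @ C_new.T
print(np.allclose(A_inv, np.linalg.inv(A)))   # True: same structure, same rank r
```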

5. Numerical Results and Empirical Performance

Numerical experiments consistently demonstrate the efficiency and effectiveness of MLR representations:

  • Solver experiments: For PDEs (2D Poisson, BEM), setup time per degree of freedom scales as $O(\log n)$ with constant average block rank. Iteration counts and errors in conjugate gradients are stable under mesh refinement (Börm et al., 2014).
  • Statistical modeling: In multilevel factor models with MLR covariance structure, EM iterations, matrix inversions, and determinant computations are all performed in $O(n r)$, with the structure maintained under inversion and Cholesky factorization (Parshakova et al., 18 Sep 2024).
  • Learning tasks: In image and representation compression, and nearest neighbor classification, MLR-based randomized and quantized schemes achieve significant compression ratios (as little as 1 bit per parameter) with negligible accuracy loss (Saha et al., 2023).
  • Transformer models: Embedding MLR into attention matrices overcomes the low-rank bottleneck in regression, boosts in-context learning performance, reduces key-cache and compute, and preserves (or improves) accuracy on long-range prediction benchmarks (Kuang et al., 9 Sep 2025).

6. Comparison to Alternative Approaches

MLR matrices generalize several structured and hierarchical matrix formats:

  • Classical $\mathcal{H}$/$\mathcal{H}^2$-matrices: MLR and $\mathcal{H}^2$ formats use nested cluster bases, allowing storage and computation to be strictly $O(n)$ or nearly linear in $n$, with on-the-fly basis recompression. In contrast, non-nested methods incur higher $O(n k^2 \log n)$ cost and lose flexibility in local rank adaptation (Börm et al., 2014).
  • Factor plus diagonal/covariance models: MLR extends factor plus diagonal (i.e., "flat" factor models) by embedding hierarchical blockwise refinement, leading to improved fit for structured data (Parshakova et al., 2023).
  • Hierarchical semi-separable (HSS), HODLR, BTT: MLR permits block partitions without strict nesting and can potentially achieve better empirical storage/performance by flexibly matching data-driven partitioning (Parshakova et al., 2023, Kuang et al., 9 Sep 2025).
  • Standard low-rank approximation (SVD, CUR): While SVD and CUR decompose a matrix globally, MLR hierarchically and adaptively captures different correlation ranges and outperforms global methods when data exhibit multiscale structure (Saha et al., 2023, Saberian et al., 2019).
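
To make the last point concrete, the following self-contained comparison uses synthetic multiscale data (all sizes and ranks are illustrative assumptions, not drawn from the cited papers): a target built from a global low-rank term plus finer block-diagonal low-rank terms is approximated by a greedy level-by-level MLR fit and by a truncated SVD with roughly the same number of stored parameters.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 128

def blockdiag_lowrank(p, r):
    """Random block-diagonal term with p equal diagonal blocks of rank r."""
    M = np.zeros((n, n))
    s = n // p
    for k in range(p):
        M[k*s:(k+1)*s, k*s:(k+1)*s] = (rng.standard_normal((s, r))
                                       @ rng.standard_normal((r, s)))
    return M

# Multiscale target: global low-rank term + finer block-diagonal terms + noise.
A = blockdiag_lowrank(1, 4) + blockdiag_lowrank(4, 3) + blockdiag_lowrank(16, 2)
A += 0.01 * rng.standard_normal((n, n))

def svd_approx(M, r):
    """Best rank-r approximation of M (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

# Greedy MLR fit: peel one level at a time with per-block truncated SVDs.
levels, ranks = [1, 4, 16], [4, 3, 2]
approx, resid = np.zeros((n, n)), A.copy()
for p, r in zip(levels, ranks):
    s = n // p
    for k in range(p):
        blk = svd_approx(resid[k*s:(k+1)*s, k*s:(k+1)*s], r)
        approx[k*s:(k+1)*s, k*s:(k+1)*s] += blk
        resid[k*s:(k+1)*s, k*s:(k+1)*s] -= blk

mlr_params = sum(2 * n * r for r in ranks)   # 2 n r_l parameters per level
svd_rank = mlr_params // (2 * n)             # SVD rank with the same storage budget
rel_err = lambda M: np.linalg.norm(A - M) / np.linalg.norm(A)
print("MLR error:", round(rel_err(approx), 4),
      "| same-budget SVD error:", round(rel_err(svd_approx(A, svd_rank)), 4))
```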

A summary of features:

Format              | Storage       | Fast ops      | Hierarchical | Local adaptivity | Full rank possible
SVD                 | $O((m+n)r)$   | $O(nr)$       | No           | No               | No
HODLR               | $O(n\log n)$  | $O(n\log n)$  | Yes          | Limited          | No
$\mathcal{H}^2$/MLR | $O(n)$        | $O(n)$        | Yes          | Yes              | Yes

7. Open Problems and Future Directions

Open problems and directions in MLR research include:

  • Automated Partitioning: Data-driven, scalable methods for discovering the hierarchical partition remain an active area (with spectral, bi-clustering, and greedy heuristics as partial solutions) (Parshakova et al., 2023).
  • Adaptive Rank Selection: Integrating randomized local rank estimation and sketching into MLR fitting for large, streaming, or distributed data (Meier et al., 2021).
  • Generalization to Tensors: Extending the MLR paradigm to tensor decompositions (e.g., Tensor-Train, Block Tensor-Train), for applications in multidimensional arrays and high-order attention (Budzinskiy, 3 Jul 2024, Kuang et al., 9 Sep 2025).
  • Integration in Machine Learning: Leveraging efficient MLR representations in deep learning, especially for scalable attention mechanisms and model compression (Saha et al., 2023, Kuang et al., 9 Sep 2025).
  • Open-source Implementation: Several open-source packages now exist for MLR fitting, inversion, and statistical inference, accelerating adoption in applied domains (Parshakova et al., 2023, Parshakova et al., 18 Sep 2024).

References