Sketching Low-Rank Plus Diagonal Matrices
- Sketching LoRD matrices is a technique that decomposes a matrix into a low-rank and diagonal part using randomized matrix–vector products, capturing both global interactions and local variations.
- Key algorithms such as SKETCHLORD and alternating spectral methods leverage joint convex relaxation and eigenvalue truncation, respectively, to achieve accurate recovery with provable error bounds.
- This approach scales to high-dimensional applications such as deep learning Hessians and covariance estimation, significantly reducing computational cost while maintaining high accuracy.
High-dimensional linear operators and covariance matrices commonly arise in scientific computing and machine learning, where access is often restricted to implicit matrix–vector products (MVPs) due to computational constraints. Accurate and scalable representation of such operators is typically sought via structured decompositions. The low-rank plus diagonal (LoRD or LRPD) model provides a flexible yet parsimonious approximation, capturing both global interactions (low-rank) and localized effects (diagonal). Sketching methods, which reconstruct operators from a limited set of randomized MVPs, enable efficient computation and storage of these structured approximations. Recent advances—including SKETCHLORD, randomized alternating algorithms, and nested sketching frameworks—have systematically advanced the accurate recovery of LoRD matrices, demonstrating theoretical and empirical superiority over approaches that target only low-rank or diagonal structure.
1. Mathematical Formulation and Structural Motivation
Given $A \in \mathbb{R}^{n \times n}$ (or more generally $A \in \mathbb{R}^{m \times n}$), the LoRD model assumes

$$A = L + D,$$

where $L$ is rank-$r$ ($r \ll n$) and $D$ is diagonal. The decomposition

$$A = U U^\top + D, \qquad U \in \mathbb{R}^{n \times r},$$

formalizes the LRPD structure for symmetric matrices. Such models capture global covariance through $U U^\top$ (shared directions or factors) and local, variable-specific variation through $D$. This structure is highly pertinent in scenarios such as deep learning Hessians, where pronounced eigenmodes coexist with direction-specific curvatures, and in large-scale covariance estimation, where slow spectral decay necessitates corrections beyond pure low rank (Fernandez et al., 28 Sep 2025, Yeon et al., 18 Dec 2025).
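As a concrete instance of the model above, the following NumPy snippet (illustrative; the sizes and scaling are chosen here for demonstration, not taken from the cited papers) constructs a symmetric LRPD matrix and verifies its structure:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3

# Low-rank factor U (shared directions) and diagonal D (local variances).
U = rng.standard_normal((n, r))
d = rng.uniform(0.5, 1.5, size=n)
A = U @ U.T + np.diag(d)

# Removing the diagonal part exposes an exactly rank-r remainder.
assert np.linalg.matrix_rank(A - np.diag(d)) == r
```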
2. Sketching Access Model and Problem Reduction
In high-dimensional settings, explicit access to $A$ is infeasible; only MVPs $v \mapsto Av$ can be queried, each at significant computational cost. The sketching access model chooses a test matrix $S \in \mathbb{R}^{n \times s}$ (e.g., Rademacher or Gaussian), yielding the forward sketch $Y = AS$ and optionally an adjoint sketch $Z = A^\top S$. With $s \ll n$, sketching efficiently compresses $A$'s relevant action.
For LoRD recovery, the sketch constraint exploits the property that $(DS) \odot S$ is constant across the columns of $S$ whenever $S$'s entries satisfy $S_{ij}^2 = 1$ (e.g., Rademacher), yielding the entrywise constraints

$$(DS)_{ij} = d_i\, S_{ij}.$$

Defining $d = \operatorname{diag}(D)$, the linear constraint for $(L, d)$ reads

$$L S + \operatorname{diag}(d)\, S = Y$$

(Fernandez et al., 28 Sep 2025).
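The access model and the Rademacher diagonal identity can be exercised in a few lines. The snippet below is an illustrative sketch (sizes and the test operator are assumptions, not from the cited papers); the column average of $Y \odot S$ is the classical randomized diagonal estimator, exact when $A$ is purely diagonal:

```python
import numpy as np

rng = np.random.default_rng(1)
n, s = 100, 30

A = rng.standard_normal((n, n))
A = 0.5 * (A + A.T)                      # symmetric test operator

# Rademacher test matrix: entries +-1, so S**2 == 1 entrywise.
S = rng.choice([-1.0, 1.0], size=(n, s))
Y = A @ S                                # forward sketch: s MVPs with A

# Since S_ij^2 = 1, averaging (Y * S) over columns gives an unbiased
# estimate of diag(A); off-diagonal terms contribute only noise.
d_est = (Y * S).mean(axis=1)
```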
3. Algorithmic Approaches for LoRD Recovery
3.1 Joint Convex Relaxation (SKETCHLORD)
SKETCHLORD formulates joint recovery as a convex program:

$$\min_{L,\, d}\ \tfrac{1}{2}\,\| L S + \operatorname{diag}(d)\, S - Y \|_F^2 \;+\; \lambda\, \| L \|_*.$$

The Frobenius term enforces sketch consistency; the nuclear norm regularizer biases toward low rank. The problem is solved by proximal methods (e.g., accelerated proximal gradient or ADMM), with singular value thresholding at each iteration:

$$L^{k+1} = \mathcal{S}_{\tau\lambda}\!\left(L^k - \tau\, G^k\right),$$

where $\mathcal{S}_{\mu}$ shrinks each singular value by $\mu$, $\tau$ is the step size, and $G^k$ is a (projected) gradient of the sketch-consistency term (Fernandez et al., 28 Sep 2025).
Upon convergence, the diagonal is recovered via

$$d_i = \frac{1}{s} \sum_{j=1}^{s} \left( Y - L S \right)_{ij} S_{ij},$$

which is exact for Rademacher $S$ at a sketch-consistent solution. Compact (rank-$r$) SVDs finalize the factorization.
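A minimal proximal-gradient loop of this kind can be sketched as follows. This is a simplified illustration, not the authors' implementation: the step size, regularization weight, and iteration count are ad hoc assumptions, and the diagonal is updated in closed form at each step using the Rademacher identity:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox operator of tau * nuclear norm."""
    U, sig, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ (np.maximum(sig - tau, 0.0)[:, None] * Vt)

def sketchlord_like(Y, S, lam=0.1, iters=3000):
    """Proximal gradient for 0.5*||L S + diag(d) S - Y||_F^2 + lam*||L||_*."""
    n = Y.shape[0]
    L = np.zeros((n, n))
    step = 1.0 / np.linalg.norm(S @ S.T, 2)    # 1 / Lipschitz constant in L
    for _ in range(iters):
        # Exact diagonal update given L (valid because S**2 == 1 entrywise).
        d = ((Y - L @ S) * S).mean(axis=1)
        R = L @ S + d[:, None] * S - Y         # sketch residual
        L = svt(L - step * (R @ S.T), step * lam)
    d = ((Y - L @ S) * S).mean(axis=1)         # final diagonal, consistent with L
    return L, d

rng = np.random.default_rng(2)
n, r, s = 20, 2, 8
U = rng.standard_normal((n, r))
d_true = rng.uniform(0.5, 1.5, n)
A = U @ U.T + np.diag(d_true)
S = rng.choice([-1.0, 1.0], size=(n, s))       # Rademacher test matrix
L_hat, d_hat = sketchlord_like(A @ S, S)
```

The loop drives the sketch residual far below its diagonal-only starting value, illustrating why joint fitting outperforms a diagonal-only baseline.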
3.2 Alternating Spectral and Stochastic Sketching
The alternating (Alt) algorithm iteratively projects away the current diagonal, computes a rank-$r$ eigenvalue-truncated approximation ("low-rank step"), then updates the diagonal by subtracting the current $L$ from $A$ ("diagonal step"):

$$L_{k+1} = \mathcal{T}_r\!\left(A - D_k\right), \qquad D_{k+1} = \operatorname{Diag}\!\left(\operatorname{diag}\!\left(A - L_{k+1}\right)\right),$$

where $\mathcal{T}_r$ denotes rank-$r$ eigenvalue truncation.
In large-scale settings, the low-rank update is replaced by a Nyström sketch; diagonal terms are estimated via stochastic Diag++ (Rademacher projection estimates) (Yeon et al., 18 Dec 2025).
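A minimal dense-matrix version of the alternating loop (assuming explicit access to $A$; the large-scale variants above replace the eigendecomposition with a Nyström sketch and the exact diagonal with stochastic probes) might look like:

```python
import numpy as np

def alt_lord(A, r, iters=50):
    """Alternate rank-r eigenvalue truncation and diagonal subtraction."""
    n = A.shape[0]
    d = np.zeros(n)
    for _ in range(iters):
        # Low-rank step: best rank-r approximation of A - diag(d).
        w, V = np.linalg.eigh(A - np.diag(d))
        top = np.argsort(-np.abs(w))[:r]
        L = (V[:, top] * w[top]) @ V[:, top].T
        # Diagonal step: absorb what L misses on the diagonal.
        d = np.diag(A - L).copy()
    return L, d

rng = np.random.default_rng(3)
n, r = 30, 2
U = 3.0 * rng.standard_normal((n, r))      # pronounced low-rank eigenmodes
d_true = rng.uniform(0.5, 1.5, n)
A = U @ U.T + np.diag(d_true)
L, d = alt_lord(A, r)
err = np.linalg.norm(L + np.diag(d) - A) / np.linalg.norm(A)
```

On an exact LRPD input with a clear spectral gap, the relative error drops to near machine precision within a few dozen iterations.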
A summary table of representative LoRD sketching approaches is below:
| Method | Sketches Used | Core Optimization |
|---|---|---|
| SKETCHLORD (Fernandez et al., 28 Sep 2025) | Forward (and adjoint) | Convex nuclear norm |
| Alt–Nyström (Yeon et al., 18 Dec 2025) | Nyström + Diag++ | Spectral alternating |
| Nested Sketch (Bahmani et al., 2015) | Quadratic (nested) sketches | Two-stage convex/Group-Lasso |
4. Theoretical Guarantees and Comparison with Sequential Approaches
LoRD sketching admits rigorous analysis regarding sample complexity and approximation error:
- For random $S$, a sketch size $s$ of order $r$ plus modest oversampling suffices to recover the rank-$r$ column space with high probability, matching classical randomized low-rank results (Tropp et al. 2017) (Fernandez et al., 28 Sep 2025).
- Nuclear-norm relaxation is exact provided the sketching operator satisfies a restricted isometry over rank-$2r$ matrices.
- In explicit toy examples, sequential pipelines (diagonal-only, low-rank-only, diagonal-then-low-rank, or low-rank-then-diagonal) are proven to incur strictly greater error than the joint formulation. Detailed closed-form expressions for the relative Frobenius error demonstrate that any sequential method leaves an irreducible approximation gap, while joint optimization achieves zero error once the sketch size is sufficiently large (Fernandez et al., 28 Sep 2025).
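The sequential gap is easy to reproduce numerically. The toy experiment below is an illustrative construction (not the paper's exact example): a diagonal-then-low-rank pipeline leaves a nonzero residual because the first-stage diagonal estimate wrongly absorbs $\operatorname{diag}(L)$, corrupting the subsequent truncation:

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 20, 2
U = rng.standard_normal((n, r))
d_true = rng.uniform(0.5, 1.5, n)
A = U @ U.T + np.diag(d_true)

# Sequential: fit the diagonal first (it absorbs diag(U U^T) as well),
# then take the best rank-r approximation of the remainder.
d_seq = np.diag(A).copy()
w, V = np.linalg.eigh(A - np.diag(d_seq))
top = np.argsort(-np.abs(w))[:r]
L_seq = (V[:, top] * w[top]) @ V[:, top].T
err_seq = np.linalg.norm(L_seq + np.diag(d_seq) - A) / np.linalg.norm(A)

# The exact joint decomposition (L = U U^T, d = d_true) has zero error,
# so any positive err_seq is the irreducible sequential gap.
```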
For randomized alternating approaches, per-iterate max-norm error and sample allocation between low-rank and diagonal sketches are governed by the spectrum of the residual and concentration parameters, with explicit nonasymptotic bounds (Yeon et al., 18 Dec 2025).
Nested sketching with group-Lasso, for matrices whose diagonal is sparse, achieves near-optimal sketch and storage complexity, leveraging specialized RIP properties (Bahmani et al., 2015).
5. Empirical and Computational Performance
Empirical studies demonstrate superior recovery of LoRD matrices in both synthetic and real-world settings:
- On synthetic LoRD matrices with controlled low-rank and variable diagonal scaling, SKETCHLORD's advantage becomes pronounced as diagonal strength increases. Baseline (pure low-rank or diagonal) and sequential variants cannot capture strong diagonal–low-rank coupling (Fernandez et al., 28 Sep 2025).
- Alternating and stochastic-Alt algorithms achieve machine-precision recovery on synthetic LRPD models (e.g., near machine-epsilon relative Frobenius error within 20 iterations) (Yeon et al., 18 Dec 2025).
- For large-scale operators such as deep learning Hessians (e.g., ResNet-50), LoRD sketching can reveal both sharp local curvature (diagonal) and outlier eigenmodes (low-rank), surpassing block-diagonal or KFAC-style approximations (Fernandez et al., 28 Sep 2025).
- In finance applications, Block-LRPD (Alt extended to block-diagonal corrections) achieves lower approximation error for S&P 500 correlation matrices, attaining full-rank accuracy at small rank $r$ via blockwise diagonal updates (Yeon et al., 18 Dec 2025).
Computationally, LoRD sketching enables reductions in both MVP queries and flops:
- SKETCHLORD: overall cost is the per-iteration thresholded SVD multiplied by the number of gradient steps, practical at moderate dimensions (Fernandez et al., 28 Sep 2025).
- Alt (full): one eigendecomposition per iteration; Nyström–Alt replaces it with sketch-sized factorizations; Stochastic-Alt further trades exact diagonals for Rademacher probe estimates.
- Nested-sketch: a two-stage scheme whose first-stage and second-stage sketch sizes scale with the rank and the diagonal sparsity, respectively (Bahmani et al., 2015).
6. Practical Implementation Guidelines
Key recommendations for high-fidelity LoRD sketching:
- Choose sketch size $s \approx 2r$–$3r$ for accurate recovery; further increases continue to reduce error but incur extra cost (Fernandez et al., 28 Sep 2025).
- Utilize compact recovery (small SVD) to reduce wall-clock time without accuracy penalty.
- Employ momentum/Nesterov acceleration for a 30–50% cut in iteration count.
- Monitor convergence via the relative sketch residual $\|LS + \operatorname{diag}(d)S - Y\|_F / \|Y\|_F$ or similar; early stopping may double speed.
- In Stochastic-Alt, split the matrix–vector budget between the low-rank (Nyström) sketch and the Rademacher diagonal probes, scaling the probe count with the target relative accuracy (Yeon et al., 18 Dec 2025).
- Use robust eigenvalue thresholding and ensure nonnegativity in the diagonal when reconstructing positive semidefinite matrices.
Pitfalls include poorly chosen sketch size (leading to underestimated rank), subtracting unknown diagonals before projection (compromising PSD structure), and noisy diagonal corrections if the Rademacher query budget is insufficient (Yeon et al., 18 Dec 2025).
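The last pitfall is straightforward to reproduce: the error of a Rademacher diagonal estimate shrinks only like the inverse square root of the probe count, so an undersized budget yields noisy corrections. A small illustration with hypothetical sizes:

```python
import numpy as np

def diag_estimate(A, num_probes, rng):
    """Rademacher (Hutchinson-style) estimate of diag(A)."""
    S = rng.choice([-1.0, 1.0], size=(A.shape[0], num_probes))
    return ((A @ S) * S).mean(axis=1)

rng = np.random.default_rng(5)
n = 200
A = rng.standard_normal((n, n))
A = 0.5 * (A + A.T)

err_small = np.linalg.norm(diag_estimate(A, 10, rng) - np.diag(A))
err_large = np.linalg.norm(diag_estimate(A, 2000, rng) - np.diag(A))
# With 200x more probes the error drops by roughly sqrt(200), i.e. ~14x.
```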
7. Extensions and Broader Context
Sketching LoRD matrices directly generalizes classic randomized SVD, diagonal extraction, and compressed covariance estimation. The LoRD structure is also a special case of simultaneously sparse and low-rank models. For covariance with sparse diagonals, nested sketching with two-stage convex recovery leverages both restricted isometry and sparsity for optimal sample efficiency, outperforming generic sparse+low-rank settings that would require more measurements (Bahmani et al., 2015).
The LoRD approach’s spectrum of algorithms—joint convex (SKETCHLORD), alternating spectral, Nyström-diag stochastic, and nested sketch—form a comprehensive toolkit, suitable across scientific domains where only MVP access is possible. The paradigm’s flexibility is especially impactful when high-dimensional operators combine pronounced global and local structure, such as kernel matrices, large-scale factor-covariances, and deep learning Hessians (Fernandez et al., 28 Sep 2025, Yeon et al., 18 Dec 2025).