
Sketching Low-Rank Plus Diagonal Matrices

Updated 23 February 2026
  • Sketching LoRD matrices is a technique that decomposes a matrix into a low-rank and diagonal part using randomized matrix–vector products, capturing both global interactions and local variations.
  • Key algorithms like SKETCHLORD and alternating spectral methods leverage joint convex relaxation and eigenvalue truncation to achieve accurate recovery with provable error bounds.
  • This approach scales to high-dimensional applications such as deep learning Hessians and covariance estimation, significantly reducing computational cost while maintaining high accuracy.

High-dimensional linear operators and covariance matrices commonly arise in scientific computing and machine learning, where access is often restricted to implicit matrix–vector products (MVPs) due to computational constraints. Accurate and scalable representation of such operators is typically sought via structured decompositions. The low-rank plus diagonal (LoRD or LRPD) model provides a flexible yet parsimonious approximation, capturing both global interactions (low-rank) and localized effects (diagonal). Sketching methods, which reconstruct operators from a limited set of randomized MVPs, enable efficient computation and storage of these structured approximations. Recent advances—including SKETCHLORD, randomized alternating algorithms, and nested sketching frameworks—have systematically advanced the accurate recovery of LoRD matrices, demonstrating theoretical and empirical superiority over approaches that target only low-rank or diagonal structure.

1. Mathematical Formulation and Structural Motivation

Given $M \in \mathbb{R}^{d \times d}$ (or, for symmetric operators, $A \in \mathbb{R}^{n \times n}$), the LoRD model assumes

$$M = L + D,$$

where $L$ has rank $r$ ($r \ll d$) and $D$ is diagonal. The decomposition

$$A \approx U U^\top + \operatorname{diag}(d)$$

with $U \in \mathbb{R}^{n \times r}$ formalizes the LRPD structure for symmetric matrices. Such models capture global covariance through $L$ (shared directions or factors) and local, variable-specific variation through $D$. This structure is highly pertinent in scenarios such as deep learning Hessians, where pronounced eigenmodes coexist with direction-specific curvatures, and in large-scale covariance estimation, where slow spectral decay necessitates corrections beyond a pure low-rank model (Fernandez et al., 28 Sep 2025; Yeon et al., 18 Dec 2025).
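As a concrete illustration, a minimal numpy sketch (sizes and variable names are illustrative, not taken from the cited papers) that builds a symmetric LRPD matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 100, 5

U = rng.standard_normal((n, r))      # shared factors: global, low-rank structure
d = rng.uniform(0.5, 1.5, size=n)    # variable-specific variances: local structure

A = U @ U.T + np.diag(d)             # the LRPD / LoRD matrix

# The low-rank part alone has rank r; the diagonal correction makes A full-rank
assert np.linalg.matrix_rank(U @ U.T) == r
assert np.linalg.matrix_rank(A) == n
```

Neither term alone reproduces $A$: the factor part misses the per-variable variances and the diagonal part misses the shared directions, which is exactly the coupling the LoRD model is built to capture.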

2. Sketching Access Model and Problem Reduction

In high-dimensional settings, explicit access to $M$ is infeasible; only MVPs $x \mapsto Mx$ can be queried, each at significant computational cost. The sketching access model chooses a test matrix $\Psi \in \mathbb{R}^{d \times p}$ (e.g., Rademacher or Gaussian), yielding the forward sketch $Z = M\Psi$ and optionally an adjoint sketch $Z^* = \Psi^\top M$. With $p = O(r)$, sketching efficiently compresses $M$'s relevant action.
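The access model can be mimicked in code by hiding the operator behind a matrix–vector product; a sketch under that assumption (the `matvec` closure is an illustrative stand-in for an implicit operator):

```python
import numpy as np

rng = np.random.default_rng(1)
d_dim, p = 200, 12

M_hidden = rng.standard_normal((d_dim, d_dim))
matvec = lambda x: M_hidden @ x          # the ONLY allowed access to M

# Rademacher test matrix: every entry is +-1, so Psi_ij**2 == 1
Psi = rng.choice([-1.0, 1.0], size=(d_dim, p))

# Forward sketch: p matrix-vector product queries
Z = np.column_stack([matvec(Psi[:, j]) for j in range(p)])
assert Z.shape == (d_dim, p)
```

The adjoint sketch, when available, is formed the same way from products with $M^\top$.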

For LoRD recovery, the sketch constraint exploits the property that the diagonal's contribution to the sketch is constant across the $p$ sketch columns whenever $\Psi$'s entries satisfy $\Psi_{ij}^2 = 1$, yielding the constraints

$$Z - L\Psi = D\Psi \qquad \text{and} \qquad (Z - L\Psi)\left(I_p - \tfrac{1}{p}\mathbf{1}\mathbf{1}^\top\right) = 0.$$

Defining $\widetilde{Z} = Z\left(I_p - \tfrac{1}{p}\mathbf{1}\mathbf{1}^\top\right)$, the linear constraint for $L$ reads

$$(L\Psi)\left(I_p - \tfrac{1}{p}\mathbf{1}\mathbf{1}^\top\right) = \widetilde{Z}$$

(Fernandez et al., 28 Sep 2025).
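To see the role of the $\Psi_{ij}^2 = 1$ condition numerically: multiplying the diagonal's sketch contribution elementwise by $\Psi$ (one explicit reading of the condition, written `*` below) makes it constant across the $p$ sketch columns, so the centering matrix annihilates it. A minimal check, assuming a Rademacher $\Psi$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 8
d = rng.uniform(0.5, 2.0, size=n)
D = np.diag(d)
Psi = rng.choice([-1.0, 1.0], size=(n, p))

P = np.eye(p) - np.ones((p, p)) / p       # centering matrix I_p - (1/p) 1 1^T

# Because Psi_ij**2 == 1, (D @ Psi) * Psi equals d 1^T: identical columns
R = (D @ Psi) * Psi
assert np.allclose(R, d[:, None])

# ... so centering across the sketch columns removes the diagonal's footprint
assert np.allclose(R @ P, 0.0)
```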

3. Algorithmic Approaches for LoRD Recovery

3.1 Joint Convex Relaxation (SKETCHLORD)

SKETCHLORD formulates joint recovery as a convex program:

$$\widehat{L} = \arg\min_{L \in \mathbb{R}^{d \times d}} \frac{1}{2}\left\|\widetilde{Z} - (L\Psi)\left(I_p - \tfrac{1}{p}\mathbf{1}\mathbf{1}^\top\right)\right\|_F^2 + \lambda \|L\|_*$$

The Frobenius term enforces sketch consistency; the nuclear norm regularizer biases toward low rank. The problem is solved by accelerated proximal methods (e.g., ADMM), with singular value thresholding at each iteration:

$$L^{t+1} = \mathrm{SVT}_{\lambda\eta}\left(L^t - \eta\,\nabla_L \tfrac{1}{2}\left\|\widetilde{Z} - (L\Psi)P\right\|_F^2\right),$$

where $P = I_p - (1/p)\mathbf{1}\mathbf{1}^\top$ and $\nabla_L$ is the gradient of the sketch-consistency term with respect to $L$ (Fernandez et al., 28 Sep 2025).

Upon convergence, the diagonal is recovered via

$$\widehat{D} = \operatorname{diag}\left(\frac{1}{p}\left[Z - \widehat{L}\Psi\right]\mathbf{1}\right)$$

Compact (small $p \times p$) SVDs finalize the factorization.
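A compact, unaccelerated sketch of this iteration (plain ISTA rather than the accelerated variant; the step size, $\lambda$, and problem sizes are illustrative choices, and the final diagonal estimate averages the residual elementwise against $\Psi$, a Hutchinson-style estimate):

```python
import numpy as np

rng = np.random.default_rng(3)
d_dim, r, p = 60, 3, 15
lam = 0.1

# Ground-truth LoRD matrix and its forward sketch
L_true = rng.standard_normal((d_dim, r)) @ rng.standard_normal((r, d_dim))
M = L_true + np.diag(rng.uniform(0.5, 1.5, size=d_dim))
Psi = rng.choice([-1.0, 1.0], size=(d_dim, p))
Z = M @ Psi

P = np.eye(p) - np.ones((p, p)) / p
Z_tilde = Z @ P

def svt(X, tau):
    """Singular value thresholding: prox operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

eta = 1.0 / np.linalg.norm(Psi @ P, 2) ** 2    # 1 / Lipschitz constant
L_hat = np.zeros((d_dim, d_dim))
for _ in range(100):
    grad = (L_hat @ Psi @ P - Z_tilde) @ P @ Psi.T
    L_hat = svt(L_hat - eta * grad, lam * eta)

# Diagonal from the residual sketch (Rademacher / Hutchinson-style average)
d_hat = ((Z - L_hat @ Psi) * Psi).mean(axis=1)
```

With step size $1/\|\Psi P\|_2^2$, each proximal-gradient step is guaranteed not to increase the regularized objective, which is what convergence monitoring exploits in practice.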

3.2 Alternating Spectral and Stochastic Sketching

The alternating (Alt) algorithm iteratively projects away the current diagonal, computes a rank-$r$ eigenvalue-truncated approximation (the "low-rank step"), then updates the diagonal by subtracting the current $L$ from $A$ (the "diagonal step"):

$$L_t = T_r(A - D_{t-1}), \qquad D_t = \operatorname{diag}(A - L_t)$$

In large-scale settings, the low-rank update is replaced by a Nyström sketch; diagonal terms are estimated via stochastic Diag++ (Rademacher projection estimates) (Yeon et al., 18 Dec 2025).
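The two alternating steps translate directly into a dense-matrix sketch (exact eigendecompositions throughout; the Nyström and stochastic variants would replace `T_r` and the diagonal update with sketched estimates):

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 80, 4

U = rng.standard_normal((n, r))
d = rng.uniform(0.0, 1.0, size=n)
A = U @ U.T + np.diag(d)                # exact LRPD ground truth

def T_r(S, r):
    """Best rank-r approximation of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    idx = np.argsort(np.abs(w))[-r:]    # keep the r largest-magnitude eigenvalues
    return (V[:, idx] * w[idx]) @ V[:, idx].T

D = np.zeros((n, n))
for _ in range(30):
    L = T_r(A - D, r)                   # low-rank step
    D = np.diag(np.diag(A - L))         # diagonal step

rel_err = np.linalg.norm(A - L - D) / np.linalg.norm(A)
```

On exact LRPD input the residual typically drops to near machine precision within tens of iterations, consistent with the empirical results reported below.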

A summary table of representative LoRD sketching approaches is below:

| Method | Sketches Used | Core Optimization |
| --- | --- | --- |
| SKETCHLORD (Fernandez et al., 28 Sep 2025) | Forward (and adjoint) | Convex nuclear norm |
| Alt–Nyström (Yeon et al., 18 Dec 2025) | Nyström + Diag++ | Spectral alternating |
| Nested Sketch (Bahmani et al., 2015) | Quadratic via $\Psi$ | Two-stage convex / group Lasso |

4. Theoretical Guarantees and Comparison with Sequential Approaches

LoRD sketching admits rigorous analysis regarding sample complexity and approximation error:

  • For random $\Psi$, $p \approx r + 20$ suffices to recover the rank-$r$ column space with high probability, matching classical randomized low-rank results (Tropp et al. 2017; Fernandez et al., 28 Sep 2025).
  • Nuclear-norm relaxation is exact provided $\Psi$ satisfies a restricted isometry property over rank-$2r$ matrices.
  • In explicit toy examples ($M = \mathbf{1}\mathbf{1}^\top + I_d$), sequential pipelines—diagonal-only, low-rank-only, diagonal-then-low-rank, or low-rank-then-diagonal—are proven to incur strictly greater error than the joint formulation. Detailed closed-form expressions for the relative Frobenius error demonstrate that any sequential method leaves an irreducible approximation gap, while joint optimization achieves zero error once $p \geq r$ (Fernandez et al., 28 Sep 2025).
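The toy example is easy to reproduce: for $M = \mathbf{1}\mathbf{1}^\top + I_d$, fitting the diagonal first and then the best rank-1 part leaves an irreducible residual, while the joint decomposition $L = \mathbf{1}\mathbf{1}^\top$, $D = I_d$ is exact:

```python
import numpy as np

d_dim = 100
ones = np.ones((d_dim, 1))
M = ones @ ones.T + np.eye(d_dim)

# Sequential: fit the diagonal first (diag(M) = 2 I_d), then the best rank-1 part
D_seq = np.diag(np.diag(M))                      # = 2 I_d
R = M - D_seq                                    # = 1 1^T - I_d
w, V = np.linalg.eigh(R)
i = np.argmax(np.abs(w))
L_seq = w[i] * np.outer(V[:, i], V[:, i])        # best rank-1 approximation of R
err_seq = np.linalg.norm(M - D_seq - L_seq) / np.linalg.norm(M)

# Joint: the exact LoRD decomposition leaves no residual at all
err_joint = np.linalg.norm(M - ones @ ones.T - np.eye(d_dim)) / np.linalg.norm(M)

print(err_seq > 0.05, np.isclose(err_joint, 0.0))   # → True True
```

Here the sequential residual comes from the $d - 1$ eigenvalues of $\mathbf{1}\mathbf{1}^\top - I_d$ equal to $-1$, which no single rank-1 term can absorb.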

For randomized alternating approaches, per-iterate max-norm error and sample allocation between low-rank and diagonal sketches are governed by the spectrum of the residual and concentration parameters, with explicit nonasymptotic bounds (Yeon et al., 18 Dec 2025).

Nested sketching with group Lasso, for matrices whose diagonal is $k$-sparse, achieves near-optimal sketch and storage complexity $O(rk\log(N/k))$, leveraging specialized RIP properties (Bahmani et al., 2015).

5. Empirical and Computational Performance

Empirical studies demonstrate superior recovery of LoRD matrices in both synthetic and real-world settings:

  • On synthetic $M = L + D$ with controlled low-rank and variable diagonal scaling, SKETCHLORD's advantage becomes pronounced as diagonal strength increases. Baseline (pure low-rank or diagonal) and sequential variants cannot capture strong diagonal–low-rank coupling (Fernandez et al., 28 Sep 2025).
  • Alternating and Stochastic-Alt algorithms achieve machine-precision recovery on synthetic LRPD models (e.g., relative Frobenius error $\sim 10^{-15}$ for $n = 150$, $r = 5$ within 20 iterations) (Yeon et al., 18 Dec 2025).
  • For large-scale operators such as deep learning Hessians (e.g., in ResNet-50), LoRD sketching can reveal both sharp local curvature (diagonal) and outlier eigenmodes (low-rank), surpassing block-diagonal or KFAC-style approximations (Fernandez et al., 28 Sep 2025).
  • In finance applications, Block-LRPD (Alt extended to block-diagonal corrections) achieves lower approximation error for S&P 500 correlation matrices—attaining full-rank accuracy at small $r$ via blockwise diagonal updates (Yeon et al., 18 Dec 2025).

Computationally, LoRD sketching enables reductions in both MVP queries and flops:

  • SKETCHLORD: $\sim O(dr^2 T)$ overall, with $T$ the number of gradient steps; practical for $d \sim 10^5$ (Fernandez et al., 28 Sep 2025).
  • Alt (full): $O(n^3)$ per iteration; Nyström–Alt: $O(n^2 k)$; Stochastic-Alt: $O(nbr)$.
  • Nested sketch: Stage 1 uses $O(M^3)$ (with $M \ll N$) and Stage 2 $O(Nrk)$ (Bahmani et al., 2015).

6. Practical Implementation Guidelines

Key recommendations for high-fidelity LoRD sketching:

  • Choose sketch size $p \approx 2r$–$3r$ for accurate recovery; further increases reduce error as $O(1/p)$ but incur extra cost (Fernandez et al., 28 Sep 2025).
  • Utilize compact recovery (a small $p \times p$ SVD) to reduce wall-clock time without an accuracy penalty.
  • Employ momentum/Nesterov acceleration for a 30–50% cut in iteration count.
  • Monitor convergence via $\|L^{t+1} - L^t\|_*$ or a similar criterion; early stopping may double speed.
  • In Stochastic-Alt, allocate a budget of $b \approx 2r$ matrix–vector products, with $k = (1 + 1/\varepsilon)r$ for relative accuracy $\varepsilon$ (Yeon et al., 18 Dec 2025).
  • Use robust eigenvalue thresholding and ensure nonnegativity in the diagonal when reconstructing positive semidefinite matrices.
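As a concrete instance of the last two bullets, a Hutchinson-style Rademacher diagonal estimator with a nonnegativity clamp (a simplified stand-in for Diag++; the probe budget `s` and the clamp are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, s = 50, 4000

B = rng.standard_normal((n, n))
A = B @ B.T / n + np.diag(rng.uniform(1.0, 2.0, size=n))   # PSD test matrix

# Estimate diag(A) from s matrix-vector products with Rademacher probes:
# E[(A psi) * psi] = diag(A) when psi has iid +-1 entries
Psi = rng.choice([-1.0, 1.0], size=(n, s))
d_est = ((A @ Psi) * Psi).mean(axis=1)

# Clamp: a PSD matrix cannot have negative diagonal entries
d_est = np.maximum(d_est, 0.0)

max_abs_err = np.max(np.abs(d_est - np.diag(A)))
```

The estimator's variance scales with the squared off-diagonal mass divided by the probe budget, which is why an insufficient Rademacher budget yields the noisy diagonal corrections warned about below.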

Pitfalls include poorly chosen sketch size (leading to underestimated rank), subtracting unknown diagonals before projection (compromising PSD structure), and noisy diagonal corrections if the Rademacher query budget is insufficient (Yeon et al., 18 Dec 2025).

7. Extensions and Broader Context

Sketching LoRD matrices directly generalizes classic randomized SVD, diagonal extraction, and compressed covariance estimation. The LoRD structure is also a special case of simultaneously sparse and low-rank models. For covariances with $k$-sparse diagonals, nested sketching with two-stage convex recovery leverages both restricted isometry and sparsity for optimal sample efficiency, outperforming generic sparse-plus-low-rank settings that would require more measurements (Bahmani et al., 2015).

The LoRD approach’s spectrum of algorithms—joint convex (SKETCHLORD), alternating spectral, Nyström-diag stochastic, and nested sketch—form a comprehensive toolkit, suitable across scientific domains where only MVP access is possible. The paradigm’s flexibility is especially impactful when high-dimensional operators combine pronounced global and local structure, such as kernel matrices, large-scale factor-covariances, and deep learning Hessians (Fernandez et al., 28 Sep 2025, Yeon et al., 18 Dec 2025).
