
SKETCHLORD: Joint LoRD Recovery

  • SKETCHLORD is a convex optimization-based sketching method that simultaneously recovers low-rank and diagonal structures in large operators with limited matrix entry access.
  • The method leverages matrix–vector products and nuclear norm relaxation to jointly optimize low-rank recovery and diagonal estimation, yielding lower approximation errors than sequential approaches.
  • SKETCHLORD scales efficiently by performing intensive computations in reduced sketch spaces, demonstrating superior empirical performance in large-scale scientific and deep learning applications.

SKETCHLORD refers to a convex optimization-based sketching method for the simultaneous recovery of low-rank plus diagonal (LoRD) structure in large linear operators. The method targets scenarios where direct access to matrix entries is infeasible but matrix–vector products (MVPs) can be evaluated efficiently, a frequent situation in scientific computing and large-scale machine learning. SKETCHLORD was introduced to overcome provable limitations of prior sketched approaches, which can recover only low-rank or only diagonal structure in isolation rather than both jointly; it therefore achieves lower approximation error and better scalability for large operators such as the Hessian matrices of deep learning models (Fernandez et al., 28 Sep 2025).

1. LoRD Operators and Motivation

A linear operator $A \in \mathbb{R}^{N \times N}$ has LoRD (Low-Rank plus Diagonal) structure if $A = L_* + D_*$, where $L_*$ is low-rank ($\mathrm{rank}(L_*) = k \ll N$) and $D_*$ is diagonal. LoRD structure arises in deep network Hessians, kernel methods, and scientific simulation, where $L_*$ represents a dominant spectral component while $D_*$ models regularization or noise.
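
To make the setting concrete, the following minimal NumPy sketch (illustrative sizes, not taken from the paper) builds a synthetic LoRD operator and exposes it only through matrix–vector products:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 1_000, 10                       # demo sizes; the method targets much larger N

# Synthetic LoRD operator A = L_* + D_* with rank(L_*) = k and a positive diagonal D_*.
U = rng.standard_normal((N, k)) / np.sqrt(N)
d_true = rng.uniform(0.5, 1.5, size=N)

def mvp(X):
    """Apply A = U @ U.T + diag(d_true) to an (N, m) block of probe vectors.

    Only this product access is assumed; A itself is never formed explicitly.
    """
    return U @ (U.T @ X) + d_true[:, None] * X
```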

Due to the size of $A$, explicit storage and manipulation become impractical for large $N$, but MVP access offers the opportunity for randomized sketching. Prior methods include:

  • Sketched SVD (SSVD): Recovers a low-rank approximation via $p$ MVPs.
  • Randomized trace/diagonal estimators (e.g., Hutchinson/XDiag): Estimate diagonal structure.

However, when $A$ admits an exact or approximate LoRD decomposition, sequential application of these methods—estimating low-rank then diagonal, or vice versa—is suboptimal in approximation error compared to joint estimation.
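
As a concrete instance of the isolated diagonal baselines above, a generic Hutchinson-style diagonal estimator can be written against the same MVP interface (a textbook estimator shown for illustration; it is not the paper's exact XDiag implementation):

```python
import numpy as np

def hutchinson_diag(mvp, N, p, seed=1):
    """Generic Hutchinson-style diagonal estimate of A from p matrix-vector products.

    For Rademacher probes, E[omega_i * (A @ omega)_i] = A_ii, so averaging over
    p probes estimates diag(A) without ever forming A.
    """
    rng = np.random.default_rng(seed)
    Omega = rng.choice([-1.0, 1.0], size=(N, p))   # Rademacher probe block
    Y = mvp(Omega)                                  # p MVPs with A
    return (Omega * Y).mean(axis=1)

# Example (using the mvp handle above): estimates diag(A) = d_true + diag(U @ U.T),
# i.e. the diagonal of the *whole* operator, not the D_* component alone.
d_est = hutchinson_diag(mvp, N, p=200)
```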

2. SKETCHLORD: Convex Formulation and Algorithm

SKETCHLORD formulates joint LoRD recovery as a convex program in the space of sketches. The process involves the following key constructs:

  • Sketch Matrix $\Omega$: An i.i.d. Rademacher matrix $\Omega \in \mathbb{R}^{N \times p}$.
  • Sketched Matrix: $Y = A\Omega \in \mathbb{R}^{N \times p}$.
  • Centering Operator: $J = (1 1^T)/p \in \mathbb{R}^{p \times p}$, with centered sketch $\tilde{Y} = Y(I_p - J)$.

For any LoRD decomposition $A = L + D$, it holds that $(D\Omega)(I_p - J) = 0$, so $\tilde{Y} = (L\Omega)(I_p - J)$. The recovery problem becomes:

$$(P_0) \quad \min_L \ \mathrm{rank}(L) \quad \text{subject to} \quad (L\Omega)(I_p - J) = \tilde{Y}$$

This NP-hard problem is relaxed via a nuclear norm penalty,

$$(P_\lambda) \quad \hat{L} = \arg\min_{L} \; \frac{1}{2} \big\| \tilde{Y} - (L\Omega)(I_p - J) \big\|_F^2 + \lambda \|L\|_*$$

where $\lambda > 0$ promotes low-rank solutions, and all variables remain in the sketch domain. The diagonal component is recovered post-hoc.
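
A minimal sketch of these constructs, continuing the snippet from Section 1 (the sketch dimension `p` and weight `lam` are illustrative choices, and holding a dense candidate `L` is only workable at demo scale):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 12 * k                                     # sketch dimension, p ~ O(k) (illustrative)

Omega = rng.choice([-1.0, 1.0], size=(N, p))   # i.i.d. Rademacher sketch matrix
Y = mvp(Omega)                                 # sketched matrix Y = A @ Omega (p MVPs)

J = np.ones((p, p)) / p                        # centering operator (1 1^T) / p
C = np.eye(p) - J                              # I_p - J
Y_tilde = Y @ C                                # centered sketch

lam = 1e-2                                     # nuclear-norm weight lambda (illustrative)

def objective(L):
    """Value of the relaxed problem (P_lambda) at a dense candidate L."""
    resid = Y_tilde - (L @ Omega) @ C
    return 0.5 * np.linalg.norm(resid, "fro") ** 2 + lam * np.linalg.norm(L, ord="nuc")
```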

Optimization is performed via prox-gradient or ADMM methods, involving gradients

$$\nabla f(L) = \big[(L\Omega - Y)(I_p - J)\big]\,\Omega^T$$

and singular value thresholding (SVT) operations.
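
A sketch of one such prox-gradient step, under the same demo-scale assumptions (the paper's implementation keeps iterates in sketch or factored form rather than as a dense $N \times N$ array):

```python
def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_* (nuclear norm)."""
    Uc, s, Vt = np.linalg.svd(M, full_matrices=False)
    return Uc @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def prox_gradient_step(L, step):
    """One proximal gradient step on (P_lambda), using the gradient stated above."""
    grad = ((L @ Omega - Y) @ C) @ Omega.T     # gradient of the smooth data-fit term
    return svt(L - step * grad, step * lam)
```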

3. Theoretical Distinction from Sequential Approaches

SKETCHLORD addresses a central theoretical limitation of sequential recovery approaches. For the canonical example $A = 1 1^T + I_N$, closed-form calculations show:

  • Diagonal-only and low-rank-only methods, as well as sequential ones (diagonal-then-low-rank or low-rank-then-diagonal), cannot achieve zero residual unless $k = N$.
  • SKETCHLORD (the joint program) with $\lambda \to 0$ can recover $A$ exactly (zero residual) with $p > k$ MVPs.

This theoretical property generalizes to LoRD matrices with $p \gtrsim k$ MVPs, aligning with randomized SVD sample complexity for high-probability exact low-rank recovery.
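
This can be checked numerically; the short snippet below (an illustrative check, not from the paper) verifies that $A = 1 1^T + I_N$ is full rank, so no low-rank-only approximation with $k < N$ is exact, while the joint split $L = 1 1^T$, $D = I_N$ is exact with rank one:

```python
import numpy as np

n = 50
ones = np.ones((n, 1))
A_canon = ones @ ones.T + np.eye(n)                   # A = 1 1^T + I_n

print(np.linalg.matrix_rank(A_canon))                 # n: no exact fit of rank k < n
L_joint, D_joint = ones @ ones.T, np.eye(n)
print(np.linalg.norm(A_canon - (L_joint + D_joint)))  # 0.0: exact LoRD split, rank(L) = 1
```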

4. Computational Workflow and Scalability

SKETCHLORD operates efficiently in the sketched (i.e., $N \times p$ or $p \times p$) subspaces:

  1. Sketch generation: Sample $\Omega$; compute $Y = A\Omega$.
  2. Sketch centering: Form $J$, then $\tilde{Y} = Y(I_p - J)$.
  3. Initialization: Set $L^{(0)} = \tilde{Y}\,(I_p - J)^+\,\Omega^+$.
  4. Prox-gradient descent: Update $L^{(t)}$ via SVT steps, using the explicit gradient structure.
  5. Diagonal recovery: Set $d = (1/p)\,\mathrm{diag}\!\big((A - \hat{L})\,\Omega\, 1_p\big)$.
  6. SVD reconstruction: Efficient computation of $\hat{L}$ from $(\hat{L}\Omega, Y)$ via two QR decompositions and a $p \times p$ eigendecomposition.

This architecture achieves memory and runtime scaling compatible with large $N$, as all intensive computations are confined to sketch spaces, with sketch dimension $p \sim O(k) \ll N$.
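
Gluing the workflow together at demo scale, continuing the snippets above (the initialization order, step size, and iteration count are illustrative assumptions, and the dense iterate is only feasible because $N$ is small here):

```python
# Step 3: initialization from the centered sketch via pseudoinverses.
L_hat = Y_tilde @ np.linalg.pinv(C) @ np.linalg.pinv(Omega)

# Step 4: prox-gradient / SVT iterations; step size from a crude Lipschitz estimate.
step = 1.0 / np.linalg.norm(Omega @ C @ Omega.T, 2)
for _ in range(25):
    L_hat = prox_gradient_step(L_hat, step)
print(objective(L_hat))                        # monitor the (P_lambda) objective

# Step 5: post-hoc diagonal estimate, reading the stated formula as an N-vector.
d_hat = (Y - L_hat @ Omega) @ np.ones(p) / p
```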

5. Empirical Performance and Benchmarks

Extensive experiments validate SKETCHLORD’s effectiveness:

  • Synthetic LoRD Matrices: For $N \in \{500, 1000, 5000\}$, $k = N/100$, $p = 18k$, with varying diagonal-to-low-rank ratios ($\xi \in \{0, 0.1, 1, 10\}$) and low-rank spectra.
    • Residual energy: Baseline (sequential and isolated) methods exceed 100% error at large $\xi$, while SKETCHLORD remains below 10%.
    • Low-rank limiting case ($\xi = 0$): SKETCHLORD is competitive with optimal low-rank-only methods.
    • Efficiency: Compact two-step recovery matches single-pass accuracy at reduced runtime.
  • Planned deep-learning Hessian tests: In pilot ResNet experiments, SKETCHLORD yields an order-of-magnitude lower Frobenius-norm residual than SSVD or XDiag for the same $p$.

This empirical evidence demonstrates joint LoRD structure recovery with only $O(k)$ MVPs, and practical outperformance in both approximation error and computational resource usage (Fernandez et al., 28 Sep 2025).

6. Applications, Limitations, and Implications

SKETCHLORD applies directly when operators exhibit LoRD structure, a property supported by empirical studies of deep-learning Hessians and scientific simulation operators. A plausible implication is that SKETCHLORD can become a standard structured-approximation tool in large-scale machine learning model diagnostics, scientific computation, and kernel learning, enabling scalable surrogates where spectral and diagonal regularization both matter.

No complete sample-complexity theorem is proved, but practical guidance from randomized SVD applies. Limitations include the need for sufficient MVPs ($p \gtrsim k$) and the assumption of dominant LoRD structure for best performance.

7. Comparative Summary of Structured Approximation Methods

The following table summarizes approximation strategies for structured AA with MVP access, contextualizing SKETCHLORD versus alternatives:

| Method | Recovers | Joint Error |
| --- | --- | --- |
| SSVD | Low-rank only | Suboptimal |
| XDiag | Diagonal only | Suboptimal |
| D→LoR or LoR→D | Sequential LoRD | Suboptimal |
| SKETCHLORD | Joint LoRD | Optimal (zero for exact LoRD, $p > k$) |

SKETCHLORD's ability to capture both spectral and local diagonal structure positions it as a key algorithmic development where LoRD structure is prevalent and high-fidelity operator surrogacy is critical (Fernandez et al., 28 Sep 2025).
