
SKETCHLORD: Joint LoRD Recovery

  • SKETCHLORD is a convex optimization-based sketching method that simultaneously recovers low-rank and diagonal structures in large operators with limited matrix entry access.
  • The method leverages matrix–vector products and nuclear norm relaxation to jointly optimize low-rank recovery and diagonal estimation, yielding lower approximation errors than sequential approaches.
  • SKETCHLORD scales efficiently by performing intensive computations in reduced sketch spaces, demonstrating superior empirical performance in large-scale scientific and deep learning applications.

SKETCHLORD refers to a convex optimization-based sketching method for the simultaneous recovery of low-rank plus diagonal (LoRD) structure in large linear operators. The method targets scenarios where direct access to matrix entries is infeasible but matrix–vector products (MVPs) can be evaluated efficiently, a frequent situation in scientific computing and large-scale machine learning. SKETCHLORD was introduced to overcome provable limitations of prior sketched approaches, which can recover only low-rank or only diagonal structure in isolation rather than both jointly; it therefore achieves lower approximation error and better scalability for large operators such as the Hessian matrices of deep learning models (Fernandez et al., 28 Sep 2025).

1. LoRD Operators and Motivation

A linear operator $A \in \mathbb{R}^{N \times N}$ has LoRD (Low-Rank plus Diagonal) structure if $A = L_* + D_*$, where $L_*$ is low-rank ($\mathrm{rank}(L_*) = k \ll N$) and $D_*$ is diagonal. LoRD structure arises in deep network Hessians, kernel methods, and scientific simulation, where $L_*$ represents a dominant spectral component while $D_*$ models regularization or noise.
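
To make the setting concrete, the following minimal NumPy sketch (illustrative sizes, not taken from the paper) builds a synthetic LoRD operator and exposes it only through matrix–vector products:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 1_000, 10                       # demo sizes; the method targets much larger N

# Synthetic LoRD operator A = L_* + D_* with rank(L_*) = k and a positive diagonal D_*.
U = rng.standard_normal((N, k)) / np.sqrt(N)
d_true = rng.uniform(0.5, 1.5, size=N)

def mvp(X):
    """Apply A = U @ U.T + diag(d_true) to an (N, m) block of probe vectors.

    Only this product access is assumed; A itself is never formed explicitly.
    """
    return U @ (U.T @ X) + d_true[:, None] * X
```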

Due to the size of $A$, explicit storage and manipulation become impractical for large $N$, but MVP access offers the opportunity for randomized sketching. Prior methods include:

  • Sketched SVD (SSVD): Recovers a low-rank approximation via $p$ MVPs.
  • Randomized trace/diagonal estimators (e.g., Hutchinson/XDiag): Estimate diagonal structure.

However, when $A$ admits an exact or approximate LoRD decomposition, sequential application of these methods—estimating low-rank then diagonal, or vice versa—is suboptimal in approximation error compared to joint estimation.
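
As a concrete instance of the isolated diagonal baselines above, a generic Hutchinson-style diagonal estimator can be written against the same MVP interface (a textbook estimator shown for illustration; it is not the paper's exact XDiag implementation):

```python
import numpy as np

def hutchinson_diag(mvp, N, p, seed=1):
    """Generic Hutchinson-style diagonal estimate of A from p matrix-vector products.

    For Rademacher probes, E[omega_i * (A @ omega)_i] = A_ii, so averaging over
    p probes estimates diag(A) without ever forming A.
    """
    rng = np.random.default_rng(seed)
    Omega = rng.choice([-1.0, 1.0], size=(N, p))   # Rademacher probe block
    Y = mvp(Omega)                                  # p MVPs with A
    return (Omega * Y).mean(axis=1)

# Example (using the mvp handle above): estimates diag(A) = d_true + diag(U @ U.T),
# i.e. the diagonal of the *whole* operator, not the D_* component alone.
d_est = hutchinson_diag(mvp, N, p=200)
```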

2. SKETCHLORD: Convex Formulation and Algorithm

SKETCHLORD formulates joint LoRD recovery as a convex program in the space of sketches. The process involves the following key constructs:

  • Sketch Matrix $\Omega$: An i.i.d. Rademacher matrix $\Omega \in \mathbb{R}^{N \times p}$.
  • Sketched Matrix: $Y = A\Omega \in \mathbb{R}^{N \times p}$.
  • Centering Operator: $J = (1 1^T)/p \in \mathbb{R}^{p \times p}$, with centered sketch $\tilde{Y} = Y(I_p - J)$.

For any LoRD decomposition $A = L + D$, it holds that $(D\Omega)(I_p - J) = 0$, so $\tilde{Y} = (L\Omega)(I_p - J)$. The recovery problem becomes:

$$(P_0) \quad \min_L \ \mathrm{rank}(L) \quad \text{subject to} \quad (L\Omega)(I_p - J) = \tilde{Y}$$

This NP-hard problem is relaxed via a nuclear norm penalty,

$$(P_\lambda) \quad \hat{L} = \arg\min_{L} \; \frac{1}{2} \big\| \tilde{Y} - (L\Omega)(I_p - J) \big\|_F^2 + \lambda \|L\|_*$$

where $\lambda > 0$ promotes low-rank solutions, and all variables remain in the sketch domain. The diagonal component is recovered post-hoc.
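
A minimal sketch of these constructs, continuing the snippet from Section 1 (the sketch dimension `p` and weight `lam` are illustrative choices, and holding a dense candidate `L` is only workable at demo scale):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 12 * k                                     # sketch dimension, p ~ O(k) (illustrative)

Omega = rng.choice([-1.0, 1.0], size=(N, p))   # i.i.d. Rademacher sketch matrix
Y = mvp(Omega)                                 # sketched matrix Y = A @ Omega (p MVPs)

J = np.ones((p, p)) / p                        # centering operator (1 1^T) / p
C = np.eye(p) - J                              # I_p - J
Y_tilde = Y @ C                                # centered sketch

lam = 1e-2                                     # nuclear-norm weight lambda (illustrative)

def objective(L):
    """Value of the relaxed problem (P_lambda) at a dense candidate L."""
    resid = Y_tilde - (L @ Omega) @ C
    return 0.5 * np.linalg.norm(resid, "fro") ** 2 + lam * np.linalg.norm(L, ord="nuc")
```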

Optimization is performed via prox-gradient or ADMM methods, involving gradients

$$\nabla f(L) = \big[(L\Omega - Y)(I_p - J)\big]\,\Omega^T$$

and singular value thresholding (SVT) operations.
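
A sketch of one such prox-gradient step, under the same demo-scale assumptions (the paper's implementation keeps iterates in sketch or factored form rather than as a dense $N \times N$ array):

```python
def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_* (nuclear norm)."""
    Uc, s, Vt = np.linalg.svd(M, full_matrices=False)
    return Uc @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def prox_gradient_step(L, step):
    """One proximal gradient step on (P_lambda), using the gradient stated above."""
    grad = ((L @ Omega - Y) @ C) @ Omega.T     # gradient of the smooth data-fit term
    return svt(L - step * grad, step * lam)
```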

3. Theoretical Distinction from Sequential Approaches

SKETCHLORD addresses a central theoretical limitation of sequential recovery approaches. For the canonical example $A = 1 1^T + I_N$, closed-form calculations show:

  • Diagonal-only and low-rank-only methods, as well as sequential ones (diagonal-then-low-rank or low-rank-then-diagonal), cannot achieve zero residual unless $k = N$.
  • SKETCHLORD (the joint program) with $\lambda \to 0$ can recover $A$ exactly (zero residual) with $p > k$ MVPs.

This theoretical property generalizes to LoRD matrices with $p \gtrsim k$ MVPs, aligning with randomized SVD sample complexity for high-probability exact low-rank recovery.
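
This can be checked numerically; the short snippet below (an illustrative check, not from the paper) verifies that $A = 1 1^T + I_N$ is full rank, so no low-rank-only approximation with $k < N$ is exact, while the joint split $L = 1 1^T$, $D = I_N$ is exact with rank one:

```python
import numpy as np

n = 50
ones = np.ones((n, 1))
A_canon = ones @ ones.T + np.eye(n)                   # A = 1 1^T + I_n

print(np.linalg.matrix_rank(A_canon))                 # n: no exact fit of rank k < n
L_joint, D_joint = ones @ ones.T, np.eye(n)
print(np.linalg.norm(A_canon - (L_joint + D_joint)))  # 0.0: exact LoRD split, rank(L) = 1
```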

4. Computational Workflow and Scalability

SKETCHLORD operates efficiently in the sketched (i.e., $N \times p$ or $p \times p$) subspaces:

  1. Sketch generation: Sample $\Omega$; compute $Y = A\Omega$.
  2. Sketch centering: Form $J$, then $\tilde{Y} = Y(I_p - J)$.
  3. Initialization: Set $L^{(0)} = \tilde{Y}\,(I_p - J)^+\,\Omega^+$.
  4. Prox-gradient descent: Update $L^{(t)}$ via SVT steps, using the explicit gradient structure.
  5. Diagonal recovery: Set $d = (1/p)\,\mathrm{diag}\!\big((A - \hat{L})\,\Omega\, 1_p\big)$.
  6. SVD reconstruction: Efficient computation of $\hat{L}$ from $(\hat{L}\Omega, Y)$ via two QR decompositions and a $p \times p$ eigendecomposition.

This architecture achieves memory and runtime scaling compatible with large $N$, as all intensive computations are confined to sketch spaces, with sketch dimension $p \sim O(k) \ll N$.
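
Gluing the workflow together at demo scale, continuing the snippets above (the initialization order, step size, and iteration count are illustrative assumptions, and the dense iterate is only feasible because $N$ is small here):

```python
# Step 3: initialization from the centered sketch via pseudoinverses.
L_hat = Y_tilde @ np.linalg.pinv(C) @ np.linalg.pinv(Omega)

# Step 4: prox-gradient / SVT iterations; step size from a crude Lipschitz estimate.
step = 1.0 / np.linalg.norm(Omega @ C @ Omega.T, 2)
for _ in range(25):
    L_hat = prox_gradient_step(L_hat, step)
print(objective(L_hat))                        # monitor the (P_lambda) objective

# Step 5: post-hoc diagonal estimate, reading the stated formula as an N-vector.
d_hat = (Y - L_hat @ Omega) @ np.ones(p) / p
```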

5. Empirical Performance and Benchmarks

Extensive experiments validate SKETCHLORD’s effectiveness:

  • Synthetic LoRD Matrices: For $N \in \{500, 1000, 5000\}$, $k = N/100$, $p = 18k$, with varying diagonal-to-low-rank ratios ($\xi \in \{0, 0.1, 1, 10\}$) and low-rank spectra.
    • Residual energy: Baseline (sequential and isolated) methods exceed 100% error at large $\xi$, while SKETCHLORD remains below 10%.
    • Low-rank limiting case ($\xi = 0$): SKETCHLORD is competitive with optimal low-rank-only methods.
    • Efficiency: Compact two-step recovery matches single-pass accuracy at reduced runtime.
  • Planned deep-learning Hessian tests: In pilot ResNet experiments, SKETCHLORD yields an order-of-magnitude lower Frobenius-norm residual than SSVD or XDiag for the same $p$.

This empirical evidence demonstrates joint LoRD structure recovery with only $O(k)$ MVPs, and practical outperformance in both approximation error and computational resource usage (Fernandez et al., 28 Sep 2025).

6. Applications, Limitations, and Implications

SKETCHLORD applies directly when operators exhibit LoRD structure, a property supported by empirical studies of deep-learning Hessians and scientific simulation operators. A plausible implication is that SKETCHLORD can become a standard structured-approximation tool in large-scale machine learning model diagnostics, scientific computation, and kernel learning, enabling scalable surrogates where spectral and diagonal regularization both matter.

No complete sample-complexity theorem is proved, but practical guidance from randomized SVD applies. Limitations include the need for sufficient MVPs ($p \gtrsim k$) and the assumption of dominant LoRD structure for best performance.

7. Comparative Summary of Structured Approximation Methods

The following table summarizes approximation strategies for structured AA with MVP access, contextualizing SKETCHLORD versus alternatives:

| Method | Recovers | Joint Error |
| --- | --- | --- |
| SSVD | Low-rank only | Suboptimal |
| XDiag | Diagonal only | Suboptimal |
| D→LoR or LoR→D | Sequential LoRD | Suboptimal |
| SKETCHLORD | Joint LoRD | Optimal (zero for exact LoRD, $p > k$) |

SKETCHLORD's ability to capture both spectral and local diagonal structure positions it as a key algorithmic development where LoRD structure is prevalent and high-fidelity operator surrogacy is critical (Fernandez et al., 28 Sep 2025).
