Deterministic Ridge Leverage Score Sampling

Updated 11 April 2026
  • Deterministic Ridge Leverage Score Sampling is a data-dependent column selection method that balances low-rank approximation with regularization while ensuring strong worst-case guarantees.
  • It computes ridge leverage scores via SVD and sorts columns to select a subset until a cumulative score threshold is met, offering interpretability and repeatability.
  • The method guarantees tight spectral bounds and controlled risk inflation in regression, making it effective for feature selection, data valuation, and kernel approximations.

Deterministic Ridge Leverage Score Sampling is a data-dependent, subset selection technique for linear algebraic and statistical tasks that balances the goals of low-rank approximation and regularization. Unlike randomized variants, the deterministic approach provides interpretability, repeatability, and strong worst-case guarantees, making it particularly attractive for applications in scientific data analysis, regression, matrix approximation, and feature selection.

1. Formal Definition and Structural Properties

Given a matrix $A \in \mathbb{R}^{n \times d}$ and a regularization parameter $\lambda > 0$, the ridge leverage score of the $i$th column $a_i$ is defined as

$$\bar\tau_i(A) = a_i^T (A A^T + \lambda I_n)^+ a_i,$$

where $(\cdot)^+$ denotes the Moore–Penrose pseudoinverse. In terms of the thin SVD $A = U \Sigma V^T$, with singular values $\sigma_1 \ge \cdots \ge \sigma_r > 0$ and right singular vectors $V$, this score becomes

$$\bar\tau_i(A) = \sum_{j=1}^r \frac{\sigma_j^2}{\sigma_j^2 + \lambda}\, V_{ij}^2.$$

Ridge leverage scores reduce to the classical subspace leverage scores as $\lambda \to 0$, and for $\lambda > 0$ they smoothly down-weight directions associated with small singular values. This stabilization makes ridge leverage scores adaptive for both regularized regression and low-rank matrix approximation, simultaneously capturing the informative structure and dampening the effect of noise or degeneracy (McCurdy, 2018). The SVD form translates directly into a few lines of code, as sketched below.
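
A minimal NumPy sketch of the SVD formula above (the helper name and test matrix are illustrative, not from the cited papers):

```python
import numpy as np

def ridge_leverage_scores(A: np.ndarray, lam: float) -> np.ndarray:
    """Ridge leverage scores of the columns of A via the thin SVD.

    Implements tau_i(A) = sum_j sigma_j^2 / (sigma_j^2 + lam) * V_ij^2.
    """
    # Thin SVD: A = U @ diag(s) @ Vt; rows of Vt are right singular vectors.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    shrink = s**2 / (s**2 + lam)   # down-weighting of each spectral direction
    return shrink @ Vt**2          # tau_i = sum_j shrink_j * V_ij^2

# Example: scores lie in [0, 1) and sum to the effective dimension at lam.
A = np.random.default_rng(0).standard_normal((100, 20))
tau = ridge_leverage_scores(A, lam=1.0)
print(tau.min(), tau.max(), tau.sum())
```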

2. Deterministic Sampling Algorithms

Deterministic ridge leverage score sampling proceeds by ranking columns according to their ridge leverage scores and selecting the subset with maximal cumulative score. The canonical procedure is:

  1. Compute all ridge leverage scores $\bar\tau_i(A)$, $i = 1, \dots, d$.
  2. Sort the column indices so that $\bar\tau_{(1)}(A) \ge \bar\tau_{(2)}(A) \ge \cdots \ge \bar\tau_{(d)}(A)$.
  3. Initialize an empty selection set $S$ and a partial sum $s = 0$.
  4. Iteratively move the index with the next-largest score into $S$ and update $s \leftarrow s + \bar\tau_{(i)}(A)$, stopping once the unselected columns carry at most $\epsilon$ total score; the threshold depends on the target rank $k$ (through the standard choice $\lambda = \|A - A_k\|_F^2 / k$) and the tolerance $\epsilon$.
  5. If $|S| < k$, continue selecting the largest remaining scores to ensure the subset has at least $k$ columns.
  6. Construct the column sampling matrix and the resulting sketch $C$ containing the selected columns of $A$.

This routine yields an unweighted, deterministic subset of columns, with computational complexity $O(nd\min(n,d))$ for score computation via the SVD and $O(d \log d)$ for sorting (McCurdy, 2018). For kernelized and feature-map settings, the deterministic variant is implemented by sorting data points by (kernel) ridge leverage scores and taking the top set (Schreurs et al., 2021, Chen et al., 2021). A minimal NumPy sketch of the routine follows.
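
The sketch below implements steps 1–6, reusing `ridge_leverage_scores` from Section 1. The stopping rule (unselected columns carry at most $\epsilon$ total score) is a plausible reading of the threshold in (McCurdy, 2018), not the paper's verbatim criterion.

```python
import numpy as np

def drls_select(A: np.ndarray, lam: float, k: int, eps: float):
    """Deterministic ridge leverage score column selection (steps 1-6).

    Stops once the unselected columns carry at most eps total score,
    while enforcing at least k selected columns.
    """
    tau = ridge_leverage_scores(A, lam)   # step 1 (sketch in Section 1)
    order = np.argsort(tau)[::-1]         # step 2: decreasing scores
    total = tau.sum()
    S, partial = [], 0.0                  # step 3
    for i in order:                       # steps 4-5
        if total - partial <= eps and len(S) >= k:
            break
        S.append(i)
        partial += tau[i]
    S = np.sort(np.array(S))
    return S, A[:, S]                     # step 6: sketch C of selected columns

S, C = drls_select(A, lam=1.0, k=5, eps=0.1)
print(S.size, C.shape)
```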

3. Theoretical Guarantees

Deterministic ridge leverage score sampling provides strong spectral and prediction risk bounds:

  • Additive-multiplicative spectral bound: For $C$ the selected column subset (spot-checked numerically after this list),

$$(1 - \epsilon)\, A A^T - \epsilon \lambda\, I_n \preceq C C^T \preceq A A^T,$$

where $\lambda = \|A - A_k\|_F^2 / k$ and $A_k$ is the best rank-$k$ approximation of $A$ (McCurdy, 2018).

  • Projection-cost preservation: For any orthogonal projector $P$ of rank at most $k$,

$$(1 - \epsilon)\, \|A - P A\|_F^2 \;\le\; \|C - P C\|_F^2 + c \;\le\; (1 + \epsilon)\, \|A - P A\|_F^2,$$

with a constant $c \ge 0$ independent of $P$, preserving objectives such as low-rank approximation and $k$-means cost up to $(1 \pm \epsilon)$ factors (McCurdy, 2018).

  • Risk inflation in regression: For ridge regression on the sketch $C$ versus the full data, the statistical risk of the sketched estimator exceeds that of the full-data estimator by at most a $(1 + O(\epsilon))$ factor, with the same regularization parameter $\lambda$ (McCurdy, 2018).
  • Sample complexity: When the sorted ridge leverage scores decay as a power law, $\bar\tau_{(i)}(A) \propto i^{-\alpha}$ with $\alpha > 1$, the deterministic subset size matches or improves on the efficiency of randomized ridge leverage score sampling for the same rank-$k$, tolerance-$\epsilon$ guarantee (McCurdy, 2018).
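
Under these reconstructions, the spectral bound can be spot-checked numerically. The snippet assumes `ridge_leverage_scores` and `drls_select` from the earlier sketches are in scope; the test matrix is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((80, 40)) * np.logspace(0, -3, 40)  # decaying column scales

k, eps = 5, 0.1
sv = np.linalg.svd(A, compute_uv=False)
lam = (sv[k:]**2).sum() / k        # lambda = ||A - A_k||_F^2 / k

S, C = drls_select(A, lam, k, eps)
n = A.shape[0]

# Upper bound C C^T <= A A^T holds exactly for any unweighted column subset.
upper = A @ A.T - C @ C.T
# Lower bound (1 - eps) A A^T - eps * lam * I <= C C^T follows from the
# tail-score stopping rule (unselected scores sum to at most eps).
lower = C @ C.T - (1 - eps) * (A @ A.T) + eps * lam * np.eye(n)

print(np.linalg.eigvalsh(upper).min() >= -1e-8)  # expect True
print(np.linalg.eigvalsh(lower).min() >= -1e-8)  # expect True
```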

4. Applications in Regression, Feature Selection, and Kernel Methods

Deterministic ridge leverage sampling is applicable and effective in a range of settings:

  • Ridge regression and classification: Using a sketch $C$ formed from deterministic RLS sampling, regression coefficients corresponding to non-selected columns are forced to zero, resulting in built-in feature selection with provable risk control. Selecting with respect to the regularized leverage scores yields a risk bound competitive with alternatives such as the elastic net (McCurdy, 2018); a minimal sketch of this fit-and-zero-pad workflow follows the list.
  • Design and data valuation: Ridge leverage scores measure marginal gain under A- and D-optimality criteria, and when normalized can serve as Shapley-like data value surrogates (Mendoza-Smith, 3 Nov 2025).
  • Active learning and data subset selection: In deterministic active learning, acquiring samples with the highest ridge leverage scores yields models whose test accuracy closely matches or exceeds classical uncertainty-based and geometric selection strategies (Mendoza-Smith, 3 Nov 2025).
  • Nyström approximations in kernel ridge regression: Deterministic selection of kernel landmarks by (approximate) ridge leverage yields Nyström approximations with the same in-sample risk as full KRR and near-linear computational complexity, especially in cases with stationary kernels (Chen et al., 2021).
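
The fit-and-zero-pad workflow from the first bullet takes only a few lines. The sketch below reuses `A`, `S`, `C`, and `lam` from the earlier snippets; the synthetic response and helper name are illustrative.

```python
import numpy as np

def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    # Closed-form ridge solution (X^T X + lam I)^{-1} X^T y.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(2)
y = A @ rng.standard_normal(A.shape[1]) + 0.1 * rng.standard_normal(A.shape[0])

beta_S = ridge_fit(C, y, lam)      # fit only on the selected columns
beta = np.zeros(A.shape[1])
beta[S] = beta_S                   # non-selected coefficients stay exactly 0

# Compare in-sample residuals of the sketched and full-data ridge fits.
r_sketch = np.linalg.norm(y - A @ beta)
r_full = np.linalg.norm(y - A @ ridge_fit(A, y, lam))
print(r_sketch / r_full)           # close to 1 when the sketch is faithful
```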

5. Connections to Spectral Sparsification and Feature Selection

Deterministic feature selection via RLS connects closely to single-set spectral sparsification (BSS), wherein a greedy procedure selects rows with weights to spectrally approximate the Gram matrix $A^T A$ up to $(1 \pm \epsilon)$ factors, ensuring that the risk of ridge regression on the reduced feature space inflates by at most a $(1 + O(\epsilon))$ factor relative to the original (Paul et al., 2015). Both methods yield deterministic, interpretable subset selectors with explicit sample complexity guarantees, typically $O(k/\epsilon^2)$ selected columns for rank $k$ and error $\epsilon$.

6. Practical Considerations and Empirical Observations

  • Parameter selection: Good practice suggests setting $k$ via the elbow of the singular-value spectrum, $\lambda = \|A - A_k\|_F^2 / k$, and $\epsilon$ to balance sketch size and error (McCurdy, 2018); a small helper sketching these choices follows this list.
  • Computational cost: Direct computation via the SVD (or Cholesky in the kernel setting) costs $O(nd\min(n,d))$ (respectively $O(n^3)$) for dense problems, with further acceleration possible via randomized or approximate techniques for large $n$ and $d$ (Schreurs et al., 2021, Chen et al., 2021).
  • Empirical efficacy: In applications such as multi-omic cancer data and deep-learning model training, deterministic RLS sampling yields compact, interpretable data sketches (small $|S|$) with negligible loss in predictive accuracy, and in GAN training it empirically corrects mode drop and improves rare-mode coverage (McCurdy, 2018, Schreurs et al., 2021).
  • Feature and landmark selection: Deterministic selection of the top-scoring points by RLS matches or outperforms standard baselines (uncertainty, margin, entropy) in data-efficient regimes, particularly for high-dimensional or overparameterized models (Mendoza-Smith, 3 Nov 2025).
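
As a concrete reading of the parameter-selection bullet, the helper below picks $k$ at the sharpest relative drop of the singular-value spectrum and sets $\lambda = \|A - A_k\|_F^2 / k$; this elbow rule is one plausible heuristic, not a prescription from the cited work.

```python
import numpy as np

def drls_parameters(A: np.ndarray, eps: float = 0.1):
    """Heuristic (k, lambda, eps) choices for DRLS on a dense matrix A."""
    s = np.linalg.svd(A, compute_uv=False)
    s = s[s > 0]                    # guard against exact rank deficiency
    ratios = s[1:] / s[:-1]         # consecutive spectral decay ratios
    k = int(np.argmin(ratios)) + 1  # elbow: sharpest relative drop
    lam = (s[k:]**2).sum() / k      # lambda = ||A - A_k||_F^2 / k
    return k, lam, eps

k, lam, eps = drls_parameters(A)
print(k, lam, eps)
```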

7. Extensions and Kernel Generalizations

Kernelized deterministic ridge leverage score sampling extends the theory and algorithmic guarantees to non-linear settings. In those contexts, the scores are computed in dual (kernel) or primal (feature) form and can leverage the structure of stationary kernels for efficient approximation. A one-dimensional integral formula, based on the input density and the kernel's spectral density, enables linear-time (up to poly-log factors) computation of approximate scores. Sorting and selecting the top estimated scores yields a deterministic Nyström approximation matching the statistical risk of full-data solutions under regularity assumptions (Chen et al., 2021). Tuning of the regularization, landmark count, and kernel hyperparameters directly impacts both efficiency and downstream generalization.
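
For concreteness, the brute-force variant of this selection with an RBF kernel is sketched below; the exact $O(n^3)$ score computation is precisely the step that the one-dimensional integral approximation of (Chen et al., 2021) replaces for large $n$, and the kernel and parameter choices here are illustrative.

```python
import numpy as np

def kernel_rls_landmarks(X: np.ndarray, m: int, lam: float, gamma: float = 1.0):
    """Top-m Nystroem landmarks by exact kernel ridge leverage scores."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
    K = np.exp(-gamma * sq)          # RBF Gram matrix
    # tau_i = [K (K + n*lam*I)^{-1}]_{ii}; K commutes with (K + n*lam*I),
    # so the diagonal of the solve below gives the same scores.
    tau = np.diag(np.linalg.solve(K + n * lam * np.eye(n), K))
    return np.argsort(tau)[::-1][:m] # deterministic top-m selection

X = np.random.default_rng(3).standard_normal((200, 2))
landmarks = kernel_rls_landmarks(X, m=20, lam=1e-2)
print(landmarks[:10])
```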


Key references: McCurdy (2018); Schreurs et al. (2021); Mendoza-Smith (3 Nov 2025); Paul et al. (2015); Chen et al. (2021).
