
Sequential Thresholded Least Squares (STLSQ)

Updated 13 May 2026
  • STLSQ is a class of algorithms that solve sparse regression by alternating between least-squares fitting and hard thresholding, promoting interpretability and sparsity.
  • It iteratively identifies the active support through hard thresholding and refits using least-squares, ensuring convergence to a stable sparse solution under proper threshold choice.
  • STLSQ finds applications in system identification, high-dimensional statistics, and robust learning, offering interpretable models that are resilient to noise and overparameterization.

Sequential Thresholded Least Squares (STLSQ) is a class of iterative algorithms designed to address the sparse regression problem by explicitly promoting sparsity in model coefficients through alternating least-squares fitting and hard thresholding. These procedures are motivated by the need for interpretable, data-driven models in high-dimensional or ill-posed estimation settings, particularly in system identification and structured signal recovery. Core instances include applications in system identification (notably SINDy-type algorithms), high-dimensional statistics, robust machine learning under heavy-tailed noise, and sparse architectures for nonlinear modeling (Cho et al., 16 Dec 2025, Li, 22 Nov 2025, Wang et al., 2015, Wei, 2018).

1. Mathematical Formulation

STLSQ addresses the sparse regression problem by minimizing an objective that combines a data-fit term (usually a least-squares loss) and an explicit sparsity penalty of the form $\ell_0$ or related proxies. Given data $y\in\mathbb{R}^m$, a dictionary/design matrix $D\in\mathbb{R}^{m\times n}$, and a coefficient vector $\xi\in\mathbb{R}^n$, the canonical STLSQ formulation is

$$\min_{\xi\in\mathbb{R}^n} \ \|y - D\xi\|_2^2 + \lambda^2 \|\xi\|_0$$

where $\lambda>0$ promotes sparsity by thresholding small coefficients. In high-dimensional statistics, an analogous objective is

$$\min_{\beta\in\mathbb{R}^p} \ \|y - X\beta\|_2^2 + \alpha\|\beta\|_0$$

Extensions allow for structured penalties (e.g., group-sparsity, nuclear-norm), robustification via data truncation, and vectorized or matrix-valued coefficients in multi-output setups (Cho et al., 16 Dec 2025, Li, 22 Nov 2025, Wang et al., 2015, Wei, 2018).
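
To make the trade-off in the canonical objective concrete, the following minimal sketch evaluates $\|y - D\xi\|_2^2 + \lambda^2\|\xi\|_0$ for a dense and a sparse candidate. The function name `l0_objective` and the toy data are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

def l0_objective(D, y, xi, lam):
    """Evaluate ||y - D xi||_2^2 + lam^2 * ||xi||_0 (hypothetical helper)."""
    residual = y - D @ xi
    return float(residual @ residual + lam**2 * np.count_nonzero(xi))

# Toy problem with an orthonormal dictionary (assumption for illustration).
D = np.eye(3)
y = np.array([1.0, 0.05, 0.0])
lam = 0.1

dense = np.array([1.0, 0.05, 0.0])   # fits y exactly, two nonzero entries
sparse = np.array([1.0, 0.0, 0.0])   # drops the coefficient below lam

obj_dense = l0_objective(D, y, dense, lam)    # 0 residual + 2 * lam^2
obj_sparse = l0_objective(D, y, sparse, lam)  # 0.05^2 residual + 1 * lam^2
print(obj_dense, obj_sparse)
```

With orthonormal columns, dropping a coefficient of magnitude below $\lambda$ raises the residual by less than the $\lambda^2$ it saves in the penalty, which is why $\lambda$ doubles as the hard threshold.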

2. Algorithmic Structure

The STLSQ methodology universally employs an alternating procedure consisting of:

  1. Support identification: Given the current coefficient estimate, apply a hard-thresholding operator that zeroes out all entries with magnitude less than $\lambda$.
  2. Least-squares fit: Restrict attention to the active (nonzero) set and solve the unconstrained least-squares problem for these features.
  3. Iteration: Repeat until the active support ceases to change (support-stability) or a fixed number of iterations is reached.

A representative iteration:

  • Initialize $\xi^0 = D^\dagger y$
  • For $i=0,1,\dots$:
    • $S^i = \{j : |\xi^i_j| \ge \lambda\}$
    • $\xi^{i+1} = \arg\min_{\xi:\ \mathrm{supp}(\xi)\subseteq S^i} \|y - D\xi\|_2^2$
    • Terminate when $S^{i+1} = S^i$

Variants include the three-stage version (pre-selection, hard-threshold, refit) (Wang et al., 2015), matrix coefficient iterations for multi-output settings (Li, 22 Nov 2025), or a single-shot thresholding procedure within a regularized least-squares context for robust statistics (Wei, 2018). The underlying mechanism splits an otherwise intractable nonconvex program into projections (least squares) and support-extraction (thresholding).
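
The alternating procedure described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than any cited author's implementation: the function name `stlsq`, the `max_iter` default, and the synthetic data are all hypothetical.

```python
import numpy as np

def stlsq(D, y, lam, max_iter=10):
    """Sequential thresholded least squares (minimal sketch)."""
    n = D.shape[1]
    xi = np.linalg.lstsq(D, y, rcond=None)[0]    # initial estimate D^+ y
    active = np.ones(n, dtype=bool)
    for _ in range(max_iter):
        new_active = np.abs(xi) >= lam           # support identification
        if not new_active.any():
            return np.zeros(n)                   # everything was pruned
        if np.array_equal(new_active, active):
            break                                # support is stable
        active = new_active
        xi = np.zeros(n)
        # Least-squares refit restricted to the active columns.
        xi[active] = np.linalg.lstsq(D[:, active], y, rcond=None)[0]
    return xi

# Synthetic test problem (assumed for illustration): 2 active terms out of 6.
rng = np.random.default_rng(0)
D = rng.standard_normal((200, 6))
xi_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.0])
y = D @ xi_true + 0.01 * rng.standard_normal(200)

xi_hat = stlsq(D, y, lam=0.1)
print(np.flatnonzero(xi_hat), xi_hat)
```

Because each refit only involves columns that survived the previous threshold, the active set can only shrink, so the loop stops after at most as many passes as there are columns.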

3. Theoretical Properties and Guarantees

STLSQ is characterized by favorable convergence and support recovery properties under regime-dependent conditions:

  • Convergence: The alternation monotonically decreases the $\ell_0$-penalized loss and converges to a local (not necessarily global) minimizer; under full-rank and mild coherence conditions, the support stabilizes within a few iterations (Cho et al., 16 Dec 2025, Li, 22 Nov 2025).
  • Support recovery: For exactly sparse signals, the true support is recovered provided the mutual coherence $\mu(D)$ is small enough relative to the sparsity level $k$ of $\xi$ and the threshold $\lambda$ is sufficiently small. In the three-stage variant, stagewise consistency and model-selection rates match those of the LASSO/MCP/SCAD under weaker irrepresentability-type requirements (Wang et al., 2015).
  • Robustness: For heavy-tailed or non-sub-Gaussian settings, coordinatewise thresholding of data, followed by regularized least-squares, achieves optimal minimax rates for structured signal recovery (sparse, low-rank, etc.) with only moment assumptions (Wei, 2018).
  • Computational complexity: For $k$-sparse solutions, each iteration amounts to a least-squares solve restricted to the active columns (cheaper when the required matrix products are precomputed), and the entire procedure usually converges in a handful of steps. Extensions involving projection scores or stepwise regressors (e.g., ESR/GBSR) incur higher combinatorial cost but may improve selection for moderate problem sizes (Cho et al., 16 Dec 2025).
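
The complexity bullet mentions precomputation. One common way to realize this, assumed here rather than confirmed by the source, is to form the Gram matrix $D^\top D$ and the vector $D^\top y$ once, then solve the normal equations on the active sub-block at each iteration. The helper name `refit_on_support` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((500, 8))
y = rng.standard_normal(500)

# Computed once, reused by every refit (assumption: Gram-based precomputation).
G = D.T @ D
b = D.T @ y

def refit_on_support(G, b, support):
    """Solve the normal equations restricted to the active columns."""
    xi = np.zeros(G.shape[0])
    idx = np.flatnonzero(support)
    xi[idx] = np.linalg.solve(G[np.ix_(idx, idx)], b[idx])
    return xi

support = np.array([True, False, True, True, False, False, False, True])
xi_gram = refit_on_support(G, b, support)

# Reference: direct least squares on the selected columns.
xi_direct = np.linalg.lstsq(D[:, support], y, rcond=None)[0]
print(np.max(np.abs(xi_gram[support] - xi_direct)))
```

Each refit then only touches a $k\times k$ system and no longer depends on the number of samples. The trade-off is that normal equations square the condition number, so a direct least-squares solve is the safer choice for ill-conditioned libraries.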

4. Projection-Based Library Selection and Score Metrics

STLSQ exploits explicit projection-based diagnostics for model refinement:

  • Projected reconstruction error (score): For a candidate feature $d_j$ in $D$, the score is the reconstruction error obtained after projecting $y$ onto the library without $d_j$, quantifying the loss in predictivity when omitting $d_j$. Features with small scores are eligible for immediate pruning.

  • Mutual coherence: Key to support recovery, the coherence $\mu(D)$ governs the maximum allowable sparsity $k$ for exact recovery.
  • Score-guided threshold selection: Empirically, the coefficient magnitude $|\xi_j|$ is proportional to the partial projection score; thus, thresholding on coefficient magnitude is equivalent to thresholding on the score. Hybrid strategies such as Exhaustive Stepwise Regressor (ESR) and Greedy Backward Stepwise Regressor (GBSR) leverage these scores to select the library size $k$ rather than a real-valued threshold $\lambda$ (Cho et al., 16 Dec 2025).
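
The two diagnostics in this section can be sketched as follows. The score is implemented as the increase in squared residual when a column is removed, which is one plausible reading of "projected reconstruction error" and may differ from the exact definition in Cho et al. Both function names are hypothetical.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute inner product between distinct normalized columns."""
    Dn = D / np.linalg.norm(D, axis=0)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)
    return float(G.max())

def leave_one_out_scores(D, y):
    """Increase in squared residual when each column is dropped (assumed definition)."""
    def rss(A):
        r = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
        return float(r @ r)
    full = rss(D)
    return np.array([rss(np.delete(D, j, axis=1)) - full
                     for j in range(D.shape[1])])

# Orthonormal toy library (assumption): first three standard basis vectors of R^4.
D = np.eye(4)[:, :3]
y = np.array([3.0, 0.1, 0.0, 0.5])

mu = mutual_coherence(D)
scores = leave_one_out_scores(D, y)
xi = np.linalg.lstsq(D, y, rcond=None)[0]
print(mu, scores, xi**2)
```

In this orthonormal case the score of each feature equals its squared least-squares coefficient, so ranking by score and ranking by coefficient magnitude agree exactly. For correlated libraries the two rankings can differ, which is where score-based pruning becomes informative.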

5. Variants and Extensions

Multiple algorithmic contexts adapt STLSQ:

  • SINDy and Weak SINDy: Core to sparse identification of nonlinear dynamics, STLSQ underlies library selection for dynamical system modeling, with additional weak formulations (integral functionals) for noise-robustification (Cho et al., 16 Dec 2025).
  • Sparse Broad Learning System (S-BLS): In multi-output, over-parameterized representations, STLSQ provides sparse parameterizations for efficient and robust broad learning architectures amid measurement noise (Li, 22 Nov 2025).
  • High-dimensional statistics: Three-stage or truncated-data STLSQ matches or improves penalized regression (LASSO/SCAD/MCP) support recovery and estimation error rates, requiring only ordinary least squares solvers plus thresholding. Adaptive threshold selection via cross-validation, extended BIC, or sample-dependent formulas is common (Wang et al., 2015, Wei, 2018).
  • Heavy-tailed and robust learning: Coordinate-wise and response truncations in the STLSQ pipeline enable optimal recovery rates without sub-Gaussian design assumptions. The analysis leverages so-called "critical radii" in combination with convex regularizers (Wei, 2018).

6. Practical Considerations and Implementation

Practitioners implementing STLSQ must address:

  • Threshold choice: Empirical or theoretically guided selection of $\lambda$ is vital for stable recovery; in practice, cross-validation or score-curve analysis yields reliable results (Cho et al., 16 Dec 2025, Li, 22 Nov 2025).
  • Overparameterization: Robustness and maximal benefit arise in settings where the dictionary is (moderately) overcomplete with respect to the true model, allowing STLSQ to prune back to informative features.
  • Noise and regularization: STLSQ alone may be sensitive to noise; augmentations including smoothing/weak-forms, ensemble averaging, or Bayesian/prior-driven variants are beneficial.
  • Complexity: The main computational bottlenecks are the initial full least-squares solution and, for score-based variants, repeated projections. A small fixed iteration count provides a balance between accuracy and runtime (Li, 22 Nov 2025).
  • Comparison to ridge/pseudoinverse: Unlike ridge regression or dense pseudoinverse-based estimates, STLSQ imposes explicit sparsity—yielding sparser, interpretable, and noise-robust models (Li, 22 Nov 2025).
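
As a sketch of threshold selection by validation, the snippet below fits STLSQ on a training split for a grid of candidate thresholds and keeps the one with the lowest held-out error. The grid values, the 50/50 split, the synthetic data, and the `stlsq` helper are assumptions for illustration only.

```python
import numpy as np

def stlsq(D, y, lam, max_iter=10):
    """Minimal STLSQ sketch: threshold, refit, repeat until the support is stable."""
    n = D.shape[1]
    xi = np.linalg.lstsq(D, y, rcond=None)[0]
    active = np.ones(n, dtype=bool)
    for _ in range(max_iter):
        new_active = np.abs(xi) >= lam
        if not new_active.any():
            return np.zeros(n)
        if np.array_equal(new_active, active):
            break
        active = new_active
        xi = np.zeros(n)
        xi[active] = np.linalg.lstsq(D[:, active], y, rcond=None)[0]
    return xi

rng = np.random.default_rng(2)
D = rng.standard_normal((300, 6))
xi_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.0])
y = D @ xi_true + 0.05 * rng.standard_normal(300)

# Hold-out split: first half for fitting, second half for validation.
D_tr, y_tr, D_va, y_va = D[:150], y[:150], D[150:], y[150:]

grid = [0.05, 0.2, 0.5, 1.0, 1.8, 3.0]
fits = [stlsq(D_tr, y_tr, lam) for lam in grid]
val_mse = [float(np.mean((y_va - D_va @ xi) ** 2)) for xi in fits]

best = int(np.argmin(val_mse))
best_lam, best_xi = grid[best], fits[best]
print(best_lam, np.flatnonzero(best_xi), val_mse)
```

Thresholds that lie between the noise-level coefficients and the smallest true coefficient all yield the same support and the same validation error, so the error curve typically shows a plateau followed by a sharp rise once a true term is pruned.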

7. Representative Example and Empirical Impact

STLSQ is effective for both synthetic and empirical modeling tasks. For example, identification of a scalar ODE from noisy samples with a polynomial dictionary recovers the correct model:

  • Initial projection: Compute the least-squares coefficients $\xi^0 = D^\dagger y$ over the full polynomial library
  • Thresholding/pruning: Retain indices with $|\xi^0_j| \ge \lambda$ (only two coefficients survive)
  • Final model: Refit least squares on the retained terms to obtain the sparse ODE

This demonstrates exact recovery and interpretable model selection in low-noise, identifiable regimes (Cho et al., 16 Dec 2025). In broader settings, such as nonlinear system identification with sensor noise or outlier contamination, STLSQ-empowered frameworks yield improved compactness and out-of-sample robustness compared to baseline pseudoinverse or ridge-based alternatives (Li, 22 Nov 2025).
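
The three steps of such an example can be reproduced on a stand-in problem. The system $\dot{x} = x - x^3$, the sample count, the noise level, and the threshold below are hypothetical choices, not the values used in the cited work. Derivatives are taken as exact values plus noise rather than estimated from a trajectory.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical scalar ODE dx/dt = x - x^3, sampled at random states.
x = rng.uniform(-1.5, 1.5, size=200)
dxdt = x - x**3 + 0.01 * rng.standard_normal(200)

# Polynomial dictionary up to degree 3.
names = ["1", "x", "x^2", "x^3"]
D = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Step 1, initial projection: dense least-squares coefficients.
xi0 = np.linalg.lstsq(D, dxdt, rcond=None)[0]

# Step 2, thresholding/pruning: keep coefficients with magnitude >= lambda.
lam = 0.1
keep = np.abs(xi0) >= lam

# Step 3, final model: refit on the retained terms only.
xi = np.zeros(len(names))
xi[keep] = np.linalg.lstsq(D[:, keep], dxdt, rcond=None)[0]

model = " + ".join(f"({c:.2f})*{n}" for c, n in zip(xi, names) if c != 0.0)
print("dx/dt =", model)
```

A single threshold-and-refit pass suffices here because the noise-level coefficients sit far below the threshold while both true coefficients sit well above it.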

In summary, Sequential Thresholded Least Squares constitutes a robust, theoretically grounded, and computationally practical family of algorithms for sparse regression and system identification, applicable across a range of modern scientific modeling tasks. Its effectiveness is underpinned by the alternating project-threshold structure, explicit projection-based diagnostics, and minimal structural assumptions required for theoretical guarantees.
