Sequential Thresholded Least Squares (STLSQ)

Updated 13 May 2026

STLSQ is a class of algorithms that solve sparse regression by alternating between least-squares fitting and hard thresholding, promoting interpretability and sparsity.
It iteratively identifies the active support through hard thresholding and refits using least-squares, ensuring convergence to a stable sparse solution under proper threshold choice.
STLSQ finds applications in system identification, high-dimensional statistics, and robust learning, offering interpretable models that are resilient to noise and overparameterization.

Sequential Thresholded Least Squares (STLSQ) is a class of iterative algorithms designed to address the sparse regression problem by explicitly promoting sparsity in model coefficients through alternating least-squares fitting and hard thresholding. These procedures are motivated by the need for interpretable, data-driven models in high-dimensional or ill-posed estimation settings, particularly in system identification and structured signal recovery. Core instances include applications in system identification (notably SINDy-type algorithms), high-dimensional statistics, robust machine learning under heavy-tailed noise, and sparse architectures for nonlinear modeling (Cho et al., 16 Dec 2025, Li, 22 Nov 2025, Wang et al., 2015, Wei, 2018).

1. Mathematical Formulation

STLSQ addresses the sparse regression problem by minimizing an objective that combines a data-fit term (usually a least-squares loss) with an explicit sparsity penalty, either the $\ell_0$ pseudo-norm or a related proxy. Given data $y\in\mathbb{R}^m$, a dictionary/design matrix $D\in\mathbb{R}^{m\times n}$, and a coefficient vector $\xi\in\mathbb{R}^n$, the canonical STLSQ formulation is

$\min_{\xi\in\mathbb{R}^n} \ \|y - D\xi\|_2^2 + \lambda^2 \|\xi\|_0$

where $\lambda>0$ promotes sparsity by thresholding small coefficients. In high-dimensional statistics, an analogous objective is

$\min_{\beta\in\mathbb{R}^p} \ \|y - X\beta\|_2^2 + \alpha\|\beta\|_0$

Extensions allow for structured penalties (e.g., group-sparsity, nuclear-norm), robustification via data truncation, and vectorized or matrix-valued coefficients in multi-output setups (Cho et al., 16 Dec 2025, Li, 22 Nov 2025, Wang et al., 2015, Wei, 2018).
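As a small illustration of the canonical objective, the following sketch evaluates $\|y - D\xi\|_2^2 + \lambda^2\|\xi\|_0$ for a given coefficient vector. The function name `l0_penalized_loss` and the toy data are assumptions made for this example and do not come from the cited works.

```python
import numpy as np

def l0_penalized_loss(D, y, xi, lam):
    """Return ||y - D xi||_2^2 + lam^2 * ||xi||_0 (hypothetical helper)."""
    residual = y - D @ xi
    n_nonzero = np.count_nonzero(xi)  # the l0 "norm" counts nonzero entries
    return float(residual @ residual + lam**2 * n_nonzero)

# Toy data: y is reproduced exactly by the first column of D.
D = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 0.0, 1.0])

loss_sparse = l0_penalized_loss(D, y, np.array([1.0, 0.0]), lam=0.5)
loss_zero = l0_penalized_loss(D, y, np.zeros(2), lam=0.5)
print(loss_sparse, loss_zero)
```

The one-term fit pays only the penalty $\lambda^2$ for its single nonzero coefficient, while the all-zero vector pays the full data-fit term.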

2. Algorithmic Structure

The STLSQ methodology universally employs an alternating procedure consisting of:

Support identification: Given the current coefficient estimate, apply a hard-thresholding operator that zeroes every entry with magnitude below $\lambda$.
Least-squares fit: Restrict attention to the active (nonzero) set and solve the unconstrained least-squares problem for these features.
Iteration: Repeat until the active support ceases to change (support-stability) or a fixed number of iterations is reached.

A representative iteration:

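Initialize $\xi^0 = D^\dagger y$
For $i = 0, 1, \dots$:
- $S^i = \{\, j : |\xi^i_j| \ge \lambda \,\}$
- $\xi^{i+1}_{S^i} = D_{S^i}^\dagger y$, with $\xi^{i+1}_j = 0$ for $j \notin S^i$
- Terminate when $S^{i+1} = S^i$

Here $D_{S^i}$ denotes the submatrix of $D$ formed by the columns indexed by $S^i$.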
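The alternation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation of any cited paper: the function name `stlsq`, the `max_iter` cap, and the synthetic data are all chosen for this example, and the support is only ever allowed to shrink.

```python
import numpy as np

def stlsq(D, y, lam, max_iter=20):
    """Sequential thresholded least squares for y ~ D @ xi.

    lam is the hard threshold on |xi_j|; max_iter caps the alternations.
    """
    n = D.shape[1]
    xi = np.linalg.lstsq(D, y, rcond=None)[0]       # xi^0 = pseudoinverse solution
    active = np.ones(n, dtype=bool)                 # start with every column active
    for _ in range(max_iter):
        new_active = active & (np.abs(xi) >= lam)   # support identification
        if np.array_equal(new_active, active):      # support is stable: stop
            break
        active = new_active
        xi = np.zeros(n)
        if active.any():                            # least-squares refit on the support
            xi[active] = np.linalg.lstsq(D[:, active], y, rcond=None)[0]
    return xi

# Synthetic check: two true nonzero coefficients plus small noise.
rng = np.random.default_rng(0)
D = rng.standard_normal((200, 10))
xi_true = np.zeros(10)
xi_true[[0, 3]] = [1.5, -2.0]
y = D @ xi_true + 0.01 * rng.standard_normal(200)

xi_hat = stlsq(D, y, lam=0.1)
print(np.flatnonzero(xi_hat), xi_hat[[0, 3]])
```

Refitting after each pruning step removes the bias that the discarded columns introduced into the surviving coefficients, which is why the loop usually stops after very few passes.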

Variants include the three-stage version (pre-selection, hard-threshold, refit) (Wang et al., 2015), matrix coefficient iterations for multi-output settings (Li, 22 Nov 2025), or a single-shot thresholding procedure within a regularized least-squares context for robust statistics (Wei, 2018). The underlying mechanism splits an otherwise intractable nonconvex program into projections (least squares) and support-extraction (thresholding).

3. Theoretical Properties and Guarantees

STLSQ is characterized by favorable convergence and support recovery properties under regime-dependent conditions:

Convergence: The alternation monotonically decreases the $\ell_0$-penalized loss and converges to a local (not necessarily global) minimizer; under full column rank and mild coherence, the support stabilizes quickly (Cho et al., 16 Dec 2025, Li, 22 Nov 2025).
Support recovery: For exactly sparse signals, the true support is recovered provided the mutual coherence $\mu(D)$ is small enough relative to the sparsity level $k$ of the $k$-sparse $\xi$, and the threshold $\lambda$ is sufficiently small. In the three-stage variant, stagewise consistency and model-selection rates match those of the LASSO/MCP/SCAD under weaker irrepresentability-type requirements (Wang et al., 2015).
Robustness: For heavy-tailed or non-sub-Gaussian settings, coordinatewise thresholding of data, followed by regularized least-squares, achieves optimal minimax rates for structured signal recovery (sparse, low-rank, etc.) with only moment assumptions (Wei, 2018).
Computational complexity: For $k$-sparse solutions, each iteration costs one least-squares solve restricted to the $k$ active columns (cheaper still when the required matrix products are precomputed), and the entire procedure usually converges in a handful of steps. Extensions involving projection scores or stepwise regressors (e.g., ESR/GBSR) incur higher combinatorial cost but may improve selection for moderate library sizes (Cho et al., 16 Dec 2025).
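Because the support-recovery statement depends on mutual coherence, it can help to compute that quantity for a given dictionary. The sketch below uses the standard definition (largest absolute inner product between distinct unit-normalized columns). The function name `mutual_coherence` is an assumption of this example, and the code assumes no column of $D$ is identically zero.

```python
import numpy as np

def mutual_coherence(D):
    """Largest |<d_i, d_j>| over distinct unit-normalized columns of D."""
    norms = np.linalg.norm(D, axis=0)   # assumes every column is nonzero
    Dn = D / norms
    G = np.abs(Dn.T @ Dn)               # absolute Gram matrix of normalized columns
    np.fill_diagonal(G, 0.0)            # ignore each column's product with itself
    return float(G.max())

mu_identity = mutual_coherence(np.eye(3))

# Monomial dictionary [1, x, x^2, x^3] sampled on [-1, 1].
x = np.linspace(-1.0, 1.0, 50)
mu_poly = mutual_coherence(np.vander(x, 4, increasing=True))
print(mu_identity, mu_poly)
```

Monomial libraries of the kind used in SINDy-type identification tend to be highly coherent, which is one reason coherence-based guarantees are often pessimistic in that setting.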

4. Projection-Based Library Selection and Score Metrics

STLSQ exploits explicit projection-based diagnostics for model refinement:

Projected reconstruction error (score): For a candidate feature $d_j$ (a column of $D$), the score is the projected reconstruction error obtained without $d_j$, quantifying the loss in predictivity when $d_j$ is omitted. Features with small scores are eligible for immediate pruning.
Mutual coherence: Key to support recovery, the coherence $\mu(D)$ governs the maximum allowable sparsity $k$ for exact recovery.
Score-guided threshold selection: Empirically, the coefficient magnitude $|\xi_j|$ is proportional to the partial projection score; thus, thresholding on coefficient magnitude is equivalent to thresholding on the score. Hybrid strategies such as Exhaustive Stepwise Regressor (ESR) and Greedy Backward Stepwise Regressor (GBSR) leverage these scores to select the library size $k$ rather than a real-valued threshold $\lambda$ (Cho et al., 16 Dec 2025).
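One way to make the idea of a projection score concrete is a leave-one-column-out residual increase. The definition used below is an assumption of this sketch and may differ from the score in the cited work: `leave_one_out_scores` returns, for each column, how much the squared residual grows when that column is removed and the rest are refit. For a full-column-rank $D$ this quantity equals $\xi_j^2 / [(D^\top D)^{-1}]_{jj}$, a standard least-squares identity that links the score to the coefficient magnitude.

```python
import numpy as np

def residual_ss(D, y):
    """Squared residual of the least-squares projection of y onto span(D)."""
    xi = np.linalg.lstsq(D, y, rcond=None)[0]
    r = y - D @ xi
    return float(r @ r)

def leave_one_out_scores(D, y):
    """Increase in squared residual when each column is dropped (needs n >= 2)."""
    full = residual_ss(D, y)
    n = D.shape[1]
    scores = np.empty(n)
    for j in range(n):
        keep = np.arange(n) != j                  # every column except j
        scores[j] = residual_ss(D[:, keep], y) - full
    return scores

rng = np.random.default_rng(0)
D = rng.standard_normal((200, 6))
xi_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.0])
y = D @ xi_true + 0.01 * rng.standard_normal(200)

scores = leave_one_out_scores(D, y)
xi_ls = np.linalg.lstsq(D, y, rcond=None)[0]
closed_form = xi_ls**2 / np.diag(np.linalg.inv(D.T @ D))
print(np.argsort(scores)[::-1][:2])   # indices of the two highest-scoring columns
```

Ranking columns by such a score lets a stepwise procedure pick a target library size directly instead of tuning a real-valued threshold.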

5. Variants and Extensions

Multiple algorithmic contexts adapt STLSQ:

SINDy and Weak SINDy: Core to sparse identification of nonlinear dynamics, STLSQ underlies library selection for dynamical system modeling, with additional weak formulations (integral functionals) for noise-robustification (Cho et al., 16 Dec 2025).
Sparse Broad Learning System (S-BLS): In multi-output, over-parameterized representations, STLSQ provides sparse parameterizations for efficient and robust broad learning architectures amid measurement noise (Li, 22 Nov 2025).
High-dimensional statistics: Three-stage or truncated-data STLSQ matches or improves penalized regression (LASSO/SCAD/MCP) support recovery and estimation error rates, requiring only ordinary least squares solvers plus thresholding. Adaptive threshold selection via cross-validation, extended BIC, or sample-dependent formulas is common (Wang et al., 2015, Wei, 2018).
Heavy-tailed and robust learning: Coordinate-wise and response truncations in the STLSQ pipeline enable optimal recovery rates without sub-Gaussian design assumptions. The analysis leverages so-called "critical radii" in combination with convex regularizers (Wei, 2018).

6. Practical Considerations and Implementation

Practitioners implementing STLSQ must address:

Threshold choice: Empirical or theoretically guided selection of $\lambda$ is vital for stable recovery; in practice, cross-validation or score-curve analysis yields reliable results (Cho et al., 16 Dec 2025, Li, 22 Nov 2025).
Overparameterization: Robustness and maximal benefit arise in settings where the dictionary is (moderately) overcomplete with respect to the true model, allowing STLSQ to prune back to informative features.
Noise and regularization: STLSQ alone may be sensitive to noise; augmentations including smoothing/weak-forms, ensemble averaging, or Bayesian/prior-driven variants are beneficial.
Complexity: The main computational bottlenecks are the initial full least-squares solve and, for score-based variants, repeated projections. A small, fixed iteration count balances accuracy and runtime (Li, 22 Nov 2025).
Comparison to ridge/pseudoinverse: Unlike ridge regression or dense pseudoinverse-based estimates, STLSQ imposes explicit sparsity—yielding sparser, interpretable, and noise-robust models (Li, 22 Nov 2025).
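The threshold-selection advice above can be illustrated with a simple hold-out sweep. This is a sketch under assumptions: the helper names `stlsq` and `select_threshold`, the candidate grid, the train/validation split, and the tie-breaking rule (prefer the larger threshold) are all choices made for this example rather than recommendations from the cited papers.

```python
import numpy as np

def stlsq(D, y, lam, max_iter=20):
    """Compact STLSQ: threshold at lam, refit on the survivors, repeat."""
    xi = np.linalg.lstsq(D, y, rcond=None)[0]
    active = np.ones(D.shape[1], dtype=bool)
    for _ in range(max_iter):
        new_active = active & (np.abs(xi) >= lam)
        if np.array_equal(new_active, active):
            break
        active = new_active
        xi = np.zeros(D.shape[1])
        if active.any():
            xi[active] = np.linalg.lstsq(D[:, active], y, rcond=None)[0]
    return xi

def select_threshold(D_tr, y_tr, D_val, y_val, lambdas):
    """Return (lam, xi, err) with the lowest held-out mean squared error.

    Ties go to the larger threshold, i.e. the sparser model.
    """
    best = None
    for lam in sorted(lambdas):
        xi = stlsq(D_tr, y_tr, lam)
        err = float(np.mean((y_val - D_val @ xi) ** 2))
        if best is None or err <= best[2]:
            best = (lam, xi, err)
    return best

rng = np.random.default_rng(1)
D = rng.standard_normal((250, 8))
xi_true = np.zeros(8)
xi_true[[0, 3]] = [1.5, -2.0]
y = D @ xi_true + 0.05 * rng.standard_normal(250)

lam, xi, err = select_threshold(D[:150], y[:150], D[150:], y[150:],
                                lambdas=[0.01, 0.05, 0.2, 1.0, 3.0])
print(lam, np.flatnonzero(xi), err)
```

An overly large threshold removes true terms and the held-out error jumps, so the sweep rejects it; thresholds that yield the same support give identical fits, which is why a tie-breaking rule is needed.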

7. Representative Example and Empirical Impact

STLSQ is effective for both synthetic and empirical modeling tasks. For example, identifying a scalar ODE from noisy samples with a polynomial dictionary recovers the correct model:

Initial projection: compute the dense least-squares estimate $\xi^0 = D^\dagger y$
Thresholding/pruning: retain the indices with $|\xi^0_j| \ge \lambda$ (only two dictionary terms survive)
Final model: the least-squares refit on the retained terms

This demonstrates exact recovery and interpretable model selection in low-noise, identifiable regimes (Cho et al., 16 Dec 2025). In broader settings, such as nonlinear system identification with sensor noise or outlier contamination, STLSQ-empowered frameworks yield improved compactness and out-of-sample robustness compared to baseline pseudoinverse or ridge-based alternatives (Li, 22 Nov 2025).
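A workflow of this kind can be reproduced on a hypothetical system. The ODE $\dot{x} = x - x^3$, the noise level, the threshold, and the degree-5 monomial dictionary below are assumptions chosen for this sketch and are not the example from the cited paper. For simplicity the derivative is sampled directly with additive noise rather than estimated from a trajectory.

```python
import numpy as np

def stlsq(D, y, lam, max_iter=20):
    """Compact STLSQ: threshold at lam, refit on the survivors, repeat."""
    xi = np.linalg.lstsq(D, y, rcond=None)[0]
    active = np.ones(D.shape[1], dtype=bool)
    for _ in range(max_iter):
        new_active = active & (np.abs(xi) >= lam)
        if np.array_equal(new_active, active):
            break
        active = new_active
        xi = np.zeros(D.shape[1])
        if active.any():
            xi[active] = np.linalg.lstsq(D[:, active], y, rcond=None)[0]
    return xi

# Hypothetical ground truth: dx/dt = x - x^3, observed with small noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.5, 1.5, 200)
dxdt = x - x**3 + 0.01 * rng.standard_normal(x.size)

# Monomial dictionary with columns [1, x, x^2, x^3, x^4, x^5].
D = np.vander(x, 6, increasing=True)

xi = stlsq(D, dxdt, lam=0.1)
support = np.flatnonzero(xi)
print("active powers of x:", support)
print(f"dx/dt = {xi[1]:.3f} x + {xi[3]:.3f} x^3")
```

In this low-noise setting the spurious monomials receive coefficients far below the threshold, so a single pruning pass followed by a refit isolates the two true terms.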

In summary, Sequential Thresholded Least Squares constitutes a robust, theoretically grounded, and computationally practical family of algorithms for sparse regression and system identification, applicable across a range of modern scientific modeling tasks. Its effectiveness is underpinned by the alternating project-threshold structure, explicit projection-based diagnostics, and minimal structural assumptions required for theoretical guarantees.


References (4)

From STLS to Projection-based Dictionary Selection in Sparse Regression for System Identification (2025)

Sparse Broad Learning System via Sequential Threshold Least-Squares for Nonlinear System Identification under Noise (2025)

No penalty no tears: Least squares in high-dimensional linear models (2015)

Structured Recovery with Heavy-tailed Measurements: A Thresholding Procedure and Optimal Rates (2018)
