
Sequential Thresholding Least Squares (STLS)

Updated 24 March 2026
  • Sequential Thresholding Least Squares (STLS) is a sparse regression method that alternates between least squares estimation and hard thresholding to prune insignificant coefficients.
  • It employs various algorithmic strategies, including greedy coordinate descent and multi-stage refinements, to achieve an optimal balance between accuracy and sparsity.
  • STLS underpins applications in system identification, robust regression, and nonlinear model discovery with strong theoretical and computational guarantees.

Sequential Thresholding Least Squares (STLS) is a data-driven algorithmic framework for sparse regression, variable selection, and model identification in high-dimensional settings. It operates by alternating least-squares estimation and hard thresholding to prune weak coefficients, thus solving sparse, often nonconvex, least-squares problems. STLS covers several algorithmic incarnations—greedy coordinate descent with thresholding, multi-stage LS + pruning procedures, and iterative hard thresholding in both $\ell_0$- and nonconvex $\ell_q$-penalized contexts. The method is fundamental to many modern approaches for sparse system identification, robust regression, and interpretable nonlinear model discovery, particularly in the context of SINDy-type algorithms.

1. Formulations and Algorithmic Structure

The core problem attacked by STLS is the $\ell_0$-penalized or thresholded least-squares problem

$$\min_{x \in \mathbb{R}^p} \ \|Ax - y\|_2^2 + \lambda^2 \|x\|_0,$$

where $\|x\|_0$ counts nonzero elements, $A \in \mathbb{R}^{n \times p}$ (often $p \gg n$), and $y \in \mathbb{R}^n$. This problem is nonconvex and combinatorial; STLS approximates it by iteratively alternating between:

  • Least-squares estimation on an "active" set (features with large current coefficients).
  • Hard thresholding: setting coefficients below a threshold $\lambda$ to zero.

This split is expressed as:

  1. Support Update:

$$S^{(k)} = \{\, j : |x_j^{(k)}| \ge \lambda \,\}$$

  2. Least Squares Step:

$$x_{S^{(k)}}^{(k+1)} = (A_{S^{(k)}}^T A_{S^{(k)}})^{-1} A_{S^{(k)}}^T y, \qquad x_j^{(k+1)} = 0 \text{ for } j \notin S^{(k)}$$

This process repeats until the support stabilizes. The method generalizes to nonconvex $\ell_q$ penalties for $0 < q < 1$ and can be equipped with coordinate-descent or Gauss-Seidel updating to further enhance practical convergence (Cho et al., 16 Dec 2025, Zeng et al., 2015).
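
The alternation above can be sketched in a few lines of NumPy. This is an illustrative implementation, not code from the cited papers; the function name, iteration cap, and toy data are our own choices:

```python
import numpy as np

def stls(A, y, lam, max_iter=20):
    """STLS sketch: alternate a support-restricted least-squares solve
    with hard thresholding at lam until the support stabilizes."""
    x, *_ = np.linalg.lstsq(A, y, rcond=None)       # initial full LS fit
    for _ in range(max_iter):
        support = np.abs(x) >= lam                  # S^(k): active features
        x = np.zeros_like(x)
        if support.any():                           # LS on active columns only
            x[support], *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        if np.array_equal(support, np.abs(x) >= lam):
            break                                   # support has stabilized
    return x

# toy problem: y depends only on columns 0 and 3
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 8))
x_true = np.zeros(8)
x_true[0], x_true[3] = 2.0, -1.5
y = A @ x_true + 0.01 * rng.standard_normal(100)
x_hat = stls(A, y, lam=0.5)
```

Note that a refit coefficient can fall below $\lambda$ after the restricted LS step, in which case the next sweep drops it; the loop exits only once thresholding no longer changes the active set.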

2. Key Algorithmic Variants and Implementation

Three-Stage STLS for High-Dimensional Models

The three-stage LAT/RAT algorithm (Wang et al., 2015)—now widely referred to as a canonical STLS scheme—comprises:

  1. Stage 1: High-dimensional OLS (pre-selection)

$$\hat{\beta}^{(HD)} = X^T (XX^T + r_0 I_n)^{-1} Y,$$

with $r_0 > 0$ small for stability. Select the top $d$ features by $|\hat{\beta}_i^{(HD)}|$.

  2. Stage 2: Restricted LS + Hard Thresholding

$$\hat{\beta}^{(OLS)}_{\tilde{M}_d} = (X_{\tilde{M}_d}^T X_{\tilde{M}_d})^{-1} X_{\tilde{M}_d}^T Y$$

Apply the hard-thresholding operator $\mathcal{H}_\gamma$ at level

$$\gamma = \operatorname{mean}_i \left\{ \sqrt{2 \hat{\sigma}^2 \bar{C}_{ii} \ln(4d/\delta)} \right\}$$

  3. Stage 3: Final LS Refinement. Refit OLS on the surviving support.

The method is computationally efficient: the initial LS fit is $O(n^2 p)$, subsequent steps are $O(n^3)$ when $d = O(n)$, and all operations involve dense linear algebra, amenable to BLAS/LAPACK acceleration or parallelization (Wang et al., 2015).
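
As a concrete illustration, the three stages might be implemented as follows. This is our own sketch: the residual-based noise-variance estimate, the reading of $\bar{C}_{ii}$ as the diagonal of $(X_{\tilde{M}_d}^T X_{\tilde{M}_d})^{-1}$, and the toy dimensions are assumptions, not taken from Wang et al.:

```python
import numpy as np

def three_stage_stls(X, Y, d, delta=0.1, r0=1e-4):
    """Sketch of the three-stage (LAT-style) STLS scheme."""
    n, p = X.shape
    # Stage 1: high-dimensional ridge-type pre-selection (handles p >> n)
    beta_hd = X.T @ np.linalg.solve(X @ X.T + r0 * np.eye(n), Y)
    M = np.argsort(np.abs(beta_hd))[-d:]            # top-d features
    # Stage 2: restricted OLS, then hard thresholding at gamma
    XM = X[:, M]
    C = np.linalg.inv(XM.T @ XM)
    beta_M = C @ XM.T @ Y
    resid = Y - XM @ beta_M
    sigma2 = resid @ resid / max(n - d, 1)          # noise-variance estimate
    gamma = np.mean(np.sqrt(2 * sigma2 * np.diag(C) * np.log(4 * d / delta)))
    keep = M[np.abs(beta_M) >= gamma]
    # Stage 3: final OLS refit on the surviving support
    beta = np.zeros(p)
    if keep.size:
        beta[keep], *_ = np.linalg.lstsq(X[:, keep], Y, rcond=None)
    return beta

# toy p >> n example with three strong signals
rng = np.random.default_rng(0)
n, p = 80, 200
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[5, 17, 60]] = [3.0, -2.5, 2.8]
Y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_hat = three_stage_stls(X, Y, d=30)
```

The only $p$-dimensional computation is the Stage 1 solve; everything downstream works on at most $d$ columns, which is where the $O(n^3)$ cost bound comes from.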

Gauss-Seidel and Nonconvex Thresholding

The STLS/GAITA procedure tackles nonconvex $\ell_q$-regularized regression via cyclic coordinate updates:

$$x_i^{n+1} = \mathrm{prox}_{\mu, \lambda |\cdot|^q}(z_i^n), \qquad z_i^n = x_i^n - \mu A_i^T (A x^n - y),$$

where the prox operator sets $x_i^{n+1} = 0$ if $|z_i^n| \le \tau_{\mu, q}$ and otherwise solves $v + \lambda \mu q \, \mathrm{sgn}(v)\, |v|^{q-1} = z$ (Zeng et al., 2015). This allows for larger step sizes and faster convergence compared to full Jacobi updates.
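
For intuition, here is a minimal Gauss-Seidel sweep of this form for the limiting $\ell_0$ case, where the prox reduces to plain hard thresholding at $\tau = \sqrt{2\lambda\mu}$. This is our sketch, not the GAITA reference implementation; the $\ell_q$ case with $0 < q < 1$ would replace the thresholding line with a root of the scalar equation above:

```python
import numpy as np

def gaita_l0(A, y, lam, n_sweeps=50):
    """Cyclic (Gauss-Seidel) coordinate descent for l0-penalized LS:
    each coordinate takes a gradient step against the freshly updated
    residual, then passes through the hard-thresholding prox."""
    n, p = A.shape
    mu = 0.95 / np.max(np.sum(A * A, axis=0))   # 0.95 / max_i ||A_i||^2
    tau = np.sqrt(2.0 * lam * mu)               # l0 prox threshold
    x = np.zeros(p)
    r = y.copy()                                # residual y - A @ x
    for _ in range(n_sweeps):
        for i in range(p):
            z = x[i] + mu * (A[:, i] @ r)       # gradient step on coordinate i
            v = z if abs(z) > tau else 0.0      # hard-thresholding prox
            if v != x[i]:
                r -= (v - x[i]) * A[:, i]       # keep residual in sync
                x[i] = v
    return x

# toy sparse recovery
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 10))
x_true = np.zeros(10)
x_true[0], x_true[3] = 2.0, -1.5
y = A @ x_true + 0.01 * rng.standard_normal(100)
x_hat = gaita_l0(A, y, lam=15.0)
```

Updating the residual in place after every coordinate is what makes this Gauss-Seidel rather than Jacobi: coordinate $i+1$ already sees the effect of coordinate $i$'s update within the same sweep.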

STLS in Practical Settings

  • SINDy and Dictionary Learning: STLS is a foundational solver for sparse identification of dynamical systems, often referred to as "SINDy," alternating LS and hard thresholding, with clear links to score-based screening and dictionary selection (Cho et al., 16 Dec 2025).
  • Broad Learning Systems: STLS replaces dense pseudo-inverse solutions in BLS with iterative prune-and-refit cycles, promoting noise robustness and weight sparsity (Li, 22 Nov 2025).
  • Bootstrap Aggregation: Ensemble STLS via replicate thresholded LS and inclusion probability aggregation enhances variable selection reliability, uncertainty quantification, and robustness to hyper-parameters (Gao et al., 2023).
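
A bagged variant along the lines of the bootstrap-aggregation bullet can be sketched as follows. This is illustrative only; the 50% inclusion-frequency cutoff, the resample count, and the final refit step are our assumptions rather than the published procedure:

```python
import numpy as np

def stls(A, y, lam, max_iter=20):
    """Basic STLS: alternate restricted LS with hard thresholding."""
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    for _ in range(max_iter):
        s = np.abs(x) >= lam
        x = np.zeros_like(x)
        if s.any():
            x[s], *_ = np.linalg.lstsq(A[:, s], y, rcond=None)
        if np.array_equal(s, np.abs(x) >= lam):
            break
    return x

def ensemble_stls(A, y, lam, n_boot=50, freq=0.5, seed=0):
    """Bagged STLS: run STLS on bootstrap resamples, aggregate the
    per-feature inclusion frequencies, keep features selected often."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    incl = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # bootstrap resample
        incl += stls(A[idx], y[idx], lam) != 0
    incl /= n_boot                              # inclusion probabilities
    support = incl >= freq
    x = np.zeros(p)
    if support.any():
        x[support], *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    return x, incl

# toy example: features 0 and 3 carry the signal
rng = np.random.default_rng(2)
A = rng.standard_normal((100, 8))
x_true = np.zeros(8)
x_true[0], x_true[3] = 2.0, -1.5
y = A @ x_true + 0.05 * rng.standard_normal(100)
x_hat, incl = ensemble_stls(A, y, lam=0.5)
```

The `incl` vector is the useful by-product here: it serves as a per-feature selection-stability score, which is what gives the ensemble its robustness to the threshold choice.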

3. Theoretical Guarantees and Properties

STLS methods enjoy strong theoretical support under mild assumptions:

  • Descent and Local Convergence: The sequence of objective values is nonincreasing and converges to a local minimizer. For $\ell_0$-penalized LS, STLS provides monotonic objective descent; for nonconvex settings, support and sign patterns stabilize in finitely many steps, and global convergence is guaranteed via Kurdyka-Łojasiewicz theory (Zeng et al., 2015, Cho et al., 16 Dec 2025).
  • Support Recovery and Error Bounds: In high-dimensional regimes, STLS achieves with high probability:

    • Exact support recovery under signal separation (a strong–weak coefficient gap).
    • Max-norm error bounds of the form

$$\|\hat\beta - \beta\|_\infty \leq C \sigma \sqrt{\frac{\log p}{n^\alpha}}$$

  • Oracle Properties: For bootstrap-ensemble STLS, the false discovery and true discovery probabilities for support selection improve exponentially in $n$ under standard eigenvalue and signal-strength assumptions (Gao et al., 2023).

  • Connection to Projection Scores: The thresholded coefficients in the first sweep align with changes in reconstruction error under column removal (projection score), allowing interpretable screening and guiding dictionary selection (Cho et al., 16 Dec 2025).

4. Parameter Selection and Practical Recommendations

Algorithmic performance hinges on key parameters:

  • Threshold ($\lambda$ or $\tau$): Main lever for sparsity. Can be selected via cross-validation, the Pareto-curve "knee" in projection scores, or theoretical guidance tied to the noise level.
  • Submodel size ($d$ in three-stage STLS): Chosen as $O(n)$ or via extended BIC.
  • Error level ($\delta$): Controls type I error and appears inside logarithmic thresholds; set proportionally to $1/p$ or by cross-validation.
  • Step size ($\mu$ in coordinate algorithms): For Gauss-Seidel, as large as $0.95/\max_i \|A_i\|^2$ for maximal progress per iteration (Zeng et al., 2015).
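
A simple hold-out sweep for the threshold, in the spirit of the cross-validation option above, might look like this (our sketch; the grid, split fraction, and toy data are arbitrary):

```python
import numpy as np

def stls(A, y, lam, max_iter=20):
    """Basic STLS: alternate restricted LS with hard thresholding."""
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    for _ in range(max_iter):
        s = np.abs(x) >= lam
        x = np.zeros_like(x)
        if s.any():
            x[s], *_ = np.linalg.lstsq(A[:, s], y, rcond=None)
        if np.array_equal(s, np.abs(x) >= lam):
            break
    return x

def select_lambda(A, y, lam_grid, val_frac=0.25, seed=0):
    """Pick the threshold with lowest hold-out prediction error."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    n_val = int(val_frac * len(y))
    val, tr = perm[:n_val], perm[n_val:]
    errs = []
    for lam in lam_grid:
        x = stls(A[tr], y[tr], lam)                 # fit on training split
        errs.append(np.mean((A[val] @ x - y[val]) ** 2))
    return lam_grid[int(np.argmin(errs))]

# toy selection run
rng = np.random.default_rng(3)
A = rng.standard_normal((120, 8))
x_true = np.zeros(8)
x_true[0], x_true[3] = 2.0, -1.5
y = A @ x_true + 0.1 * rng.standard_normal(120)
lam_sel = select_lambda(A, y, [0.05, 0.2, 0.5, 1.0, 3.0])
```

Thresholds large enough to prune true features inflate the hold-out error sharply, so the sweep reliably rejects them even when the low end of the grid is hard to distinguish.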

Table: Example Choices and Implications

| Parameter | Typical Value | Effect |
| --- | --- | --- |
| $\lambda$, $\tau$ | grid/CV or score "knee" | controls sparsity, sensitivity to noise |
| $d$ | $0.3n$ or extended BIC | submodel width, trades FDR/TPR |
| $\delta$ | $0.5$ or $1/p$ | type I error in screening |
| $T$ (iterations) | 5–10 | sufficient for support stabilization |
| $\mu$ | $(0.9$–$0.95)/\max_i \|A_i\|^2$ | ensures fast, stable Gauss-Seidel convergence |

5. Applications and Empirical Performance

STLS-style methods are applied extensively in:

  • System Identification and Control: Including SINDy for discovering governing ODEs/PDEs, where both vanilla and projection-score variants improve screening accuracy and interpretability (Cho et al., 16 Dec 2025).
  • Nonlinear and Noisy System Modeling: In broad learning systems, STLS enhances robustness to sensor noise and outliers by promoting sparsity in the output weights, consistently reducing RMSE and achieving 50–70% sparsity in active connections (Li, 22 Nov 2025).
  • Sparse Linear Regression: Empirical comparisons show bootstrapped STLS outperforms LASSO and standard thresholding in both true/false discovery metrics and in low-sample, high-noise regimes (Gao et al., 2023).

Empirical results indicate rapid support stabilization (few iterations), computational efficiency (dense matrix algebra accelerable and parallelizable), and competitive or superior accuracy versus classical penalized models.

6. Advanced Perspectives: Extensions and Connections

Recent work has connected and generalized STLS:

  • Score-Guided Screening: Projected reconstruction error (“score”) methods inspired by STLS are now advocated for initial dictionary/pruning stages, reducing LS subproblem sizes and improving interpretability without sacrificing support accuracy (Cho et al., 16 Dec 2025).
  • Nonconvex and Block Thresholding: STLS extends naturally to nonconvex q\ell_q penalties (q<1q<1), and block/sequential variants (coordinate, group, or blockwise updates) for structured sparsity (Zeng et al., 2015).
  • Ensemble and Bayesian Comparisons: Ensemble STLS (via subsample/aggregate) achieves computationally efficient, provably valid uncertainty quantification, matching asymptotic coverage properties of full Bayesian MCMC but at orders-of-magnitude lower cost (Gao et al., 2023).

A plausible implication is that projection-screened STLS pipelines may become dominant in high-dimensional system identification due to their strong theoretical guarantees, practical interpretability, and computational tractability.

7. Limitations, Open Directions, and Interpretability

STLS, while powerful, has known limitations and areas of ongoing development:

  • Nonconvexity: Convergence is generally to a local minimizer, with no guarantee of global optimality. Careful coefficient separation (a signal gap) or projection-score screening mitigates false exclusions.
  • Parameter Sensitivity: Performance can be sensitive to threshold choices (λ\lambda, δ\delta), though ensemble and score-screening approaches improve robustness.
  • Correlation and Coherence: High mutual coherence among predictors can degrade screening; projection-score screening is proposed to relieve this issue (Cho et al., 16 Dec 2025).
  • Interpretability: Score and support patterns motivated by STLS iterations enable model interpretability and variable selection transparency, with clear links to projection-based model diagnostics.

Active areas of research include principled adaptive threshold selection, theory for support stability under model misspecification, and projection-based refinements tailored for structured or partially observed system identification.

