Sequential Thresholding Least Squares (STLS)
- Sequential Thresholding Least Squares (STLS) is a sparse regression method that alternates between least squares estimation and hard thresholding to prune insignificant coefficients.
- It admits several algorithmic variants, including greedy coordinate descent and multi-stage refinements, that trade off accuracy against sparsity.
- STLS underpins applications in system identification, robust regression, and nonlinear model discovery with strong theoretical and computational guarantees.
Sequential Thresholding Least Squares (STLS) is a data-driven algorithmic framework for sparse regression, variable selection, and model identification in high-dimensional settings. It operates by alternating least-squares estimation with hard thresholding to prune weak coefficients, thus solving sparse, often nonconvex, least-squares problems. STLS covers several algorithmic incarnations: greedy coordinate descent with thresholding, multi-stage LS-plus-pruning procedures, and iterative hard thresholding in both $\ell_0$- and nonconvex $\ell_q$-penalized contexts. The method is fundamental to many modern approaches to sparse system identification, robust regression, and interpretable nonlinear model discovery, particularly in the context of SINDy-type algorithms.
1. Formulations and Algorithmic Structure
The core problem attacked by STLS is the $\ell_0$-penalized or thresholded least-squares problem
$$\min_{\beta \in \mathbb{R}^p} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_0,$$
where $\|\beta\|_0$ counts nonzero elements, $X \in \mathbb{R}^{n \times p}$ (often $p \gg n$), and $y \in \mathbb{R}^n$. This problem is nonconvex and combinatorial; STLS approximates it by iteratively alternating between:
- Least-squares estimation on an "active" set (features with large current coefficients).
- Hard thresholding: setting coefficients below a threshold to zero.
This split is expressed as:
- Support Update: $S^{(k)} = \{\, j : |\beta^{(k)}_j| \ge \tau \,\}$
- Least Squares Step: $\beta^{(k+1)} = \arg\min_{\operatorname{supp}(\beta) \subseteq S^{(k)}} \|y - X\beta\|_2^2$
This process repeats until the support stabilizes. The method generalizes to include nonconvex $\ell_q$ penalties for $0 < q < 1$ and can be equipped with coordinate-descent or Gauss-Seidel updating to further enhance practical convergence (Cho et al., 16 Dec 2025, Zeng et al., 2015).
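The alternation above can be sketched in a few lines of NumPy. This is a minimal illustration of the generic scheme, not the implementation from any cited paper; the threshold `tau` and the iteration cap are illustrative defaults.

```python
import numpy as np

def stls(X, y, tau=0.1, max_iter=20):
    """Sequential Thresholding Least Squares: alternate a restricted
    least-squares solve with hard thresholding until the support is stable.
    `tau` and `max_iter` are illustrative defaults, not prescribed values.
    """
    p = X.shape[1]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # initial full LS fit
    for _ in range(max_iter):
        support = np.abs(beta) >= tau              # support update: hard threshold
        beta = np.zeros(p)
        if support.any():
            # least-squares step restricted to the active features
            beta[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        if np.array_equal(np.abs(beta) >= tau, support):
            break                                  # support has stabilized
    beta[np.abs(beta) < tau] = 0.0                 # final prune
    return beta
```

On a well-conditioned design with a clear signal gap, the support typically stabilizes within a few iterations, consistent with the empirical behavior described later in this article.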
2. Key Algorithmic Variants and Implementation
Three-Stage STLS for High-Dimensional Models
The three-stage LAT/RAT algorithm (Wang et al., 2015)—now widely referred to as a canonical STLS scheme—comprises:
- Stage 1: High-dimensional OLS (pre-selection)
Compute a ridge-stabilized fit $\hat\beta^{(0)} = (X^\top X + r I_p)^{-1} X^\top y$ with small $r > 0$ for stability. Select the top $d$ features by $|\hat\beta^{(0)}_j|$.
- Stage 2: Restricted LS + Hard Thresholding
Refit least squares on the $d$ retained features and threshold the result at a level $\tau$, with
$$H_\tau(\beta)_j = \beta_j \, \mathbf{1}\{|\beta_j| \ge \tau\}$$
the hard thresholding operator.
- Stage 3: Final LS Refinement Refit OLS on the remaining support.
The method is computationally efficient: the initial LS fit is a single dense solve, subsequent steps operate only on the retained $d$-column submatrix when $d \ll p$, and all operations involve dense linear algebra, amenable to BLAS/LAPACK acceleration or parallelization (Wang et al., 2015).
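The three stages can be sketched as follows. The ridge parameter, submodel size `d`, and threshold `tau` are illustrative choices here; the data-driven selections analyzed in Wang et al. (2015) are not reproduced.

```python
import numpy as np

def three_stage_stls(X, y, d, tau, ridge=1e-6):
    """Three-stage STLS sketch: pre-select, restricted LS + threshold, refit.
    `d`, `tau`, and `ridge` are illustrative hyper-parameters.
    """
    n, p = X.shape
    # Stage 1: ridge-stabilized high-dimensional LS; keep the d largest |coef|
    beta0 = np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ y)
    keep = np.argsort(np.abs(beta0))[-d:]
    # Stage 2: restricted LS on the d kept columns, then hard threshold at tau
    b_sub, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
    active = keep[np.abs(b_sub) >= tau]
    # Stage 3: final OLS refit on the surviving support
    beta = np.zeros(p)
    if active.size:
        beta[active], *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
    return beta
```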
Gauss-Seidel and Nonconvex Thresholding
The STLS/GAITA procedure tackles nonconvex $\ell_q$-regularized regression ($0 < q < 1$) via cyclic coordinate updates: each coordinate is refreshed against the current residual, and the scalar proximal operator sets the coefficient to zero when the update falls below a threshold and otherwise solves a one-dimensional nonconvex proximal problem (Zeng et al., 2015). The Gauss-Seidel ordering allows for larger step sizes and faster convergence compared to full Jacobi updates.
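The coordinate-wise thresholding idea can be illustrated with the simpler $\ell_0$ (hard-threshold) scalar prox; the closed-form $\ell_q$ prox used by GAITA is different and is not reproduced in this sketch.

```python
import numpy as np

def cd_hard_threshold(X, y, lam, sweeps=50):
    """Cyclic (Gauss-Seidel) coordinate descent for l0-penalized LS:
        min ||y - X b||^2 + lam * ||b||_0.
    Each coordinate update uses the freshly updated residual, and the
    scalar prox is a hard threshold. Illustrative sketch only; the l_q
    (0 < q < 1) prox of GAITA has a different closed form.
    """
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()         # running residual y - X @ beta
    col_sq = np.sum(X**2, axis=0)
    for _ in range(sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]     # remove coordinate j's contribution
            z = X[:, j] @ r / col_sq[j]
            # keep z only if doing so lowers the penalized objective
            beta[j] = z if z * z * col_sq[j] > lam else 0.0
            r -= X[:, j] * beta[j]
    return beta
```

Because each update sees the residual already refreshed by earlier coordinates in the same sweep, progress per sweep is faster than with simultaneous (Jacobi) updates, mirroring the Gauss-Seidel advantage noted above.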
STLS in Practical Settings
- SINDy and Dictionary Learning: STLS is a foundational solver for sparse identification of nonlinear dynamics (SINDy), alternating LS and hard thresholding, with clear links to score-based screening and dictionary selection (Cho et al., 16 Dec 2025).
- Broad Learning Systems: STLS replaces dense pseudo-inverse solutions in BLS with iterative prune-and-refit cycles, promoting noise robustness and weight sparsity (Li, 22 Nov 2025).
- Bootstrap Aggregation: Running thresholded LS on bootstrap replicates and aggregating per-feature inclusion probabilities enhances variable-selection reliability, uncertainty quantification, and robustness to hyper-parameters (Gao et al., 2023).
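The bootstrap-aggregation idea can be sketched as follows. The inner solver is a plain STLS loop restated for self-containment, and the replicate count `n_boot` and inclusion cutoff `pi` are illustrative choices rather than the exact procedure of Gao et al. (2023).

```python
import numpy as np

def stls(X, y, tau, max_iter=20):
    """Plain STLS inner solver: alternate restricted LS and hard thresholding."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(max_iter):
        support = np.abs(beta) >= tau
        beta = np.zeros(X.shape[1])
        if support.any():
            beta[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        if np.array_equal(np.abs(beta) >= tau, support):
            break
    beta[np.abs(beta) < tau] = 0.0
    return beta

def ensemble_stls(X, y, tau, n_boot=50, pi=0.6, seed=0):
    """Bootstrap-aggregated STLS sketch: run STLS on row resamples and
    aggregate per-feature inclusion probabilities. `n_boot`, `pi`, and
    `seed` are illustrative, not values from the cited work.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # bootstrap resample of rows
        counts += stls(X[idx], y[idx], tau) != 0
    incl_prob = counts / n_boot                 # per-feature inclusion frequency
    return incl_prob, incl_prob >= pi           # probabilities and selected set
```

The inclusion probabilities double as a simple uncertainty summary: features selected in nearly every replicate are robust to the resampling noise, while borderline features reveal sensitivity to the threshold.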
3. Theoretical Guarantees and Properties
STLS methods enjoy strong theoretical support under mild assumptions:
- Descent and Local Convergence: The sequence of objective values is nonincreasing and converges to a local minimizer. For $\ell_0$-penalized LS, STLS provides monotonic objective descent; for nonconvex settings, support and sign patterns stabilize in finite steps, and global convergence is guaranteed via Kurdyka-Łojasiewicz theory (Zeng et al., 2015, Cho et al., 16 Dec 2025).
- Support Recovery and Error Bounds: In high-dimensional regimes, STLS achieves, with high probability:
  - Exact support recovery under signal separation (a gap between strong and weak coefficients).
  - Max-norm error bounds on the estimated coefficients.
- Oracle properties: For bootstrap-ensemble STLS, the false-discovery and true-discovery probabilities for support selection improve at an exponential rate under standard eigenvalue and signal-strength assumptions (Gao et al., 2023).
- Connection to Projection Scores: The thresholded coefficients in the first sweep align with changes in reconstruction error under column removal (projection score), allowing interpretable screening and guiding dictionary selection (Cho et al., 16 Dec 2025).
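A direct (and deliberately naive, one LS solve per column) way to compute such projection scores is sketched below. It illustrates the screening idea described above, not the exact estimator of the cited work.

```python
import numpy as np

def projection_scores(X, y):
    """Projection-score screening sketch: for each column j, the increase
    in least-squares reconstruction error when column j is removed from
    the dictionary. Larger scores mark more indispensable features.
    Illustrative implementation; one LS solve per column is O(p) solves.
    """
    p = X.shape[1]
    def rss(cols):
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        res = y - X[:, cols] @ coef
        return res @ res
    full = rss(np.arange(p))
    return np.array([rss(np.delete(np.arange(p), j)) - full for j in range(p)])
```

Ranking columns by this score gives an interpretable screening order: truly active dictionary terms produce scores far above the noise floor, which is what makes the "knee" of the sorted score curve a natural threshold.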
4. Parameter Selection and Practical Recommendations
Algorithmic performance hinges on key parameters:
- Threshold ($\tau$ or $\lambda$): The main lever for sparsity. Can be selected via cross-validation, the Pareto-curve "knee" in projection scores, or theoretical guidance tied to the noise level.
- Submodel size ($d$ in 3-stage STLS): Chosen as roughly $0.3 n$ or via extended BIC.
- Error level ($\delta$): Controls the type I error and appears inside logarithmic thresholds; set proportionally to $1/p$ or by CV.
- Step size ($\mu$ in coordinate algorithms): For Gauss-Seidel, can be taken close to the admissible stability bound for maximal progress per iteration (Zeng et al., 2015).
Table: Example Choices and Implications
| Parameter | Typical Value | Effect |
|---|---|---|
| $\tau$, $\lambda$ | Grid/CV or score "knee" | Controls sparsity, sensitivity to noise |
| $d$ | $0.3 n$ or BIC | Submodel width; trades FDR/TPR |
| $\delta$ | $0.5$ or $1/p$ | Type I error in screening |
| Iterations | 5–10 | Sufficient for support stabilization |
| $\mu$ | Close to the admissible bound | Ensures fast, stable Gauss-Seidel convergence |
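As one concrete instance of threshold selection, the sketch below runs a hold-out sweep over a candidate grid; the STLS inner solver is restated for self-containment, and the split fraction and grid are illustrative. K-fold CV or a projection-score "knee" would be drop-in alternatives.

```python
import numpy as np

def stls(X, y, tau, max_iter=20):
    """Plain STLS inner solver: alternate restricted LS and hard thresholding."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(max_iter):
        support = np.abs(beta) >= tau
        beta = np.zeros(X.shape[1])
        if support.any():
            beta[support], *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        if np.array_equal(np.abs(beta) >= tau, support):
            break
    beta[np.abs(beta) < tau] = 0.0
    return beta

def select_tau(X, y, taus, val_frac=0.3, seed=0):
    """Hold-out selection of the hard threshold: fit STLS on a training
    split for each candidate tau, keep the one with the lowest validation
    error. `val_frac` and the candidate grid are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    n_val = int(val_frac * n)
    val, tr = perm[:n_val], perm[n_val:]
    errs = []
    for tau in taus:
        beta = stls(X[tr], y[tr], tau)
        resid = y[val] - X[val] @ beta
        errs.append(resid @ resid)
    return taus[int(np.argmin(errs))]
```

An overly aggressive threshold that prunes a true signal incurs a large validation error, so the sweep reliably rejects it even when the remaining candidates are close.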
5. Applications and Empirical Performance
STLS-style methods are applied extensively in:
- System Identification and Control: Including SINDy for discovering governing ODEs/PDEs, where both vanilla and projection-score variants improve screening accuracy and interpretability (Cho et al., 16 Dec 2025).
- Nonlinear and Noisy System Modeling: In broad learning systems, STLS enhances robustness to sensor noise and outliers by promoting sparsity in the output weights, consistently reducing RMSE and achieving 50–70% sparsity in active connections (Li, 22 Nov 2025).
- Sparse Linear Regression: Empirical comparisons show bootstrapped STLS outperforms LASSO and standard thresholding in both true/false discovery metrics and in low-sample, high-noise regimes (Gao et al., 2023).
Empirical results indicate rapid support stabilization (few iterations), computational efficiency (dense matrix algebra accelerable and parallelizable), and competitive or superior accuracy versus classical penalized models.
6. Advanced Perspectives: Extensions and Connections
Recent work has connected and generalized STLS:
- Score-Guided Screening: Projected reconstruction error (“score”) methods inspired by STLS are now advocated for initial dictionary/pruning stages, reducing LS subproblem sizes and improving interpretability without sacrificing support accuracy (Cho et al., 16 Dec 2025).
- Nonconvex and Block Thresholding: STLS extends naturally to nonconvex $\ell_q$ penalties ($0 < q < 1$), and to block/sequential variants (coordinate, group, or blockwise updates) for structured sparsity (Zeng et al., 2015).
- Ensemble and Bayesian Comparisons: Ensemble STLS (via subsample/aggregate) achieves computationally efficient, provably valid uncertainty quantification, matching asymptotic coverage properties of full Bayesian MCMC but at orders-of-magnitude lower cost (Gao et al., 2023).
A plausible implication is that projection-screened STLS pipelines may become dominant in high-dimensional system identification due to their strong theoretical guarantees, practical interpretability, and computational tractability.
7. Limitations, Open Directions, and Interpretability
STLS, while powerful, has known limitations and areas of ongoing development:
- Nonconvexity: Convergence is generally to a local minimizer, not a guaranteed global optimum. Careful coefficient separation (a signal gap) or projection-score screening mitigates false exclusions.
- Parameter Sensitivity: Performance can be sensitive to threshold choices ($\tau$, $\lambda$), though ensemble and score-screening approaches improve robustness.
- Correlation and Coherence: High mutual coherence among predictors can degrade screening; projection-score screening is proposed to relieve this issue (Cho et al., 16 Dec 2025).
- Interpretability: Score and support patterns motivated by STLS iterations enable model interpretability and variable selection transparency, with clear links to projection-based model diagnostics.
Active areas of research include principled adaptive threshold selection, theory for support stability under model misspecification, and projection-based refinements tailored for structured or partially observed system identification.