Sequential Threshold Least-Squares Algorithm
- STLS is a sparse regression method that iteratively prunes low-magnitude coefficients to retain only robust model parameters.
- By alternating hard thresholding and least-squares projection, it enhances noise resistance and improves interpretability in Broad Learning Systems.
- Empirical studies demonstrate that STLS reduces RMSE by 20–35% while pruning up to 70% of nodes in high-noise industrial applications.
The Sequential Threshold Least-Squares (STLS) algorithm is a sparse regression method designed to identify informative model parameters while removing connections dominated by noise. In the context of the Sparse Broad Learning System (S-BLS), STLS is used to solve for the output weights of the Broad Learning System (BLS) by iteratively pruning small coefficients and re-solving the least-squares problem over the active (unpruned) set. This approach enhances robustness to noise and improves interpretability through weight sparsity, making it particularly effective for nonlinear system identification tasks in industrial environments subject to substantial measurement noise (Li, 22 Nov 2025).
1. Mathematical Formulation of Sequential Threshold Least-Squares
STLS seeks a sparse solution for the output weight matrix W that links the system matrix A to the output targets Y through the objective function

$$\min_{W}\;\|AW - Y\|_2^2 + \lambda\,\|W\|_0 .$$

Here, ‖W‖₀ denotes the number of nonzero elements in W, and λ controls the trade-off between prediction fidelity and model sparsity. This ℓ₀-penalized least-squares formulation encourages the elimination of connections manifesting as low-magnitude coefficients, directly targeting nodes that contribute little to robust signal reconstruction.
STLS performs an alternating sequence of two key steps:
- Hard Thresholding (Pruning): Using the threshold parameter λ tied to the sparsity weight in the objective, entries of W with magnitude below λ are set to zero (pruned), so that only above-threshold elements are retained.
- Least-Squares Projection: With the reduced support set S of retained parameters, a least-squares update restricted to the surviving columns of A is performed for each output dimension, as in the sketch below.
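A minimal NumPy sketch of this alternation is given below; the function name `stls`, the pseudoinverse initialization, and the use of `numpy.linalg.lstsq` for the projection step are illustrative choices rather than the reference implementation.

```python
import numpy as np

def stls(A, Y, lam, max_iter=10):
    """Sequential Threshold Least-Squares: alternate hard thresholding
    and least-squares projection on the surviving support.

    A   : (N, L) node matrix, Y : (N, C) target matrix,
    lam : hard-threshold level, max_iter : number of pruning cycles.
    """
    # Dense initialization via the pseudoinverse, W^(0) = A† Y
    W = np.linalg.pinv(A) @ Y
    for _ in range(max_iter):
        # Hard thresholding: zero out low-magnitude coefficients
        small = np.abs(W) < lam
        W[small] = 0.0
        # Least-squares projection, solved per output dimension over the
        # columns of A whose coefficients survived the threshold
        for d in range(Y.shape[1]):
            support = ~small[:, d]
            if support.any():
                W[support, d] = np.linalg.lstsq(A[:, support], Y[:, d],
                                                rcond=None)[0]
    return W
```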
2. Algorithmic Workflow and Pseudocode
The implementation cycle for STLS within S-BLS involves:
Input:
• X∈ℝ^(N×D), Y∈ℝ^(N×C)
• BLS hyper-parameters n, m (feature/enhancement groups)
• Threshold λ, max iterations T
Output:
• Sparse output weight W∈ℝ^(L×C)
1. Initialize feature/enhancement weights and biases.
2. Compose mapped features: Zⁿ = [φ(XW_{e_i} + β_{e_i})]
3. Compose enhancement nodes: H^m = [ξ(ZⁿW_{h_j} + β_{h_j})]
4. Concatenate node matrix: A = [Zⁿ | H^m]
5. Compute initial dense weights: W^(0) = A† Y
6. For t = 1 to T:
a) Hard Thresholding:
W_{i,d}^{(t)} = 0 if |W_{i,d}^{(t-1)}| < λ
b) Least-Squares Projection:
For each output d, restrict A to surviving support S, update W with unconstrained least-squares.
7. Return final sparse weights W^(T).
This procedure ensures that noise-dominated nodes are physically excluded, and only meaningful feature/enhancement groups are retained in the output.
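The sketch below, which reuses the `stls` function above, illustrates how this loop slots into the BLS pipeline; the tanh activations, random Gaussian node weights, and group sizes are assumptions chosen for illustration, not the paper's exact configuration.

```python
import numpy as np

def sbls_fit(X, Y, n_groups=10, m_groups=10, nodes_per_group=20,
             lam=0.05, max_iter=10, rng=np.random.default_rng(0)):
    """Sketch of S-BLS training: random feature/enhancement mapping
    followed by STLS on the concatenated node matrix."""
    N, D = X.shape
    Z_blocks, params = [], []
    # Steps 1-2: mapped feature groups Z^n = [Z_1, ..., Z_n]
    for _ in range(n_groups):
        We = rng.standard_normal((D, nodes_per_group))
        be = rng.standard_normal(nodes_per_group)
        Z_blocks.append(np.tanh(X @ We + be))
        params.append((We, be))
    Z = np.hstack(Z_blocks)
    # Step 3: enhancement groups H^m driven by the mapped features
    H_blocks, hparams = [], []
    for _ in range(m_groups):
        Wh = rng.standard_normal((Z.shape[1], nodes_per_group))
        bh = rng.standard_normal(nodes_per_group)
        H_blocks.append(np.tanh(Z @ Wh + bh))
        hparams.append((Wh, bh))
    H = np.hstack(H_blocks)
    # Step 4: concatenated node matrix A = [Z^n | H^m]
    A = np.hstack([Z, H])
    # Steps 5-7: dense initialization + sequential threshold least-squares
    W = stls(A, Y, lam=lam, max_iter=max_iter)  # from the sketch above
    return W, (params, hparams)
```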
3. Threshold Selection and Pruning Dynamics
The truncation parameter λ determines the minimal magnitude required for coefficients to survive each iteration. Cross-validation is typically employed to set λ, given its direct relationship to the sparsity regularization weight in the ℓ₀-penalized objective. At each hard-thresholding step, weights with magnitude below λ are zeroed, removing the associated nodes from the model architecture. This process yields a sparse support structure, with systematically excluded nodes no longer participating in model updates. The result is a compact node set focused on noise-resistant system identification, with sparsity levels controlled via the threshold selection mechanism.
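A simple hold-out search is one way to realize this selection; the grid, split ratio, and RMSE criterion below are illustrative assumptions rather than the paper's validation protocol. It reuses the `stls` sketch defined earlier.

```python
import numpy as np

def select_threshold(A, Y, lam_grid, val_frac=0.2, max_iter=10,
                     rng=np.random.default_rng(1)):
    """Pick the hard-threshold level lam by hold-out validation RMSE."""
    N = A.shape[0]
    idx = rng.permutation(N)
    n_val = int(val_frac * N)
    val, train = idx[:n_val], idx[n_val:]
    best_lam, best_rmse = None, np.inf
    for lam in lam_grid:
        # Fit STLS on the training split, score on the held-out split
        W = stls(A[train], Y[train], lam=lam, max_iter=max_iter)
        rmse = np.sqrt(np.mean((A[val] @ W - Y[val]) ** 2))
        if rmse < best_rmse:
            best_lam, best_rmse = lam, rmse
    return best_lam, best_rmse
```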
4. Computational Complexity and Convergence Properties
The computational profile of STLS-BLS is characterized by:
| Algorithm | Core Operations | Complexity |
|---|---|---|
| Standard BLS | Single pseudoinverse (A† of the N×L node matrix) | O(NL²) |
| Lasso-BLS | K iterative shrinkage/coordinate-descent iterations | O(K·N·L) |
| STLS-BLS | Dense initialization + T sparse least-squares updates | O(NL²) + T·O(N·s²) |
STLS requires initializing with a standard dense pseudoinverse computation (O(NL²), the same as standard BLS), followed by iterative hard thresholding (O(L·C) per pass) and least-squares updates restricted to the active node subset of size s (O(N·s²) per pass). Because T (the number of iterations) is low and s decreases sharply, STLS training time is comparable to standard BLS and significantly less than Lasso-BLS. The support set empirically stabilizes within a few iterations, without oscillatory behavior, yielding bounded and predictable training cycles suitable for deployment in real-time or low-latency environments.
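To observe this stabilization empirically, the `stls` sketch can be instrumented to record the active support at each pass and stop once it no longer changes; the early-stopping rule below is an illustrative addition, not part of the published algorithm.

```python
import numpy as np

def stls_with_trace(A, Y, lam, max_iter=10):
    """Variant of the stls() sketch that records the active support at
    each pass, to check how quickly the pruned set stabilizes."""
    W = np.linalg.pinv(A) @ Y
    prev_support, sizes = None, []
    for _ in range(max_iter):
        small = np.abs(W) < lam
        W[small] = 0.0
        support = ~small
        sizes.append(int(support.sum()))
        for d in range(Y.shape[1]):
            keep = support[:, d]
            if keep.any():
                W[keep, d] = np.linalg.lstsq(A[:, keep], Y[:, d],
                                             rcond=None)[0]
        # Stop once the support is unchanged between consecutive passes
        if prev_support is not None and np.array_equal(support, prev_support):
            break
        prev_support = support
    return W, sizes
```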
5. Integration with Broad Learning Systems
Standard BLS output weight estimation uses dense ridge regression, typified by

$$W = (A^{\top}A + cI)^{-1}A^{\top}Y ,$$

where c is a small ridge regularization constant.
This yields dense connections for all nodes. In contrast, S-BLS incorporates STLS by replacing the single regression step with the iterative sparsification loop. The system matrix A remains unchanged, but the output weights undergo successive pruning and projection cycles. Irrelevant or noise-dominated nodes are physically removed, resulting in a sparser W and a more interpretable, noise-resilient architecture. The ultrafast training emphasis of BLS is retained due to the efficiency of the STLS update cycle.
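For a side-by-side comparison, a hedged sketch of the dense ridge baseline next to the sparse STLS solution is shown below; the ridge constant `c` and the density metric are illustrative choices, and `stls` refers to the earlier sketch.

```python
import numpy as np

def ridge_weights(A, Y, c=1e-3):
    """Dense ridge-regression output weights, as in standard BLS."""
    L = A.shape[1]
    return np.linalg.solve(A.T @ A + c * np.eye(L), A.T @ Y)

# Usage sketch: compare how many connections each solution keeps.
# A, Y = ...                        # node matrix and targets from an sbls_fit-style setup
# W_dense  = ridge_weights(A, Y)    # every node keeps a nonzero weight
# W_sparse = stls(A, Y, lam=0.05)   # noise-dominated nodes are pruned to zero
# density  = np.mean(W_sparse != 0) # fraction of surviving connections
```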
6. Empirical Results and Practical Implications
Experimental evaluations on nonlinear system identification and a noisy Continuous Stirred Tank Reactor (CSTR) highlight STLS’s effectiveness. With uniform noise levels up to 40%, standard BLS exhibits increased mean-square errors and overfits noise. S-BLS, employing STLS, consistently reduces RMSE by 20–35% versus the dense baseline, while pruning approximately half the feature and enhancement nodes. On the CSTR benchmark, S-BLS achieves 70% reduction in active nodes, yet maintains precise tracking of system dynamics by filtering out outlier contributions.
These findings demonstrate that STLS provides robust sparsity, computational efficiency, and enhanced modeling accuracy in challenging noise conditions, confirming its suitability for industrial nonlinear system identification where interpretability and resistance to sensor noise are critical (Li, 22 Nov 2025).