Concatenated Recursive Least Squares

Updated 14 September 2025
  • CRLS is a framework that integrates multiple RLS estimators to model heterogeneous systems through distinct regularization and forgetting mechanisms.
  • It extends traditional RLS by partitioning parameter spaces, enabling efficient adaptation in multi-task learning, matrix estimation, and sparse representations.
  • CRLS methods offer scalable computational efficiencies and improved robustness in practical applications such as adaptive filtering, deep learning, and control systems.

Concatenated Recursive Least Squares (CRLS) encompasses a class of algorithms and architectures in which multiple recursive least squares (RLS) estimators—often with distinct structure, regularization, or forgetting mechanisms—are integrated, stacked, or contemporaneously applied to partitioned or structured parameter spaces. These methods are especially relevant when modeling evolving systems with heterogeneous dynamics, multi-task learning settings, sparse representations, or matrix-valued parameter estimation, where a single RLS update is insufficient to capture the nuances of the underlying processes. The CRLS framework synthesizes approaches such as multi-block and multi-forgetting RLS, structured matrix identification, recursive kernel dictionary learning, and the use of RLS in deep learning optimization and multi-agent systems.

1. Mathematical Foundations and Algorithmic Structures

CRLS techniques build upon the canonical RLS update, which recursively minimizes a least squares cost function with exponential or directional forgetting. For a parameter vector $\theta$, the standard RLS update rule is

$$
\begin{aligned}
P_{t+1}^{-1} &= \lambda^{-1}\left[P_t^{-1} + \phi_t \phi_t^\top\right], \\
\theta_{t+1} &= \theta_t + P_{t+1} \phi_t \left(y_t - \phi_t^\top \theta_t\right),
\end{aligned}
$$

where $P_t$ is the covariance matrix (its inverse $P_t^{-1}$ being the information matrix), $\phi_t$ is the regressor, $y_t$ the observation, and $\lambda$ a forgetting factor.
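
Expanded with the Sherman–Morrison identity, the covariance recursion above never requires forming an explicit matrix inverse. A minimal NumPy sketch of one such update (function and variable names are illustrative, not taken from any cited implementation) is:

```python
import numpy as np

def rls_step(theta, P, phi, y, lam=0.99):
    """One RLS update mirroring the recursion above (illustrative sketch).

    theta : (d,) current parameter estimate
    P     : (d, d) current covariance matrix
    phi   : (d,) regressor
    y     : scalar observation
    lam   : forgetting factor in (0, 1]
    """
    # P_{t+1} = lam * (P_t^{-1} + phi phi^T)^{-1}, expanded via Sherman-Morrison
    # so that no explicit inverse is ever formed.
    Pphi = P @ phi
    P = lam * (P - np.outer(Pphi, Pphi) / (1.0 + phi @ Pphi))
    # theta_{t+1} = theta_t + P_{t+1} phi_t (y_t - phi_t^T theta_t)
    theta = theta + (P @ phi) * (y - phi @ theta)
    return theta, P
```

In practice $P_0$ is initialized as $\delta^{-1} I$ for a small $\delta > 0$, and the online loop simply calls this step once per observation.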

CRLS extends this by applying multiple, potentially parallel or partitioned RLS estimators to blocks or groups of parameters. In block-structured settings, such as matrix parameter estimation, each column (or block) $\theta_j$ is updated independently:

$$
\begin{aligned}
P_{t+1,j}^{-1} &= P_{t,j}^{-1} + \phi_{t,j}^{(\Gamma_t)} \phi_{t,j}, \\
\theta_{t+1,j} &= \theta_{t,j} + P_{t+1,j} \phi_{t,j}^{(\Gamma_t)} \left(y_{t,j} - \phi_{t,j} \theta_{t,j}\right),
\end{aligned}
$$

subject to persistent excitation conditions for each block (Lai et al., 16 Apr 2024).
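
A minimal sketch of the concatenated structure is given below, under the simplifying assumptions of scalar per-block observations and scalar weights standing in for the weighting $\Gamma_t$; the function name and interface are hypothetical:

```python
import numpy as np

def block_rls_step(thetas, Ps, phis, ys, weights=None):
    """One concatenated update: an independent RLS step per block/column.

    thetas  : list of (d_j,) parameter blocks
    Ps      : list of (d_j, d_j) covariance matrices, one per block
    phis    : list of (d_j,) regressors, one per block
    ys      : list of scalar observations, one per block
    weights : optional list of scalar weights standing in for Gamma_t
    """
    if weights is None:
        weights = [1.0] * len(thetas)
    for j, (theta, P, phi, y, g) in enumerate(zip(thetas, Ps, phis, ys, weights)):
        Pphi = P @ phi
        k = g * Pphi / (1.0 + g * (phi @ Pphi))    # weighted gain for block j
        err = y - phi @ theta                      # a priori error of block j
        thetas[j] = theta + k * err
        Ps[j] = P - np.outer(k, Pphi)              # covariance form of P_j^{-1} += g * phi phi^T
    return thetas, Ps
```

Because the blocks never interact inside the update, the per-block estimators can be run in parallel, which is the practical appeal of the concatenated layout.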

When model parameters evolve at different rates, the CRLS methodology can assign multiple forgetting factors:

$$
V(\theta, t) = \sum_{s=1}^t \lambda^{(t-s)} \left(y(s) - \phi(s)^\top \theta\right)^2 + (\theta - \theta_{t-1})^\top F_\lambda(R_{t-1}) (\theta - \theta_{t-1}),
$$

where $F_\lambda$ is a diagonal, tuned-correlated, or cubic spline forgetting map, allowing the CRLS update to modulate adaptation depth per parameter or group (Fraccaroli et al., 2015).
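
One simple way to realize a diagonal forgetting map is to discount the information matrix symmetrically with a per-parameter factor. The sketch below assumes $F_\lambda$ reduces to $\mathrm{diag}(\lambda_1, \ldots, \lambda_d)$ and is intended only as an illustration of the idea, not the exact recursion of the cited work:

```python
import numpy as np

def multi_forgetting_rls_step(theta, R, phi, y, lams):
    """Information-form RLS step with per-parameter forgetting (sketch).

    theta : (d,) parameter estimate
    R     : (d, d) information matrix
    phi   : (d,) regressor
    y     : scalar observation
    lams  : (d,) per-parameter forgetting factors in (0, 1]
    """
    s = np.sqrt(lams)
    # Symmetric per-parameter discounting: R <- diag(s) R diag(s) + phi phi^T
    R = (s[:, None] * R * s[None, :]) + np.outer(phi, phi)
    err = y - phi @ theta
    theta = theta + np.linalg.solve(R, phi) * err   # gain = R^{-1} phi
    return theta, R
```

Parameters expected to drift quickly receive small $\lambda_i$ (short memory), while slowly varying parameters keep $\lambda_i$ close to one.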

For structured multi-task learning, parameter stacking is critical. Defining $w = [w_1^\top, w_2^\top, \ldots, w_T^\top]^\top$ for $T$ tasks and block-diagonal input matrices enables simultaneous recursive updates:

$$
w_t = \Phi_t^{-1} \Psi_t,
$$

where $\Phi_t$ incorporates both data correlation and inter-task regularization via graph-based matrices, yielding quadratic computational complexity in the number of tasks (Lencione et al., 2023).
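
The sketch below illustrates this stacked update for a single time step, with a graph Laplacian `L` coupling the tasks and a regularization weight `mu`; the function name, the exact way the regularizer enters $\Phi_t$, and the shared forgetting factor are simplifying assumptions rather than the precise recursion of the cited paper:

```python
import numpy as np

def stacked_wrls_step(C, Psi, phis, ys, L, mu=0.1, lam=0.99):
    """One stacked multi-task weighted-RLS step (illustrative sketch).

    C    : (T*d, T*d) discounted data-correlation matrix
    Psi  : (T*d,) discounted cross-correlation vector
    phis : list of T per-task regressors, each of shape (d,)
    ys   : length-T sequence of per-task observations
    L    : (T, T) graph Laplacian coupling the tasks
    mu   : inter-task regularization weight
    lam  : shared forgetting factor
    """
    T, d = len(phis), phis[0].shape[0]
    X = np.zeros((T, T * d))
    for t, phi in enumerate(phis):
        X[t, t * d:(t + 1) * d] = phi              # block-diagonal stacked design
    C = lam * C + X.T @ X                          # discounted data correlation
    Psi = lam * Psi + X.T @ np.asarray(ys, dtype=float)
    Phi = C + mu * np.kron(L, np.eye(d))           # Phi_t = data term + graph coupling
    w = np.linalg.solve(Phi, Psi)                  # stacked weights: w_t = Phi_t^{-1} Psi_t
    return w, C, Psi
```

The quoted quadratic-in-$T$ cost presumably comes from exploiting this block structure recursively rather than calling a dense solver at every step, as this sketch does for clarity.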

2. Specialized Recursive Architectures: Matrix, Kernel, and Dictionary Updates

Traditional RLS methods require vectorization (vec) and Kronecker products for matrix-valued parameter estimation; this introduces substantial computational and memory overhead. CRLS remedies this by preserving matrix structure, allowing per-block or per-column RLS updates with independent or correlated weighting:

$$
J_k(\hat{\theta}) = \mathrm{tr}\left[\sum_{i=0}^k (y_i - \phi_i \hat{\theta})^{(\Gamma_i)} (y_i - \phi_i \hat{\theta}) + (\hat{\theta} - \theta_0)^{(R)} (\hat{\theta} - \theta_0)\right],
$$

where the recursive solution operates directly in the matrix domain (Lai et al., 16 Apr 2024). In kernelized settings, such as Kernel RLS Dictionary Learning, the recursive update leverages the matrix inversion lemma to efficiently update the dictionary matrix $C$ in a reproducing kernel Hilbert space (RKHS):

$$
\begin{aligned}
C_{i+1}^{-1} &= \lambda C_i^{-1} + w w^\top, \\
C_{i+1} &= \lambda^{-1} \left(C_i - C_i w (\lambda I + w^\top C_i w)^{-1} w^\top C_i\right),
\end{aligned}
$$

where $w$ represents sparse coefficients and $\lambda$ the forgetting factor (Alipoor et al., 2 Jul 2025). Such methods ensure fast convergence and tracking, especially when combined with profile abstraction and pruning to maintain computational tractability.
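
The inversion-lemma step can be written compactly for a rank-$k$ coefficient block. The sketch below mirrors the recursion above; treating `W` as an $n \times k$ block of sparse coefficients is an assumption made for generality (for a single sample it is just a column vector):

```python
import numpy as np

def dictionary_update(C, W, lam=0.98):
    """Inversion-lemma update mirroring C_{i+1}^{-1} = lam * C_i^{-1} + W W^T.

    C   : (n, n) current matrix being tracked (symmetric)
    W   : (n, k) sparse coefficient block for the new sample(s)
    lam : forgetting factor in (0, 1]
    """
    CW = C @ W
    S = lam * np.eye(W.shape[1]) + W.T @ CW          # small k x k system
    C_new = (C - CW @ np.linalg.solve(S, CW.T)) / lam
    return C_new
```

Since only a $k \times k$ system is solved, the per-update cost stays modest even when the tracked matrix itself is comparatively large.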

3. Regularization, Sparsity, and Blockwise Adaptation

Integration of sparse regularization and directional forgetting into CRLS is exemplified by the SPARLS algorithm (0901.0734), where an $\ell_1$-regularized cost is recursively minimized:

$$
J(\hat{w}(n)) = \frac{1}{2\sigma^2} \left\| D^{1/2}(n)\,d(n) - D^{1/2}(n)\,X(n)\,\hat{w}(n) \right\|_2^2 + \gamma \left\| \hat{w}(n) \right\|_1,
$$

with recursive updates only on active support sets. EM-type iterations and soft-thresholding are used for efficient online optimization, yielding MSE improvements and computational reductions relative to classical RLS in sparse environments.
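
A compact way to see the mechanics is an ISTA-style inner loop applied to recursively maintained correlation quantities. The sketch below is inspired by this structure but is not the exact published SPARLS recursion; the function names, the step size `alpha2`, and the number of inner iterations are illustrative assumptions:

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def sparse_rls_step(w, B, u, x, d, lam=0.999, sigma2=1.0, gamma=0.1,
                    alpha2=0.5, n_inner=3):
    """Recursive correlation update followed by proximal inner iterations (sketch).

    w      : (m,) current sparse estimate
    B, u   : recursively maintained (1/sigma^2) X^T D X and (1/sigma^2) X^T D d
    x, d   : new regressor (m,) and scalar observation
    lam    : exponential forgetting factor
    gamma  : l1 regularization weight
    alpha2 : inner-iteration step size
    """
    # Rank-one recursive updates of the weighted correlations.
    B = lam * B + np.outer(x, x) / sigma2
    u = lam * u + x * d / sigma2
    # EM-type / proximal-gradient inner iterations with soft-thresholding.
    for _ in range(n_inner):
        w = soft_threshold(w + alpha2 * (u - B @ w), gamma * alpha2)
    return w, B, u
```

Restricting the inner iterations to the active support (the nonzero entries of `w` plus a small candidate set) is what yields the computational savings reported for sparse environments.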

Dictionary learning and kernel-based CRLS methods employ similar regularization and block updates. For evolving systems, combinations of $\ell_1$ regularization and directional forgetting factors can be applied per block or per dictionary atom to mitigate overfitting, promote sparsity, and enhance robustness.

4. Scaling Properties, Computational Complexity, and Numerical Stability

CRLS approaches are distinguished by their scaling behavior. In scenarios with high-dimensional or structured regression, the complexity is governed by the block sizes and the structure of forgetting factors or regularization maps. For matrix RLS in MIMO identification, computational and space complexity are reduced by factors of $O(m^3)$ and $O(m^2)$ relative to vec-permutation-based RLS (Lai et al., 16 Apr 2024).

In multi-task stacked WRLS, complexity per update is $O(d^2 T^2)$, with $d$ the per-task feature dimension and $T$ the number of tasks (Lencione et al., 2023). Kernel dictionary learning with profile abstraction maintains $O(L^2)$ complexity, where $L$ is the profile size (Alipoor et al., 2 Jul 2025).

In rank-deficient RLS problems, the rank-Greville update maintains computations on $r \times r$ matrices, yielding $O(mr)$ per sample, thus outperforming LAPACK solvers for $r \ll m$ (Staub et al., 2021). However, issues of numerical stability persist, particularly in ill-conditioned regimes or when thresholds for independence must be adaptively managed.

5. Practical Applications: Control, Signal Processing, Deep Learning, and Beyond

CRLS methods have broad application domains:

  • Adaptive Filtering and Sparse Channel Estimation: SPARLS and block-sparse CRLS enable reliable, low-complexity online estimation in communication systems with sparse impulse responses (0901.0734).
  • Multi-Task Regression and Forecasting: Online multi-task WRLS and recursive kernel methods demonstrate improved error rates and tracking fidelity in wind speed prediction when leveraging inter-task regularization and task stacking (Lencione et al., 2023).
  • Deep Reinforcement Learning: RLS-based actor-critic algorithms (RLSSA2C, RLSNA2C) integrate CRLS updates in hidden and output layers, achieving superior sample efficiency in both discrete (Atari) and continuous (MuJoCo) control tasks through per-layer recursive optimization and natural gradient adaptations (Wang et al., 2022).
  • Neural Network Optimization: Average-approximation RLS and equivalent gradient methods allow CRLS-based training of DNNs, CNNs, and RNNs with layerwise recursive adaptation, regularization, and momentum (Zhang et al., 2021).
  • Matrix System Identification: Efficient matrix RLS formulations are critical for indirect adaptive MPC, dramatically reducing identification time for MIMO systems in real-time control (Lai et al., 16 Apr 2024).
  • Array Pattern Synthesis: Recursive LS methods for concentric ring arrays enable high-accuracy, low-iteration pattern synthesis in antenna engineering (Akbari-Bardaskan, 2023).
  • Echo State Networks in RL: Recursive mean-approximation LS updates in ESN-RLS methods improve policy learning convergence under strong temporal correlations (Zhang et al., 2022).

6. Extensions, Limitations, and Unifying Principles

CRLS admits substantial extensibility. Multi-forgetting architectures are natural for systems with partitioned dynamics and time-varying rates (Fraccaroli et al., 2015). Kernelized CRLS and matrix-structured adaptations facilitate nonlinear learning and high-dimensional system identification (Alipoor et al., 2 Jul 2025, Lencione et al., 2023). The unification of RLS and Kalman filtering cost structures in adaptive settings (with flexible prior covariance updates) underscores the universality of CRLS principles in filtering and robust estimation, especially in the presence of non-classical disturbances (Lai et al., 16 Apr 2024).

Limitations include sensitivity to model mis-specification, convergence dependence on persistent excitation within each block or task, and numerical stability challenges in highly rank-deficient or ill-conditioned regimes. Algorithmic overhead in profile management or regularization tuning must be addressed for scalability.

A plausible implication is that CRLS frameworks will continue to evolve, integrating adaptive, structured, and kernelized updates, further enhancing real-time adaptation, sample efficiency, and robustness in complex systems. The versatility of the concatenated paradigm allows for both theoretical unification and practical specialization, making it central to modern adaptive learning and estimation.
