Recursive Residual Decomposition for Time Series
- Recursive residual decomposition is an adaptive method that iteratively separates a time series into interpretable components, trends, and residual noise.
- It employs diverse techniques including deep CNNs (e.g., IRCNN+), hybrid linear-nonlinear models (LiNo), and modal approaches (RDMD) for robust extraction.
- Empirical results show significant error reductions and computational efficiency improvements over classical methods in nonstationary data.
Recursive residual decomposition for time series refers to a class of methods that decompose a signal or temporal sequence by recursively extracting, modeling, or removing components—be they physical modes, noise, trends, seasonality, or learned nonlinearities—leaving a residual that itself is further decomposed in subsequent steps. These approaches offer adaptive, data-driven alternatives to traditional time-frequency or modal decompositions, improving robustness, physical interpretability, and computational efficiency in complex real-world time series.
1. Mathematical Foundation and General Framework
Recursive residual decomposition (RRD) is conceptually grounded in iteratively partitioning the original signal into a sum of distinct components plus a residual:
$$x(t) \;=\; \sum_{k=1}^{K} c_k(t) \;+\; r_K(t),$$
where $c_k(t)$ are the extracted modes or components at recursion step $k$ and $r_K(t)$ is the remaining residual after $K$ steps (Zhou et al., 2023). Each recursion step seeks to model or remove a component from the current residual, so
$$r_k(t) \;=\; r_{k-1}(t) - c_k(t), \qquad c_k \;=\; \mathcal{F}_k(r_{k-1}),$$
with $c_k$ estimated by an extraction operator $\mathcal{F}_k$ (e.g., CNN, DMD, AR, learned block) parameterized by signal content and/or residual history.
The core idea is instantiated in various algorithmic forms:
- Adaptive convolutional blocks (RRCNN family, LiNo, IRCNN) recursively predict and subtract learned component structure from the residual (Zhou et al., 2023, Zhou et al., 2023, Yu et al., 2024).
- Modal decomposition (RDMD) builds an expansion of orthogonal, frequency-pure modes from residuals recursively (Noack et al., 2015).
- Hybrid frameworks (KODA, LiNo, OnlineSTL) alternate between domain-specific extraction (e.g., linear/Koopman operators for regular patterns, neural/CNN blocks for local dynamics) and residual learning (Singh et al., 2024, Yu et al., 2024, Mishra et al., 2021).
- Memory-based recursion (VARNN) fuses a running memory of residuals into prediction to adapt to local temporal deviations (Gharwi et al., 2025).
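In code, the generic extract-and-subtract recursion reads as follows. This is a minimal numpy sketch; the moving-average and seasonal-mean extractors are illustrative stand-ins, not operators from any one of the cited papers:

```python
import numpy as np

def recursive_residual_decompose(x, extractors):
    # Generic RRD loop: each extractor maps the current residual r_{k-1}
    # to a component c_k, and the residual is updated as r_k = r_{k-1} - c_k.
    residual = np.asarray(x, dtype=float).copy()
    components = []
    for extract in extractors:
        c = extract(residual)
        components.append(c)
        residual = residual - c
    return components, residual

def moving_average(w):
    # Simple linear trend extractor (edge-padded moving average).
    def f(r):
        pad = np.pad(r, (w // 2, w - 1 - w // 2), mode="edge")
        return np.convolve(pad, np.ones(w) / w, mode="valid")
    return f

def seasonal_mean(period):
    # Per-phase mean of the residual as a crude seasonal estimate.
    def f(r):
        out = np.empty_like(r)
        for p in range(period):
            out[p::period] = r[p::period].mean()
        return out
    return f
```

By construction, the extracted components plus the final residual reconstruct the input exactly, which is the defining identity of the framework.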
2. Methodological Variants
2.1 Deep Learning Approaches: RRCNN and IRCNN
Both RRCNN (Zhou et al., 2023) and its extension IRCNN (Zhou et al., 2023) implement recursive signal decomposition using cascaded convolutional neural networks, with each recursion responsible for extracting an intrinsic mode function (IMF) analog.
- Each outer iteration processes the current residual, predicting a component via a small recursive CNN block, optionally with multi-scale convolutions and attention (in IRCNN).
- The residual is updated by subtracting this component, and the process repeats.
- A total variation denoising (TVD) postprocessing step enforces physical smoothness on each extracted component $c$ by solving $\min_u \tfrac{1}{2}\|u - c\|_2^2 + \lambda \|\nabla u\|_1$.
- Loss functions employ supervised MSE on predicted components, often without explicit smoothness penalties due to TVD postprocessing.
IRCNN further stabilizes decomposition through multi-scale/attention modules, enabling efficient and rapid batch decomposition and improved mode separation under nonstationarity.
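As a concrete illustration of the TVD postprocessing step, here is a minimal gradient-descent sketch of 1D total-variation denoising on a smoothed objective. It is a sketch only: a production solver would use a taut-string or proximal method, and the step size and smoothing constant `eps` are illustrative choices:

```python
import numpy as np

def tv_denoise(y, lam=0.5, step=0.05, iters=300, eps=1e-2):
    # Gradient descent on the smoothed TVD objective
    #   0.5 * ||u - y||^2 + lam * sum_i sqrt((u[i+1] - u[i])^2 + eps).
    u = np.asarray(y, dtype=float).copy()
    for _ in range(iters):
        du = np.diff(u)
        g = du / np.sqrt(du**2 + eps)        # smoothed sign of the jumps
        # Gradient of the TV term: g[j-1] - g[j] at interior points,
        # with one-sided terms at the boundaries.
        grad_tv = np.concatenate(([-g[0]], -np.diff(g), [g[-1]]))
        u -= step * ((u - y) + lam * grad_tv)
    return u
```

Applied to a noisy piecewise-constant signal, this reduces total variation while keeping the output close to the data, which is the smoothing behavior the postprocessing step relies on.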
2.2 Hybrid and Multi-Block Strategies: LiNo
LiNo demonstrates a deep RRD mechanism alternating between a linear extractor (Li) and a nonlinear extractor (No) (Yu et al., 2024):
- Linear block can be a fixed or learnable moving average or a generalized autoregressive map.
- Nonlinear block can leverage Transformer encoders with multi-head self-attention, as well as time/frequency/channel projections and fusions.
- At each level, Li and No blocks alternately subtract their predicted features from the current residual, with per-block forecast heads contributing to the cumulative prediction.
- The residual is propagated to deeper levels, enabling finer pattern disentanglement and ultimately reducing the residual to white noise as the recursion depth increases.
In ablation studies, each block is shown to play a critical role: omitting the Li (linear) block increases MSE by 10% on average in the multivariate setting, omitting the No (nonlinear) block leads to a catastrophic increase in MSE, and deeper recursion provides measurable benefits.
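The alternating Li/No recursion can be sketched as follows. The moving-average linear block and the polynomial nonlinear block are toy stand-ins for LiNo's learned autoregressive and attention-based extractors:

```python
import numpy as np

def li_block(r, w=25):
    # Linear "Li" extractor: a fixed moving average (one simple choice).
    pad = np.pad(r, (w // 2, w - 1 - w // 2), mode="edge")
    return np.convolve(pad, np.ones(w) / w, mode="valid")

def no_block(r, degree=6):
    # Toy "No" extractor: a low-order polynomial fit stands in for the
    # paper's attention-based nonlinear block.
    t = np.linspace(0.0, 1.0, r.size)
    return np.polyval(np.polyfit(t, r, degree), t)

def lino_decompose(x, depth=3):
    # Alternate Li/No extraction, propagating the residual to deeper
    # levels as in LiNo's recursive scheme (extractors here are toys).
    residual = np.asarray(x, dtype=float).copy()
    levels = []
    for _ in range(depth):
        li = li_block(residual)
        residual = residual - li
        no = no_block(residual)
        residual = residual - no
        levels.append((li, no))
    return levels, residual
```

Each level subtracts its linear and nonlinear estimates in turn, so the residual shrinks monotonically in energy while the per-level components remain available for forecasting heads.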
2.3 Modal and Spectral Decomposition: RDMD and KODA
Recursive Dynamic Mode Decomposition (RDMD) generates orthogonal, frequency-pure modes by recursively extracting the component that minimizes residual energy (Noack et al., 2015):
- At each recursion, DMD is run on the residual, and the candidate mode that yields the lowest averaged truncation error is selected.
- The result is a set of modes with unentangled frequencies, approaching the reconstruction error of proper orthogonal decomposition (POD) but avoiding mode mixing and facilitating spectral clarity.
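The RDMD selection-and-deflation loop can be sketched in a few lines of linear algebra. This is an illustrative reconstruction (the rank threshold and the real-part projection of the complex DMD modes are simplifications), not the authors' implementation:

```python
import numpy as np

def recursive_dmd(X, n_modes):
    # Sketch of the RDMD recursion: run DMD on the residual snapshots,
    # keep the single mode whose least-squares removal leaves the least
    # energy, orthogonalize it against earlier modes, deflate, repeat.
    R = np.asarray(X, dtype=float).copy()
    basis = []
    for _ in range(n_modes):
        U, s, Vh = np.linalg.svd(R[:, :-1], full_matrices=False)
        r = int((s > 1e-10 * s[0]).sum())
        U, s, Vh = U[:, :r], s[:r], Vh[:r]
        Atil = U.conj().T @ R[:, 1:] @ Vh.conj().T / s   # reduced operator
        _, W = np.linalg.eig(Atil)
        Phi = R[:, 1:] @ (Vh.conj().T / s) @ W           # DMD modes of residual
        best, best_err = None, np.inf
        for j in range(Phi.shape[1]):
            phi = np.real(Phi[:, j])
            n2 = phi @ phi
            if n2 < 1e-12:
                continue
            err = np.linalg.norm(R - np.outer(phi, phi @ R) / n2)
            if err < best_err:
                best, best_err = phi / np.sqrt(n2), err
        for b in basis:                                   # Gram-Schmidt
            best = best - (b @ best) * b
        best = best / np.linalg.norm(best)
        basis.append(best)
        R = R - np.outer(best, best @ R)                  # deflate
    return np.column_stack(basis), R
```

For snapshot data spanned by two spatial modes with distinct frequencies, two recursions suffice to drive the residual to numerical zero while the selected modes stay orthonormal.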
KODA (Singh et al., 2024) decomposes time series into "physical" (slow, globally regular, Koopman-evolvable) and "residual" (local, time-varying) components by hard frequency-domain separation:
- The physical part is advanced by a learned Koopman operator.
- The residual dynamic is modeled recursively, typically with a trainable GRU.
- Data assimilation is implemented by a neural EKF analogue, allowing assimilation of noisy observations into both branches at inference.
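The hard frequency-domain separation step can be sketched with an FFT mask; `cutoff_frac` is an illustrative hyperparameter, and the subsequent Koopman evolution of the physical branch and GRU modeling of the residual branch are omitted:

```python
import numpy as np

def frequency_split(x, cutoff_frac=0.1):
    # Hard frequency-domain separation in the spirit of KODA: keep the
    # lowest cutoff_frac of rfft bins as the "physical" part; the rest
    # is the "residual" part. The two parts sum back to the input.
    x = np.asarray(x, dtype=float)
    X = np.fft.rfft(x)
    k = max(1, int(cutoff_frac * X.size))
    low = X.copy()
    low[k:] = 0.0
    physical = np.fft.irfft(low, n=x.size)
    residual = np.fft.irfft(X - low, n=x.size)
    return physical, residual
```

Because the mask is a linear partition of the spectrum, the split is exact: the physical and residual parts reconstruct the signal, and a pure high-frequency tone lands entirely in the residual branch.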
2.4 Classical and Online Filtering: OnlineSTL
OnlineSTL (Mishra et al., 2021) enables recursive, streaming decomposition into trend, seasonal, and remainder components via:
- One-sided kernel smoothing for trends (tri-cube filter).
- Exponential smoothing for seasonality.
- Immediate residual updates on each new observation, enabling high-throughput, low-latency decomposition suitable for real-time monitoring.
This approach demonstrates the breadth of the RRD principle, spanning deep learning and classical online filters.
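A minimal streaming sketch of this trend/seasonal/remainder recursion follows; the window length and adaptation rate `gamma` are illustrative hyperparameters, not OnlineSTL's settings:

```python
import numpy as np
from collections import deque

def tricube(u):
    # Tri-cube kernel on [0, 1): weight 1 at u = 0, decaying to 0 at u = 1.
    return np.clip(1.0 - np.abs(u) ** 3, 0.0, None) ** 3

class StreamingSTL:
    # Minimal streaming decomposer in the spirit of OnlineSTL: one-sided
    # tri-cube trend smoothing plus per-phase exponential seasonal smoothing.
    def __init__(self, period, window=21, gamma=0.3):
        self.period = period
        self.buf = deque(maxlen=window)
        self.w = tricube(np.arange(window)[::-1] / window)  # newest point gets weight 1
        self.season = np.zeros(period)
        self.gamma = gamma
        self.t = 0

    def update(self, x):
        self.buf.append(x)
        vals = np.asarray(self.buf)
        w = self.w[-vals.size:]
        trend = float(w @ vals / w.sum())                   # one-sided kernel smooth
        detrended = x - trend
        p = self.t % self.period
        self.season[p] += self.gamma * (detrended - self.season[p])
        self.t += 1
        remainder = detrended - self.season[p]
        return trend, self.season[p], remainder
```

Each update touches only a fixed-size buffer and one seasonal slot, so per-observation cost and memory stay constant, and the three outputs always sum back to the input exactly.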
3. Training, Losses, and Optimization
- Most deep RRD methods use supervised MSE losses for extracted components or cumulative predictions, with optional regularization:
- TVD regularization or instance normalization (LiNo) is applied to encourage smoothness or stabilize feature distributions (Zhou et al., 2023, Yu et al., 2024).
- Orthogonality constraints can be imposed on components (e.g., via SVD projections in RRCNN) to approach the behavior of modal decompositions (Zhou et al., 2023).
- Optimization is performed via Adam or SGD, with standard learning rate decay and weight decay.
Classical/online methods calibrate smoothing parameters (kernel width, seasonal adaptation rate) to balance the bias-variance tradeoff.
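A composite objective combining these terms might be sketched as follows; the penalty weights `lam_tv` and `lam_orth` are hypothetical knobs, not values from the cited papers:

```python
import numpy as np

def rrd_loss(components, targets, lam_tv=0.0, lam_orth=0.0):
    # Illustrative composite training objective: per-component MSE plus
    # optional TV-smoothness and pairwise-orthogonality penalties.
    C = np.atleast_2d(np.asarray(components, dtype=float))
    T = np.atleast_2d(np.asarray(targets, dtype=float))
    mse = np.mean((C - T) ** 2)
    tv = sum(np.abs(np.diff(c)).sum() for c in C)        # smoothness
    G = C @ C.T                                          # Gram matrix
    orth = np.sum((G - np.diag(np.diag(G))) ** 2)        # cross-component terms
    return mse + lam_tv * tv + lam_orth * orth
```

The orthogonality term penalizes only off-diagonal Gram entries, so mutually orthogonal components incur no extra cost, mirroring the soft orthogonality constraints described above.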
4. Empirical Performance and Comparative Analysis
| Model | Key Strengths | Limitations/Failure Modes |
|---|---|---|
| IRCNN | Robust IMF separation, minimal mode-mixing, real-time batch processing | Not inherently orthogonal components |
| LiNo | Simultaneous linear+nonlinear extraction, SOTA multi-variate scores | Recursion depth must be tuned |
| RDMD | Frequency-pure, orthogonal modes, low truncation error | High computational cost for large snapshot sets, not online |
| OnlineSTL | Real-time, online decomposition, streaming data | Assumes additivity of trends/seasons, no neural modeling |
| KODA | Koopman+residual recursion, data assimilation for nonstationary NLDS | Relies on quality of frequency masking |
| VARNN | Drift/volatility adaptation via error memory | Only recalibrates prediction, not explicit decomposition |
- IRCNN achieves low MAE and RMSE on synthetic two-mode mixtures, improving RMSE by 40% over RRCNN, and outperforms classical EMD/VMD in terms of mode mixing and boundary effects (Zhou et al., 2023).
- LiNo reduces overall MSE on both multivariate and univariate benchmarks vs. strong Transformer and MLP baselines across 13 datasets (Yu et al., 2024).
- RDMD’s maximal time-averaged truncation error is typically lower than DMD and approaches that of POD while yielding pure modes (Noack et al., 2015).
- OnlineSTL processes each new observation in constant time, yielding substantial speedups over batch STL with competitive trend/remainder separation (Mishra et al., 2021).
- KODA matches or outperforms state-of-the-art neural forecasters on long-horizon benchmarks, demonstrating robust assimilation and prediction in multivariate physical and simulated NLDS (Singh et al., 2024).
5. Connections to Classical Methods and Theoretical Implications
Recursive residual decomposition generalizes and extends several longstanding paradigms:
- Empirical mode decomposition (EMD), variational mode decomposition (VMD), and iterative filtering are special cases of recursive subtraction with particular extraction rules; deep RRD methods supersede these with adaptive, learnable operators (Zhou et al., 2023, Zhou et al., 2023).
- In ARIMA and Kalman filters, forecast error is recursed implicitly (ARIMA: backshift operator; Kalman: innovation filter), but RRD replaces these with nonlinear, flexible mechanisms (e.g., learned memory-state updates in VARNN or GRUs in KODA), accommodating volatility and nonlinearity (Gharwi et al., 2025, Singh et al., 2024).
- Modal approaches (DMD/RDMD) emphasize spectral purity and orthogonality, achievable via RRD by enforcing orthogonal projections post hoc or via structured loss functions (Noack et al., 2015, Zhou et al., 2023).
Recent theoretical work (Green et al., 2025) provides epistemic error decompositions for multi-step predictions in recursive vs. direct strategies, showing that, for nonlinear/recurrent function classes, recursion expands model expressivity but amplifies parameter estimation variance (quantified via Jacobian amplification). This challenges the classical bias-variance intuition and suggests optimizing recursion depth and block structure case-by-case.
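A toy scalar example makes the Jacobian-amplification effect concrete; this is a hypothetical illustration, not the paper's derivation:

```python
def recursive_param_sensitivity(a, x0, h):
    # For the one-step model f_a(x) = a*x, the h-step recursive forecast
    # is a**h * x0, so its sensitivity to the parameter a is
    # h * a**(h-1) * x0 (a product of per-step Jacobians). A direct
    # h-step model g_b(x) = b*x has parameter sensitivity just x0.
    forecast = a**h * x0
    sens_recursive = h * a ** (h - 1) * x0
    sens_direct = x0
    return forecast, sens_recursive, sens_direct
```

For |a| > 1 and h > 1 the recursive sensitivity exceeds the direct one, so the same parameter estimation error produces a larger forecast error under recursion, which is the variance-amplification mechanism described above.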
6. Computational Efficiency and Scalability
- IRCNN achieves fast per-signal GPU inference, running substantially faster than classical EMD/VMD (Zhou et al., 2023).
- OnlineSTL achieves large speedups over batch STL, with constant time and memory footprint per update (Mishra et al., 2021).
- RRCNN inner loops are implemented with small CNNs, ensuring high-throughput batch processing; RDMD, however, incurs the cost of a full DMD computation at each recursion unless low-rank approximations are used (Noack et al., 2015, Zhou et al., 2023).
- LiNo, by propagating the current residual only through focused linear/nonlinear extractors at each recursion depth, achieves both depth and flexibility without incurring exponential growth in compute (Yu et al., 2024).
7. Future Directions and Open Challenges
- Extending recursive residual decomposition to multivariate, hierarchical, and cross-domain temporal data remains an active research area (Yu et al., 2024, Singh et al., 2024).
- Integrating physical constraints, online data assimilation, and adaptive masking/attention between recursion stages has shown promise but poses optimization and interpretability challenges (Singh et al., 2024).
- Further theoretical analyses of stability, convergence, and expressivity—especially for deep or nonlinear RRD systems with complex loss surfaces—are still required for more principled architecture and recursion-depth selection (Green et al., 2025, Mau et al., 2024).
- Combining robust residual modeling with causal, streaming, and low-latency requirements is crucial for industrial and real-time applications (Mishra et al., 2021).
Recursive residual decomposition methods constitute a versatile and potent paradigm for time series analysis, unifying ideas from neural, modal, and classical statistical approaches with adaptive, recursive architectures for feature extraction, forecasting, and system identification. Empirical results consistently demonstrate improved performance, especially in complex, nonstationary, and multiscale domains (Zhou et al., 2023, Yu et al., 2024, Singh et al., 2024, Zhou et al., 2023).