Recursive Residual Decomposition for Time Series
- Recursive residual decomposition is an adaptive method that iteratively separates a time series into interpretable components, trends, and residual noise.
- It employs diverse techniques including deep CNNs (e.g., IRCNN+), hybrid linear-nonlinear models (LiNo), and modal approaches (RDMD) for robust extraction.
- Empirical results show significant error reductions and computational efficiency improvements over classical methods in nonstationary data.
Recursive residual decomposition for time series refers to a class of methods that decompose a signal or temporal sequence by recursively extracting, modeling, or removing components—be they physical modes, noise, trends, seasonality, or learned nonlinearities—leaving a residual that itself is further decomposed in subsequent steps. These approaches offer adaptive, data-driven alternatives to traditional time-frequency or modal decompositions, improving robustness, physical interpretability, and computational efficiency in complex real-world time series.
1. Mathematical Foundation and General Framework
Recursive residual decomposition (RRD) is conceptually grounded in iteratively partitioning the original signal into a sum of distinct components plus a residual:
$$x(t) \;=\; \sum_{k=1}^{K} c_k(t) \;+\; r_K(t),$$
where $c_k(t)$ are the extracted modes or components at recursion step $k$ and $r_K(t)$ is the remaining residual after $K$ steps (Zhou et al., 2023). Each recursion step seeks to model or remove a component from the current residual, so
$$r_k(t) \;=\; r_{k-1}(t) - c_k(t), \qquad c_k \;=\; \mathcal{F}_k(r_{k-1}),$$
with $c_k$ estimated by an extraction operator $\mathcal{F}_k$ (e.g., CNN, DMD, AR, learned block) parameterized by signal content and/or residual history.
The core idea is instantiated in various algorithmic forms:
- Adaptive convolutional blocks (RRCNN family, LiNo, IRCNN) recursively predict and subtract learned component structure from the residual (Zhou et al., 2023, Zhou et al., 2023, Yu et al., 2024).
- Modal decomposition (RDMD) builds an expansion of orthogonal, frequency-pure modes from residuals recursively (Noack et al., 2015).
- Hybrid frameworks (KODA, LiNo, OnlineSTL) alternate between domain-specific extraction (e.g., linear/Koopman operators for regular patterns, neural/CNN blocks for local dynamics) and residual learning (Singh et al., 2024, Yu et al., 2024, Mishra et al., 2021).
- Memory-based recursion (VARNN) fuses a running memory of residuals into prediction to adapt to local temporal deviations (Gharwi et al., 2025).
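In code, the generic extract-and-subtract recursion reads as follows. This is a minimal numpy sketch; the moving-average and seasonal-mean extractors are illustrative stand-ins, not operators from any one of the cited papers:

```python
import numpy as np

def recursive_residual_decompose(x, extractors):
    # Generic RRD loop: each extractor maps the current residual r_{k-1}
    # to a component c_k, and the residual is updated as r_k = r_{k-1} - c_k.
    residual = np.asarray(x, dtype=float).copy()
    components = []
    for extract in extractors:
        c = extract(residual)
        components.append(c)
        residual = residual - c
    return components, residual

def moving_average(w):
    # Simple linear trend extractor (edge-padded moving average).
    def f(r):
        pad = np.pad(r, (w // 2, w - 1 - w // 2), mode="edge")
        return np.convolve(pad, np.ones(w) / w, mode="valid")
    return f

def seasonal_mean(period):
    # Per-phase mean of the residual as a crude seasonal estimate.
    def f(r):
        out = np.empty_like(r)
        for p in range(period):
            out[p::period] = r[p::period].mean()
        return out
    return f
```

By construction, the extracted components plus the final residual reconstruct the input exactly, which is the defining identity of the framework.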
2. Methodological Variants
2.1 Deep Learning Approaches: RRCNN and IRCNN
Both RRCNN (Zhou et al., 2023) and its extension IRCNN (Zhou et al., 2023) implement recursive signal decomposition using cascaded convolutional neural networks, with each recursion responsible for extracting an intrinsic mode function (IMF) analog.
- Each outer iteration processes the current residual, predicting a component via a small recursive CNN block, optionally with multi-scale convolutions and attention (in IRCNN).
- The residual is updated by subtracting this component, and the process repeats.
- A total variation denoising (TVD) postprocessing step enforces physical smoothness on each extracted component $c$ by solving $\min_u \tfrac{1}{2}\|u - c\|_2^2 + \lambda \|\nabla u\|_1$.
- Loss functions employ supervised MSE on predicted components, often without explicit smoothness penalties due to TVD postprocessing.
IRCNN further stabilizes decomposition through multi-scale/attention modules, enabling efficient and rapid batch decomposition and improved mode separation under nonstationarity.
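As a concrete illustration of the TVD postprocessing step, here is a minimal gradient-descent sketch of 1D total-variation denoising on a smoothed objective. It is a sketch only: a production solver would use a taut-string or proximal method, and the step size and smoothing constant `eps` are illustrative choices:

```python
import numpy as np

def tv_denoise(y, lam=0.5, step=0.05, iters=300, eps=1e-2):
    # Gradient descent on the smoothed TVD objective
    #   0.5 * ||u - y||^2 + lam * sum_i sqrt((u[i+1] - u[i])^2 + eps).
    u = np.asarray(y, dtype=float).copy()
    for _ in range(iters):
        du = np.diff(u)
        g = du / np.sqrt(du**2 + eps)        # smoothed sign of the jumps
        # Gradient of the TV term: g[j-1] - g[j] at interior points,
        # with one-sided terms at the boundaries.
        grad_tv = np.concatenate(([-g[0]], -np.diff(g), [g[-1]]))
        u -= step * ((u - y) + lam * grad_tv)
    return u
```

Applied to a noisy piecewise-constant signal, this reduces total variation while keeping the output close to the data, which is the smoothing behavior the postprocessing step relies on.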
2.2 Hybrid and Multi-Block Strategies: LiNo
LiNo demonstrates a deep RRD mechanism alternating between a linear extractor (Li) and a nonlinear extractor (No) (Yu et al., 2024):
- Linear block can be a fixed or learnable moving average or a generalized autoregressive map.
- Nonlinear block can leverage Transformer encoders with multi-head self-attention, as well as time/frequency/channel projections and fusions.
- At each level, Li and No blocks alternately subtract their predicted features from the current residual, with per-block forecast heads contributing to the cumulative prediction.
- The residual is propagated to deeper levels, enabling finer pattern disentanglement and ultimately reducing the residual to white noise as the recursion depth increases.
In ablation studies, each block is shown to play a critical role: omitting the Li (linear) block increases MSE by 10% on average in the multivariate setting, omitting the No (nonlinear) block leads to a catastrophic increase in MSE, and deeper recursion provides measurable benefits.
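The alternating Li/No recursion can be sketched as follows. The moving-average linear block and the polynomial nonlinear block are toy stand-ins for LiNo's learned autoregressive and attention-based extractors:

```python
import numpy as np

def li_block(r, w=25):
    # Linear "Li" extractor: a fixed moving average (one simple choice).
    pad = np.pad(r, (w // 2, w - 1 - w // 2), mode="edge")
    return np.convolve(pad, np.ones(w) / w, mode="valid")

def no_block(r, degree=6):
    # Toy "No" extractor: a low-order polynomial fit stands in for the
    # paper's attention-based nonlinear block.
    t = np.linspace(0.0, 1.0, r.size)
    return np.polyval(np.polyfit(t, r, degree), t)

def lino_decompose(x, depth=3):
    # Alternate Li/No extraction, propagating the residual to deeper
    # levels as in LiNo's recursive scheme (extractors here are toys).
    residual = np.asarray(x, dtype=float).copy()
    levels = []
    for _ in range(depth):
        li = li_block(residual)
        residual = residual - li
        no = no_block(residual)
        residual = residual - no
        levels.append((li, no))
    return levels, residual
```

Each level subtracts its linear and nonlinear estimates in turn, so the residual shrinks monotonically in energy while the per-level components remain available for forecasting heads.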
2.3 Modal and Spectral Decomposition: RDMD and KODA
Recursive Dynamic Mode Decomposition (RDMD) generates orthogonal, frequency-pure modes by recursively extracting the component that minimizes residual energy (Noack et al., 2015):
- At each recursion, DMD is run on the residual, and the candidate mode that yields the lowest averaged truncation error is selected.
- The result is a set of modes with unentangled frequencies, approaching the reconstruction error of proper orthogonal decomposition (POD) but avoiding mode mixing and facilitating spectral clarity.
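The RDMD selection-and-deflation loop can be sketched in a few lines of linear algebra. This is an illustrative reconstruction (the rank threshold and the real-part projection of the complex DMD modes are simplifications), not the authors' implementation:

```python
import numpy as np

def recursive_dmd(X, n_modes):
    # Sketch of the RDMD recursion: run DMD on the residual snapshots,
    # keep the single mode whose least-squares removal leaves the least
    # energy, orthogonalize it against earlier modes, deflate, repeat.
    R = np.asarray(X, dtype=float).copy()
    basis = []
    for _ in range(n_modes):
        U, s, Vh = np.linalg.svd(R[:, :-1], full_matrices=False)
        r = int((s > 1e-10 * s[0]).sum())
        U, s, Vh = U[:, :r], s[:r], Vh[:r]
        Atil = U.conj().T @ R[:, 1:] @ Vh.conj().T / s   # reduced operator
        _, W = np.linalg.eig(Atil)
        Phi = R[:, 1:] @ (Vh.conj().T / s) @ W           # DMD modes of residual
        best, best_err = None, np.inf
        for j in range(Phi.shape[1]):
            phi = np.real(Phi[:, j])
            n2 = phi @ phi
            if n2 < 1e-12:
                continue
            err = np.linalg.norm(R - np.outer(phi, phi @ R) / n2)
            if err < best_err:
                best, best_err = phi / np.sqrt(n2), err
        for b in basis:                                   # Gram-Schmidt
            best = best - (b @ best) * b
        best = best / np.linalg.norm(best)
        basis.append(best)
        R = R - np.outer(best, best @ R)                  # deflate
    return np.column_stack(basis), R
```

For snapshot data spanned by two spatial modes with distinct frequencies, two recursions suffice to drive the residual to numerical zero while the selected modes stay orthonormal.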
KODA (Singh et al., 2024) decomposes time series into "physical" (slow, globally regular, Koopman-evolvable) and "residual" (local, time-varying) components by hard frequency-domain separation:
- The physical part is advanced by a learned Koopman operator.
- The residual dynamic is modeled recursively, typically with a trainable GRU.
- Data assimilation is implemented by a neural EKF analogue, allowing assimilation of noisy observations into both branches at inference.
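The hard frequency-domain separation step can be sketched with an FFT mask; `cutoff_frac` is an illustrative hyperparameter, and the subsequent Koopman evolution of the physical branch and GRU modeling of the residual branch are omitted:

```python
import numpy as np

def frequency_split(x, cutoff_frac=0.1):
    # Hard frequency-domain separation in the spirit of KODA: keep the
    # lowest cutoff_frac of rfft bins as the "physical" part; the rest
    # is the "residual" part. The two parts sum back to the input.
    x = np.asarray(x, dtype=float)
    X = np.fft.rfft(x)
    k = max(1, int(cutoff_frac * X.size))
    low = X.copy()
    low[k:] = 0.0
    physical = np.fft.irfft(low, n=x.size)
    residual = np.fft.irfft(X - low, n=x.size)
    return physical, residual
```

Because the mask is a linear partition of the spectrum, the split is exact: the physical and residual parts reconstruct the signal, and a pure high-frequency tone lands entirely in the residual branch.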
2.4 Classical and Online Filtering: OnlineSTL
OnlineSTL (Mishra et al., 2021) enables recursive, streaming decomposition into trend, seasonal, and remainder components via:
- One-sided kernel smoothing for trends (tri-cube filter).
- Exponential smoothing for seasonality.
- Immediate residual updates on each new observation, enabling high-throughput, low-latency decomposition suitable for real-time monitoring.
This approach demonstrates the breadth of the RRD principle, spanning deep learning and classical online filters.
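A minimal streaming sketch of this trend/seasonal/remainder recursion follows; the window length and adaptation rate `gamma` are illustrative hyperparameters, not OnlineSTL's settings:

```python
import numpy as np
from collections import deque

def tricube(u):
    # Tri-cube kernel on [0, 1): weight 1 at u = 0, decaying to 0 at u = 1.
    return np.clip(1.0 - np.abs(u) ** 3, 0.0, None) ** 3

class StreamingSTL:
    # Minimal streaming decomposer in the spirit of OnlineSTL: one-sided
    # tri-cube trend smoothing plus per-phase exponential seasonal smoothing.
    def __init__(self, period, window=21, gamma=0.3):
        self.period = period
        self.buf = deque(maxlen=window)
        self.w = tricube(np.arange(window)[::-1] / window)  # newest point gets weight 1
        self.season = np.zeros(period)
        self.gamma = gamma
        self.t = 0

    def update(self, x):
        self.buf.append(x)
        vals = np.asarray(self.buf)
        w = self.w[-vals.size:]
        trend = float(w @ vals / w.sum())                   # one-sided kernel smooth
        detrended = x - trend
        p = self.t % self.period
        self.season[p] += self.gamma * (detrended - self.season[p])
        self.t += 1
        remainder = detrended - self.season[p]
        return trend, self.season[p], remainder
```

Each update touches only a fixed-size buffer and one seasonal slot, so per-observation cost and memory stay constant, and the three outputs always sum back to the input exactly.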
3. Training, Losses, and Optimization
- Most deep RRD methods use supervised MSE losses for extracted components or cumulative predictions, with optional regularization:
- TVD regularization or instance normalization (LiNo) is applied to encourage smoothness or stabilize feature distributions (Zhou et al., 2023, Yu et al., 2024).
- Orthogonality constraints can be imposed on components (e.g., via SVD projections in RRCNN) to approach the behavior of modal decompositions (Zhou et al., 2023).
- Optimization is performed via Adam or SGD, with standard learning rate decay and weight decay.
Classical/online methods calibrate smoothing parameters (kernel width, seasonal adaptation rate) to balance the bias-variance tradeoff.
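A composite objective combining these terms might be sketched as follows; the penalty weights `lam_tv` and `lam_orth` are hypothetical knobs, not values from the cited papers:

```python
import numpy as np

def rrd_loss(components, targets, lam_tv=0.0, lam_orth=0.0):
    # Illustrative composite training objective: per-component MSE plus
    # optional TV-smoothness and pairwise-orthogonality penalties.
    C = np.atleast_2d(np.asarray(components, dtype=float))
    T = np.atleast_2d(np.asarray(targets, dtype=float))
    mse = np.mean((C - T) ** 2)
    tv = sum(np.abs(np.diff(c)).sum() for c in C)        # smoothness
    G = C @ C.T                                          # Gram matrix
    orth = np.sum((G - np.diag(np.diag(G))) ** 2)        # cross-component terms
    return mse + lam_tv * tv + lam_orth * orth
```

The orthogonality term penalizes only off-diagonal Gram entries, so mutually orthogonal components incur no extra cost, mirroring the soft orthogonality constraints described above.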
4. Empirical Performance and Comparative Analysis
| Model | Key Strengths | Limitations/Failure Modes |
|---|---|---|
| IRCNN | Robust IMF separation, minimal mode-mixing, real-time batch processing | Not inherently orthogonal components |
| LiNo | Simultaneous linear+nonlinear extraction, SOTA multi-variate scores | Recursion depth must be tuned |
| RDMD | Frequency-pure, orthogonal modes, low truncation error | High computational cost for large snapshot sets, not online |
| OnlineSTL | Real-time, online decomposition, streaming data | Assumes additivity of trends/seasons, no neural modeling |
| KODA | Koopman+residual recursion, data assimilation for nonstationary NLDS | Relies on quality of frequency masking |
| VARNN | Drift/volatility adaptation via error memory | Only recalibrates prediction, not explicit decomposition |
- IRCNN achieves low MAE and RMSE on synthetic two-mode mixtures, improving RMSE by 40% over RRCNN, and outperforms classical EMD/VMD in terms of mode mixing and boundary effects (Zhou et al., 2023).
- LiNo reduces overall MSE on both multivariate and univariate benchmarks vs. strong Transformer and MLP baselines across 13 datasets (Yu et al., 2024).
- RDMD’s maximal time-averaged truncation error is typically lower than DMD and approaches that of POD while yielding pure modes (Noack et al., 2015).
- OnlineSTL processes each new observation in constant time, yielding substantial speedups over batch STL with competitive trend/remainder separation (Mishra et al., 2021).
- KODA matches or outperforms state-of-the-art neural forecasters on long-horizon benchmarks, demonstrating robust assimilation and prediction in multivariate physical and simulated NLDS (Singh et al., 2024).
5. Connections to Classical Methods and Theoretical Implications
Recursive residual decomposition generalizes and extends several longstanding paradigms:
- Empirical mode decomposition (EMD), variational mode decomposition (VMD), and iterative filtering are special cases of recursive subtraction with particular extraction rules; deep RRD methods supersede these with adaptive, learnable operators (Zhou et al., 2023, Zhou et al., 2023).
- In ARIMA and Kalman filters, forecast error is recursed implicitly (ARIMA: backshift operator; Kalman: innovation filter), but RRD replaces these with nonlinear, flexible mechanisms (e.g., learned memory-state updates in VARNN or GRUs in KODA), accommodating volatility and nonlinearity (Gharwi et al., 2025, Singh et al., 2024).
- Modal approaches (DMD/RDMD) emphasize spectral purity and orthogonality, achievable via RRD by enforcing orthogonal projections post hoc or via structured loss functions (Noack et al., 2015, Zhou et al., 2023).
Recent theoretical work (Green et al., 2025) provides epistemic error decompositions for multi-step predictions in recursive vs. direct strategies, showing that, for nonlinear/recurrent function classes, recursion expands model expressivity but amplifies parameter estimation variance (quantified via Jacobian amplification). This challenges the classical bias-variance intuition and suggests optimizing recursion depth and block structure case-by-case.
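A toy scalar example makes the Jacobian-amplification effect concrete; this is a hypothetical illustration, not the paper's derivation:

```python
def recursive_param_sensitivity(a, x0, h):
    # For the one-step model f_a(x) = a*x, the h-step recursive forecast
    # is a**h * x0, so its sensitivity to the parameter a is
    # h * a**(h-1) * x0 (a product of per-step Jacobians). A direct
    # h-step model g_b(x) = b*x has parameter sensitivity just x0.
    forecast = a**h * x0
    sens_recursive = h * a ** (h - 1) * x0
    sens_direct = x0
    return forecast, sens_recursive, sens_direct
```

For |a| > 1 and h > 1 the recursive sensitivity exceeds the direct one, so the same parameter estimation error produces a larger forecast error under recursion, which is the variance-amplification mechanism described above.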
6. Computational Efficiency and Scalability
- IRCNN achieves fast per-signal GPU inference, running substantially faster than classical EMD/VMD (Zhou et al., 2023).
- OnlineSTL achieves large speedups over batch STL, with constant time and memory footprint per update (Mishra et al., 2021).
- RRCNN inner loops are implemented with small CNNs, ensuring high-throughput batch processing; RDMD, however, incurs the cost of a full DMD computation at each recursion unless low-rank approximations are used (Noack et al., 2015, Zhou et al., 2023).
- LiNo, by propagating the current residual only through focused linear/nonlinear extractors at each recursion depth, achieves both depth and flexibility without incurring exponential growth in compute (Yu et al., 2024).
7. Future Directions and Open Challenges
- Extending recursive residual decomposition to multivariate, hierarchical, and cross-domain temporal data remains an active research area (Yu et al., 2024, Singh et al., 2024).
- Integrating physical constraints, online data assimilation, and adaptive masking/attention between recursion stages has shown promise but poses optimization and interpretability challenges (Singh et al., 2024).
- Further theoretical analyses of stability, convergence, and expressivity—especially for deep or nonlinear RRD systems with complex loss surfaces—are still required for more principled architecture and recursion-depth selection (Green et al., 2025, Mau et al., 2024).
- Combining robust residual modeling with causal, streaming, and low-latency requirements is crucial for industrial and real-time applications (Mishra et al., 2021).
Recursive residual decomposition methods constitute a versatile and potent paradigm for time series analysis, unifying ideas from neural, modal, and classical statistical approaches with adaptive, recursive architectures for feature extraction, forecasting, and system identification. Empirical results consistently demonstrate improved performance, especially in complex, nonstationary, and multiscale domains (Zhou et al., 2023, Yu et al., 2024, Singh et al., 2024, Zhou et al., 2023).