
Recursive Residual Decomposition for Time Series

Updated 28 January 2026
  • Recursive residual decomposition is an adaptive method that iteratively separates a time series into interpretable components, trends, and residual noise.
  • It employs diverse techniques including deep CNNs (e.g., IRCNN+), hybrid linear-nonlinear models (LiNo), and modal approaches (RDMD) for robust extraction.
  • Empirical results show significant error reductions and computational efficiency improvements over classical methods in nonstationary data.

Recursive residual decomposition for time series refers to a class of methods that decompose a signal or temporal sequence by recursively extracting, modeling, or removing components—be they physical modes, noise, trends, seasonality, or learned nonlinearities—leaving a residual that itself is further decomposed in subsequent steps. These approaches offer adaptive, data-driven alternatives to traditional time-frequency or modal decompositions, improving robustness, physical interpretability, and computational efficiency in complex real-world time series.

1. Mathematical Foundation and General Framework

Recursive residual decomposition (RRD) is conceptually grounded in iteratively partitioning the original signal $X \in \mathbb{R}^N$ into a sum of $M$ distinct components plus a residual:

$$X = \sum_{m=1}^{M} Y_m + R_M,$$

where $Y_m$ are extracted modes or components at recursion step $m$ and $R_M$ is the remaining residual after $M$ steps (Zhou et al., 2023). Each recursion step seeks to model or remove a component from the current residual, so

$$R^{(m)} = R^{(m-1)} - \hat{Y}_m,$$

with $\hat{Y}_m$ estimated by an extraction operator (e.g., CNN, DMD, AR, learned block) parameterized by signal content and/or residual history.
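The recursion above can be sketched in a few lines. Here `extract` stands in for whichever extraction operator a given method uses (CNN, DMD, AR, learned block); the `moving_average` operator is a toy illustration of mine, not any paper's extractor:

```python
import numpy as np

def recursive_residual_decomposition(x, extract, n_components):
    """Generic RRD loop: repeatedly estimate a component from the
    current residual and subtract it (R^(m) = R^(m-1) - Y_hat_m)."""
    residual = np.asarray(x, dtype=float).copy()
    components = []
    for _ in range(n_components):
        y = extract(residual)      # extraction operator: CNN, DMD, AR, ...
        components.append(y)
        residual = residual - y
    return components, residual

def moving_average(r, w=11):
    """Toy extraction operator: a centered moving average."""
    return np.convolve(r, np.ones(w) / w, mode="same")

t = np.linspace(0, 1, 512)
x = np.sin(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 40 * t)
comps, res = recursive_residual_decomposition(x, moving_average, 3)
# By construction, x == comps[0] + comps[1] + comps[2] + res exactly.
```

Note that the reconstruction identity $X = \sum_m Y_m + R_M$ holds exactly for any extractor, since the residual is defined by subtraction; the quality of a method lies entirely in how interpretable each $Y_m$ is.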

The core idea is instantiated in the algorithmic variants surveyed in the next section.

2. Methodological Variants

2.1 Deep Learning Approaches: RRCNN and IRCNN$^+$

Both RRCNN (Zhou et al., 2023) and its extension IRCNN$^+$ (Zhou et al., 2023) implement recursive signal decomposition using cascaded convolutional neural networks, with each recursion responsible for extracting an intrinsic mode function (IMF) analog.

  • Each outer iteration processes the current residual, predicting a component via a small recursive CNN block, optionally with multi-scale convolutions and attention (in IRCNN$^+$).
  • The residual is updated by subtracting this component, and the process repeats.
  • A total variation denoising (TVD) postprocessing step enforces physical smoothness:

$$\widehat{Y} = \arg\min_{Y} \frac{1}{2}\|\hat{Y} - Y\|_2^2 + \lambda\,TV(Y), \quad TV(Y) = \sum_{t=1}^{N-1} |Y_{t+1} - Y_t|$$

  • Loss functions employ supervised MSE on predicted components, often without explicit smoothness penalties due to TVD postprocessing.
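The TVD postprocessing step can be approximated numerically. The sketch below minimizes a Huber-smoothed version of the objective above by plain gradient descent; this is an illustrative stand-in under my own choice of solver and constants, not necessarily the solver used in RRCNN/IRCNN$^+$:

```python
import numpy as np

def tv_denoise(y_hat, lam=0.5, eps=0.05, lr=0.02, n_iter=3000):
    """Approximate TVD: minimize 0.5*||y_hat - y||^2 + lam*TV(y),
    with |d| smoothed to sqrt(d^2 + eps^2) so plain gradient
    descent applies (a sketch, not the papers' exact solver)."""
    y = y_hat.astype(float).copy()
    for _ in range(n_iter):
        d = np.diff(y)
        g_tv = d / np.sqrt(d * d + eps * eps)   # d/dd of smoothed |d|
        grad = y - y_hat                        # data-fidelity gradient
        grad[:-1] -= lam * g_tv                 # chain rule: d = y[t+1]-y[t]
        grad[1:] += lam * g_tv
        y -= lr * grad
    return y

rng = np.random.default_rng(0)
clean = np.repeat([0.0, 1.0, 0.2], 60)          # piecewise-constant signal
noisy = clean + 0.1 * rng.standard_normal(clean.size)
smooth = tv_denoise(noisy)
```

The step size is kept below the inverse gradient-Lipschitz constant ($\approx 1 + 4\lambda/\epsilon$), so the descent converges; the output keeps the jumps while suppressing within-segment noise, which is exactly the "physical smoothness" the TVD step enforces.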

IRCNN$^+$ further stabilizes decomposition through multi-scale/attention modules, enabling efficient and rapid batch decomposition and improved mode separation under nonstationarity.

2.2 Hybrid and Multi-Block Strategies: LiNo

LiNo demonstrates a deep RRD mechanism alternating between a linear extractor (Li) and a nonlinear extractor (No) (Yu et al., 2024):

  • Linear block can be a fixed or learnable moving average or a generalized autoregressive map.
  • Nonlinear block can leverage Transformer encoders with multi-head self-attention, as well as time/frequency/channel projections and fusions.
  • At each level, Li and No blocks alternately subtract their predicted features from the current residual, with per-block forecast heads contributing to the cumulative prediction.
  • The residual is propagated to deeper levels, enabling finer pattern disentanglement and ultimately reducing the residual to white noise as the recursion depth $K \to \infty$.

In ablation studies, each block is shown to play a critical role: omitting the Li (linear) block increases MSE by 10% on average in the multivariate setting, omitting the No (nonlinear) block leads to a catastrophic MSE increase ($+71.8\%$), and deeper recursion provides measurable benefits.
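The Li/No alternation can be sketched with untrained toy stand-ins: a fixed moving average in place of LiNo's learnable linear map, and a dominant-frequency extractor in place of its Transformer-based nonlinear block. Both stand-ins are my assumptions for illustration; only the alternating subtract-and-recurse structure follows the description above:

```python
import numpy as np

def li_block(r, w=25):
    """Stand-in linear extractor: a fixed centered moving average."""
    return np.convolve(r, np.ones(w) / w, mode="same")

def no_block(r):
    """Stand-in nonlinear extractor: isolate the strongest non-DC
    frequency (LiNo uses a Transformer encoder instead)."""
    spec = np.fft.rfft(r)
    mask = np.zeros_like(spec)
    k = np.argmax(np.abs(spec[1:])) + 1
    mask[k] = spec[k]
    return np.fft.irfft(mask, n=r.size)

def lino_decompose(x, depth=2):
    """Alternate Li/No extraction, propagating the residual deeper."""
    residual, parts = x.copy(), []
    for _ in range(depth):
        for block in (li_block, no_block):
            y = block(residual)
            parts.append(y)
            residual = residual - y
    return parts, residual

t = np.linspace(0, 4, 1024)
x = 0.5 * t + np.sin(2 * np.pi * 5 * t)      # trend + periodic component
parts, res = lino_decompose(x)
```

Even with these crude blocks, the residual's variance shrinks with each Li/No pass, mirroring LiNo's goal of driving the residual toward noise.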

2.3 Modal and Operator-Theoretic Approaches: RDMD and KODA

Recursive Dynamic Mode Decomposition (RDMD) generates orthogonal, frequency-pure modes by recursively extracting the component that minimizes residual energy (Noack et al., 2015):

  • At each recursion, DMD is run on the residual, and the candidate mode that yields the lowest averaged truncation error is selected.
  • The result is a set of modes with unentangled frequencies, approaching the reconstruction error of proper orthogonal decomposition (POD) but avoiding mode mixing and facilitating spectral clarity.

KODA (Singh et al., 2024) decomposes time series into "physical" (slow, globally regular, Koopman-evolvable) and "residual" (local, time-varying) components by hard frequency-domain separation:

  • The physical part is advanced by a learned Koopman operator.
  • The residual dynamics are modeled recursively, typically with a trainable GRU.
  • Data assimilation is implemented by a neural EKF analogue, allowing assimilation of noisy observations into both branches at inference.
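The hard frequency-domain separation at the root of KODA can be shown in isolation (the Koopman operator and GRU branches are omitted; the cutoff index is my arbitrary choice for illustration):

```python
import numpy as np

def frequency_split(x, cutoff):
    """Hard frequency-domain separation in the spirit of KODA:
    low-frequency bins form the 'physical' branch, everything
    else becomes the 'residual' branch."""
    spec = np.fft.rfft(x)
    low = spec.copy()
    low[cutoff:] = 0                       # hard mask: keep bins < cutoff
    physical = np.fft.irfft(low, n=x.size)
    residual = x - physical
    return physical, residual

t = np.linspace(0, 1, 256, endpoint=False)
x = np.sin(2 * np.pi * 2 * t) + 0.2 * np.sin(2 * np.pi * 30 * t)
phys, res = frequency_split(x, cutoff=10)
```

The slow 2 Hz oscillation lands entirely in the physical branch (to be evolved by the Koopman operator), while the 30 Hz component is routed to the residual branch; as the bullet on masking quality in Section 4 notes, a badly placed cutoff would leak content across branches.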

2.4 Classical and Online Filtering: OnlineSTL

OnlineSTL (Mishra et al., 2021) enables recursive, streaming decomposition into trend, seasonal, and remainder components via:

  • One-sided kernel smoothing for trends (tri-cube filter).
  • Exponential smoothing for seasonality.
  • Immediate update of each new residual enables high-throughput, low-latency decomposition suitable for real-time monitoring.
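A minimal streaming decomposer in this spirit can be written directly from the bullets above. The one-sided tri-cube trend filter and per-phase exponential seasonal smoothing follow the description, but the exact update rules are simplified assumptions, not OnlineSTL's algorithm:

```python
import numpy as np

class OnlineDecomposer:
    """Streaming trend/seasonal/remainder split (simplified sketch
    in the spirit of OnlineSTL, not the paper's exact updates)."""

    def __init__(self, period, k=21, gamma=0.1):
        self.period, self.k, self.gamma = period, k, gamma
        u = np.arange(k)[::-1] / k           # u = 0 at the newest sample
        self.w = (1.0 - u ** 3) ** 3         # one-sided tri-cube weights
        self.buf = []                        # sliding window of raw values
        self.season = np.zeros(period)       # per-phase seasonal estimate
        self.t = 0

    def update(self, x):
        """Consume one observation; return (trend, seasonal, remainder)."""
        self.buf.append(x)
        if len(self.buf) > self.k:
            self.buf.pop(0)
        w = self.w[-len(self.buf):]          # newest-aligned weights
        trend = float(np.dot(self.buf, w) / w.sum())
        phase = self.t % self.period
        # Exponential smoothing of the detrended value at this phase.
        self.season[phase] += self.gamma * ((x - trend) - self.season[phase])
        remainder = x - trend - self.season[phase]
        self.t += 1
        return trend, self.season[phase], remainder
```

Each `update` costs $O(k)$ with a constant memory footprint, which is the property that makes this family of methods suitable for high-throughput monitoring; on a strictly periodic input the remainder decays toward zero as the seasonal estimate converges.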

This approach demonstrates the breadth of the RRD principle, spanning deep learning and classical online filters.

3. Training, Losses, and Optimization

  • Most deep RRD methods use supervised MSE losses for extracted components or cumulative predictions, with optional regularization:

$$L = \sum_{m=1}^M \|\hat{Y}_m - Y_m^{true}\|_2^2$$

  • TVD regularization or instance normalization (LiNo) is applied to encourage smoothness or stabilize feature distributions (Zhou et al., 2023, Yu et al., 2024).
  • Orthogonality constraints can be imposed on components (e.g., via SVD projections in RRCNN) to approach the behavior of modal decompositions (Zhou et al., 2023).
  • Optimization is performed via Adam or SGD, with standard learning rate decay and weight decay.
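The component-wise loss and an orthogonality constraint can be sketched together. The loss follows the formula above; the Gram-Schmidt projection is an illustrative variant of my own (the cited papers use SVD projections with the same intent):

```python
import numpy as np

def decomposition_loss(preds, targets):
    """Supervised loss: L = sum_m ||Y_hat_m - Y_m||_2^2."""
    return sum(float(np.sum((p - t) ** 2)) for p, t in zip(preds, targets))

def orthogonalize(components):
    """Make extracted components mutually orthogonal by sequential
    Gram-Schmidt (illustrative stand-in for an SVD projection)."""
    ortho = []
    for c in components:
        c = np.asarray(c, dtype=float).copy()
        for q in ortho:
            c -= (c @ q) / (q @ q) * q    # remove projection onto q
        ortho.append(c)
    return ortho

y1 = np.array([1.0, 1.0, 0.0])
y2 = np.array([1.0, 0.0, 1.0])
o1, o2 = orthogonalize([y1, y2])
loss = decomposition_loss([y1], [np.zeros(3)])
```

Projecting out earlier components pushes the learned decomposition toward the behavior of modal methods such as RDMD, where orthogonality holds by construction.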

Classical/online methods calibrate smoothing parameters (kernel width, adaptation rate $\gamma$) to manage the bias-variance tradeoff.

4. Empirical Performance and Comparative Analysis

| Model | Key Strengths | Limitations / Failure Modes |
|---|---|---|
| IRCNN$^+$ | Robust IMF separation, minimal mode mixing, real-time batch processing | Components not inherently orthogonal |
| LiNo | Simultaneous linear+nonlinear extraction, SOTA multivariate scores | Recursion depth must be tuned |
| RDMD | Frequency-pure, orthogonal modes, low truncation error | High computational cost for large $N$, not online |
| OnlineSTL | Real-time, online decomposition of streaming data | Assumes additivity of trend/seasonality, no neural modeling |
| KODA | Koopman+residual recursion, data assimilation for nonstationary NLDS | Relies on quality of frequency masking |
| VARNN | Drift/volatility adaptation via error memory | Only recalibrates prediction, no explicit decomposition |

  • IRCNN$^+$ achieves MAE $\approx 0.015$ and RMSE $\approx 0.031$ on synthetic two-mode mixtures, improving RMSE by $\sim$40% over RRCNN and outperforming classical EMD/VMD in mode mixing and boundary effects (Zhou et al., 2023).
  • LiNo reduces overall MSE by $3.41\%$ (multivariate) and $19.37\%$ (univariate) versus strong Transformer and MLP baselines across 13 datasets (Yu et al., 2024).
  • RDMD’s maximal time-averaged truncation error is typically lower than DMD’s and approaches that of POD while yielding pure modes (Noack et al., 2015).
  • OnlineSTL processes each new observation in $O(k \cdot m)$ time, a $100\times$ speedup over batch STL, with competitive trend/remainder separation (Mishra et al., 2021).
  • KODA matches or outperforms state-of-the-art neural forecasters on long-horizon benchmarks, demonstrating robust assimilation and prediction in multivariate physical and simulated NLDS (Singh et al., 2024).

5. Connections to Classical Methods and Theoretical Implications

Recursive residual decomposition generalizes and extends several longstanding paradigms, including EMD-style sifting, STL-type seasonal-trend filtering, and DMD-based modal analysis.

Recent theoretical work (Green et al., 14 Nov 2025) provides epistemic error decompositions for multi-step predictions in recursive vs. direct strategies, showing that, for nonlinear/recurrent function classes, recursion expands model expressivity but amplifies parameter estimation variance (quantified via Jacobian amplification). This challenges the classical bias-variance intuition and suggests optimizing recursion depth and block structure case-by-case.
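A toy linear AR(1) calculation makes the amplification effect concrete (my illustration, not the cited paper's general derivation): composing a one-step map whose coefficient carries a small estimation error $\epsilon$ over $h$ steps inflates that error by roughly the Jacobian factor $h\,a^{h-1}$:

```python
# Toy illustration of Jacobian amplification in recursive multi-step
# prediction. One-step model x_{t+1} = a * x_t, estimated as a + eps.
a, eps, h = 0.95, 0.01, 20

true_h_step = a ** h                      # exact h-step multiplier
recursive_h_step = (a + eps) ** h         # recursively applied estimate
amplified_error = recursive_h_step - true_h_step
first_order = h * a ** (h - 1) * eps      # predicted first-order blow-up
```

Here a 1% one-step coefficient error grows to roughly an 8% error in the 20-step multiplier, matching the first-order prediction within about 10% and illustrating why recursion depth must be chosen case by case.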

6. Computational Efficiency and Scalability

  • IRCNN$^+$ enables per-signal inference runtimes of $2\,\text{ms}$ for $N = 1024$, $M = 5$, $S = 10$ on GPU, at least $10\times$ faster than classical EMD/VMD (Zhou et al., 2023).
  • OnlineSTL achieves $100\times$ speedups, with $O(k\,m)$ time and a constant memory footprint per update (Mishra et al., 2021).
  • RRCNN inner loops are implemented with small CNNs, ensuring high-throughput batch processing; RDMD, however, incurs $O(M^3)$ cost at each recursion unless low-rank approximations are used (Noack et al., 2015, Zhou et al., 2023).
  • LiNo, by propagating the current residual only through focused linear/nonlinear extractors at each recursion depth, achieves both depth and flexibility without incurring exponential growth in compute (Yu et al., 2024).

7. Future Directions and Open Challenges

  • Extending recursive residual decomposition to multivariate, hierarchical, and cross-domain temporal data remains an active research area (Yu et al., 2024, Singh et al., 2024).
  • Integrating physical constraints, online data assimilation, and adaptive masking/attention between recursion stages has shown promise but poses optimization and interpretability challenges (Singh et al., 2024).
  • Further theoretical analyses of stability, convergence, and expressivity—especially for deep or nonlinear RRD systems with complex loss surfaces—are still required for more principled architecture and recursion-depth selection (Green et al., 14 Nov 2025, Mau et al., 2024).
  • Combining robust residual modeling with causal, streaming, and low-latency requirements is crucial for industrial and real-time applications (Mishra et al., 2021).

Recursive residual decomposition methods constitute a versatile and potent paradigm for time series analysis, unifying ideas from neural, modal, and classical statistical approaches with adaptive, recursive architectures for feature extraction, forecasting, and system identification. Empirical results consistently demonstrate improved performance, especially in complex, nonstationary, and multiscale domains (Zhou et al., 2023, Yu et al., 2024, Singh et al., 2024, Zhou et al., 2023).
