
Logsparse Decomposable Multiscaling (LDM)

Updated 16 March 2026
  • Logsparse Decomposable Multiscaling (LDM) is a framework that decomposes time series into scale-specific streams to reduce non-stationarity, applying logsparse truncation to shorten each stream's context.
  • It applies modified wavelet-like filtering and parallel per-scale predictors, enabling efficient long-term forecasting with reduced memory and computational overhead.
  • Empirical evaluations demonstrate that LDM outperforms state-of-the-art models on benchmarks by improving MSE by 3–15% while significantly lowering resource usage.

Logsparse Decomposable Multiscaling (LDM) is a framework designed to address the challenges of long-context modeling in long-term time-series forecasting (LTSF). LDM combines multiscale decomposition with logsparse truncation to achieve efficient and effective forecasting over long horizons, particularly by reducing non-stationarity and alleviating both memory and computational bottlenecks. Through explicit task separation by scale and parallelized per-scale processing, LDM outperforms state-of-the-art forecasting models on standard benchmarks while using significantly fewer computational resources (Ma et al., 2024).

1. Formal Structure and Mathematical Definition

LDM applies a modified wavelet-like filter bank for multiscale decomposition of the input time series. Given a multivariate sequence $s^0(k) \in \mathbb{R}^M$ for $k = 1, \dots, L$, with $M$ the channel dimension and $L$ the sequence length, and a strictly increasing set of scales $\{p^1 < p^2 < \cdots < p^N\}$, the framework recursively performs:

  • Low-pass filtering:

$$\tau^i(k) = \frac{1}{p^i} \sum_{n=-p^i/2}^{p^i/2} s^i(k-n)$$

  • Downsampling and detail extraction:

$$s^{i+1}(k) = \downarrow_{p^i/2}\, \tau^i(k), \qquad w^i(k) = s^i(k) - \tau^i(k)$$

where $\downarrow_q$ denotes average pooling with kernel and stride $q$. After $N$ iterations, the decomposition yields $\{w^0, w^1, \dots, w^{N-1}, s^N\}$, representing increasingly coarser scales.

  • Logsparse truncation: Each component $s_n$ or $w_n$ is truncated to

$$\tilde L_n = \min\left(L_n,\ \frac{p^n}{\eta}\right)$$

where $L_n \approx 2L/p^n$ is the length after downsampling and $\eta \in (0,1]$ is the sparsity parameter regulating truncation severity.

  • Scale-wise prediction and fusion: Each truncated component $\mathbf{s}_n \in \mathbb{R}^{M \times \tilde L_n}$ is input to its own predictor, $\mathrm{Pred}_n$, yielding $\mathbf{y}_n \in \mathbb{R}^{M \times H_n}$ with $H_n \approx 2H/p^n$. The final forecast is

$$\hat{\mathbf{y}} = \sum_{n=1}^{N+1} \mathrm{Interpolate}_{H_n \rightarrow H}(\mathbf{y}_n)$$

where upsampling to the horizon $H$ is achieved by linear interpolation.
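
As a concrete sketch, the recursion above can be written in a few lines of NumPy. This is an illustrative reimplementation, not the authors' code: a centred moving average stands in for the low-pass filter $\tau^i$, and the helper names (`avg_pool`, `decompose`, `logsparse_truncate`, `fuse`) are ours.

```python
import numpy as np

def avg_pool(x, q):
    """Average pooling with kernel and stride q along the last axis (the down-arrow_q operator)."""
    L = x.shape[-1] - x.shape[-1] % q           # drop any ragged tail
    return x[..., :L].reshape(*x.shape[:-1], L // q, q).mean(axis=-1)

def decompose(s, scales):
    """Recursively split s (shape (M, L)) into details w^0..w^{N-1} and trend s^N."""
    comps = []
    for p in scales:
        kernel = np.ones(p) / p                 # length-p moving-average low-pass
        tau = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), -1, s)
        comps.append(s - tau)                   # detail component w^i
        s = avg_pool(tau, p // 2)               # downsample by p^i / 2
    comps.append(s)                             # coarsest trend s^N
    return comps

def logsparse_truncate(comps, scales, eta):
    """Keep only the most recent min(L_n, p^n / eta) steps of each component."""
    ps = list(scales) + [scales[-1]]            # trend reuses the last scale
    return [c[..., -min(c.shape[-1], int(p / eta)):] for c, p in zip(comps, ps)]

def fuse(preds, H):
    """Linearly interpolate each per-scale forecast to horizon H, then sum."""
    total = np.zeros((preds[0].shape[0], H))
    for y in preds:
        xs = np.linspace(0, y.shape[-1] - 1, H)
        total += np.stack([np.interp(xs, np.arange(y.shape[-1]), row) for row in y])
    return total

comps = decompose(np.random.randn(2, 64), scales=[4, 8])
print([c.shape for c in comps])                 # [(2, 64), (2, 32), (2, 8)]
```

Note how each downsampling halves the rate relative to the scale, so the trend $s^N$ ends up very short, while truncation caps the finer, longer components.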

2. Multiscale Decoupling and Non-Stationarity Reduction

The LDM framework explicitly decouples the input time series across frequency scales:

  • Finest scales ($w^0$, $w^1$): Represent high-frequency detail components that, after local mean subtraction, are approximately stationary.
  • Coarsest scale ($s^N$): Captures the long-term trend, typically smooth and potentially non-stationary, but represented with very low dimension (short length after downsampling).
  • Task assignment: By decomposing trend and detail, each predictor is optimized on the scale most appropriate for its temporal structure, which results in improved predictability due to reduced non-stationarity within each subtask.

This scale-wise task separation both enhances statistical stationarity in detail components and enables architectural simplification by obviating costly cross-scale interactions.
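
The stationarity effect of detail extraction can be illustrated numerically: subtracting a local mean from a strongly trending series leaves a component whose first- and second-half statistics roughly agree. A minimal demonstration on synthetic data (the window `p` and the series are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(512)
series = 0.05 * t + rng.normal(size=512)        # strong linear trend + noise

p = 16                                           # local window, playing the role of p^i
trend = np.convolve(series, np.ones(p) / p, mode="same")   # low-pass tau
detail = series - trend                                     # detail component w

# The raw series drifts: its first- and second-half means differ widely.
# The detail component is roughly mean-zero in both halves, a crude
# indication that local mean subtraction restores stationarity.
print(series[:256].mean() - series[256:].mean())   # large in magnitude (~ -12.8)
print(detail[:256].mean() - detail[256:].mean())   # close to 0
```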

3. Architectural Design and Processing Pipeline

LDM’s end-to-end forecasting pipeline comprises four main stages:

  • Decomposition: The input is subjected to multiscale filtering and downsampling, extracting components at each scale as described above.
  • Logsparse truncation: Truncates each scale’s input to retain only the most recent $\tilde L_n = \min(L_n, p^n/\eta)$ steps, eliminating the long, noisy tails that dominate at fine scales.
  • Parallelized per-scale prediction: Each truncated scale component is input to its own, typically compact, predictor block (e.g., small-scale Transformer or MLP). There is no explicit cross-scale attention or gating mechanism; the architecture is fully parallel across scales.
  • Fusion: All per-scale outputs $\mathbf{y}_n$ are linearly interpolated to the full horizon and summed element-wise to form the final forecast $\hat{\mathbf{y}}$.

This pipeline eliminates architectural complexity while ensuring each model subcomponent operates on appropriately tailored input.
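
A toy instance of the parallel prediction stage, with each $\mathrm{Pred}_n$ replaced by an independent least-squares linear map (shapes and helper names are illustrative; the paper's predictors are compact Transformer or MLP blocks):

```python
import numpy as np

def fit_linear_predictor(X, Y):
    """Least-squares fit of one per-scale predictor; X: (windows, L_n), Y: (windows, H_n)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W                                     # weight matrix, shape (L_n, H_n)

rng = np.random.default_rng(1)
# Two scales with different truncated input lengths (8 and 16), common horizon 4.
scale_inputs = [rng.normal(size=(100, 8)), rng.normal(size=(100, 16))]
true_maps = [rng.normal(size=(X.shape[1], 4)) for X in scale_inputs]
scale_targets = [X @ B for X, B in zip(scale_inputs, true_maps)]

# Each scale is fit and applied independently: no parameter sharing,
# no cross-scale attention, trivially parallelisable.
predictors = [fit_linear_predictor(X, Y) for X, Y in zip(scale_inputs, scale_targets)]
preds = [X @ W for X, W in zip(scale_inputs, predictors)]
```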

4. Training Objective and Loss Function

The primary loss function is the mean squared error (MSE) between the final forecast and the ground truth:

$$\mathcal{L} = \frac{1}{T} \sum_{t=1}^{T} \frac{1}{MH} \sum_{i=1}^{M} \sum_{j=1}^{H} \left[ y_t^{(i,j)} - \hat y_t^{(i,j)} \right]^2$$

where $T$ is the batch size, $M$ the channel count, $H$ the forecasting horizon, $y_t$ the ground truth, and $\hat y_t$ the LDM prediction. Although scale-wise auxiliary losses $\mathcal{L}_n$ can be incorporated, by default the loss is computed only over the fused output.
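
Since the weights are uniform, the $(1/T)(1/MH)$ double normalisation collapses to a single mean over all three axes; a direct NumPy transcription (shapes illustrative):

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Batched MSE over (T, M, H): one mean over batch, channels and horizon."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.zeros((4, 3, 96))                    # T=4, M=3, H=96
y_pred = np.full_like(y_true, 0.5)
print(mse_loss(y_true, y_pred))                  # 0.25
```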

5. Computational Efficiency and Scalability

LDM’s efficiency arises fundamentally from its logsparse representation:

  • Effective context reduction: Truncating each scale to $\tilde L_n \ll L_n$ means that each predictor operates on significantly reduced-length sequences, with $\tilde L_1 \approx p^1/\eta \ll L$ even for the largest predictor.
  • Complexity analysis: While a standard Transformer has complexity $O(L^2)$ for input length $L$, and segment/token-based variants $O(M (L/L_{\mathrm{seg}})^2)$ for segment length $L_{\mathrm{seg}}$ (i.e., $L/L_{\mathrm{seg}}$ tokens per channel), LDM reduces this to

$$O\!\left(M \left(\frac{\tilde L_1}{L_{\mathrm{seg}}}\right)^2\right)$$

with $\tilde L_1$ observed to be 5–10× smaller than $L$; since the cost is quadratic in sequence length, this yields large savings in runtime and memory.
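
The payoff can be checked with back-of-the-envelope numbers (illustrative settings, not measurements from the paper; the segment-based cost is taken as $O(M(L/L_{\mathrm{seg}})^2)$ for $L/L_{\mathrm{seg}}$ tokens per channel):

```python
# Illustrative settings: L and L_seg are typical LTSF values, not taken
# from the paper; M = 7 matches the ETT channel count.
L, L_seg, M = 720, 16, 7
L1_tilde = L // 8                        # truncated finest-scale length (~5-10x shorter)

full_attn = L ** 2                       # standard Transformer, O(L^2)
segment_attn = M * (L / L_seg) ** 2      # segment/token-based, O(M (L/L_seg)^2)
ldm_attn = M * (L1_tilde / L_seg) ** 2   # LDM, O(M (tilde L_1 / L_seg)^2)

print(segment_attn / ldm_attn)           # 64.0: the quadratic payoff of an 8x shorter context
```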

6. Empirical Performance on Long-Term Forecasting Benchmarks

LDM has been evaluated on eight standard multivariate LTSF datasets (Weather, Solar-Energy, Electricity, Traffic, ETTh1, ETTh2, ETTm1, ETTm2) with forecasting horizons $H \in \{96, 192, 336, 720\}$. Performance is measured by MSE and MAE, with key results summarized below (values averaged over horizons):

| Dataset      | LDM MSE | LDM MAE | PatchTST MSE | PatchTST MAE |
|--------------|---------|---------|--------------|--------------|
| Weather      | 0.222   | 0.267   | 0.237        | 0.280        |
| Solar-Energy | 0.188   | 0.266   | 0.205        | 0.280        |
| Electricity  | 0.183   | 0.254   | 0.197        | 0.267        |
| Traffic      | 0.406   | 0.272   | 0.415        | 0.286        |
| ETTh1        | 0.404   | 0.438   | 0.438        | 0.449        |
| ETTh2        | 0.344   | 0.380   | 0.404        | 0.417        |
| ETTm1        | 0.346   | 0.375   | 0.386        | 0.395        |
| ETTm2        | 0.248   | 0.313   | 0.280        | 0.333        |

Across all datasets, LDM reduces MSE by 3–15% versus the strongest Transformer baseline (PatchTST) and by 10–20% against the best MLP baseline (TimeMixer), while using 30–50% less GPU memory and 20–40% less training time at large $L$.

7. Summary and Implications

LDM offers a framework that decomposes time series into a parsimonious number of scale-wise streams, applies logsparse truncation to each, processes scale-wise inputs independently—avoiding the complexity of cross-scale attention—and fuses predictions by summation after appropriate interpolation. These design choices enable both superior forecasting accuracy and substantial resource savings on long-sequence forecasting tasks, directly addressing the overfitting and context bottlenecks pervasive in deep model architectures for LTSF (Ma et al., 2024). A plausible implication is that similar logsparse decomposable approaches may prove beneficial in other long-sequence modeling domains where non-stationarity and context length pose practical constraints.
