Logsparse Decomposable Multiscaling (LDM)
- Logsparse Decomposable Multiscaling (LDM) is a framework that decomposes time series into scale-specific streams, using logsparse truncation to mitigate non-stationarity.
- It applies modified wavelet-like filtering and parallel per-scale predictors, enabling efficient long-term forecasting with reduced memory and computational overhead.
- Empirical evaluations show that LDM outperforms state-of-the-art models on standard benchmarks, improving MSE by 3–15% while significantly lowering resource usage.
Logsparse Decomposable Multiscaling (LDM) is a framework designed to address the challenges of long-context modeling in long-term time-series forecasting (LTSF). LDM combines multiscale decomposition with logsparse truncation to achieve efficient and effective forecasting over long horizons, particularly by reducing non-stationarity and alleviating both memory and computational bottlenecks. Through explicit task separation by scale and parallelized per-scale processing, LDM outperforms state-of-the-art forecasting models on standard benchmarks while using significantly fewer computational resources (Ma et al., 2024).
1. Formal Structure and Mathematical Definition
LDM applies a modified wavelet-like filter bank for multiscale decomposition of the input time series. Given a multivariate sequence $X \in \mathbb{R}^{C \times L}$, with $C$ the channel dimension and $L$ the sequence length, and a strictly increasing set of scales $s_1 < s_2 < \dots < s_K$, the framework sets $A_0 = X$ and recursively performs, for $k = 1, \dots, K$:
- Low-pass filtering: $A_k = \mathrm{AvgPool}(A_{k-1}; s_k)$
- Downsampling and detail extraction: $D_k = A_{k-1} - \mathrm{Up}(A_k)$
where $\mathrm{AvgPool}(\cdot; s_k)$ denotes average pooling with kernel and stride $s_k$, and $\mathrm{Up}(\cdot)$ upsamples $A_k$ back to the length of $A_{k-1}$. After $K$ iterations, the decomposition yields $\{D_1, \dots, D_K, A_K\}$, representing increasingly coarser scales.
- Logsparse truncation: Each component $D_k$ or $A_K$ is truncated to its most recent $T_k = \lceil \alpha \log_2 L_k \rceil$ steps,
where $L_k$ is the component's length after decomposition and $\alpha > 0$ is the sparsity parameter regulating truncation severity.
- Scale-wise prediction and fusion: Each truncated component $\tilde{D}_k$ (or $\tilde{A}_K$) is input to its own predictor $f_k$, yielding $\hat{Y}_k = f_k(\tilde{D}_k)$ with $\hat{Y}_k \in \mathbb{R}^{C \times H_k}$. The final forecast is $\hat{Y} = \sum_{k=1}^{K+1} \mathrm{Interp}(\hat{Y}_k, H)$,
where upsampling each $\hat{Y}_k$ to the horizon $H$ is achieved by linear interpolation.
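The decomposition and truncation steps can be sketched in a few lines of numpy. This is an illustrative sketch, not the paper's implementation: the stride choice $s_k = 2$, the linear-interpolation upsampling, and the helper names `decompose` and `logsparse_truncate` are all assumptions.

```python
import numpy as np

def avg_pool(x, s):
    """Average pooling with kernel and stride s along the time axis."""
    C, L = x.shape
    L_out = L // s
    return x[:, :L_out * s].reshape(C, L_out, s).mean(axis=2)

def upsample(x, length):
    """Linear interpolation along the time axis to the target length."""
    src = np.linspace(0.0, 1.0, x.shape[1])
    dst = np.linspace(0.0, 1.0, length)
    return np.stack([np.interp(dst, src, row) for row in x])

def decompose(x, scales):
    """Wavelet-like decomposition: details D_1..D_K plus coarse trend A_K."""
    components, a = [], x
    for s in scales:
        a_next = avg_pool(a, s)                   # low-pass filtering
        components.append(a - upsample(a_next, a.shape[1]))  # detail
        a = a_next
    components.append(a)                          # coarsest trend A_K
    return components

def logsparse_truncate(comp, alpha):
    """Keep only the most recent ceil(alpha * log2(L_k)) steps."""
    L_k = comp.shape[1]
    T_k = min(L_k, int(np.ceil(alpha * np.log2(L_k))))
    return comp[:, -T_k:]

x = np.random.randn(7, 512)                       # C=7 channels, L=512
parts = decompose(x, scales=[2, 2, 2])
truncated = [logsparse_truncate(p, alpha=4.0) for p in parts]
print([p.shape for p in parts])    # [(7, 512), (7, 256), (7, 128), (7, 64)]
print([t.shape[1] for t in truncated])
```

Note how the truncated lengths shrink only logarithmically with each component's length, which is what makes the per-scale predictors cheap.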
2. Multiscale Decoupling and Non-Stationarity Reduction
The LDM framework explicitly decouples the input time series across frequency scales:
- Finest scales ($D_1$, $D_2$, …): Represent high-frequency detail components that, after local mean subtraction, are approximately stationary.
- Coarsest scale ($A_K$): Captures the long-term trend, typically smooth and potentially non-stationary, but represented in very low dimension (short length $L_K$ after downsampling).
- Task assignment: By decomposing trend and detail, each predictor is optimized on the scale most appropriate for its temporal structure, which results in improved predictability due to reduced non-stationarity within each subtask.
This scale-wise task separation both enhances statistical stationarity in detail components and enables architectural simplification by obviating costly cross-scale interactions.
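The stationarity gain in detail components can be illustrated on a toy trending series. This is a synthetic check, not an experiment from the paper; the stride of 16 and the half-mean-gap drift proxy are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1024, dtype=float)
# Linear drift + seasonal component + noise: non-stationary in the mean.
x = 0.01 * t + np.sin(2 * np.pi * t / 32) + 0.1 * rng.standard_normal(1024)

def avg_pool(v, s):
    return v[: len(v) // s * s].reshape(-1, s).mean(axis=1)

# Local mean at stride 16, upsampled back by repetition (nearest-neighbor).
trend = np.repeat(avg_pool(x, 16), 16)
detail = x - trend          # local mean subtraction, as in the detail extraction

def half_mean_gap(v):
    """Drift proxy: gap between first- and second-half means."""
    h = len(v) // 2
    return abs(v[h:].mean() - v[:h].mean())

print(half_mean_gap(x))       # large: the raw series drifts upward
print(half_mean_gap(detail))  # near zero: detail is approximately stationary
```

The detail stream has zero mean within every pooling window by construction, so its drift proxy collapses, while the raw series retains the full trend.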
3. Architectural Design and Processing Pipeline
LDM’s end-to-end forecasting pipeline comprises four main stages:
- Decomposition: The input is subjected to multiscale filtering and downsampling, extracting components at each scale as described above.
- Logsparse truncation: Efficiently truncates each scale’s input to retain only the recent fraction, eliminating long, noisy tails particularly at fine scales.
- Parallelized per-scale prediction: Each truncated scale component is input to its own, typically compact, predictor block (e.g., small-scale Transformer or MLP). There is no explicit cross-scale attention or gating mechanism; the architecture is fully parallel across scales.
- Fusion: All per-scale outputs are linearly interpolated to the full horizon and summed element-wise to form the final forecast $\hat{Y}$.
This pipeline eliminates architectural complexity while ensuring each model subcomponent operates on appropriately tailored input.
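Wired end to end, the four stages look as follows. To keep the sketch self-contained, each per-scale predictor is stood in for by a trivial last-value repeater; the actual model would use a compact Transformer or MLP per scale, and the scale and sparsity settings below are assumptions.

```python
import numpy as np

def avg_pool(x, s):
    C, L = x.shape
    return x[:, : L // s * s].reshape(C, L // s, s).mean(axis=2)

def interp_to(x, length):
    src, dst = np.linspace(0, 1, x.shape[1]), np.linspace(0, 1, length)
    return np.stack([np.interp(dst, src, row) for row in x])

def ldm_forecast(x, scales, alpha, horizon):
    # Stage 1 -- decomposition: details D_1..D_K plus coarse trend A_K.
    comps, a = [], x
    for s in scales:
        a_next = avg_pool(a, s)
        comps.append(a - interp_to(a_next, a.shape[1]))
        a = a_next
    comps.append(a)
    # Stage 2 -- logsparse truncation of each component.
    comps = [c[:, -min(c.shape[1], int(np.ceil(alpha * np.log2(c.shape[1])))):]
             for c in comps]
    # Stage 3 -- per-scale prediction (toy stand-in: repeat last value).
    steps = max(1, horizon // (len(scales) + 1))
    preds = [np.repeat(c[:, -1:], steps, axis=1) for c in comps]
    # Stage 4 -- fusion: interpolate every prediction to the horizon, sum.
    return sum(interp_to(p, horizon) for p in preds)

y_hat = ldm_forecast(np.random.randn(7, 512), scales=[2, 2], alpha=4.0, horizon=96)
print(y_hat.shape)  # (7, 96)
```

Because the scales never exchange information, the per-scale loop parallelizes trivially, which is the architectural simplification the section describes.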
4. Training Objective and Loss Function
The primary loss function is the mean squared error (MSE) between the final forecast and the ground truth:
$$\mathcal{L} = \frac{1}{BCH} \sum_{b=1}^{B} \sum_{c=1}^{C} \sum_{h=1}^{H} \left( Y_{b,c,h} - \hat{Y}_{b,c,h} \right)^2,$$
where $B$ is the batch size, $C$ the channel count, $H$ the forecasting horizon, $Y$ the ground truth, and $\hat{Y}$ the LDM prediction. Although scale-wise auxiliary losses can be incorporated, by default the loss is computed only over the fused output.
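In code, the default objective reduces to a plain MSE over the fused output (a minimal numpy sketch; the function name is illustrative):

```python
import numpy as np

def ldm_loss(y_true, y_pred):
    """MSE averaged over batch B, channels C, and horizon H."""
    return np.mean((y_true - y_pred) ** 2)

B, C, H = 4, 7, 96
y = np.zeros((B, C, H))
print(ldm_loss(y, y + 0.5))  # 0.25
```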
5. Computational Efficiency and Scalability
LDM’s efficiency arises fundamentally from its logsparse representation:
- Effective context reduction: Truncating each scale to $T_k = \lceil \alpha \log_2 L_k \rceil$ means that each predictor operates on significantly reduced-length sequences, with $T_k \ll L$ even for the largest predictor.
- Complexity analysis: While a standard Transformer has complexity $O(L^2)$ for input length $L$, and segment/token-based variants $O((L/P)^2)$ for segment length $P$, LDM reduces this to $O\!\left(\sum_{k=1}^{K+1} T_k^2\right)$,
with $\sum_k T_k$ observed to be 5–10× smaller than $L$, leading to proportionate savings in runtime and memory.
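The saving can be made concrete with a back-of-the-envelope comparison of quadratic attention-cost proxies. The input length, scale choices, and sparsity parameter below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

L = 2048
scales = [2, 2, 2, 2]            # K = 4 decomposition levels (assumed)
alpha = 4.0

# Component lengths: each detail D_k lives at the pre-pooling length,
# and the coarse trend A_K at the final downsampled length.
lengths, cur = [], L
for s in scales:
    lengths.append(cur)
    cur //= s
lengths.append(cur)

T = [int(np.ceil(alpha * np.log2(l))) for l in lengths]
full_cost = L ** 2                    # standard Transformer: O(L^2)
ldm_cost = sum(t ** 2 for t in T)     # LDM: O(sum_k T_k^2)
print(sum(T), L)                      # total truncated context vs. input length
print(full_cost // ldm_cost)          # orders-of-magnitude cheaper proxy
```

Under these assumed settings the total truncated context is roughly an order of magnitude shorter than the raw input, and the quadratic cost proxy shrinks by several hundred times.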
6. Empirical Performance on Long-Term Forecasting Benchmarks
LDM has been evaluated on eight standard multivariate LTSF datasets (Weather, Solar-Energy, Electricity, Traffic, ETTh1, ETTh2, ETTm1, ETTm2) across a range of forecasting horizons. Performance is measured by MSE and MAE, with key results summarized below (values averaged over horizons):
| Dataset | LDM MSE | LDM MAE | PatchTST MSE | PatchTST MAE |
|---|---|---|---|---|
| Weather | 0.222 | 0.267 | 0.237 | 0.280 |
| Solar-Energy | 0.188 | 0.266 | 0.205 | 0.280 |
| Electricity | 0.183 | 0.254 | 0.197 | 0.267 |
| Traffic | 0.406 | 0.272 | 0.415 | 0.286 |
| ETTh1 | 0.404 | 0.438 | 0.438 | 0.449 |
| ETTh2 | 0.344 | 0.380 | 0.404 | 0.417 |
| ETTm1 | 0.346 | 0.375 | 0.386 | 0.395 |
| ETTm2 | 0.248 | 0.313 | 0.280 | 0.333 |
Across all datasets, LDM reduces MSE by 3–15% versus the strongest Transformer baseline (PatchTST) and by 10–20% against the best MLP baseline (TimeMixer), while using 30–50% less GPU memory and 20–40% less training time at large input lengths $L$.
7. Summary and Implications
LDM offers a framework that decomposes time series into a small number of scale-wise streams, applies logsparse truncation to each, processes scale-wise inputs independently—avoiding the complexity of cross-scale attention—and fuses predictions by summation after appropriate interpolation. These design choices enable both superior forecasting accuracy and substantial resource savings on long-sequence forecasting tasks, directly addressing overfitting and context bottlenecks pervasive in deep model architectures for LTSF (Ma et al., 2024). A plausible implication is that similar logsparse decomposable approaches may prove beneficial in other long-sequence modeling domains where non-stationarity and context length pose practical constraints.