Logsparse Decomposable Multiscaling (LDM)
- Logsparse Decomposable Multiscaling (LDM) is a framework that decomposes time series into scale-specific streams, using logsparse truncation to mitigate non-stationarity.
- It applies modified wavelet-like filtering and parallel per-scale predictors, enabling efficient long-term forecasting with reduced memory and computational overhead.
- Empirical evaluations show that LDM outperforms state-of-the-art models on standard benchmarks, improving MSE by 3–15% while significantly lowering resource usage.
Logsparse Decomposable Multiscaling (LDM) is a framework designed to address the challenges of long-context modeling in long-term time-series forecasting (LTSF). LDM combines multiscale decomposition with logsparse truncation to achieve efficient and effective forecasting over long horizons, particularly by reducing non-stationarity and alleviating both memory and computational bottlenecks. Through explicit task separation by scale and parallelized per-scale processing, LDM outperforms state-of-the-art forecasting models on standard benchmarks while using significantly fewer computational resources (Ma et al., 2024).
1. Formal Structure and Mathematical Definition
LDM applies a modified wavelet-like filter bank for multiscale decomposition of the input time series. Given a multivariate sequence $X \in \mathbb{R}^{C \times L}$, with $C$ the channel dimension and $L$ the sequence length, and a strictly increasing set of scales $s_1 < s_2 < \dots < s_K$, the framework sets $A_0 = X$ and recursively performs, for $k = 1, \dots, K$:
- Low-pass filtering: $A_k = \mathrm{AvgPool}(A_{k-1}; s_k)$
- Downsampling and detail extraction: $D_k = A_{k-1} - \mathrm{Up}(A_k)$
where $\mathrm{AvgPool}(\cdot; s_k)$ denotes average pooling with kernel and stride $s_k$, and $\mathrm{Up}(\cdot)$ upsamples $A_k$ back to the length of $A_{k-1}$. After $K$ iterations, the decomposition yields $\{D_1, \dots, D_K, A_K\}$, representing increasingly coarser scales.
- Logsparse truncation: Each component $D_k$ or $A_K$ is truncated to its most recent $T_k = \lceil \alpha \log_2 L_k \rceil$ steps,
where $L_k$ is the component's length after decomposition and $\alpha > 0$ is the sparsity parameter regulating truncation severity.
- Scale-wise prediction and fusion: Each truncated component $\tilde{D}_k$ (or $\tilde{A}_K$) is input to its own predictor $f_k$, yielding $\hat{Y}_k = f_k(\tilde{D}_k)$ with $\hat{Y}_k \in \mathbb{R}^{C \times H_k}$. The final forecast is $\hat{Y} = \sum_{k=1}^{K+1} \mathrm{Interp}(\hat{Y}_k, H)$,
where upsampling each $\hat{Y}_k$ to the horizon $H$ is achieved by linear interpolation.
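The decomposition and truncation steps can be sketched in a few lines of numpy. This is an illustrative sketch, not the paper's implementation: the stride choice $s_k = 2$, the linear-interpolation upsampling, and the helper names `decompose` and `logsparse_truncate` are all assumptions.

```python
import numpy as np

def avg_pool(x, s):
    """Average pooling with kernel and stride s along the time axis."""
    C, L = x.shape
    L_out = L // s
    return x[:, :L_out * s].reshape(C, L_out, s).mean(axis=2)

def upsample(x, length):
    """Linear interpolation along the time axis to the target length."""
    src = np.linspace(0.0, 1.0, x.shape[1])
    dst = np.linspace(0.0, 1.0, length)
    return np.stack([np.interp(dst, src, row) for row in x])

def decompose(x, scales):
    """Wavelet-like decomposition: details D_1..D_K plus coarse trend A_K."""
    components, a = [], x
    for s in scales:
        a_next = avg_pool(a, s)                   # low-pass filtering
        components.append(a - upsample(a_next, a.shape[1]))  # detail
        a = a_next
    components.append(a)                          # coarsest trend A_K
    return components

def logsparse_truncate(comp, alpha):
    """Keep only the most recent ceil(alpha * log2(L_k)) steps."""
    L_k = comp.shape[1]
    T_k = min(L_k, int(np.ceil(alpha * np.log2(L_k))))
    return comp[:, -T_k:]

x = np.random.randn(7, 512)                       # C=7 channels, L=512
parts = decompose(x, scales=[2, 2, 2])
truncated = [logsparse_truncate(p, alpha=4.0) for p in parts]
print([p.shape for p in parts])    # [(7, 512), (7, 256), (7, 128), (7, 64)]
print([t.shape[1] for t in truncated])
```

Note how the truncated lengths shrink only logarithmically with each component's length, which is what makes the per-scale predictors cheap.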
2. Multiscale Decoupling and Non-Stationarity Reduction
The LDM framework explicitly decouples the input time series across frequency scales:
- Finest scales ($D_1$, $D_2$, …): Represent high-frequency detail components that, after local mean subtraction, are approximately stationary.
- Coarsest scale ($A_K$): Captures the long-term trend, typically smooth and potentially non-stationary, but represented in very low dimension (short length $L_K$ after downsampling).
- Task assignment: By decomposing trend and detail, each predictor is optimized on the scale most appropriate for its temporal structure, which results in improved predictability due to reduced non-stationarity within each subtask.
This scale-wise task separation both enhances statistical stationarity in detail components and enables architectural simplification by obviating costly cross-scale interactions.
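The stationarity gain in detail components can be illustrated on a toy trending series. This is a synthetic check, not an experiment from the paper; the stride of 16 and the half-mean-gap drift proxy are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1024, dtype=float)
# Linear drift + seasonal component + noise: non-stationary in the mean.
x = 0.01 * t + np.sin(2 * np.pi * t / 32) + 0.1 * rng.standard_normal(1024)

def avg_pool(v, s):
    return v[: len(v) // s * s].reshape(-1, s).mean(axis=1)

# Local mean at stride 16, upsampled back by repetition (nearest-neighbor).
trend = np.repeat(avg_pool(x, 16), 16)
detail = x - trend          # local mean subtraction, as in the detail extraction

def half_mean_gap(v):
    """Drift proxy: gap between first- and second-half means."""
    h = len(v) // 2
    return abs(v[h:].mean() - v[:h].mean())

print(half_mean_gap(x))       # large: the raw series drifts upward
print(half_mean_gap(detail))  # near zero: detail is approximately stationary
```

The detail stream has zero mean within every pooling window by construction, so its drift proxy collapses, while the raw series retains the full trend.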
3. Architectural Design and Processing Pipeline
LDM’s end-to-end forecasting pipeline comprises four main stages:
- Decomposition: The input is subjected to multiscale filtering and downsampling, extracting components at each scale as described above.
- Logsparse truncation: Efficiently truncates each scale’s input to retain only the recent fraction, eliminating long, noisy tails particularly at fine scales.
- Parallelized per-scale prediction: Each truncated scale component is input to its own, typically compact, predictor block (e.g., small-scale Transformer or MLP). There is no explicit cross-scale attention or gating mechanism; the architecture is fully parallel across scales.
- Fusion: All per-scale outputs are linearly interpolated to the full horizon and summed element-wise to form the final forecast $\hat{Y}$.
This pipeline eliminates architectural complexity while ensuring each model subcomponent operates on appropriately tailored input.
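Wired end to end, the four stages look as follows. To keep the sketch self-contained, each per-scale predictor is stood in for by a trivial last-value repeater; the actual model would use a compact Transformer or MLP per scale, and the scale and sparsity settings below are assumptions.

```python
import numpy as np

def avg_pool(x, s):
    C, L = x.shape
    return x[:, : L // s * s].reshape(C, L // s, s).mean(axis=2)

def interp_to(x, length):
    src, dst = np.linspace(0, 1, x.shape[1]), np.linspace(0, 1, length)
    return np.stack([np.interp(dst, src, row) for row in x])

def ldm_forecast(x, scales, alpha, horizon):
    # Stage 1 -- decomposition: details D_1..D_K plus coarse trend A_K.
    comps, a = [], x
    for s in scales:
        a_next = avg_pool(a, s)
        comps.append(a - interp_to(a_next, a.shape[1]))
        a = a_next
    comps.append(a)
    # Stage 2 -- logsparse truncation of each component.
    comps = [c[:, -min(c.shape[1], int(np.ceil(alpha * np.log2(c.shape[1])))):]
             for c in comps]
    # Stage 3 -- per-scale prediction (toy stand-in: repeat last value).
    steps = max(1, horizon // (len(scales) + 1))
    preds = [np.repeat(c[:, -1:], steps, axis=1) for c in comps]
    # Stage 4 -- fusion: interpolate every prediction to the horizon, sum.
    return sum(interp_to(p, horizon) for p in preds)

y_hat = ldm_forecast(np.random.randn(7, 512), scales=[2, 2], alpha=4.0, horizon=96)
print(y_hat.shape)  # (7, 96)
```

Because the scales never exchange information, the per-scale loop parallelizes trivially, which is the architectural simplification the section describes.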
4. Training Objective and Loss Function
The primary loss function is the mean squared error (MSE) between the final forecast and the ground truth:
$$\mathcal{L} = \frac{1}{BCH} \sum_{b=1}^{B} \sum_{c=1}^{C} \sum_{h=1}^{H} \left( Y_{b,c,h} - \hat{Y}_{b,c,h} \right)^2,$$
where $B$ is the batch size, $C$ the channel count, $H$ the forecasting horizon, $Y$ the ground truth, and $\hat{Y}$ the LDM prediction. Although scale-wise auxiliary losses can be incorporated, by default the loss is computed only over the fused output.
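In code, the default objective reduces to a plain MSE over the fused output (a minimal numpy sketch; the function name is illustrative):

```python
import numpy as np

def ldm_loss(y_true, y_pred):
    """MSE averaged over batch B, channels C, and horizon H."""
    return np.mean((y_true - y_pred) ** 2)

B, C, H = 4, 7, 96
y = np.zeros((B, C, H))
print(ldm_loss(y, y + 0.5))  # 0.25
```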
5. Computational Efficiency and Scalability
LDM’s efficiency arises fundamentally from its logsparse representation:
- Effective context reduction: Truncating each scale to $T_k = \lceil \alpha \log_2 L_k \rceil$ means that each predictor operates on significantly reduced-length sequences, with $T_k \ll L$ even for the largest predictor.
- Complexity analysis: While a standard Transformer has complexity $O(L^2)$ for input length $L$, and segment/token-based variants $O((L/P)^2)$ for segment length $P$, LDM reduces this to $O\!\left(\sum_{k=1}^{K+1} T_k^2\right)$,
with $\sum_k T_k$ observed to be 5–10× smaller than $L$, leading to proportionate savings in runtime and memory.
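The saving can be made concrete with a back-of-the-envelope comparison of quadratic attention-cost proxies. The input length, scale choices, and sparsity parameter below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

L = 2048
scales = [2, 2, 2, 2]            # K = 4 decomposition levels (assumed)
alpha = 4.0

# Component lengths: each detail D_k lives at the pre-pooling length,
# and the coarse trend A_K at the final downsampled length.
lengths, cur = [], L
for s in scales:
    lengths.append(cur)
    cur //= s
lengths.append(cur)

T = [int(np.ceil(alpha * np.log2(l))) for l in lengths]
full_cost = L ** 2                    # standard Transformer: O(L^2)
ldm_cost = sum(t ** 2 for t in T)     # LDM: O(sum_k T_k^2)
print(sum(T), L)                      # total truncated context vs. input length
print(full_cost // ldm_cost)          # orders-of-magnitude cheaper proxy
```

Under these assumed settings the total truncated context is roughly an order of magnitude shorter than the raw input, and the quadratic cost proxy shrinks by several hundred times.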
6. Empirical Performance on Long-Term Forecasting Benchmarks
LDM has been evaluated on eight standard multivariate LTSF datasets (Weather, Solar-Energy, Electricity, Traffic, ETTh1, ETTh2, ETTm1, ETTm2) across a range of forecasting horizons. Performance is measured by MSE and MAE, with key results summarized below (values averaged over horizons):
| Dataset | LDM MSE | LDM MAE | PatchTST MSE | PatchTST MAE |
|---|---|---|---|---|
| Weather | 0.222 | 0.267 | 0.237 | 0.280 |
| Solar-Energy | 0.188 | 0.266 | 0.205 | 0.280 |
| Electricity | 0.183 | 0.254 | 0.197 | 0.267 |
| Traffic | 0.406 | 0.272 | 0.415 | 0.286 |
| ETTh1 | 0.404 | 0.438 | 0.438 | 0.449 |
| ETTh2 | 0.344 | 0.380 | 0.404 | 0.417 |
| ETTm1 | 0.346 | 0.375 | 0.386 | 0.395 |
| ETTm2 | 0.248 | 0.313 | 0.280 | 0.333 |
Across all datasets, LDM reduces MSE by 3–15% versus the strongest Transformer baseline (PatchTST) and by 10–20% against the best MLP baseline (TimeMixer), while using 30–50% less GPU memory and 20–40% less training time at large input lengths $L$.
7. Summary and Implications
LDM offers a framework that decomposes time series into a small number of scale-wise streams, applies logsparse truncation to each, processes scale-wise inputs independently—avoiding the complexity of cross-scale attention—and fuses predictions by summation after appropriate interpolation. These design choices enable both superior forecasting accuracy and substantial resource savings on long-sequence forecasting tasks, directly addressing overfitting and context bottlenecks pervasive in deep model architectures for LTSF (Ma et al., 2024). A plausible implication is that similar logsparse decomposable approaches may prove beneficial in other long-sequence modeling domains where non-stationarity and context length pose practical constraints.