LTSF-Linear Models: Efficient Forecasting
- LTSF-Linear models are a family of simple, explicit linear mappings that project historical time series windows onto future predictions.
- They maintain strict temporal ordering by operating directly on lagged signals, and they outperform Transformer-based methods on standard LTSF benchmarks.
- Their efficiency in runtime and parameter count, backed by robust ablation studies, makes them a preferred baseline for long-term forecasting research.
Long-term time series forecasting (LTSF) research has shifted markedly with the emergence of LTSF-Linear models: simple yet highly effective linear mappings that often outcompete sophisticated Transformer architectures on standard LTSF benchmarks. This line of work, typified by the LTSF-Linear models introduced by Zeng et al. (2022), reframes LTSF as a (possibly channel-wise) linear regression problem over lagged, normalized, or decomposed input windows, with design variants that capture trend, seasonality, or distribution shift using direct time-axis mappings. These models, their generic enhancements, and the accompanying ablation results have led to widespread adoption as baselines and have spurred an ecosystem of improved linear-centric and hybrid approaches. Below, the key characteristics, formal structure, empirical insights, and theoretical implications of the LTSF-Linear approach are summarized.
1. Core Model Formulation
LTSF-Linear models project fixed-length historical windows of (multivariate) time series data onto a finite prediction horizon through explicit, parameter-efficient linear operators. For a time series with $C$ variables, given a look-back window $X \in \mathbb{R}^{L \times C}$ of length $L$, the goal is to estimate the next $T$ values $\hat{X} \in \mathbb{R}^{T \times C}$:
- Standard LTSF-Linear: For each variate $i$, $\hat{X}^{(i)} = W X^{(i)}$, where $X^{(i)} \in \mathbb{R}^{L}$ is the $i$-th input channel and $W \in \mathbb{R}^{T \times L}$ is a single weight matrix applied along the time axis.
- NLinear: Mitigates distribution shift by subtracting the last observed value before the linear map and adding it back afterwards: $\hat{X}^{(i)} = W\big(X^{(i)} - x^{(i)}_{t}\mathbf{1}\big) + x^{(i)}_{t}$, where $x^{(i)}_{t}$ is the final value of the look-back window.
- DLinear: Decomposes the input into a trend component $X^{(i)}_{\mathrm{trend}}$ (via a moving-average kernel) and a seasonal remainder $X^{(i)}_{\mathrm{seas}} = X^{(i)} - X^{(i)}_{\mathrm{trend}}$, processed by two parallel linear layers: $\hat{X}^{(i)} = W_{\mathrm{trend}} X^{(i)}_{\mathrm{trend}} + W_{\mathrm{seas}} X^{(i)}_{\mathrm{seas}}$.
These models are strictly channel-independent, encoding no explicit cross-variable dependency, and train all lag-specific weights without implicit inductive bias, granting full temporal resolution along the input window (Zeng et al., 2022).
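The following PyTorch sketch illustrates the three variants under the shared-weight (channel-shared) setting; the class names, the `kernel_size` default, and the zero-padded moving average are illustrative choices, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class Linear(nn.Module):
    """Plain LTSF-Linear: one weight per (lag, output step), shared across channels."""
    def __init__(self, seq_len: int, pred_len: int):
        super().__init__()
        self.proj = nn.Linear(seq_len, pred_len)

    def forward(self, x):                                     # x: [batch, seq_len, channels]
        return self.proj(x.transpose(1, 2)).transpose(1, 2)   # [batch, pred_len, channels]


class NLinear(nn.Module):
    """NLinear: subtract the last observed value, apply the linear map, add it back."""
    def __init__(self, seq_len: int, pred_len: int):
        super().__init__()
        self.proj = nn.Linear(seq_len, pred_len)

    def forward(self, x):
        last = x[:, -1:, :]                                   # per-channel last value
        out = self.proj((x - last).transpose(1, 2)).transpose(1, 2)
        return out + last


class DLinear(nn.Module):
    """DLinear: moving-average trend plus seasonal remainder, each with its own linear map."""
    def __init__(self, seq_len: int, pred_len: int, kernel_size: int = 25):
        super().__init__()
        # Simplified moving average; the reference code pads by replicating edge values.
        self.avg = nn.AvgPool1d(kernel_size, stride=1, padding=kernel_size // 2,
                                count_include_pad=False)
        self.trend = nn.Linear(seq_len, pred_len)
        self.seasonal = nn.Linear(seq_len, pred_len)

    def forward(self, x):                                     # x: [batch, seq_len, channels]
        xc = x.transpose(1, 2)                                # [batch, channels, seq_len]
        trend = self.avg(xc)
        seasonal = xc - trend
        out = self.trend(trend) + self.seasonal(seasonal)
        return out.transpose(1, 2)                            # [batch, pred_len, channels]
```

The paper also reports an "individual" variant with a separate weight matrix per channel; replacing the shared `nn.Linear` with per-channel weights recovers it without changing the structure above.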
2. Design Rationale and Temporal Encoding
LTSF-Linear models forgo position encodings, attention, and learned convolutional kernels altogether. Unlike self-attention mechanisms, which are permutation-invariant and designed to find semantic correlations rather than preserve strict ordering, LTSF-Linear mappings natively maintain the lag structure, ensuring that historical ordering is explicitly honored and exploited. LTSF-Linear performance consistently degrades under input shuffling, whereas Transformer-based alternatives are largely insensitive to time-step permutation, indicating the latter's limited ability to exploit fine temporal structure. This is underscored by empirical findings across the ablation studies in (Zeng et al., 2022).
3. Empirical Performance and Benchmark Analysis
LTSF-Linear variants have been rigorously evaluated on a suite of nine canonical multivariate benchmarks (ETTh1/2, ETTm1/2, Traffic, Electricity, Exchange, Weather, ILI). On each, one of the linear variants (Linear, NLinear, or DLinear) achieves the best MSE and MAE across the forecast horizons ($T \in \{96, 192, 336, 720\}$ for most datasets; $\{24, 36, 48, 60\}$ for ILI). Example results, excerpted for the Electricity dataset at horizon $T = 96$:
| Model | MSE | MAE |
|---|---|---|
| Linear | 0.140 | 0.237 |
| DLinear | 0.140 | 0.237 |
| FEDformer | 0.193 | 0.308 |
| Autoformer | 0.201 | 0.317 |
The Transformer-based models (e.g., FEDformer, Autoformer) do not outperform LTSF-Linear in any measured setting, with LTSF-Linear's relative MSE improvements often in the 20–50% range (Table 2, Zeng et al., 2022). Furthermore, as the historical window grows, LTSF-Linear models consistently improve, whereas Transformer variants plateau or degrade, indicating that the linear mapping extracts more information from longer look-backs.
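For completeness, the MSE and MAE figures reported on these benchmarks are plain element-wise errors computed over all prediction steps and channels (typically on z-score-normalized series, per the standard LTSF evaluation protocol). A minimal NumPy sketch, with array shapes assumed to be `[samples, horizon, channels]`:

```python
import numpy as np

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over all samples, horizon steps, and channels."""
    return float(np.mean((pred - target) ** 2))

def mae(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error over all samples, horizon steps, and channels."""
    return float(np.mean(np.abs(pred - target)))
```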
4. Efficiency, Simplicity, and Scaling
Practical hardware metrics are critical for large-scale or resource-constrained deployment. LTSF-Linear models are highly parameter- and compute-efficient:
- Runtime: On the Electricity dataset (look-back $L = 96$, horizon $T = 720$), DLinear achieves 0.4 ms/sample inference, compared to 26 ms/sample for a vanilla Transformer.
- Parameter count: DLinear utilizes only 0.14M parameters, versus 13.6M–40M for Transformer variants.
- Memory consumption: Training remains below 6 GiB even for long look-back windows.
This efficiency is not simply theoretical; even for large look-back lengths $L$, the linear mapping's $O(L \cdot T)$ per-channel cost stays far below the quadratic-in-$L$ cost of vanilla self-attention, ensuring practical deployment readiness (Zeng et al., 2022).
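A quick back-of-the-envelope check of the parameter figures above (a sketch assuming the shared-weight variant with look-back $L = 96$ and horizon $T = 720$, as in the efficiency comparison):

```python
def time_axis_linear_params(L: int, T: int, bias: bool = True) -> int:
    """Parameter count of one shared linear map along the time axis."""
    return L * T + (T if bias else 0)

L, T = 96, 720
print(time_axis_linear_params(L, T))      # 69840  -> Linear / NLinear (~0.07M)
print(2 * time_axis_linear_params(L, T))  # 139680 -> DLinear's two maps (~0.14M)
```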
5. Ablations and Theoretical Insights
Comprehensive ablation studies show that removing Transformer-specific modules (self-attention, positional encodings, feed-forward networks) progressively improves LTSF performance, highlighting the redundancy of this machinery for direct forecasting of real-valued signals. Input shuffling severely degrades LTSF-Linear performance (error increases on the order of 25–80%) but barely affects Transformer models, and cross-time attention often collapses to uniform or near-uniform weights, indicating a failure to exploit temporal ordering for long-range forecasting.
This suggests that, for LTSF, the information content is dominated by trend and periodic components—precisely the features captured by a learned lag-weighted sum. The self-attention paradigm’s advantages for semantic sequence tasks (NLP, vision) do not generalize to real-valued time series forecasting where permutation sensitivity is crucial.
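A minimal sketch of the input-shuffling probe described above (the `model`, `x`, and `y` arguments are placeholders for any trained forecaster and its held-out windows; this is not the paper's exact evaluation script):

```python
import torch

@torch.no_grad()
def shuffle_sensitivity(model, x, y):
    """Ratio of test MSE on time-permuted vs. intact look-back windows.

    x: [batch, seq_len, channels] look-back windows
    y: [batch, pred_len, channels] ground-truth futures
    A ratio well above 1 indicates the model exploits temporal order;
    a ratio near 1 indicates permutation-insensitivity.
    """
    mse = torch.nn.functional.mse_loss
    base = mse(model(x), y)

    perm = torch.randperm(x.size(1))            # shuffle the time axis
    shuffled = mse(model(x[:, perm, :]), y)

    return (shuffled / base).item()
```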
6. Impact, Best Practices, and Future Directions
LTSF-Linear models, due to their architectural simplicity, parameter efficiency, and state-of-the-art empirical results, are now the de facto baseline for LTSF research. It is recommended to employ these linear models with large look-back windows and to use them as baselines in any LTSF study. Open questions remain for modeling abrupt change points, nonstationarities, and irregular sampling patterns—areas where static linear operators, while competitive, may be suboptimal.
Extensions such as cross-variable mixing [Client, (Gao et al., 2023)], channel-dependent dynamic routing (Ni et al., 2023), and disentangled state-space models (Weng et al., 2024) represent active research frontiers. However, the fundamental insight persists: order-aware, explicit linear regression remains surprisingly potent for long-term numeric prediction.
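As a concrete starting point for the baseline recommendation above, a minimal end-to-end sketch (the window construction, synthetic `series` placeholder, and full-batch training loop are generic illustrations, not the authors' pipeline):

```python
import numpy as np
import torch
import torch.nn as nn

def make_windows(series: np.ndarray, seq_len: int, pred_len: int):
    """Slice a [time, channels] array into (look-back, horizon) pairs."""
    xs, ys = [], []
    for t in range(len(series) - seq_len - pred_len + 1):
        xs.append(series[t:t + seq_len])
        ys.append(series[t + seq_len:t + seq_len + pred_len])
    return (torch.tensor(np.stack(xs), dtype=torch.float32),
            torch.tensor(np.stack(ys), dtype=torch.float32))

# Plain channel-independent Linear baseline with a large look-back window.
seq_len, pred_len, channels = 336, 96, 7
series = np.random.randn(2_000, channels).astype(np.float32)  # placeholder data
x, y = make_windows(series, seq_len, pred_len)

model = nn.Linear(seq_len, pred_len)                  # one map shared across channels
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(20):
    pred = model(x.transpose(1, 2)).transpose(1, 2)   # map along the time axis
    loss = nn.functional.mse_loss(pred, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```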
7. Summary Table of Core Model Variants
| Variant | Key Mechanism | Temporal Encoding | Special Features | Notes |
|---|---|---|---|---|
| Linear | Single linear map over lags | Explicit lag weights | Channel-independent | Pure linear baseline |
| NLinear | Last-value normalization + linear map | Explicit lag weights | Subtracted value re-added to output | Targets distribution shift |
| DLinear | Trend/seasonal decomposition (moving average) | Explicit lag weights, two parallel maps | Separate trend and seasonal heads | Outputs summed |
All variants share: direct mapping along time, full order exposure, single-layer weights, and high sample efficiency.
These properties and empirical findings make LTSF-Linear an indispensable reference point, and an essential tool for robust and interpretable time series forecasting (Zeng et al., 2022).