
LTSF-Linear Models: Efficient Forecasting

Updated 20 December 2025
  • LTSF-Linear models are a family of simple, explicit linear mappings that project historical time series windows onto future predictions.
  • They maintain strict temporal ordering by operating directly on lagged signals, leading to superior performance over transformer-based methods in benchmarks.
  • Their efficiency in runtime and parameter count, backed by robust ablation studies, makes them a preferred baseline for long-term forecasting research.

Long-term time series forecasting (LTSF) research has undergone a definitive shift with the emergence of LTSF-Linear models: a family of simple yet highly effective linear mappings that often outcompete sophisticated transformer architectures on standard LTSF benchmarks. This line of work, typified by the "LTSF-Linear" models introduced by Zeng et al. (2022), reframes LTSF as a (possibly channel-wise) linear regression problem over lagged, normalized, or decomposed input windows, with design variants that capture trend, seasonality, or distribution shifts using direct time-axis mappings. These models, their generic enhancements, and foundational ablation results have led to widespread adoption as baselines and have spurred an ecosystem of improved linear-centric and hybrid approaches. Below, the key characteristics, formal structures, empirical insights, and theoretical implications of the LTSF-Linear approach are summarized.

1. Core Model Formulation

LTSF-Linear models project fixed-length historical windows of (multivariate) time series data onto a finite predictive horizon through explicit, parameter-efficient linear operators. For a time series with $C$ variables, given a look-back window $X \in \mathbb{R}^{L \times C}$, the goal is to estimate $Y \in \mathbb{R}^{H \times C}$:

  • Standard LTSF-Linear: For each variate $i$,

$$\hat{Y}_i = W X_i + b, \qquad W \in \mathbb{R}^{H \times L},\ b \in \mathbb{R}^{H}.$$

  • NLinear: Mitigates distribution shift by subtracting the last value $x_L$ of the look-back window from the input and adding it back to the forecast:

$$\tilde{X}_i = X_i - x_L; \qquad \hat{Y}_i = W \tilde{X}_i + b + x_L.$$

  • DLinear: Decomposes the input into a trend component $T$ (via a moving average $MA_k$) and a seasonal component $S = X - T$, processed in parallel:

$$\hat{Y}_i = W_T T_i + W_S S_i + b, \qquad W_T, W_S \in \mathbb{R}^{H \times L}.$$

These models are strictly channel-independent, encoding no explicit cross-variable dependency, and they learn all lag-specific weights directly, without additional inductive bias, which grants full temporal resolution along the input window (Zeng et al., 2022).
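
To make these formulations concrete, below is a minimal PyTorch sketch of the three variants. The class names, tensor shapes, and the moving-average kernel size (25) are illustrative assumptions, not the reference implementation.

```python
# Minimal sketch of the three LTSF-Linear variants (illustrative, not the
# official implementation). Inputs have shape (batch, L, C); outputs (batch, H, C).
import torch
import torch.nn as nn


class LinearForecaster(nn.Module):
    """Plain Linear: one H x L weight matrix (plus bias) shared across channels."""
    def __init__(self, lookback: int, horizon: int):
        super().__init__()
        self.proj = nn.Linear(lookback, horizon)

    def forward(self, x):                                     # x: (B, L, C)
        return self.proj(x.transpose(1, 2)).transpose(1, 2)   # (B, H, C)


class NLinearForecaster(LinearForecaster):
    """NLinear: subtract the last observed value, forecast, then add it back."""
    def forward(self, x):
        last = x[:, -1:, :]                                   # (B, 1, C)
        return super().forward(x - last) + last


class DLinearForecaster(nn.Module):
    """DLinear: moving-average trend / seasonal split, two parallel linear maps."""
    def __init__(self, lookback: int, horizon: int, kernel: int = 25):
        super().__init__()
        self.avg = nn.AvgPool1d(kernel, stride=1, padding=kernel // 2,
                                count_include_pad=False)
        self.w_trend = nn.Linear(lookback, horizon)
        self.w_season = nn.Linear(lookback, horizon)

    def forward(self, x):                                     # x: (B, L, C)
        xt = x.transpose(1, 2)                                # (B, C, L)
        trend = self.avg(xt)                                  # smooth trend estimate
        season = xt - trend                                   # seasonal remainder
        out = self.w_trend(trend) + self.w_season(season)     # (B, C, H)
        return out.transpose(1, 2)                            # (B, H, C)
```

For example, `DLinearForecaster(lookback=336, horizon=96)` maps a `(batch, 336, C)` window to a `(batch, 96, C)` forecast using two weight matrices shared across all channels.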

2. Design Rationale and Temporal Encoding

LTSF-Linear models forgo all forms of position encoding, attention, or learned convolutional kernels. Unlike self-attention mechanisms—which are permutation-invariant and exist to find semantic correlations rather than preserve strict ordering—LTSF-Linear mappings natively maintain the lag structure, ensuring that historical ordering is explicitly honored and exploited. Model performance consistently degrades under input shuffling, whereas transformer-based alternatives are largely insensitive to time step permuting, indicating the latter’s limited ability to preserve fine temporal structure. This fact is underscored by empirical findings across robust ablation studies (Zeng et al., 2022).
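
As a concrete illustration of this shuffling probe, here is a short sketch; `model`, `x`, and `y` are placeholders for a trained forecaster and a held-out batch of shape `(batch, L, C)` / `(batch, H, C)`.

```python
# Sketch of the time-step shuffling probe: a model that exploits temporal order
# should lose substantial accuracy when the look-back window is permuted.
import torch


@torch.no_grad()
def shuffle_probe(model, x, y):
    """Return (MSE on original windows, MSE on shuffled windows, relative increase)."""
    mse = lambda pred, target: torch.mean((pred - target) ** 2).item()
    base = mse(model(x), y)
    perm = torch.randperm(x.shape[1])            # random permutation of the time axis
    shuffled = mse(model(x[:, perm, :]), y)
    # An LTSF-Linear model is expected to show a large relative increase;
    # a permutation-insensitive attention model changes far less.
    return base, shuffled, (shuffled - base) / base
```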

3. Empirical Performance and Benchmark Analysis

LTSF-Linear variants have been rigorously evaluated on a suite of nine canonical multivariate benchmarks (ETTh1/2, ETTm1/2, Traffic, Electricity, Exchange, Weather, ILI). On all of them, one of the linear variants (DLinear, NLinear, or plain Linear) achieves the best MSE and MAE across the forecast horizons ($H \in \{96, 192, 336, 720\}$ for most datasets). Example results, excerpted for the Electricity dataset ($H = 96$):

| Model      | MSE   | MAE   |
|------------|-------|-------|
| Linear     | 0.140 | 0.237 |
| DLinear    | 0.140 | 0.237 |
| FEDformer  | 0.193 | 0.308 |
| Autoformer | 0.201 | 0.317 |

The Transformer-based models (e.g., FEDformer, Autoformer) do not outperform LTSF-Linear in any measured setting, with relative MSE deficits often in the 20–50% range (Table 2, Zeng et al., 2022). Furthermore, as the historical window $L$ grows, LTSF-Linear models consistently improve, whereas Transformer variants plateau or degrade, reflecting the superior temporal memory fidelity of the linear mapping.

4. Efficiency, Simplicity, and Scaling

Practical hardware metrics are critical for large-scale or resource-constrained deployment. LTSF-Linear models are highly parameter- and compute-efficient:

  • Runtime: On the Electricity dataset ($L = 96$, $H = 720$), DLinear achieves 0.4 ms/sample, compared to >26 ms/sample for a vanilla Transformer.
  • Parameter count: DLinear utilizes only 0.14M parameters, versus 13.6M–40M for Transformer variants.
  • Memory consumption: Training remains below 6 GiB even for lookbacks of $L = 720$.

This efficiency is not simply theoretical: the $O(L^2)$ cost of attention grows rapidly with the look-back length, whereas the linear mapping's $O(L)$ (or $O(LC)$) scaling keeps even large-$L$ configurations practical to deploy (Zeng et al., 2022).
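
As a quick back-of-the-envelope check of the quoted parameter counts, the sketch below assumes the weight matrices are shared across channels (as in the channel-independent formulation above), so the channel count does not enter.

```python
# Parameter count of the linear variants at L = 96, H = 720.
L, H = 96, 720

linear_params = L * H + H            # one H x L weight matrix plus bias
dlinear_params = 2 * (L * H + H)     # separate trend and seasonal maps

print(f"Linear : {linear_params:,}")    # 69,840
print(f"DLinear: {dlinear_params:,}")   # 139,680 (~0.14M, matching the figure above)
```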

5. Ablations and Theoretical Insights

Comprehensive ablation studies reveal that removing Transformer-specific modules (attention, positional encodings, FFNs) monotonically improves LTSF performance, highlighting the redundancy of complex machinery for direct forecasting of real-valued signals. Input shuffling catastrophically degrades LTSF-Linear performance (+25–80% error), but barely affects Transformer models. Cross-time attention often collapses to uniform or near-uniform weights, indicating a failure to exploit temporal semantics for long-range forecasting.

This suggests that, for LTSF, the information content is dominated by trend and periodic components—precisely the features captured by a learned lag-weighted sum. The self-attention paradigm’s advantages for semantic sequence tasks (NLP, vision) do not generalize to real-valued time series forecasting where permutation sensitivity is crucial.
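
To make this concrete, the sketch below fits a plain linear map by ordinary least squares to a synthetic series with a linear trend, a period-24 seasonal component, and additive noise (all settings are illustrative assumptions); the fitted map forecasts close to the noise floor.

```python
# Sketch: on a trend + periodic signal, a least-squares linear map over the
# look-back window (i.e., the plain Linear variant) forecasts near the noise floor.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(4000)
series = 0.01 * t + np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)

L_in, H = 96, 24
n = len(series) - L_in - H
X = np.stack([series[i:i + L_in] for i in range(n)])             # look-back windows
Y = np.stack([series[i + L_in:i + L_in + H] for i in range(n)])  # future targets

X1 = np.hstack([X, np.ones((n, 1))])          # append intercept column
W, *_ = np.linalg.lstsq(X1, Y, rcond=None)    # closed-form fit of Y ≈ X1 @ W

mse = np.mean((X1 @ W - Y) ** 2)
print(f"in-sample MSE: {mse:.4f}")            # close to the noise variance (~0.01)
```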

6. Impact, Best Practices, and Future Directions

LTSF-Linear models, due to their architectural simplicity, parameter efficiency, and state-of-the-art empirical results, are now the de facto baseline for LTSF research. It is recommended to employ these linear models with large look-back windows and to use them as baselines in any LTSF study. Open questions remain for modeling abrupt change points, nonstationarities, and irregular sampling patterns—areas where static linear operators, while competitive, may be suboptimal.

Extensions such as cross-variable mixing (Client; Gao et al., 2023), channel-dependent dynamic routing (Ni et al., 2023), and disentangled state-space models (Weng et al., 2024) represent active research frontiers. However, the fundamental insight persists: order-aware, explicit linear regression remains surprisingly potent for long-term numeric prediction.

7. Summary Table of Core Model Variants

| Variant | Key Mechanism       | Temporal Encoding | Special Features     | Notes           |
|---------|---------------------|-------------------|----------------------|-----------------|
| Linear  | $W X_i + b$         | Direct lags       | Channel-independent  | Pure linear     |
| NLinear | Input normalization | Direct lags       | Residual re-add      | For shifts      |
| DLinear | Trend decomp.       | Two-way mapping   | Seasonal/trend sep.  | Parallel decomp.|

All variants share: direct mapping along time, full order exposure, single-layer weights, and high sample efficiency.


These properties and empirical findings make LTSF-Linear an indispensable reference point, and an essential tool for robust and interpretable time series forecasting (Zeng et al., 2022).
