
TimesNet: 2D Temporal Variation Neural Architecture

Updated 17 February 2026
  • TimesNet is a neural architecture for time series analysis that converts 1D sequences into 2D tensors by adaptively discovering periodic patterns.
  • It employs inception-style 2D convolutional modules to decouple intraperiod and interperiod variations, enhancing efficiency and accuracy.
  • Empirical results demonstrate that TimesNet outperforms many baselines in forecasting, imputation, classification, and anomaly detection tasks.

TimesNet is a neural architecture for general time series analysis leveraging 2D temporal variation modeling. Developed to address the inherent representational limitations of modeling intricate temporal patterns with 1D sequence models, TimesNet introduces a learnable mapping from 1D multivariate time series to a collection of 2D tensors structured by adaptive, automatically discovered periods. The method disaggregates complex temporal variations into intraperiod and interperiod components, enabling efficient modeling through parameter-efficient 2D convolutional modules. Empirical results establish TimesNet as a state-of-the-art backbone for core time series tasks, including long- and short-term forecasting, imputation, classification, and anomaly detection (Wu et al., 2022).

1. Mathematical Formulation: 1D to 2D Periodic Reshaping

Given a multivariate time series $X_{\rm 1D} \in \mathbb{R}^{T \times C}$, where $T$ is the sequence length and $C$ the channel dimension, TimesNet performs an adaptive per-block search for the $k$ dominant periodicities. This is done by averaging the amplitude spectrum of a Fast Fourier Transform (FFT) across the channel dimension:

$$A = \mathrm{Avg}\bigl(\mathrm{Amp}(\mathrm{FFT}(X_{\rm 1D}))\bigr) \in \mathbb{R}^{T},$$

$$\{f_1, \ldots, f_k\} = \arg\text{-}\mathrm{Topk}_{\,j \in [1,\, T/2]}\, A_j, \qquad p_i = \lceil T / f_i \rceil. \tag{1}$$

Each frequency $f_i$ induces a period length $p_i$. After zero-padding $X_{\rm 1D}$ if necessary, the data is reshaped into $k$ 2D tensors:

$$X_{\rm 2D}^{i} = \mathrm{Reshape}_{(p_i, f_i)}\bigl(\mathrm{Padding}(X_{\rm 1D})\bigr) \in \mathbb{R}^{p_i \times f_i \times C}, \qquad i = 1, \dots, k. \tag{2}$$

Columns of each tensor (length $p_i$) capture intraperiod variations (within one period), while rows (length $f_i$) encode interperiod variations (across period repetitions). This reframing exposes both types of temporal structure to 2D convolutional processing.
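As a concrete illustration, the period discovery of Eq. (1) and the folding of Eq. (2) can be sketched in NumPy. The function names `find_periods` and `reshape_2d` are illustrative, not taken from the reference implementation:

```python
import numpy as np

def find_periods(x, k=1):
    """Top-k dominant frequencies/periods via the channel-averaged
    FFT amplitude spectrum (Eq. 1)."""
    T = x.shape[0]
    amp = np.abs(np.fft.rfft(x, axis=0)).mean(axis=1)  # average over channels
    amp[0] = 0.0                                       # drop the DC component
    freqs = np.argsort(amp)[-k:][::-1]                 # top-k frequency indices
    periods = np.ceil(T / freqs).astype(int)           # p_i = ceil(T / f_i)
    return freqs, periods

def reshape_2d(x, period):
    """Zero-pad a (T, C) series and fold it into a (period, n_repeats, C)
    tensor, so each column spans one full period (Eq. 2)."""
    T, C = x.shape
    n = int(np.ceil(T / period))                       # number of period repeats
    x_pad = np.concatenate([x, np.zeros((n * period - T, C))], axis=0)
    return x_pad.reshape(n, period, C).transpose(1, 0, 2)

# Toy two-channel series with period 24 over T = 96 steps
t = np.arange(96)
x = np.stack([np.sin(2 * np.pi * t / 24), np.cos(2 * np.pi * t / 24)], axis=1)
freqs, periods = find_periods(x, k=1)  # dominant frequency f_1 = 4, p_1 = 24
x2d = reshape_2d(x, periods[0])        # shape (24, 4, 2): p_1 x f_1 x C
```

Reading along axis 0 of `x2d` traverses one period (intraperiod variation); reading along axis 1 compares the same phase across repetitions (interperiod variation).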

2. Network Architecture and TimesBlock

TimesNet comprises a stack of $L$ residual TimesBlocks, each operating on the 1D sequence representation. For block $l$, the pipeline is as follows:

$$
\begin{aligned}
(A^{l-1}, \{f_i, p_i\}_{i=1}^{k}) &= \mathrm{Period}(X_{\rm 1D}^{l-1}), \\
X_{\rm 2D}^{l,i} &= \mathrm{Reshape}_{(p_i, f_i)}\bigl(\mathrm{Padding}(X_{\rm 1D}^{l-1})\bigr), \\
\widehat{X}_{\rm 2D}^{l,i} &= \mathrm{Inception}\bigl(X_{\rm 2D}^{l,i}\bigr), \\
\widehat{X}_{\rm 1D}^{l,i} &= \mathrm{Trunc}\bigl(\mathrm{Reshape}_{(1,\, p_i f_i)}(\widehat{X}_{\rm 2D}^{l,i})\bigr), \\
\alpha_i &= \frac{\exp(A^{l-1}_{f_i})}{\sum_{j=1}^{k} \exp(A^{l-1}_{f_j})}, \\
X_{\rm 1D}^{l} &= \sum_{i=1}^{k} \alpha_i\, \widehat{X}_{\rm 1D}^{l,i}.
\end{aligned} \tag{3}
$$

$$X_{\rm 1D}^{l} \longleftarrow X_{\rm 1D}^{l} + X_{\rm 1D}^{l-1}. \tag{4}$$

Key components:

  • Adaptive multi-periodicity: Period sets $\{p_i\}$ and their importance weights $\{\alpha_i\}$ are re-estimated in every block from the FFT amplitude spectrum.
  • Inception-style 2D module: Each $\mathrm{Inception}(\cdot)$ applies parallel 2D convolutions ($1\times1$, $3\times3$, and $5\times5$ kernels, plus a pooling branch) with $1\times1$ bottlenecks. Parameters are shared across all $k$ period tensors, decoupling model size from $k$.
  • Parameter efficiency: Any standard 2D convolutional vision backbone (ResNet, ResNeXt, ConvNeXt, Swin, etc.) can be substituted for the shared Inception module to trade off accuracy against model size.
  • Residual propagation: Outputs are aggregated via a softmax-weighted sum and added back to the block input, ensuring stable optimization.
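A single TimesBlock (Eqs. 3–4) can be sketched end to end in NumPy. Here an identity function stands in for the shared Inception module, and `times_block` is an illustrative name, not the reference implementation:

```python
import numpy as np

def times_block(x, k=2, conv2d=lambda z: z):
    """One TimesBlock (Eqs. 3-4): discover k periods, fold to 2D, apply a
    2D module (identity stand-in for Inception), unfold, and combine the
    branches with amplitude-softmax weights plus a residual connection."""
    T, C = x.shape
    amp = np.abs(np.fft.rfft(x, axis=0)).mean(axis=1)   # channel-avg spectrum
    amp[0] = 0.0                                        # ignore DC
    freqs = np.argsort(amp)[-k:][::-1]                  # top-k frequencies
    e = np.exp(amp[freqs] - amp[freqs].max())
    alpha = e / e.sum()                                 # softmax importances
    out = np.zeros_like(x)
    for f, a in zip(freqs, alpha):
        p = int(np.ceil(T / f))                         # period length p_i
        n = int(np.ceil(T / p))                         # repeats after padding
        x_pad = np.concatenate([x, np.zeros((n * p - T, C))], axis=0)
        y2d = conv2d(x_pad.reshape(n, p, C))            # 2D processing
        out += a * y2d.reshape(n * p, C)[:T]            # unfold and truncate
    return x + out                                      # residual (Eq. 4)

x = np.sin(2 * np.pi * np.arange(96) / 24)[:, None]
y = times_block(x, k=2)   # with the identity stand-in, y == 2 * x
```

With a real 2D convolutional module substituted for `conv2d`, each branch processes its own period-aligned tensor while the softmax weights and residual connection keep aggregation stable.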

3. Training Protocol and Hyperparameterization

Training employs Adam ($\beta_1 = 0.9$, $\beta_2 = 0.999$) with no additional regularization (no dropout or weight decay). Series stationarization from "Non-stationary Transformers" is applied to reduce domain shift. Task-specific loss functions are used:

  • Forecasting/Imputation: Mean Squared Error (MSE), with additional SMAPE, MASE, OWA for short-term forecasting.
  • Classification: Cross-entropy loss.
  • Anomaly detection: Reconstruction MSE with thresholding for point-wise anomaly labeling.
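The anomaly-detection protocol above (reconstruction MSE plus thresholding) can be sketched as follows; the quantile threshold and the name `label_anomalies` are illustrative assumptions, as in practice the threshold is tuned on held-out data:

```python
import numpy as np

def label_anomalies(x, x_rec, quantile=0.99):
    """Point-wise anomaly labeling from reconstruction error: score each
    time step by its MSE across channels and flag scores above a
    quantile threshold (a stand-in for a tuned threshold)."""
    scores = ((x - x_rec) ** 2).mean(axis=1)   # per-step reconstruction MSE
    threshold = np.quantile(scores, quantile)
    return scores > threshold, scores

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
x_rec = x + rng.normal(scale=0.01, size=x.shape)  # near-perfect reconstruction
x[50] += 5.0                                      # inject an anomaly at t = 50
labels, scores = label_anomalies(x, x_rec)        # labels[50] is True
```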

The key hyperparameters are shown below:

| Task | $k$ | Layers | $(d_{\min}, d_{\max})$ | Learning rate | Batch size |
|---|---|---|---|---|---|
| Long-term forecast | 5 | 2 | (32, 512) | $10^{-4}$ | 32 |
| Short-term forecast | 5 | 2 | (16, 64) | $10^{-3}$ | 16 |
| Imputation | 3 | 2 | (64, 128) | $10^{-3}$ | 16 |
| Classification | 3 | 2 | (32, 64) | $10^{-3}$ | 16 |
| Anomaly detection | 3 | 3 | (32, 128) | $10^{-4}$ | 128 |

No search for additional regularization was required.

4. Empirical Results Across Mainstream Time Series Tasks

TimesNet was evaluated against more than fifteen strong baselines, including RNN-, TCN-, MLP-, and Transformer-based models as well as task-specialized architectures, across five canonical tasks. Summary metrics for TimesNet and the two strongest baselines on each task are provided below.

Long-term forecasting (MSE, MAE, avg. over six benchmarks, four horizons each):

| Method | MSE | MAE |
|---|---|---|
| Autoformer | 0.481 | 0.456 |
| FEDformer | 0.448 | 0.452 |
| TimesNet | 0.400 | 0.406 |

TimesNet achieved best scores in 40 out of 44 settings.

Short-term forecasting (M4: SMAPE, MASE, OWA):

| Method | SMAPE | MASE | OWA |
|---|---|---|---|
| N-HiTS | 11.93 | 1.61 | 0.86 |
| N-BEATS | 11.85 | 1.60 | 0.86 |
| TimesNet | 11.83 | 1.59 | 0.85 |

Imputation (avg. MSE, MAE, mask ratios 12.5–50%):

| Method | MSE | MAE |
|---|---|---|
| ETSformer | 0.120 | 0.253 |
| DLinear | 0.093 | 0.206 |
| TimesNet | 0.027 | 0.107 |

Classification (10 UEA datasets, accuracy %):

| Method | Accuracy (%) |
|---|---|
| Rocket | 72.5 |
| Flowformer | 73.0 |
| TimesNet | 73.6 |

Anomaly detection (5 benchmarks, F1 %):

| Method | F1 (%) |
|---|---|
| Anomaly Transformer | 80.5 |
| FEDformer | 84.9 |
| TimesNet | 86.3 |

5. Ablation Studies and Analysis

Key ablation findings establish the robustness and significance of design choices:

  • Inception module replacement: Switching to ResNeXt, ConvNeXt, or Swin increases anomaly F1 from 85.5% to 86.9%.
  • Fixed vs. per-block period estimation: Omitting per-block period re-estimation (transforming only the raw input) reduces average F1 to 84.9%.
  • Importance weighting ablation: Removing the softmax-based weighting of the per-period outputs $\{\widehat{X}_{\rm 1D}^{l,i}\}$ lowers F1 to 85.3%.

These studies confirm the core value of decoupling and jointly modeling intraperiod and interperiod variations, stacking such blocks with periodicity re-estimation, and softmax-based tensor combination for overall performance.

6. Significance and Implications

TimesNet provides a task-general backbone for time series analysis through three main principles: (1) explicit disentanglement of intraperiod vs. interperiod variations by a 1D to 2D transformation; (2) joint modeling via efficient, shared 2D convolutional modules; (3) adaptive per-block periodicity discovery. A plausible implication is that general time series architectures benefit from leveraging ideas originating in computer vision (e.g., 2D convolutions, inception modules) through principled representations that align with temporal periodicities. The architecture's generality, strong empirical results, and extensibility to other backbone modules suggest broad applicability throughout time-series learning paradigms (Wu et al., 2022).

References

Wu, H., Hu, T., Liu, Y., Zhou, H., Wang, J., & Long, M. (2022). TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. arXiv:2210.02186.
