
TimesNet: 2D Temporal Variation Neural Architecture

Updated 17 February 2026
  • TimesNet is a neural architecture for time series analysis that converts 1D sequences into 2D tensors by adaptively discovering periodic patterns.
  • It employs inception-style 2D convolutional modules to decouple intraperiod and interperiod variations, enhancing efficiency and accuracy.
  • Empirical results demonstrate that TimesNet outperforms many baselines in forecasting, imputation, classification, and anomaly detection tasks.

TimesNet is a neural architecture for general time series analysis leveraging 2D temporal variation modeling. Developed to address the inherent representational limitations of modeling intricate temporal patterns with 1D sequence models, TimesNet introduces a learnable mapping from 1D multivariate time series to a collection of 2D tensors structured by adaptive, automatically discovered periods. The method disaggregates complex temporal variations into intraperiod and interperiod components, enabling efficient modeling through parameter-efficient 2D convolutional modules. Empirical results establish TimesNet as a state-of-the-art backbone for core time series tasks, including long- and short-term forecasting, imputation, classification, and anomaly detection (Wu et al., 2022).

1. Mathematical Formulation: 1D to 2D Periodic Reshaping

Given a multivariate time series $X_{\rm 1D} \in \mathbb{R}^{T \times C}$, where $T$ is the sequence length and $C$ the channel dimension, TimesNet performs an adaptive per-block search for the $k$ dominant periodicities. This is done by averaging the amplitude spectrum of a Fast Fourier Transform (FFT) across the channel dimension:

$$A = \mathrm{Avg}\bigl(\mathrm{Amp}(\mathrm{FFT}(X_{\rm 1D}))\bigr) \in \mathbb{R}^{T},$$

$$\{f_1, \ldots, f_k\} = \arg\text{-}\mathrm{Topk}_{\,j \in [1,\, T/2]}\, A_j, \qquad p_i = \lceil T / f_i \rceil. \tag{1}$$

Each frequency $f_i$ induces a period length $p_i$. After zero-padding $X_{\rm 1D}$ if necessary, the data is reshaped into $k$ 2D tensors:

$$X_{\rm 2D}^{i} = \mathrm{Reshape}_{(p_i, f_i)}\bigl(\mathrm{Padding}(X_{\rm 1D})\bigr) \in \mathbb{R}^{p_i \times f_i \times C}, \qquad i = 1, \dots, k. \tag{2}$$

Columns of each tensor (length $p_i$) capture intraperiod variations (within one period), while rows (length $f_i$) encode interperiod variations (across period repetitions). This reframing exposes both types of temporal structure to 2D convolutional processing.
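As a concrete illustration, the period discovery of Eq. (1) and the folding of Eq. (2) can be sketched in NumPy. The function names `find_periods` and `reshape_2d` are illustrative, not taken from the reference implementation:

```python
import numpy as np

def find_periods(x, k=1):
    """Top-k dominant frequencies/periods via the channel-averaged
    FFT amplitude spectrum (Eq. 1)."""
    T = x.shape[0]
    amp = np.abs(np.fft.rfft(x, axis=0)).mean(axis=1)  # average over channels
    amp[0] = 0.0                                       # drop the DC component
    freqs = np.argsort(amp)[-k:][::-1]                 # top-k frequency indices
    periods = np.ceil(T / freqs).astype(int)           # p_i = ceil(T / f_i)
    return freqs, periods

def reshape_2d(x, period):
    """Zero-pad a (T, C) series and fold it into a (period, n_repeats, C)
    tensor, so each column spans one full period (Eq. 2)."""
    T, C = x.shape
    n = int(np.ceil(T / period))                       # number of period repeats
    x_pad = np.concatenate([x, np.zeros((n * period - T, C))], axis=0)
    return x_pad.reshape(n, period, C).transpose(1, 0, 2)

# Toy two-channel series with period 24 over T = 96 steps
t = np.arange(96)
x = np.stack([np.sin(2 * np.pi * t / 24), np.cos(2 * np.pi * t / 24)], axis=1)
freqs, periods = find_periods(x, k=1)  # dominant frequency f_1 = 4, p_1 = 24
x2d = reshape_2d(x, periods[0])        # shape (24, 4, 2): p_1 x f_1 x C
```

Reading along axis 0 of `x2d` traverses one period (intraperiod variation); reading along axis 1 compares the same phase across repetitions (interperiod variation).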

2. Network Architecture and TimesBlock

TimesNet comprises a stack of $L$ residual TimesBlocks, each operating on the 1D sequence representation. For block $l$, the pipeline is as follows:

$$
\begin{aligned}
(A^{l-1}, \{f_i, p_i\}_{i=1}^{k}) &= \mathrm{Period}(X_{\rm 1D}^{l-1}), \\
X_{\rm 2D}^{l,i} &= \mathrm{Reshape}_{(p_i, f_i)}\bigl(\mathrm{Padding}(X_{\rm 1D}^{l-1})\bigr), \\
\widehat{X}_{\rm 2D}^{l,i} &= \mathrm{Inception}\bigl(X_{\rm 2D}^{l,i}\bigr), \\
\widehat{X}_{\rm 1D}^{l,i} &= \mathrm{Trunc}\bigl(\mathrm{Reshape}_{(1,\, p_i f_i)}(\widehat{X}_{\rm 2D}^{l,i})\bigr), \\
\alpha_i &= \frac{\exp(A^{l-1}_{f_i})}{\sum_{j=1}^{k} \exp(A^{l-1}_{f_j})}, \\
X_{\rm 1D}^{l} &= \sum_{i=1}^{k} \alpha_i\, \widehat{X}_{\rm 1D}^{l,i}.
\end{aligned} \tag{3}
$$

$$X_{\rm 1D}^{l} \longleftarrow X_{\rm 1D}^{l} + X_{\rm 1D}^{l-1}. \tag{4}$$

Key components:

  • Adaptive multi-periodicity: Period sets $\{p_i\}$ and their importance weights $\{\alpha_i\}$ are re-estimated in every block from the FFT amplitude spectrum.
  • Inception-style 2D module: Each $\mathrm{Inception}(\cdot)$ applies parallel 2D convolutions ($1\times1$, $3\times3$, and $5\times5$ kernels, plus a pooling branch) with $1\times1$ bottlenecks. Parameters are shared across all $k$ period tensors, decoupling model size from $k$.
  • Parameter efficiency: Any standard 2D convolutional vision backbone (ResNet, ResNeXt, ConvNeXt, Swin, etc.) can be substituted for the shared Inception module to trade off accuracy against model size.
  • Residual propagation: Outputs are aggregated via a softmax-weighted sum and added back to the block input, ensuring stable optimization.
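A single TimesBlock (Eqs. 3–4) can be sketched end to end in NumPy. Here an identity function stands in for the shared Inception module, and `times_block` is an illustrative name, not the reference implementation:

```python
import numpy as np

def times_block(x, k=2, conv2d=lambda z: z):
    """One TimesBlock (Eqs. 3-4): discover k periods, fold to 2D, apply a
    2D module (identity stand-in for Inception), unfold, and combine the
    branches with amplitude-softmax weights plus a residual connection."""
    T, C = x.shape
    amp = np.abs(np.fft.rfft(x, axis=0)).mean(axis=1)   # channel-avg spectrum
    amp[0] = 0.0                                        # ignore DC
    freqs = np.argsort(amp)[-k:][::-1]                  # top-k frequencies
    e = np.exp(amp[freqs] - amp[freqs].max())
    alpha = e / e.sum()                                 # softmax importances
    out = np.zeros_like(x)
    for f, a in zip(freqs, alpha):
        p = int(np.ceil(T / f))                         # period length p_i
        n = int(np.ceil(T / p))                         # repeats after padding
        x_pad = np.concatenate([x, np.zeros((n * p - T, C))], axis=0)
        y2d = conv2d(x_pad.reshape(n, p, C))            # 2D processing
        out += a * y2d.reshape(n * p, C)[:T]            # unfold and truncate
    return x + out                                      # residual (Eq. 4)

x = np.sin(2 * np.pi * np.arange(96) / 24)[:, None]
y = times_block(x, k=2)   # with the identity stand-in, y == 2 * x
```

With a real 2D convolutional module substituted for `conv2d`, each branch processes its own period-aligned tensor while the softmax weights and residual connection keep aggregation stable.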

3. Training Protocol and Hyperparameterization

Training employs Adam ($\beta_1 = 0.9$, $\beta_2 = 0.999$) with no additional regularization (no dropout or weight decay). Series stationarization from "Non-stationary Transformers" is applied to reduce domain shift. Task-specific loss functions are used:

  • Forecasting/Imputation: Mean Squared Error (MSE), with additional SMAPE, MASE, OWA for short-term forecasting.
  • Classification: Cross-entropy loss.
  • Anomaly detection: Reconstruction MSE with thresholding for point-wise anomaly labeling.
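The anomaly-detection protocol above (reconstruction MSE plus thresholding) can be sketched as follows; the quantile threshold and the name `label_anomalies` are illustrative assumptions, as in practice the threshold is tuned on held-out data:

```python
import numpy as np

def label_anomalies(x, x_rec, quantile=0.99):
    """Point-wise anomaly labeling from reconstruction error: score each
    time step by its MSE across channels and flag scores above a
    quantile threshold (a stand-in for a tuned threshold)."""
    scores = ((x - x_rec) ** 2).mean(axis=1)   # per-step reconstruction MSE
    threshold = np.quantile(scores, quantile)
    return scores > threshold, scores

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
x_rec = x + rng.normal(scale=0.01, size=x.shape)  # near-perfect reconstruction
x[50] += 5.0                                      # inject an anomaly at t = 50
labels, scores = label_anomalies(x, x_rec)        # labels[50] is True
```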

The key hyperparameters are shown below:

| Task | $k$ | Layers | $(d_{\min}, d_{\max})$ | Learning rate | Batch size |
|---|---|---|---|---|---|
| Long-term forecast | 5 | 2 | (32, 512) | $10^{-4}$ | 32 |
| Short-term forecast | 5 | 2 | (16, 64) | $10^{-3}$ | 16 |
| Imputation | 3 | 2 | (64, 128) | $10^{-3}$ | 16 |
| Classification | 3 | 2 | (32, 64) | $10^{-3}$ | 16 |
| Anomaly detection | 3 | 3 | (32, 128) | $10^{-4}$ | 128 |

No search for additional regularization was required.

4. Empirical Results Across Mainstream Time Series Tasks

TimesNet was evaluated against more than fifteen strong baselines, including RNN-, TCN-, MLP-, and Transformer-based models as well as task-specialized architectures, across five canonical tasks. Summary metrics for TimesNet and the two strongest baselines on each task are provided below.

Long-term forecasting (MSE, MAE, avg. over six benchmarks, four horizons each):

| Method | MSE | MAE |
|---|---|---|
| Autoformer | 0.481 | 0.456 |
| FEDformer | 0.448 | 0.452 |
| TimesNet | 0.400 | 0.406 |

TimesNet achieved best scores in 40 out of 44 settings.

Short-term forecasting (M4: SMAPE, MASE, OWA):

| Method | SMAPE | MASE | OWA |
|---|---|---|---|
| N-HiTS | 11.93 | 1.61 | 0.86 |
| N-BEATS | 11.85 | 1.60 | 0.86 |
| TimesNet | 11.83 | 1.59 | 0.85 |

Imputation (avg. MSE, MAE, mask ratios 12.5–50%):

| Method | MSE | MAE |
|---|---|---|
| ETSformer | 0.120 | 0.253 |
| DLinear | 0.093 | 0.206 |
| TimesNet | 0.027 | 0.107 |

Classification (10 UEA datasets, accuracy %):

| Method | Accuracy (%) |
|---|---|
| Rocket | 72.5 |
| Flowformer | 73.0 |
| TimesNet | 73.6 |

Anomaly detection (5 benchmarks, F1 %):

| Method | F1 (%) |
|---|---|
| Anomaly Transformer | 80.5 |
| FEDformer | 84.9 |
| TimesNet | 86.3 |

5. Ablation Studies and Analysis

Key ablation findings establish the robustness and significance of design choices:

  • Inception module replacement: Switching to ResNeXt, ConvNeXt, or Swin increases anomaly F1 from 85.5% to 86.9%.
  • Fixed vs. per-block period estimation: Omitting per-block period re-estimation (transforming only the raw input) reduces average F1 to 84.9%.
  • Importance weighting ablation: Removing the softmax-based weighting of the per-period outputs $\{\widehat{X}_{\rm 1D}^{l,i}\}$ lowers F1 to 85.3%.

These studies confirm the core value of decoupling and jointly modeling intraperiod and interperiod variations, stacking such blocks with periodicity re-estimation, and softmax-based tensor combination for overall performance.

6. Significance and Implications

TimesNet provides a task-general backbone for time series analysis through three main principles: (1) explicit disentanglement of intraperiod vs. interperiod variations by a 1D to 2D transformation; (2) joint modeling via efficient, shared 2D convolutional modules; (3) adaptive per-block periodicity discovery. A plausible implication is that general time series architectures benefit from leveraging ideas originating in computer vision (e.g., 2D convolutions, inception modules) through principled representations that align with temporal periodicities. The architecture's generality, strong empirical results, and extensibility to other backbone modules suggest broad applicability throughout time-series learning paradigms (Wu et al., 2022).

References

Wu, H., Hu, T., Liu, Y., Zhou, H., Wang, J., & Long, M. (2022). TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. arXiv:2210.02186.
