
PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting (2510.04134v1)

Published 5 Oct 2025 in cs.LG and cs.AI

Abstract: Periodicity is a fundamental characteristic of time series data and has long played a central role in forecasting. Recent deep learning methods strengthen the exploitation of periodicity by treating patches as basic tokens, thereby improving predictive effectiveness. However, their efficiency remains a bottleneck due to large parameter counts and heavy computational costs. This paper provides, for the first time, a clear explanation of why patch-level processing is inherently inefficient, supported by strong evidence from real-world data. To address these limitations, we introduce a phase perspective for modeling periodicity and present an efficient yet effective solution, PhaseFormer. PhaseFormer features phase-wise prediction through compact phase embeddings and efficient cross-phase interaction enabled by a lightweight routing mechanism. Extensive experiments demonstrate that PhaseFormer achieves state-of-the-art performance with around 1k parameters, consistently across benchmark datasets. Notably, it excels on large-scale and complex datasets, where models with comparable efficiency often struggle. This work marks a significant step toward truly efficient and effective time series forecasting. Code is available at this repository: https://github.com/neumyor/PhaseFormer_TSL

Summary

  • The paper introduces PhaseFormer, which replaces patch tokenization with phase tokenization to capture periodicity with lower token variability.
  • It employs a routing Transformer with compact low-dimensional phase embeddings and cross-phase routing to model temporal dependencies efficiently.
  • Experiments show a roughly 99.9% reduction in parameters and computational load while sustaining high predictive accuracy on large-scale datasets.

Efficient and Effective Time Series Forecasting with PhaseFormer

PhaseFormer introduces a novel approach to time series forecasting by focusing on phase tokenization rather than the commonly used patch tokenization. The aim is to address inefficiencies observed in patch-based models, particularly when handling large and complex datasets. The paper emphasizes the importance of periodicity and proposes strategies to improve both forecasting accuracy and computational efficiency.

Background on Time Series Forecasting

Time series forecasting plays a critical role in various fields, such as finance, climate science, and energy management, by enabling predictions based on historical data. Recent advancements in deep learning have exploited periodicity through patch tokenization, treating patches as basic tokens to capture temporal patterns. However, these methods face challenges due to their computational demands and scalability issues, especially with the variability of cycle patterns in real-world data.

Phase-Based Perspective

PhaseFormer proposes a phase-based perspective that focuses on values aligned at similar offsets across cycles, leading to phase tokens with lower variability than patch tokens. This approach leverages the global stationarity of phase tokens and their low-dimensional representation space to efficiently forecast time series without compromising accuracy. The reduced dimensionality eases the computational burden, enabling consistent performance across diverse datasets.
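This intuition is easy to check numerically. The toy example below (our illustration, not code from the paper) compares the within-group variance of phase tokens, which collect values at the same offset across cycles, with that of cycle-length patch tokens on a synthetic periodic series:

```python
# Toy check (not from the paper's code): on a periodic series, values
# grouped by phase (same offset in every cycle) vary far less than values
# grouped into contiguous cycle-length patches.
import numpy as np

rng = np.random.default_rng(0)
L_phase = 24                    # assumed cycle length, e.g. a daily cycle in hourly data
n_cycles = 30
t = np.arange(L_phase * n_cycles)
series = np.sin(2 * np.pi * t / L_phase) + 0.1 * rng.standard_normal(t.size)

x = series.reshape(n_cycles, L_phase)    # rows: cycles, columns: phase offsets
phase_var = x.var(axis=0).mean()         # spread across cycles at a fixed phase
patch_var = x.var(axis=1).mean()         # spread within one cycle-length patch

print(f"mean within-phase variance: {phase_var:.4f}")  # ~0.01 (just the noise)
print(f"mean within-patch variance: {patch_var:.4f}")  # ~0.5  (the full waveform)
```

Values at a fixed phase differ only by noise, while a patch spans the full amplitude of the waveform; this gap is what makes phase tokens amenable to very compact embeddings.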

Methodology

Data Pre-Processing

PhaseFormer preprocesses the input by folding each sequence into a phase-period matrix via phase tokenization: entry (i, j) holds the value observed at phase offset i in period j, so each row collects the values aligned at the same phase across cycles. Combined with normalization, this transformation prepares the data for efficient processing.
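A minimal sketch of this folding step, assuming a pre-estimated period L_phase; the function name, circular padding choice, and matrix orientation are our assumptions rather than the paper's exact procedure:

```python
# Hypothetical sketch of phase tokenization: fold a 1-D series into a
# (L_phase, P_in) phase-period matrix whose row i holds the values observed
# at phase offset i in each period.
import numpy as np

def phase_tokenize(series: np.ndarray, L_phase: int) -> np.ndarray:
    L_in = series.size
    P_in = -(-L_in // L_phase)                  # ceil(L_in / L_phase) periods
    pad = P_in * L_phase - L_in
    if pad:                                     # circular padding: reuse the tail
        series = np.concatenate([series[-pad:], series])
    # Reshape to (P_in periods, L_phase phases), then transpose so each row
    # is one phase token spanning P_in periods.
    return series.reshape(P_in, L_phase).T

tokens = phase_tokenize(np.arange(100, dtype=float), L_phase=24)
print(tokens.shape)   # (24, 5): 24 phase tokens, each of length 5
```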

Phase-Based Routing Transformer

PhaseFormer utilizes a routing Transformer architecture with three components (a minimal code sketch follows the list):

  1. Embedding Layer: Projects phase tokens into a low-dimensional space enriched with positional embeddings to capture temporal ordering.
  2. Cross-Phase Routing Layer: Facilitates efficient interaction between phase tokens using learnable routers to manage cross-phase dependencies through lightweight attention mechanisms.
  3. Predictor: Maps refined phase representations to forecasts using shared parameters to maintain consistency and reduce complexity.
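The following end-to-end PyTorch sketch ties these three components together; the embedding width, router count, and exact attention form are our assumptions, and the authors' implementation in the linked repository may differ:

```python
# Hypothetical sketch of a phase-based routing Transformer.
import torch
import torch.nn as nn

class CrossPhaseRouting(nn.Module):
    """M learnable routers attend over the P phase tokens, then the tokens
    attend back to the routers: O(P*M) interactions instead of the O(P^2)
    of full self-attention."""
    def __init__(self, d: int, n_routers: int):
        super().__init__()
        self.routers = nn.Parameter(torch.randn(n_routers, d))
        self.gather = nn.MultiheadAttention(d, num_heads=1, batch_first=True)
        self.scatter = nn.MultiheadAttention(d, num_heads=1, batch_first=True)

    def forward(self, x):                            # x: (B, P, d)
        r = self.routers.expand(x.size(0), -1, -1)   # (B, M, d)
        r, _ = self.gather(r, x, x)                  # routers summarize all phases
        out, _ = self.scatter(x, r, r)               # phases read back global context
        return x + out                               # residual connection

class PhaseFormerSketch(nn.Module):
    def __init__(self, L_phase: int, P_in: int, P_out: int,
                 d: int = 16, n_routers: int = 4):
        super().__init__()
        self.embed = nn.Linear(P_in, d)                       # 1. token -> embedding
        self.pos = nn.Parameter(torch.zeros(1, L_phase, d))   # positional term per phase
        self.routing = CrossPhaseRouting(d, n_routers)        # 2. cross-phase interaction
        self.predict = nn.Linear(d, P_out)                    # 3. predictor, shared by phases

    def forward(self, tokens):                       # tokens: (B, L_phase, P_in)
        h = self.embed(tokens) + self.pos
        h = self.routing(h)
        return self.predict(h)                       # (B, L_phase, P_out)

model = PhaseFormerSketch(L_phase=24, P_in=5, P_out=2)
print(model(torch.randn(8, 24, 5)).shape)            # torch.Size([8, 24, 2])
print(sum(p.numel() for p in model.parameters()))    # a few thousand at these sizes
```

Sharing one predictor head across all phase tokens keeps the head small and is consistent with the paper's emphasis on consistency and low complexity.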

Complexity Analysis

PhaseFormer's architecture scales linearly with input and output lengths thanks to its efficient phase tokenization and routing mechanisms, leading to substantial reductions in computational costs compared to patch-based approaches.
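To make the linear-scaling argument concrete (notation ours; the paper's exact accounting may differ), a routing layer over $P$ phase tokens with $M \ll P$ routers and embedding width $d$ costs

```latex
% Rough cost accounting for router-mediated vs. full attention:
\[
  \underbrace{\mathcal{O}(P M d)}_{\text{routed attention}}
  \;\ll\;
  \underbrace{\mathcal{O}(P^{2} d)}_{\text{full self-attention}},
  \qquad
  P = \left\lceil L_{\text{in}} / L_{\text{phase}} \right\rceil .
\]
```

Since $P$ grows linearly with the input length while $M$ and $d$ are fixed constants, the overall cost scales linearly in $L_{\text{in}}$ (and analogously in $L_{\text{out}}$ through the predictor).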

Experimental Results

PhaseFormer demonstrates superior performance across various datasets, consistently outperforming baseline models in both accuracy and efficiency. It achieves approximately 99.9% reduction in parameters and computational load while maintaining high predictive accuracy, especially on large-scale datasets like Traffic and Electricity. This robustness indicates its capability to handle complex real-world scenarios better than specialized models with comparable efficiency.

Ablation Studies

Ablating the cross-phase routing layer confirmed its importance for capturing periodic dynamics and for PhaseFormer's combined gains in efficiency and effectiveness. Experiments varying the number of routers further showed that a small number suffices for strong performance, consistent with the low intrinsic dimensionality of the phase token space.

Case Study

A detailed analysis on a sample from the Traffic dataset showed that PhaseFormer could capture temporal consistency through its routing mechanism, distinguishing among phase tokens and modeling their periodicity and trend characteristics.

Conclusion

PhaseFormer presents a paradigm shift in time series forecasting by leveraging phase tokenization to exploit periodic structure, achieving significant gains in both efficiency and accuracy. Although the current approach assumes locally stable periodicity, it opens pathways for future research on non-stationarity and complex drifts, potentially establishing PhaseFormer as a benchmark for long-term forecasting.

Future Work

Future studies will explore ways to relax the assumptions of phase stability and further enhance PhaseFormer’s capabilities under irregular cycle patterns. The goal is to develop resilient phase representations that maintain robustness across varied real-world conditions, solidifying PhaseFormer’s position as a leading model in time-series forecasting.

By focusing on phase tokenization, the authors have charted a practical route towards developing efficient, lightweight forecasting models without sacrificing accuracy, demonstrating the potential for deploying AI solutions in resource-constrained environments.


Knowledge Gaps, Limitations, and Open Questions

The following list captures what remains missing, uncertain, or unexplored, framed to be concrete and actionable for future research:

  • Robustness to weak or absent periodicity: Quantify PhaseFormer’s performance on datasets with weak/irregular seasonality, regime shifts, holidays, or event-driven dynamics (including controlled stress tests where periodicity is deliberately degraded or absent).
  • Period length estimation reliability: Systematically evaluate and improve the robustness of $L_{\text{phase}}$ estimation (autocorrelation/frequency methods) under noise, aliasing, multi-seasonality, and time-varying cycles; include ablations on mis-specified $L_{\text{phase}}$ and adaptive/online period detection.
  • Multi-seasonality and time-varying cycles: Extend tokenization to handle multiple concurrent periods (e.g., daily/weekly/annual) and periods that drift over time; assess whether multi-resolution or hierarchical phase representations improve accuracy and stability.
  • Channel dependency and multivariate interactions: The model is channel-independent; develop and evaluate channel-dependent variants (e.g., cross-variable routing, shared routers across variables) on datasets with strong inter-variable couplings, and compare against iTransformer/Crossformer.
  • External covariates and calendar effects: Incorporate exogenous regressors (weather, calendar/holiday flags, interventions) and quantify gains; assess sensitivity to missing or noisy covariates.
  • Irregular sampling and missing data: Benchmark PhaseFormer on irregularly sampled series and varying sampling frequencies; study resilience to missing values, gaps, and outliers, including imputation strategies and robust losses.
  • Probabilistic forecasting and uncertainty: Extend PhaseFormer to produce calibrated probabilistic outputs (e.g., quantile, CRPS, pinball loss) and evaluate reliability under distribution shifts.
  • Trend modeling and non-stationarity: Test and enhance handling of non-stationary trends beyond periodic components (e.g., additive/multiplicative trends, structural breaks); determine when excluding explicit cycle modeling harms accuracy.
  • Theoretical assumptions and applicability: Validate Theorem 1's assumptions (linear low-rank generative structure, spectral separation) on real datasets; extend theory to nonlinear dynamics, multi-seasonality, and time-varying transformations $S(t)$.
  • Router design and adaptivity: Explore data-dependent or dynamic router mechanisms (e.g., gating/MoE, learned router count $M$, per-phase or per-variable routers) and quantify trade-offs between accuracy and efficiency.
  • Hyperparameter sensitivity and auto-tuning: Provide comprehensive analyses for $d$, $M$, the number of layers $N$, positional embedding design, and normalization choices; develop automatic selection criteria or meta-learning for hyperparameters.
  • Boundary effects from circular padding: Measure how circular padding influences forecasts near sequence boundaries and propose alternatives (e.g., reflective padding, learned padding tokens, boundary-aware losses).
  • Scalability claims vs. wall-clock: Report end-to-end training/inference time, memory footprint, and energy usage across hardware (CPU/GPU) and varying $L_{\text{in}}$, $L_{\text{out}}$; validate linear scaling empirically for long horizons and extreme sequence lengths.
  • Parameter count scaling: Clarify and quantify how parameters grow with $P_{\text{in}}=\lfloor L_{\text{in}}/L_{\text{phase}} \rfloor$ and $P_{\text{out}}=\lfloor L_{\text{out}}/L_{\text{phase}} \rfloor$; test “~1k parameters” under longer horizons, smaller $L_{\text{phase}}$, and multiple periods.
  • Comparative fairness and tuning: Ensure baselines are tuned comparably under identical preprocessing, period detection, and normalization; include classical seasonal-naïve and decomposition baselines to contextualize gains.
  • Multi-horizon and rolling strategies: Compare direct multi-step phase-wise prediction to iterative/rolling strategies; analyze error accumulation behavior and horizon-dependent trade-offs.
  • Interpretability and phase semantics: Move beyond attention heatmaps to causal/semantic validation (e.g., alignment with known schedules, holidays, infrastructure changes), and quantify interpretability benefits for practitioners.
  • Deployment in streaming/OOD settings: Evaluate online adaptation, concept drift handling, and OOD detection; measure robustness under real-time constraints, delayed or missing inputs, and sudden regime changes.
  • Generalization across domains: Study transfer learning/pretraining across datasets and domains; assess cross-dataset generalization and the stability of phase embeddings learned on source tasks.
  • Integration with frequency-domain methods: Investigate hybrid phase–frequency architectures, joint learning of filters and phase tokens, and whether spectral priors further compress or denoise phase spaces.
  • Loss functions and robustness: Benchmark alternative losses (Huber, quantile, asymmetric) under heavy-tailed noise and outliers; analyze sensitivity to normalization schemes (e.g., RIN vs. standard normalization).
  • Evaluation breadth: Include additional metrics (e.g., MAPE, SMAPE, WAPE, calibration error), tail-focused evaluations, and subgroup analyses (weekday/weekend, peak/off-peak) to better characterize strengths and failure modes.