PhaseFormer: From Patches to Phases for Efficient and Effective Time Series Forecasting (2510.04134v1)
Abstract: Periodicity is a fundamental characteristic of time series data and has long played a central role in forecasting. Recent deep learning methods strengthen the exploitation of periodicity by treating patches as basic tokens, thereby improving predictive effectiveness. However, their efficiency remains a bottleneck due to large parameter counts and heavy computational costs. This paper provides, for the first time, a clear explanation of why patch-level processing is inherently inefficient, supported by strong evidence from real-world data. To address these limitations, we introduce a phase perspective for modeling periodicity and present an efficient yet effective solution, PhaseFormer. PhaseFormer features phase-wise prediction through compact phase embeddings and efficient cross-phase interaction enabled by a lightweight routing mechanism. Extensive experiments demonstrate that PhaseFormer achieves state-of-the-art performance with around 1k parameters, consistently across benchmark datasets. Notably, it excels on large-scale and complex datasets, where models with comparable efficiency often struggle. This work marks a significant step toward truly efficient and effective time series forecasting. Code is available at this repository: https://github.com/neumyor/PhaseFormer_TSL
Knowledge Gaps, Limitations, and Open Questions
The following list captures what remains missing, uncertain, or unexplored, framed to be concrete and actionable for future research:
- Robustness to weak or absent periodicity: Quantify PhaseFormer’s performance on datasets with weak/irregular seasonality, regime shifts, holidays, or event-driven dynamics (including controlled stress tests where periodicity is deliberately degraded or absent).
- Period length estimation reliability: Systematically evaluate and improve the robustness of period-length estimation (autocorrelation/frequency methods) under noise, aliasing, multi-seasonality, and time-varying cycles; include ablations on mis-specified periods and adaptive/online period detection.
- Multi-seasonality and time-varying cycles: Extend tokenization to handle multiple concurrent periods (e.g., daily/weekly/annual) and periods that drift over time; assess whether multi-resolution or hierarchical phase representations improve accuracy and stability.
- Channel dependency and multivariate interactions: The model is channel-independent; develop and evaluate channel-dependent variants (e.g., cross-variable routing, shared routers across variables) on datasets with strong inter-variable couplings, and compare against iTransformer/Crossformer.
- External covariates and calendar effects: Incorporate exogenous regressors (weather, calendar/holiday flags, interventions) and quantify gains; assess sensitivity to missing or noisy covariates.
- Irregular sampling and missing data: Benchmark PhaseFormer on irregularly sampled series and varying sampling frequencies; study resilience to missing values, gaps, and outliers, including imputation strategies and robust losses.
- Probabilistic forecasting and uncertainty: Extend PhaseFormer to produce calibrated probabilistic outputs (e.g., quantile, CRPS, pinball loss) and evaluate reliability under distribution shifts.
- Trend modeling and non-stationarity: Test and enhance handling of non-stationary trends beyond periodic components (e.g., additive/multiplicative trends, structural breaks); determine when excluding explicit cycle modeling harms accuracy.
- Theoretical assumptions and applicability: Validate Theorem 1's assumptions (linear low-rank generative structure, spectral separation) on real datasets; extend the theory to nonlinear dynamics, multi-seasonality, and time-varying transformations.
- Router design and adaptivity: Explore data-dependent or dynamic router mechanisms (e.g., gating/MoE, learned router count, per-phase or per-variable routers) and quantify trade-offs between accuracy and efficiency.
- Hyperparameter sensitivity and auto-tuning: Provide comprehensive sensitivity analyses for the core hyperparameters, including the number of layers, positional embedding design, and normalization choices; develop automatic selection criteria or meta-learning for hyperparameter tuning.
- Boundary effects from circular padding: Measure how circular padding influences forecasts near sequence boundaries and propose alternatives (e.g., reflective padding, learned padding tokens, boundary-aware losses).
- Scalability claims vs. wall-clock: Report end-to-end training/inference time, memory footprint, and energy usage across hardware (CPU/GPU) and varying sequence lengths and horizons; validate linear scaling empirically for long horizons and extreme sequence lengths.
- Parameter count scaling: Clarify and quantify how the parameter count grows with period length and forecast horizon; test the "~1k parameters" claim under longer horizons, different period lengths, and multiple periods.
- Comparative fairness and tuning: Ensure baselines are tuned comparably under identical preprocessing, period detection, and normalization; include classical seasonal-naïve and decomposition baselines to contextualize gains.
- Multi-horizon and rolling strategies: Compare direct multi-step phase-wise prediction to iterative/rolling strategies; analyze error accumulation behavior and horizon-dependent trade-offs.
- Interpretability and phase semantics: Move beyond attention heatmaps to causal/semantic validation (e.g., alignment with known schedules, holidays, infrastructure changes), and quantify interpretability benefits for practitioners.
- Deployment in streaming/OOD settings: Evaluate online adaptation, concept drift handling, and OOD detection; measure robustness under real-time constraints, delayed or missing inputs, and sudden regime changes.
- Generalization across domains: Study transfer learning/pretraining across datasets and domains; assess cross-dataset generalization and the stability of phase embeddings learned on source tasks.
- Integration with frequency-domain methods: Investigate hybrid phase–frequency architectures, joint learning of filters and phase tokens, and whether spectral priors further compress or denoise phase spaces.
- Loss functions and robustness: Benchmark alternative losses (Huber, quantile, asymmetric) under heavy-tailed noise and outliers; analyze sensitivity to normalization schemes (e.g., RIN vs. standard normalization).
- Evaluation breadth: Include additional metrics (e.g., MAPE, SMAPE, WAPE, calibration error), tail-focused evaluations, and subgroup analyses (weekday/weekend, peak/off-peak) to better characterize strengths and failure modes.
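Several of the bullets above lend themselves to small executable sketches. For the period-length estimation bullet, a minimal autocorrelation-based detector is shown below; the function name `estimate_period` and the synthetic hourly signal are illustrative assumptions, not artifacts of the paper, and real detectors would need the robustness extensions the bullet calls for (multi-seasonality, drift, noise).

```python
import numpy as np

def estimate_period(x, max_lag=None):
    """Estimate the dominant period of a 1-D series as the lag of the
    highest autocorrelation (ACF) local maximum beyond lag 0."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    if max_lag is None:
        max_lag = n // 2
    # Full autocorrelation; keep non-negative lags only.
    acf = np.correlate(x, x, mode="full")[n - 1:][: max_lag + 1]
    # Local maxima of the ACF away from lag 0 are candidate periods.
    candidates = [
        lag for lag in range(2, max_lag)
        if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
    ]
    if not candidates:
        return None  # no clear periodicity found
    return max(candidates, key=lambda lag: acf[lag])

# Synthetic daily cycle sampled hourly over two weeks: true period = 24.
t = np.arange(24 * 14)
rng = np.random.default_rng(0)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)
```

Stress tests for this bullet would degrade the signal (heavier noise, drifting period, overlapping seasonalities) and measure how far the detected lag strays from the ground-truth period.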
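For the comparative-fairness bullet, the classical seasonal-naïve baseline it asks for is simple enough to state exactly: repeat the last observed cycle. This sketch (the helper name `seasonal_naive` is an assumption) is the kind of floor any learned model's gains should be contextualized against.

```python
import numpy as np

def seasonal_naive(history, period, horizon):
    """Forecast `horizon` steps by tiling the last observed cycle of
    length `period` from `history`."""
    history = np.asarray(history, dtype=float)
    last_cycle = history[-period:]
    reps = int(np.ceil(horizon / period))
    return np.tile(last_cycle, reps)[:horizon]
```

With a history of `[1, 2, 3, 1, 2, 3]`, period 3, and horizon 4, the forecast is `[1, 2, 3, 1]`: the last cycle replayed and truncated to the horizon.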
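The probabilistic-forecasting and loss-function bullets both mention the pinball (quantile) loss. As a reference point, here is a minimal numpy version; the function name and arguments are illustrative, not from the paper.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss: under-prediction is weighted by q,
    over-prediction by (1 - q), so q > 0.5 favors higher forecasts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))
```

For example, with `q = 0.9`, predicting 8 against a true value of 10 costs `0.9 * 2 = 1.8`, while predicting 12 costs only `0.1 * 2 = 0.2`, which is the asymmetry that makes the loss train a 90th-percentile forecaster.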
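The circular-padding bullet contrasts with alternatives such as reflective padding. The difference is easy to see with `numpy.pad`: `wrap` mode copies values from the opposite end of the series (implicitly assuming the boundary continues the cycle), while `reflect` mirrors values near the boundary. The arrays below are purely illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Circular padding: wraps the series ends around.
circular = np.pad(x, (2, 2), mode="wrap")     # [4 5 | 1 2 3 4 5 | 1 2]

# Reflective padding: mirrors interior values, edge not repeated.
reflect = np.pad(x, (2, 2), mode="reflect")   # [3 2 | 1 2 3 4 5 | 4 3]
```

A boundary-effect study along the lines of the bullet would compare forecast error in the first/last phase positions under each padding scheme.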
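Finally, for the evaluation-breadth bullet, SMAPE is one of the suggested additional metrics. A common definition (there are several variants in the literature; this one uses the mean of the absolute values in the denominator and reports percent) can be sketched as:

```python
import numpy as np

def smape(y_true, y_pred, eps=1e-8):
    """Symmetric MAPE in percent: |error| relative to the mean magnitude
    of truth and prediction. `eps` guards against zero denominators."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2 + eps
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)
```

Unlike plain MAPE, this form is bounded (at most 200%) and symmetric in over- vs. under-prediction, which matters for the tail-focused and subgroup analyses the bullet proposes.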