TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods (2403.20150v3)

Published 29 Mar 2024 in cs.LG, cs.AI, and cs.CY

Abstract: Time series are generated in diverse domains such as economic, traffic, health, and energy, where forecasting of future values has numerous important applications. Not surprisingly, many forecasting methods are being proposed. To ensure progress, it is essential to be able to study and compare such methods empirically in a comprehensive and reliable manner. To achieve this, we propose TFB, an automated benchmark for Time Series Forecasting (TSF) methods. TFB advances the state-of-the-art by addressing shortcomings related to datasets, comparison methods, and evaluation pipelines: 1) insufficient coverage of data domains, 2) stereotype bias against traditional methods, and 3) inconsistent and inflexible pipelines. To achieve better domain coverage, we include datasets from 10 different domains: traffic, electricity, energy, the environment, nature, economic, stock markets, banking, health, and the web. We also provide a time series characterization to ensure that the selected datasets are comprehensive. To remove biases against some methods, we include a diverse range of methods, including statistical learning, machine learning, and deep learning methods, and we also support a variety of evaluation strategies and metrics to ensure a more comprehensive evaluation of different methods. To support the integration of different methods into the benchmark and enable fair comparisons, TFB features a flexible and scalable pipeline that eliminates biases. Next, we employ TFB to perform a thorough evaluation of 21 Univariate Time Series Forecasting (UTSF) methods on 8,068 univariate time series and 14 Multivariate Time Series Forecasting (MTSF) methods on 25 datasets. The benchmark code and data are available at https://github.com/decisionintelligence/TFB.


Summary

  • The paper presents TFB as a robust framework that overcomes existing benchmarking biases by incorporating extensive, diverse datasets.
  • The paper demonstrates a balanced evaluation by integrating statistical, machine learning, and deep learning methods in a unified platform.
  • The paper reveals that TFB’s flexible pipeline accommodates both fixed and rolling forecasting strategies, enabling scalable, reliable analysis.

Empirical Benchmarking of Time Series Forecasting Methods: The TFB Framework

The paper "TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods" introduces an advanced benchmarking framework designed to address the prevalent shortcomings in the empirical evaluation of time series forecasting (TSF) methods. The authors identify and resolve three critical issues in existing benchmarks: limited domain coverage of datasets, stereotypes against traditional methods, and a lack of consistent and flexible evaluation pipelines. By developing the Time Series Forecasting Benchmark (TFB), the authors provide a robust platform for comparing TSF methodologies across diverse datasets and methods, thereby setting a new standard for empirical evaluations in the field.

Comprehensive Dataset Inclusion

One of the paper's key contributions is TFB's extensive dataset collection. Recognizing that existing benchmarks often fall short in domain coverage, TFB includes datasets from ten different domains: traffic, electricity, energy, the environment, nature, economics, stock markets, banking, health, and the web. These datasets are systematically characterized by properties such as trend, seasonality, and stationarity, ensuring they represent a wide array of real-world dynamics. The authors employ techniques such as Principal Feature Analysis (PFA) to verify that the selected datasets retain the diverse characteristics vital for generalizable TSF studies.
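To make this kind of characterization concrete, the sketch below estimates two of the named properties, strength of trend and strength of seasonality, using the standard STL-based strength measures. This is an illustrative example only; TFB's actual feature extraction and selection pipeline may use a different feature set.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

def characterize(series: pd.Series, period: int = 12) -> dict:
    """Estimate strength of trend and seasonality via STL decomposition.
    Both measures lie in [0, 1]; higher means stronger. Illustrative only."""
    res = STL(series, period=period).fit()
    resid_var = np.var(res.resid)
    trend = max(0.0, 1.0 - resid_var / np.var(res.trend + res.resid))
    seasonality = max(0.0, 1.0 - resid_var / np.var(res.seasonal + res.resid))
    return {"trend": trend, "seasonality": seasonality}

# Example: a trending, strongly seasonal synthetic series scores high on both.
t = np.arange(240)
y = pd.Series(0.05 * t + np.sin(2 * np.pi * t / 12) + 0.1 * np.random.randn(240))
print(characterize(y, period=12))
```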

Methodological Diversity and Fairness

The TFB framework is notable for its comprehensive coverage of TSF methodologies, incorporating statistical learning, machine learning, and deep learning paradigms. This inclusion is a deliberate attempt to dispel the stereotype bias that favors modern deep learning methods while discounting traditional approaches such as ARIMA and VAR. The paper provides concrete evidence that statistical methods can outperform recent state-of-the-art (SOTA) approaches under specific conditions, emphasizing the need for balanced evaluations. This broader methodological scope, combined with varied evaluation strategies and metrics, supports fair, unbiased performance comparisons.
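Fair comparison of such heterogeneous methods requires that statistical and neural models sit behind one interface, so identical data splits and metrics apply to all of them. The following is a hypothetical sketch of such an adapter layer; the class names are illustrative and are not TFB's actual API.

```python
from abc import ABC, abstractmethod
import numpy as np

class Forecaster(ABC):
    """Minimal common contract: every method, from ARIMA to a Transformer,
    is wrapped to expose the same fit/predict interface."""

    @abstractmethod
    def fit(self, history: np.ndarray) -> "Forecaster":
        ...

    @abstractmethod
    def predict(self, horizon: int) -> np.ndarray:
        ...

class DriftBaseline(Forecaster):
    """Classical baseline: extend the line between the first and last point."""

    def fit(self, history: np.ndarray) -> "Forecaster":
        self._last = history[-1]
        self._slope = (history[-1] - history[0]) / max(len(history) - 1, 1)
        return self

    def predict(self, horizon: int) -> np.ndarray:
        return self._last + self._slope * np.arange(1, horizon + 1)
```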

Flexible and Scalable Evaluation Pipeline

TFB introduces a unified framework for evaluating TSF methods, offering consistency and flexibility that previous benchmarks lack. The framework supports both fixed and rolling forecasting strategies and various performance metrics, facilitating a comprehensive analysis of method strengths and weaknesses. The evaluation pipeline is designed to be scalable and adaptable, enabling the integration of emerging methods and diverse datasets. This ensures the benchmark's longevity and relevance as the field evolves.
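A minimal sketch of the two strategies, reusing the hypothetical Forecaster interface from the previous example: "fixed" evaluates a single train/test split, while "rolling" re-evaluates at successive forecast origins. Details such as refitting policy and window alignment are simplified here.

```python
import numpy as np

def evaluate(series, make_model, horizon, n_windows=1, stride=1):
    """Rolling-origin evaluation; n_windows=1 reduces to the fixed strategy.
    Assumes the series is long enough for all windows. Returns mean MAE."""
    errors = []
    for i in range(n_windows):
        cut = len(series) - horizon - (n_windows - 1 - i) * stride
        train, test = series[:cut], series[cut:cut + horizon]
        forecast = make_model().fit(train).predict(horizon)
        errors.append(np.mean(np.abs(forecast - test)))
    return float(np.mean(errors))

# Fixed strategy: one split. Rolling: five origins advancing by one horizon.
# fixed = evaluate(y, DriftBaseline, horizon=24)
# rolling = evaluate(y, DriftBaseline, horizon=24, n_windows=5, stride=24)
```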

Evaluation Results and Observations

Employing TFB, the authors conducted an extensive evaluation of 21 univariate TSF methods on 8,068 univariate time series and 14 multivariate TSF methods on 25 datasets. The results provided novel insights into method performance across different dataset characteristics, notably that linear methods excel on datasets with pronounced trends and shifts, whereas transformer-based models perform robustly on datasets with strong seasonality and nonlinear interactions. The benchmark also highlights the importance of modeling channel dependencies in multivariate time series for improved performance.
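For concreteness, the kinds of accuracy metrics such a comparison relies on can be sketched as follows: MAE is scale-dependent, while sMAPE and MASE are scale-free, the latter following Hyndman and Koehler's definition. TFB's exact metric set may differ from this illustrative selection.

```python
import numpy as np

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

def smape(y, yhat):
    # Symmetric MAPE in percent; scale-free but unstable near zero values.
    denom = (np.abs(y) + np.abs(yhat)) / 2.0
    return float(100.0 * np.mean(np.abs(y - yhat) / denom))

def mase(y, yhat, y_train, m=1):
    # Error scaled by the in-sample MAE of a seasonal-naive forecast with
    # period m; values below 1 beat the naive baseline.
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return float(np.mean(np.abs(y - yhat)) / scale)
```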

Implications and Future Prospects

The introduction of TFB marks a significant advance in how TSF methods are empirically evaluated. The benchmark emphasizes the importance of diverse datasets and methods and showcases the impact of dataset characteristics on model performance. Practically, TFB provides TSF researchers with a rigorous framework in which to test new methods, helping ensure that developed techniques are both effective and efficient across a wide range of scenarios. Theoretically, TFB opens new avenues for exploring interactions between dataset characteristics and method performance, potentially guiding the development of innovative TSF techniques.

The findings in this paper could pave the way for more targeted TSF research, focusing on creating models that capitalize on specific dataset characteristics. As AI continues to evolve, the TFB benchmark will likely serve as an essential tool, driving progress by providing a fair and comprehensive platform for method comparison. Looking forward, expanding TFB's dataset diversity further and integrating even more machine learning advancements could reinforce its position as a cornerstone in TSF methodological research.