TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods (2403.20150v3)

Published 29 Mar 2024 in cs.LG, cs.AI, and cs.CY

Abstract: Time series are generated in diverse domains such as economic, traffic, health, and energy, where forecasting of future values has numerous important applications. Not surprisingly, many forecasting methods are being proposed. To ensure progress, it is essential to be able to study and compare such methods empirically in a comprehensive and reliable manner. To achieve this, we propose TFB, an automated benchmark for Time Series Forecasting (TSF) methods. TFB advances the state-of-the-art by addressing shortcomings related to datasets, comparison methods, and evaluation pipelines: 1) insufficient coverage of data domains, 2) stereotype bias against traditional methods, and 3) inconsistent and inflexible pipelines. To achieve better domain coverage, we include datasets from 10 different domains: traffic, electricity, energy, the environment, nature, economic, stock markets, banking, health, and the web. We also provide a time series characterization to ensure that the selected datasets are comprehensive. To remove biases against some methods, we include a diverse range of methods, including statistical learning, machine learning, and deep learning methods, and we also support a variety of evaluation strategies and metrics to ensure a more comprehensive evaluation of different methods. To support the integration of different methods into the benchmark and enable fair comparisons, TFB features a flexible and scalable pipeline that eliminates biases. Next, we employ TFB to perform a thorough evaluation of 21 Univariate Time Series Forecasting (UTSF) methods on 8,068 univariate time series and 14 Multivariate Time Series Forecasting (MTSF) methods on 25 datasets. The benchmark code and data are available at https://github.com/decisionintelligence/TFB.


Summary

  • The paper presents TFB as a robust framework that overcomes existing benchmarking biases by incorporating extensive, diverse datasets.
  • The paper demonstrates a balanced evaluation by integrating statistical, machine learning, and deep learning methods in a unified platform.
  • The paper reveals that TFB’s flexible pipeline accommodates both fixed and rolling forecasting strategies, enabling scalable, reliable analysis.

Empirical Benchmarking of Time Series Forecasting Methods: The TFB Framework

The paper "TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods" introduces an advanced benchmarking framework designed to address the prevalent shortcomings in the empirical evaluation of time series forecasting (TSF) methods. The authors identify and resolve three critical issues in existing benchmarks: limited domain coverage of datasets, stereotypes against traditional methods, and a lack of consistent and flexible evaluation pipelines. By developing the Time Series Forecasting Benchmark (TFB), the authors provide a robust platform for comparing TSF methodologies across diverse datasets and methods, thereby setting a new standard for empirical evaluations in the field.

Comprehensive Dataset Inclusion

One of the paper's key contributions is TFB's extensive dataset collection. Recognizing that existing benchmarks often fall short in domain coverage, TFB includes datasets from ten different domains: traffic, electricity, energy, the environment, nature, economics, stock markets, banking, health, and the web. These datasets are systematically characterized by properties such as trend, seasonality, and stationarity, ensuring they represent a wide array of real-world dynamics. The authors employ techniques such as Principal Feature Analysis (PFA) to verify that the selected datasets retain the diverse characteristics vital for generalizable TSF studies.
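To make this kind of characterization concrete, the sketch below estimates two of the named properties, strength of trend and strength of seasonality, using the standard STL-based strength measures. This is an illustrative example only; TFB's actual feature extraction and selection pipeline may use a different feature set.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

def characterize(series: pd.Series, period: int = 12) -> dict:
    """Estimate strength of trend and seasonality via STL decomposition.
    Both measures lie in [0, 1]; higher means stronger. Illustrative only."""
    res = STL(series, period=period).fit()
    resid_var = np.var(res.resid)
    trend = max(0.0, 1.0 - resid_var / np.var(res.trend + res.resid))
    seasonality = max(0.0, 1.0 - resid_var / np.var(res.seasonal + res.resid))
    return {"trend": trend, "seasonality": seasonality}

# Example: a trending, strongly seasonal synthetic series scores high on both.
t = np.arange(240)
y = pd.Series(0.05 * t + np.sin(2 * np.pi * t / 12) + 0.1 * np.random.randn(240))
print(characterize(y, period=12))
```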

Methodological Diversity and Fairness

The TFB framework is notable for its comprehensive coverage of TSF methodologies, incorporating statistical learning, machine learning, and deep learning paradigms. This inclusion is a deliberate attempt to dispel the stereotype bias that favors modern deep learning methods while discounting traditional approaches such as ARIMA and VAR. The paper provides concrete evidence that statistical methods can outperform recent state-of-the-art (SOTA) approaches under specific conditions, emphasizing the need for balanced evaluations. This broader methodological scope, combined with varied evaluation strategies and metrics, supports fair, unbiased performance comparisons.
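Fair comparison of such heterogeneous methods requires that statistical and neural models sit behind one interface, so identical data splits and metrics apply to all of them. The following is a hypothetical sketch of such an adapter layer; the class names are illustrative and are not TFB's actual API.

```python
from abc import ABC, abstractmethod
import numpy as np

class Forecaster(ABC):
    """Minimal common contract: every method, from ARIMA to a Transformer,
    is wrapped to expose the same fit/predict interface."""

    @abstractmethod
    def fit(self, history: np.ndarray) -> "Forecaster":
        ...

    @abstractmethod
    def predict(self, horizon: int) -> np.ndarray:
        ...

class DriftBaseline(Forecaster):
    """Classical baseline: extend the line between the first and last point."""

    def fit(self, history: np.ndarray) -> "Forecaster":
        self._last = history[-1]
        self._slope = (history[-1] - history[0]) / max(len(history) - 1, 1)
        return self

    def predict(self, horizon: int) -> np.ndarray:
        return self._last + self._slope * np.arange(1, horizon + 1)
```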

Flexible and Scalable Evaluation Pipeline

TFB introduces a unified framework for evaluating TSF methods, offering consistency and flexibility that previous benchmarks lack. The framework supports both fixed and rolling forecasting strategies and various performance metrics, facilitating a comprehensive analysis of method strengths and weaknesses. The evaluation pipeline is designed to be scalable and adaptable, enabling the integration of emerging methods and diverse datasets. This ensures the benchmark's longevity and relevance as the field evolves.
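A minimal sketch of the two strategies, reusing the hypothetical Forecaster interface from the previous example: "fixed" evaluates a single train/test split, while "rolling" re-evaluates at successive forecast origins. Details such as refitting policy and window alignment are simplified here.

```python
import numpy as np

def evaluate(series, make_model, horizon, n_windows=1, stride=1):
    """Rolling-origin evaluation; n_windows=1 reduces to the fixed strategy.
    Assumes the series is long enough for all windows. Returns mean MAE."""
    errors = []
    for i in range(n_windows):
        cut = len(series) - horizon - (n_windows - 1 - i) * stride
        train, test = series[:cut], series[cut:cut + horizon]
        forecast = make_model().fit(train).predict(horizon)
        errors.append(np.mean(np.abs(forecast - test)))
    return float(np.mean(errors))

# Fixed strategy: one split. Rolling: five origins advancing by one horizon.
# fixed = evaluate(y, DriftBaseline, horizon=24)
# rolling = evaluate(y, DriftBaseline, horizon=24, n_windows=5, stride=24)
```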

Evaluation Results and Observations

Employing TFB, the authors conducted an extensive evaluation of 21 univariate TSF methods on 8,068 univariate time series and 14 multivariate TSF methods on 25 datasets. The results provided novel insights into method performance across different dataset characteristics, notably that linear methods excel on datasets with pronounced trends and shifts, whereas transformer-based models perform robustly on datasets with strong seasonality and nonlinear interactions. The benchmark also highlights the importance of modeling channel dependencies in multivariate time series for improved performance.
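For concreteness, the kinds of accuracy metrics such a comparison relies on can be sketched as follows: MAE is scale-dependent, while sMAPE and MASE are scale-free, the latter following Hyndman and Koehler's definition. TFB's exact metric set may differ from this illustrative selection.

```python
import numpy as np

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

def smape(y, yhat):
    # Symmetric MAPE in percent; scale-free but unstable near zero values.
    denom = (np.abs(y) + np.abs(yhat)) / 2.0
    return float(100.0 * np.mean(np.abs(y - yhat) / denom))

def mase(y, yhat, y_train, m=1):
    # Error scaled by the in-sample MAE of a seasonal-naive forecast with
    # period m; values below 1 beat the naive baseline.
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return float(np.mean(np.abs(y - yhat)) / scale)
```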

Implications and Future Prospects

The introduction of TFB marks a significant advance in how TSF methods are empirically evaluated. The benchmark emphasizes the importance of diverse datasets and methods and showcases the impact of dataset characteristics on model performance. Practically, TFB provides TSF researchers with a rigorous framework in which to test new methods, helping ensure that developed techniques are both effective and efficient across a wide range of scenarios. Theoretically, TFB opens new avenues for exploring interactions between dataset characteristics and method performance, potentially guiding the development of innovative TSF techniques.

The findings in this paper could pave the way for more targeted TSF research, focusing on creating models that capitalize on specific dataset characteristics. As AI continues to evolve, the TFB benchmark will likely serve as an essential tool, driving progress by providing a fair and comprehensive platform for method comparison. Looking forward, expanding TFB's dataset diversity further and integrating even more machine learning advancements could reinforce its position as a cornerstone in TSF methodological research.