Evaluating time series forecasting models: An empirical study on performance estimation methods (1905.11744v1)

Published 28 May 2019 in cs.LG and stat.ML

Abstract: Performance estimation aims at estimating the loss that a predictive model will incur on unseen data. These procedures are part of the pipeline in every machine learning project and are used for assessing the overall generalisation ability of predictive models. In this paper we address the application of these methods to time series forecasting tasks. For independent and identically distributed data the most common approach is cross-validation. However, the dependency among observations in time series raises some caveats about the most appropriate way to estimate performance in this type of data and currently there is no settled way to do so. We compare different variants of cross-validation and of out-of-sample approaches using two case studies: One with 62 real-world time series and another with three synthetic time series. Results show noticeable differences in the performance estimation methods in the two scenarios. In particular, empirical experiments suggest that cross-validation approaches can be applied to stationary time series. However, in real-world scenarios, when different sources of non-stationary variation are at play, the most accurate estimates are produced by out-of-sample methods that preserve the temporal order of observations.

Citations (209)

Summary

  • The paper rigorously compares out-of-sample (OOS) and modified cross-validation methods, focusing on how each handles the temporal dependencies among observations.
  • It demonstrates that cross-validation works well for stationary series, while OOS methods produce more accurate estimates in non-stationary settings.
  • The study highlights the importance of choosing performance estimation techniques that match the characteristics of the data, improving both model selection and hyperparameter tuning.

Evaluating Time Series Forecasting Models: An Empirical Study

The paper by Cerqueira, Torgo, and Mozetič provides a comprehensive empirical analysis of performance estimation methods for time series forecasting. This research is crucial as the methods employed for performance estimation significantly influence model selection and hyperparameter tuning in machine learning tasks involving time series data.

Overview of Methodologies

The paper evaluates various performance estimation techniques, primarily focusing on two categories: Out-of-sample (OOS) methods and cross-validation (CVAL) approaches. OOS methods, such as Holdout and Repeated Holdout, preserve the temporal order of observations, thereby maintaining temporal dependencies inherent in time series data. Conversely, traditional cross-validation, which assumes i.i.d. data, is challenging to adapt for time-dependent datasets. However, the paper explores modified CVAL techniques, including Blocked Cross-Validation (CV-Bl), Modified Cross-Validation (CV-Mod), and hv-Blocked Cross-Validation (CV-hvBl), all designed to address dependencies in time series.
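
As a rough illustration (not the authors' code), the split logic behind blocked cross-validation can be sketched in a few lines of Python. The function name `blocked_cv_splits` and its parameters are hypothetical, and the `gap` option is meant to evoke the idea behind the hv-blocked variant rather than reproduce its exact definition:

```python
import numpy as np

def blocked_cv_splits(n_obs, n_folds=5, gap=0):
    """Blocked cross-validation (CV-Bl): cut the series into
    `n_folds` contiguous, non-shuffled blocks; each block serves once
    as the test set and the remaining blocks as training data.
    With gap > 0 this mimics an hv-blocked scheme (CV-hvBl): the
    training observations within `gap` steps of the test block are
    discarded to weaken the dependency between training and test sets."""
    blocks = np.array_split(np.arange(n_obs), n_folds)
    for k, test_idx in enumerate(blocks):
        lo, hi = test_idx[0] - gap, test_idx[-1] + gap
        train_idx = np.concatenate(
            [b for i, b in enumerate(blocks) if i != k]
        )
        # Drop training indices too close to the test block.
        train_idx = train_idx[(train_idx < lo) | (train_idx > hi)]
        yield train_idx, test_idx

# Example: 5 blocked folds over 100 observations, gap of 3 on each side
for train_idx, test_idx in blocked_cv_splits(100, n_folds=5, gap=3):
    print(f"test {test_idx[0]}-{test_idx[-1]}, train size {len(train_idx)}")
```

The key design choice is that observations are never shuffled: each fold is a contiguous slice of the series, so the dependency structure within blocks stays intact.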

Empirical Evaluation

The paper involves two case studies: one with 62 real-world time series and another with three synthetic stationary time series. The authors systematically compare the performance estimation methods in both settings.

  1. Synthetic Time Series: Confirming previous research, the paper finds that cross-validation approaches generally offer superior performance estimates compared to simple out-of-sample procedures in stationary environments.
  2. Real-World Time Series: The results deviate significantly on non-stationary, complex real-world datasets. Here, traditional cross-validation methods fall short, and OOS methods, especially the Repeated Holdout (sketched after this list), tend to provide more accurate performance estimates. The authors argue that real-world scenarios, with potential non-stationarities, benefit from methods that maintain the temporal order of observations.
  3. Impact of Stationarity: The paper highlights that stationarity strongly affects the reliability of performance estimation methods. Stationary time series benefit from cross-validation, whereas non-stationary series are better served by out-of-sample techniques.
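
For concreteness, here is a minimal sketch of a repeated-holdout scheme of the kind discussed above, assuming each repetition samples a random cut point and takes a training window immediately before it and a test window immediately after it. The function name, window fractions, and sampling details are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(42)

def repeated_holdout_splits(n_obs, n_reps=10, train_frac=0.6, test_frac=0.1):
    """Repeated holdout for time series: in each repetition, draw a
    random cut point, train on the window just before it, and test on
    the window just after it, so every split preserves temporal order."""
    train_len = int(n_obs * train_frac)
    test_len = int(n_obs * test_frac)
    for _ in range(n_reps):
        # The cut point must leave room for both windows.
        cut = rng.integers(train_len, n_obs - test_len + 1)
        train_idx = np.arange(cut - train_len, cut)
        test_idx = np.arange(cut, cut + test_len)
        yield train_idx, test_idx

# Example: 3 repetitions over a series of 100 observations
for train_idx, test_idx in repeated_holdout_splits(100, n_reps=3):
    print(f"train {train_idx[0]}-{train_idx[-1]}, test {test_idx[0]}-{test_idx[-1]}")
```

Averaging the forecast error over the repetitions yields the performance estimate, while every individual split still respects the temporal order of the data.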

Implications and Future Directions

The findings have several implications for researchers and practitioners:

  • Practical Application: The appropriate performance estimation method depends on the stationarity and complexity of the time series at hand. Model assessments should therefore take into account whether the data is stationary or non-stationary.
  • Future Developments: There is room for novel CVAL methods that fully address temporal dependencies in non-stationary time series. Reducing the optimistic or pessimistic bias in the error estimates of existing methods is another promising direction.

Conclusion

This paper rigorously evaluates performance estimation techniques in time series forecasting, revealing nuanced insights that challenge the blanket adoption of cross-validation. In doing so, it underscores the need to account for the temporal structure of the data when designing evaluation strategies. As machine learning is applied to ever more time-dependent problems, performance estimation methods tailored to such data will be instrumental in reliable model selection and assessment.