Infinite forecast combinations based on Dirichlet process (2311.12379v2)
Abstract: Forecast combination integrates information from multiple sources by consolidating several forecasts of the target time series, removing the need to select a single optimal forecasting model. This paper introduces a deep learning ensemble forecasting model based on the Dirichlet process. First, the learning rate is sampled using three base distributions as hyperparameters, converting the infinite mixture into a finite one. All checkpoints are then collected to build a pool of deep learning sub-models, and weight adjustment and diversity strategies are applied during combination. The main advantage of this method is that it generates the required base learners in a single training run, using a learning-rate decay strategy to cope with the difficulty of choosing an optimal learning rate under the stochasticity of gradient descent. To assess the method's generalizability and competitiveness, an empirical analysis is conducted on the weekly dataset from the M4 competition, together with a sensitivity analysis of the number of models combined. The results show that the proposed ensemble model delivers substantial improvements in prediction accuracy and stability over a single benchmark model.
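The pipeline described in the abstract (a Dirichlet process truncated to a finite mixture over learning rates, checkpoints from a single training run forming a sub-model pool, and a weighted combination of their forecasts) can be illustrated with a minimal sketch. The sketch below is not the paper's implementation: the linear autoregressive sub-model, the toy series, the three base distributions, the decay factor, and the inverse-error weighting rule are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): a truncated Dirichlet-process mixture
# over learning rates, a checkpoint pool from a single SGD run, and a weighted
# forecast combination. Model, data, and weighting rule are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

# --- 1. Truncated stick-breaking: turn the infinite DP mixture into a finite one ---
def stick_breaking_weights(alpha, truncation):
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

K = 5                                    # number of mixture components kept
mix_w = stick_breaking_weights(alpha=1.0, truncation=K)
mix_w /= mix_w.sum()                     # renormalise after truncation

# Base measures for the learning rate (assumed: three simple distributions)
base_draws = [lambda: rng.uniform(1e-3, 1e-1),
              lambda: 10 ** rng.uniform(-3, -1),
              lambda: abs(rng.normal(1e-2, 5e-3)) + 1e-4]
atoms = np.array([base_draws[k % 3]() for k in range(K)])
lr0 = float(rng.choice(atoms, p=mix_w))  # initial learning rate drawn from the finite mixture

# --- 2. One SGD run with a decaying learning rate; checkpoints form the sub-model pool ---
T, lags = 300, 4
series = np.sin(np.arange(T) / 8.0) + 0.1 * rng.normal(size=T)  # toy series
X = np.stack([series[i:i + lags] for i in range(T - lags)])
y = series[lags:]
w = np.zeros(lags)                       # illustrative linear autoregressive sub-model
pool = []
for epoch in range(50):
    lr = lr0 * 0.95 ** epoch             # decay reduces sensitivity to the initial rate
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad
    if epoch % 10 == 9:                  # collect periodic checkpoints as base learners
        pool.append(w.copy())

# --- 3. Combine checkpoint forecasts with error-based weights (illustrative rule) ---
val_err = np.array([np.mean((X @ m - y) ** 2) for m in pool])
comb_w = (1.0 / val_err) / np.sum(1.0 / val_err)
last_window = series[-lags:]
forecast = float(sum(cw * (m @ last_window) for cw, m in zip(comb_w, pool)))
print(f"initial learning rate {lr0:.4f}, combined one-step forecast {forecast:.3f}")
```

Stick-breaking truncation is the standard device for reducing a Dirichlet process mixture to finitely many components; here the mass beyond the truncation level is simply renormalised away, and the checkpoint pool plays the role of the base learners obtained from a single training process.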