
Infinite forecast combinations based on Dirichlet process (2311.12379v2)

Published 21 Nov 2023 in cs.LG, cs.AI, and stat.ML

Abstract: Forecast combination integrates information from various sources by consolidating multiple forecasts of the target time series. Rather than requiring the selection of a single optimal forecasting model, this paper introduces a deep learning ensemble forecasting model based on the Dirichlet process. First, the learning rate is sampled using three basis distributions as hyperparameters, converting the infinite mixture into a finite one. All checkpoints are collected to build a pool of deep learning sub-models, and weight adjustment and diversity strategies are applied during the combination process. The main advantage of this method is that it generates the required base learners in a single training run, using a decaying strategy to address the difficulty of determining the optimal learning rate under the stochasticity of gradient descent. To assess the method's generalizability and competitiveness, the paper conducts an empirical analysis on the weekly dataset of the M4 competition and explores sensitivity to the number of models to be combined. The results show that the proposed ensemble model offers substantial improvements in prediction accuracy and stability over a single benchmark model.
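The core mechanism described in the abstract, sampling learning rates from a Dirichlet process mixture truncated to finitely many components and then averaging forecasts from checkpoints collected along a single training run, can be sketched as follows. This is a minimal illustration only: the specific basis distributions, the truncation level, and the `combine_forecasts` weighting are hypothetical choices, not the paper's actual hyperparameters or its weight-adjustment and diversity strategies.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_weights(alpha, truncation):
    """Truncated stick-breaking construction of Dirichlet process mixture weights."""
    betas = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining
    return weights / weights.sum()  # renormalise after truncation

# Hypothetical basis distributions over the learning rate; the paper uses three
# basis distributions, but this sketch does not assume which ones.
BASIS_SAMPLERS = [
    lambda: 10 ** rng.uniform(-4.0, -2.0),   # log-uniform
    lambda: rng.gamma(2.0, 5e-4),            # gamma
    lambda: abs(rng.normal(1e-3, 5e-4)),     # half-normal
]

def sample_learning_rates(n_checkpoints, alpha=1.0, truncation=20):
    """Draw one learning rate per checkpoint from a truncated DP mixture."""
    weights = stick_breaking_weights(alpha, truncation)
    # Each mixture component is tied to one of the basis distributions.
    component_basis = rng.integers(0, len(BASIS_SAMPLERS), size=truncation)
    components = rng.choice(truncation, size=n_checkpoints, p=weights)
    return np.array([BASIS_SAMPLERS[component_basis[c]]() for c in components])

def combine_forecasts(checkpoint_forecasts, val_losses):
    """Weight sub-model forecasts by inverse validation loss; a simple stand-in
    for the paper's weight-adjustment and diversity strategies."""
    val_losses = np.asarray(val_losses, dtype=float)
    weights = (1.0 / val_losses) / np.sum(1.0 / val_losses)
    return np.average(checkpoint_forecasts, axis=0, weights=weights)

if __name__ == "__main__":
    lrs = sample_learning_rates(n_checkpoints=5)
    print("sampled learning rates:", lrs)
    # Toy forecasts from 5 checkpoints over a 13-step weekly horizon.
    forecasts = rng.normal(loc=100.0, scale=2.0, size=(5, 13))
    val_losses = rng.uniform(0.5, 1.5, size=5)
    print("combined forecast:", combine_forecasts(forecasts, val_losses))
```

In this sketch the checkpoints all come from one notional training run, mirroring the paper's claim that the base learners are generated in a single training process; only the learning rate assigned to each checkpoint varies, as drawn from the truncated mixture.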
