Ensemble Multi-Quantiles: Adaptively Flexible Distribution Prediction for Uncertainty Quantification (2211.14545v3)
Abstract: We propose a novel, succinct, and effective approach to distribution prediction for quantifying uncertainty in machine learning. It provides adaptively flexible prediction of the conditional distribution $\mathbb{P}(\mathbf{y}|\mathbf{X}=x)$ in regression tasks. Quantiles of this distribution, at probability levels spread across the interval $(0,1)$, are boosted by additive models that we design with intuition and interpretability in mind. We seek an adaptive balance between structural integrity and flexibility for $\mathbb{P}(\mathbf{y}|\mathbf{X}=x)$: a Gaussian assumption is too rigid for real data, while highly flexible approaches (e.g., estimating each quantile separately, without any overarching distributional structure) have their own drawbacks and may generalize poorly. Our ensemble multi-quantiles approach, called EMQ, is entirely data-driven; it can gradually depart from the Gaussian initialization and discover the optimal conditional distribution during boosting. On extensive regression tasks drawn from UCI datasets, EMQ achieves state-of-the-art performance compared with many recent uncertainty quantification methods. Visualizations further illustrate the necessity and the merits of such an ensemble model.
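To make the mechanism the abstract sketches more concrete, below is a minimal Python sketch of multi-quantile boosting with a Gaussian start. This illustrates the general recipe the abstract describes, not the authors' exact EMQ algorithm: the pinball-loss gradient, the depth-2 regression trees used as additive base learners, and the sorting-based non-crossing fix are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.tree import DecisionTreeRegressor

def pinball_grad(y, pred, tau):
    # Negative gradient of the pinball (quantile) loss at level tau:
    # tau where y exceeds the current quantile estimate, tau - 1 otherwise.
    return np.where(y > pred, tau, tau - 1.0)

def multi_quantile_boost(X, y, taus, n_rounds=100, lr=0.05):
    # Gaussian initialization: every level tau starts from the same
    # unconditional N(mean, std^2) quantile of y, then boosting bends
    # each level toward the data, one additive step per round.
    mu, sigma = y.mean(), y.std()
    preds = np.stack([np.full(len(y), norm.ppf(t, mu, sigma)) for t in taus])
    models = [[] for _ in taus]
    for _ in range(n_rounds):
        for i, tau in enumerate(taus):
            g = pinball_grad(y, preds[i], tau)           # pseudo-residuals
            tree = DecisionTreeRegressor(max_depth=2).fit(X, g)
            preds[i] += lr * tree.predict(X)
            models[i].append(tree)
    # Post-hoc non-crossing fix: sort the quantile levels at each x.
    preds = np.sort(preds, axis=0)
    return preds, models
```

For instance, calling `multi_quantile_boost(X, y, np.linspace(0.05, 0.95, 19))` would estimate 19 conditional quantiles jointly. Because every level starts from the same fitted Gaussian, the ensemble departs from Gaussianity only as far as the data demand, which is the adaptive balance the abstract emphasizes.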