Sample-based Uncertainty Quantification with a Single Deterministic Neural Network (2209.08418v2)

Published 17 Sep 2022 in cs.LG

Abstract: Development of an accurate, flexible, and numerically efficient uncertainty quantification (UQ) method is one of the fundamental challenges in machine learning. Previously, a UQ method called DISCO Nets was proposed (Bouchacourt et al., 2016), which trains a neural network by minimizing the energy score. In this method, a random noise vector in $\mathbb{R}^{10\text{--}100}$ is concatenated with the original input vector in order to produce a diverse ensemble forecast despite using a single neural network. While this method has shown promising performance on a hand pose estimation task in computer vision, it remained unexplored whether it works equally well for regression on tabular data, and how it competes with more recent advanced UQ methods such as NGBoost. In this paper, we propose an improved neural architecture of DISCO Nets that admits faster and more stable training while using only a compact noise vector of dimension $\sim \mathcal{O}(1)$. We benchmark this approach on miscellaneous real-world tabular datasets and confirm that it is competitive with or even superior to standard UQ baselines. Moreover, we observe that it exhibits better point forecast performance than a neural network of the same size trained with the conventional mean squared error. As another advantage of the proposed method, we show that local feature importance computation methods such as SHAP can be easily applied to any subregion of the predictive distribution. A new elementary proof for the validity of using the energy score to learn predictive distributions is also provided.
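The mechanism described in the abstract can be sketched compactly. Below is a minimal PyTorch sketch, not the authors' implementation: a small noise vector is concatenated with each input so that repeated forward passes through a single deterministic network yield samples from the predictive distribution, and the network is trained by minimizing a sample-based energy score. All names (`NoiseConcatNet`, `energy_score_loss`), the layer sizes, and the choice `noise_dim=2` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NoiseConcatNet(nn.Module):
    """Single deterministic network; randomness enters only through the
    noise vector concatenated to the input (illustrative sketch)."""

    def __init__(self, in_dim, noise_dim=2, hidden=64, out_dim=1):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x, n_samples=8):
        # Repeat each input n_samples times, pair each copy with fresh noise,
        # and reshape so that dim 1 indexes the predictive samples.
        b = x.shape[0]
        x_rep = x.repeat_interleave(n_samples, dim=0)
        eps = torch.randn(b * n_samples, self.noise_dim, device=x.device)
        out = self.net(torch.cat([x_rep, eps], dim=1))
        return out.view(b, n_samples, -1)  # (batch, n_samples, out_dim)

def energy_score_loss(samples, y):
    # Simple (slightly biased) sample estimate of the energy score:
    #   ES = E||S - y|| - 0.5 * E||S - S'||,  with S, S' drawn from the
    # predictive distribution. Minimizing it fits the whole distribution,
    # not just its mean.
    y = y.unsqueeze(1)                               # (batch, 1, out_dim)
    term1 = (samples - y).norm(dim=-1).mean(dim=1)   # mean over samples
    pairwise = (samples.unsqueeze(1) - samples.unsqueeze(2)).norm(dim=-1)
    term2 = 0.5 * pairwise.mean(dim=(1, 2))          # mean over sample pairs
    return (term1 - term2).mean()

# One training step on random stand-in data.
model = NoiseConcatNet(in_dim=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = energy_score_loss(model(x, n_samples=8), y)
opt.zero_grad(); loss.backward(); opt.step()
```

At prediction time, drawing more noise samples per input gives a finer picture of the predictive distribution; a point forecast can be read off as the sample mean or median.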
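The SHAP remark in the abstract can be illustrated in the same spirit: since the predictions are ordinary samples, any scalar functional of the predictive distribution (a tail mean, a quantile, a subregion probability) is just a function of the inputs and can be handed to a model-agnostic explainer. The sketch below uses `shap.KernelExplainer` and builds on the `model` defined above; the helper `upper_tail_mean` and the particular choice of subregion are illustrative assumptions, not taken from the paper.

```python
import numpy as np
import torch
import shap  # KernelExplainer accepts any black-box function of the inputs

def upper_tail_mean(X_np, n_samples=64):
    # Scalar functional of the predictive distribution: the mean of the
    # samples at or above each row's own 75th percentile ("upper subregion").
    X = torch.as_tensor(np.asarray(X_np), dtype=torch.float32)
    with torch.no_grad():
        s = model(X, n_samples=n_samples).squeeze(-1).numpy()  # (batch, n_samples)
    q75 = np.quantile(s, 0.75, axis=1, keepdims=True)
    return np.nanmean(np.where(s >= q75, s, np.nan), axis=1)

background = np.random.randn(50, 10).astype(np.float32)  # stand-in data
explainer = shap.KernelExplainer(upper_tail_mean, background)
shap_values = explainer.shap_values(background[:5], nsamples=200)
```

Because the forward pass is stochastic, fixing the torch seed (or using a large `n_samples`) makes the explained function effectively deterministic.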

References (66)
  1. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Information Fusion, 76:243–297. arXiv:2011.06225.
  2. Optuna: A Next-generation Hyperparameter Optimization Framework. KDD ’19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2623–2631. arXiv:1907.10902.
  3. Depth Uncertainty in Neural Networks. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020). arXiv:2006.08437.
  4. Short-Term Density Forecasting of Low-Voltage Load using Bernstein-Polynomial Normalizing Flows. arXiv:2204.13939.
  5. Bishop, C. M. (1994). Mixture density networks. Technical Report. Aston University, Birmingham.
  6. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
  7. DISCO Nets: DISsimilarity COefficients Networks. In Advances in Neural Information Processing Systems 29 (NIPS 2016). arXiv:1606.02556.
  8. Instance-Based Uncertainty Estimation for Gradient-Boosted Regression Trees. arXiv:2205.11412.
  9. Cannon, A. J. (2011). Quantile regression neural networks: Implementation in R and application to precipitation downscaling. Computers & Geosciences, 37:1277–1284.
  10. Posterior Network: Uncertainty Estimation without OOD Samples via Density-Based Pseudo-Counts. arXiv:2006.09239.
  11. Multivariate Probabilistic Forecasting of Intraday Electricity Prices using Normalizing Flows. arXiv:2205.13826.
  12. Implicit Quantile Networks for Distributional Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80. arXiv:1806.06923.
  13. Distributional Reinforcement Learning with Quantile Regression. The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18). arXiv:1710.10044.
  14. NGBoost: Natural gradient boosting for probabilistic prediction. Proceedings of the 37th International Conference on Machine Learning, PMLR, pages 2690–2700. arXiv:1910.03225.
  15. A deep generative model for probabilistic energy forecasting in power systems: normalizing flows. arXiv:2106.09370.
  16. A New Metric for Probability Distributions. IEEE Trans. Inf. Theory, 49:1858–1860.
  17. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. Proceedings of The 33rd International Conference on Machine Learning, PMLR, 48:1050–1059. arXiv:1506.02142.
  18. A Survey of Uncertainty in Deep Neural Networks. arXiv:2107.03342.
  19. Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102:359–378.
  20. Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. Test, 17:211–235.
  21. A Kernel Two-Sample Test. Journal of Machine Learning Research, 13:723–773.
  22. Estimating and Evaluating Regression Predictive Uncertainty in Deep Object Detectors. In ICLR 2021. arXiv:2101.05036.
  23. Snapshot Ensembles: Train 1, Get M for Free. In 5th International Conference on Learning Representations, ICLR 2017. arXiv:1704.00109.
  24. Deep Networks with Stochastic Depth. In Leibe, B., Matas, J., Sebe, N., and Welling, M., editors, ECCV 2016, volume 9908 of Lecture Notes in Computer Science. arXiv:1603.09382.
  25. Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Machine Learning, 110:457–506. arXiv:1910.09457.
  26. Conditional Approximate Normalizing Flows for Joint Multi-Step Probabilistic Forecasting with Application to Electricity Demand. arXiv:2201.02753.
  27. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Advances in Neural Information Processing Systems 30 (NIPS 2017), pages 3149–3157.
  28. Self-Normalizing Neural Networks. 31st Conference on Neural Information Processing Systems (NIPS 2017). arXiv:1706.02515.
  29. Normalizing Flows: An Introduction and Review of Current Methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11):3964–3979. arXiv:1908.09257.
  30. On information and sufficiency. Annals of Mathematical Statistics, 22:79–86.
  31. DEUP: Direct Epistemic Uncertainty Prediction. arXiv:2102.08501.
  32. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. Advances in Neural Information Processing Systems 30 (NIPS 2017). arXiv:1612.01474.
  33. Bayesian approach for neural networks—review and case studies. Neural Networks, 14:257–274.
  34. Lundberg, S. (2018). Kernelexplainer. https://shap-lrjball.readthedocs.io/en/latest/generated/shap.KernelExplainer.html.
  35. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017), page 4768. arXiv:1705.07874.
  36. Sparse Spectrum Gaussian Process Regression. Journal of Machine Learning Research, 11:1865–1881.
  37. DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning. arXiv:2004.14547.
  38. A Simple Baseline for Bayesian Uncertainty in Deep Learning. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019). arXiv:1902.02476.
  39. Meinshausen, N. (2006). Quantile Regression Forests. Journal of Machine Learning Research, 7:983–999.
  40. Distributional Gradient Boosting Machines. arXiv:2204.00778.
  41. Distributional Reinforcement Learning via Moment Matching. The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21). arXiv:2007.12354.
  42. Nikulin, M. S. (2001). Hellinger distance. Encyclopedia of Mathematics, EMS Press.
  43. Growing Cosine Unit: A Novel Oscillatory Activation Function That Can Speedup Training and Reduce Parameters in Convolutional Neural Networks.
  44. Probabilistic Forecasting with Generative Networks via Scoring Rule Minimization. arXiv:2112.08217.
  45. Normalizing Flows for Probabilistic Modeling and Inference. Journal of Machine Learning Research, 22:1–64. arXiv:1912.02762.
  46. Uncertainty in Neural Networks: Approximately Bayesian Ensembling. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR, 108:234–244. arXiv:1810.05546.
  47. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825–2830.
  48. Evaluating the quality of scenarios of short-term wind power generation. Applied Energy, 96:12–20.
  49. Gaussian Processes for Machine Learning. MIT Press.
  50. A Deep Learning Approach to Probabilistic Forecasting of Weather. arXiv:2203.12529.
  51. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, volume 11700 of Lecture Notes in Computer Science. Springer Cham.
  52. Scikit-Garden (2017). https://github.com/scikit-garden/scikit-garden.
  53. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Statist., 41:2263–2291. arXiv:1207.6076.
  54. Non-Gaussian Gaussian Processes for Few-Shot Regression. Advances in Neural Information Processing Systems 34 (NeurIPS 2021). arXiv:2110.13561.
  55. The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing. arXiv:1806.05514.
  56. Deep transformation models: Tackling complex regression problems with neural network based transformation models. arXiv:2004.00464.
  57. Sample-based Distributional Policy Gradient. Proceedings of The 4th Annual Learning for Dynamics and Control Conference, PMLR, 168:676–688. arXiv:2001.02652.
  58. Probabilistic gradient boosting machines for large-scale probabilistic regression. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. arXiv:2106.01682.
  59. Szekely, G. J. (2003). E-statistics: Energy of Statistical Samples. Bowling Green State University, Department of Mathematics and Statistics Technical Report No. 03–05.
  60. Testing for Equal Distributions in High Dimension. InterStat. November (5).
  61. Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143:1249–1272.
  62. The Energy of Data. Annual Review of Statistics and Its Application, 4:447–479.
  63. Single-Model Uncertainties for Deep Learning. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS 2019). arXiv:1811.00908.
  64. Efficiently sampling functions from Gaussian process posteriors. Proceedings of the 37th International Conference on Machine Learning, PMLR, 119:10292–10302. arXiv:2002.09309.
  65. Fully Parameterized Quantile Function for Distributional Reinforcement Learning. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). arXiv:1911.02140.
  66. Distributional Reinforcement Learning for Multi-Dimensional Reward Functions. 35th Conference on Neural Information Processing Systems (NeurIPS 2021). arXiv:2110.13578.