Improved uncertainty quantification for neural networks with Bayesian last layer (2302.10975v3)

Published 21 Feb 2023 in cs.LG, cs.SY, and eess.SY

Abstract: Uncertainty quantification is an important task in machine learning - a task in which standard neural networks (NNs) have traditionally not excelled. This can be a limitation for safety-critical applications, where uncertainty-aware methods like Gaussian processes or Bayesian linear regression are often preferred. Bayesian neural networks are an approach to address this limitation. They assume probability distributions for all parameters and yield distributed predictions. However, training and inference are typically intractable and approximations must be employed. A promising approximation is NNs with Bayesian last layer (BLL). They assume distributed weights only in the linear output layer and yield a normally distributed prediction. To approximate the intractable Bayesian neural network, point estimates of the distributed weights in all but the last layer should be obtained by maximizing the marginal likelihood. This has previously been challenging, as the marginal likelihood is expensive to evaluate in this setting. We present a reformulation of the log-marginal likelihood of a NN with BLL which allows for efficient training using backpropagation. Furthermore, we address the challenge of uncertainty quantification for extrapolation points. We provide a metric to quantify the degree of extrapolation and derive a method to improve the uncertainty quantification for these points. Our methods are derived for the multivariate case and demonstrated in a simulation study. In comparison to Bayesian linear regression with fixed features, and a Bayesian neural network trained with variational inference, our proposed method achieves the highest log-predictive density on test data.
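To make the setting concrete, the sketch below shows the textbook weight-space form of the log-marginal likelihood (evidence) for a Bayesian last layer, which is ordinary Bayesian linear regression on the features produced by the network body. This is a baseline illustration under assumed notation (prior w ~ N(0, alpha^{-1} I), noise N(0, beta^{-1}), features Phi = phi_theta(X)), not the paper's specific reformulation; the function name `bll_log_evidence` and the log-parameterized hyperparameters are illustrative choices.

```python
import torch

def bll_log_evidence(Phi, y, log_alpha, log_beta):
    """Log-marginal likelihood (evidence) of a Bayesian last layer.

    Textbook Bayesian linear regression in weight space, assuming
    prior w ~ N(0, alpha^{-1} I_d) and noise eps ~ N(0, beta^{-1}):

        A = alpha*I + beta*Phi^T Phi              (posterior precision)
        m = beta * A^{-1} Phi^T y                 (posterior mean)
        log p(y) = d/2 log alpha + N/2 log beta
                   - beta/2 ||y - Phi m||^2 - alpha/2 m^T m
                   - 1/2 log|A| - N/2 log 2*pi

    Phi: (N, d) features from the network body phi_theta(X); y: (N,).
    The result is differentiable in Phi (hence in the body weights
    theta) and in log_alpha / log_beta, so it can be maximized with
    backpropagation, as the abstract describes.
    """
    N, d = Phi.shape
    alpha, beta = log_alpha.exp(), log_beta.exp()
    A = alpha * torch.eye(d, dtype=Phi.dtype) + beta * Phi.T @ Phi
    L = torch.linalg.cholesky(A)
    m = beta * torch.cholesky_solve(Phi.T @ y.unsqueeze(-1), L).squeeze(-1)
    fit = 0.5 * beta * (y - Phi @ m).pow(2).sum() + 0.5 * alpha * m @ m
    logdet_A = 2.0 * L.diagonal().log().sum()
    return (0.5 * d * log_alpha + 0.5 * N * log_beta
            - fit - 0.5 * logdet_A
            - 0.5 * N * torch.log(torch.tensor(2.0 * torch.pi, dtype=Phi.dtype)))
```

Training would maximize this quantity jointly over the body weights and the log-hyperparameters; the resulting predictive distribution for a test point x* is Gaussian with mean m^T phi_theta(x*) and variance beta^{-1} + phi_theta(x*)^T A^{-1} phi_theta(x*). Note that this naive form is only a baseline: the paper's contributions - an efficient reformulation of this evidence and an extrapolation metric with improved uncertainty quantification - are not reproduced here.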
