
Partially Stochastic Infinitely Deep Bayesian Neural Networks (2402.03495v4)

Published 5 Feb 2024 in cs.LG and math.PR

Abstract: In this paper, we present Partially Stochastic Infinitely Deep Bayesian Neural Networks, a novel family of architectures that integrates partial stochasticity into the framework of infinitely deep neural networks. Our new class of architectures is designed to improve the computational efficiency of existing architectures at training and inference time. To do this, we leverage the advantages of partial stochasticity in the infinite-depth limit which include the benefits of full stochasticity e.g. robustness, uncertainty quantification, and memory efficiency, whilst improving their limitations around computational complexity. We present a variety of architectural configurations, offering flexibility in network design including different methods for weight partition. We also provide mathematical guarantees on the expressivity of our models by establishing that our network family qualifies as Universal Conditional Distribution Approximators. Lastly, empirical evaluations across multiple tasks show that our proposed architectures achieve better downstream task performance and uncertainty quantification than their counterparts while being significantly more efficient. The code can be found at \url{https://github.com/Sergio20f/part_stoch_inf_deep}
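The abstract describes partitioning the weights of a continuous-depth (infinitely deep) network into a deterministic subset and a stochastic subset. The sketch below is a minimal, hypothetical illustration of that idea in plain PyTorch, not the authors' implementation (see the linked repository for that): the hidden-layer weights are deterministic, a single output weight matrix is sampled from a learned Gaussian variational posterior via the reparameterisation trick, and fixed-step Euler integration stands in for the continuous-depth limit. All class, method, and parameter names here are assumptions made for illustration.

```python
# Illustrative sketch only: a partially stochastic continuous-depth block.
# Hypothetical names; not taken from the paper's codebase.
import torch
import torch.nn as nn


class PartiallyStochasticODEBlock(nn.Module):
    def __init__(self, dim: int, hidden: int = 64, n_steps: int = 20):
        super().__init__()
        self.n_steps = n_steps
        # Deterministic part of the drift network (fixed point-estimate weights).
        self.hidden_net = nn.Sequential(nn.Linear(dim + 1, hidden), nn.Tanh())
        # Stochastic part: mean and log-std of a Gaussian over the output weights.
        self.w_mu = nn.Parameter(torch.zeros(hidden, dim))
        self.w_logstd = nn.Parameter(torch.full((hidden, dim), -3.0))

    def drift(self, t: torch.Tensor, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        # Drift f(x, t): time is appended as an extra input feature.
        t_col = t.expand(x.shape[0], 1)
        h = self.hidden_net(torch.cat([x, t_col], dim=-1))
        return h @ w

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sample the stochastic weights once per forward pass (reparameterisation trick).
        w = self.w_mu + torch.randn_like(self.w_mu) * self.w_logstd.exp()
        # Fixed-step Euler integration over t in [0, 1] as a crude stand-in
        # for the continuous-depth (infinitely deep) limit.
        dt = 1.0 / self.n_steps
        t = torch.zeros(1)
        for _ in range(self.n_steps):
            x = x + dt * self.drift(t, x, w)
            t = t + dt
        return x


if __name__ == "__main__":
    block = PartiallyStochasticODEBlock(dim=8)
    out = block(torch.randn(4, 8))  # repeated calls draw different weight samples
    print(out.shape)
```

Averaging predictions over several forward passes (i.e. over several samples of the stochastic weights) gives a simple Monte Carlo estimate of predictive uncertainty, which is the kind of uncertainty quantification the abstract refers to, while the deterministic portion of the weights keeps training and inference cheaper than a fully stochastic network.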
