Enhancing Diversity in Bayesian Deep Learning via Hyperspherical Energy Minimization of CKA (2411.00259v1)
Abstract: Particle-based Bayesian deep learning often requires a similarity metric to compare two networks. However, naive similarity metrics lack permutation invariance and are therefore ill-suited to comparing networks. Centered Kernel Alignment (CKA) on feature kernels has been proposed for comparing deep networks but has not been used as an optimization objective in Bayesian deep learning. In this paper, we explore the use of CKA in Bayesian deep learning to generate diverse ensembles and hypernetworks that output a network posterior. Noting that CKA projects kernels onto a unit hypersphere, and that directly optimizing the CKA objective yields diminishing gradients when two networks are very similar, we propose applying hyperspherical energy (HE) on top of CKA kernels to address this drawback and improve training stability. Additionally, by leveraging CKA-based feature kernels, we derive feature-repulsive terms applied to synthetically generated outlier examples. Experiments on both diverse ensembles and hypernetworks show that our approach significantly outperforms baselines in terms of uncertainty quantification on both synthetic and realistic outlier-detection tasks.
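The abstract's core ingredients can be sketched numerically. The following is a minimal illustration, not the paper's implementation: it uses linear CKA (one common choice; the paper's kernels may differ) and a Riesz-style inverse-distance energy as the hyperspherical-energy term, with the exponent `s` and function names chosen for the example. Because normalized centered kernels lie on a unit hypersphere, the chordal distance between two networks is `sqrt(2 - 2*CKA)`, and an inverse-distance energy keeps the repulsive gradient large even as CKA approaches 1, which is where the raw CKA objective's gradient vanishes.

```python
import numpy as np

def _center(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n x d1) and Y (n x d2),
    evaluated on the same n inputs. Returns a value in [0, 1];
    permutation-invariant in the feature dimension."""
    K = _center(X @ X.T)
    L = _center(Y @ Y.T)
    hsic = (K * L).sum()  # Frobenius inner product of centered kernels
    return hsic / (np.linalg.norm(K) * np.linalg.norm(L))

def hyperspherical_energy(feats, s=1.0, eps=1e-8):
    """Riesz-s energy over pairwise CKA-induced chordal distances
    d_ij = sqrt(2 - 2 * CKA_ij). Minimizing sum_{i<j} d_ij^{-s}
    repels networks on the kernel hypersphere; unlike maximizing
    (1 - CKA), its gradient does not vanish as CKA_ij -> 1."""
    m = len(feats)
    energy = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            c = linear_cka(feats[i], feats[j])
            d = np.sqrt(max(2.0 - 2.0 * c, eps))
            energy += d ** (-s)
    return energy
```

In the paper's setting, `feats` would be per-network activations on a shared batch (e.g. the synthetic outlier examples for the feature-repulsive term), and the energy would enter the particle-update or hypernetwork loss as a repulsion term alongside the likelihood.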