Random-Set Neural Networks (RS-NN) (2307.05772v2)
Abstract: Machine learning is increasingly deployed in safety-critical domains where robustness against adversarial attacks is crucial and erroneous predictions could lead to potentially catastrophic consequences. This highlights the need for learning systems to be equipped with the means to determine a model's confidence in its prediction and the epistemic uncertainty associated with it, 'to know when a model does not know'. In this paper, we propose a novel Random-Set Neural Network (RS-NN) for classification. RS-NN predicts belief functions rather than probability vectors over a set of classes using the mathematics of random sets, i.e., distributions over the power set of the sample space. RS-NN encodes the 'epistemic' uncertainty induced in machine learning by limited training sets via the size of the credal sets associated with the predicted belief functions. Our approach outperforms state-of-the-art Bayesian (LB-BNN, BNN-R) and Ensemble (ENN) methods in a classical evaluation setting in terms of performance, uncertainty estimation and out-of-distribution (OoD) detection on several benchmarks (CIFAR-10 vs SVHN/Intel-Image, MNIST vs FMNIST/KMNIST, ImageNet vs ImageNet-O) and scales effectively to large-scale architectures such as WideResNet-28-10, VGG16, Inception V3, EfficientNetB2, and ViT-Base.
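The key objects in the abstract, a belief function defined by masses on the power set of the class space and the induced lower/upper probability bounds whose gap reflects epistemic uncertainty, can be illustrated with a small sketch. This is not the paper's implementation; the class names and mass values below are invented for illustration.

```python
# Illustrative sketch only (not RS-NN code): a mass function over the power
# set of classes, the belief/plausibility bounds it induces, and the width
# of the [Bel, Pl] interval as a proxy for epistemic uncertainty.

classes = ("cat", "dog", "bird")

# A mass function m: 2^classes -> [0, 1] summing to 1. Mass assigned to
# non-singleton sets encodes ignorance about which element holds.
m = {
    frozenset({"cat"}): 0.5,
    frozenset({"dog"}): 0.2,
    frozenset({"cat", "dog"}): 0.2,   # cannot tell cat from dog apart
    frozenset(classes): 0.1,          # total-ignorance component
}
assert abs(sum(m.values()) - 1.0) < 1e-12

def belief(A):
    """Bel(A): total mass of subsets of A (lower probability of A)."""
    return sum(v for B, v in m.items() if B <= A)

def plausibility(A):
    """Pl(A): total mass of sets intersecting A (upper probability of A)."""
    return sum(v for B, v in m.items() if B & A)

for c in classes:
    lo, hi = belief(frozenset({c})), plausibility(frozenset({c}))
    # [Bel, Pl] bounds every probability consistent with m (the credal set);
    # a wider interval means more epistemic uncertainty about class c.
    print(f"{c}: [{lo:.2f}, {hi:.2f}]  width = {hi - lo:.2f}")
```

Here "cat" gets the interval [0.50, 0.80] and "bird" gets [0.00, 0.10]: the narrower bird interval reflects less ignorance about that class, which is the intuition behind using credal-set size as an uncertainty measure.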
- Active learning: A survey. In Data Classification: Algorithms and Applications, pages 571–605. CRC Press, 2014.
- A gentle introduction to conformal prediction and distribution-free uncertainty quantification. CoRR, abs/2107.07511, 2021.
- Credal sets approximation by lower probabilities: Application to credal networks. In Eyke Hüllermeier, Rudolf Kruse, and Frank Hoffmann, editors, Computational Intelligence for Knowledge-Based Systems Design, volume 6178 of Lecture Notes in Computer Science, pages 716–725. Springer, Berlin Heidelberg, 2010.
- Puneet Bansal. Intel image classification - image scene classification of multiclass. https://www.kaggle.com/datasets/puneet6060/intel-image-classification.
- Bayesian back-propagation. Complex Systems, 5, 1991.
- Ensemble selection from libraries of models. In Proceedings of the Twenty-First International Conference on Machine Learning, ICML ’04, page 18, New York, NY, USA, 2004. Association for Computing Machinery.
- Some characterizations of lower probabilities and other monotone capacities through the use of Möbius inversion. Mathematical Social Sciences, 17(3):263–283, 1989.
- Learning reliable classifiers from small or incomplete data sets: The naive credal classifier 2. Journal of Machine Learning Research, 9:581–621, 2008.
- Fabio Cuzzolin. Geometry of upper probabilities. In ISIPTA, pages 188–203, 2003.
- Fabio Cuzzolin. A geometric approach to the theory of evidence. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(4):522–534, 2008.
- Fabio Cuzzolin. On the credal structure of consistent probabilities. In Steffen Hölldobler, Carsten Lutz, and Heinrich Wansing, editors, Logics in Artificial Intelligence, volume 5293 of Lecture Notes in Computer Science, pages 126–139. Springer, Berlin Heidelberg, 2008.
- Fabio Cuzzolin. Credal semantics of Bayesian transformations in terms of probability intervals. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 40(2):421–432, 2010.
- Fabio Cuzzolin. The geometry of consonant belief functions: simplicial complexes of necessity measures. Fuzzy Sets and Systems, 161(10):1459–1479, 2010.
- Fabio Cuzzolin. Lp consonant approximations of belief functions. IEEE Transactions on Fuzzy Systems, 22(2):420–436, 2013.
- Fabio Cuzzolin. Visions of a generalized probability theory. arXiv preprint arXiv:1810.10341, 2018.
- Fabio Cuzzolin. The Geometry of Uncertainty: The Geometry of Imprecise Probabilities. Artificial Intelligence: Foundations, Theory, and Algorithms. Springer International Publishing, 2020.
- Geometric analysis of belief space and conditional subspaces. In ISIPTA, pages 122–132, 2001.
- Laplace redux - effortless Bayesian deep learning. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 20089–20103. Curran Associates, Inc., 2021.
- Bruno de Finetti. Theory of Probability. Wiley, London, 1974.
- Arthur P. Dempster. Upper and lower probability inferences based on a sample from a finite univariate population. Biometrika, 54(3-4):515–528, 1967.
- Interaction transform of set functions over a finite set. Information Sciences, 121(1-2):149–170, 1999.
- T. Denoeux. A neural network classifier based on Dempster–Shafer theory. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 30(2):131–150, 2000.
- Thierry Denoeux. An evidential neural network model for regression based on random fuzzy numbers, 2022.
- Thierry Denœux. NN-EVCLUS: Neural network-based evidential clustering. Information Sciences, 572:297–330, 2021.
- Thierry Denœux. Reasoning with fuzzy and uncertain evidence using epistemic random fuzzy sets: General framework and practical models. Fuzzy Sets and Systems, 2022.
- Uncertainty calibration in Bayesian neural networks via distance-aware priors, 2022.
- A set-theoretic view of belief functions Logical operations and approximations by fuzzy sets. International Journal of General Systems, 12(3):193–226, 1986.
- Consonant approximations of belief functions. International Journal of Approximate Reasoning, 4:419–449, 1990.
- Hierarchical neural networks utilising Dempster–Shafer evidence theory. In Friedhelm Schwenker and Simone Marinai, editors, Artificial Neural Networks in Pattern Recognition, pages 198–209, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
- Vincent Fortuin. Priors in Bayesian deep learning: A review, 2021.
- David Freedman. Wald lecture: On the bernstein-von mises theorem with infinite-dimensional parameters. The Annals of Statistics, 27(4):1119–1141, 1999.
- Bayesian convolutional neural networks with Bernoulli approximate variational inference, 2015.
- Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059. PMLR, 2016.
- Selective classification for deep neural networks. arXiv:1705.08500, 2017.
- Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6):721–741, 1984.
- Alex Graves. Practical variational inference for neural networks. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011.
- Joseph Y. Halpern. Reasoning About Uncertainty. MIT Press, 2017.
- Fast predictive uncertainty for classification with bayesian deep networks. In James Cussens and Kun Zhang, editors, Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, volume 180 of Proceedings of Machine Learning Research, pages 822–832. PMLR, 01–05 Aug 2022.
- Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Machine Learning, 110(3):457–506, 2021.
- What uncertainties do we need in Bayesian deep learning for computer vision? arXiv:1703.04977, 2017.
- Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. CoRR, abs/1705.07115, 2017.
- Variational dropout and the local reparameterization trick, 2015.
- Auto-encoding variational Bayes, 2013.
- Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
- Henry E. Kyburg. Bayesian and non-Bayesian evidential updating. Artificial Intelligence, 31(3):271–294, 1987.
- The MNIST database of handwritten digits. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–9, 2005.
- Isaac Levi. The enterprise of knowledge: An essay on knowledge, credal probability, and chance. The MIT Press, Cambridge, Massachusetts, 1980.
- Model adaptation: Unsupervised domain adaptation without source data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9641–9650, 2020.
- Evidence combination based on credal belief redistribution for pattern classification. IEEE Transactions on Fuzzy Systems, 28(4):618–631, 2019.
- David J. C. MacKay. A Practical Bayesian Framework for Backpropagation Networks. Neural Computation, 4(3):448–472, 05 1992.
- Epistemic deep learning. arXiv preprint arXiv:2206.07609, 2022.
- Marie-Hélène Masson and T. Denœux. ECM: An evidential version of the fuzzy c-means algorithm. Pattern Recognition, 41(4):1384–1397, 2008.
- Ilya Molchanov. Theory of Random Sets. Springer London, 2017.
- Radford Neal. Bayesian learning via stochastic dynamics. In S. Hanson, J. Cowan, and C. Giles, editors, Advances in Neural Information Processing Systems, volume 5. Morgan-Kaufmann, 1992.
- Radford M. Neal. Bayesian learning for neural networks. PhD thesis, University of Toronto, 1995.
- Belief Functions and Random Sets, pages 243–255. Springer, 01 1997.
- Hung T. Nguyen. On random sets and belief functions. Journal of Mathematical Analysis and Applications, 65:531–542, 1978.
- Hung T. Nguyen. On entropy of random sets and possibility distributions. In J. C. Bezdek, editor, The Analysis of Fuzzy Information, pages 145–156. CRC Press, 1985.
- Epistemic neural networks. CoRR, abs/2107.08924, 2021.
- Uncertainty measures for evidential reasoning II: A new measure of total uncertainty. International Journal of Approximate Reasoning, 8(1):1–16, 1993.
- The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387, 2016.
- ARA: Accurate, reliable and active histopathological image classification framework with Bayesian deep learning. Scientific Reports, 9(1):14347, 2019.
- A meta-analysis of overfitting in machine learning. Advances in Neural Information Processing Systems, 32, 2019.
- Tractable function-space variational inference in Bayesian neural networks. Advances in Neural Information Processing Systems, 35:22686–22698, 2022.
- Evidential deep learning to quantify classification uncertainty. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, pages 3183–3193, Red Hook, NY, USA, 2018. Curran Associates Inc.
- Glenn Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.
- Keivan Shariatmadar. Path planning problem under non-probabilistic uncertainty, 2022.
- CMMSE: Linear programming under ε-contamination uncertainty. Computational and Mathematical Methods, 2(2):e1077, 2020.
- Identification of imprecision in data using ε-contamination advanced uncertainty model. In Peter F. Pelz and Peter Groche, editors, Uncertainty in Mechanical Engineering, pages 157–172, Cham, 2021. Springer International Publishing.
- Linear optimisation under advanced uncertainty using imprecise decision theory, 2021.
- An introduction to optimization under uncertainty – a short survey, 2022.
- Philippe Smets. The transferable belief model and other interpretations of Dempster–Shafer’s model. In P. P. Bonissone, M. Henrion, L. N. Kanal, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence, volume 6, pages 375–383. North-Holland, Amsterdam, 1991.
- Philippe Smets. Decision making in the tbm: the necessity of the pignistic transformation. International Journal of Approximate Reasoning, 38(2):133–147, 2005.
- Philippe Smets. Bayes’ theorem generalized for belief functions. In Proceedings of the 7th European Conference on Artificial Intelligence (ECAI-86), volume 2, pages 169–171, July 1986.
- Vincent Spruyt. How to draw a covariance error ellipse, 2014.
- An evidential classifier based on Dempster–Shafer theory and deep learning. Neurocomputing, 450:275–293, 2021.
- Classification systems based on rough sets under the belief function framework. International Journal of Approximate Reasoning, 52:1409–1432, 2011.
- All you need is a good functional prior for Bayesian deep learning. arXiv preprint arXiv:2011.12829, 2020.
- Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86):2579–2605, 2008.
- Peter Walley. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, New York, 1991.
- Anton Wallner. Maximal number of vertices of polytopes defined by f-probabilities. In Fabio Gagliardi Cozman, R. Nau, and Teddy Seidenfeld, editors, Proceedings of the Fourth International Symposium on Imprecise Probabilities and Their Applications (ISIPTA 2005), pages 126–139, 2005.
- Rethinking Bayesian deep learning methods for semi-supervised volumetric medical image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 182–190, June 2022.
- Flipout: Efficient pseudo-independent weight perturbations on mini-batches, 2018.
- Bayesian deep learning and a probabilistic perspective of generalization. Advances in Neural Information Processing Systems, 33:4697–4708, 2020.
- Marco Zaffalon. The naive credal classifier. Journal of Statistical Planning and Inference, 105:5–21, 2002.
- Tree-based credal networks for classification. Reliable computing, 9(6):487–509, 2003.
- Learning from the wisdom of crowds by minimax entropy. In F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.