Learn to Accumulate Evidence from All Training Samples: Theory and Practice (2306.11113v2)
Abstract: Evidential deep learning, built upon belief theory and subjective logic, offers a principled and computationally efficient way to make a deterministic neural network uncertainty-aware. The resulting evidential models can quantify fine-grained uncertainty using the learned evidence. To be theoretically sound, evidential models require non-negative evidence, which in turn requires special activation functions for model training and inference. This constraint often leads to inferior predictive performance compared to standard softmax models, making it challenging to extend evidential models to many large-scale datasets. To unveil the real cause of this undesired behavior, we theoretically investigate evidential models and identify a fundamental limitation that explains the inferior performance: existing evidential activation functions create zero evidence regions, which prevent the model from learning from training samples that fall into such regions. A deeper analysis of evidential activation functions based on our theoretical underpinning inspires the design of a novel regularizer that effectively alleviates this fundamental limitation. Extensive experiments across many challenging real-world datasets and settings confirm our theoretical findings and demonstrate the effectiveness of our proposed approach.
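To make the mechanism concrete, below is a minimal PyTorch sketch of the usual evidential-classification setup (not the paper's implementation): logits are passed through a non-negative evidence activation, and the evidence plus one gives Dirichlet concentration parameters. The function names and example logits are illustrative assumptions; the sketch only shows why a ReLU-style activation yields zero evidence regions, as discussed in the abstract.

```python
import torch
import torch.nn.functional as F

# Evidence activations commonly used in evidential deep learning.
def evidence_relu(logits):
    # ReLU clips all non-positive logits to exactly zero: a sample whose logits
    # are all non-positive receives zero evidence (a "zero evidence region").
    return F.relu(logits)

def evidence_softplus(logits):
    # Softplus is strictly positive, so evidence never collapses to exactly zero.
    return F.softplus(logits)

def dirichlet_parameters(logits, activation=evidence_relu):
    evidence = activation(logits)                  # non-negative evidence per class
    alpha = evidence + 1.0                         # Dirichlet concentration parameters
    strength = alpha.sum(dim=-1, keepdim=True)     # total Dirichlet strength
    probs = alpha / strength                       # expected class probabilities
    uncertainty = logits.shape[-1] / strength      # vacuity: 1 when evidence is all zero
    return alpha, probs, uncertainty

# Illustrative example: all-negative logits fall into the zero evidence region
# under ReLU, so the Dirichlet is uniform and the uncertainty is maximal.
logits = torch.tensor([[-2.0, -0.5, -1.0]])
alpha, probs, u = dirichlet_parameters(logits, activation=evidence_relu)
print(alpha, probs, u)   # alpha = [1, 1, 1], probs = [1/3, 1/3, 1/3], u = 1
```

Because the gradient of ReLU is zero on such samples, they contribute no learning signal through the evidence, which is the limitation the paper's proposed regularizer is designed to alleviate.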