
Learning from All Samples in Evidential Deep Learning

This presentation explores fundamental limitations in Evidential Deep Learning (EDL) models, particularly their sub-optimal training and reduced accuracy on complex datasets. It introduces the concept of a "zero-evidence" region where training samples fail to contribute gradients, hindering learning. It then presents the Regularized Evidential model (RED), a novel solution designed to prevent training samples from falling into these regions, thereby improving predictive performance while maintaining robust uncertainty outputs.
Script
Deep neural networks are powerful, but how do we truly know when they are uncertain about their predictions? Evidential deep learning promised a path to robust uncertainty without complex sampling, yet often fell short in accuracy. This paper, 'Learn to Accumulate Evidence from All Training Samples: Theory and Practice,' uncovers a critical flaw in how these models learn, and offers an elegant solution to unlock their full potential.
Connecting to the problem, the authors observed that while evidential models offer uncertainty estimates, their predictive accuracy frequently lags behind traditional softmax models, especially on challenging datasets like CIFAR100. They identified a core issue: the existence of 'zero-evidence samples' where the model assigns no confidence to any class. Critically, when training data falls into these 'zero-evidence regions,' the model simply stops learning from those samples.
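To make the notion of a "zero-evidence sample" concrete, here is a minimal sketch of the standard evidential parameterization, in which per-class evidence is obtained from logits and shifted by one to form Dirichlet concentration parameters. The helper names (`relu_evidence`, `dirichlet_params`) are illustrative, not from the paper:

```python
import numpy as np

def relu_evidence(logits):
    """Evidence via a ReLU activation: non-positive logits yield zero evidence."""
    return np.maximum(logits, 0.0)

def dirichlet_params(logits):
    """Common EDL parameterization: alpha_k = evidence_k + 1."""
    return relu_evidence(logits) + 1.0

# A confident sample: a positive logit produces evidence for that class.
alpha_conf = dirichlet_params(np.array([4.0, -1.0, -2.0]))
print(alpha_conf)   # [5. 1. 1.]

# A zero-evidence sample: all logits are non-positive, so evidence vanishes
# for every class and the Dirichlet collapses to the uniform prior.
alpha_zero = dirichlet_params(np.array([-0.5, -1.0, -3.0]))
print(alpha_zero)   # [1. 1. 1.]
```

The second sample carries a label, yet its Dirichlet output is indistinguishable from total ignorance, which is exactly the region where learning stalls.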
This brings us to the theoretical heart of the paper, detailing why and how this learning deficiency occurs.
The authors precisely diagnose this problem with a key theoretical finding: if a training sample generates zero evidence, then the gradient of the evidential loss with respect to the network parameters becomes zero. This means the model essentially ignores these samples during learning. They show this learning deficiency persists across different loss functions and common evidential activation functions, and even existing 'incorrect-evidence' regularizers cannot resolve it.
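The vanishing gradient can be checked numerically. The sketch below uses one common form of the evidential MSE loss (expected squared error plus the variance of the Dirichlet mean); the exact loss in the paper may differ, but the flat-region argument is the same: under ReLU evidence, perturbing all-non-positive logits leaves the loss unchanged, so the finite-difference gradient is exactly zero.

```python
import numpy as np

def edl_mse_loss(logits, y):
    """One common evidential MSE loss under ReLU evidence:
    sum_k (y_k - p_k)^2 + p_k (1 - p_k) / (S + 1), with p = alpha / S."""
    alpha = np.maximum(logits, 0.0) + 1.0
    S = alpha.sum()
    p = alpha / S
    return np.sum((y - p) ** 2 + p * (1 - p) / (S + 1.0))

y = np.array([1.0, 0.0, 0.0])        # one-hot ground-truth label
z = np.array([-0.4, -1.2, -2.0])     # all logits non-positive: zero evidence

# Central-difference gradient of the loss w.r.t. each logit.
eps = 1e-4
grad = np.array([
    (edl_mse_loss(z + eps * e, y) - edl_mse_loss(z - eps * e, y)) / (2 * eps)
    for e in np.eye(3)
])
print(grad)   # [0. 0. 0.] -- the labeled sample contributes no learning signal
```

Even the ground-truth class receives no gradient, which is why such samples become effectively invisible to training.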
To illustrate this problem, consider this diagnostic experiment on a tiny dataset of four samples. A standard evidential model, using ReLU activations and MSE loss, gets stuck at 50% training accuracy. This happens because two of the training samples fall into the zero-evidence region, effectively becoming 'invisible' to the learning process. In contrast, a standard softmax model quickly achieves 100% accuracy, demonstrating that the information was indeed present and learnable.
Beyond diagnosing the problem, the authors also analyze how different activation functions contribute to it. They prove that the exponential activation consistently yields larger gradient updates from non-positive logits compared to ReLU or SoftPlus, thereby reducing the critical "zero-evidence" regions. Building on this insight, they propose the Regularized Evidential model, or RED, which introduces a 'correct-evidence regularization' term. This term dynamically ensures that even if a sample initially produces zero evidence, its ground-truth logit still receives a non-zero gradient, allowing the model to learn effectively from all training data.
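The asymmetry among activations can be seen directly from their derivatives at non-positive logits: ReLU's derivative is zero there, SoftPlus's is a small sigmoid value, and the exponential's is strictly positive and larger than both. A minimal numerical comparison (illustrative only, not the paper's proof):

```python
import numpy as np

# Derivatives of the three candidate evidence activations.
def d_relu(z):     return (z > 0).astype(float)      # exactly 0 for z <= 0
def d_softplus(z): return 1.0 / (1.0 + np.exp(-z))   # sigmoid(z): small for z << 0
def d_exp(z):      return np.exp(z)                  # strictly positive everywhere

z = np.array([-4.0, -1.0, 0.0])  # non-positive logits
print(d_relu(z))       # [0. 0. 0.]
print(d_softplus(z))   # small but non-zero
print(d_exp(z))        # non-zero, and >= sigmoid(z) for every z
```

Since exp(z) = sigmoid(z) * (1 + exp(z)) >= sigmoid(z), the exponential activation passes the largest gradient through non-positive logits, consistent with the authors' finding that it shrinks the zero-evidence regions the most.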
Now, let us examine the compelling empirical evidence that supports their theoretical claims and the effectiveness of RED.
Empirically, the 'exp' activation consistently yielded better accuracy than ReLU and SoftPlus in evidential models, and RED further amplified this improvement. A striking result is RED's dramatic reduction in zero-evidence prevalence: on CIFAR100, the fraction of training samples with minimal evidence plummeted from nearly 97% to just 0.06%. This innovation also translated into strong performance in few-shot learning settings on mini-ImageNet and achieved state-of-the-art out-of-distribution detection, using vacuity as an OOD score.
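Vacuity, the OOD score mentioned above, has a standard closed form in subjective logic: with K classes and Dirichlet parameters alpha, vacuity is u = K / sum(alpha). A short sketch (the numbers below are illustrative, not from the paper's experiments):

```python
import numpy as np

def vacuity(alpha):
    """Subjective-logic vacuity u = K / sum(alpha): equals 1 for the uniform
    prior (zero total evidence) and shrinks toward 0 as evidence accumulates."""
    K = len(alpha)
    return K / alpha.sum()

# A sample with substantial evidence for one class: low vacuity.
print(vacuity(np.array([21.0, 1.0, 1.0])))   # ~0.13

# A sample with no evidence for any class: maximal vacuity of 1,
# which is the behavior that makes vacuity useful as an OOD score.
print(vacuity(np.array([1.0, 1.0, 1.0])))    # 1.0
```

High vacuity flags inputs the model has accumulated no evidence for, which is why it serves naturally as an out-of-distribution detector.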
This work is significant because it pinpoints a fundamental training-time failure mode in evidential classifiers, where "zero evidence leads to zero gradient," causing labeled data to be ignored. RED directly addresses this, ensuring that evidential learning can genuinely utilize all training samples, which is crucial for improving predictive performance while maintaining reliable uncertainty outputs. This refined approach makes evidential deep learning more robust and accessible for diverse, complex applications.
While the RED approach makes significant strides, the authors acknowledge several areas for future research. Currently, their focus has been primarily on classification tasks, with plans to extend this approach to more complex areas like segmentation and detection. They also note that continued tuning of the incorrect-evidence regularization hyperparameter remains important, and further theoretical exploration into the nuances of various evidential losses, such as log versus digamma, is still warranted.
By systematically addressing the zero-evidence problem, this paper significantly advances our ability to build robust, uncertainty-aware deep learning models that truly learn from every piece of data. To dive deeper into the full theoretical proofs and extensive experimental results, visit EmergentMind.com.