- The paper introduces a training method that combines additional loss terms with GAN-generated samples to reduce overconfidence on OOD instances.
- The approach substantially improves OOD detection, achieving near-perfect AUROC and AUPR with CIFAR-10 and SVHN as in-distribution datasets.
- The method enhances AI safety by maintaining high in-distribution accuracy while reliably distinguishing between familiar and novel inputs.
Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples
Abstract and Motivation
Deep neural networks (DNNs) have demonstrated exceptional performance across classification tasks such as speech recognition, image classification, video prediction, and medical diagnosis. Despite their high accuracy, DNNs are known to produce overconfident predictions even on out-of-distribution (OOD) samples, which poses a significant challenge for AI safety and for applications that require reliable uncertainty estimates. Previous work has primarily focused on threshold-based OOD detectors built on top of pre-trained classifiers, but the performance of such detectors depends heavily on how well the underlying classifier separates in-distribution from out-of-distribution samples. This paper introduces a novel training method that improves a classifier's OOD detection performance while maintaining its classification accuracy.
Methodology
The proposed method augments the standard cross-entropy loss with two additional components during training. The first is a loss term that minimizes the Kullback-Leibler (KL) divergence between the classifier's predictive distribution on OOD samples and the uniform distribution, thereby reducing the classifier's confidence on those samples. The second is a generative adversarial network (GAN) that supplies the OOD training samples on which this term is applied. The classifier and the GAN are trained jointly, which enables the classifier to produce more reliable uncertainty estimates; a sketch of one joint iteration follows.
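As a concrete illustration, here is a high-level PyTorch sketch of one joint training iteration. This is our reconstruction, not the authors' code: `C`, `G`, `D`, the optimizers, and `latent_dim` are hypothetical placeholders, and `confidence_loss` and `generator_loss` refer to the sketches given in the sections below.

```python
import torch
import torch.nn.functional as F

def joint_step(C, G, D, x_in, y_in, opt_c, opt_d, opt_g, beta=1.0, latent_dim=100):
    """One alternating update of classifier C, discriminator D, and generator G."""
    z = torch.randn(x_in.size(0), latent_dim, device=x_in.device)
    x_fake = G(z)  # candidate samples near the in-distribution boundary

    # (a) Classifier: cross-entropy on real data plus a KL-to-uniform penalty
    #     on generated samples (the confidence loss defined in the next section).
    loss_c = confidence_loss(C(x_in), y_in, C(x_fake.detach()), beta)
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()

    # (b) Discriminator: standard GAN update, real vs. generated.
    real, fake = D(x_in), D(x_fake.detach())
    loss_d = (F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) +
              F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # (c) Generator: fool D while keeping C near-uniform on generated samples.
    x_fake = G(z)  # regenerate so gradients flow into G
    loss_g = generator_loss(D(x_fake), C(x_fake), beta)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```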
Loss Function Design
The new loss function, termed confidence loss, is defined as follows:
min_θ E_{P_in(𝒙, y)}[ −log P_θ(y | 𝒙) ] + β E_{P_out(𝒙)}[ KL( 𝒰(y) ∥ P_θ(y | 𝒙) ) ]
where θ denotes the model parameters, P_in the in-distribution, P_out the out-distribution, β a penalty (trade-off) parameter, and 𝒰(y) the uniform distribution over class labels. The KL divergence term forces the classifier to assign low confidence to OOD samples, enhancing the separability between in- and out-distributions.
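Below is a minimal PyTorch sketch of this confidence loss, assuming a K-class classifier that outputs logits; the function name, signature, and β default are illustrative, not from the paper.

```python
import math

import torch.nn.functional as F

def confidence_loss(logits_in, labels_in, logits_out, beta=1.0):
    """Cross-entropy on in-distribution data plus beta * KL(U || P_theta)
    on OOD data, pushing OOD predictions toward the uniform distribution."""
    ce = F.cross_entropy(logits_in, labels_in)
    log_probs_out = F.log_softmax(logits_out, dim=1)
    num_classes = logits_out.size(1)
    # Per sample: KL(U || P_theta) = -log K - (1/K) * sum_y log P_theta(y|x),
    # which is exactly zero when the prediction is uniform.
    kl = -log_probs_out.mean(dim=1) - math.log(num_classes)
    return ce + beta * kl.mean()
```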
Generative Adversarial Network for OOD Sample Generation
To address the problem of generating effective OOD samples, the authors propose a new GAN architecture. Unlike traditional GANs, which generate samples resembling the in-distribution, the proposed GAN generates samples in the low-density regions near the boundary of the in-distribution. This approach is motivated by the insight that OOD samples close to the in-distribution in the feature space are more effective for training the classifier to distinguish between in- and out-distributions. The GAN training objective includes a term that minimizes the KL divergence between the generated samples' predictive distribution and the uniform distribution, along with the original GAN objective.
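The following sketch shows how the generator's objective could combine a GAN term with the KL-to-uniform penalty on the classifier's predictions. It is our reading under stated assumptions: the paper formulates the original minimax GAN objective, whereas this sketch substitutes the common non-saturating generator loss, and all tensor and function names are hypothetical.

```python
import math

import torch
import torch.nn.functional as F

def generator_loss(d_fake_logits, clf_fake_logits, beta=1.0):
    """Fool the discriminator while keeping the classifier's predictive
    distribution on generated samples close to uniform, which steers
    generation toward low-density regions near the in-distribution boundary."""
    # Non-saturating GAN term: the generator wants D(G(z)) to look real.
    gan_term = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    # KL(U || P_theta(y | G(z))); the constant -log K does not affect the
    # gradients, but keeping it makes the minimum exactly zero.
    log_probs = F.log_softmax(clf_fake_logits, dim=1)
    kl_term = (-log_probs.mean(dim=1) - math.log(clf_fake_logits.size(1))).mean()
    return gan_term + beta * kl_term
```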
Experimental Results
The proposed method is evaluated using deep convolutional neural networks, such as AlexNet and VGGNet, on image classification tasks with CIFAR-10 and SVHN as in-distribution datasets and datasets such as ImageNet and LSUN as out-of-distribution test sets. The results demonstrate that classifiers trained using the proposed method significantly improve the detection performance of threshold-based detectors across all experiments. In particular, the VGGNet trained with this method achieves almost perfect detection performance on both CIFAR-10 and SVHN.
Numerical Results
The experiments highlight the effectiveness of the proposed training method in OOD detection. For instance, a VGGNet trained with the proposed method on CIFAR-10 achieves:
- Detection accuracy of 99.9%,
- AUROC of 100.0%,
- AUPR (in) of 100.0%,
- AUPR (out) of 99.9%.
Similarly, on the SVHN dataset, the classifier achieves comparable results, demonstrating robustness across different in-distributions.
Implications and Future Work
The proposed training method significantly enhances neural networks' ability to detect OOD samples without compromising their classification accuracy. This advancement has practical implications for deploying AI systems in safety-critical applications, such as medical diagnostics and autonomous systems, where reliable uncertainty estimates are crucial.
Future work could explore integrating this training method with Bayesian probabilistic models and ensemble techniques to further improve OOD detection performance. Additionally, extending the approach to other tasks, such as regression and network calibration, could provide further insights into its generalizability and utility in various machine learning domains.
Conclusion
This paper presents a novel training method that integrates a confidence loss framework with a GAN-based OOD sample generator, effectively addressing the overconfidence issue in DNNs for OOD detection. The experimental results underscore the method's efficacy, offering a robust solution for enhancing AI safety and reliability in real-world applications.