Exploring the Limits of Out-of-Distribution Detection
The paper "Exploring the Limits of Out-of-Distribution Detection" investigates the performance boundaries of detecting out-of-distribution (OOD) inputs in deep neural networks, with a particular focus on leveraging large-scale pre-trained transformers. These models demonstrate a marked enhancement over the state-of-the-art (SOTA) across a spectrum of OOD detection tasks involving different data modalities.
The authors present compelling numerical results. For example, in the task of distinguishing CIFAR-100 from CIFAR-10, the application of Vision Transformers (ViT) pre-trained on ImageNet-21k elevates the area under the receiver operating characteristic curve (AUROC) from 85% to 96%. Similarly, in a challenging genomics OOD detection benchmark, the transition to transformers and unsupervised pre-training boosts AUROC from 66% to 77%.
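For intuition, the sketch below shows how such an AUROC is computed from per-example OOD scores. It assumes scores have already been extracted from a model (for instance, a negative maximum softmax probability, or a Mahalanobis distance in the transformer's embedding space); the random arrays here are placeholders standing in for real model outputs, not results from the paper.

```python
# Minimal sketch: AUROC for OOD detection from per-example scores.
# Convention: higher score = more likely OOD.
import numpy as np
from sklearn.metrics import roc_auc_score

def ood_auroc(scores_id: np.ndarray, scores_ood: np.ndarray) -> float:
    """AUROC treating OOD as the positive class."""
    scores = np.concatenate([scores_id, scores_ood])
    labels = np.concatenate([np.zeros_like(scores_id), np.ones_like(scores_ood)])
    return roc_auc_score(labels, scores)

# Placeholder scores for illustration only.
rng = np.random.default_rng(0)
print(ood_auroc(rng.normal(0.0, 1.0, 1000), rng.normal(2.0, 1.0, 1000)))
```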
To further improve performance, the authors examine the few-shot outlier exposure setting, hypothesizing that pre-trained transformers are particularly adept at leveraging minimal amounts of outlier data. They report a substantial uplift in OOD detection performance with just a handful of outlier examples, achieving an AUROC of 98.7% with one image per OOD class and 99.46% with ten images, as illustrated by the sketch below.
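As a rough illustration of how a frozen pre-trained encoder can exploit a few labeled outliers, the following sketch adds an "outlier" prototype alongside the in-distribution class means and scores test points by relative distance. This nearest-class-mean variant is an illustrative stand-in, not necessarily the paper's exact procedure; the helper names and dummy data are hypothetical.

```python
# Hedged sketch of few-shot outlier exposure with a frozen encoder:
# build per-class mean embeddings for in-distribution (ID) classes plus
# one prototype from the handful of OOD examples, then score a test
# point by how much closer it is to the outlier prototype than to any
# ID prototype. Assumes the outlier class has the highest label id.
import numpy as np

def fit_prototypes(embs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Mean embedding per class; the last row is the few-shot outlier prototype."""
    return np.stack([embs[labels == c].mean(axis=0) for c in np.unique(labels)])

def ood_score(x: np.ndarray, prototypes: np.ndarray) -> float:
    """Higher = more OOD: distance to nearest ID prototype minus distance to outlier prototype."""
    d = np.linalg.norm(prototypes - x, axis=1)
    return d[:-1].min() - d[-1]

# Placeholder encoder outputs: 10 ID classes + outlier class 10, 5 examples each.
rng = np.random.default_rng(0)
embs = rng.normal(size=(55, 8))
labels = np.repeat(np.arange(11), 5)
print(ood_score(embs[0], fit_prototypes(embs, labels)))
```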
In addition to these strong results, the authors explore a novel application of multi-modal image-text pre-trained transformers such as CLIP, using only the names of candidate outlier classes as a source of information. Surprisingly, this method outperforms the previous SOTA on standard vision OOD benchmarks.
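A minimal sketch of this name-only idea, assuming the openly available CLIP package: embed both the in-distribution and candidate outlier class names, take a softmax over image-text similarities, and treat the probability mass assigned to the outlier names as the OOD score. The model choice, prompt template, class lists, and image path below are all illustrative assumptions.

```python
# Sketch: CLIP-based OOD detection from class names alone.
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

id_names = ["airplane", "automobile", "bird"]    # e.g., CIFAR-10 classes
ood_names = ["apple", "aquarium fish", "baby"]   # e.g., CIFAR-100 classes
prompts = [f"a photo of a {c}" for c in id_names + ood_names]
text = clip.tokenize(prompts).to(device)

image = preprocess(Image.open("example.png")).unsqueeze(0).to(device)  # placeholder path
with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_emb @ txt_emb.T).softmax(dim=-1).squeeze(0)

# OOD score: total probability mass on the candidate outlier names.
print(f"OOD score: {probs[len(id_names):].sum().item():.3f}")
```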
The research has significant implications, theoretically and practically. It points to the potential of large-scale pre-trained transformer models to be less susceptible to shortcut learning and more effective in the nuanced task of near-OOD detection. This exploration of few-shot learning and leveraging of multi-modal resources extends the potential application of these models in domains requiring rigorous safety measures, such as autonomous navigation and healthcare diagnostics.
For future directions, further research can explore optimizing unsupervised pre-training strategies to bolster OOD detection performance across more modalities beyond vision and genomics. Additionally, since deploying large models raises data privacy concerns and substantial computational costs, efficient fine-tuning mechanisms and cost-effective deployment strategies warrant further investigation.
In conclusion, this paper demonstrates that the integration of large-scale pre-trained transformers into the OOD detection paradigm not only enhances existing methodologies but also offers new vistas in ensuring model robustness against distributional shifts, thereby fortifying the safe deployment of machine learning models in complex real-world scenarios.