Scaling Out-of-Distribution Detection for Real-World Settings (1911.11132v4)

Published 25 Nov 2019 in cs.CV and cs.LG

Abstract: Detecting out-of-distribution examples is important for safety-critical machine learning applications such as detecting novel biological phenomena and self-driving cars. However, existing research mainly focuses on simple small-scale settings. To set the stage for more realistic out-of-distribution detection, we depart from small-scale settings and explore large-scale multiclass and multi-label settings with high-resolution images and thousands of classes. To make future work in real-world settings possible, we create new benchmarks for three large-scale settings. To test ImageNet multiclass anomaly detectors, we introduce the Species dataset containing over 700,000 images and over a thousand anomalous species. We leverage ImageNet-21K to evaluate PASCAL VOC and COCO multilabel anomaly detectors. Third, we introduce a new benchmark for anomaly segmentation by introducing a segmentation benchmark with road anomalies. We conduct extensive experiments in these more realistic settings for out-of-distribution detection and find that a surprisingly simple detector based on the maximum logit outperforms prior methods in all the large-scale multi-class, multi-label, and segmentation tasks, establishing a simple new baseline for future work.

Citations (388)


Summary

The paper introduces scalable benchmarks and a maximum logit method that outperforms traditional MSP detectors in large-scale out-of-distribution detection.
It leverages extensive datasets like ImageNet-21K and the Species dataset to evaluate multiclass, multi-label, and segmentation tasks under realistic conditions.
The study offers practical insights with simulation-based anomaly segmentation, paving the way for robust OOD detection in safety-critical applications.

Scaling Out-of-Distribution Detection for Real-World Settings

The paper "Scaling Out-of-Distribution Detection for Real-World Settings" by Hendrycks et al. addresses out-of-distribution (OOD) detection in large-scale machine learning applications. The authors observe that existing OOD detection research mainly targets small-scale settings with few classes and limited data. The paper instead moves to large-scale settings, with high-resolution images and thousands of classes, that better reflect the conditions encountered in practical applications.

Core Contributions

The research presented addresses several critical areas in large-scale OOD detection:

Benchmark Creation: The authors introduce new benchmarks designed for evaluating OOD detection methods in expansive settings, marking a departure from simplistic and small-scale environments. These benchmarks include a multiclass image dataset called Species, assembled to test ImageNet-trained models, and a road anomaly segmentation benchmark created using advanced simulation environments.
Simple Baseline Method: The paper presents an OOD detector based on the maximum logit, which outperforms prior methods, including the widely used maximum softmax probability (MSP) baseline, across large-scale multiclass, multi-label, and segmentation tasks. This finding suggests that a simpler scoring rule can be more effective than more elaborate detectors in large-scale settings.
New Datasets for Evaluation: The paper leverages ImageNet-21K to evaluate PASCAL VOC and COCO multi-label anomaly detectors, and introduces the Species dataset, with over 700,000 images and over a thousand anomalous species, to test ImageNet multiclass detectors. Species is specifically designed to eliminate overlap with existing training sets, ensuring cleaner evaluation scenarios.
Anomaly Segmentation Innovations: Leveraging simulated environments powered by the Unreal Engine and CARLA, the authors develop datasets for realistic anomaly segmentation. This approach sidesteps traditional issues like artificial patch placement artifacts that arise in more naive cut-and-paste benchmark scenarios.
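The two detectors contrasted in the list above differ only in whether the logits are normalized before taking the maximum. A minimal sketch follows, assuming the common convention that the anomaly score is the negated maximum, so that higher means more anomalous. The function names and the example logits are illustrative assumptions, not taken from the paper's code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def msp_score(logits):
    """Anomaly score from the maximum softmax probability (higher = more anomalous)."""
    return -max(softmax(logits))

def max_logit_score(logits):
    """Anomaly score from the maximum unnormalized logit (higher = more anomalous)."""
    return -max(logits)

# Hypothetical logits: one confident prediction and one input the model barely responds to.
familiar_logits = [12.0, 3.0, 1.0]
unfamiliar_logits = [2.0, 1.5, 1.0]

for name, logits in [("familiar", familiar_logits), ("unfamiliar", unfamiliar_logits)]:
    print(name, "MSP score:", msp_score(logits), "MaxLogit score:", max_logit_score(logits))
```

In this toy case both scores rank the unfamiliar input as more anomalous; the two rules diverge mainly when many classes compete for probability mass.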

Empirical Performance and Implications

The maximum logit method demonstrates superior performance on large-scale multiclass OOD detection tasks. One explanation is that, in datasets with many classes such as ImageNet, softmax probability mass is dispersed across numerous classes, which can degrade MSP performance even on in-distribution inputs. Moreover, contrary to some recent suggestions that advanced model architectures like Vision Transformers naturally enhance OOD detection capabilities, the findings indicate that such improvements do not come easily from architecture alone, without the proposed MaxLogit approach.
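The dispersion effect can be illustrated with a small worked example. The sketch below uses hypothetical logits over 1,000 classes: in one case a single class has a high logit, in the other two classes tie for the highest logit (as might happen with two visually similar classes). The values are assumptions chosen for illustration, not measurements from the paper.

```python
import math

def max_softmax_prob(logits):
    """Largest softmax probability, computed in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)

num_classes = 1000

# Hypothetical in-distribution input with one clearly preferred class.
single_peak = [10.0] + [0.0] * (num_classes - 1)
# Hypothetical in-distribution input where two similar classes share the evidence.
shared_peak = [10.0, 10.0] + [0.0] * (num_classes - 2)

msp_single = max_softmax_prob(single_peak)
msp_shared = max_softmax_prob(shared_peak)

print("single peak: MSP =", round(msp_single, 3), "max logit =", max(single_peak))
print("shared peak: MSP =", round(msp_shared, 3), "max logit =", max(shared_peak))
```

The maximum softmax probability roughly halves in the second case although the maximum logit is unchanged, so an MSP-based detector would treat the second input as far more anomalous while a maximum logit detector would not.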

In the multi-label scenario, where a single image can carry several labels at once, the detection approach is adapted so that it still applies, establishing a framework for research in domains where traditional single-label datasets are inadequate. MaxLogit again proves robust across the datasets evaluated, outperforming competitive baselines.

For anomaly segmentation, the paper addresses challenges such as semantic boundaries and environmental consistency. It finds that the MaxLogit approach also mitigates some issues related to semantic boundary artifacts, offering pathways to refine object segmentation methods in dynamic environments.
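In segmentation, the same scoring rule can be applied independently at every pixel, producing an anomaly map rather than a single score. The sketch below assumes a segmentation model's output is available as a nested list `logit_map[row][col]` of per-class logits; the function name, the layout, and the toy values are illustrative assumptions.

```python
def pixelwise_max_logit_scores(logit_map):
    """Return an anomaly map: negated max logit per pixel (higher = more anomalous)."""
    return [[-max(pixel_logits) for pixel_logits in row] for row in logit_map]

# Hypothetical 2x2 image with 3 classes per pixel.
logit_map = [
    [[9.0, 1.0, 0.5], [8.5, 0.2, 0.1]],
    [[1.2, 1.0, 0.9], [0.3, 7.8, 0.4]],
]

scores = pixelwise_max_logit_scores(logit_map)

# Locate the pixel that the detector considers most anomalous.
most_anomalous = max(
    ((r, c) for r in range(len(scores)) for c in range(len(scores[0]))),
    key=lambda rc: scores[rc[0]][rc[1]],
)

print(scores)
print("most anomalous pixel:", most_anomalous)
```

Here the bottom-left pixel, where no class receives a high logit, gets the highest anomaly score; thresholding the map would yield a binary anomaly mask.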

Theoretical and Practical Implications

The findings of this research have both theoretical and practical implications. Theoretically, they prompt reconsideration of how baseline methods and dataset scale affect OOD detection performance. Practically, the datasets and baselines introduced enable future research to study OOD detection in forms more aligned with real-world demands. By providing benchmarks and methodological insights, the paper supports the development of OOD detection approaches that are feasible and reliable for deployment in safety-critical applications such as autonomous driving and ecological monitoring.

Future Directions

Looking ahead, potential research directions include refining and expanding the baseline methodologies and exploring novel approaches that leverage broader contextual information or incorporate domain-specific knowledge. The insights gained from high-fidelity simulation environments could facilitate the development of more sophisticated algorithms capable of discerning subtler anomalies and adapting to evolving environmental contexts. Furthermore, robust cross-dataset evaluation and the handling of label noise remain important areas for progress in OOD detection research.

In summary, Hendrycks et al. provide a substantial contribution to the field of OOD detection through their scaling efforts, benchmark innovations, and critical empirical insights, setting new foundations for advancing both academic research and practical applications.
