Full-Spectrum Out-of-Distribution Detection (2204.05306v1)

Published 11 Apr 2022 in cs.CV, cs.AI, and cs.LG

Abstract: Existing out-of-distribution (OOD) detection literature clearly defines semantic shift as a sign of OOD but does not have a consensus over covariate shift. Samples experiencing covariate shift but not semantic shift are either excluded from the test set or treated as OOD, which contradicts the primary goal in machine learning -- being able to generalize beyond the training distribution. In this paper, we take into account both shift types and introduce full-spectrum OOD (FS-OOD) detection, a more realistic problem setting that considers both detecting semantic shift and being tolerant to covariate shift, and design three benchmarks. These new benchmarks have a more fine-grained categorization of distributions (i.e., training ID, covariate-shifted ID, near-OOD, and far-OOD) for the purpose of more comprehensively evaluating the pros and cons of algorithms. To address the FS-OOD detection problem, we propose SEM, a simple feature-based semantics score function. SEM is mainly composed of two probability measures: one is based on high-level features containing both semantic and non-semantic information, while the other is based on low-level feature statistics only capturing non-semantic image styles. With a simple combination, the non-semantic part is cancelled out, which leaves only semantic information in SEM that can better handle FS-OOD detection. Extensive experiments on the three new benchmarks show that SEM significantly outperforms current state-of-the-art methods. Our code and benchmarks are released at https://github.com/Jingkang50/OpenOOD.

Authors

Jingkang Yang
Kaiyang Zhou
Ziwei Liu

Summary

Full-Spectrum Out-of-Distribution Detection

This paper refines out-of-distribution (OOD) detection by introducing full-spectrum OOD (FS-OOD) detection. The authors identify a limitation in current OOD detection research: it clearly treats semantic shift as a sign of OOD but has no consensus on covariate shift. Samples that differ from the training data only in non-semantic features, such as image style, are either excluded from the test set or treated as OOD, which the authors argue contradicts machine learning's core goal of generalizing beyond the training distribution.

Novel Contributions

FS-OOD Problem Definition: The paper defines FS-OOD detection as a problem setting in which a detector must flag semantic shift while tolerating covariate shift. Because covariate-shifted data are treated as in-distribution (ID), the setting rewards methods that generalize across styles and penalizes methods that reject any unfamiliar input.
Benchmark Development: Three new benchmarks, DIGITS, OBJECTS, and COVID, are introduced. These benchmarks categorize datasets into training ID, covariate-shifted ID, near-OOD, and far-OOD. This categorization enables detailed analysis of detection algorithms' strengths and weaknesses under diverse conditions.
SEM Score Function: The authors propose SEM, a feature-based semantics score function. SEM distinguishes between semantic and non-semantic information through two probability measures, effectively isolating semantic content. This approach allows for more accurate detection of semantic shifts while maintaining robustness to covariate shifts.
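The four-way categorization above implies a specific evaluation rule: covariate-shifted ID samples are grouped with training ID as positives, and near-OOD and far-OOD samples are negatives. The following is a minimal sketch of that rule using AUROC, assuming that a higher score means "more in-distribution". The group names, the function names, and the toy scores are illustrative assumptions, not the OpenOOD API or results from the paper.

```python
from typing import Dict, List

# Assumption: these group keys are illustrative labels for the four categories.
ID_GROUPS = ("training_id", "covariate_shifted_id")
OOD_GROUPS = ("near_ood", "far_ood")


def auroc(id_scores: List[float], ood_scores: List[float]) -> float:
    """Probability that a random ID sample outscores a random OOD sample.

    Ties count as half a win. Quadratic in the number of samples, which is
    fine for a sketch but not for large test sets.
    """
    if not id_scores or not ood_scores:
        raise ValueError("both ID and OOD score lists must be non-empty")
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


def fs_ood_auroc(scores_by_group: Dict[str, List[float]]) -> float:
    """AUROC under the FS-OOD rule: covariate-shifted ID counts as ID."""
    id_scores = [s for g in ID_GROUPS for s in scores_by_group.get(g, [])]
    ood_scores = [s for g in OOD_GROUPS for s in scores_by_group.get(g, [])]
    return auroc(id_scores, ood_scores)


# Toy scores (made up for illustration only).
toy_scores = {
    "training_id": [0.9, 0.8],
    "covariate_shifted_id": [0.6, 0.4],
    "near_ood": [0.5, 0.3],
    "far_ood": [0.1],
}
print(fs_ood_auroc(toy_scores))
```

In this toy example the only ranking errors come from a covariate-shifted ID sample that scores below a near-OOD sample, which is the kind of failure the FS-OOD setting is meant to expose.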

Methodology

The SEM score function is built from features extracted by a convolutional neural network (CNN). It combines two probability measures: one over high-level features, which contain both semantic and non-semantic information, and one over low-level feature statistics, which capture only non-semantic image style. Combining the two cancels out the non-semantic component, so the resulting score reflects semantic deviation only.
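One way to read the cancellation described above is as a difference of log-densities: if the semantic and non-semantic parts are assumed independent, then log p(semantic) = log p(semantic, non-semantic) - log p(non-semantic). The sketch below follows that reading. It is not the authors' implementation: a single diagonal Gaussian stands in for whatever density estimator the paper uses, channel-wise mean and standard deviation stand in for the "low-level feature statistics", and every function name is hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Gaussian = Tuple[List[float], List[float]]  # (per-dimension means, std devs)


def fit_diag_gaussian(rows: Sequence[Vector]) -> Gaussian:
    """Fit an independent Gaussian per dimension (a stand-in density model)."""
    dims = list(zip(*rows))
    mus = [mean(d) for d in dims]
    sigmas = [max(pstdev(d), 1e-6) for d in dims]  # floor avoids division by zero
    return mus, sigmas


def log_density(x: Vector, model: Gaussian) -> float:
    """Log-density of x under the diagonal Gaussian."""
    mus, sigmas = model
    return sum(
        -0.5 * math.log(2.0 * math.pi * s * s) - (v - m) ** 2 / (2.0 * s * s)
        for v, m, s in zip(x, mus, sigmas)
    )


def channel_stats(feature_map: Sequence[Sequence[float]]) -> List[float]:
    """Per-channel mean and std of a low-level feature map.

    feature_map holds one flat list of activations per channel; the result is
    [mean_1, ..., mean_C, std_1, ..., std_C].
    """
    return [mean(ch) for ch in feature_map] + [pstdev(ch) for ch in feature_map]


def sem_score(high_feat: Vector, low_stats: Vector,
              high_model: Gaussian, low_model: Gaussian) -> float:
    """Assumed form: log p(high-level features) - log p(low-level statistics)."""
    return log_density(high_feat, high_model) - log_density(low_stats, low_model)


# Toy usage with made-up one-dimensional "features".
high_model = fit_diag_gaussian([[0.0], [2.0]])
low_model = fit_diag_gaussian([[0.0], [2.0]])
print(sem_score([1.0], [1.0], high_model, low_model))
```

In practice the two inputs would come from different depths of the same network for the same image; the toy call only checks that the two log-densities combine as intended.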

To enhance source-awareness, the authors propose a fine-tuning scheme that makes the feature statistics of ID data more compact while separating them from those of OOD data. The scheme uses negative data augmentation strategies to simulate covariate shifts.
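The compact-and-separate idea in this paragraph can be illustrated with a generic center-plus-margin objective: pull ID feature statistics toward their centroid and push negative-augmented statistics at least a margin away from it. This is only a sketch of the general idea, not the loss used in the paper; the function names, the `margin` parameter, and the toy vectors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(rows: Sequence[Vector]) -> List[float]:
    """Per-dimension average of a set of vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def source_awareness_loss(id_stats: Sequence[Vector],
                          negative_stats: Sequence[Vector],
                          margin: float = 1.0) -> float:
    """Illustrative objective (an assumption, not the paper's loss).

    pull: mean squared distance of ID statistics to their centroid.
    push: squared hinge penalty for negatives closer than `margin`.
    """
    center = centroid(id_stats)
    pull = sum(distance(s, center) ** 2 for s in id_stats) / len(id_stats)
    push = sum(
        max(0.0, margin - distance(s, center)) ** 2 for s in negative_stats
    ) / len(negative_stats)
    return pull + push


# Toy feature statistics (made up for illustration only).
id_stats = [[0.0, 0.0], [2.0, 0.0]]
near_negative = [[1.0, 0.5]]   # inside the margin, so it is penalized
far_negative = [[1.0, 5.0]]    # outside the margin, so no penalty
print(source_awareness_loss(id_stats, near_negative))
print(source_awareness_loss(id_stats, far_negative))
```

In a real fine-tuning loop the statistics would be computed from network activations and the loss minimized by gradient descent; plain Python lists are used here only to keep the arithmetic visible.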

Experimental Results

Evaluations on the proposed benchmarks show that SEM significantly outperforms existing state-of-the-art methods, particularly in scenarios involving near-OOD conditions. The research highlights major inadequacies in current OOD detection schemes when they encounter covariate-shifted data, underscoring the necessity of the FS-OOD framework.

Implications and Future Work

This research redirects attention toward more realistic OOD detection tasks that account for both semantic and covariate shift. In practice, this makes OOD systems more applicable to real-world scenarios such as medical diagnostics, where images may vary with imaging conditions yet belong to the same semantic category.

Theoretically, this work invites further research on feature disentanglement and distribution modeling, and argues for moving away from conventional approaches that treat covariate differences as evidence of OOD.

Future work could build on these results with techniques that improve semantic generalization without being misled by covariate variation. Researchers may also design feature-based score functions that draw on semantic information from multiple layers, potentially with deeper architectures and more complex datasets.

In conclusion, the paper offers a framework that challenges how OOD detection is usually posed and replaces it with a setting that requires both detecting semantic shift and remaining robust to covariate shift.

GitHub

GitHub - Jingkang50/OpenOOD: Benchmarking Generalized Out-of-Distribution Detection (779 stars)
