Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP (2109.02748v3)

Published 6 Sep 2021 in cs.CV and cs.LG

Abstract: In an out-of-distribution (OOD) detection problem, samples of known classes (also called in-distribution classes) are used to train a special classifier. In testing, the classifier can (1) classify the test samples of known classes to their respective classes and also (2) detect samples that do not belong to any of the known classes (i.e., they belong to some unknown or OOD classes). This paper studies the problem of zero-shot out-of-distribution (OOD) detection, which still performs the same two tasks in testing but has no training except using the given known class names. This paper proposes a novel yet simple method (called ZOC) to solve the problem. ZOC builds on top of the recent advances in zero-shot classification through multi-modal representation learning. It first extends the pre-trained language-vision model CLIP by training a text-based image description generator on top of CLIP. In testing, it uses the extended model to generate candidate unknown class names for each test sample and computes a confidence score based on both the known class names and candidate unknown class names for zero-shot OOD detection. Experimental results on 5 benchmark datasets for OOD detection demonstrate that ZOC outperforms the baselines by a large margin.

Citations (127)

Summary

The paper presents ZOC, a method that extends CLIP for zero-shot OOD detection by generating candidate unknown class descriptors.
It employs a two-step inference strategy that first generates textual labels from image features and then computes semantic similarity for scoring.
Experimental results on CIFAR10, CIFAR100, and TinyImagenet show that ZOC outperforms several supervised baselines in identifying unseen classes.

Zero-shot Open-set Detection

The paper "Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP" presents an approach to out-of-distribution (OOD) detection in a zero-shot setting using the CLIP model. Traditional supervised classifiers work under a closed-world assumption, in which every test sample belongs to a class seen during training. However, real-world applications often encounter unknown or unseen classes, making OOD detection essential for deploying machine learning models in dynamic environments.

The proposed method, termed ZOC (Zero-shot OOD Detection based on CLIP), extends CLIP's language-vision capabilities to perform OOD detection without training data for the known classes; only the known class names are given. The method is simple: it uses multi-modal representation learning to generate descriptive text that serves as candidate unknown class labels for each test sample.

Methodological Approach

ZOC builds upon CLIP, a pre-trained language-vision model from OpenAI that learns visual representations from raw text through contrastive training. CLIP comprises an image encoder and a text encoder that map images and textual descriptions into a shared embedding space, where an image can be matched to candidate class names by similarity. However, CLIP in its original form performs closed-world classification: it always assigns a test image to one of the supplied class names, which is why ZOC extends it for OOD tasks.
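To make the closed-world limitation concrete, here is a minimal sketch of zero-shot classification in a shared embedding space. It is an illustration under stated assumptions, not CLIP's actual code: the three-dimensional vectors are hand-written stand-ins for encoder outputs, and the function names and the temperature value of 100 are hypothetical choices.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_emb, text_embs, labels, temperature=100.0):
    """Pick the label whose text embedding is most similar to the image.

    The temperature is an assumed scaling factor, not a value from the paper.
    """
    logits = [temperature * cosine(image_emb, t) for t in text_embs]
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Toy stand-ins for text and image embeddings (assumed values).
labels = ["cat", "dog", "truck"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.1, 0.0]

predicted, probs = zero_shot_classify(image_emb, text_embs, labels)
print(predicted, [round(p, 4) for p in probs])
```

Because the softmax is taken only over the supplied labels, the probabilities always sum to one and some known label is always chosen, even for an image of a class that is not in the list.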

Key innovations in ZOC include:

Training a Textual Description Generator: The authors train a text generator on top of CLIP's image encoder outputs to produce words describing an image, which serve as candidate unknown class labels. This avoids the need for labeled training data for unseen classes; the generator is instead trained on large-scale captioning datasets such as MS-COCO.
Inference Strategy: ZOC uses a two-step inference process. First, the image description generator produces candidate labels for the test image. These labels, together with the known class labels, are then encoded, and their semantic similarity to the test image is computed. An OOD confidence score is derived by comparing the cumulative match score of the candidate unknown labels against that of the known labels.
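The scoring step of the inference strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the similarity values are hand-written stand-ins for image-to-label similarities, the names `softmax` and `zoc_ood_score` and the temperature of 100 are hypothetical, dropping generated labels that duplicate a known class name is an assumption, and the score is taken to be the softmax probability mass that falls on the candidate labels, which is one reading of the comparison described above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def zoc_ood_score(known_sims, candidate_sims, temperature=100.0):
    """Return the softmax mass assigned to candidate (unknown) labels.

    known_sims / candidate_sims map label -> similarity to the test image.
    Candidates that repeat a known class name are dropped (an assumption),
    so an image described only by known names gets a score of 0.
    """
    known = {k.lower(): v for k, v in known_sims.items()}
    extra = {k.lower(): v for k, v in candidate_sims.items()
             if k.lower() not in known}
    sims = list(known.values()) + list(extra.values())
    probs = softmax([temperature * s for s in sims])
    return sum(probs[len(known):])

# Toy similarities (assumed values) for an in-distribution image...
id_score = zoc_ood_score(
    known_sims={"cat": 0.30, "dog": 0.22},
    candidate_sims={"cat": 0.30, "sofa": 0.21},
)
# ...and for an image whose generated labels fit better than any known class.
ood_score = zoc_ood_score(
    known_sims={"cat": 0.18, "dog": 0.19},
    candidate_sims={"giraffe": 0.29, "grass": 0.24},
)
print(round(id_score, 5), round(ood_score, 5))
```

A higher score means more of the match is explained by generated labels than by known class names, so the sample is more likely OOD. In practice a threshold on this score, or a threshold-free ranking metric, would decide the final label.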

Experimental Evaluations

The efficacy of ZOC is validated across several OOD detection benchmarks, including CIFAR10, CIFAR100, and TinyImagenet. The results demonstrate that ZOC achieves superior performance over several fully supervised baselines, including those employing discriminative and generative modeling techniques like DOC, OpenMax, and CSI.

ZOC was also compared with methods adapted to use the CLIP backbone, such as CLIP+MSP and CLIP+CAC. It retains its advantage in detecting OOD samples even when the baselines build on the same pre-trained model.
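For contrast, a maximum softmax probability (MSP) style score uses only the known class names. The sketch below is a generic illustration of that idea and is not the baseline configuration used in the paper: the similarity values, the function name `msp_ood_score`, and the temperature of 100 are assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def msp_ood_score(known_sims, temperature=100.0):
    """OOD score as one minus the largest softmax probability
    over the known classes only (no candidate labels involved)."""
    probs = softmax([temperature * s for s in known_sims])
    return 1.0 - max(probs)

# Toy similarities (assumed values) to two known classes.
confident = msp_ood_score([0.30, 0.22])   # one known class clearly matches
ambiguous = msp_ood_score([0.18, 0.19])   # neither known class matches well
print(round(confident, 5), round(ambiguous, 5))
```

In this toy case the poorly matched image still receives a score below 0.3, because the softmax over known classes alone must put most of its mass on whichever known label is slightly closer. A score that also considers generated candidate labels does not have this constraint.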

Implications and Future Directions

This work introduces an effective strategy for zero-shot OOD detection that combines text generation with the language-vision capabilities of a pre-trained model such as CLIP. The approach offers a framework in which models can operate in open-world environments, identifying samples from unseen classes without prior training data for those classes.

The results encourage further work on improving the diversity and relevance of the generated textual descriptions, potentially through larger and more comprehensive training corpora. Additionally, modeling relationships among candidate labels could enrich ZOC's semantic understanding and improve detection performance in more difficult OOD scenarios.

In summary, the paper offers a compelling approach to OOD detection within the zero-shot learning paradigm. Future work may focus on extending ZOC to broader domains and refining its pipeline to handle a wider range of unseen conditions.

GitHub

GitHub - sesmae/ZOC: This repository is the official implementation of the aaai2022 paper "Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP" (20 stars)