Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions (2407.16725v2)
Abstract: The key to out-of-distribution (OOD) detection lies in two aspects: generalized feature representation and precise category description. Recently, vision-language models such as CLIP have brought significant advances on both fronts, but constructing precise category descriptions is still in its infancy because unseen categories are absent. This work introduces two hierarchical contexts, namely perceptual context and spurious context, to carefully describe the precise category boundary through automatic prompt tuning. Specifically, perceptual contexts perceive the inter-category differences (e.g., cats vs. apples) for the current classification task, while spurious contexts further identify spurious (similar but not actually in-category) OOD samples for every single category (e.g., cats vs. panthers, apples vs. peaches). Together, the two contexts hierarchically construct the precise description of a category: a sample is first roughly classified into the predicted category and then carefully examined to decide whether it is truly an in-distribution (ID) sample or actually OOD. Moreover, these precise category descriptions within the vision-language framework enable a novel application: CATegory-EXtensible OOD detection (CATEX). One can efficiently extend the set of recognizable categories by simply merging the hierarchical contexts learned under different sub-task settings. Extensive experiments demonstrate CATEX's effectiveness, robustness, and category-extensibility. For instance, CATEX consistently surpasses its rivals by a large margin under several protocols on the challenging ImageNet-1K dataset. In addition, we offer new insights into how to efficiently scale up prompt engineering in vision-language models to recognize thousands of object categories, and how to incorporate LLMs (such as GPT-3) to boost zero-shot applications. Code is publicly available at https://github.com/alibaba/catex.
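The two-stage decision described in the abstract (coarse classification with perceptual contexts, then an ID-vs-spurious check for the predicted category) and the category-extension-by-merging idea can be illustrated with a small Python sketch. It rests on simplifying assumptions: each category's perceptual and spurious descriptions are given directly as plain text-embedding vectors (in CATEX they are obtained through prompt tuning of a CLIP-style model), the second stage is modeled as a two-way softmax between the predicted category's two descriptions, and `CategoryDescription`, `detect`, `merge`, `temperature`, and `threshold` are hypothetical names rather than the API of the released code. This is not the paper's exact scoring rule.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


@dataclass
class CategoryDescription:
    # Hypothetical container: in a CLIP-style model these would be text
    # embeddings of "<perceptual context> + class name" and
    # "<spurious context> + class name". Here they are supplied directly.
    perceptual: Vector
    spurious: Vector


def detect(image_feat: Vector,
           descriptions: Dict[str, CategoryDescription],
           temperature: float = 0.01,
           threshold: float = 0.5) -> Tuple[str, float, bool]:
    """Hypothetical two-stage decision. Returns (category, ID score, is_ood)."""
    # Stage 1: perceptual descriptions separate the categories from each other.
    pred = max(descriptions,
               key=lambda c: cosine(image_feat, descriptions[c].perceptual))
    d = descriptions[pred]
    # Stage 2: two-way softmax between the predicted category's perceptual
    # and spurious descriptions (max subtracted for numerical stability).
    sp = cosine(image_feat, d.perceptual) / temperature
    ss = cosine(image_feat, d.spurious) / temperature
    m = max(sp, ss)
    id_score = math.exp(sp - m) / (math.exp(sp - m) + math.exp(ss - m))
    return pred, id_score, id_score < threshold


def merge(*tasks: Dict[str, CategoryDescription]) -> Dict[str, CategoryDescription]:
    """Extend the category set by taking the union of sub-task descriptions."""
    merged: Dict[str, CategoryDescription] = {}
    for task in tasks:
        for name, desc in task.items():
            if name in merged:
                raise ValueError(f"category {name!r} appears in more than one sub-task")
            merged[name] = desc
    return merged


# Toy 3-D embeddings, learned separately for two hypothetical sub-tasks.
animals = {"cat": CategoryDescription([1.0, 0.0, 0.0], [0.8, 0.0, 0.6])}
fruits = {"apple": CategoryDescription([0.0, 1.0, 0.0], [0.0, 0.8, 0.6])}
model = merge(animals, fruits)

print(detect([0.99, 0.1, 0.0], model))  # predicted "cat", is_ood False
print(detect([0.8, 0.0, 0.6], model))   # predicted "cat" (panther-like), is_ood True
```

Because each category in this sketch carries its own pair of descriptions, extending the recognizable set reduces to a dictionary union. Rejecting duplicate category names is a design choice of the sketch, not something the abstract specifies.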
Authors: Kai Liu, Zhihang Fu, Chao Chen, Sheng Jin, Ze Chen, Mingyuan Tao, Rongxin Jiang, Jieping Ye