Learning Transferable Negative Prompts for Out-of-Distribution Detection (2404.03248v1)
Abstract: Existing prompt learning methods have shown some capability in Out-of-Distribution (OOD) detection, but because no OOD images from the target dataset are available during training, OOD images can be mismatched to In-Distribution (ID) categories, resulting in a high false positive rate. To address this issue, we introduce a novel OOD detection method, named 'NegPrompt', which learns a set of negative prompts, each representing a negative connotation of a given class label, to delineate the boundary between ID and OOD images. It learns such negative prompts with ID data only, without any reliance on external outlier data. Further, current methods assume the availability of samples from all ID classes, rendering them ineffective in open-vocabulary learning scenarios where the inference stage can involve novel ID classes not present during training. In contrast, our learned negative prompts are transferable to novel class labels. Experiments on various ImageNet benchmarks show that NegPrompt surpasses state-of-the-art prompt-learning-based OOD detection methods and maintains a consistent lead in hard OOD detection in both closed- and open-vocabulary classification scenarios. Code is available at https://github.com/mala-lab/negprompt.
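The abstract describes the idea only at a high level. The sketch below illustrates, under stated assumptions, how learned negative prompts could be combined with positive class prompts to score an image as ID or OOD with a frozen CLIP-like encoder. The function name, feature shapes, temperature, and the softmax-based scoring rule are illustrative assumptions (the rule resembles MCM-style scoring), not the paper's exact formulation; see the official repository for the actual implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch (not the official NegPrompt code): score an image as
# ID vs. OOD using positive class-prompt embeddings and learned negative
# prompt embeddings, both assumed to come from a frozen CLIP-like text
# encoder; the image feature comes from the matching image encoder.

def ood_score(image_feat: torch.Tensor,      # (D,) image embedding
              pos_text_feats: torch.Tensor,  # (C, D) one embedding per ID class prompt
              neg_text_feats: torch.Tensor,  # (C * K, D) K learned negative prompts per class
              temperature: float = 0.01) -> torch.Tensor:
    """Return a scalar score; higher means more likely In-Distribution."""
    image_feat = F.normalize(image_feat, dim=-1)
    pos = F.normalize(pos_text_feats, dim=-1)
    neg = F.normalize(neg_text_feats, dim=-1)

    # Cosine similarities to positive (class) and negative prompt embeddings.
    pos_sim = image_feat @ pos.t() / temperature   # (C,)
    neg_sim = image_feat @ neg.t() / temperature   # (C * K,)

    # One simple rule: softmax over all similarities and keep only the
    # probability mass assigned to the positive prompts.
    probs = torch.cat([pos_sim, neg_sim], dim=-1).softmax(dim=-1)
    return probs[: pos_sim.shape[0]].sum()
```

At test time, images whose score falls below a threshold chosen on validation data would be flagged as OOD; the negative prompt embeddings are the component the paper proposes to learn from ID data alone, so that they remain usable when novel ID class labels appear at inference.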
Authors: Tianqi Li, Guansong Pang, Xiao Bai, Wenjun Miao, Jin Zheng