Image-Object-Specific Prompt Learning for Few-Shot Class-Incremental Learning (2309.02833v2)
Abstract: While many few-shot class-incremental learning (FSCIL) studies have been undertaken, achieving satisfactory performance, especially during incremental sessions, remains challenging. One prominent challenge is that the encoder, trained on an ample base-session training set, often underperforms in incremental sessions. In this study, we introduce a novel training framework for FSCIL that capitalizes on the generalizability of the Contrastive Language-Image Pre-training (CLIP) model to unseen classes. We achieve this by formulating image-object-specific (IOS) classifiers for the input images. Here, an IOS classifier is one that targets specific attributes of class objects (such as wings or wheels) rather than the image's background. To create these IOS classifiers, we encode a bias prompt into the classifiers with a specially designed module that harnesses key-prompt pairs to pinpoint the IOS features of the classes in each session. From an FSCIL standpoint, our framework is structured to retain previous knowledge and to adapt swiftly to new sessions without forgetting or overfitting; it accounts for which modules are updatable in each session and incorporates empirically derived techniques for fast convergence. Our approach consistently outperforms state-of-the-art methods on the miniImageNet, CIFAR100, and CUB200 datasets. We further provide experiments validating that our learned model indeed yields IOS classifiers, along with ablation studies analyzing the impact of each module in the architecture.
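The key-prompt mechanism described above can be illustrated with a minimal sketch. This is not the paper's implementation; the module name, pool size, and the choice of adding the selected prompt to the class text embeddings are all assumptions made for illustration. The idea shown is that each learnable key is paired with a learnable bias prompt, the key most similar to an image feature selects the prompt, and that prompt biases the text-derived classifier toward the image's object features:

```python
import torch
import torch.nn.functional as F

class KeyPromptPool(torch.nn.Module):
    """Hypothetical key-prompt module (illustrative, not the paper's code).

    Each learnable key is paired with a learnable bias prompt; the key most
    similar to the input image feature selects the prompt used to bias the
    classifier toward image-object-specific features.
    """

    def __init__(self, num_pairs: int = 10, dim: int = 512):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(num_pairs, dim))
        self.prompts = torch.nn.Parameter(torch.randn(num_pairs, dim))

    def forward(self, image_feat: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between each image feature and every key: (B, P).
        sim = F.cosine_similarity(
            image_feat.unsqueeze(1), self.keys.unsqueeze(0), dim=-1
        )
        idx = sim.argmax(dim=-1)      # best-matching key index per image
        return self.prompts[idx]      # selected bias prompt, shape (B, dim)

# Toy usage with random stand-ins for CLIP features (dim=512 is an assumption).
pool = KeyPromptPool(num_pairs=10, dim=512)
image_feat = torch.randn(4, 512)      # batch of image features
text_feat = torch.randn(100, 512)     # per-class text embeddings
bias = pool(image_feat)               # (4, 512)

# Image-conditioned classifier: class text embedding plus the selected bias
# prompt, normalized, scored against the normalized image feature.
ios_classifier = F.normalize(text_feat.unsqueeze(0) + bias.unsqueeze(1), dim=-1)
logits = (ios_classifier * F.normalize(image_feat, dim=-1).unsqueeze(1)).sum(-1)
print(logits.shape)  # torch.Size([4, 100])
```

In a few-shot incremental setting, one would typically freeze the CLIP encoders and train only the keys and prompts, which keeps the number of updatable parameters small per session; that design choice is likewise an assumption here, in the spirit of the framework's module-updatability discussion.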
Authors: In-Ug Yoon, Tae-Min Choi, Sun-Kyung Lee, Young-Min Kim, Jong-Hwan Kim