Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning (2404.03323v1)
Abstract: We propose a novel architecture and method for explainable classification with Concept Bottleneck Models (CBMs). While state-of-the-art approaches to image classification work as black boxes, there is a growing demand for models that provide interpretable results. Such models often learn to predict the distribution over class labels using an additional description of the target instances, called concepts. However, existing bottleneck methods have a number of limitations: their accuracy is lower than that of a standard model, and CBMs require an additional set of concepts to leverage. We provide a framework for creating Concept Bottleneck Models from pre-trained multi-modal encoders and new CLIP-like architectures. By introducing a new type of layer, the Concept Bottleneck Layer, we outline three methods for training it: with an $\ell_1$ loss, with a contrastive loss, and with a loss based on the Gumbel-Softmax distribution (Sparse-CBM), while the final FC layer is still trained with cross-entropy. We show a significant increase in accuracy when using sparse hidden layers in CLIP-based bottleneck models, which means that a sparse representation of the concept activation vector is meaningful in Concept Bottleneck Models. Moreover, with our Concept Matrix Search algorithm we can improve CLIP predictions on complex datasets without any additional training or fine-tuning. The code is available at: https://github.com/Andron00e/SparseCBM.
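The abstract describes a pipeline of a frozen CLIP-like image encoder, a Concept Bottleneck Layer sparsified via a Gumbel-Softmax relaxation, and a final FC layer trained with cross-entropy. Below is a minimal sketch of that pipeline under stated assumptions: the class names `ConceptBottleneckLayer` and `SparseCBM`, the temperature value, and the exact way the Gumbel-Softmax is applied are illustrative choices, not the authors' exact Sparse-CBM implementation or loss.

```python
# Illustrative sketch of a CLIP-based Concept Bottleneck Model with a
# Gumbel-Softmax-sparsified concept layer (not the authors' exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptBottleneckLayer(nn.Module):
    """Maps frozen image embeddings to (sparse) concept activations."""

    def __init__(self, embed_dim: int, num_concepts: int, tau: float = 1.0):
        super().__init__()
        self.proj = nn.Linear(embed_dim, num_concepts)
        self.tau = tau  # Gumbel-Softmax temperature (assumed hyperparameter)

    def forward(self, image_emb: torch.Tensor) -> torch.Tensor:
        logits = self.proj(image_emb)
        if self.training:
            # The Gumbel-Softmax relaxation pushes the concept activation
            # vector toward a sparse, near one-hot distribution.
            return F.gumbel_softmax(logits, tau=self.tau, hard=False, dim=-1)
        return logits.softmax(dim=-1)


class SparseCBM(nn.Module):
    """Frozen image encoder -> concept bottleneck -> linear classifier."""

    def __init__(self, image_encoder: nn.Module, embed_dim: int,
                 num_concepts: int, num_classes: int):
        super().__init__()
        self.encoder = image_encoder
        for p in self.encoder.parameters():  # keep the backbone frozen
            p.requires_grad_(False)
        self.bottleneck = ConceptBottleneckLayer(embed_dim, num_concepts)
        self.head = nn.Linear(num_concepts, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            emb = self.encoder(images)
        concepts = self.bottleneck(emb)
        return self.head(concepts)


# Training step, with the final FC layer supervised by cross-entropy as
# stated in the abstract:
#   logits = model(images)
#   loss = F.cross_entropy(logits, labels)
#   loss.backward()
```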
- Andrei Semenov
- Vladimir Ivanov
- Aleksandr Beznosikov
- Alexander Gasnikov