Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) (2402.10376v2)
Abstract: CLIP embeddings have demonstrated remarkable performance across a wide range of multimodal applications. However, these high-dimensional, dense vector representations are not easily interpretable, limiting our understanding of the rich structure of CLIP and its use in downstream applications that require transparency. In this work, we show that the semantic structure of CLIP's latent space can be leveraged to provide interpretability, allowing for the decomposition of representations into semantic concepts. We formulate this problem as one of sparse recovery and propose a novel method, Sparse Linear Concept Embeddings (SpLiCE), for transforming CLIP representations into sparse linear combinations of human-interpretable concepts. Distinct from previous work, SpLiCE is task-agnostic and can be used, without training, to explain and even replace traditional dense CLIP representations, maintaining high downstream performance while significantly improving their interpretability. We also demonstrate significant use cases of SpLiCE representations, including detecting spurious correlations and model editing.
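The abstract frames the decomposition as a sparse recovery problem: an image embedding is approximated by a sparse linear combination of concept embeddings drawn from a fixed vocabulary. As a rough illustration only (not the authors' released implementation), the sketch below solves such a problem with scikit-learn's L1-penalized regression. The function name `splice_decompose`, the penalty value, the nonnegativity constraint on the weights, and the random stand-in embeddings are illustrative assumptions, and preprocessing details (e.g., normalization and the image–text modality gap) are omitted.

```python
import numpy as np
from sklearn.linear_model import Lasso


def splice_decompose(image_embedding, concept_dictionary, l1_penalty=0.1):
    """Approximate a CLIP image embedding as a sparse, nonnegative
    combination of concept (text) embeddings.

    image_embedding: unit-normalized CLIP image embedding, shape (d,)
    concept_dictionary: unit-normalized CLIP text embeddings, shape (n_concepts, d)
    """
    # Nonnegative lasso: min_w (1/2d) * ||D^T w - z||^2 + l1_penalty * ||w||_1, with w >= 0
    solver = Lasso(alpha=l1_penalty, positive=True, fit_intercept=False, max_iter=10_000)
    solver.fit(concept_dictionary.T, image_embedding)
    weights = solver.coef_                            # sparse concept weights, shape (n_concepts,)
    reconstruction = concept_dictionary.T @ weights   # dense reconstruction of the embedding
    return weights, reconstruction


# Toy usage with random stand-ins for real CLIP embeddings.
rng = np.random.default_rng(0)
concepts = rng.normal(size=(1000, 512))
concepts /= np.linalg.norm(concepts, axis=1, keepdims=True)
z = rng.normal(size=512)
z /= np.linalg.norm(z)
w, z_hat = splice_decompose(z, concepts)
top_concepts = np.argsort(w)[::-1][:5]                # indices of the most active concepts
```

In this reading, the nonzero entries of the weight vector act as human-readable concept scores, and, as the abstract notes, the sparse representation can stand in for the dense CLIP embedding in downstream tasks.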
- Usha Bhalla
- Alex Oesterling
- Suraj Srinivas
- Flavio P. Calmon
- Himabindu Lakkaraju