Describe-and-Dissect: Interpreting Neurons in Deep Vision Networks with Language Models (2403.13771v2)
Abstract: In this paper, we propose Describe-and-Dissect (DnD), a novel method to describe the roles of hidden neurons in vision networks. DnD utilizes recent advancements in multimodal deep learning to produce complex natural language descriptions, without the need for labeled training data or a predefined set of concepts to choose from. Additionally, DnD is training-free: no new models are trained, so it can easily leverage more capable general-purpose models in the future. We conducted extensive qualitative and quantitative analysis to show that DnD outperforms prior work by providing higher-quality neuron descriptions. Specifically, our method on average provides the highest quality labels and is more than 2$\times$ as likely as the best baseline to be selected as the best explanation for a neuron. Finally, we present a use case providing critical insights into land cover prediction models for sustainability applications. Our code and data are available at https://github.com/Trustworthy-ML-Lab/Describe-and-Dissect.
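As a concrete illustration of the activation-based probing that neuron description methods like DnD build on, the sketch below uses a PyTorch forward hook to find a neuron's top-activating images in a probing set. The model (ResNet-18), layer (`layer4`), neuron index, and `FakeData` probing set are assumptions chosen to keep the example self-contained, not the paper's exact configuration; in DnD, the images recovered this way are then passed to multimodal models to generate and select a natural language description.

```python
# Minimal sketch (with assumed model/layer/dataset) of the probing step:
# find which probing images most strongly activate a chosen hidden neuron.
import torch
from torchvision import datasets, models, transforms

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

# FakeData stands in for a real probing set (e.g., ImageNet validation images).
probe = datasets.FakeData(size=64, image_size=(3, 224, 224),
                          transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(probe, batch_size=16)

acts = []  # per-batch summaries of the hooked layer's activations

def hook(_module, _inputs, output):
    # Collapse each channel's spatial activation map to its mean, yielding
    # one scalar per (image, channel); each channel is treated as a "neuron".
    acts.append(output.mean(dim=(2, 3)).detach())

handle = model.layer4.register_forward_hook(hook)
with torch.no_grad():
    for images, _ in loader:
        model(images)
handle.remove()

scores = torch.cat(acts)   # shape: (num_images, num_channels)
neuron = 7                 # arbitrary example neuron index
top_images = scores[:, neuron].topk(k=5).indices
print(f"Top-activating probe images for neuron {neuron}: {top_images.tolist()}")
```

The spatial mean is one simple choice of activation summary; a max over the feature map works the same way and emphasizes localized detectors instead.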