A Concept-Based Explainability Framework for Large Multimodal Models (2406.08074v3)

Published 12 Jun 2024 in cs.LG, cs.AI, cs.CL, and cs.CV

Abstract: Large multimodal models (LMMs) combine unimodal encoders and LLMs to perform multimodal tasks. Despite recent advances toward the interpretability of these models, the internal representations of LMMs remain largely a mystery. In this paper, we present a novel framework for the interpretation of LMMs. We propose a dictionary learning based approach applied to token representations. The elements of the learned dictionary correspond to our proposed concepts. We show that these concepts are semantically well grounded in both vision and text, and thus refer to them as "multimodal concepts". We evaluate the learned concepts both qualitatively and quantitatively, and show that the extracted multimodal concepts are useful for interpreting the representations of test samples. Finally, we evaluate the disentanglement between different concepts and the quality of grounding the concepts visually and textually. Our code is publicly available at https://github.com/mshukor/xl-vlms
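
The core idea in the abstract — learning a dictionary over token representations so that each dictionary atom acts as a candidate "concept" — can be sketched with off-the-shelf dictionary learning. The snippet below is a minimal illustration using scikit-learn on random stand-in features; the array `X`, its dimensions, and the number of concepts are all placeholder assumptions, not the paper's actual setup, which operates on hidden states extracted from an LMM.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Hypothetical token representations: 200 tokens with 64-dim features.
# In the paper's setting these would be hidden states from an LMM,
# not random numbers.
rng = np.random.default_rng(0)
X = rng.random((200, 64))

# Learn a small dictionary; each row of `dictionary` is one atom,
# playing the role of a candidate concept direction.
dl = DictionaryLearning(n_components=10, alpha=1.0,
                        max_iter=50, random_state=0)
codes = dl.fit_transform(X)        # per-token concept activations
dictionary = dl.components_        # 10 concept vectors in feature space

print(codes.shape)       # (200, 10)
print(dictionary.shape)  # (10, 64)
```

A test token's representation can then be interpreted by inspecting which atoms receive the largest coefficients in its code vector — the rough analogue of the paper's use of concepts to interpret test samples.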

Authors (5)
  1. Jayneel Parekh (9 papers)
  2. Pegah Khayatan (3 papers)
  3. Mustafa Shukor (27 papers)
  4. Alasdair Newson (16 papers)
  5. Matthieu Cord (129 papers)
Citations (7)