Concepts from Representations: Post-hoc Concept Bottleneck Models via Sparse Decomposition of Visual Representations

Published 18 Jan 2026 in cs.CV | (2601.12303v1)

Abstract: Deep learning has achieved remarkable success in image recognition, yet their inherent opacity poses challenges for deployment in critical domains. Concept-based interpretations aim to address this by explaining model reasoning through human-understandable concepts. However, existing post-hoc methods and ante-hoc concept bottleneck models (CBMs), suffer from limitations such as unreliable concept relevance, non-visual or labor-intensive concept definitions, and model or data-agnostic assumptions. This paper introduces Post-hoc Concept Bottleneck Model via Representation Decomposition (PCBM-ReD), a novel pipeline that retrofits interpretability onto pretrained opaque models. PCBM-ReD automatically extracts visual concepts from a pre-trained encoder, employs multimodal LLMs (MLLMs) to label and filter concepts based on visual identifiability and task relevance, and selects an independent subset via reconstruction-guided optimization. Leveraging CLIP's visual-text alignment, it decomposes image representations into linear combination of concept embeddings to fit into the CBMs abstraction. Extensive experiments across 11 image classification tasks show PCBM-ReD achieves state-of-the-art accuracy, narrows the performance gap with end-to-end models, and exhibits better interpretability.

Abstract PDF Upgrade to Chat

Summary

The paper presents PCBM-ReD, which extracts and filters visual concepts via sparse autoencoders to achieve high-fidelity interpretations with minimal accuracy loss.
It leverages multimodal LLMs for semantic labeling and reconstruction-guided selection, ensuring concept independence and causal interpretability.
Empirical results across 11 benchmarks demonstrate near end-to-end performance and robust zero/few-shot generalization across diverse visual tasks.

Post-hoc Concept Bottleneck Models via Sparse Decomposition of Visual Representations

Introduction

The paper "Concepts from Representations: Post-hoc Concept Bottleneck Models via Sparse Decomposition of Visual Representations" (2601.12303) presents PCBM-ReD, a framework that integrates concept-based interpretability with the representational capacity of state-of-the-art pretrained vision encoders. By extracting, filtering, and selecting visual concepts directly from large-scale model representations, and mapping these to human-understandable descriptors via multimodal LLMs, PCBM-ReD establishes a route to interpretable, high-fidelity image classification that retains strong prediction accuracy on a diverse set of challenging tasks.

Pipeline Overview and Technical Framework

PCBM-ReD is structured in four major steps: concept mining, semantic labeling and selection, sparse decomposition, and predictive modeling within the bottleneck abstraction.

Figure 1: Overview of PCBM-ReD: concept extraction from image representations, ranking with MLLMs, selection for a bottleneck via reconstruction, sparse decomposition, and linear target prediction.

Concept Extraction: Visual embeddings from a pretrained image encoder (e.g., CLIP) are factorized using a sparse autoencoder (SAE), yielding a bank of disentangled, interpretable basis vectors (“concepts”). This data-driven approach, in contrast to previous hand-crafted or purely LLM-generated concepts, aligns the bottleneck with the intrinsic factors of variation captured by the model.
Semantic Labeling and Scoring: For each concept vector, top-activated images are retrieved; MLLMs (Llama-3.2-11B-Vision-Instruct, DeepSeek-V3) are prompted to generate detailed per-image descriptions, which are then summarized and filtered by LLMs for visual discriminativeness, identifiability, and lack of shortcut features. Only concepts with high semantic quality are retained.
Reconstruction-guided Concept Selection: From the filtered set, a greedy unsupervised algorithm selects concepts that minimize reconstruction error of image representations under a constrained linear span, guaranteeing near-complete coverage of the latent space while enforcing independence and avoiding redundancy.
Sparse Decomposition and Predictive Modeling: For each image, its embedding is decomposed into a sparse linear combination of the selected concepts (via OMP or similar algorithms). The resulting concept activations serve as direct CBM bottleneck inputs for a learned linear classifier. Label prediction thus flows transparently from concept activations, supporting causal interpretability and faithful explanation.
Figure 2: PCBM-ReD utilizes concepts extracted from the encoder to reconstruct the visual space, connecting interpretability with the model’s original expressiveness.

This decomposition is made possible—and remains high-fidelity—thanks to the semantic alignment in vision-LLMs like CLIP, where language and vision spaces are structurally compatible.

Empirical Results and Analytical Findings

The evaluation of PCBM-ReD spans 11 benchmarks covering generic, fine-grained, texture, action, medical, and satellite imagery. The experimental protocol includes fully-supervised, zero-shot, and few-shot setups, with comparisons against both linear probes (end-to-end models) and recent CBM variants (LaBo, Res-CBM, V2C-CBM, etc.).

Key results include:

Accuracy: The mean test accuracy gap between PCBM-ReD and a supervised linear probe is only 0.41%, and PCBM-ReD outperforms leading language-guided CBM baselines by significant margins (1.25% over LaBo; 5.57% over label-free CBMs). Performance saturates with approximately 300 concepts, but even 50 concepts yield acceptable results, highlighting efficiency.
Zero/Few-Shot Generalization: The sparse reconstruction means zero-shot performance closely tracks that of the underlying CLIP backbone, preserving generalization even with limited labels. PCBM-ReD consistently outperforms prior CBMs in the few-shot regime (e.g., a 5.01% improvement over LaBo).
Figure 3: Few-shot accuracy across 11 datasets shows superior average generalization for PCBM-ReD compared to baselines.

Figure 4: Detailed comparison with LaBo in few-shot learning; PCBM-ReD maintains a consistent lead as labeled data increases.
Interpretability and Explanation Quality: Visual and human studies were conducted to assess whether top concepts (those with highest causal contribution to predictions) are human-identifiable and genuinely explanatory.
Figure 5: Example explanations with top concepts that drive image classifications; concepts are visually grounded and task-specific.

Figure 6: Human evaluation scores show that PCBM-ReD’s explanations are judged highly for visual identifiability, descriptive fidelity, and causal link to predictions.

Interpretability is further enhanced by discarding non-causal or non-visual features as determined by LLM scoring.

Ablative Analysis: Removal of the LLM-based concept filtering, replacement of reconstruction-guided selection with random/k-means alternatives, or using naive CLIP similarities for concept association each causes measurable performance and interpretability drops.
Figure 7: Ablation study confirms the superiority of data-driven concept creation, reconstruction-based selection, and decomposition-based association over plausible alternatives.

Methodological Innovations and Theoretical Implications

PCBM-ReD’s core technical advances include:

Alignment of Concept Set with Data Distribution and Encoder Capacity: Unlike prior CBMs, whose concepts are fixed externally, PCBM-ReD discovers and ranks concepts inherently present in the foundation model’s representation. This results in semantically meaningful and discriminative factors that match the backbone’s learned invariances and biases.
Sparsity and Independence in Explanatory Factors: Both the SAE and the greedy selection provide independence across chosen concepts, facilitating possible CBM interventions and avoiding the representational redundancy that undermines causal interpretation.
Model Faithfulness and Completeness: By reducing the projection residual, the predictive performance of the interpretably-constrained model remains strongly faithful to the pretrained encoder—challenging the notion that interpretability and performance are inherently at odds in post-hoc settings.
MLLM-driven Semantic Validation: By using MLLMs and LLMs in a loop for labeling, scoring, and filtering, the pipeline ensures that concepts are visually meaningful and exclude non-perceptual shortcuts, a frequent failure mode in both manual curation and text-only LLM concept bottlenecks.

Practical Implications and Future Directions

PCBM-ReD provides a practical path towards interpretable deployment of foundation vision models in sensitive, high-stakes settings (medical imaging, autonomous systems, scientific analysis). The pipeline is domain-agnostic provided the foundation model and the MLLM/LLM components are sufficiently expressive about the task-specific visual modalities.

Extensions of this method could involve:

Domain-Adapted LLMs/MLLMs: For medical or technical imagery, specialized LLMs could further refine the labeling and filtering stages, freeing PCBM-ReD from errors due to inadequate descriptive power in generalist LLMs.
Non-CLIP Backbones: While the method leverages joint alignment in CLIP, similar decompositional strategies could be ported to architectures supporting concept activation vectors or other forms of latent disentanglement.
Causal Interventions and Editing: The independence and faithfulness of concept mappings open avenues for direct intervention on predictions, debugging, and even ethical audits at concept granularity.
Global and Local Attribution: The framework is compatible with both per-sample (local) and dataset-level (global) concept importance analyses.

Conclusion

PCBM-ReD advances concept-based interpretability for vision models by employing sparse decomposition, reconstruction-based concept selection, and LLM-guided semantic validation. It achieves classification accuracy on par with end-to-end models, robust zero/few-shot generalization, and highly rated interpretability, setting a new standard for post-hoc CBM construction. The approach effectively bridges the gap between the opaque, distributed features of foundation models and the structured, human-centered abstractions required for trustworthy AI deployment.

Markdown