
Flexible Concept Bottleneck Model

Updated 17 November 2025
  • Flexible Concept Bottleneck Model is an interpretable neural network architecture that dynamically adapts and manages human-understandable concepts.
  • It leverages hypernetwork-sparsemax pipelines and frozen LLM classifiers to enable seamless concept revision and precise semantic reasoning.
  • Experimental results demonstrate that FCBM recovers full accuracy after test-time concept swaps with minimal fine-tuning and outperforms traditional concept bottleneck models on various benchmarks.

Flexible Concept Bottleneck Model (FCBM) is a paradigm within interpretable neural network architectures that emphasizes dynamic, extensible reasoning over human-understandable concepts. Standard Concept Bottleneck Models (CBMs) enforce interpretability by structuring the prediction process through a discrete set of intermediate semantic concepts, but they are rigid with respect to concept set revision, intervention interfaces, and reasoning flexibility. FCBM generalizes and augments CBM approaches via mechanisms that enable seamless conceptual adaptation, direct semantic reasoning, and richer user interactivity, while preserving the transparency and auditability central to CBMs.

1. FCBM Architectures: Hypernetwork and Semantic Reasoning

Traditional CBMs comprise two sequential modules: a concept predictor $g$ that extracts concept activations from input data, and a label predictor $f$ that infers class predictions from these activations via a fixed linear transformation $f(\mathbf{c}) = W\mathbf{c} + b$. FCBM replaces this inflexible linear mapping with a hypernetwork-sparsemax pipeline or a frozen LLM semantic classifier.

Hypernetwork-based FCBM

  • Concept Embeddings: CLIP-derived text features $\{\mathbf{t}_j\}_{j=1}^m$ encode each concept.
  • Hypernetwork $h$: a mapping $h:\mathbb{R}^d \rightarrow \mathbb{R}^n$ (typically a 3-layer MLP) that maps per-concept text embeddings into class prediction weight vectors, constructing $W \in \mathbb{R}^{m \times n}$:

$$W = \begin{bmatrix} h(\mathbf{t}_1)^\top \\ h(\mathbf{t}_2)^\top \\ \vdots \\ h(\mathbf{t}_m)^\top \end{bmatrix}$$

  • Sparsemax Projection: To ensure interpretability and selectivity, a modified sparsemax projection with a learnable temperature $\tau$ sparsifies $W$, yielding $\mathring{W} = \mathcal{S}^\tau_{\max}(W)$. Only a small subset of concepts actively influences predictions for each class; a minimal sketch of this pipeline follows below.
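Below is a minimal PyTorch sketch of the hypernetwork-sparsemax pipeline described above. It is illustrative rather than the authors' implementation: the hidden width, the log-parameterized temperature, and applying sparsemax along the concept axis of each class column are assumptions.

```python
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    """Maps each concept text embedding t_j (R^d) to a row of class weights (R^n)."""
    def __init__(self, d: int, n_classes: int, hidden: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(                      # 3-layer MLP, as described above
            nn.Linear(d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, text_embs: torch.Tensor) -> torch.Tensor:
        # text_embs: (m, d) concept embeddings -> W: (m, n_classes)
        return self.mlp(text_embs)

def sparsemax(z: torch.Tensor, dim: int = 0) -> torch.Tensor:
    """Projection onto the probability simplex (Martins & Astudillo, 2016); zeroes small entries."""
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    k = torch.arange(1, z.size(dim) + 1, dtype=z.dtype, device=z.device)
    k = k.view([-1 if i == dim else 1 for i in range(z.dim())])
    z_cumsum = z_sorted.cumsum(dim)
    support = 1 + k * z_sorted > z_cumsum              # positions kept in the support
    k_max = support.sum(dim=dim, keepdim=True)
    tau_star = (z_cumsum.gather(dim, k_max - 1) - 1) / k_max.to(z.dtype)
    return torch.clamp(z - tau_star, min=0.0)

# Usage: build W from CLIP text features, then sparsify it per class.
m, d, n = 200, 512, 10                                 # concepts, embedding dim, classes
text_embs = torch.randn(m, d)                          # stand-in for CLIP text features
hyper = HyperNetwork(d, n)
log_tau = nn.Parameter(torch.zeros(()))                # learnable temperature (log-space)
W = hyper(text_embs)                                   # (m, n)
W_ring = sparsemax(W / log_tau.exp(), dim=0)           # only a few concepts active per class
```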

LLM-based FCBM (Chat-CBM)

  • Semantic Bottleneck: The concept bottleneck consists of a set of semantic tokens $\hat{s}$, decoded from high-confidence concept activations or CLIP similarities.
  • Language-based Classifier: A frozen LLM $f_{\mathcal{M}}$ reasons over prompts containing concept tokens, candidate classes, and optional priors to infer class probabilities:

$$P(y_i \mid D, \theta, \hat{s}) \equiv f_{\mathcal{M}}\bigl(y_i; \mathrm{prompt}(D, \theta, \hat{s})\bigr)$$

  • No LLM Fine-tuning: The concept extractor is trained conventionally, but the downstream LLM classifier is frozen; only the prompting context adapts per task (a hedged sketch of this prompting step follows below).
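The sketch below illustrates the prompting step. The thresholding rule, the prompt wording, and the `query_llm` helper are illustrative placeholders, not the paper's interface.

```python
from typing import List, Optional

def decode_concepts(scores: List[float], names: List[str], threshold: float = 0.5) -> List[str]:
    """Keep only high-confidence concepts as the semantic bottleneck s_hat."""
    return [n for n, s in zip(names, scores) if s > threshold]

def build_prompt(tokens: List[str], classes: List[str], prior: Optional[str] = None) -> str:
    """Assemble the prompt(D, theta, s_hat) that the frozen LLM reasons over."""
    lines = [
        "Observed concepts: " + ", ".join(tokens),
        "Candidate classes: " + ", ".join(classes),
    ]
    if prior:
        lines.append("Background knowledge: " + prior)
    lines.append("Which class best matches the observed concepts? Answer with one class name.")
    return "\n".join(lines)

# prediction = query_llm(build_prompt(decode_concepts(scores, names), classes))  # query_llm: any chat API
```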

2. Dynamic Concept Adaptation Mechanisms

FCBM is fundamentally concept-size agnostic: the concept set $\{\mathbf{t}_j\}$ and its cardinality may be arbitrarily altered at test time.

  • Distribution Alignment for Hypernetwork FCBM: When swapping in a new concept set $\mathbf{T}'$ at inference, FCBM uses a feature-distribution alignment procedure:

$$\tilde{\mathbf{T}}' = \frac{\sigma_{\mathbf{t}}}{\sigma_{\mathbf{t}'}} \left(\mathbf{T}' - \bar{\mathbf{t}}'\right) + \bar{\mathbf{t}}, \qquad \tilde{W} = \frac{\sigma_{W}}{\sigma_{h(\tilde{\mathbf{T}}')}} \left(h(\tilde{\mathbf{T}}') - \bar{h}(\tilde{\mathbf{T}}')\right) + \bar{W}$$

  • Interactivity via Semantics (Chat-CBM): In Chat-CBM, new concepts can be injected by augmenting the semantic bottleneck prompt and concept tokens; task-specific strategies, domain knowledge, or high-level guidance can be added through natural language.

FCBM supports arbitrary addition, removal, or replacement of concepts without retraining the concept predictor module $g$. A plausible implication is that FCBM anticipates continual concept-bank evolution for real-world deployment, even when using foundation vision-LLMs.
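A rough sketch of the alignment step above, under the assumption that the means and standard deviations are taken per embedding dimension over the concept axis (variable names are illustrative):

```python
import torch

def align(x_new: torch.Tensor, mean_old: torch.Tensor, std_old: torch.Tensor) -> torch.Tensor:
    """Shift and rescale x_new so its per-dimension statistics match the old ones."""
    return std_old / x_new.std(0, keepdim=True) * (x_new - x_new.mean(0, keepdim=True)) + mean_old

def swap_concept_bank(hyper, T_old: torch.Tensor, W_old: torch.Tensor, T_new: torch.Tensor) -> torch.Tensor:
    """Return aligned class weights for a new concept set without retraining g or h."""
    T_tilde = align(T_new, T_old.mean(0, keepdim=True), T_old.std(0, keepdim=True))
    W_new = hyper(T_tilde)                    # hypernetwork output on aligned embeddings
    W_tilde = align(W_new, W_old.mean(0, keepdim=True), W_old.std(0, keepdim=True))
    return W_tilde                            # sparsemax with the learned temperature is applied as before
```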

3. Training Protocols and Loss Functions

FCBM decomposes training into concept prediction and label prediction stages.

Concept Prediction

  • Image Backbone: A ResNet50 or ViT backbone extracts image features $\mathbf{z}_i = \omega(\mathbf{x}_i)$.
  • Concept Predictor $g$: $g(\mathbf{z}_i) = \mathbf{q}_i \in \mathbb{R}^m$.
  • Concept Scores: $\mathbf{c}_i = \mathbf{z}_i \cdot \mathbf{T}^\top$.
  • Loss (a minimal sketch follows after this list):

$$\mathcal{L}_{\text{concept}} = \sum_{j=1}^m \left[ -\mathrm{sim}(\mathbf{c}_{:,j}, \mathbf{q}_{:,j}) \right], \qquad \mathrm{sim}(u,v) = \left(\frac{u \cdot v}{\|u\|\,\|v\|}\right)^3$$

  • Chat-CBM Supervised: Uses standard binary cross-entropy loss for concept prediction.
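A minimal sketch of the concept-alignment loss above, assuming $\mathbf{c}$ and $\mathbf{q}$ are batch-by-concept matrices and the cosine similarity is taken column-wise:

```python
import torch
import torch.nn.functional as F

def concept_loss(c: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """c, q: (N, m) CLIP concept scores and predicted concept activations for a batch."""
    sim = F.cosine_similarity(c, q, dim=0)   # cosine similarity per concept column, shape (m,)
    return (-(sim ** 3)).sum()               # cubed similarity, negated and summed over concepts
```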

Label Prediction

  • Hypernetwork FCBM:

$$r_i = \mathbf{q}_i^\top \mathring{W} \in \mathbb{R}^n$$

Minimize cross-entropy:

$$\mathcal{L}_{\text{cls}} = \sum_{i=1}^N \mathrm{CE}(r_i, y_i)$$

  • Sparsemax Gradient:

$$\frac{\partial \mathcal{L}}{\partial \tau} = \sum_{i \in P(s)} \frac{1}{|P(s)|} \frac{\partial \mathcal{L}}{\partial \tilde{s}_i}, \qquad P(s) = \{i : \tilde{s}_i > 0\}$$

  • Chat-CBM: Label predictor is a frozen LLM. No LLM fine-tuning; upstream training matches standard CBM.

Total Loss: Usually the sum of both stages; in practice, $g$ (concept prediction) is trained first, then $(h, \tau)$ (hypernetwork and temperature), as in the sketch below.
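A hedged sketch of this second stage, reusing the `HyperNetwork` and `sparsemax` sketches from Section 1; the optimizer is assumed to hold both the hypernetwork parameters and the temperature:

```python
import torch
import torch.nn.functional as F

def stage2_step(hyper, log_tau, optimizer, text_embs, q_batch, y_batch):
    """One optimization step over (h, tau) with the concept predictor g kept frozen."""
    W = hyper(text_embs)                              # (m, n) raw class weights
    W_ring = sparsemax(W / log_tau.exp(), dim=0)      # sparse weights, learnable temperature
    logits = q_batch @ W_ring                         # (N, m) @ (m, n) -> (N, n), i.e. r_i
    loss = F.cross_entropy(logits, y_batch)           # L_cls
    optimizer.zero_grad()
    loss.backward()                                   # autograd carries the gradient to h and tau
    optimizer.step()
    return loss.item()
```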

4. Intervention and Steering Interfaces

FCBM enables both numeric and rich semantic interventions.

Numeric Edits

Concept activations can be manually edited, $\mathbf{c}_j \rightarrow \mathbf{c}_j'$, and the edit propagates to the prediction.
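For the hypernetwork variant this amounts to overwriting an entry of the concept activation vector and re-applying the fixed head; a toy illustration with assumed shapes:

```python
import torch

def numeric_intervention(q: torch.Tensor, W_ring: torch.Tensor, j: int, new_value: float) -> torch.Tensor:
    """q: (N, m) concept activations; W_ring: (m, n) sparse class weights."""
    q_edit = q.clone()
    q_edit[:, j] = new_value        # manual edit c_j -> c_j'
    return q_edit @ W_ring          # updated logits; no retraining required
```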

Conversational/Language-based Interventions (Chat-CBM)

The user interacts through natural language:

  • Concept Correction: E.g., "Ignore the concept 'forest' in prediction."
  • Concept Addition/Removal: E.g., "Also note the bird has a forward-arching feather on the head."
  • Strategy Guidance: E.g., "Focus on relative bill length to distinguish species."

Interventions are processed by the LLM, which dynamically reweights the evidence. This suggests that even noisy or dense concept activations remain tractable, since the classifier re-contextualizes them in semantic space.
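In code this reduces to appending the user's instruction to the classification prompt and re-querying the frozen LLM; a brief sketch continuing the placeholder interface from Section 1:

```python
def conversational_intervention(base_prompt: str, instruction: str) -> str:
    """Append a natural-language intervention to the existing Chat-CBM prompt."""
    return base_prompt + "\nUser instruction: " + instruction

# new_prediction = query_llm(conversational_intervention(prompt, "Ignore the concept 'forest' in prediction."))
```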

5. Experimental Evaluation and Benchmarks

Extensive experiments validate FCBM’s claims regarding flexibility, accuracy, and generalizability.

Datasets

  • CIFAR-10, CIFAR-100, CUB, Places365, ImageNet for hypernetwork FCBM (Du et al., 10 Nov 2025).
  • CUB, AwA2, PBC for supervised CBM/Chat-CBM (He et al., 22 Sep 2025).
  • DTD, Food-101, Flower-102, CIFAR-10/CIFAR-100, ImageNet for unsupervised CBMs.

Performance Metrics

Hypernetwork FCBM (Du et al., 10 Nov 2025):

Backbone   Method        CIFAR-10   CIFAR-100   CUB     Places365   ImageNet
ResNet50   LF-CBM        86.16      64.62       56.91   48.88       66.03
ResNet50   FCBM (Ours)   85.59      64.77       63.46   49.13       66.34
ViT-L/14   LF-CBM        97.18      81.98       75.44   50.51       79.70
ViT-L/14   FCBM (Ours)   97.21      83.63       80.52   51.39       80.62

Zero-shot concept swapping yields 75% accuracy on CIFAR-10, and full accuracy is recovered with one epoch of Stage-2 fine-tuning.

Chat-CBM (He et al., 22 Sep 2025):

Model       Concept Acc.   Class Acc. (CUB)   Class Acc. (AwA2)   Class Acc. (PBC)
CBM         0.965          0.752              0.923               0.988
+Chat-CBM   0.965          0.815              0.964               0.986

On unsupervised tasks (2-shot):

Model              DTD     CIFAR-10   ImageNet   ...
LaBo (2-shot)      0.552   0.803      0.558      ...
LaBo+Chat-CBM      0.677   0.889      0.601      ...
V2C-CBM (2-shot)   0.492   0.934      0.615      ...
V2C+Chat-CBM       0.734   0.955      0.667      ...

Intervention Curves: On CUB, full correction drives Chat-CBM to ≈0.998 class accuracy. In unsupervised tasks, autonomous LLM edits over five turns outperform all-shot baselines.

6. Ablation Studies and Component Analysis

  • Sparsity vs. Number of Effective Concepts (NEC): Varying NEC from 30 to full dimension yields marginal gains; NEC≈30 saturates accuracy.
  • Sparsemax and Learnable $\tau$: Removing sparsemax or fixing $\tau$ degrades zero-shot generalization; full FCBM with both components outperforms ablations.
  • Bottleneck Inputs (Chat-CBM): Combining direct demonstrations, ground-truth labels, and class priors achieves maximal performance.
  • LLM Size: Performance saturates around LLaMA-3-70B or Qwen2.5-32B; smaller models (e.g., Qwen2.5-7B) underperform.
  • In-context Example Number ($K$): Increasing $K$ improves predictive performance up to a plateau at $K=3$.

7. Interpretability, Flexibility, and Implications

FCBM advances CBM designs by making the conceptual bottleneck directly programmable and extensible. Semantic flexibility supports:

  • Zero-shot Test-time Concept Swaps: New concept sets (e.g., from DeepSeek-V3 or GPT-4o) require no retraining of $g$, only Stage-2 fine-tuning if desired.
  • Unsupervised Noisy Concepts: LLM-augmented Chat-CBM allows language-based selection/filtering of concepts, overcoming intervention ineffectiveness present in standard CBMs.
  • User-centric Control: In Chat-CBM, user steering is realized via conversational dialogs rather than only numeric sliders; few-shot and compositional reasoning re-weight evidence as needed.

A plausible implication is a growing emphasis on integrating foundation model semantics and interactive interfaces in interpretable neural system design, facilitating both audit trails and robust adaptation in downstream deployment.

In summary, Flexible Concept Bottleneck Models instantiate an interpretability-preserving, dynamically adaptable framework that overcomes the rigidities of traditional CBMs via hypernetwork/language-based reasoning and modular concept management, substantiated by empirical accuracy and intervention results across diverse benchmarks (He et al., 22 Sep 2025, Du et al., 10 Nov 2025).
