Controllable Concept Bottleneck Models (CCBMs)
- Controllable Concept Bottleneck Models (CCBMs) are interpretable ML frameworks that use closed-form influence functions to enable efficient post-training edits.
- They support granular edits at the concept-label, concept, and data levels, drastically reducing retraining time while maintaining accuracy.
- CCBMs facilitate dynamic, privacy-sensitive model adaptation and unlearning, making them ideal for real-world, evolving deployments.
Controllable Concept Bottleneck Models (CCBMs) are a family of interpretable machine learning models designed to support efficient, fine-grained, and mathematically sound post-training edits. Their principal innovation is the ability to maintain and update the human-understandable concept layer—central to Concept Bottleneck Models (CBMs)—while allowing modifications at multiple semantic levels (concept-label, concept, data) without costly retraining. CCBMs are underpinned by closed-form approximations derived from influence functions, making them particularly suitable for real-world, dynamic, or privacy-sensitive deployments, where model adaptation and unlearning are critical requirements (Lin et al., 1 Jan 2026).
1. CBM Foundations and Motivation for Controllability
A standard CBM decomposes prediction into two explicit stages:
- A concept predictor $g: \mathcal{X} \to \mathbb{R}^k$ mapping input data $x$ to a vector of interpretable concepts $c = g(x)$.
- A label predictor $f: \mathbb{R}^k \to \mathcal{Y}$ mapping concepts to task outputs $y = f(c)$.
CBMs are typically trained to minimize separate losses for each component,

$$\mathcal{L}_g = \sum_{i=1}^{n} \sum_{j=1}^{k} \ell\big(g_j(x_i),\, c_{ij}\big), \qquad \mathcal{L}_f = \sum_{i=1}^{n} \ell\big(f(c_i),\, y_i\big),$$

where $c_{ij}$ denotes the $j$-th concept label for sample $x_i$, and $y_i$ is its task label. This decomposition ensures that model decisions can be precisely traced to underlying concepts.
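For concreteness, here is a minimal PyTorch sketch of this two-stage decomposition; the layer sizes, sigmoid bottleneck, and loss choices are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, in_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # Concept predictor g: inputs -> interpretable concept logits.
        self.g = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                               nn.Linear(128, n_concepts))
        # Label predictor f: concept activations -> task logits.
        self.f = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        c_logits = self.g(x)                        # predicted concepts
        y_logits = self.f(torch.sigmoid(c_logits))  # prediction through the bottleneck
        return c_logits, y_logits

def cbm_losses(model, x, c, y):
    """Separate losses for the two stages, mirroring L_g and L_f above."""
    c_logits, y_logits = model(x)
    loss_g = nn.functional.binary_cross_entropy_with_logits(c_logits, c)
    loss_f = nn.functional.cross_entropy(y_logits, y)
    return loss_g, loss_f
```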
In practice, CBMs have historically been treated as static: any changes in data, concept annotations, or ontology necessitate retraining. However, real-world deployments demand dynamic updates, including correction of mislabeled concepts, evolution of the concept vocabulary, and unlearning of sensitive data. CCBMs are constructed to provide such flexibility without sacrificing interpretability (Lin et al., 1 Jan 2026).
2. Granularities of Editable Operations in CCBMs
CCBMs support efficient edits at three semantic levels:
| Edit Granularity | Typical Operation | Closed-form Update? |
|---|---|---|
| Concept-label level | Fix individual entries in concept matrix | Yes |
| Concept level | Add or remove entire concepts | Yes |
| Data level | Unlearning/removal or addition of samples | Yes |
- Concept-label edits: Correct individual entries in the concept label matrix—e.g., fixing a mis-annotated sample's gender. CCBMs efficiently update both the concept predictor and label predictor to reflect this correction as if the model had been fully retrained.
- Concept-level edits: Add, remove, or redefine entire concepts within the bottleneck. This includes extending the set of interpretable features or removing spurious/irrelevant ones, while updating model parameters accordingly.
- Data-level edits: Support for removing (unlearning) or incrementally adding samples. For removals, CCBMs ensure that the updated model behaves as if certain data were never included, satisfying privacy or regulatory requirements. For additions, CCBMs rapidly integrate newly acquired data.
All these operations are achieved via mathematically grounded influence-function approximations, ensuring computational efficiency and theoretical guarantees (Lin et al., 1 Jan 2026).
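To make these granularities concrete, the sketch below shows one plausible way to represent edit requests as typed objects; the type names and fields are hypothetical, not an interface from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptLabelEdit:
    sample_idx: int       # row of the concept label matrix
    concept_idx: int      # column (concept) to correct
    new_value: float      # corrected annotation

@dataclass
class ConceptEdit:
    concept_idx: int      # concept to add to or remove from the bottleneck
    remove: bool = True

@dataclass
class DataEdit:
    sample_indices: list[int] = field(default_factory=list)  # samples to unlearn or add
    unlearn: bool = True
```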
3. Influence Functions for Closed-form Model Updates
The mathematical foundation of CCBM edits is the influence function: a classical technique that estimates the effect of infinitesimally up- or down-weighting individual training samples on model parameters (Lin et al., 1 Jan 2026).
Given $\ell(z_i; \theta)$ as the loss for sample $z_i$, the influence function solves

$$\mathcal{I}(z_i) = -H_{\hat{\theta}}^{-1}\, \nabla_\theta \ell(z_i; \hat{\theta}),$$

where $H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 \ell(z_i; \hat{\theta})$ is the Hessian of the loss over all data. For a finite perturbation (edit) involving $m$ points, a sum of such terms is applied.
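The following toy sketch computes this influence quantity directly for a small least-squares model whose Hessian can be materialized; at CCBM scale one would use Hessian-vector products or EK-FAC instead. The setup and damping term are illustrative assumptions.

```python
import torch
from torch.autograd.functional import hessian

def total_loss(theta, X, y):
    # Mean squared error of a linear predictor X @ theta.
    return 0.5 * ((X @ theta - y) ** 2).mean()

def influence(theta_hat, X, y, i, damping=1e-3):
    """I(z_i) = -H^{-1} grad_theta loss(z_i; theta_hat)."""
    H = hessian(lambda t: total_loss(t, X, y), theta_hat)
    H = H + damping * torch.eye(H.shape[0])  # damping keeps H invertible
    g_i = torch.autograd.grad(total_loss(theta_hat, X[i:i+1], y[i:i+1]),
                              theta_hat)[0]
    return -torch.linalg.solve(H, g_i)

# Toy usage: theta_hat stands in for the trained minimizer.
X, y = torch.randn(100, 5), torch.randn(100)
theta_hat = torch.zeros(5, requires_grad=True)
print(influence(theta_hat, X, y, i=0))
```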
Each edit granularity is associated with a tailored influence update:
- Concept-label edit: For corrections $\{(x_i,\, c_{ij}^{\text{old}} \to c_{ij}^{\text{new}})\}$, update
$$\hat{\theta} \leftarrow \hat{\theta} + \frac{1}{n} H_{\hat{\theta}}^{-1} \sum_{(i,j)} \Big( \nabla_\theta \ell(x_i, c_{ij}^{\text{old}}; \hat{\theta}) - \nabla_\theta \ell(x_i, c_{ij}^{\text{new}}; \hat{\theta}) \Big).$$
- Concept edit: For a set of removed concepts $\mathcal{R}$, zero-pad the removed dimensions, update parameters in the larger space, then remove them.
- Data edit: For removed samples $S$, apply
$$\hat{\theta}_{-S} \approx \hat{\theta} + \frac{1}{n} H_{\hat{\theta}}^{-1} \sum_{z \in S} \nabla_\theta \ell(z; \hat{\theta}),$$
and corresponding steps for the label predictor $f$.
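Reusing the toy `total_loss` and `hessian` setup from the previous snippet, a hedged sketch of the data-level removal update:

```python
def unlearn(theta_hat, X, y, S, damping=1e-3):
    """theta_{-S} ~= theta_hat + (1/n) H^{-1} sum_{z in S} grad l(z; theta_hat)."""
    n = X.shape[0]
    H = hessian(lambda t: total_loss(t, X, y), theta_hat)
    H = H + damping * torch.eye(H.shape[0])
    grad_sum = torch.zeros_like(theta_hat)
    for i in S:
        grad_sum = grad_sum + torch.autograd.grad(
            total_loss(theta_hat, X[i:i+1], y[i:i+1]), theta_hat)[0]
    # Shift parameters as if the samples in S had never been trained on.
    return (theta_hat + torch.linalg.solve(H, grad_sum) / n).detach()
```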
Efficient computation is achieved using approximate second-order techniques such as EK-FAC or Fisher matrices, scaling well to moderately large models (Lin et al., 1 Jan 2026).
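As a rough illustration of such approximations, the sketch below replaces the exact Hessian solve with a damped diagonal empirical Fisher; this is far cruder than EK-FAC and serves only to convey the idea (it again assumes the toy `total_loss` from above).

```python
def diag_fisher_inv_vp(theta_hat, X, y, v, damping=1e-3):
    """Approximate H^{-1} v with a damped diagonal empirical Fisher."""
    fisher = torch.zeros_like(theta_hat)
    for i in range(X.shape[0]):
        g_i = torch.autograd.grad(total_loss(theta_hat, X[i:i+1], y[i:i+1]),
                                  theta_hat)[0]
        fisher = fisher + g_i ** 2              # diagonal of empirical Fisher
    return v / (fisher / X.shape[0] + damping)  # elementwise (diag F + damping)^{-1} v
```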
4. Algorithmic Workflow and Practical Implementation
The implementation of CCBM updates involves:
- Hessian Calculation: Compute or approximate the Hessians of the concept and label predictors, $H_g$ and $H_f$, once (e.g., per day).
- Edit Processing: Batch similar edits for computational reuse; for each, accumulate gradients over the corrected entries or samples.
- Parameter Update: Apply influence-based updates in the parameter space; for concept additions/removals, ensure dimension consistency by zero-padding/trimming.
- Validation: Monitor post-edit performance, verifying that deviations from the result of full retraining remain small.
- Integration: These updates can be exposed as edit APIs or MLOps “jobs,” facilitating automated and auditable post-deployment model maintenance.
Overall runtime for edits is reduced by two to three orders of magnitude versus full retraining (down to 1–5 minutes for typical edit batches), enabling frequent and responsive maintenance in production (Lin et al., 1 Jan 2026).
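A hedged sketch of how such an edit job might be orchestrated end to end; all function names and the tolerance check are assumptions, since the paper prescribes no specific API.

```python
from collections import defaultdict

def run_edit_job(theta_hat, edits, apply_update, evaluate, tol=0.01):
    """edits: iterable of (granularity, payload) pairs; apply_update performs
    one closed-form influence step for a batch of same-granularity edits."""
    batches = defaultdict(list)
    for granularity, payload in edits:      # batch similar edits for reuse
        batches[granularity].append(payload)
    baseline = evaluate(theta_hat)          # pre-edit validation metric
    for granularity, payloads in batches.items():
        theta_hat = apply_update(theta_hat, granularity, payloads)
    if abs(evaluate(theta_hat) - baseline) > tol:  # post-edit validation guard
        raise RuntimeError("post-edit drift exceeds tolerance; schedule retraining")
    return theta_hat
```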
5. Experimental Validation and Case Studies
Comprehensive empirical evaluations demonstrate the utility, efficiency, and fidelity of CCBM edits (Lin et al., 1 Jan 2026):
- Concept-label corrections achieve near-identical accuracy to full retraining with over 100× speedup (e.g., update time drops from hundreds of minutes to under 5 minutes).
- Concept-level removals (up to 10 concepts) cause negligible accuracy loss in F1, with edit time reduced to 0.5 min.
- Data-level unlearning of 3% of the training set achieves privacy erasure (as measured by membership inference attack scores) in minutes while preserving downstream performance.
- Addition of new data (10% held out) recovers performance to retrain-equivalent accuracy in well under 1 minute.
Influence-based concept ablation experiments confirm the interpretability of concept importance: removing the top-ranked concepts (as estimated by influence) degrades performance as expected, closely matching retraining-based ablation curves (Lin et al., 1 Jan 2026).
6. Limitations, Theoretical Assumptions, and Practical Guidelines
CCBM influence updates assume local quadraticity of the loss and moderate edit size; accumulation of many large edits may require periodic full retraining to mitigate drift. While EK-FAC/Fisher approximations enable scaling to moderately large models, further sparsity or factorization may be needed for very large architectures. Non-convex loss landscapes may also affect fidelity for drastic edits.
Recommendations for practitioners include:
- Use CCBM updates for small, frequent edits affecting only a small fraction of the data or concepts.
- Refresh Hessian approximations regularly to maintain update accuracy.
- Monitor cumulative deviations between influence-updated and retrained weights, triggering retraining as needed (a minimal monitor is sketched after this list).
- Integrate with existing MLOps pipelines for systematic, auditable model maintenance (Lin et al., 1 Jan 2026).
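A minimal sketch of the deviation monitor recommended above, assuming access to a retrained reference checkpoint; the relative tolerance is an illustrative choice.

```python
import torch

def needs_retraining(theta_edited, theta_retrained, rel_tol=0.05):
    """Flag full retraining once edited weights drift too far from the reference."""
    drift = torch.linalg.norm(theta_edited - theta_retrained)
    return (drift / torch.linalg.norm(theta_retrained)).item() > rel_tol
```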
Potential future extensions involve higher-order influence functions, online Hessian update schemes, and formal privacy guarantees (e.g., with differential privacy certification).
7. Position in the Landscape and Relation to Other Controllable CBMs
CCBMs complement other recently proposed controllable CBM and interpretable model families:
- Interactive CBM frameworks, which use uncertainty and influence metrics to guide user interventions at test time (Chauhan et al., 2022, Shin et al., 2023).
- Generative CCBMs, which use energy-based models and diffusion guidance to ensure all semantic control passes through explicit concept vectors with no auxiliary cues or opaque latent channels (Kim et al., 11 Jul 2025).
- Open-vocabulary or ontology-editable CBMs, such as OpenCBM, which enable the addition or replacement of arbitrary concepts after initial training (Tan et al., 2024).
- Reasoning-graph-based models, e.g., CREAM, that allow explicit encoding and architectural enforcement of inter-concept and concept-to-task relationships, further facilitating targeted and reliable interventions plus mitigation of concept leakage (Kalampalikis et al., 5 Jun 2025).
CCBMs are unique in their explicit, closed-form, theoretically grounded approach to editable model control post-deployment, standing as a key paradigm for interpretable and dynamic machine learning systems (Lin et al., 1 Jan 2026).