Controllable Concept Bottleneck Models (CCBMs)
- Controllable Concept Bottleneck Models (CCBMs) are interpretable ML frameworks that use closed-form influence functions to enable efficient post-training edits.
- They support granular edits at the concept-label, concept, and data levels, drastically reducing retraining time while maintaining accuracy.
- CCBMs facilitate dynamic, privacy-sensitive model adaptation and unlearning, making them ideal for real-world, evolving deployments.
Controllable Concept Bottleneck Models (CCBMs) are a family of interpretable machine learning models designed to support efficient, fine-grained, and mathematically sound post-training edits. Their principal innovation is the ability to maintain and update the human-understandable concept layer—central to Concept Bottleneck Models (CBMs)—while allowing modifications at multiple semantic levels (concept-label, concept, data) without costly retraining. CCBMs are underpinned by closed-form approximations derived from influence functions, making them particularly suitable for real-world, dynamic, or privacy-sensitive deployments, where model adaptation and unlearning are critical requirements (Lin et al., 1 Jan 2026).
1. CBM Foundations and Motivation for Controllability
A standard CBM decomposes prediction into two explicit stages:
- A concept predictor $g: \mathcal{X} \to \mathbb{R}^k$ mapping input data $x$ to a vector of interpretable concepts $c = g(x)$.
- A label predictor $f: \mathbb{R}^k \to \mathcal{Y}$ mapping concepts to task outputs $y = f(c)$.
CBMs are typically trained to minimize separate losses for each component,

$$\mathcal{L}_g = \sum_{i=1}^{n} \sum_{j=1}^{k} \ell\big(g_j(x_i),\, c_{ij}\big), \qquad \mathcal{L}_f = \sum_{i=1}^{n} \ell\big(f(c_i),\, y_i\big),$$

where $c_{ij}$ denotes the $j$-th concept label for sample $x_i$, and $y_i$ is its task label. This decomposition ensures that model decisions can be precisely traced to underlying concepts.
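For concreteness, here is a minimal PyTorch sketch of this two-stage decomposition; the layer sizes, sigmoid bottleneck, and loss choices are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, in_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # Concept predictor g: inputs -> interpretable concept logits.
        self.g = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                               nn.Linear(128, n_concepts))
        # Label predictor f: concept activations -> task logits.
        self.f = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        c_logits = self.g(x)                        # predicted concepts
        y_logits = self.f(torch.sigmoid(c_logits))  # prediction through the bottleneck
        return c_logits, y_logits

def cbm_losses(model, x, c, y):
    """Separate losses for the two stages, mirroring L_g and L_f above."""
    c_logits, y_logits = model(x)
    loss_g = nn.functional.binary_cross_entropy_with_logits(c_logits, c)
    loss_f = nn.functional.cross_entropy(y_logits, y)
    return loss_g, loss_f
```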
In practice, CBMs have historically been treated as static: any changes in data, concept annotations, or ontology necessitate retraining. However, real-world deployments demand dynamic updates, including correction of mislabeled concepts, evolution of the concept vocabulary, and unlearning of sensitive data. CCBMs are constructed to provide such flexibility without sacrificing interpretability (Lin et al., 1 Jan 2026).
2. Granularities of Editable Operations in CCBMs
CCBMs support efficient edits at three semantic levels:
| Edit Granularity | Typical Operation | Closed-form Update? |
|---|---|---|
| Concept-label level | Fix individual entries in concept matrix | Yes |
| Concept level | Add or remove entire concepts | Yes |
| Data level | Unlearning/removal or addition of samples | Yes |
- Concept-label edits: Correct individual entries in the concept label matrix—e.g., fixing a mis-annotated sample's gender. CCBMs efficiently update both the concept predictor and label predictor to reflect this correction as if the model had been fully retrained.
- Concept-level edits: Add, remove, or redefine entire concepts within the bottleneck. This includes extending the set of interpretable features or removing spurious/irrelevant ones, while updating model parameters accordingly.
- Data-level edits: Support for removing (unlearning) or incrementally adding samples. For removals, CCBMs ensure that the updated model behaves as if certain data were never included, satisfying privacy or regulatory requirements. For additions, CCBMs rapidly integrate newly acquired data.
All these operations are achieved via mathematically grounded influence-function approximations, ensuring computational efficiency and theoretical guarantees (Lin et al., 1 Jan 2026).
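To make these granularities concrete, the sketch below shows one plausible way to represent edit requests as typed objects; the type names and fields are hypothetical, not an interface from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptLabelEdit:
    sample_idx: int       # row of the concept label matrix
    concept_idx: int      # column (concept) to correct
    new_value: float      # corrected annotation

@dataclass
class ConceptEdit:
    concept_idx: int      # concept to add to or remove from the bottleneck
    remove: bool = True

@dataclass
class DataEdit:
    sample_indices: list[int] = field(default_factory=list)  # samples to unlearn or add
    unlearn: bool = True
```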
3. Influence Functions for Closed-form Model Updates
The mathematical foundation of CCBM edits is the influence function: a classical technique that estimates the effect of infinitesimally up- or down-weighting individual training samples on model parameters (Lin et al., 1 Jan 2026).
Given $\ell(z_i; \theta)$ as the loss for sample $z_i$, the influence function solves

$$\mathcal{I}(z_i) = -H_{\hat{\theta}}^{-1}\, \nabla_\theta \ell(z_i; \hat{\theta}),$$

where $H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 \ell(z_i; \hat{\theta})$ is the Hessian of the loss over all data. For a finite perturbation (edit) involving $m$ points, a sum of such terms is applied.
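The following toy sketch computes this influence quantity directly for a small least-squares model whose Hessian can be materialized; at CCBM scale one would use Hessian-vector products or EK-FAC instead. The setup and damping term are illustrative assumptions.

```python
import torch
from torch.autograd.functional import hessian

def total_loss(theta, X, y):
    # Mean squared error of a linear predictor X @ theta.
    return 0.5 * ((X @ theta - y) ** 2).mean()

def influence(theta_hat, X, y, i, damping=1e-3):
    """I(z_i) = -H^{-1} grad_theta loss(z_i; theta_hat)."""
    H = hessian(lambda t: total_loss(t, X, y), theta_hat)
    H = H + damping * torch.eye(H.shape[0])  # damping keeps H invertible
    g_i = torch.autograd.grad(total_loss(theta_hat, X[i:i+1], y[i:i+1]),
                              theta_hat)[0]
    return -torch.linalg.solve(H, g_i)

# Toy usage: theta_hat stands in for the trained minimizer.
X, y = torch.randn(100, 5), torch.randn(100)
theta_hat = torch.zeros(5, requires_grad=True)
print(influence(theta_hat, X, y, i=0))
```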
Each edit granularity is associated with a tailored influence update:
- Concept-label edit: For corrections $\{(x_i,\, c_{ij}^{\text{old}} \to c_{ij}^{\text{new}})\}$, update
$$\hat{\theta} \leftarrow \hat{\theta} + \frac{1}{n} H_{\hat{\theta}}^{-1} \sum_{(i,j)} \Big( \nabla_\theta \ell(x_i, c_{ij}^{\text{old}}; \hat{\theta}) - \nabla_\theta \ell(x_i, c_{ij}^{\text{new}}; \hat{\theta}) \Big).$$
- Concept edit: For a set of removed concepts $\mathcal{R}$, zero-pad the removed dimensions, update parameters in the larger space, then remove them.
- Data edit: For removed samples $S$, apply
$$\hat{\theta}_{-S} \approx \hat{\theta} + \frac{1}{n} H_{\hat{\theta}}^{-1} \sum_{z \in S} \nabla_\theta \ell(z; \hat{\theta}),$$
and corresponding steps for the label predictor $f$.
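Reusing the toy `total_loss` and `hessian` setup from the previous snippet, a hedged sketch of the data-level removal update:

```python
def unlearn(theta_hat, X, y, S, damping=1e-3):
    """theta_{-S} ~= theta_hat + (1/n) H^{-1} sum_{z in S} grad l(z; theta_hat)."""
    n = X.shape[0]
    H = hessian(lambda t: total_loss(t, X, y), theta_hat)
    H = H + damping * torch.eye(H.shape[0])
    grad_sum = torch.zeros_like(theta_hat)
    for i in S:
        grad_sum = grad_sum + torch.autograd.grad(
            total_loss(theta_hat, X[i:i+1], y[i:i+1]), theta_hat)[0]
    # Shift parameters as if the samples in S had never been trained on.
    return (theta_hat + torch.linalg.solve(H, grad_sum) / n).detach()
```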
Efficient computation is achieved using approximate second-order techniques such as EK-FAC or Fisher matrices, scaling well to moderately large models (Lin et al., 1 Jan 2026).
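As a rough illustration of such approximations, the sketch below replaces the exact Hessian solve with a damped diagonal empirical Fisher; this is far cruder than EK-FAC and serves only to convey the idea (it again assumes the toy `total_loss` from above).

```python
def diag_fisher_inv_vp(theta_hat, X, y, v, damping=1e-3):
    """Approximate H^{-1} v with a damped diagonal empirical Fisher."""
    fisher = torch.zeros_like(theta_hat)
    for i in range(X.shape[0]):
        g_i = torch.autograd.grad(total_loss(theta_hat, X[i:i+1], y[i:i+1]),
                                  theta_hat)[0]
        fisher = fisher + g_i ** 2              # diagonal of empirical Fisher
    return v / (fisher / X.shape[0] + damping)  # elementwise (diag F + damping)^{-1} v
```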
4. Algorithmic Workflow and Practical Implementation
The implementation of CCBM updates involves:
- Hessian Calculation: Compute or approximate the Hessians of the concept and label predictors, $H_g$ and $H_f$, once (e.g., per day).
- Edit Processing: Batch similar edits for computational reuse; for each, accumulate gradients over the corrected entries or samples.
- Parameter Update: Apply influence-based updates in the parameter space; for concept additions/removals, ensure dimension consistency by zero-padding/trimming.
- Validation: Monitor post-edit performance, verifying that deviations from the result of full retraining remain small.
- Integration: These updates can be exposed as edit APIs or MLOps “jobs,” facilitating automated and auditable post-deployment model maintenance.
Overall runtime for edits is reduced by two to three orders of magnitude versus full retraining (down to 1–5 minutes for typical edit batches), enabling frequent and responsive maintenance in production (Lin et al., 1 Jan 2026).
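A hedged sketch of how such an edit job might be orchestrated end to end; all function names and the tolerance check are assumptions, since the paper prescribes no specific API.

```python
from collections import defaultdict

def run_edit_job(theta_hat, edits, apply_update, evaluate, tol=0.01):
    """edits: iterable of (granularity, payload) pairs; apply_update performs
    one closed-form influence step for a batch of same-granularity edits."""
    batches = defaultdict(list)
    for granularity, payload in edits:      # batch similar edits for reuse
        batches[granularity].append(payload)
    baseline = evaluate(theta_hat)          # pre-edit validation metric
    for granularity, payloads in batches.items():
        theta_hat = apply_update(theta_hat, granularity, payloads)
    if abs(evaluate(theta_hat) - baseline) > tol:  # post-edit validation guard
        raise RuntimeError("post-edit drift exceeds tolerance; schedule retraining")
    return theta_hat
```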
5. Experimental Validation and Case Studies
Comprehensive empirical evaluations demonstrate the utility, efficiency, and fidelity of CCBM edits (Lin et al., 1 Jan 2026):
- Concept-label corrections achieve near-identical accuracy to full retraining with over 100× speedup (e.g., update time drops from hundreds of minutes to under 5 minutes).
- Concept-level removals (up to 10 concepts) cause negligible accuracy loss in F1, with edit time reduced to 0.5 min.
- Data-level unlearning of 3% of the training set achieves privacy erasure (as measured by membership inference attack scores) in minutes while preserving downstream performance.
- Addition of new data (10% held out) recovers performance to retrain-equivalent accuracy in well under 1 minute.
Influence-based concept ablation experiments confirm the interpretability of concept importance: removing the top-ranked concepts (as estimated by influence) degrades performance as expected, closely matching retraining-based ablation curves (Lin et al., 1 Jan 2026).
6. Limitations, Theoretical Assumptions, and Practical Guidelines
CCBM influence updates assume local quadraticity of the loss and moderate edit size; accumulation of many large edits may require periodic full retraining to mitigate drift. While EK-FAC/Fisher approximations enable scaling to moderately large models, further sparsity or factorization may be needed for very large architectures. Non-convex loss landscapes may also affect fidelity for drastic edits.
Recommendations for practitioners include:
- Use CCBM updates for small, frequent edits affecting only a small fraction of the data or concepts.
- Refresh Hessian approximations regularly to maintain update accuracy.
- Monitor cumulative deviations between influence-updated and retrained weights, triggering retraining as needed (a minimal monitor is sketched after this list).
- Integrate with existing MLOps pipelines for systematic, auditable model maintenance (Lin et al., 1 Jan 2026).
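A minimal sketch of the deviation monitor recommended above, assuming access to a retrained reference checkpoint; the relative tolerance is an illustrative choice.

```python
import torch

def needs_retraining(theta_edited, theta_retrained, rel_tol=0.05):
    """Flag full retraining once edited weights drift too far from the reference."""
    drift = torch.linalg.norm(theta_edited - theta_retrained)
    return (drift / torch.linalg.norm(theta_retrained)).item() > rel_tol
```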
Potential future extensions involve higher-order influence functions, online Hessian update schemes, and formal privacy guarantees (e.g., with differential privacy certification).
7. Position in the Landscape and Relation to Other Controllable CBMs
CCBMs complement other recently proposed controllable CBM and interpretable model families:
- Interactive CBM frameworks, which use uncertainty and influence metrics to guide user interventions at test time (Chauhan et al., 2022, Shin et al., 2023).
- Generative CCBMs, which use energy-based models and diffusion guidance to ensure all semantic control passes through explicit concept vectors with no auxiliary cues or opaque latent channels (Kim et al., 11 Jul 2025).
- Open-vocabulary or ontology-editable CBMs, such as OpenCBM, which enable the addition or replacement of arbitrary concepts after initial training (Tan et al., 2024).
- Reasoning-graph-based models, e.g., CREAM, that allow explicit encoding and architectural enforcement of inter-concept and concept-to-task relationships, further facilitating targeted and reliable interventions plus mitigation of concept leakage (Kalampalikis et al., 5 Jun 2025).
CCBMs are unique in their explicit, closed-form, theoretically grounded approach to editable model control post-deployment, standing as a key paradigm for interpretable and dynamic machine learning systems (Lin et al., 1 Jan 2026).