Calibrated Hierarchical QPM Explained
- CHiQPM is a globally interpretable image classification framework that provides exhaustive global auditability and fine-grained local explainability.
- It employs a mixed-integer quadratic programming model to enforce extremely sparse, binary connections and generate contrastive, hierarchical explanations.
- The integrated conformal calibration delivers calibrated prediction sets with user-defined coverage, ensuring reliability in safety-critical applications.
Calibrated Hierarchical QPM (CHiQPM) is a globally interpretable image classification framework designed to unify optimal class-level sparsity, human-analogous hierarchical explanations, and conformal-prediction-based calibrated set outputs. CHiQPM builds on the Quadratic Programming Enhanced Model (QPM) family by enforcing extremely sparse, binary connections between high-level learned “concept” features and output classes, achieving both exhaustive global auditability and fine-grained local explainability. Its hierarchical structure enables contrastive explanations between classes, and its built-in conformal calibration yields coherent prediction sets with user-defined coverage, supporting rigorous human-AI complementarity, particularly in safety-critical expert domains (Norrenbrock et al., 25 Nov 2025).
1. Theoretical Foundations and Motivation
CHiQPM arose from the need for globally and locally interpretable models in expert-critical domains such as medical diagnosis and autonomous systems, where high trust and auditable decision logic are imperative. Global interpretability allows exhaustive, class-level inspection of model decision boundaries, while local interpretability provides case-by-case rationales for automated predictions, facilitating expert oversight during inference. CHiQPM enforces optimal sparsity, presenting each class as a short list of interpretable “concept” features shared among many classes. This contrastive setup permits direct pairwise explanation: when any two classes differ by a single feature, the exclusive differentiator becomes the focal point of global explanation. Hierarchical representations allow recursive traversal from strong, coarse features to subtle distinctions, mirroring human reasoning, and the integrated conformal prediction layer guarantees calibrated set outputs at any desired error threshold (Norrenbrock et al., 25 Nov 2025).
2. Model Formulation: Sparse Hierarchies and Optimization
CHiQPM operates over a labeled image dataset. Feature extraction via a deep backbone (e.g., ResNet or DenseNet) yields normalized feature maps. From these, CHiQPM selects a fixed budget of features and assigns a fixed number of them to each class. A binary selection vector identifies the active features, and a binary assignment matrix records feature-class pairs, with assignments restricted to the selected features.
Optimization is formulated as a mixed-integer quadratic program (MIQP) balancing:
- Fidelity: Minimization of a quadratic objective matching dense classifier activations, with constant coefficient terms encoding first- and second-order dependencies.
- Compactness: One constraint bounds the size of the selected feature set, and another fixes the number of features per class, enforcing class-level sparsity.
- Hierarchical constraint: Using a symmetric class similarity matrix, the method enforces that each of the top-ranked most similar class pairs shares an exact, prescribed number of features, with the requirement imposed across the range of hierarchy levels.
Resulting assignments are globally (or near-globally) optimized via Gurobi with a 1% MIP gap. Feature grounding is achieved by retraining the feature extractor and sparse layer for 70 epochs, minimizing cross-entropy plus a feature grounding loss that enforces strong, sparse activation of the features assigned to each ground-truth class and suppresses off-class features. A weighting hyperparameter controls this term. ReLU activation on feature outputs ensures interpretive clarity and maximizes contrastiveness (Norrenbrock et al., 25 Nov 2025).
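To make the sparsity constraints concrete, the following is a minimal sketch that checks whether a binary selection vector and assignment matrix are feasible and counts the features shared by each class pair. It is not the authors' solver code: the function names, argument names, and the toy sizes are illustrative assumptions, and the actual optimization is carried out by Gurobi as described above.

```python
import numpy as np

def check_assignment(select, assign, max_features, n_per_class):
    """Return True if a binary selection/assignment pair meets the sparsity constraints.

    select: (n_candidates,) 0/1 vector of selected features (hypothetical name).
    assign: (n_classes, n_candidates) 0/1 matrix of feature-class pairs.
    """
    select = np.asarray(select)
    assign = np.asarray(assign)
    is_binary = np.isin(select, (0, 1)).all() and np.isin(assign, (0, 1)).all()
    budget_ok = select.sum() <= max_features                  # bounded feature set
    per_class_ok = (assign.sum(axis=1) == n_per_class).all()  # class-level sparsity
    only_selected = (assign[:, select == 0] == 0).all()       # no unselected features used
    return bool(is_binary and budget_ok and per_class_ok and only_selected)

def shared_feature_counts(assign):
    """Entry (i, j) is the number of features that classes i and j both use."""
    assign = np.asarray(assign)
    return assign @ assign.T

# Toy problem (assumed sizes): 3 classes, 6 candidate features, 2 features per class.
select = np.array([1, 1, 1, 1, 0, 0])
assign = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
])
feasible = check_assignment(select, assign, max_features=4, n_per_class=2)
shared = shared_feature_counts(assign)
```

Because the assignment matrix is binary, the product of the matrix with its transpose gives the pairwise shared-feature counts directly, which is the quantity the hierarchical constraint restricts for similar class pairs.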
3. Hierarchical Conformal Calibration
CHiQPM natively integrates conformal prediction, providing calibrated set outputs with provable coverage guarantees. On held-out calibration data, the model computes nonconformity scores for each candidate class:
- Split conformal method: The prediction set contains the true class with probability at least one minus the user-chosen error rate.
- Hierarchy-aware scores: For the predicted class, a hierarchical feature tree orders features by strength. Shared-path indicators locate the divergence depth for each class, yielding three score types:
  - Up-score: Only shared-path activations are summed.
  - Subtree score: Penalizes divergence features to favor additional activation support.
  - Level-limited score: The maximal depth is selected so that the calibration coverage is met, yielding the final scores used for calibrated inclusion.
Set predictions produced by CHiQPM are coherent—all classes within a set share a contiguous prefix of activated features—ensuring that sets correspond to perceptually similar classes (Norrenbrock et al., 25 Nov 2025).
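As a point of reference, the generic split conformal step can be sketched as follows. This is a simplified illustration that uses one minus the class probability as the nonconformity score (a thresholding-style score), not the hierarchy-aware scores of CHiQPM; the function names and toy numbers are assumptions made for the example.

```python
import math
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Finite-sample-corrected quantile of the calibration nonconformity scores."""
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = scores.size
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank of the quantile
    if rank > n:
        return float("inf")  # too few calibration points: every class is included
    return float(scores[rank - 1])

def prediction_set(class_scores, threshold):
    """Indices of classes whose nonconformity score does not exceed the threshold."""
    return [int(c) for c in np.flatnonzero(np.asarray(class_scores) <= threshold)]

# Nonconformity scores of the true class on nine held-out calibration examples.
cal_scores = np.array([0.10, 0.20, 0.05, 0.40, 0.30, 0.15, 0.01, 0.60, 0.90])
threshold = conformal_threshold(cal_scores, alpha=0.25)

# Test example: score every candidate class as 1 - predicted probability.
test_probs = np.array([0.50, 0.45, 0.05])
pred_set = prediction_set(1.0 - test_probs, threshold)
```

With exchangeable calibration and test data, sets built this way contain the true class with probability at least one minus the chosen error rate; swapping in a different nonconformity score changes the set sizes and composition but not the guarantee.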
4. Global and Local Interpretability Mechanisms
CHiQPM provides interpretable outputs at two levels:
Global Contrastive Explanations
Every class is defined by a fixed-length set of concept features. For all class pairs that share all but one of their features, CHiQPM highlights the single distinguishing feature, directly enabling contrastive global explanations of the form “these two classes differ only by this concept,” which supports full auditability and regulatory review.
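Given a binary class-feature assignment, such contrastive pairs can be enumerated mechanically. The sketch below is illustrative only; the function name and the toy matrix are assumptions. It reports the exclusive feature on each side, since two classes with equally many features that share all but one each hold exactly one feature the other lacks.

```python
from itertools import combinations

import numpy as np

def contrastive_pairs(assign):
    """List class pairs sharing all but one feature, with each side's exclusive feature."""
    assign = np.asarray(assign, dtype=bool)
    pairs = []
    for a, b in combinations(range(assign.shape[0]), 2):
        only_a = np.flatnonzero(assign[a] & ~assign[b])  # features of a that b lacks
        only_b = np.flatnonzero(assign[b] & ~assign[a])  # features of b that a lacks
        if only_a.size == 1 and only_b.size == 1:
            pairs.append((a, b, int(only_a[0]), int(only_b[0])))
    return pairs

# Toy assignment (assumed): 3 classes, 5 features, 3 features per class.
assign = np.array([
    [1, 1, 1, 0, 0],  # class 0 uses features 0, 1, 2
    [1, 1, 0, 1, 0],  # class 1 uses features 0, 1, 3
    [0, 0, 1, 1, 1],  # class 2 uses features 2, 3, 4
])
pairs = contrastive_pairs(assign)
```

Here classes 0 and 1 form the only contrastive pair: they share features 0 and 1 and are separated by feature 2 versus feature 3.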
Hierarchical Local Explanations
For a given input, CHiQPM:
- Selects the predicted class.
- Builds a hierarchical tree from the predicted class’s features, sorted by strength.
- Attaches other classes at their first diverging feature depth.
- Visualizes the prediction path, color-coded by feature strength, and marks alternative plausible classes at each branching, as determined by the conformal set.
This procedure answers inference-time questions such as “Which features dominated the decision? Which alternative classes remained plausible and why?”, rendering the prediction process transparent and facilitating expert oversight (Norrenbrock et al., 25 Nov 2025).
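The tree-building steps above can be sketched as follows, assuming a binary class-feature assignment matrix and a vector of feature activations for one input. The function names and toy values are hypothetical, and the visualization and conformal filtering steps are omitted.

```python
import numpy as np

def feature_path(activations, assign, predicted):
    """Features of the predicted class, ordered from strongest to weakest activation."""
    feats = np.flatnonzero(assign[predicted])
    order = np.argsort(-activations[feats], kind="stable")
    return [int(f) for f in feats[order]]

def divergence_depths(assign, path, predicted):
    """For every other class, count how many leading path features it shares."""
    depths = {}
    for c in range(assign.shape[0]):
        if c == predicted:
            continue
        depth = 0
        for f in path:
            if not assign[c, f]:
                break  # first feature on the path that class c does not use
            depth += 1
        depths[c] = depth
    return depths

# Toy setup (assumed): 4 classes, 5 features, 3 features per class.
assign = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1],
])
activations = np.array([0.3, 0.9, 0.6, 0.1, 0.0])
predicted = 0
path = feature_path(activations, assign, predicted)
depths = divergence_depths(assign, path, predicted)
```

In this toy case the path is feature 1, then 2, then 0. Class 1 follows the path for two features before diverging, class 2 for one, and class 3 diverges at the root, so classes that branch off deeper are those most similar to the prediction.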
5. Empirical Evaluation and Ablation Studies
CHiQPM has been evaluated on CUB-200-2011 (birds), Stanford Cars, and ImageNet-1K using ResNet50 backbones. Comparative metrics and feature counts for CHiQPM and the major baselines (Dense ResNet, Q-SENN, and QPM) are shown below.
| Method | Acc (%) | Total Features | Features per Class | Contrastiveness (%) | Structural Grounding | Feature Alignment |
|---|---|---|---|---|---|---|
| Dense Res50 | 86.6 | 2048 | 2048 | 74.4 | 34.0 | 0.90 |
| Q-SENN | 84.7 | 50 | 5 | 93.0 | 23.4 | 3.2 |
| QPM | 85.1 | 50 | 5 | 96.0 | 47.9 | 3.6 |
| CHiQPM | 85.3 | 50 | 5 | 99.9 | 75.0 | 3.8 |
CHiQPM reaches top-1 accuracy within 1.3 points of the dense ResNet50 baseline (85.3% vs. 86.6%) while surpassing previous models on the interpretability metrics: contrastiveness (99.9% of classes have a single-feature differentiator), structural grounding, and alignment of features with human-labeled attributes.
For calibrated set prediction, CHiQPM set sizes are competitive with thresholded softmax (THR) sets and consistently smaller than those of APS (adaptive prediction sets):
| Method | 0.10 | 0.075 | 0.05 | |
|---|---|---|---|---|
| CHiQPM | 1.22 | 1.73 | 2.94 | 9.05 |
| THR | 1.16 | 1.32 | 1.67 | 2.41 |
| APS | 6.30 | 7.20 | 8.54 | 11.3 |
Ablation studies show:
- Increasing hierarchy density raises the number of contrastive pairs and enables deeper hierarchy traversal, but setting it too high slightly lowers point accuracy, so a moderate value works best.
- Removing or substituting the feature grounding loss halves the feature alignment score and doubles the fraction of active features, despite constant accuracy.
- Omitting ReLU on feature activations significantly reduces contrastiveness and structural grounding.
6. Computational Complexity and Hyperparameterization
CHiQPM consists of three primary computational phases:
- Mixed-integer quadratic programming: At the problem sizes used in practice, Gurobi solves the program to within a 1% MIP gap of the global optimum in 2–5 hours on a 72-core CPU.
- Fine-tuning: 70 epochs for sparse model versus 50 for the dense baseline.
- Inference and set prediction: A single forward pass followed by sorting at most one nonconformity score per class, yielding negligible latency.
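Typical hyperparameters are 50 selected features and 5 features per class (in line with Miller’s “7±2” rule), fixed settings for the hierarchy density and the feature grounding loss weight, split-conformal calibration with 10 examples per class, and MIP-Gap=1% in Gurobi (Norrenbrock et al., 25 Nov 2025).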
7. Applications and Outlook
CHiQPM is particularly suited to deployment in settings requiring exhaustive global auditability, such as regulated or safety-critical applications, and domains emphasizing human–AI complementarity, including radiology, pathology, and fault inspection. Hierarchical explanations and guaranteed calibrated prediction sets facilitate expert workflows by exposing plausible alternatives at each stage. The learned concept features, which frequently correlate with real-world attributes, open avenues for scientific discovery in exploratory domains (astronomy, materials science) where visual concepts are not known a priori.
In summary, CHiQPM constitutes a unified approach to interpretable image classification by combining optimally sparse global class representations, hierarchical local reasoning, and conformally calibrated prediction sets with provable coverage, advancing the practical deployment of trustworthy, interpretable AI systems in critical real-world environments (Norrenbrock et al., 25 Nov 2025).