ReCalX: Perturbation-Based Explanation Calibration
- ReCalX is a perturbation-aware method that recalibrates confidence scores as a function of perturbation intensity while preserving the model’s prediction ranking.
- It employs perturbation-conditioned temperature scaling to correct miscalibrated outputs, thereby improving global feature-importance estimation and local explanation robustness.
- Empirical evaluations report calibration error reductions that are often above 80%, along with improved explanation fidelity across both tabular and image classification tasks.
ReCalX is a perturbation-aware post-hoc recalibration method for perturbation-based explanations in machine learning. It was introduced in the paper "Improving Perturbation-based Explanations by Understanding the Role of Uncertainty Calibration" (Decker et al., 13 Nov 2025), where it is motivated by the observation that explanation-specific perturbations often move inputs off the training manifold, inducing miscalibrated probability estimates precisely in the regimes from which attribution methods aggregate evidence. ReCalX preserves the original model’s prediction ranking while recalibrating confidence scores as a function of perturbation intensity, with the stated goal of improving global feature-importance estimation, local explanation robustness, and perturbation-specific uncertainty calibration.
1. Conceptual setting
ReCalX is defined in the context of perturbation-based explanation methods such as Shapley Values, LIME, feature ablation, RISE, and removal-based explanations (Decker et al., 13 Nov 2025). For a classifier $f$, a perturbation function keeps a feature subset $S$ fixed and corrupts its complement, and evaluating $f$ on the perturbed input gives the perturbed prediction.
The paper’s central claim is that these perturbed inputs are frequently out-of-distribution or semantically implausible, so the resulting probabilities can be unreliable even when the model is acceptably calibrated on ordinary inputs.
This matters because perturbation-based explanations are typically computed by aggregating many such perturbed predictions. The paper writes the explanation as a summary rule applied to a vector that stacks the model outputs over all subsets $S$. Under this formulation, any systematic perturbation-induced miscalibration propagates into the attribution vector. The paper emphasizes two direct consequences: global explanations can mis-rank feature subsets by over- or under-estimating their predictive value, and local explanations can become unstable and noisy because attribution scores inherit the model’s unreliable behavior over many perturbation states (Decker et al., 13 Nov 2025).
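The aggregation view can be illustrated with a minimal sketch of feature ablation, one of the methods listed above. The names `perturb`, `ablation_attributions`, and `toy_model`, as well as the zero baseline, are illustrative assumptions and are not taken from the paper. Each attribution is a difference of model outputs on perturbed inputs, so any error in those outputs carries over directly.

```python
import math


def perturb(x, keep, baseline):
    """Keep the features whose index is in `keep`; replace the rest with the baseline."""
    return [xi if i in keep else bi for i, (xi, bi) in enumerate(zip(x, baseline))]


def ablation_attributions(model, x, baseline):
    """Score feature i as f(x) minus f(x with feature i replaced by its baseline)."""
    d = len(x)
    full_output = model(x)
    scores = []
    for i in range(d):
        keep = set(range(d)) - {i}
        scores.append(full_output - model(perturb(x, keep, baseline)))
    return scores


def toy_model(x):
    """Hypothetical classifier: logistic function of a fixed weighted sum."""
    weights = [2.0, -1.0, 0.0]
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]  # assumed zero baseline, for illustration only
attr = ablation_attributions(toy_model, x, baseline)
print(attr)
```

Because every score is a linear combination of perturbed model outputs, a systematic shift in those outputs changes the scores by the same linear rule.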
A common misunderstanding addressed by the work is to treat calibration as a prediction-quality issue only. ReCalX is built on the stronger claim that calibration is also an explanation-quality issue when explanations are constructed from perturbed model outputs. This suggests that explanation fidelity cannot be separated cleanly from uncertainty estimation on the perturbation distribution.
2. Theoretical formulation
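The paper uses a KL-based calibration error aligned with cross-entropy. In standard notation, such an error can be written as

$$\mathrm{CE}_{\mathrm{KL}}(f) = \mathbb{E}\Big[ D_{\mathrm{KL}}\big( P(Y \mid f(X)) \,\|\, f(X) \big) \Big].$$

A perfectly calibrated model satisfies $\mathrm{CE}_{\mathrm{KL}}(f) = 0$, equivalently, for each class $k$,

$$P(Y = k \mid f(X)) = f_k(X).$$

ReCalX departs from standard calibration analysis by assessing calibration on the perturbation distribution relevant to the explanation method. Writing $f_S$ for the classifier evaluated on inputs in which only the subset $S$ is kept, it considers the calibration error $\mathrm{CE}_{\mathrm{KL}}(f_S)$ under each subset perturbation and especially the worst-case quantity

$$\max_{S} \, \mathrm{CE}_{\mathrm{KL}}(f_S).$$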
For global explanations, the paper defines the predictive power of a feature subset $S$ and, with cross-entropy loss, gives a decomposition of it. This decomposition identifies perturbation-specific calibration error as a subtractive term in measured predictive power. A corollary stated in the paper covers the case in which $f$ is perfectly calibrated under all subset perturbations, where this calibration term vanishes (Decker et al., 13 Nov 2025).
For local explanations, the paper introduces a distortion bound. It compares the actual explanation with the explanation that would be obtained under perfect calibration and states that, with high probability, the distance between the two is bounded in terms of the perturbation calibration error.

The result is stated for perturbation-based methods with bounded linear summary rules, including Shapley Values and LIME, and the appendix provides a broader version for nonlinear aggregation (Decker et al., 13 Nov 2025). In the paper’s interpretation, large worst-case perturbation calibration error permits large explanation distortion.
A further theoretical requirement is that recalibration be information-preserving. A sufficient condition given in the paper is that the recalibration map be deterministic and componentwise strictly monotonic for each fixed perturbation. This criterion motivates the use of temperature scaling, since it changes confidence levels without altering score ordering.
3. Method: perturbation-conditioned temperature scaling
ReCalX is described as a perturbation-conditioned temperature scaling method (Decker et al., 13 Nov 2025). Standard temperature scaling for logits $z$ is

$$\hat{p} = \mathrm{softmax}(z / T), \qquad T > 0.$$
Because this mapping is monotonic in the logits, it preserves prediction ranking.
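The ranking-preservation property is easy to check numerically. The following is a minimal sketch in plain Python. The logits and the temperature of 2.5 are arbitrary illustrative values, and the helper names are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_scale(logits, temperature):
    """Divide the logits by a positive temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return softmax([z / temperature for z in logits])


def ranking(probs):
    """Class indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)


logits = [2.0, 0.5, -1.0]                  # arbitrary example logits
p_raw = temperature_scale(logits, 1.0)     # T = 1 leaves the softmax unchanged
p_soft = temperature_scale(logits, 2.5)    # T > 1 lowers the top confidence

print(ranking(p_raw), ranking(p_soft))
print(max(p_raw), max(p_soft))
```

A temperature above 1 flattens the distribution and a temperature below 1 sharpens it, but the class order stays the same in both cases.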
The key modification is to replace a single global temperature with temperatures indexed by perturbation intensity. For a subset $S$ of $d$ features that is kept fixed, the perturbation level is defined as

$$\lambda(S) = 1 - \frac{|S|}{d}.$$

Thus $\lambda(S) = 0$ corresponds to no perturbation and $\lambda(S) = 1$ to full perturbation. The interval $[0, 1]$ is partitioned into $B$ equal-width bins, and a temperature $T_b$ is learned for each bin.
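A minimal sketch of the level and bin computation follows. It assumes the perturbation level is the fraction of features that are not kept, and that the bins are half-open intervals with the last bin closed at 1. The boundary convention is an assumption, since the text does not specify it. The bin index is computed with integer arithmetic to avoid floating-point ambiguity at bin edges.

```python
def perturbation_level(num_kept, num_features):
    """Fraction of features that are perturbed: 0 = clean input, 1 = everything perturbed."""
    return 1.0 - num_kept / num_features


def bin_index(num_kept, num_features, num_bins):
    """Index of the equal-width bin containing the perturbation level.

    Assumed convention: bin b covers [b / B, (b + 1) / B), and level 1.0
    is assigned to the last bin.
    """
    num_perturbed = num_features - num_kept
    return min(num_perturbed * num_bins // num_features, num_bins - 1)


print(perturbation_level(4, 4), bin_index(4, 4, 10))   # clean input
print(perturbation_level(2, 4), bin_index(2, 4, 10))   # half the features perturbed
print(perturbation_level(0, 4), bin_index(0, 4, 10))   # fully perturbed
```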
On a validation set $\mathcal{D}_{\mathrm{val}}$, each $T_b$ is optimized by minimizing cross-entropy on perturbed samples whose perturbation levels fall into bin $b$. The paper stresses that the optimization objective is computed on the perturbation distribution induced by the explanation's perturbation function, rather than only on clean inputs. This is the defining operational difference between ReCalX and ordinary post-hoc temperature scaling.
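The per-bin fitting step can be sketched as follows. This is not the paper's implementation. The grid search over a log-spaced set of temperatures, the default of 1.0 for empty bins, and the toy samples are all assumptions made for illustration. Each sample is a tuple of logits from a perturbed validation input, its true label, and its bin index.

```python
import math
from collections import defaultdict


def nll(logits, label, temperature):
    """Cross-entropy of the temperature-scaled softmax for one sample."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_norm - scaled[label]


def fit_bin_temperatures(samples, num_bins, grid=None):
    """Pick, for each bin, the grid temperature with the lowest total cross-entropy."""
    if grid is None:
        # Assumed search grid: 61 log-spaced temperatures from 0.1 to 100.
        grid = [10 ** (k / 20) for k in range(-20, 41)]
    by_bin = defaultdict(list)
    for logits, label, b in samples:
        by_bin[b].append((logits, label))
    temps = [1.0] * num_bins  # assumption: bins without data keep T = 1
    for b, items in by_bin.items():
        temps[b] = min(grid, key=lambda t: sum(nll(z, y, t) for z, y in items))
    return temps


# Toy data (hypothetical). Bin 0: confidence 0.75 with accuracy 0.75, already calibrated.
calibrated = [([math.log(3), 0.0], 0)] * 3 + [([math.log(3), 0.0], 1)]
# Bin 1: confidence about 0.98 with accuracy 0.75, so the model is overconfident.
overconfident = [([4.0, 0.0], 0)] * 3 + [([4.0, 0.0], 1)]

samples = [(z, y, 0) for z, y in calibrated] + [(z, y, 1) for z, y in overconfident]
temps = fit_bin_temperatures(samples, num_bins=3)
print(temps)
```

In this toy setting the calibrated bin keeps a temperature of 1, while the overconfident bin receives a temperature well above 1. A gradient-based or scalar optimizer could replace the grid search, since the objective is convex in the inverse temperature.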
At explanation time, ReCalX infers the perturbation level of the current subset $S$, selects the corresponding bin temperature $T_{b(S)}$, and applies it to the perturbed logits $z_S$:

$$\hat{p}_S = \mathrm{softmax}\big(z_S / T_{b(S)}\big).$$

The perturbed sample therefore remains the object being explained, but its probability output is recalibrated according to how strongly it has been perturbed.
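The explanation-time step amounts to a lookup followed by a rescaling. The sketch below assumes a list of per-bin temperatures is already available. The values used here are hypothetical, and the bin boundary convention (half-open bins, with level 1 assigned to the last bin) is likewise an assumption.

```python
import math


def recalibrated_probs(logits, num_kept, num_features, bin_temperatures):
    """Select the temperature for the subset's perturbation level and rescale the logits."""
    num_bins = len(bin_temperatures)
    num_perturbed = num_features - num_kept
    # Integer arithmetic gives the equal-width bin of the level num_perturbed / num_features.
    b = min(num_perturbed * num_bins // num_features, num_bins - 1)
    temperature = bin_temperatures[b]
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


bin_temperatures = [1.0, 1.5, 2.0, 3.0]   # hypothetical fitted values, one per bin
logits = [3.0, 1.0, 0.0]                  # arbitrary example logits

p_clean = recalibrated_probs(logits, num_kept=8, num_features=8,
                             bin_temperatures=bin_temperatures)
p_heavy = recalibrated_probs(logits, num_kept=1, num_features=8,
                             bin_temperatures=bin_temperatures)
print(p_clean)
print(p_heavy)
```

The same logits yield a softer distribution when they are attributed to a heavily perturbed input, while the predicted class is unchanged.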
The paper also notes that the conceptual framework extends beyond classification. For probabilistic regression, it proposes monotonic calibration maps such as affine transformations or injective isotonic regression variants. However, the strongest guarantees are stated for classification with temperature scaling (Decker et al., 13 Nov 2025).
4. Experimental evaluation
The experimental study covers both tabular and image classification, with regression experiments reported in the appendix (Decker et al., 13 Nov 2025). For tabular classification, the datasets are Electricity, Covertype, Credit, and Pol, and the models are MLP and Tabular ResNet. For image classification, the dataset is ImageNet ILSVRC2012, and the models are ResNet50, DenseNet121, ViT, and the SigLIP zero-shot model. The perturbation strategies are mean value replacement for tabular data, and zero baseline perturbation and blur perturbation for image data.
| Domain | Datasets and models | Perturbations / evaluation |
|---|---|---|
| Tabular classification | Electricity, Covertype, Credit, Pol; MLP, Tabular ResNet | Mean value replacement; calibration error, remove-and-retrain |
| Image classification | ImageNet ILSVRC2012; ResNet50, DenseNet121, ViT, SigLIP | Zero baseline, blur; calibration error and sensitivity-based robustness |
| Regression appendix | Two tabular regression datasets | Quantile-based calibration error |
Calibration is evaluated with perturbation-specific calibration error measures, using the consistent and asymptotically unbiased estimator from Popordanoska et al. Explanation quality is evaluated by global remove-and-retrain fidelity, sensitivity-based robustness through Average Sensitivity and Maximum Sensitivity, and qualitative visualization. The reported practical setup includes 200 random validation samples per dataset, 10 perturbed instances per perturbation level, about 2000 calibration samples per bin, at least 5000 unseen samples for evaluation, global importance estimated from 1000 samples, and retraining repeated with 3 random seeds (Decker et al., 13 Nov 2025).
The central empirical finding is that ReCalX consistently achieves the largest reduction in perturbation-specific miscalibration among the compared approaches. The paper reports calibration errors for uncalibrated models ranging from about 0.012 to 0.863 for tabular models and from about 0.037 to 0.418 for image models. Example reductions include Electricity for tabular MLP from 0.1534 to 0.0163, Covertype from 0.0797 to 0.0061, Credit from 0.4763 to 0.0533, and Pol from 0.6735 to 0.1679. Under zero-baseline perturbation for image models, the reported reductions are ResNet50 from 0.4177 to 0.0128, DenseNet121 from 0.3769 to 0.0098, ViT from 0.2618 to 0.0078, and SigLIP from 0.2013 to 0.0300. The paper characterizes these reductions as often above 80% and sometimes above 95% (Decker et al., 13 Nov 2025).
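The percentage characterization can be checked against the quoted before and after values. The short script below computes the relative reduction for each of the eight examples. The dictionary keys are labels chosen for this sketch.

```python
# (before, after) calibration errors as quoted in the text above.
reported = {
    "Electricity": (0.1534, 0.0163),
    "Covertype": (0.0797, 0.0061),
    "Credit": (0.4763, 0.0533),
    "Pol": (0.6735, 0.1679),
    "ResNet50": (0.4177, 0.0128),
    "DenseNet121": (0.3769, 0.0098),
    "ViT": (0.2618, 0.0078),
    "SigLIP": (0.2013, 0.0300),
}


def relative_reduction(before, after):
    """Fraction of the original calibration error that was removed."""
    return 1.0 - after / before


reductions = {name: relative_reduction(b, a) for name, (b, a) in reported.items()}
for name, r in reductions.items():
    print(f"{name}: {100 * r:.1f}%")

above_80 = [name for name, r in reductions.items() if r > 0.80]
above_95 = [name for name, r in reductions.items() if r > 0.95]
```

For these eight examples the reductions range from roughly 75% (Pol) to roughly 97% (DenseNet121). Seven of the eight exceed 80%, and the three ImageNet CNN and ViT examples exceed 95%.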
The explanation metrics track these calibration gains. In remove-and-retrain experiments, features ranked by ReCalX-enhanced explanations produce larger performance degradation when removed, indicating better identification of truly important features. On the electricity dataset, removing the top three features ranked by ReCalX-enhanced explanations increased the original loss by 33%, compared with 24% using the uncalibrated model’s explanations. For local robustness on ImageNet, ReCalX reduces both Average Sensitivity and Maximum Sensitivity across LIME, Kernel SHAP, and feature ablation, and across both zero and blur perturbations. A specific example given is ResNet50 with zero-baseline perturbation, where LIME average sensitivity decreases from 1.349 to 1.190, Kernel SHAP from 1.434 to 1.364, and feature ablation from 0.965 to 0.825 (Decker et al., 13 Nov 2025).
The paper also studies the number of bins $B$, reporting that more bins usually help, with diminishing returns beyond roughly 10 bins. Qualitative visualizations are said to become less noisy, more concentrated on the actual object of interest, and more semantically meaningful after recalibration.
5. Relation to adjacent calibration literature
ReCalX belongs to a broader literature on recalibration, but its target problem is narrower than standard classifier calibration (Decker et al., 13 Nov 2025). The closest point of comparison is LoRe, the local recalibration method introduced in "Local Calibration: Metrics and Recalibration" (Luo et al., 2021). LoRe addresses example-dependent reliability of classifier confidences by defining local calibration error through kernel similarity in a pretrained feature space together with confidence binning, and it recalibrates predictions via a kernel-weighted local accuracy estimate. ReCalX, by contrast, does not perform feature-space local correction of ordinary predictions; it recalibrates the probabilities generated under explanation perturbations, conditioned on perturbation intensity.
This distinction matters because the two methods answer different failure modes. LoRe is designed to reveal and reduce local miscalibration that global metrics such as ECE and MCE can hide (Luo et al., 2021). ReCalX is designed to correct the specific distribution shift introduced by perturbation-based explainers (Decker et al., 13 Nov 2025). A plausible implication is that both methods share a concern with calibration beyond dataset averages, but their operational notions of locality differ: LoRe is local in feature space, whereas ReCalX is local in the perturbation regime induced by explanation subsets.
The name should also not be conflated with other unrelated recalibration usages on arXiv. "Recalibration of Neural Networks for Point Cloud Analysis" (Sarasua et al., 2020) introduces channel, spatial, and concurrent re-calibration blocks for hierarchical point-cloud networks; "Recalibration: A post-processing method for approximate Bayesian computation" (Rodrigues et al., 2017) develops a posterior-correction procedure for ABC using marginal CDF transformations; and "Accurate recalibrated waveforms for extreme-mass-ratio inspirals in effective-one-body frame" (Cheng et al., 2017) concerns Teukolsky-calibrated EOB waveform coefficients. None of these works defines ReCalX as the perturbation-aware explanation method of Decker et al. (13 Nov 2025).
A second misconception addressed explicitly in the ReCalX paper is that ordinary temperature scaling on clean validation data is sufficient for explanation reliability. The reported evidence is the opposite: standard temperature scaling often helped on clean validation data but failed under perturbations, and in some cases made perturbation calibration worse (Decker et al., 13 Nov 2025). ReCalX is therefore positioned not as a generic replacement for calibration, but as a perturbation-conditioned specialization of it.
6. Assumptions, limitations, and practical significance
The paper states several assumptions underlying ReCalX (Decker et al., 13 Nov 2025). The explanation pipeline must use perturbations that can be characterized by a perturbation level $\lambda$. Calibration must be approximable by a monotonic post-hoc transform such as temperature scaling. Validation data together with perturbed variants must be available for fitting the calibrator. These assumptions are central because the information-preserving guarantee depends on monotonicity, and the method’s estimation procedure depends on explicit access to perturbation-conditioned validation samples.
The limitations are equally explicit. ReCalX adds an extra calibration stage and requires a validation set. It is tailored primarily to perturbation-based explanation methods. Its strongest guarantees are for classification with temperature scaling, whereas regression requires alternative monotonic calibration maps. The local fidelity theorem is stated for perturbation-based methods with bounded linear aggregation, although the appendix generalizes beyond that setting (Decker et al., 13 Nov 2025).
The practical cost is described as lightweight. The calibration stage requires upfront computation, while inference-time overhead is a temperature lookup and rescaling, which the paper says is only on the order of milliseconds. This makes the method compatible with explanation workflows that need to preserve the original prediction function while improving probabilistic reliability under perturbation.
In terms of significance, ReCalX reframes perturbation-based explainability as a joint problem of attribution and uncertainty calibration. The paper’s theoretical results link perturbation-specific calibration error to both global predictive-power estimation and local explanation distortion, and the empirical results indicate that recalibrating on the perturbation distribution can materially improve explanation robustness and feature-importance fidelity (Decker et al., 13 Nov 2025). This suggests a broader methodological point: when an explanation procedure queries a model on systematically shifted inputs, calibration should be evaluated on that induced distribution rather than inferred from clean-data performance alone.