Proximity-informed Calibration (ProCal)

Updated 6 March 2026

ProCal is a calibration framework that adjusts model confidence by conditioning on input proximity to dense regions in the feature space.
It employs density-ratio estimation and bin-mean-shift techniques to correct overconfidence in low-proximity (sparse) data areas.
Empirical evaluations show significant reductions in calibration errors and bias across vision, NLP, and sensor fusion applications under distribution shifts.

Proximity-informed Calibration (ProCal) refers to a class of calibration frameworks, algorithms, and loss-based post-processing techniques that explicitly adjust the confidence estimates of predictive models based on the proximity of an input to dense regions of the training distribution or to points with known semantic, geometric, or statistical affinity. These methods address systematic calibration errors, most notably proximity bias, in which predictions for instances in sparse data regions are more overconfident than those in dense regions. ProCal approaches span deep neural network confidence calibration, post-hoc regression recalibration, multiclass probability alignment, and collaborative sensor fusion across diverse modalities.

1. Formalization of Proximity Bias and ProCal

Proximity is operationalized as a data point’s relative position in the learned feature or embedding space, typically measured through the exponentiated negative average distance to its $K$ nearest neighbors: $D(X) = \exp\left(-\frac{1}{K} \sum_{X_i \in \mathcal N_K(X)} \text{dist}(X, X_i)\right) \in (0, 1]$. Lower $D(X)$ signifies greater sparsity. Proximity bias is present if, for a fixed model confidence $\hat P = p$, the conditional accuracy differs across proximity levels: $\Pr(\hat Y = Y \mid \hat P = p, D = d_1) \neq \Pr(\hat Y = Y \mid \hat P = p, D = d_2)$. The Bias Index quantifies this effect by comparing confidence-matched high-proximity ($B_H$) and low-proximity ($B_L$) groups in a held-out dataset: $\text{BiasIndex} = \text{Acc}(B_H) - \text{Acc}(B_L)$. A positive index indicates systematic overconfidence for low-proximity samples (Xiong et al., 2023).
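The two quantities above can be illustrated with a minimal sketch. It assumes Euclidean distance, a brute-force neighbor search, and a simple median split on proximity in place of the matched-group construction; the function names `proximity` and `bias_index` are hypothetical and not taken from the cited work.

```python
import math

def proximity(x, reference, k=3):
    """D(x) = exp(-mean distance to the k nearest reference embeddings)."""
    dists = sorted(math.dist(x, r) for r in reference)
    nearest = dists[:k]
    return math.exp(-sum(nearest) / len(nearest))

def bias_index(correct, proximities):
    """Acc(B_H) - Acc(B_L) using a median split on proximity.

    Assumes the samples passed in already share roughly the same confidence
    and that there are at least two of them.
    """
    order = sorted(range(len(correct)), key=lambda i: proximities[i])
    half = len(order) // 2
    low, high = order[:half], order[-half:]

    def accuracy(indices):
        return sum(correct[i] for i in indices) / len(indices)

    return accuracy(high) - accuracy(low)

# Toy embeddings: a dense cluster near the origin.
reference = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
d_dense = proximity((0.05, 0.05), reference)   # query inside the cluster
d_sparse = proximity((3.0, 3.0), reference)    # query far from the cluster
```

A query inside the cluster receives a proximity close to 1, while the distant query receives a value close to 0, matching the convention that lower $D(X)$ means greater sparsity.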

ProCal aims to recalibrate model confidence by conditioning on both $\hat P$ and $D$ to produce $\hat P_\text{calib} \approx \Pr(\hat Y = Y \mid \hat P, D)$, using either density-ratio estimation or bin-based accuracy-shifting calibrators.

2. ProCal Algorithms: DNN Calibration and Pseudocode

Two canonical ProCal instantiations are described for deep neural network calibration:

Density-Ratio Adjustment

Estimate the joint densities of confidence and proximity for correct and incorrect predictions, $p_+(p, d)$ and $p_-(p, d)$, together with the empirical ratio of incorrect to correct predictions, $\gamma = \Pr(\hat Y \neq Y)/\Pr(\hat Y = Y)$. Bayes' rule then gives the calibrated confidence: $\Pr(\hat Y = Y \mid \hat P = p, D = d) = \frac{p_+(p, d)}{p_+(p, d) + \gamma p_-(p, d)}$. The densities are estimated on a held-out calibration set with $K$-nearest-neighbor-based kernel density estimation (KDE).
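As a rough illustration of the density-ratio formula, the sketch below swaps in a fixed-bandwidth Gaussian KDE for the $K$-nearest-neighbor-based estimator described above; that substitution, the bandwidth value, and the names `gaussian_kde` and `fit_density_ratio` are assumptions made for brevity. It also assumes the calibration set contains at least one correct and one incorrect prediction.

```python
import math

def gaussian_kde(points, bandwidth=0.1):
    """Fixed-bandwidth 2-D Gaussian KDE over (confidence, proximity) pairs."""
    norm = len(points) * 2.0 * math.pi * bandwidth ** 2

    def density(p, d):
        total = 0.0
        for p_i, d_i in points:
            sq_dist = (p - p_i) ** 2 + (d - d_i) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return total / norm

    return density

def fit_density_ratio(conf, prox, correct, bandwidth=0.1):
    """Return a calibrator implementing p_+ / (p_+ + gamma * p_-)."""
    pos = [(p, d) for p, d, c in zip(conf, prox, correct) if c]
    neg = [(p, d) for p, d, c in zip(conf, prox, correct) if not c]
    kde_plus = gaussian_kde(pos, bandwidth)
    kde_minus = gaussian_kde(neg, bandwidth)
    gamma = len(neg) / len(pos)  # empirical Pr(incorrect) / Pr(correct)

    def calibrate(p, d):
        num = kde_plus(p, d)
        den = num + gamma * kde_minus(p, d)
        return num / den if den > 0 else p  # no density mass: keep raw confidence

    return calibrate

# Toy calibration set: confidence 0.9 everywhere, accuracy depends on proximity.
conf = [0.9] * 8
prox = [0.9, 0.9, 0.9, 0.9, 0.2, 0.2, 0.2, 0.2]
correct = [1, 1, 1, 0, 1, 0, 0, 0]
calibrate = fit_density_ratio(conf, prox, correct)
dense_conf = calibrate(0.9, 0.9)    # close to the 3/4 accuracy of dense points
sparse_conf = calibrate(0.9, 0.2)   # close to the 1/4 accuracy of sparse points
```

In this toy set the raw confidence of 0.9 is pulled toward the empirical accuracy of each proximity group, which is the behavior the density-ratio form is meant to capture.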

Bin-Mean-Shift Adjustment

Partition the $(\hat P, D)$ space into $M \times H$ bins. In each bin $B_{mh}$, compute the empirical accuracy $\mathcal A_{mh}$ and the mean confidence $\mathcal F_{mh}$. A prediction falling into bin $B_{mh}$ is then adjusted as: $\hat P_\text{calib} = \hat P + \lambda \left( \mathcal A_{mh} - \mathcal F_{mh} \right)$, with $\lambda \in (0, 1]$.
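A minimal sketch of the bin-mean-shift rule follows. Equal-width bins, leaving empty cells unchanged, and clipping the result to $[0, 1]$ are assumptions of this example rather than details stated above; `fit_bin_mean_shift` and `bin_index` are hypothetical names.

```python
def bin_index(value, n_bins):
    """Map a value in [0, 1] to an equal-width bin index."""
    return min(int(value * n_bins), n_bins - 1)

def fit_bin_mean_shift(conf, prox, correct, n_conf_bins=10, n_prox_bins=5, lam=0.5):
    """Collect per-cell accuracy A_mh and mean confidence F_mh on a calibration set."""
    cells = {}  # (m, h) -> [sum of correctness, sum of confidence, count]
    for p, d, c in zip(conf, prox, correct):
        key = (bin_index(p, n_conf_bins), bin_index(d, n_prox_bins))
        stats = cells.setdefault(key, [0.0, 0.0, 0])
        stats[0] += c
        stats[1] += p
        stats[2] += 1

    def calibrate(p, d):
        key = (bin_index(p, n_conf_bins), bin_index(d, n_prox_bins))
        if key not in cells:
            return p  # assumption: an empty cell leaves the confidence unchanged
        acc_sum, conf_sum, n = cells[key]
        shifted = p + lam * (acc_sum / n - conf_sum / n)
        return min(1.0, max(0.0, shifted))  # assumption: clip to a valid probability

    return calibrate

# One populated cell: confidence 0.9, low proximity, only 1 of 4 predictions correct.
calibrate = fit_bin_mean_shift([0.9] * 4, [0.1] * 4, [1, 0, 0, 0], lam=1.0)
shifted_conf = calibrate(0.9, 0.1)
untouched_conf = calibrate(0.5, 0.9)
```

With $\lambda = 1$ the confidence moves all the way to the cell accuracy; smaller values of $\lambda$ apply only part of the shift.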

Pseudocode Outline:

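from math import exp

# model, KNN_distances, KDE_plus, KDE_minus, find_cell_index, A, F, K, gamma and
# lambda_ are placeholders fitted or chosen on the held-out calibration set.

def ProCal_Inference(x, method="density_ratio"):
    yhat, phat, e_x = model.predict_and_embed(x)
    dists = KNN_distances(e_x, K)        # distances to the K nearest neighbors
    d = exp(-sum(dists) / len(dists))    # proximity D(x) in (0, 1]
    return Calibrator(phat, d, method)

def Calibrator(p, d, method="density_ratio"):
    if method == "density_ratio":
        num = KDE_plus(p, d)                  # p_+(p, d)
        den = num + gamma * KDE_minus(p, d)   # p_+(p, d) + gamma * p_-(p, d)
        return num / den
    # Otherwise use bin-mean-shift
    m, h = find_cell_index(p, d)
    return p + lambda_ * (A[m][h] - F[m][h])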

(Xiong et al., 2023)

3. Theoretical Guarantees and Calibration Metrics

ProCal’s adjustments are supported by a Brier-score decomposition in the binary setting, which ensures that ProCal is no worse than the base calibrator it is applied to. Standard expected calibration error (ECE) is proximity-agnostic, so ProCal introduces the Proximity-Informed Expected Calibration Error (PIECE): $\text{PIECE} = \mathbb{E}_{\hat P, D}\left[ \left| \Pr(\hat Y = Y \mid \hat P, D) - \hat P \right| \right]$. PIECE captures persistent calibration errors that ECE masks by grouping together data with disparate proximity; $\text{ECE} \leq \text{PIECE}$ by Jensen’s inequality, with equality only in the absence of proximity bias (Xiong et al., 2023).
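The gap between ECE and PIECE can be seen in a small binned example. The sketch below assumes equal-width bins and uses plug-in estimators (a weighted sum of per-bin gaps between accuracy and mean confidence); the helper names are hypothetical.

```python
from collections import defaultdict

def binned_error(conf, correct, keys):
    """Sum over groups of (n_g / N) * |accuracy_g - mean confidence_g|."""
    groups = defaultdict(list)
    for p, c, key in zip(conf, correct, keys):
        groups[key].append((p, c))
    total = len(conf)
    err = 0.0
    for members in groups.values():
        mean_conf = sum(p for p, _ in members) / len(members)
        acc = sum(c for _, c in members) / len(members)
        err += len(members) / total * abs(acc - mean_conf)
    return err

def ece(conf, correct, n_bins=10):
    """Binned ECE: groups are confidence bins only."""
    keys = [min(int(p * n_bins), n_bins - 1) for p in conf]
    return binned_error(conf, correct, keys)

def piece(conf, prox, correct, n_bins=10, n_prox_bins=2):
    """Binned PIECE: groups are (confidence bin, proximity bin) cells."""
    keys = [(min(int(p * n_bins), n_bins - 1),
             min(int(d * n_prox_bins), n_prox_bins - 1))
            for p, d in zip(conf, prox)]
    return binned_error(conf, correct, keys)

# Confidence 0.5 everywhere; dense points are always right, sparse points always wrong.
conf = [0.5] * 4
prox = [0.9, 0.9, 0.1, 0.1]
correct = [1, 1, 0, 0]
ece_value = ece(conf, correct)
piece_value = piece(conf, prox, correct)
```

Here the confidence bin as a whole is perfectly calibrated (accuracy 0.5 at confidence 0.5), so the binned ECE is zero, while the proximity split exposes an error of 0.5 in each half.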

Local calibration formulations in multiclass settings extend this analysis. For $f: \mathcal{X} \to \Delta^C$, ProCal enforces $\| f(x) - \hat p_{\text{loc}}(x) \|_1 \leq \epsilon$, with $\hat p_{\text{loc}}(x)$ the kernel-weighted local label distribution. The Jensen-Shannon distance is often used for the penalty: $\mathcal{L} = \frac{1}{n} \sum_{i=1}^n d_{\text{JSD}} \left( \hat p_i \| \hat p_{\text{loc}, i} \right) + \lambda \mathcal{L}_{\text{ce}} (y_i, \hat p_{\text{loc},i})$ (Barbera et al., 30 Oct 2025).
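To make the local quantities concrete, the following sketch computes a kernel-weighted local label distribution and the Jensen-Shannon distance between it and a model's predicted probabilities. The Gaussian kernel, the bandwidth, the natural-log base, and the function names are assumptions of this example; the cross-entropy term of the loss is omitted.

```python
import math

def local_label_distribution(x, feats, labels, n_classes, bandwidth=1.0):
    """Gaussian-kernel-weighted average of one-hot labels around x."""
    weights = [math.exp(-math.dist(x, f) ** 2 / (2.0 * bandwidth ** 2)) for f in feats]
    total = sum(weights)
    dist = [0.0] * n_classes
    for w, y in zip(weights, labels):
        dist[y] += w / total
    return dist

def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (natural log)."""
    mix = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(a_i * math.log(a_i / b_i) for a_i, b_i in zip(a, b) if a_i > 0)

    divergence = 0.5 * kl(p, mix) + 0.5 * kl(q, mix)
    return math.sqrt(max(0.0, divergence))

# Toy neighborhood: two class-0 points near the query, one class-1 point far away.
feats = [(0.0, 0.0), (0.2, 0.0), (3.0, 3.0)]
labels = [0, 0, 1]
p_loc = local_label_distribution((0.1, 0.0), feats, labels, n_classes=2)
penalty_agree = js_distance([0.99, 0.01], p_loc)     # prediction matches the neighborhood
penalty_disagree = js_distance([0.01, 0.99], p_loc)  # prediction contradicts it
```

A prediction that agrees with the labels of nearby points incurs a small penalty, while one that contradicts them incurs a large one, which is the signal the local calibration loss penalizes.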

4. Applications Across Modalities and Architectures

Vision and NLP: On 504 ImageNet models, ProCal reduced PIECE by 30–50% and ECE/ACE/MCE by 20–40%, with especially strong effects on Vision Transformer family models, which show higher initial BiasIndex than CNNs (Xiong et al., 2023). In NLP (e.g., RoBERTa finetuned for Yahoo and NLI), PIECE reductions of 40–60% were observed.

Imbalanced and Shifted Data: On long-tail datasets (iNaturalist, ImageNet-LT), ProCal sharply reduced ECE and PIECE relative to temperature scaling alone. Distribution shift settings (ImageNet-C, ImageNet-Sketch, MultiNLI-Mismatch) saw 30–70% relative error reductions.

Semantic Orthogonal Calibration: In test-time prompt-tuning for vision-language models, SoC integrates a ProCal principle by regularizing inter-class prototype repulsion via a Huber penalty. This prevents confidence on semantically related classes from being artificially inflated by full orthogonality constraints, resulting in substantially lower ECE and more reliable selective classification under distribution shift (Fillioux et al., 13 Jan 2026).

Post-hoc Stratified Calibration: ProCal can be employed as a dual calibration framework. A proximity-based conformal predictor partitions the calibration domain into "putatively correct" and "putatively incorrect" subgroups; standard isotonic regression is fit separately on each, and under-confidence is enforced for likely mistakes. This instance-level adaptivity halves confidently incorrect prediction rates compared to baseline post-hoc methods at similar ECE (Gharoun et al., 19 Oct 2025).

Collaborative Perception and Sensor Fusion: In multi-agent settings such as urban vehicle-infrastructure (V2I) calibration, ProCal applies a geometric proximity metric, "overall distance" (oDist), to generate affinity matrices for object matching based on global optimal transport. Closed-form weighted SVD registration on co-visible targets achieves sub-centimeter translation error in simulation and sub-meter accuracy on real data, with latency below 0.35 s (Qu et al., 2024).
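The registration step can be illustrated in a simplified planar form. The sketch below is not the cited pipeline: it assumes the object matches and their weights are already given, and it works in 2-D, where the weighted least-squares rotation that an SVD would produce reduces to a single angle. The function name `weighted_rigid_2d` is hypothetical.

```python
import math

def weighted_rigid_2d(src, dst, weights):
    """Weighted least-squares rotation angle and translation mapping src onto dst."""
    w_sum = sum(weights)
    # Weighted centroids of both point sets.
    cs = [sum(w * p[i] for w, p in zip(weights, src)) / w_sum for i in (0, 1)]
    cd = [sum(w * q[i] for w, q in zip(weights, dst)) / w_sum for i in (0, 1)]
    s_cos = s_sin = 0.0
    for w, p, q in zip(weights, src, dst):
        px, py = p[0] - cs[0], p[1] - cs[1]
        qx, qy = q[0] - cd[0], q[1] - cd[1]
        s_cos += w * (px * qx + py * qy)   # weighted dot products
        s_sin += w * (px * qy - py * qx)   # weighted cross products
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    # Translation aligns the rotated source centroid with the target centroid.
    tx = cd[0] - (c * cs[0] - s * cs[1])
    ty = cd[1] - (s * cs[0] + c * cs[1])
    return theta, (tx, ty)

# Synthetic check: rotate matched targets by 30 degrees and shift them by (1, 2).
true_theta = math.pi / 6
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
dst = [(math.cos(true_theta) * x - math.sin(true_theta) * y + 1.0,
        math.sin(true_theta) * x + math.cos(true_theta) * y + 2.0) for x, y in src]
theta, (tx, ty) = weighted_rigid_2d(src, dst, [1.0, 2.0, 1.0, 0.5])
```

On noise-free matches the transform is recovered exactly regardless of the weights; with noisy matches the weights determine how strongly each co-visible target influences the estimate.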

5. Comparative Metrics and Empirical Performance

Metric Table

Metric	Definition	ProCal Impact
PIECE	Proximity-Informed Expected Calibration Error: $\mathbb{E}_{\hat P, D}\left[\left\lvert \Pr(\hat Y = Y \mid \hat P, D) - \hat P \right\rvert\right]$	30–70% reduction over baselines
BiasIndex	$\text{Acc}(B_H) - \text{Acc}(B_L)$, high- minus low-proximity accuracy, matched for $\hat P$	Detects and corrects overconfidence
LCE, MLCE	Local (Multiclass) Calibration Errors based on neighborhood statistics in feature space	30–60% lower than Dirichlet/IR/TS
oDist	$\lvert\mathcal{D}\rvert - \frac{1}{\lvert\mathcal{D}\rvert}\sum d(\mathbf{B}_m, \mathbf{B}_n)$, affinity for object matching	Robust cross-frame target association

(Xiong et al., 2023, Barbera et al., 30 Oct 2025, Qu et al., 2024)

ProCal substantially reduces local and global calibration error metrics without sacrificing accuracy. Reliability diagrams (e.g., for SoC) show that ProCal models closely adhere to the ideal calibration curve, correcting overconfidence especially in semantically close or locally sparse categories. Under adversarial or natural distribution shifts, ProCal frameworks maintain or improve selectivity and uncertainty-aware decision metrics.

6. Implementation Strategies and Trade-offs

Key hyperparameters include the neighborhood size (neighbor count $K$ or kernel bandwidth $\gamma$, not to be confused with the class ratio $\gamma$ of Section 2), bin resolution, and regularization parameters (e.g., $\lambda$ in bin-mean-shift or in the Huber loss). For multiclass and high-dimensional contexts, approximate k-NN search (e.g., FAISS) or vectorized batch processing is employed to limit computational overhead. Residual parametrization and dropout provide stability, while dual calibration strategies may trade a slightly increased ECE in the "risky" regions for dramatic reductions in confidently incorrect outputs (Gharoun et al., 19 Oct 2025, Barbera et al., 30 Oct 2025).

Two fundamental limitations are the possibility of stratification error in proximity assignments (especially in highly overlapping or sparse clusters) and the need for robust hyperparameter selection. Extensions to non-vision domains may require alternative proximity or affinity metrics.

7. Extensions and Future Directions

ProCal principles generalize to any setting where external or learned proximity information is available:

Representation and Metric Learning: Incorporating explicit similarity matrices or side-information into pairwise penalties, with robust (e.g., Huber) or distance-adaptive regularization, guides feature separation without over-separating closely related classes (Fillioux et al., 13 Jan 2026).
Calibration-Aware Logit Adjustment: Inclusion of proximity-weighted margins or soft neighborhood constraints in output layer logit calibration.
Collaborative, Distributed, and Multi-modal Systems: Spatial or semantic proximity metrics define affinity in multi-agent, distributed sensor calibration pipelines, with global optimal transport ensuring assignment consistency (Qu et al., 2024).

Across all these domains, calibration fidelity, trustworthiness, and equitable error control in sparse or semantically ambiguous regions are advanced by ProCal’s explicit proximity conditioning strategy. These advances are central to the pursuit of robust and fair AI systems in safety-critical and data-shifting environments.