Smooth Expected Calibration Error (Smooth ECE)
- Smooth Expected Calibration Error (Smooth ECE) is a calibration metric that replaces traditional binning with kernel smoothing to provide continuous, stable reliability diagrams.
- It automatically determines a fixpoint bandwidth to create a parameter-free measure that closely approximates the true miscalibration using robust estimation techniques.
- Empirical results demonstrate Smooth ECE’s effectiveness across applications like image classification and weather forecasting by offering reliable and interpretable calibration assessments.
Smooth Expected Calibration Error (Smooth ECE) is a calibration metric designed to overcome inconsistencies, discontinuities, and bin-dependence inherent in classical Expected Calibration Error (ECE). It achieves this by employing kernel smoothing, resulting in both a numerically stable metric and continuous reliability diagrams. Smooth ECE is now recognized as a well-behaved, consistent measure of miscalibration in probabilistic prediction, with direct links to foundational theory on the distance to calibration and robust estimation properties. It is parameter-free in practical instantiations and underlies contemporary recommendations for calibration assessment and visualization in both theoretical and applied contexts.
1. Formal Definition and Key Properties
Let $(f, y) \in [0,1] \times \{0,1\}$ be a predicted probability–outcome pair. The true calibration (or reliability) function is $\mu(t) := \mathbb{E}[y \mid f = t]$. Given i.i.d. samples $(f_1, y_1), \dots, (f_n, y_n)$, Smooth ECE is defined through Nadaraya–Watson kernel regression using a positive-semidefinite kernel $K_\sigma$ with bandwidth $\sigma$, typically the Gaussian (RBF) kernel $K_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\big(-x^2/(2\sigma^2)\big)$, with boundary reflection on $[0,1]$.
The smoothed empirical reliability function and prediction density are
$$\hat\mu_\sigma(t) = \frac{\sum_{i=1}^n K_\sigma(t - f_i)\, y_i}{\sum_{i=1}^n K_\sigma(t - f_i)}, \qquad \hat\delta_\sigma(t) = \frac{1}{n} \sum_{i=1}^n K_\sigma(t - f_i),$$
and the smoothed residual is $\hat r_\sigma(t) = \hat\mu_\sigma(t) - t$.
The Smooth ECE at bandwidth $\sigma$ is
$$\mathrm{smECE}_\sigma = \int_0^1 \big|\hat r_\sigma(t)\big|\, \hat\delta_\sigma(t)\, \mathrm{d}t.$$
To eliminate user-tuned hyperparameters, the scale is set at the fixed point $\sigma^*$ such that $\mathrm{smECE}_{\sigma^*} = \sigma^*$; that is,
$$\mathrm{smECE}_* := \sigma^*, \quad \text{where } \sigma^* \text{ solves } \mathrm{smECE}_{\sigma^*} = \sigma^*.$$
This construction guarantees monotonicity in $\sigma$, existence and uniqueness of the fixpoint $\sigma^*$, and parameter-free application for general predictors (Błasiok et al., 2023).
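As an illustration, the definitions above can be implemented directly in NumPy. This is a minimal sketch for a fixed bandwidth $\sigma$; the names `reflected_gaussian_kernel` and `smece_at_bandwidth` are introduced here for illustration and are not part of any library:

```python
import numpy as np

def reflected_gaussian_kernel(t, f, sigma):
    """Gaussian kernel K_sigma(t - f), with mass falling outside [0, 1]
    reflected back across the boundaries at 0 and 1."""
    def g(x):
        return np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return g(t - f) + g(t + f) + g(t - (2 - f))

def smece_at_bandwidth(f, y, sigma, grid_size=1000):
    """smECE_sigma = integral of |mu_hat(t) - t| * delta_hat(t) dt,
    with mu_hat and delta_hat from Nadaraya-Watson kernel regression."""
    f, y = np.asarray(f, float), np.asarray(y, float)
    t = np.linspace(0.0, 1.0, grid_size)
    K = reflected_gaussian_kernel(t[:, None], f[None, :], sigma)  # (grid, n)
    delta_hat = K.mean(axis=1)                     # smoothed prediction density
    mu_hat = (K * y).sum(axis=1) / K.sum(axis=1)   # smoothed reliability curve
    residual = np.abs(mu_hat - t)
    dt = t[1] - t[0]
    return float(np.sum(residual * delta_hat) * dt)  # Riemann sum over the grid

rng = np.random.default_rng(0)
f = rng.uniform(size=2000)
y = (rng.uniform(size=2000) < f).astype(float)  # perfectly calibrated synthetic data
print(smece_at_bandwidth(f, y, sigma=0.1))      # small value for calibrated data
```

For calibrated data the residual $\hat\mu_\sigma(t) - t$ is pure sampling noise, so the value stays near zero; a systematically miscalibrated predictor drives it up toward the average residual magnitude.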
2. Rationale for Kernel Smoothing and Consistency
Traditional binned ECE uses histogram regression (piecewise-constant bins), which induces discontinuities at bin edges and leads to instability in the ECE value under small perturbations of predictions or binning choices. Smooth ECE replaces histogram regression with kernel regression, creating a continuous curve for the reliability function and area-based measure of calibration error.
Kernel smoothing supplies several theoretical advantages:
- Continuity: Smooth ECE is Lipschitz in the underlying data distribution with respect to Wasserstein-1 distance.
- Consistency: Smooth ECE approximates the natural ground-truth notion of miscalibration, i.e., the distance to the nearest perfectly calibrated predictor, within tight polynomial factors (Błasiok et al., 2022). Specifically,
$$c_1 \cdot \mathrm{dCE}(\mathcal{D}) \;\le\; \mathrm{smECE}_*(\mathcal{D}) \;\le\; c_2 \cdot \sqrt{\mathrm{dCE}(\mathcal{D})}$$
for universal constants $c_1, c_2 > 0$, where $\mathrm{dCE}(\mathcal{D})$ is the Wasserstein-1 distance from $\mathcal{D}$ to the nearest perfectly calibrated distribution (Błasiok et al., 2023).
Smooth ECE is thus a "consistent calibration measure" (in the sense of Błasiok et al., 2023) because it is polynomially related to this ground-truth distance.
3. Comparison to Classical Binned ECE and Alternative Metrics
Binned ECE partitions predictions into bins, measuring the average absolute deviation between predicted and empirical accuracy per bin. However, it suffers from the following issues:
- Discontinuity: Small changes in data or bin positions can cause large jumps in ECE.
- Non-consistency: Cannot be made consistent in the polynomial approximation sense for all distributions.
- Parameter dependence: Sensitive to the bin count.
By contrast, Smooth ECE:
- Uses kernels to effectively implement infinitely many infinitesimal bins, removing bin-edge artifacts.
- Is hyperparameter-free when the bandwidth is set at the fixpoint.
- Yields continuous, stable, and interpretable diagrams.
Alternative metrics such as interval calibration error and Laplace-kernel calibration distance are also consistent; Laplace-kernel calibration can be easier to estimate in a black-box fashion via random features, while Smooth ECE is computed via kernel regression (or an equivalent linear program) (Błasiok et al., 2022).
| Calibration Metric | Continuity | Consistency | Parameter-Free (at fixpoint) |
|---|---|---|---|
| Binned ECE | No | No | No |
| Smooth ECE | Yes | Yes (polynomially related to dCE) | Yes |
| Laplace-Kernel Cal. | Yes | Yes (polynomially related to dCE) | Yes |
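The bin-sensitivity of classical binned ECE is easy to reproduce. The sketch below (with a hypothetical `binned_ece` helper, not from any library) computes equal-width-bin ECE on perfectly calibrated synthetic data and shows that the reported value shifts when only the bin count changes:

```python
import numpy as np

def binned_ece(f, y, n_bins):
    """Classical binned ECE: weighted average of |empirical accuracy -
    mean confidence| over equal-width bins of [0, 1]."""
    f, y = np.asarray(f, float), np.asarray(y, float)
    bins = np.clip((f * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(y[mask].mean() - f[mask].mean())
    return ece

rng = np.random.default_rng(1)
f = rng.uniform(size=500)
y = (rng.uniform(size=500) < f).astype(float)   # perfectly calibrated predictor
print(binned_ece(f, y, 10), binned_ece(f, y, 15))  # differs with bin count alone
```

A perfectly calibrated predictor should score (near) zero, yet the binned estimate moves with the bin count; Smooth ECE avoids this by removing bins entirely.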
4. Construction and Interpretation of Smoothed Reliability Diagrams
The smoothed reliability diagram is constructed as follows:
- Given data $(f_1, y_1), \dots, (f_n, y_n)$, select a kernel $K_\sigma$ (e.g., reflected Gaussian).
- Compute $\hat\mu_\sigma$ and $\hat\delta_\sigma$ via Nadaraya–Watson regression.
- Plot $\hat\mu_\sigma(t)$ as a smooth curve over $t \in [0,1]$.
- Display $\hat\delta_\sigma$ as a filled band or as line width to indicate the local density of predictions.
- Add bootstrap confidence bands to $\hat\mu_\sigma$.
The area between $\hat\mu_\sigma$ and the diagonal $\hat\mu = t$, weighted by $\hat\delta_\sigma$, equals $\mathrm{smECE}_\sigma$. At the fixpoint bandwidth $\sigma^*$, this area coincides with the final Smooth ECE value. This approach is directly supported by Python tools such as relplot, which performs a binary search to set the bandwidth at the associated fixpoint (Błasiok et al., 2023).
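The bootstrap band from the list above can be sketched as follows. This is a minimal illustration, assuming a plain Gaussian kernel without boundary reflection and pointwise quantile bands; the helper names are introduced here, not taken from a library:

```python
import numpy as np

def smoothed_reliability(f, y, sigma, t):
    """Nadaraya-Watson estimate of the reliability curve mu_hat_sigma on grid t
    (plain Gaussian kernel; a full implementation would reflect at 0 and 1)."""
    K = np.exp(-(t[:, None] - f[None, :]) ** 2 / (2 * sigma**2))
    return (K * y).sum(axis=1) / K.sum(axis=1)

def bootstrap_band(f, y, sigma, t, n_boot=200, alpha=0.1, seed=0):
    """Pointwise (1 - alpha) bootstrap confidence band for the smoothed curve."""
    rng = np.random.default_rng(seed)
    n = len(f)
    curves = np.empty((n_boot, len(t)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample (f_i, y_i) pairs with replacement
        curves[b] = smoothed_reliability(f[idx], y[idx], sigma, t)
    return (np.quantile(curves, alpha / 2, axis=0),
            np.quantile(curves, 1 - alpha / 2, axis=0))

rng = np.random.default_rng(2)
f = rng.uniform(size=1000)
y = (rng.uniform(size=1000) < f).astype(float)  # perfectly calibrated synthetic data
t = np.linspace(0.1, 0.9, 41)                   # interior grid, away from boundary bias
lo, hi = bootstrap_band(f, y, sigma=0.1, t=t)
```

The band can then be drawn with matplotlib's `fill_between` around the central curve; for calibrated data it should mostly cover the diagonal.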
5. Estimation, Complexity, and Practical Implementation
Smooth ECE estimation is efficient and requires no tuning:
- Kernel regression: evaluating the curve on an $m$-point grid costs $O(nm)$ with a naive kernel implementation and can be accelerated using FFT-based convolution.
- Sample complexity: $\mathrm{poly}(1/\varepsilon)$ samples yield an $\varepsilon$-accurate estimate with high probability (Błasiok et al., 2022).
- Automation: the fixpoint is found by binary search for the $\sigma$ where $\mathrm{smECE}_\sigma = \sigma$. All bootstrap/resampling and diagram construction use this value.
- No manual bins: There are no user-tunable parameters; all necessary scales are data-derived.
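The fixpoint search itself reduces to a one-dimensional binary search. The sketch below uses a hypothetical stand-in for the $\sigma \mapsto \mathrm{smECE}_\sigma$ evaluator; in practice one would plug in a real smECE routine:

```python
def find_fixpoint(smece_fn, lo=0.0, hi=1.0, iters=50):
    """Binary search for sigma* solving smECE_{sigma*} = sigma*.
    Assumes g(sigma) = smece_fn(sigma) - sigma is decreasing on [lo, hi],
    so it changes sign exactly once."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if smece_fn(mid) > mid:
            lo = mid   # smECE still exceeds sigma: fixpoint lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical stand-in evaluator, monotone decreasing in sigma (for illustration only):
toy_smece = lambda sigma: 0.3 / (1.0 + 4.0 * sigma)
sigma_star = find_fixpoint(toy_smece)
```

Because $\mathrm{smECE}_\sigma - \sigma$ is monotone, each iteration halves the bracketing interval, so a few dozen evaluations locate $\sigma^*$ to machine precision.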
Example code:
```python
# Sketch of the relplot workflow; exact function names and signatures
# may differ across relplot versions.
from relplot import smooth_calibration_error, plot_reliability_diagram

sm_ece = smooth_calibration_error(true_labels, preds)  # parameter-free Smooth ECE
plot_reliability_diagram(true_labels, preds, method="smooth", display_error=True)
```
6. Empirical Evidence and Applications
Smooth ECE has been empirically validated on both synthetic and real-world datasets:
- On datasets such as ImageNet (ResNet-34, 50k points), solar-flare forecasting, and precipitation prediction, Smooth ECE generates stable, visually interpretable reliability diagrams.
- Numeric values are robust to resampling and not sensitive to binning artifacts. In contrast, binned ECE varies substantially with bin choices.
- Smooth ECE is suitable for calibration assessment in machine learning classifiers, medical risk estimation, and weather prediction.
- The approach generalizes to any problem where the support of predictions is continuous; if predictors take values only on a small discrete set, simple aligned-bin ECE can suffice and is computationally less expensive (Błasiok et al., 2023).
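For the discrete-support case mentioned above, aligned-bin ECE places one bin on each distinct predicted value, so no binning artifacts can arise. A small illustrative sketch (the helper name `aligned_bin_ece` is introduced here):

```python
import numpy as np

def aligned_bin_ece(f, y):
    """ECE with one bin per distinct predicted value -- exact when the
    predictor's support is a small discrete set (no bin-edge artifacts)."""
    f, y = np.asarray(f, float), np.asarray(y, float)
    ece = 0.0
    for v in np.unique(f):
        mask = f == v
        ece += mask.mean() * abs(y[mask].mean() - v)
    return ece

f = np.array([0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8])
y = np.array([0,   0,   0,   1,   1,   1,   1,   0  ])
print(aligned_bin_ece(f, y))  # 0.5*|0.25-0.2| + 0.5*|0.75-0.8| = 0.05
```

This runs in $O(n \log n)$ time and needs no bandwidth selection, which is why it suffices when predictions take only a handful of values.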
7. Limitations and Future Directions
Current limitations include scenarios where predictors have an inherently discrete output; here, traditional binning suffices and smoothing offers little additional benefit. The method also presupposes that the underlying calibration function has some minimal degree of smoothness; severely non-smooth reliability functions can degrade the kernel regression.
Future directions involve:
- Extending Smooth ECE to multiclass or vector-valued prediction settings.
- Exploring alternative kernel choices (e.g., adaptive or anisotropic bandwidths) that preserve consistency.
- Investigating connections of Smooth ECE to proper scoring rules, conformal prediction, and out-of-distribution calibration.
Smooth ECE, as a continuous, consistent, and parameter-free calibration metric, provides a principled standard for calibration measurement and visualization in modern probabilistic machine learning, unifying theoretical desiderata with practical usability (Błasiok et al., 2023, Błasiok et al., 2022).