ACMMD: Conditional Fit & Model Calibration

Updated 21 October 2025

ACMMD is a kernel-based divergence that quantitatively evaluates the conditional fit between true and modeled distributions via an RKHS metric.
It provides an unbiased U-statistic estimator and employs a wild bootstrap method to control Type I error in hypothesis testing.
ACMMD extends to model calibration and aids hyperparameter tuning in conditional sequence models, as demonstrated in ProteinMPNN applications.

Augmented Conditional Maximum Mean Discrepancy (ACMMD) is a kernel-based divergence that measures the discrepancy between the true and the model conditional distributions, with conditional sequence models as the main use case. It augments each conditional distribution with the conditioning variable, so that conditional fit is quantified in absolute terms through a reproducing kernel Hilbert space (RKHS) metric. ACMMD can be estimated without bias from paired data and model samples, yields hypothesis tests with controlled Type I error, and extends naturally to assessing model reliability (calibration). The framework thus provides statistical tools for evaluating conditional sequence models, tuning hyperparameters such as the sampling temperature, and investigating model limitations in applied computational biology (Glaser et al., 17 Oct 2025).

1. Theoretical Formulation and Mathematical Structure

ACMMD is defined through two joint distributions that share the same marginal on the conditioning variable $X$: (i) $\mathbb{P}_X \otimes \mathbb{P}_{(|)}$, where $X\sim\mathbb{P}_X$ and $Y$ is sampled from the true conditional $\mathbb{P}(Y|X)$; and (ii) $\mathbb{P}_X \otimes Q_{(|)}$, where $Y$ is instead sampled from the model’s conditional $Q(Y|X)$. ACMMD is the Maximum Mean Discrepancy between these two joint distributions under a kernel $k$ on $\mathcal{X} \times \mathcal{Y}$, taken here as the product $k((x, y), (x', y')) = k_x(x, x')\,k_y(y, y')$ of a kernel $k_x$ on $\mathcal{X}$ and a kernel $k_y$ on $\mathcal{Y}$ (the form used by the estimator in Section 2). Explicitly:

$\text{ACMMD}(\mathbb{P}_{(|)}, Q_{(|)}) := \text{MMD}\left(\mathbb{P}_X \otimes \mathbb{P}_{(|)},\, \mathbb{P}_X \otimes Q_{(|)}\right)$

where, for $Z=(x, y)$,

$\text{MMD}^2(\mathcal{Q}, \mathcal{Q}') = \mathbb{E}_{Z, Z' \sim \mathcal{Q}}[k(Z, Z')] + \mathbb{E}_{Z, Z' \sim \mathcal{Q}'}[k(Z, Z')] - 2\,\mathbb{E}_{Z \sim \mathcal{Q},\; Z' \sim \mathcal{Q}'}[k(Z, Z')]$

If $k_x$ and $k_y$ are universal kernels (their RKHS is dense in the space of continuous functions vanishing at infinity), then $\text{ACMMD}=0$ if and only if $\mathbb{P}(Y|X=x)=Q(Y|X=x)$ for $\mathbb{P}_X$-almost every $x$ (Glaser et al., 17 Oct 2025).
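As a short worked step, and assuming the product kernel $k = k_x \cdot k_y$ that the Section 2 estimator uses, the three MMD terms can be collapsed into one expectation. Here $(X, Y, \tilde{Y})$ and $(X', Y', \tilde{Y}')$ denote two independent triples with $X \sim \mathbb{P}_X$, $Y \sim \mathbb{P}(Y|X)$ and $\tilde{Y} \sim Q(Y|X)$; the primed notation is introduced only for this sketch.

```latex
\begin{aligned}
\text{ACMMD}^2(\mathbb{P}_{(|)}, Q_{(|)})
&= \mathbb{E}\big[k_x(X, X')\,k_y(Y, Y')\big]
 + \mathbb{E}\big[k_x(X, X')\,k_y(\tilde{Y}, \tilde{Y}')\big]
 - 2\,\mathbb{E}\big[k_x(X, X')\,k_y(Y, \tilde{Y}')\big] \\
&= \mathbb{E}\Big[k_x(X, X')\big(k_y(Y, Y') + k_y(\tilde{Y}, \tilde{Y}')
 - k_y(Y, \tilde{Y}') - k_y(\tilde{Y}, Y')\big)\Big]
\end{aligned}
```

The second line uses the fact that the two triples are exchangeable, so $\mathbb{E}[k_x(X, X')\,k_y(Y, \tilde{Y}')] = \mathbb{E}[k_x(X, X')\,k_y(\tilde{Y}, Y')]$. Replacing the expectation by an average over distinct sample pairs gives the U-statistic form.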

2. Statistical Properties, Estimation, and Consistency

Provided the model $Q_{(|)}$ can be sampled from, ACMMD can be estimated without bias from paired data. Given $N$ pairs $\{(X_i, Y_i)\}_{i=1}^N$, where each $X_i$ is an input and $Y_i$ is sampled from the true conditional, together with one model sample $\tilde{Y}_i$ drawn from $Q(Y|X_i)$ per input, the U-statistic estimator is:

$\widehat{\text{ACMMD}}^2 = \frac{2}{N(N-1)} \sum_{1 \leq i < j \leq N} k_x(X_i, X_j)\big[ k_y(Y_i, Y_j) + k_y(\tilde{Y}_i, \tilde{Y}_j) - k_y(Y_i, \tilde{Y}_j) - k_y(\tilde{Y}_i, Y_j) \big]$

This estimator is proven to be unbiased and consistent under mild conditions (Glaser et al., 17 Oct 2025).
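The estimator can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes vector-valued $X$ and $Y$ with Gaussian kernels of fixed bandwidth (a sequence model would need a kernel on sequences instead), and the function names `gaussian_gram` and `acmmd_u_statistic` are hypothetical.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian (RBF) Gram matrix between the rows of a and b (assumed kernel)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def acmmd_u_statistic(x, y, y_model, bandwidth_x=1.0, bandwidth_y=1.0):
    """U-statistic estimate of ACMMD^2 from triples (X_i, Y_i, Y~_i)."""
    n = x.shape[0]
    kx = gaussian_gram(x, x, bandwidth_x)
    cross = gaussian_gram(y, y_model, bandwidth_y)  # cross[i, j] = k_y(Y_i, Y~_j)
    h = kx * (
        gaussian_gram(y, y, bandwidth_y)
        + gaussian_gram(y_model, y_model, bandwidth_y)
        - cross
        - cross.T  # cross.T[i, j] = k_y(Y~_i, Y_j)
    )
    np.fill_diagonal(h, 0.0)  # a U-statistic excludes the i == j terms
    # h is symmetric, so the sum over i != j is twice the sum over i < j.
    return h.sum() / (n * (n - 1))


# Toy usage: the true conditional is Y = X + noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = x + 0.1 * rng.normal(size=(200, 1))
y_good = x + 0.1 * rng.normal(size=(200, 1))   # model matches the truth
y_bad = -x + 0.1 * rng.normal(size=(200, 1))   # model has the wrong sign
print("well-specified model:", acmmd_u_statistic(x, y, y_good))
print("misspecified model:  ", acmmd_u_statistic(x, y, y_bad))
```

Because the estimator is unbiased, the value for a well-specified model fluctuates around zero and can be slightly negative, whereas a misspecified model should give a clearly positive value.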

3. Hypothesis Testing: Controlled Error Rates and Bootstrap Procedures

ACMMD quantifies absolute conditional goodness-of-fit: it is zero if and only if the model and true conditionals match for almost every $X$. The canonical hypothesis test is:

$H_0:\ \text{ACMMD}(\mathbb{P}_{(|)}, Q_{(|)}) = 0\quad\text{vs}\quad H_1:\ \text{ACMMD}(\mathbb{P}_{(|)}, Q_{(|)}) > 0$

Because the finite-sample estimate fluctuates around its population value, the rejection threshold is calibrated with a wild bootstrap. For each bootstrap replicate $b=1,\ldots,B$, draw Rademacher random variables $\{W_i^b\}$ and compute

$\widehat{\text{ACMMD}}^2_b = \frac{2}{N(N-1)} \sum_{i < j} W_i^b W_j^b \, h\left( (X_i, Y_i, \tilde{Y}_i),\, (X_j, Y_j, \tilde{Y}_j) \right)$

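where $h$ is the symmetric U-statistic kernel, that is, the summand of the estimator in Section 2: $h\left( (X_i, Y_i, \tilde{Y}_i),\, (X_j, Y_j, \tilde{Y}_j) \right) = k_x(X_i, X_j)\big[ k_y(Y_i, Y_j) + k_y(\tilde{Y}_i, \tilde{Y}_j) - k_y(Y_i, \tilde{Y}_j) - k_y(\tilde{Y}_i, Y_j) \big]$. The empirical $(1-\alpha)$ quantile $q_{1-\alpha}$ of the bootstrap values serves as the rejection threshold and delivers Type-I error control: reject $H_0$ at level $\alpha$ if $\widehat{\text{ACMMD}}^2 > q_{1-\alpha}$ (Glaser et al., 17 Oct 2025).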
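The test procedure can be sketched as follows. This is an illustrative implementation under stated assumptions, not the paper's code: Gaussian kernels with unit bandwidth on vector-valued data, and hypothetical helper names (`gaussian_gram`, `h_matrix`, `acmmd_wild_bootstrap_test`). The Rademacher weights multiply the entries of the $h$ matrix, and the quantile of the resulting values gives the threshold.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian (RBF) Gram matrix between the rows of a and b (assumed kernel)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def h_matrix(x, y, y_model):
    """Matrix of h((X_i, Y_i, Y~_i), (X_j, Y_j, Y~_j)) with a zeroed diagonal."""
    cross = gaussian_gram(y, y_model)  # cross[i, j] = k_y(Y_i, Y~_j)
    h = gaussian_gram(x, x) * (
        gaussian_gram(y, y) + gaussian_gram(y_model, y_model) - cross - cross.T
    )
    np.fill_diagonal(h, 0.0)
    return h


def acmmd_wild_bootstrap_test(x, y, y_model, n_bootstrap=500, alpha=0.05, seed=0):
    """Return (statistic, threshold, reject) for H0: ACMMD = 0."""
    n = x.shape[0]
    h = h_matrix(x, y, y_model)
    norm = n * (n - 1)
    statistic = h.sum() / norm
    rng = np.random.default_rng(seed)
    # One row of Rademacher (+1 / -1) weights per bootstrap replicate.
    w = rng.choice([-1.0, 1.0], size=(n_bootstrap, n))
    # boot[b] = sum_{i != j} W_i^b W_j^b h_ij / (N (N - 1))
    boot = np.sum((w @ h) * w, axis=1) / norm
    threshold = np.quantile(boot, 1.0 - alpha)
    return statistic, threshold, bool(statistic > threshold)


# Toy usage: the model has the wrong sign, so H0 should be rejected.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = x + 0.1 * rng.normal(size=(200, 1))
y_bad = -x + 0.1 * rng.normal(size=(200, 1))
stat, thresh, reject = acmmd_wild_bootstrap_test(x, y, y_bad)
print(f"statistic={stat:.4f}  threshold={thresh:.4f}  reject H0: {reject}")
```

The number of replicates `n_bootstrap` and the level `alpha` are free choices; the defaults here are placeholders, not values taken from the source.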

4. Extension to Reliability and Model Calibration: ACMMD-Rel

Beyond fit, ACMMD supports the assessment of reliability (calibration). A model is reliable when, among all inputs that receive the same predicted distribution, the outcomes are actually distributed according to that prediction. Define $Q_X$ as the model’s output (its predicted distribution at input $X$) and require $q = \mathbb{P}(Y \in \cdot\mid Q_X=q)$ almost everywhere. ACMMD–Rel evaluates this property by comparing the “empirical recalibration” $\mathbb{P}_{q|}$ (the true conditional given prediction $q$) with the “ideal” mapping $q \mapsto q$, written $Q_{\text{rel}(|)}$:

$\text{ACMMD–Rel}(\mathbb{P}_{(|)}, Q_{(|)}) := \text{MMD}\left(\mathbb{P}_{q|}, Q_{\text{rel}(|)}\right)$

Estimation and testing parallel the ACMMD setup (Glaser et al., 17 Oct 2025).

5. Practical Applications: Protein Sequence Design and Hyperparameter Tuning

In computational biology, ACMMD quantifies model fit for conditional sequence generators such as ProteinMPNN. For evaluation, sequences are generated from ProteinMPNN at different sampling temperatures $T$ . Empirically, increasing $T$ yields higher ACMMD values, revealing greater divergence from the true conditional distribution. By optimizing $T$ to minimize ACMMD (and ACMMD–Rel), practitioners achieve improved conditional fit and model reliability. The paper demonstrates that default ProteinMPNN sampling temperatures may be suboptimal and can be systematically chosen via ACMMD-based tuning (Glaser et al., 17 Oct 2025).
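The tuning loop described above can be illustrated on a toy problem. This sketch does not use ProteinMPNN: as a stand-in, it assumes a categorical model that samples one-hot outcomes from `softmax(logits / T)`, with "true" data generated at `T = 1`. The Gaussian kernels, the logits, the temperature grid, and the names `sample_one_hot` and `select_temperature` are all assumptions made for illustration.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian (RBF) Gram matrix between the rows of a and b (assumed kernel)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def acmmd_u_statistic(x, y, y_model):
    """U-statistic estimate of ACMMD^2 from triples (X_i, Y_i, Y~_i)."""
    n = x.shape[0]
    cross = gaussian_gram(y, y_model)
    h = gaussian_gram(x, x) * (
        gaussian_gram(y, y) + gaussian_gram(y_model, y_model) - cross - cross.T
    )
    np.fill_diagonal(h, 0.0)
    return h.sum() / (n * (n - 1))


def sample_one_hot(logits, temperature, rng):
    """Draw one one-hot category per row from softmax(logits / temperature)."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling: count how many cumulative sums lie below u.
    u = rng.random((logits.shape[0], 1))
    idx = (u > np.cumsum(probs, axis=1)).sum(axis=1)
    idx = np.minimum(idx, logits.shape[1] - 1)  # guard against rounding
    return np.eye(logits.shape[1])[idx]


def select_temperature(x, y, logits, temperatures, rng):
    """Estimate ACMMD^2 for each temperature and return the minimizer."""
    scores = {}
    for t in temperatures:
        y_model = sample_one_hot(logits, t, rng)
        scores[t] = acmmd_u_statistic(x, y, y_model)
    best = min(scores, key=scores.get)
    return best, scores


# Toy usage: "true" data are generated at temperature 1.0.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
logits = 3.0 * x * np.array([[-1.5, -0.5, 0.5, 1.5]])  # shape (500, 4)
y = sample_one_hot(logits, 1.0, rng)
best, scores = select_temperature(x, y, logits, [0.1, 0.5, 1.0, 2.0, 10.0], rng)
for t, s in scores.items():
    print(f"T={t:<5} ACMMD^2 estimate = {s:.4f}")
print("selected temperature:", best)
```

Each score is a noisy estimate, so neighbouring temperatures may be hard to separate at small sample sizes; in practice one would compare them together with the bootstrap test of Section 3 or with repeated draws.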

6. Key Properties and Summary Table

ACMMD provides absolute, kernel-based quantification of conditional fit, supports consistent estimation and hypothesis testing, and extends directly to calibration assessment. An overview is provided below:

Aspect	Definition/Property	Reference
Main Divergence	ACMMD $(\mathbb{P}_{(|)}, Q_{(|)}) :=$ MMD $(\mathbb{P}_X\!\otimes\!\mathbb{P}_{(|)},\,\mathbb{P}_X\!\otimes\!Q_{(|)})$	(Glaser et al., 17 Oct 2025)
U-statistic Estimator	U-statistic of Section 2; unbiased and consistent	(Glaser et al., 17 Oct 2025)
Hypothesis Testing	Bootstrap procedure for controlled Type-I error	(Glaser et al., 17 Oct 2025)
Reliability (Calibration)	ACMMD–Rel as above; consistency for calibrated models	(Glaser et al., 17 Oct 2025)
Application: Tuning	Used for hyperparameter selection in ProteinMPNN	(Glaser et al., 17 Oct 2025)

ACMMD is positioned as a canonical tool for statistical assessment of conditional model fit and reliability, with demonstrated value for hyperparameter selection and model evaluation in biological sequence modeling. Because it vanishes under universal kernels only when the model and true conditionals coincide, it acts as an absolute measure of conditional goodness-of-fit. Its statistical properties permit hypothesis testing with controlled Type-I error via the wild bootstrap. These characteristics enable ACMMD to guide improvements in conditional biological sequence models and beyond.

References

Glaser et al. (2025). Kernel-Based Evaluation of Conditional Biological Sequence Models.
