
ACMMD: Conditional Fit & Model Calibration

Updated 21 October 2025
  • ACMMD is a kernel-based divergence that quantitatively evaluates the conditional fit between true and modeled distributions via an RKHS metric.
  • It provides an unbiased U-statistic estimator and employs a wild bootstrap method to control Type I error in hypothesis testing.
  • ACMMD extends to model calibration and aids hyperparameter tuning in conditional sequence models, as demonstrated in ProteinMPNN applications.

Augmented Conditional Maximum Mean Discrepancy (ACMMD) is a kernel-based divergence designed to assess the discrepancy between true and model conditional distributions, especially in the context of conditional sequence models. ACMMD couples the conditioning variable with conditional distributions, thus quantifying absolute conditional model fit via a reproducing kernel Hilbert space (RKHS) metric. ACMMD enables unbiased estimation from paired data and model samples, embeds directly into hypothesis testing scenarios with controlled Type I error, and extends naturally to assessing model reliability (calibration). This framework provides rigorous statistical tools for evaluating conditional sequence models, tuning hyperparameters critical to generalization, and investigating model limitations in applied computational biology contexts (Glaser et al., 17 Oct 2025).

1. Theoretical Formulation and Mathematical Structure

ACMMD is formally defined by constructing two joint distributions sharing the same marginal on the conditioning variable $X$: (i) $\mathbb{P}_X \otimes \mathbb{P}_{(|)}$, where $Y$ is sampled from the true conditional $\mathbb{P}(Y|X)$ given $X\sim\mathbb{P}_X$; and (ii) $\mathbb{P}_X \otimes Q_{(|)}$, where $Y$ is sampled from the model’s conditional $Q(Y|X)$. The measure is the Maximum Mean Discrepancy between these joint distributions under an appropriate kernel $k$ on $\mathcal{X} \times \mathcal{Y}$. Explicitly:

$$\text{ACMMD}(\mathbb{P}_{(|)}, Q_{(|)}) := \text{MMD}\left(\mathbb{P}_X \otimes \mathbb{P}_{(|)},\, \mathbb{P}_X \otimes Q_{(|)}\right)$$

where for $Z=(x, y)$,

$$\text{MMD}^2(\mathcal{Q}, \mathcal{Q}') = \mathbb{E}_{Z, Z' \sim \mathcal{Q}}[k(Z, Z')] + \mathbb{E}_{Z, Z' \sim \mathcal{Q}'}[k(Z, Z')] - 2\,\mathbb{E}_{Z \sim \mathcal{Q},\; Z' \sim \mathcal{Q}'}[k(Z, Z')]$$

If $k$ factorizes as a product $k((x, y), (x', y')) = k_x(x, x')\,k_y(y, y')$ and $k_x$ and $k_y$ are universal kernels (RKHS dense in the space of continuous vanishing-at-infinity functions), then $\text{ACMMD}=0$ iff $\mathbb{P}(Y|X=x)=Q(Y|X=x)$ for $\mathbb{P}_X$-almost every $x$ (Glaser et al., 17 Oct 2025).
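As a worked step connecting the definition to a computable quantity, the expansion below substitutes the product kernel $k = k_x\,k_y$ into the $\text{MMD}^2$ formula. It is a sketch under that product-kernel assumption; the primed variables and $\tilde{Y}$ are notation introduced here for illustration. Take $(X, Y, \tilde{Y})$ and $(X', Y', \tilde{Y}')$ to be independent copies with $X \sim \mathbb{P}_X$, $Y \sim \mathbb{P}(\cdot|X)$ and $\tilde{Y} \sim Q(\cdot|X)$. Then

$$\text{ACMMD}^2 = \mathbb{E}\big[k_x(X, X')\,k_y(Y, Y')\big] + \mathbb{E}\big[k_x(X, X')\,k_y(\tilde{Y}, \tilde{Y}')\big] - 2\,\mathbb{E}\big[k_x(X, X')\,k_y(Y, \tilde{Y}')\big]$$

$$\phantom{\text{ACMMD}^2} = \mathbb{E}\Big[k_x(X, X')\big(k_y(Y, Y') + k_y(\tilde{Y}, \tilde{Y}') - k_y(Y, \tilde{Y}') - k_y(\tilde{Y}, Y')\big)\Big]$$

The second line splits the cross term symmetrically, using that $k_y(Y, \tilde{Y}')$ and $k_y(\tilde{Y}, Y')$ have the same expectation when the two copies are exchanged. Every term pairs outputs from two different inputs, so only cross-sample kernel evaluations are needed.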

2. Statistical Properties, Estimation, and Consistency

Given that the model $Q_{(|)}$ allows sampling, ACMMD can be estimated without bias from paired data. For samples $\{(X_i, Y_i)\}$ with each $X_i$ an input and $Y_i$ sampled from the true conditional, and each $\tilde{Y}_i$ drawn from $Q(Y|X_i)$, the unbiased U-statistic estimator is:

$$\widehat{\text{ACMMD}}^2 = \frac{2}{N(N-1)} \sum_{1 \leq i < j \leq N} k_x(X_i, X_j)\big[ k_y(Y_i, Y_j) + k_y(\tilde{Y}_i, \tilde{Y}_j) - k_y(Y_i, \tilde{Y}_j) - k_y(\tilde{Y}_i, Y_j) \big]$$

This estimator is proven unbiased and consistent under mild conditions (Glaser et al., 17 Oct 2025).
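The estimator above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the Gaussian kernels, the bandwidth parameters, the function names (`gaussian_gram`, `acmmd_h_matrix`, `acmmd_squared`) and the toy real-valued data are all assumptions made here, whereas sequence data would need kernels suited to sequences.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian (RBF) Gram matrix between the rows of a and b (assumed kernel)."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def acmmd_h_matrix(x, y, y_model, bandwidth_x=1.0, bandwidth_y=1.0):
    """H[i, j] = k_x(X_i, X_j) * [k_y(Y_i, Y_j) + k_y(Yt_i, Yt_j)
                                  - k_y(Y_i, Yt_j) - k_y(Yt_i, Y_j)],
    where Yt denotes the model samples (Y tilde)."""
    k_x = gaussian_gram(x, x, bandwidth_x)
    k_yy = gaussian_gram(y, y, bandwidth_y)
    k_mm = gaussian_gram(y_model, y_model, bandwidth_y)
    k_ym = gaussian_gram(y, y_model, bandwidth_y)  # entry (i, j) is k_y(Y_i, Yt_j)
    # k_ym.T has entry (i, j) equal to k_y(Y_j, Yt_i) = k_y(Yt_i, Y_j).
    return k_x * (k_yy + k_mm - k_ym - k_ym.T)

def acmmd_squared(x, y, y_model, **bandwidths):
    """U-statistic: average of H over all pairs with i != j."""
    h = acmmd_h_matrix(x, y, y_model, **bandwidths)
    n = h.shape[0]
    # H is symmetric, so the off-diagonal sum equals 2 * sum over i < j.
    return (h.sum() - np.trace(h)) / (n * (n - 1))

# Toy check: the true conditional is Y | X ~ N(X, 0.5^2).
rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
y_good = x + 0.5 * rng.normal(size=n)   # samples from a well-specified model
y_bad = -x + 0.5 * rng.normal(size=n)   # samples from a misspecified model

acmmd_good = acmmd_squared(x, y, y_good)
acmmd_bad = acmmd_squared(x, y, y_bad)
print(f"well-specified model: {acmmd_good:.4f}")
print(f"misspecified model:   {acmmd_bad:.4f}")
```

Because the statistic is unbiased, the estimate for a well-specified model fluctuates around zero and may come out slightly negative; only the misspecified model should give a clearly positive value.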

3. Hypothesis Testing: Controlled Error Rates and Bootstrap Procedures

ACMMD quantifies absolute conditional goodness-of-fit: it is zero if and only if model and true conditional match for almost every $X$. A canonical hypothesis test is:

$$H_0:\ \text{ACMMD}(\mathbb{P}_{(|)}, Q_{(|)}) = 0\quad\text{vs}\quad H_1:\ \text{ACMMD}(\mathbb{P}_{(|)}, Q_{(|)}) > 0$$

Due to finite sample noise, inference for ACMMD employs a wild bootstrap. For each bootstrap $b=1,\ldots,B$, draw Rademacher random variables $\{W_i^b\}$ and compute

$$\widehat{\text{ACMMD}}^2_b = \frac{2}{N(N-1)} \sum_{i < j} W_i^b W_j^b \, h\left( (X_i, Y_i, \tilde{Y}_i),\, (X_j, Y_j, \tilde{Y}_j) \right)$$

where $h$ is the symmetric kernel function from the estimator, i.e. the summand of the U-statistic. The empirical $(1-\alpha)$ quantile $q_{1-\alpha}$ of the bootstrap distribution delivers Type-I error control: reject $H_0$ if $\widehat{\text{ACMMD}}^2 > q_{1-\alpha}$ (Glaser et al., 17 Oct 2025).
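The test procedure can be sketched as follows. Again this is an illustrative sketch under assumptions, not the paper's code: the Gaussian kernels with unit bandwidth, the helper names (`rbf`, `h_matrix`, `wild_bootstrap_test`), the default of 500 bootstrap draws and the toy one-dimensional data are all choices made here.

```python
import numpy as np

def rbf(a, b, bw=1.0):
    """Gaussian kernel matrix for 1-D arrays a and b (assumed kernel)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * bw ** 2))

def h_matrix(x, y, y_model):
    """Pairwise values of the symmetric function h from the U-statistic."""
    k_ym = rbf(y, y_model)
    return rbf(x, x) * (rbf(y, y) + rbf(y_model, y_model) - k_ym - k_ym.T)

def wild_bootstrap_test(h, alpha=0.05, n_boot=500, rng=None):
    """Return (statistic, threshold, reject) for H0: ACMMD = 0."""
    rng = np.random.default_rng() if rng is None else rng
    n = h.shape[0]
    h0 = h - np.diag(np.diag(h))          # keep only pairs with i != j
    stat = h0.sum() / (n * (n - 1))       # equals 2/(n(n-1)) * sum over i < j
    # One row of Rademacher signs W^b per bootstrap draw.
    w = rng.choice([-1.0, 1.0], size=(n_boot, n))
    # boot[b] = w_b^T h0 w_b / (n(n-1)) = 2/(n(n-1)) * sum_{i<j} W_i W_j h_ij
    boot = ((w @ h0) * w).sum(axis=1) / (n * (n - 1))
    threshold = np.quantile(boot, 1.0 - alpha)
    return stat, threshold, bool(stat > threshold)

# Toy data: the true conditional is Y | X ~ N(X, 0.5^2).
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
y_bad = -x + 0.5 * rng.normal(size=n)    # samples from a misspecified model

stat_bad, thr_bad, reject_bad = wild_bootstrap_test(h_matrix(x, y, y_bad), rng=rng)
print(f"statistic={stat_bad:.4f}  threshold={thr_bad:.4f}  reject H0: {reject_bad}")
```

The sign flips leave the null distribution of the degenerate U-statistic approximately unchanged while destroying any systematic positive signal, which is why the bootstrap quantile can serve as a rejection threshold.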

4. Extension to Reliability and Model Calibration: ACMMD-Rel

Beyond fit, ACMMD supports the assessment of reliability (calibration). A reliable model’s predicted distribution matches the observed frequency of outcomes. Define $Q_X$ as the model’s output and require $q = \mathbb{P}(Y \in \cdot\mid Q_X=q)$ almost everywhere. ACMMD–Rel evaluates this property by comparing the “empirical recalibration” $\mathbb{P}_{q|}$ (true conditional given prediction $q$) and the “ideal” mapping $q \mapsto q$,

$$\text{ACMMD–Rel}(\mathbb{P}_{(|)}, Q_{(|)}) := \text{MMD}\left(\mathbb{P}_{q|}, Q_{\text{rel}(|)}\right)$$

Estimation and testing parallel the ACMMD setup (Glaser et al., 17 Oct 2025).

5. Practical Applications: Protein Sequence Design and Hyperparameter Tuning

In computational biology, ACMMD quantifies model fit for conditional sequence generators such as ProteinMPNN. For evaluation, sequences are generated from ProteinMPNN at different sampling temperatures $T$. Empirically, increasing $T$ yields higher ACMMD values, revealing greater divergence from the true conditional distribution. By optimizing $T$ to minimize ACMMD (and ACMMD–Rel), practitioners achieve improved conditional fit and model reliability. The paper demonstrates that default ProteinMPNN sampling temperatures may be suboptimal and can be systematically chosen via ACMMD-based tuning (Glaser et al., 17 Oct 2025).
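The tuning loop described above amounts to a grid search over temperatures, scored by the ACMMD estimate. The sketch below shows the pattern on a toy problem only: `toy_sampler` is a hypothetical Gaussian stand-in for a temperature-controlled model and does not call ProteinMPNN, and the kernels, grid and function names are assumptions made for illustration.

```python
import numpy as np

def rbf(a, b, bw=1.0):
    """Gaussian kernel matrix for 1-D arrays a and b (assumed kernel)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * bw ** 2))

def acmmd_squared(x, y, y_model):
    """U-statistic estimate of ACMMD^2 with product Gaussian kernels."""
    k_ym = rbf(y, y_model)
    h = rbf(x, x) * (rbf(y, y) + rbf(y_model, y_model) - k_ym - k_ym.T)
    n = len(x)
    return (h.sum() - np.trace(h)) / (n * (n - 1))

def toy_sampler(x, temperature, rng):
    """Hypothetical model: draws from N(x, T^2), so T scales the spread."""
    return x + temperature * rng.normal(size=x.shape)

def tune_temperature(x, y, temperatures, sampler, rng):
    """Score every candidate temperature and return the minimizer."""
    scores = {t: acmmd_squared(x, y, sampler(x, t, rng)) for t in temperatures}
    best = min(scores, key=scores.get)
    return best, scores

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + rng.normal(size=500)   # true conditional N(x, 1): T = 1 is well specified
best_t, scores = tune_temperature(x, y, [0.1, 1.0, 5.0], toy_sampler, rng)
for t, s in scores.items():
    print(f"T={t}: ACMMD^2 estimate = {s:.4f}")
print("selected temperature:", best_t)
```

In a real protein design setting the inputs would be structures and the outputs sequences, so both kernels and the sampler would have to be replaced; only the select-the-minimizer pattern carries over.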

6. Key Properties and Summary Table

ACMMD provides absolute, kernel-based quantification of conditional fit, supports consistent estimation and hypothesis testing, and extends directly to calibration assessment. An overview is provided below:

| Aspect | Definition/Property | Reference |
|---|---|---|
| Main Divergence | $\text{ACMMD}(\mathbb{P}_{(\vert)}, Q_{(\vert)}) := \text{MMD}(\mathbb{P}_X \otimes \mathbb{P}_{(\vert)},\, \mathbb{P}_X \otimes Q_{(\vert)})$ | (Glaser et al., 17 Oct 2025) |
| U-statistic Estimator | Estimator of Section 2; unbiased and consistent | (Glaser et al., 17 Oct 2025) |
| Hypothesis Testing | Bootstrap procedure for controlled Type-I error | (Glaser et al., 17 Oct 2025) |
| Reliability (Calibration) | ACMMD–Rel as above; consistency for calibrated models | (Glaser et al., 17 Oct 2025) |
| Application: Tuning | Used for hyperparameter selection in ProteinMPNN | (Glaser et al., 17 Oct 2025) |

ACMMD is positioned as a canonical tool for statistical assessment of conditional model fit and reliability, with demonstrated value for hyperparameter selection and model evaluation in biological sequence modeling. Under universal kernels it vanishes only when the model and true conditionals agree almost everywhere, which is what makes it an absolute measure of conditional goodness-of-fit. Its statistical properties permit uncertainty quantification and hypothesis testing with controlled Type-I error. These characteristics enable ACMMD to guide improvements in conditional biological sequence models and beyond.
