ACMMD: Conditional Fit & Model Calibration
- ACMMD is a kernel-based divergence that quantitatively evaluates the conditional fit between true and modeled distributions via an RKHS metric.
- It provides an unbiased U-statistic estimator and employs a wild bootstrap method to control Type I error in hypothesis testing.
- ACMMD extends to model calibration and aids hyperparameter tuning in conditional sequence models, as demonstrated in ProteinMPNN applications.
Augmented Conditional Maximum Mean Discrepancy (ACMMD) is a kernel-based divergence designed to assess the discrepancy between true and model conditional distributions, especially in the context of conditional sequence models. ACMMD couples the conditioning variable with conditional distributions, thus quantifying absolute conditional model fit via a reproducing kernel Hilbert space (RKHS) metric. ACMMD enables unbiased estimation from paired data and model samples, embeds directly into hypothesis testing scenarios with controlled Type I error, and extends naturally to assessing model reliability (calibration). This framework provides rigorous statistical tools for evaluating conditional sequence models, tuning hyperparameters critical to generalization, and investigating model limitations in applied computational biology contexts (Glaser et al., 17 Oct 2025).
1. Theoretical Formulation and Mathematical Structure
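ACMMD is formally defined by constructing two joint distributions sharing the same marginal $P_X$ on the conditioning variable $X$: (i) $P_X \otimes P_{Y \mid X}$, the law of $(X, Y)$, where $Y$ is sampled from the true conditional $P_{Y \mid X}$ given $X$; and (ii) $P_X \otimes Q_{Y \mid X}$, the law of $(X, \tilde{Y})$, where $\tilde{Y}$ is sampled from the model's conditional $Q_{Y \mid X}$. The measure is the Maximum Mean Discrepancy between these joint distributions under an appropriate kernel on $\mathcal{X} \times \mathcal{Y}$, here the product kernel $k_{\mathcal{X}} \otimes k_{\mathcal{Y}}$. Explicitly:

$$\mathrm{ACMMD}^2(P_{Y \mid X}, Q_{Y \mid X}) = \mathrm{MMD}^2\left(P_X \otimes P_{Y \mid X},\; P_X \otimes Q_{Y \mid X}\right) = \mathbb{E}\left[h(Z_1, Z_2)\right],$$

where for independent copies $Z_i = (X_i, Y_i, \tilde{Y}_i)$, $i = 1, 2$,

$$h(Z_1, Z_2) = k_{\mathcal{X}}(X_1, X_2)\left[k_{\mathcal{Y}}(Y_1, Y_2) + k_{\mathcal{Y}}(\tilde{Y}_1, \tilde{Y}_2) - k_{\mathcal{Y}}(Y_1, \tilde{Y}_2) - k_{\mathcal{Y}}(\tilde{Y}_1, Y_2)\right].$$

If $k_{\mathcal{X}}$ and $k_{\mathcal{Y}}$ are universal kernels (their RKHSs are dense in the space of continuous functions vanishing at infinity), then $\mathrm{ACMMD}(P_{Y \mid X}, Q_{Y \mid X}) = 0$ if and only if $P_{Y \mid X = x} = Q_{Y \mid X = x}$ for $P_X$-almost every $x$ (Glaser et al., 17 Oct 2025).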
2. Statistical Properties, Estimation, and Consistency
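Provided the model can be sampled from, ACMMD can be estimated without bias from paired data. Given samples $\{(X_i, Y_i)\}_{i=1}^{N}$, with each $X_i$ an input and $Y_i$ sampled from the true conditional, and each $\tilde{Y}_i$ drawn from the model conditional $Q_{Y \mid X = X_i}$, the unbiased U-statistic estimator is:

$$\widehat{\mathrm{ACMMD}}^2 = \frac{2}{N(N-1)} \sum_{1 \le i < j \le N} h(Z_i, Z_j), \qquad Z_i = (X_i, Y_i, \tilde{Y}_i),$$

with $h(Z_i, Z_j) = k_{\mathcal{X}}(X_i, X_j)\left[k_{\mathcal{Y}}(Y_i, Y_j) + k_{\mathcal{Y}}(\tilde{Y}_i, \tilde{Y}_j) - k_{\mathcal{Y}}(Y_i, \tilde{Y}_j) - k_{\mathcal{Y}}(\tilde{Y}_i, Y_j)\right]$. This estimator is proven unbiased and consistent under mild conditions (Glaser et al., 17 Oct 2025).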
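The pairwise structure of the estimator can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the Gaussian kernel on scalar inputs, the exponentiated Hamming kernel on strings, the two-letter toy data generator, and every function name below are assumptions made for the demo.

```python
import math
import random
from itertools import combinations

def k_x(a, b, bandwidth=0.2):
    """Gaussian kernel on scalar inputs (illustrative stand-in for k_X)."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def k_y(s, t):
    """Exponentiated Hamming kernel on equal-length strings (stand-in for k_Y)."""
    mismatches = sum(c1 != c2 for c1, c2 in zip(s, t))
    return math.exp(-mismatches / len(s))

def h(z1, z2):
    """Symmetric core h(Z_1, Z_2) for triples Z = (x, y, y_model)."""
    x1, y1, m1 = z1
    x2, y2, m2 = z2
    return k_x(x1, x2) * (k_y(y1, y2) + k_y(m1, m2) - k_y(y1, m2) - k_y(m1, y2))

def acmmd2_ustat(triples):
    """Unbiased U-statistic: average of h over all unordered pairs i < j."""
    n = len(triples)
    total = sum(h(zi, zj) for zi, zj in combinations(triples, 2))
    return 2.0 * total / (n * (n - 1))

def sample_seq(p_a, rng, length=8):
    """Draw a sequence over {A, B} with i.i.d. positions and P(A) = p_a."""
    return "".join("A" if rng.random() < p_a else "B" for _ in range(length))

def make_triples(n, model_prob, rng):
    """Toy data: x ~ U[0, 1], y ~ truth(x), y_model ~ model(x)."""
    triples = []
    for _ in range(n):
        x = rng.random()
        y = sample_seq(0.1 + 0.8 * x, rng)        # toy "true" conditional
        y_model = sample_seq(model_prob(x), rng)  # one model sample per input
        triples.append((x, y, y_model))
    return triples

rng = random.Random(0)
well_specified = acmmd2_ustat(make_triples(200, lambda x: 0.1 + 0.8 * x, rng))
misspecified = acmmd2_ustat(make_triples(200, lambda x: 0.9 - 0.8 * x, rng))
print(f"well-specified model: {well_specified:+.4f}")
print(f"misspecified model:   {misspecified:+.4f}")
```

In this toy setup the misspecified model reverses the dependence on the input, so the mismatch is mainly conditional and a narrow input kernel is needed to see it. Because the estimator is unbiased, it can come out slightly negative when the model matches the truth.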
3. Hypothesis Testing: Controlled Error Rates and Bootstrap Procedures
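ACMMD quantifies absolute conditional goodness-of-fit: it is zero if and only if the model and true conditionals match for $P_X$-almost every $x$. A canonical hypothesis test is:

$$H_0: \mathrm{ACMMD} = 0 \quad \text{versus} \quad H_1: \mathrm{ACMMD} > 0.$$

Because the estimate fluctuates with finite samples, inference for ACMMD employs a wild bootstrap. For each bootstrap replicate $b = 1, \dots, B$, draw Rademacher random variables $W_1^b, \dots, W_N^b$ and compute

$$\widetilde{\mathrm{ACMMD}}^2_b = \frac{2}{N(N-1)} \sum_{1 \le i < j \le N} W_i^b W_j^b \, h(Z_i, Z_j),$$

where $h$ is the symmetric kernel function from the estimator and $Z_i = (X_i, Y_i, \tilde{Y}_i)$. The empirical $(1 - \alpha)$ quantile $\hat{q}_{1-\alpha}$ of the bootstrap distribution delivers Type-I error control at level $\alpha$: reject $H_0$ if $\widehat{\mathrm{ACMMD}}^2 > \hat{q}_{1-\alpha}$ (Glaser et al., 17 Oct 2025).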
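The bootstrap procedure can be sketched as follows. The code is a hedged illustration under stated assumptions, not the paper's implementation: the kernels, the toy data, the function name `wild_bootstrap_test`, the quantile indexing, and the add-one p-value convention are all choices made for this example.

```python
import math
import random

def k_x(a, b, bandwidth=0.2):
    """Gaussian kernel on scalar inputs (illustrative stand-in for k_X)."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def k_y(s, t):
    """Exponentiated Hamming kernel on equal-length strings (stand-in for k_Y)."""
    return math.exp(-sum(c1 != c2 for c1, c2 in zip(s, t)) / len(s))

def h(z1, z2):
    """Symmetric core for triples Z = (x, y, y_model)."""
    x1, y1, m1 = z1
    x2, y2, m2 = z2
    return k_x(x1, x2) * (k_y(y1, y2) + k_y(m1, m2) - k_y(y1, m2) - k_y(m1, y2))

def wild_bootstrap_test(triples, n_boot=200, alpha=0.05, seed=0):
    """Return (statistic, threshold, p_value, reject) for H0: ACMMD = 0."""
    rng = random.Random(seed)
    n = len(triples)
    # Evaluate h once on every unordered pair; all replicates reuse these values.
    pairs = [(i, j, h(triples[i], triples[j]))
             for i in range(n) for j in range(i + 1, n)]
    scale = 2.0 / (n * (n - 1))
    stat = scale * sum(v for _, _, v in pairs)
    boot = []
    for _ in range(n_boot):
        w = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # Rademacher signs
        boot.append(scale * sum(w[i] * w[j] * v for i, j, v in pairs))
    boot.sort()
    # Empirical (1 - alpha) quantile of the bootstrap replicates.
    idx = min(n_boot - 1, math.ceil((1.0 - alpha) * n_boot) - 1)
    threshold = boot[idx]
    p_value = (1 + sum(b >= stat for b in boot)) / (n_boot + 1)
    return stat, threshold, p_value, stat > threshold

def sample_seq(p_a, rng, length=8):
    """Draw a sequence over {A, B} with i.i.d. positions and P(A) = p_a."""
    return "".join("A" if rng.random() < p_a else "B" for _ in range(length))

# Toy alternative: the model reverses the true dependence on x.
rng = random.Random(1)
triples = []
for _ in range(120):
    x = rng.random()
    triples.append((x, sample_seq(0.1 + 0.8 * x, rng), sample_seq(0.9 - 0.8 * x, rng)))

stat, threshold, p_value, reject = wild_bootstrap_test(triples)
print(f"statistic={stat:.4f} threshold={threshold:.4f} p={p_value:.3f} reject={reject}")
```

Precomputing the pairwise values of `h` keeps each replicate to a single weighted sum, so the cost of the test is dominated by the one-off kernel evaluations.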
4. Extension to Reliability and Model Calibration: ACMMD-Rel
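Beyond fit, ACMMD supports the assessment of reliability (calibration). A reliable model's predicted distribution matches the observed frequency of outcomes. Define $Q_{Y \mid X}$ as the model's output and require $P_{Y \mid Q_{Y \mid X}} = Q_{Y \mid X}$ almost everywhere, that is, the true distribution of $Y$ given the prediction equals the prediction itself. ACMMD–Rel evaluates this property by comparing the "empirical recalibration" (true conditional given prediction $Q_{Y \mid X}$) and the "ideal" mapping $Q_{Y \mid X} \mapsto Q_{Y \mid X}$, using the prediction as the conditioning variable:

$$\mathrm{ACMMD\text{-}Rel} = \mathrm{MMD}\left(\mathrm{Law}(Q_{Y \mid X}, Y),\; \mathrm{Law}(Q_{Y \mid X}, \tilde{Y})\right), \qquad \tilde{Y} \sim Q_{Y \mid X}.$$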
Estimation and testing parallel the ACMMD setup (Glaser et al., 17 Oct 2025).
5. Practical Applications: Protein Sequence Design and Hyperparameter Tuning
In computational biology, ACMMD quantifies model fit for conditional sequence generators such as ProteinMPNN. For evaluation, sequences are generated from ProteinMPNN at different sampling temperatures $T$. Empirically, increasing $T$ yields higher ACMMD values, revealing greater divergence from the true conditional distribution. By choosing $T$ to minimize ACMMD (and ACMMD–Rel), practitioners achieve improved conditional fit and model reliability. The paper demonstrates that default ProteinMPNN sampling temperatures may be suboptimal and that the temperature can be chosen systematically via ACMMD-based tuning (Glaser et al., 17 Oct 2025).
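Temperature selection by ACMMD amounts to a grid search over $T$. The sketch below uses a toy two-letter sequence model, not ProteinMPNN: the logit function, the ground-truth temperature of 0.5, the grid, the kernels, and all function names are hypothetical choices made for illustration, and nothing here reproduces the paper's experiments.

```python
import math
import random
from itertools import combinations

def k_x(a, b, bandwidth=0.2):
    """Gaussian kernel on scalar inputs (illustrative stand-in for k_X)."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def k_y(s, t):
    """Exponentiated Hamming kernel on equal-length strings (stand-in for k_Y)."""
    return math.exp(-sum(c1 != c2 for c1, c2 in zip(s, t)) / len(s))

def h(z1, z2):
    """Symmetric core for triples Z = (x, y, y_model)."""
    x1, y1, m1 = z1
    x2, y2, m2 = z2
    return k_x(x1, x2) * (k_y(y1, y2) + k_y(m1, m2) - k_y(y1, m2) - k_y(m1, y2))

def acmmd2_ustat(triples):
    """Unbiased U-statistic estimate of ACMMD^2."""
    n = len(triples)
    return 2.0 * sum(h(a, b) for a, b in combinations(triples, 2)) / (n * (n - 1))

def toy_logit(x):
    """Hypothetical model logit for emitting 'A' given input x in [0, 1]."""
    return 1.5 * (2.0 * x - 1.0)

def sample_at_temperature(x, temperature, rng, length=8):
    """Sample a sequence with per-position P(A) = sigmoid(logit / T)."""
    p_a = 1.0 / (1.0 + math.exp(-toy_logit(x) / temperature))
    return "".join("A" if rng.random() < p_a else "B" for _ in range(length))

def tune_temperature(inputs, targets, grid, rng):
    """Estimate ACMMD^2 at each temperature and return the minimizer."""
    scores = {}
    for temperature in grid:
        triples = [(x, y, sample_at_temperature(x, temperature, rng))
                   for x, y in zip(inputs, targets)]
        scores[temperature] = acmmd2_ustat(triples)
    return min(scores, key=scores.get), scores

rng = random.Random(0)
inputs = [rng.random() for _ in range(300)]
# Toy ground truth: the same sampler run at T = 0.5 (an assumption of this demo).
targets = [sample_at_temperature(x, 0.5, rng) for x in inputs]
best_t, scores = tune_temperature(inputs, targets, [0.5, 2.0, 8.0], rng)
for temperature, score in scores.items():
    print(f"T={temperature}: ACMMD^2 estimate = {score:+.4f}")
print("selected temperature:", best_t)
```

Each score is a noisy estimate, so temperatures with similar fit may be hard to rank from one draw; averaging over several model samples per input or applying the bootstrap test to each candidate would make the comparison more reliable.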
6. Key Properties and Summary Table
ACMMD provides absolute, kernel-based quantification of conditional fit, supports consistent estimation and hypothesis testing, and extends directly to calibration assessment. An overview is provided below:
| Aspect | Definition/Property | Reference |
|---|---|---|
| Main Divergence | $\mathrm{ACMMD} = \mathrm{MMD}(P_X \otimes P_{Y \mid X},\, P_X \otimes Q_{Y \mid X})$ | (Glaser et al., 17 Oct 2025) |
| U-statistic Estimator | Defined in Section 2; unbiased and consistent | (Glaser et al., 17 Oct 2025) |
| Hypothesis Testing | Bootstrap procedure for controlled Type-I error | (Glaser et al., 17 Oct 2025) |
| Reliability (Calibration) | ACMMD–Rel as above; consistency for calibrated models | (Glaser et al., 17 Oct 2025) |
| Application: Tuning | Used for hyperparameter selection in ProteinMPNN | (Glaser et al., 17 Oct 2025) |
ACMMD is positioned as a canonical tool for statistical assessment of conditional model fit and reliability, with demonstrated value for hyperparameter selection and model evaluation in biological sequence modeling. Under universal kernels it vanishes only when the model and true conditionals agree almost everywhere, which makes it an absolute measure of conditional goodness-of-fit. Its statistical properties permit uncertainty quantification and rigorous hypothesis testing, with Type-I error controlled through the wild bootstrap. These characteristics enable ACMMD to guide improvements in conditional biological sequence models and beyond.