Universality of two-step neural embedding kernels for ACMMD on probability measures of sequences

Determine whether the two-step sequence kernel k_𝒴, which averages per-residue embeddings from the ESM-2 sequence model (and the analogous kernel built on structural embeddings from GearNet), satisfies the conditions of Proposition 4 (vanishing at infinity and the discrete mass property), so that the induced kernel on the space of probability measures over sequences, k_{𝒫(𝒴)}(q, q') = exp(-(1/(2σ^2)) MMD^2(q, q')), is C0-universal on 𝒫(𝒴).

Background

In the reliability analysis, the paper requires a universal kernel on 𝒫(𝒴) to ensure that ACMMD-Rel detects any pattern of unreliability. Proposition 4 provides a recipe for obtaining a C0-universal kernel on 𝒫(𝒴): exponentiate the negative squared MMD built from a kernel k_𝒴 that vanishes at infinity and has the discrete mass property.
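
For reference, the construction can be written out explicitly. The expansion of the squared MMD below is the standard one in terms of the base kernel and is not quoted from the paper; the sample symbols y, ỹ, y', ỹ' are notation introduced here and denote independent draws.

```latex
\mathrm{MMD}^2(q, q')
  = \mathbb{E}_{y, \tilde{y} \sim q}\big[k_{\mathcal{Y}}(y, \tilde{y})\big]
  - 2\,\mathbb{E}_{y \sim q,\; y' \sim q'}\big[k_{\mathcal{Y}}(y, y')\big]
  + \mathbb{E}_{y', \tilde{y}' \sim q'}\big[k_{\mathcal{Y}}(y', \tilde{y}')\big],
\qquad
k_{\mathcal{P}(\mathcal{Y})}(q, q')
  = \exp\!\Big(-\tfrac{1}{2\sigma^2}\,\mathrm{MMD}^2(q, q')\Big).
```

Written this way, k_{𝒫(𝒴)} depends on the sequences only through k_𝒴, which is why the assumptions of Proposition 4 are stated as conditions on k_𝒴.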

For practical evaluation, the authors propose two-step kernels for sequences and structures based on embeddings from neural networks (ESM-2 for sequences and GearNet for structures), combined with Euclidean kernels on averaged embeddings. It is uncertain whether these practical kernels meet the assumptions required by Proposition 4 to guarantee universality on 𝒫(𝒴).
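
A minimal sketch of how such a two-step kernel and the resulting kernel on measures might be computed from samples. Several things here are assumptions rather than the paper's implementation: the Euclidean kernel is taken to be Gaussian, the MMD is estimated with the biased (V-statistic) estimator, the function names and the `bandwidth` and `sigma` defaults are hypothetical, and plain arrays of shape (length, dim) stand in for per-residue ESM-2 or GearNet embeddings.

```python
import numpy as np


def mean_pool(per_residue):
    """Step 1: average per-residue embeddings (length, dim) into one vector (dim,)."""
    return np.asarray(per_residue, dtype=float).mean(axis=0)


def gaussian_gram(a, b, bandwidth):
    """Step 2: Gaussian kernel matrix between rows of a (n, d) and b (m, d).

    The Gaussian form is an assumption; the source only says "Euclidean kernels".
    """
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2(a, b, bandwidth):
    """Biased (V-statistic) estimate of MMD^2 between two sets of pooled embeddings."""
    return (
        gaussian_gram(a, a, bandwidth).mean()
        - 2.0 * gaussian_gram(a, b, bandwidth).mean()
        + gaussian_gram(b, b, bandwidth).mean()
    )


def measure_kernel(samples_q, samples_qp, bandwidth=1.0, sigma=1.0):
    """Estimate k_{P(Y)}(q, q') = exp(-MMD^2(q, q') / (2 sigma^2)) from samples.

    samples_q, samples_qp: lists of (length_i, dim) arrays, one per sequence.
    """
    a = np.stack([mean_pool(s) for s in samples_q])
    b = np.stack([mean_pool(s) for s in samples_qp])
    return float(np.exp(-max(mmd2(a, b, bandwidth), 0.0) / (2.0 * sigma**2)))
```

For example, with random arrays as placeholder embeddings of variable length:

```python
rng = np.random.default_rng(0)
q = [rng.normal(size=(n, 8)) for n in (5, 7, 6)]
qp = [rng.normal(loc=1.0, size=(n, 8)) for n in (4, 9)]
value = measure_kernel(q, qp)
```

The sketch only shows how the kernel is evaluated; it says nothing about whether k_𝒴 vanishes at infinity or has the discrete mass property, which is the open question.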

References

Whether Proposition 4 holds for these kernels is an open question, but we find that they perform well in our experiments.

Kernel-Based Evaluation of Conditional Biological Sequence Models (2510.15601 - Glaser et al., 17 Oct 2025) in Section 6.2 (Choice of kernel)