
Multi-Modality CFM Architecture

Updated 14 January 2026
  • Multi-Modality Conditional Flow Matching (CFM) architecture is a hybrid framework that fuses continuous and discrete flow matching to capture detailed molecular surface geometry and chemistry.
  • It employs SE(3)-equivariant surface geometric networks and classifier-free guidance to enhance predictions in peptide–protein binding design.
  • Integration with rigorous evaluation metrics like AAR, RMSD, and designability establishes a robust standard for benchmarking generative models and hyperparameter optimization.

The term "PepMerge Benchmark" refers to advanced resources in both structural peptide–protein co-design and hyperparameter optimization for LLM merging. In the context of peptide design, PepMerge is a large-scale, systematically clustered benchmark dataset and evaluation suite for assessing the efficacy of generative models targeting peptide–protein binding. Separately, the term also denotes a surrogate-based hyperparameter optimization benchmark (SMM-Bench) for model merging, comprising rigorously defined parameter spaces, empirical surrogates, and reproducible evaluation protocols. This entry details both usages with a focus on rigorously extracted evaluation criteria, dataset construction, and baseline results as reported in the literature.

1. Dataset Composition and Clustering Criteria

PepMerge, as introduced in peptide–protein modeling, is a curated, non-redundant collection of 8,365 peptide–protein complexes (Wu et al., 8 Jan 2026). Complexes are aggregated from PepBDB and Q-BioLip, with filtering steps ensuring:

  • Peptide length between 3 and 25 residues.
  • X-ray crystallographic resolution better than 4 Å.
  • Non-redundancy via clustering at 40% receptor sequence identity, computed by MMseqs2.

This clustering yields 292 groups, of which 10 clusters (spanning 158 complexes) constitute a canonical held-out test set. The remaining 282 clusters form the pool for training and validation, with train/validation splits allocated following Li et al. (2024). This structure ensures low sequence and structural redundancy between training and test sets, supporting robust evaluation of model generalization.
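
For illustration, the following is a minimal sketch of how a clustering-based split of this kind can be assembled, assuming MMseqs2's easy-cluster output (a representative/member TSV) and hypothetical file names; PepMerge's actual train/validation allocation follows Li et al. (2024) and is not reproduced here.

```python
import csv
import random
from collections import defaultdict

# Hypothetical input: receptors.fasta holds one receptor sequence per complex ID.
# Clustering at 40% sequence identity with MMseqs2 (run separately on the CLI):
#   mmseqs easy-cluster receptors.fasta clusterRes tmp --min-seq-id 0.4
# easy-cluster writes clusterRes_cluster.tsv with (representative, member) rows.

def load_clusters(tsv_path):
    """Group complex IDs by their cluster representative."""
    clusters = defaultdict(list)
    with open(tsv_path) as fh:
        for rep, member in csv.reader(fh, delimiter="\t"):
            clusters[rep].append(member)
    return dict(clusters)

def split_clusters(clusters, n_test_clusters=10, seed=0):
    """Hold out whole clusters so train/validation and test share no receptor cluster."""
    reps = sorted(clusters)
    random.Random(seed).shuffle(reps)
    test_ids = [m for rep in reps[:n_test_clusters] for m in clusters[rep]]
    trainval_ids = [m for rep in reps[n_test_clusters:] for m in clusters[rep]]
    return trainval_ids, test_ids

if __name__ == "__main__":
    clusters = load_clusters("clusterRes_cluster.tsv")
    trainval_ids, test_ids = split_clusters(clusters)
    print(f"{len(clusters)} clusters -> {len(trainval_ids)} train/val, {len(test_ids)} test complexes")
```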

2. Evaluation Metrics and Mathematical Formulation

All model assessments on PepMerge employ a unified suite of geometry, energy, and designability/diversity metrics (Wu et al., 8 Jan 2026):

Geometry Metrics

  • Amino Acid Recovery Rate (AAR): Fraction of positions with exact residue identity between prediction and ground truth,

\mathrm{AAR} = \frac{1}{N}\sum_{j=1}^N \mathbf{1}\{a_j^\mathrm{pred}=a_j^\mathrm{gt}\} \times 100\%

  • Cα RMSD: Backbone Cα root-mean-square deviation after superposition,

\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{j=1}^N \bigl\|\mathbf{x}_j^\mathrm{pred}-\mathbf{x}_j^\mathrm{gt}\bigr\|^2}\ \text{(\AA)}

  • Secondary‐Structure Similarity Ratio (SSR):

\mathrm{SSR} = \frac{1}{N}\sum_{j=1}^N \mathbf{1}\{\mathit{SS}_j^\mathrm{pred}=\mathit{SS}_j^\mathrm{gt}\} \times 100\%

  • Binding‐Site Overlap (BSR): Jaccard overlap of predicted vs. ground truth peptide–receptor contact sets,

\mathrm{BSR} = \frac{|B^\mathrm{pred}\cap B^\mathrm{gt}|}{|B^\mathrm{pred}\cup B^\mathrm{gt}|}\times 100\%
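
For concreteness, a minimal NumPy sketch of these geometry metrics is given below. The superposition routine and the contact-set inputs for BSR are generic assumptions; the benchmark's exact alignment procedure and contact cutoff are defined by Wu et al.

```python
import numpy as np

def aar(seq_pred: str, seq_gt: str) -> float:
    """Amino acid recovery: % of positions with identical residue type."""
    matches = sum(p == g for p, g in zip(seq_pred, seq_gt))
    return 100.0 * matches / len(seq_gt)

def kabsch_rmsd(x_pred: np.ndarray, x_gt: np.ndarray) -> float:
    """Cα RMSD (Å) after optimal rigid superposition (Kabsch algorithm)."""
    p = x_pred - x_pred.mean(axis=0)
    q = x_gt - x_gt.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))          # avoid improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = (p @ rot.T) - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def ssr(ss_pred: str, ss_gt: str) -> float:
    """Secondary-structure similarity: % of residues with the same SS label."""
    return 100.0 * sum(p == g for p, g in zip(ss_pred, ss_gt)) / len(ss_gt)

def bsr(contacts_pred: set, contacts_gt: set) -> float:
    """Binding-site overlap: Jaccard index of receptor contact residues (%)."""
    if not contacts_pred and not contacts_gt:
        return 100.0
    inter = len(contacts_pred & contacts_gt)
    union = len(contacts_pred | contacts_gt)
    return 100.0 * inter / union
```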

Energy Metrics

  • Complex Stability (Stb): Proportion of generated complexes with Rosetta all-atom energies lower than the native,

\mathrm{Stb} = \frac{1}{M}\sum_{i=1}^M \mathbf{1}\{E_\mathrm{pred}^{(i)}<E_\mathrm{gt}^{(i)}\}\times 100\%

  • Binding Affinity Improvement (Aff): Proportion with improved (lower) predicted binding affinity,

\mathrm{Aff} = \frac{1}{M}\sum_{i=1}^M \mathbf{1}\{\Delta G_\mathrm{pred}^{(i)}<\Delta G_\mathrm{gt}^{(i)}\}\times 100\%

Design Metrics

  • Designability (Des): Fraction of generated sequences whose ESMFold refold lies within 2 Å RMSD of the designed structure,

\mathrm{Des} = \frac{1}{M}\sum_{i=1}^M \mathbf{1}\{\mathrm{RMSD}_\mathrm{fold}^{(i)}<2\,\text{\AA}\}\times 100\%

  • Diversity (Div): 1 minus mean pairwise TM-score among generated peptides,

\mathrm{Div} = 1-\frac{2}{M(M-1)}\sum_{i<j}\mathrm{TM}(P_i,P_j)

This formalism enables thorough, multidimensional assessment of sequence identity, atomic geometry, secondary structural integrity, receptor contact accuracy, stability, energetic favorability, structural recoverability, and novelty.
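
A corresponding sketch for the fraction-style energy and design metrics and the diversity score, assuming pairwise TM-scores are computed externally (e.g. with TM-align) and supplied as a symmetric matrix:

```python
import numpy as np

def fraction_better(pred_vals, gt_vals) -> float:
    """Stb / Aff style metric: % of cases where the generated value is lower
    (more favourable energy or binding free energy) than the native one."""
    pred = np.asarray(pred_vals)
    gt = np.asarray(gt_vals)
    return 100.0 * float(np.mean(pred < gt))

def designability(refold_rmsds, threshold=2.0) -> float:
    """% of generated peptides whose ESMFold refold is within `threshold` Å."""
    return 100.0 * float(np.mean(np.asarray(refold_rmsds) < threshold))

def diversity(tm_matrix: np.ndarray) -> float:
    """Div = 1 - mean pairwise TM-score over all generated peptide pairs."""
    m = tm_matrix.shape[0]
    iu = np.triu_indices(m, k=1)       # unique pairs i < j
    return 1.0 - float(tm_matrix[iu].mean())
```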

3. Baseline Methods and Quantitative Benchmarks

PepMerge's evaluation framework supports direct comparison across a spectrum of generative architectures for de novo peptide co-design. On the 158-complex test set, leading models were scored as summarized below (Wu et al., 8 Jan 2026):

| Method    | AAR (%)↑ | RMSD (Å)↓ | SSR (%)↑ | BSR (%)↑ | Stb (%)↑ | Aff (%)↑ | Des (%)↑ | Div ↑ |
|-----------|----------|-----------|----------|----------|----------|----------|----------|-------|
| Diffusion | 47.04    | 3.28      | 74.89    | 49.83    | 15.34    | 17.13    | 48.54    | 0.57  |
| PepGLAD   | 50.43    | 3.83      | 80.24    | 19.34    | 20.39    | 10.47    | 75.07    | 0.32  |
| PPIFlow   | 48.35    | 3.59      | 68.13    | 25.94    | 15.77    | 12.08    | 46.53    | 0.51  |
| PepFlow   | 51.25    | 2.07      | 83.46    | 86.89    | 18.15    | 21.37    | 65.22    | 0.42  |
| SurfFlow  | 54.07    | 1.96      | 85.11    | 87.38    | 22.46    | 22.51    | 73.60    | 0.61  |

These results show that SurfFlow, which models explicit surface geometry and chemistry, achieves the best scores on every metric except designability, where PepGLAD leads. For instance, its gains in AAR (54.07% vs. PepFlow's 51.25%) and RMSD (1.96 Å vs. 2.07 Å) reflect high-fidelity sequence and structure recovery enabled by incorporating surface features.

4. Architectural and Methodological Innovations

SurfFlow exemplifies key modeling innovations underpinning modern PepMerge results (Wu et al., 8 Jan 2026):

  • Weighted multi-term training objective:

\mathcal{L}=\lambda_\mathrm{pos}\mathcal{L}_\mathrm{pos}+\lambda_\mathrm{ori}\mathcal{L}_\mathrm{ori}+\lambda_\mathrm{con}\mathcal{L}_\mathrm{con}+\lambda_\mathrm{cat}\mathcal{L}_\mathrm{cat}+\lambda_\mathrm{str}\mathcal{L}_\mathrm{str}

  • Equivariant Surface Geometric Network (ESGN): Models both intra- and inter-surface interactions as a dynamic, heterogeneous graph. Edge features use radial basis function embeddings, while node features encode normal angle features via spherical Fourier-Bessel expansions. SE(3)-equivariance is enforced.
  • Classifier-Free Guidance: This mechanism enables direct conditional generation (e.g., enforcing cyclic or disulfide constraints) without auxiliary classifiers.
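
The guidance equations are not reproduced in this summary; as a rough illustration, classifier-free guidance in a flow-matching sampler is commonly realized by blending conditional and unconditional velocity predictions, as sketched below (the model interface, Euler integrator, and guidance scale are assumptions, not the authors' exact implementation).

```python
import torch

def cfg_velocity(model, x_t, t, cond, guidance_scale=2.0):
    """Classifier-free guidance for a flow-matching sampler (generic sketch).

    The model is assumed to accept `cond=None` for the unconditional branch,
    e.g. because conditioning was randomly dropped during training.
    """
    v_cond = model(x_t, t, cond)          # velocity with the desired condition
    v_uncond = model(x_t, t, None)        # velocity without conditioning
    # Extrapolate toward the conditional prediction.
    return v_uncond + guidance_scale * (v_cond - v_uncond)

@torch.no_grad()
def euler_sample(model, x0, cond, steps=100, guidance_scale=2.0):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with simple Euler steps."""
    x = x0
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * cfg_velocity(model, x, t, cond, guidance_scale)
    return x
```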

These approaches facilitate fine-grained alignment of peptide–receptor surface complementarity, capturing both geometric and physicochemical binding determinants.

5. PepMerge as a Surrogate Optimization Benchmark (Model Merging)

In model merging and hyperparameter optimization, “PepMerge” (SMM-Bench) is a self-contained surrogate benchmark for efficient HPO development (Akizuki et al., 2 Sep 2025):

  • Two Search Spaces:
    • Parameter-Space (PS):

    h = (w_1, w_2, \dots, w_{64}) \in [0,1]^{64}

    with layer-wise continuous weights and \mathrm{Perf}_{\rm PS}(h) = \mathrm{accuracy}(\text{merged model}(h)).
    • Data-Flow-Space (DFS):

    h = (c_1, \dots, c_{32}) \in \{0,1,2\}^{32},\quad (s_2, \dots, s_{64}) \in [0.4,1.5]^{63}

    with categorical layer insertion choices and interface scaling factors.

  • Surrogate Models: For each space and evaluation dataset (gsm8k-ja, MGSM), a LightGBM regression model f_{\rm sur} is trained on tens of thousands of (h, \mathrm{Perf}) pairs, supporting millisecond-scale function evaluation and rapid optimizer benchmarking (see the sketch at the end of this section).

  • Metrics:

    • Mean Squared Error (MSE)
    • R^2 (coefficient of determination)
    • Kendall’s Tau (rank correlation)

Standardized API calls enable seamless integration into AutoML HPO workflows. Surrogates capture real LLM merge evaluations, providing high-fidelity optimization testing and reproducibility.
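
A minimal end-to-end sketch of this surrogate workflow for the PS space, using randomly generated stand-in data (the real SMM-Bench surrogates are trained on actual LLM-merge evaluations, and the benchmark's own API is not reproduced here):

```python
import numpy as np
from lightgbm import LGBMRegressor
from scipy.stats import kendalltau
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data: (h, Perf) pairs for the 64-dimensional PS space.
H = rng.uniform(0.0, 1.0, size=(20_000, 64))
perf = rng.uniform(0.0, 1.0, size=20_000)          # placeholder targets

H_tr, H_te, y_tr, y_te = train_test_split(H, perf, test_size=0.2, random_state=0)

surrogate = LGBMRegressor(n_estimators=500, learning_rate=0.05)
surrogate.fit(H_tr, y_tr)

# Surrogate-quality metrics reported by the benchmark: MSE, R^2, Kendall's tau.
y_hat = surrogate.predict(H_te)
print("MSE:", mean_squared_error(y_te, y_hat))
print("R^2:", r2_score(y_te, y_hat))
print("Kendall tau:", kendalltau(y_te, y_hat)[0])

# Millisecond-scale evaluation of a new configuration:
h_new = rng.uniform(0.0, 1.0, size=(1, 64))
print("Predicted Perf_PS:", surrogate.predict(h_new)[0])
```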

6. Usage Protocols and Best Practices

For both the peptide design and surrogate optimization instantiations, reproducibility and methodological rigor are prioritized:

  • Strictly enforce input domains:
    • [0,1]^{64} for PS
    • \{0,1,2\}^{32} \times [0.4,1.5]^{63} for DFS.
  • In peptide design, adhere to the prescribed data splits and evaluation metrics for fair comparison.
  • Account for possible surrogate overestimation in the DFS space; retraining surrogates on uniformly sampled data subsets is recommended to mitigate this bias (Akizuki et al., 2 Sep 2025).
  • Optimizer trajectory reporting, including best-so-far performance and random seeds, is recommended for full reproducibility.
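
As a small illustration of the domain checks and best-so-far reporting recommended above (dimension counts follow the PS/DFS definitions in Section 5; the helper names are hypothetical):

```python
import numpy as np

def check_ps(h: np.ndarray) -> None:
    """Reject PS configurations outside [0, 1]^64."""
    assert h.shape == (64,) and np.all((h >= 0.0) & (h <= 1.0)), "invalid PS point"

def check_dfs(c: np.ndarray, s: np.ndarray) -> None:
    """Reject DFS configurations outside {0,1,2}^32 x [0.4, 1.5]^63."""
    assert c.shape == (32,) and set(np.unique(c)).issubset({0, 1, 2}), "invalid c"
    assert s.shape == (63,) and np.all((s >= 0.4) & (s <= 1.5)), "invalid s"

def best_so_far(trajectory):
    """Running maximum of observed performance, for optimizer trajectory reports."""
    return np.maximum.accumulate(np.asarray(trajectory))
```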

7. Impact and Extensions

PepMerge, in both its molecular design and surrogate HPO instantiations, serves as a robust, reproducible benchmark for advancing generative modeling and optimizer development. The biological benchmark supports unbiased comparison of full-atom and surface-based generative architectures, highlighting the importance of explicit molecular surface modeling for high-precision peptide binder design (Wu et al., 8 Jan 2026). Meanwhile, the surrogate formulation enables rapid, computationally lightweight development of novel hyperparameter optimization techniques, especially valuable for expensive LLM merging tasks (Akizuki et al., 2 Sep 2025). A plausible implication is that the methodologies and metric formalism of PepMerge may be broadly applicable for benchmarking generative models and optimizers in other large-scale molecular and neural network design settings.
