
PepEVOLVE: Dynamic Peptide Optimization

Updated 28 November 2025
  • PepEVOLVE is a position-aware, dynamic peptide optimization framework that enhances macrocyclic peptide lead discovery by addressing combinatorial challenges.
  • It employs dynamic pretraining strategies, including stochastic masking and CHUCKLES shifting, to improve model generalization and avoid overfitting.
  • Integrating a multi-armed bandit router with an evolving reinforcement learning loop using group-relative advantage, PepEVOLVE efficiently navigates multi-objective design spaces.

PepEVOLVE is a position-aware, dynamic peptide optimization framework designed for multi-parameter exploration and optimization of macrocyclic peptides. Addressing the limitations of prior generative approaches, such as the necessity for chemist-specified mutable positions and static optimization protocols, PepEVOLVE employs a novel combination of dynamic pretraining, automatic site selection via a multi-armed bandit router, and evolving optimization through group-relative advantage (GRA) to efficiently navigate the combinatorial and multi-objective challenge of peptide lead discovery (Nguyen et al., 21 Nov 2025).

1. Motivation and Background

The optimization of macrocyclic peptides is challenged by a vast combinatorial design space and nonlinear, multi-parameter objectives (MPOs) encompassing potency, solubility, permeability, and pharmacokinetics. For example, a 12-mer constructed from 4,000 possible monomers yields up to $4{,}000^{12}$ candidates, a scale beyond the reach of enumerate-and-score methods, which are further constrained by vendor libraries (e.g., restricting to the “top 20” monomers per position still yields $20^{12}$ sequences). Multi-objective constraints interact nonlinearly, precluding brute-force optimization.
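The scale gap is easy to check directly with exact integer arithmetic (a quick illustrative computation, not from the paper):

```python
# Full design space: 4,000 candidate monomers at each of 12 positions.
full_space = 4000 ** 12

# Vendor-restricted search: "top 20" monomers per position.
restricted_space = 20 ** 12

print(f"full:       {full_space:.2e}")        # ~1.7e43 candidates
print(f"restricted: {restricted_space:.2e}")  # ~4.1e15 candidates
```

Even the vendor-restricted space is far beyond exhaustive scoring, which motivates a learned generator rather than enumeration.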

Machine learning-based generative models, including VAEs, GANs, RNNs, diffusion models, transformers, and LLMs, have been applied to propose de novo modifications and, in combination with reinforcement learning (RL), genetic algorithms, or MCTS, to traverse these MPO landscapes. PepINVENT, a key precursor, utilizes CHUCKLES (a SMILES-like tokenizer) and a transformer backbone, but is limited by its static masking (overfitting), chemist-specified edit sites (no automatic discovery), and static input protocols during RL (Nguyen et al., 21 Nov 2025).

PepEVOLVE addresses these deficiencies by introducing (i) dynamic pretraining via stochastic masking and rotational invariance, (ii) an automatic, context-free multi-armed bandit router to discover “where” to edit, and (iii) an evolving RL loop employing GRA to stabilize “how” edits are optimized under MPO constraints.

2. Dynamic Pretraining Strategies

2.1 Dynamic Masking

To circumvent the overfitting inherent in static masking, PepEVOLVE implements dynamic masking in which, for a peptide of length $L$, the number of masked positions per epoch is drawn as

$$n_\text{mask} = \mathrm{round}\bigl(T(1,\, 0.4L,\, 0)\bigr)$$

with $T(a, b, c)$ a triangular distribution on $[a, b]$ with mode $c$, biasing toward single-site masks. These positions are selected uniformly at random and replaced with a special “?” token. This expands the diversity of the reconstruction task during pretraining.
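A minimal sketch of this sampling-and-masking step (plain Python; the monomer names and `MASK` symbol are illustrative, and since the stdlib `random.triangular` requires the mode to lie inside the bounds, the mode is clamped to the lower bound, which preserves the bias toward single-site masks):

```python
import random

MASK = "?"  # special mask token (illustrative stand-in)

def dynamic_mask(monomers, frac=0.4):
    """Mask a triangular-distributed number of positions, biased toward one site."""
    L = len(monomers)
    high = max(1.0, frac * L)
    # Paper writes T(1, 0.4L, 0); stdlib triangular needs low <= mode <= high,
    # so we clamp the mode to the lower bound (still biases toward 1 site).
    n_mask = max(1, round(random.triangular(1, high, 1)))
    sites = random.sample(range(L), k=min(n_mask, L))
    return [MASK if i in sites else m for i, m in enumerate(monomers)], sites

random.seed(0)
peptide = ["Ala", "Gly", "Pro", "Phe", "Ser", "Tyr", "Arg", "Leu"]
masked, sites = dynamic_mask(peptide)
print(masked, sites)
```

Resampling both the count and the sites every epoch is what keeps the reconstruction task from collapsing to a fixed fill-in-the-blank pattern.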

2.2 CHUCKLES Shifting

To ensure robustness and prevent the model from memorizing absolute token positions, CHUCKLES-shifted augmentations are utilized. For a cyclic peptide with monomers $m_1 \mid m_2 \mid \ldots \mid m_L$, a random rotation $k \sim \mathrm{Uniform}\{1, \ldots, L\}$ is applied each epoch:

$$m_1 \mid m_2 \mid \dots \mid m_L \longrightarrow m_k \mid m_{k+1} \mid \dots \mid m_L \mid m_1 \mid \dots \mid m_{k-1}$$

This procedure treats all rotationally equivalent configurations as identical, enforcing invariance.
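The rotation itself is a one-line list operation; a small sketch (monomer labels illustrative):

```python
import random

def chuckles_shift(monomers, k=None):
    """Rotate a cyclic peptide so monomer k (1-indexed) becomes the new start."""
    L = len(monomers)
    if k is None:
        k = random.randint(1, L)  # k ~ Uniform{1, ..., L}
    return monomers[k - 1:] + monomers[:k - 1]

cycle = ["m1", "m2", "m3", "m4", "m5"]
print(chuckles_shift(cycle, k=3))  # ['m3', 'm4', 'm5', 'm1', 'm2']
```

All $L$ rotations denote the same macrocycle, so presenting a fresh rotation each epoch forces the model to learn rotation-invariant structure rather than absolute positions.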

2.3 Pretraining Objective

The learning objective is the minimization of negative log-likelihood over the pretraining data:

$$\mathcal{L}_\mathrm{NLL}(\widetilde p, \hat p) = -\sum_{t=1}^T \log f_\theta(\widetilde p, \hat p_{0:t-1})[\hat p_t]$$

where $\widetilde p$ is the masked source and $\hat p$ is the target.
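Concretely, if the decoder assigns probability $f_\theta(\widetilde p, \hat p_{0:t-1})[\hat p_t]$ to each correct target token, the loss is just the summed negative log of those probabilities; a toy computation (the probabilities are invented for illustration):

```python
import math

def nll(correct_token_probs):
    """NLL of a target given the model's per-step probability of the correct token."""
    return -sum(math.log(p) for p in correct_token_probs)

# A 3-token target predicted with these per-step probabilities:
loss = nll([0.5, 0.25, 0.8])
print(round(loss, 4))  # 2.3026  (= -ln(0.5 * 0.25 * 0.8) = ln 10)
```

In practice this is computed in batch over masked/shifted training pairs, but the scalar objective is exactly this sum.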

3. Automatic Edit Site Selection via Multi-Armed Bandit Router

The router algorithm formalizes each of the $L$ peptide positions as an “arm” in a context-free multi-armed bandit. On each episode, it samples a subset $I \subseteq [L]$, $|I| = K$, of positions to be masked and edited. For each subset $I$, the generator $f$ proposes $G$ candidates $\{\hat p^g_I\}_{g=1}^G$, each evaluated for scalar reward $R^g_I$. Rewards per subset are averaged:

$$\bar R_I = \frac{1}{G}\sum_{g=1}^G R^g_I$$

The router's policy is parameterized by $\theta \in \mathbb{R}^L$, defining a categorical distribution $\pi_\theta$ over $K$-subsets. Policy-gradient updates use the REINFORCE algorithm with entropy regularization:

$$\mathcal{L}_\mathrm{router} = -\frac{1}{B}\sum_{b=1}^B A_{I_b}\log \pi_\theta(I_b) - \beta\, \mathcal{H}(\pi_\theta)$$

where $A_{I_b} = \bar R_{I_b} - b_\text{old}$ is the advantage, $b_\text{old}$ is a moving-average baseline, $\beta$ is an annealed entropy coefficient, and $\mathcal{H}$ denotes Shannon entropy. This process concentrates probability on position subsets that yield higher multi-objective rewards.
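A minimal single-position ($K=1$) sketch of this update loop (plain Python; the toy reward, learning rate, and arm count are illustrative, and the entropy bonus is omitted for brevity):

```python
import math
import random

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def router_update(theta, reward_fn, baseline, lr=0.1, lam=0.9):
    """One REINFORCE step for a context-free bandit over L positions (K = 1)."""
    probs = softmax(theta)
    arm = random.choices(range(len(theta)), weights=probs)[0]
    r = reward_fn(arm)
    adv = r - baseline  # advantage vs. moving-average baseline
    # Gradient of log pi(arm) under a softmax policy: onehot(arm) - probs.
    for j in range(len(theta)):
        theta[j] += lr * adv * ((1.0 if j == arm else 0.0) - probs[j])
    return lam * baseline + (1.0 - lam) * r  # updated baseline

random.seed(0)
L = 5
theta, baseline = [0.0] * L, 0.0
reward = lambda arm: 1.0 if arm == 2 else 0.1  # toy: position 2 is the good edit site
for _ in range(500):
    baseline = router_update(theta, reward, baseline)
print([round(p, 3) for p in softmax(theta)])  # mass concentrates on position 2
```

Even with no sequence context, the policy quickly concentrates on the rewarded position, which is the behavior the router ablations in Section 5 probe.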

4. Evolving Optimization and Group-Relative Advantage

PepEVOLVE employs an evolving optimization architecture that iteratively refines peptide candidates using group-relative advantage, stabilizing RL updates across heterogeneous seed groups.

Given initial seeds $\{\tilde p^j\}_{j=1}^K$, the process per iteration is as follows:

  1. For each seed $j$ and context mask, generate $G$ candidates $\{\hat p^j_g\}_{g=1}^G$.
  2. Compute $R(\hat p^j_g)$.
  3. Calculate within-group statistics:

$$\bar R^j = \frac{1}{G}\sum_{g} R(\hat p^j_g), \quad \sigma_R^j = \sqrt{\frac{1}{G} \sum_{g} \bigl(R(\hat p^j_g) - \bar R^j\bigr)^2}$$

  4. Compute the group-relative advantage:

$$A^j_g = \frac{R(\hat p^j_g) - \bar R^j}{\sigma_R^j + \varepsilon}$$

  5. Update the generator $f_\theta$ via the loss:

$$\mathcal{L}_\mathrm{evolve} = -\frac{1}{K\,G}\sum_{j=1}^K\sum_{g=1}^G A^j_g\, \log f_\theta(\hat p^j_g \mid \tilde p^j)$$

  6. Aggregate all generated peptides, rescore, and select the top $K$ as seeds for the next round.

This approach normalizes reward signals within each seed group, preventing high-variance updates from reward scale heterogeneity and promoting improvements relative to each group’s context.
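The within-group statistics and group-relative advantage above can be sketched directly (plain Python; the $\varepsilon$ value and example rewards are illustrative):

```python
def group_relative_advantage(rewards, eps=1e-8):
    """Standardize rewards within one seed group: (R - mean) / (std + eps)."""
    G = len(rewards)
    mean = sum(rewards) / G
    std = (sum((r - mean) ** 2 for r in rewards) / G) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# One seed group of G = 4 candidates:
advs = group_relative_advantage([0.2, 0.4, 0.6, 0.8])
print([round(a, 3) for a in advs])  # [-1.342, -0.447, 0.447, 1.342]
```

Because each group is standardized against its own mean and spread, a seed whose candidates all score uniformly high does not drown out the gradient signal from a harder seed group.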

5. Benchmarking and Comparative Results

PepEVOLVE was evaluated on a therapeutically relevant Rev-binding macrocycle (RBP) lead, derived from YPAASYR and engineered for head-to-tail cyclization. MPOs included:

  • Permeability ($S_\text{perm}$), weight 3
  • Ring-size constraint ($S_\text{ring}$), weight 1
  • Lipophilicity ($S_\text{lip}$), target $\sim -4.0$, weight 1
  • SMARTS-based alerts ($S_\text{SMARTS}$), weight 1

The composite score is defined as the weighted geometric mean:

$$\mathrm{Score} = \left(S_\mathrm{perm}^3 \times S_\mathrm{ring} \times S_\mathrm{lip} \times S_\mathrm{SMARTS}\right)^{1/6}$$
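Since permeability carries weight 3 and the other three objectives weight 1, the exponent $1/6$ is simply one over the total weight; a small check (score values invented for illustration):

```python
def composite_score(scores, weights):
    """Weighted geometric mean: (prod s_i ** w_i) ** (1 / sum(w_i))."""
    prod = 1.0
    for s, w in zip(scores, weights):
        prod *= s ** w
    return prod ** (1.0 / sum(weights))

# S_perm, S_ring, S_lip, S_SMARTS with weights 3, 1, 1, 1:
print(round(composite_score([0.9, 0.8, 0.7, 1.0], [3, 1, 1, 1]), 4))  # 0.8613
```

Because the mean is geometric, a near-zero score on any single objective collapses the composite, so all constraints must be satisfied simultaneously rather than traded off additively.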

Key benchmarking metrics and outcomes are summarized:

| Configuration | Mean Score | Best Score | Steps to >0.8 | Unique Peptides >0.9 |
| --- | --- | --- | --- | --- |
| PepINVENT | ≈0.60 | 0.87 | ≈800 | 0 |
| SS (self, single) | ≈0.82 | 0.95 | ≈150 | 45 |
| SM (self, multi) | ≈0.80 | 0.95 | ≈200 | 40 |
| NS (neighbor, single) | ≈0.79 | 0.93 | ≈180 | 80 |
| NM (neighbor, multi) | ≈0.77 | 0.92 | ≈220 | 70 |

PepEVOLVE achieves higher mean and best scores and converges substantially faster (≈150–220 steps to exceed 0.8, versus ≈800 for PepINVENT). The NS configuration generates the largest set of unique high-scoring peptides, while SS converges fastest. SM balances yield and quality; NM, while trailing, still outperforms PepINVENT.

Router ablations confirm that the policy reliably learns chemically meaningful sites regardless of reward direction (e.g., high-donor or aromatic positions for hydrogen-bond donor/logP objectives), with position selection adapting under objective inversion.

6. Implementation Specifications

PepEVOLVE utilizes a transformer encoder–decoder of equivalent complexity to PepINVENT (e.g., 12 layers, hidden dimension 512, 8 attention heads). Pretraining uses 900k training and 50k validation peptides of length 6–18, with ~30% non-canonical amino acids (NCAAs) and a mix of linear (40%) and macrocyclic (60%) configurations (including head-to-tail, sidechain-to-tail, and disulfide cyclization).

Key hyperparameters:

  • Masking: Triangular $T(1, 0.4L, 0)$ distribution; dynamic resampling and CHUCKLES shift per epoch
  • Router: Batch $B=32$, subset size $K=1$ or 2, candidates $G=16$, baseline smoothing $\lambda=0.9$, entropy coefficient $\beta$ annealed from 0.1 to 0.01
  • Evolving: Seeds $K=16$, candidates $G=8$, 4 context types, 250 steps (1000 calls)
  • Compute: Pretraining on 4×A100 GPUs (~3 days); router and evolving on 2×A100 GPUs (~24 h per benchmark)

7. Limitations and Prospects

PepEVOLVE’s context-free router currently lacks direct conditioning on sequence or 3D structure, and surrogate objectives such as solubility proxies are omitted. The use of GRA introduces the risk of mode collapse from over-normalization. Future developments may address these issues by incorporating structure-aware routers, integrating 3D predictors into reward functions, enabling finer multi-objective trade-off control, and expanding experiments across broader peptide target sets (Nguyen et al., 21 Nov 2025).

PepEVOLVE eliminates the requirement for static, hand-specified mutation sites and manual input selection, offering a reproducible, efficient approach for lead peptide optimization, especially when edit sites are a priori unknown.

