Residue-Level Designability Preference Optimization
- ResiDPO is a fine-tuning methodology that optimizes protein and antibody sequences at the residue level using per-residue confidence signals like AlphaFold pLDDT and Rosetta energy scores.
- It decomposes the optimization into Residue-level Preference Learning and Constraint Learning, ensuring targeted improvements while preserving reliable model outputs.
- ResiDPO has achieved 2-3x higher success rates in de novo enzyme and antibody design benchmarks compared to previous sequence-level approaches.
Residue-level Designability Preference Optimization (ResiDPO) refers to a class of fine-tuning methodologies that directly optimize the probability of generating protein or antibody sequences whose structures are highly designable (i.e., predicted to fold stably into user-specified backbones) while preserving reliable regions of the model's output. The central innovation is to decompose the optimization target and loss into residuewise preference and constraint components, using structural or energetic confidence metrics as per-residue reward signals. ResiDPO extends sequence-level Direct Preference Optimization (DPO) by introducing residue-level granularity: learning targets only those positions in a sequence-structure pair where model confidence genuinely improves locally (as measured by, e.g., AlphaFold pLDDT or Rosetta energy terms), while softly anchoring well-performing positions to the pretrained reference distribution. This approach has improved design success rates by factors of roughly two to three over prior baselines on de novo enzyme and antibody design benchmarks (Xue et al., 30 May 2025, Zhou et al., 2024).
1. Theoretical Framework and Loss Decomposition
ResiDPO is formalized for a protein backbone $X$ of length $L$ and an amino-acid sequence $S = (s_1, \dots, s_L)$, starting from a pretrained “reference” policy $\pi_{\text{ref}}$ (e.g., LigandMPNN or a diffusion model) and optimizing a fine-tuned policy $\pi_\theta$. The method constructs a dataset of preference triplets $(X, S^w, S^l)$, in which $S^w$ (the “winning” sequence) exhibits higher designability (e.g., higher pLDDT or lower energy) than the “losing” sequence $S^l$ for the same backbone.
The key innovation is the splitting of the loss into two residue-level objectives:
- Residue-level Preference Learning (RPL): Encourages the fine-tuned policy $\pi_\theta$ to upweight the probability of the winning residue $s^w_i$ over the losing residue $s^l_i$ at positions $i$ where the winner sequence achieves a statistically significant improvement in structural confidence.
- Residue-level Constraint Learning (RCL): Penalizes the fine-tuned policy $\pi_\theta$ for deviating from the reference policy $\pi_{\text{ref}}$ at positions where the reference model is already highly confident, thereby preventing degradation of reliable outputs (catastrophic forgetting).
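For each paired structure-sequence triplet, two index sets select the residues that enter the RPL and RCL terms:
- $\mathcal{I}_{\text{RPL}}$: positions where the per-residue confidence of the winning sequence exceeds that of the losing sequence by more than a threshold,
- $\mathcal{I}_{\text{RCL}}$: positions where the reference model is already highly confident,
where the thresholds are hyperparameters. The ResiDPO loss combines an RPL term accumulated over $\mathcal{I}_{\text{RPL}}$ with an RCL term accumulated over $\mathcal{I}_{\text{RCL}}$, where a weighting coefficient regulates the constraint strength (Xue et al., 30 May 2025).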
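As a minimal sketch of how the two terms might be combined, the Python below assumes a DPO-style logistic loss on the summed log-ratio margin over the RPL positions, plus a mean KL divergence from the reference to the fine-tuned distribution over the RCL positions. The function name `residpo_loss`, the temperature `beta`, the weight `lam`, and the KL direction are illustrative assumptions, not the exact formulation of Xue et al.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions over the same alphabet.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def residpo_loss(logp_theta_w, logp_theta_l, logp_ref_w, logp_ref_l,
                 probs_theta, probs_ref, rpl_idx, rcl_idx,
                 beta=0.1, lam=1.0):
    """Return (total, rpl_term, rcl_term) for one preference triplet.

    logp_*_w / logp_*_l: per-residue log-probabilities of the winning and
    losing residues under the fine-tuned (theta) and reference (ref) policies.
    probs_*: per-residue distributions over the amino-acid alphabet.
    beta and lam are hypothetical hyperparameters chosen for illustration.
    """
    # RPL (assumed form): logistic loss on the log-ratio margin, RPL positions only.
    margin = sum((logp_theta_w[i] - logp_ref_w[i])
                 - (logp_theta_l[i] - logp_ref_l[i]) for i in rpl_idx)
    rpl_term = -log_sigmoid(beta * margin)
    # RCL (assumed form): mean KL(ref || theta) over the RCL positions.
    rcl_term = 0.0
    if rcl_idx:
        rcl_term = sum(kl_divergence(probs_ref[i], probs_theta[i])
                       for i in rcl_idx) / len(rcl_idx)
    return rpl_term + lam * rcl_term, rpl_term, rcl_term
```

When the fine-tuned and reference policies coincide, the margin is zero, so the RPL term equals log 2 and the RCL term vanishes; any update that raises the winner's log-ratio at RPL positions lowers the first term, while drift at RCL positions raises the second.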
2. Construction of Per-Residue Preference Signals
ResiDPO utilizes position-wise structural or energetic proxies to define localized preference and anchoring targets:
- For protein design, the method uses AlphaFold2’s per-residue pLDDT scores obtained without MSAs or templates. For a sequence and its backbone, a pLDDT value is produced for each residue. Residues whose pLDDT improvement exceeds a threshold are identified as those where the change in sequence yields meaningfully better folding confidence (Xue et al., 30 May 2025).
- For antibody design, the method decomposes Rosetta all-atom energy scores at the per-residue level, partitioning attractive (“non-repulsive”) and repulsive interaction energies to separate structure quality from binding functionality. The global reward is a weighted sum of these local energies (Zhou et al., 2024).
A critical aspect is the direct, per-residue mapping of physical score improvements to learning signals, in contrast to sequence-level aggregation in earlier DPO variants.
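The mapping from per-residue scores to learning signals can be illustrated with a short Python sketch. It assumes pLDDT profiles on a 0-100 scale for a winning and a losing sequence on the same backbone; the function name `residue_index_sets`, the threshold values, and the rule used for the constraint set (both sequences confident, position not already selected for preference learning) are hypothetical choices made for illustration.

```python
def residue_index_sets(plddt_w, plddt_l, tau_gain=10.0, tau_conf=80.0):
    """Split positions into a preference set and a constraint set.

    tau_gain and tau_conf are illustrative thresholds, not published values.
    """
    if len(plddt_w) != len(plddt_l):
        raise ValueError("pLDDT profiles must cover the same backbone length")
    # Preference set: the winner is locally more confident by a clear margin.
    rpl = [i for i, (w, l) in enumerate(zip(plddt_w, plddt_l))
           if w - l > tau_gain]
    selected = set(rpl)
    # Constraint set (assumed rule): both sequences are already confident here.
    rcl = [i for i, (w, l) in enumerate(zip(plddt_w, plddt_l))
           if i not in selected and min(w, l) > tau_conf]
    return rpl, rcl

# Toy profiles for a four-residue backbone.
print(residue_index_sets([92, 85, 70, 55], [90, 60, 68, 58]))  # → ([1], [0])
```

Only position 1 shows a gain above the margin, and only position 0 is confidently predicted for both sequences; positions 2 and 3 receive no residue-level signal under these assumed thresholds.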
3. Decoupled Residue-wise Optimization Procedure
ResiDPO decouples the learning updates at the residue level, ensuring that improvement updates (RPL) and constraint regularization (RCL) act only on relevant positions. This is operationalized through the following sequence for each minibatch:
- Run structure prediction (e.g., AlphaFold2) or energy calculation for both the winning and the losing sequence on the same backbone.
- Compute the index sets for RPL and RCL.
- Accumulate the RPL loss by comparing log probabilities for residues in the RPL set.
- Accumulate the RCL loss as a residuewise KL divergence over the RCL set.
- Backpropagate the total loss and update the parameters of the fine-tuned policy, typically using Adam optimization.
When the RPL set is empty (i.e., no local improvements), global sequence-wise DPO is applied, so the original DPO protocol is recovered as a special case. This residuewise decoupling prevents deleterious global updates that overwrite reliable subsequences, focusing optimization strictly where benefit is observed (Xue et al., 30 May 2025, Zhou et al., 2024).
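The fallback to sequence-level DPO can be sketched as a choice of which positions contribute to the preference margin. In the Python below, `ratio_w[i]` and `ratio_l[i]` stand for the per-residue log-ratio of the fine-tuned to the reference policy on the winning and losing residue; these names and the helper functions are assumptions introduced for illustration.

```python
def preference_positions(rpl_idx, length):
    # Use the residue-level set when it is non-empty; otherwise fall back
    # to every position, which corresponds to a sequence-level DPO margin.
    return list(rpl_idx) if rpl_idx else list(range(length))

def preference_margin(ratio_w, ratio_l, rpl_idx):
    """Sum of (winner log-ratio minus loser log-ratio) over the chosen positions."""
    positions = preference_positions(rpl_idx, len(ratio_w))
    return sum(ratio_w[i] - ratio_l[i] for i in positions)

ratio_w = [0.2, 0.1, 0.0]
ratio_l = [0.0, -0.1, 0.3]
local_margin = preference_margin(ratio_w, ratio_l, [1])   # position 1 only
global_margin = preference_margin(ratio_w, ratio_l, [])   # all positions
```

In this toy case the local margin counts only position 1, while the empty set makes the margin sum over the whole sequence, so a loss built on it behaves like ordinary DPO for that pair.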
4. Methodological Extensions and Generalizations
The residue-level formulation generalizes and improves upon original DPO frameworks, replacing sequencewise odds and Kullback–Leibler constraints with per-position operations. For antibody design, an analogue is realized via residuewise energy decomposition and preference optimization in a conditional diffusion model equipped with SE(3)-equivariant neural networks. Gradient surgery techniques such as PCGrad are employed to address conflicting gradients among multiple energy objectives (e.g., van der Waals repulsion versus hydrogen bond attraction), ensuring simultaneous convergence of all physically relevant terms (Zhou et al., 2024). Ablation studies confirm that per-residue preference granularity yields faster and more reliable optimization than whole-sequence or CDR-level supervision.
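For readers unfamiliar with gradient surgery, the following is a minimal pure-Python sketch of the generic PCGrad projection rule: when two objective gradients have a negative dot product, the conflicting component is projected out before the gradients are summed. It operates on plain lists rather than model parameters, and it is not the AbDPO implementation; the mapping of the two toy gradients to specific energy terms is only illustrative.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcgrad(grads, seed=0):
    """Combine per-objective gradients with PCGrad-style projection."""
    rng = random.Random(seed)
    projected = []
    for i, g in enumerate(grads):
        g_pc = list(g)
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)  # PCGrad visits the other objectives in random order
        for j in others:
            overlap = dot(g_pc, grads[j])
            norm_sq = dot(grads[j], grads[j])
            if overlap < 0 and norm_sq > 0:
                # Remove the component of g_pc that opposes objective j.
                g_pc = [x - overlap / norm_sq * y for x, y in zip(g_pc, grads[j])]
        projected.append(g_pc)
    # The update direction is the sum of the projected gradients.
    return [sum(component) for component in zip(*projected)]

# Two conflicting toy gradients, e.g. one repulsive and one attractive energy term.
print(pcgrad([[1.0, 0.0], [-1.0, 1.0]]))  # → [0.5, 1.5]
```

A plain sum of the two toy gradients would give `[0.0, 1.0]`, letting the second objective cancel the first along the first axis; the projected sum keeps a positive component for both.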
5. Dataset Curation and Training Protocol
ResiDPO training requires curated, backbone-matched sequence pairs with per-residue structural or energetic annotations:
- Protein design: The PDB-D dataset is constructed as a monomeric X-ray PDB subset filtered for high resolution (<3.5 Å), with clustering to eliminate redundancy (train: 19,203 structures; validation: 1,690). For each backbone, LigandMPNN is used to sample eight sequences at a fixed sampling temperature, and AlphaFold2 produces per-residue pLDDT profiles. Preference pairs of winning and losing sequences are defined via relative sampling with a global pLDDT differential threshold, yielding 09,557 usable pairs for learning (Xue et al., 30 May 2025).
- Antibody design: The dataset is drawn from SAbDab, clustered at 40% CDR-H3 sequence identity (train/val: 1786/193), with 55 held-out RAbD complexes for evaluation. Pairwise energies are computed using the Rosetta energy function for all CDR-H3 sequence/backbone pairs (Zhou et al., 2024).
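The pairing step for protein design can be sketched as follows. This Python example assumes each backbone comes with a handful of sampled sequences and a global (sequence-averaged) pLDDT for each, and that every pair whose scores differ by at least a threshold becomes a winner/loser pair. The exhaustive pairing, the function name `build_preference_pairs`, and the `min_gap` value are assumptions; the source states only that pairs are defined by relative sampling with a global pLDDT differential threshold.

```python
from itertools import combinations

def build_preference_pairs(samples, min_gap=5.0):
    """samples: list of (sequence, mean_plddt) tuples for one backbone.

    Returns (winner, loser) tuples whose global pLDDT gap is at least
    min_gap (a hypothetical threshold).
    """
    pairs = []
    for (seq_a, score_a), (seq_b, score_b) in combinations(samples, 2):
        if abs(score_a - score_b) < min_gap:
            continue  # too close to call a preference
        if score_a > score_b:
            pairs.append((seq_a, seq_b))
        else:
            pairs.append((seq_b, seq_a))
    return pairs

samples = [("AAA", 90.0), ("AAC", 86.0), ("ACC", 70.0)]
print(build_preference_pairs(samples))  # → [('AAA', 'ACC'), ('AAC', 'ACC')]
```

The first two sequences differ by only 4 pLDDT points and are skipped, which illustrates how a differential threshold reduces the number of usable pairs below the number of possible ones.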
Training (e.g., for EnhancedMPNN) uses the Adam optimizer with a conservative learning-rate schedule over 100k iterations, batch size and gradient accumulation tailored to available computational resources, and fixed hyperparameters for protein design. The antibody diffusion model is trained with a similar multi-stage protocol with extensive structural features and message passing (Xue et al., 30 May 2025, Zhou et al., 2024).
6. Quantitative Impact and Benchmarks
ResiDPO demonstrates substantial empirical gains across design tasks:
| Model/Method | Enzyme Success Rate | Binder Success Rate | RAbD Antibody Success Rate |
|---|---|---|---|
| Baseline LigandMPNN | 6.56% | 7.07% | N/A |
| DPO-tuned LigandMPNN | ~9% | 10.40% | N/A |
| EnhancedMPNN (ResiDPO) | 17.57% | 16.07% | N/A |
| DiffAb (antibody baseline) | N/A | N/A | 14.5% |
| AbDPO (antibody ResiDPO) | N/A | N/A | 27.3% |
On enzyme benchmarks covering five EC classes and RFDiffusion2 backbones, ResiDPO raises the in silico design success rate from 6.56% to 17.57%, nearly tripling the baseline, and outperforms sequence-level DPO (~9%). Similarly, in binder and antibody design (RAbD benchmark), residuewise DPO yields significant improvements in energetic metrics and functional success rates compared to previous generative methods. Ablation studies confirm the advantages of full residue-level preference optimization and the necessity of decomposing energy terms and managing gradients to avoid model degeneracies (Xue et al., 30 May 2025, Zhou et al., 2024).
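The fold improvements implied by the table can be checked with a few lines of Python; the numbers are copied from the table, and the dictionary keys are just labels.

```python
# (baseline %, residue-level method %) taken from the benchmark table.
rates = {
    "enzyme": (6.56, 17.57),
    "binder": (7.07, 16.07),
    "antibody (RAbD)": (14.5, 27.3),
}
fold = {task: round(tuned / base, 2) for task, (base, tuned) in rates.items()}
print(fold)  # → {'enzyme': 2.68, 'binder': 2.27, 'antibody (RAbD)': 1.88}
```

The enzyme and binder gains fall between two- and three-fold, while the antibody gain is slightly under two-fold.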
7. Practical Implementation and Pseudocode
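A prototypical ResiDPO learning loop follows the per-minibatch steps of Section 3: score the winning and losing sequences per residue, build the RPL and RCL index sets, accumulate the two loss terms, and update the fine-tuned policy. This procedure keeps learning localized to uncertain or improvable regions and preserves global model priors (Xue et al., 30 May 2025).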
8. Context, Significance, and Related Approaches
Residue-level Designability Preference Optimization represents a methodological advance over sequence-level preference optimization by aligning optimization granularity with the spatial and energetic locality intrinsic to protein and antibody folding and binding. The decoupled, targeted learning protocol mitigates catastrophic forgetting, enhances interpretability of marginal improvements, and yields higher success rates in stringent design tasks. Analogous residue-level DPO schemes in generative diffusion models for antibody design, augmented with energy decompositions and techniques like PCGrad, demonstrate generality across molecular design modalities (Zhou et al., 2024).
By leveraging per-residue confidence and energy proxies as explicit learning signals, ResiDPO establishes a template for further innovations in structure-guided macromolecular generative modeling, with practical implications for enzyme engineering and therapeutic antibody design pipelines.