Comparative performance of SGPO methods under low-throughput wet-lab constraints

Determine how existing Steered Generation for Protein Optimization (SGPO) methods, including classifier guidance and posterior sampling for diffusion models as well as reinforcement-learning fine-tuning of protein language models, perform individually and relative to one another in real-world protein optimization campaigns where fitness is measured via low-throughput wet-lab assays that yield only hundreds of labeled sequence–fitness pairs. Answering this would establish practical best practices for protein fitness optimization under limited experimental feedback.

Background

The paper introduces Steered Generation for Protein Optimization (SGPO) as a framework that leverages generative priors over natural protein sequences and experimental fitness labels to optimize protein properties. Prior studies often rely on surrogate rewards or large labeled datasets, leaving uncertainty about how steering strategies and models perform when constrained by realistic wet-lab throughput.

This problem matters because many real-world protein engineering campaigns must operate with only hundreds of labeled sequence–fitness pairs per round, and practical guidance on which generative priors and steering methods work best under such constraints is essential for efficient optimization.
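To make the throughput constraint concrete, the round structure described above can be sketched as a simple loop: sample candidates from a generative prior, rank them with a surrogate fit to the small labeled pool, and send only one plate-sized batch to the wet lab. This is a minimal illustrative sketch, not the paper's method; the prior (single mutations of a parent), the nearest-neighbor surrogate, and all function names are assumptions chosen for brevity.

```python
import random

# Standard 20-letter amino-acid alphabet.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def propose_variants(parent, n, rng):
    """Toy 'prior': sample n single-point mutants of the parent sequence.
    A real SGPO prior would be a diffusion model or protein language model."""
    variants = []
    for _ in range(n):
        pos = rng.randrange(len(parent))
        aa = rng.choice(ALPHABET)
        variants.append(parent[:pos] + aa + parent[pos + 1:])
    return variants

def surrogate_score(seq, labeled):
    """Toy surrogate: mean fitness of labeled neighbors within one mutation.
    Stands in for a learned regression model trained on the labeled pool."""
    scores = [fit for s, fit in labeled
              if sum(a != b for a, b in zip(s, seq)) <= 1]
    return sum(scores) / len(scores) if scores else 0.0

def select_batch(parent, labeled, batch_size=96, n_candidates=1000, seed=0):
    """One optimization round: sample from the prior, rank by surrogate,
    and return a single wet-lab batch (e.g., one 96-well plate)."""
    rng = random.Random(seed)
    candidates = propose_variants(parent, n_candidates, rng)
    ranked = sorted(candidates,
                    key=lambda s: surrogate_score(s, labeled),
                    reverse=True)
    return ranked[:batch_size]
```

Under the low-throughput regime in question, the labeled pool passed to `surrogate_score` never exceeds a few hundred pairs, which is precisely what makes the choice of prior and steering strategy consequential.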

References

However, by and large, past studies have optimized surrogate rewards and/or utilized large amounts of labeled data for steering, making it unclear how well existing methods perform and compare to each other in real-world optimization campaigns where fitness is measured by low-throughput wet-lab assays.

Steering Generative Models with Experimental Data for Protein Fitness Optimization (2505.15093 - Yang et al., 21 May 2025) in Abstract