Validity and extrapolation of fitness oracles beyond the training distribution
Determine whether the supervised fitness oracle, implemented as an ensemble of neural network regressors trained on lower-order mutational data, accurately captures the true protein fitness landscape and extrapolates reliably to sequences that carry many mutations relative to the parent sequence. The question matters most when the oracle is used to evaluate variants far from its training distribution in the TrpB and CreiLOV design spaces.
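One partial diagnostic for this question is to track how much the ensemble members disagree as variants move away from the parent sequence. The sketch below is illustrative only and is not the method of the cited paper. It assumes integer-encoded sequences, and it substitutes bootstrapped ridge regressors on one-hot features for the neural network regressors to keep the example short. All function names and parameters (`fit_ridge_ensemble`, `disagreement_by_distance`, `alpha`, `n_members`) are hypothetical.

```python
import numpy as np

def hamming_distance(seqs, parent):
    """Number of positions at which each sequence differs from the parent."""
    return (np.asarray(seqs) != np.asarray(parent)).sum(axis=1)

def one_hot(seqs, n_states):
    """Flatten integer-encoded sequences (n, L) into features (n, L * n_states)."""
    seqs = np.asarray(seqs)
    return np.eye(n_states)[seqs].reshape(len(seqs), -1)

def fit_ridge_ensemble(X, y, n_members=5, alpha=1.0, seed=0):
    """Fit ridge regressors on bootstrap resamples.

    Assumption: this is a lightweight stand-in for an ensemble of neural
    network regressors; it is not the oracle architecture from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = []
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        Xb, yb = X[idx], y[idx]
        # Closed-form ridge solution: (X^T X + alpha * I) w = X^T y
        w = np.linalg.solve(Xb.T @ Xb + alpha * np.eye(d), Xb.T @ yb)
        weights.append(w)
    return np.stack(weights)  # shape (n_members, d)

def ensemble_predict(weights, X):
    """Return the ensemble mean and standard deviation for each sequence."""
    preds = X @ weights.T  # shape (n, n_members)
    return preds.mean(axis=1), preds.std(axis=1)

def disagreement_by_distance(weights, seqs, parent, n_states):
    """Mean ensemble standard deviation, grouped by mutation count."""
    dist = hamming_distance(seqs, parent)
    _, std = ensemble_predict(weights, one_hot(seqs, n_states))
    return {int(d): float(std[dist == d].mean()) for d in np.unique(dist)}
```

Rising disagreement at higher mutation counts would suggest that the oracle is being queried outside the region its training data supports. Low disagreement does not by itself show that the predictions match the true landscape, because ensemble members can agree and still be wrong; only experimental measurement of distant variants can settle that.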
References
It is unclear if the oracle captures the true nature of the protein fitness landscape or extrapolates well to sequences with many mutations relative to the original fitness dataset from which the oracle was trained.
— Steering Generative Models with Experimental Data for Protein Fitness Optimization
(2505.15093 - Yang et al., 21 May 2025) in Section: Protein fitness optimization task – Comparison to existing protein engineering methods