Contrastive losses as generalized models of global epistasis
Abstract: Fitness functions map large combinatorial spaces of biological sequences to properties of interest. Inferring these multimodal functions from experimental data is a central task in modern protein engineering. Global epistasis models are an effective and physically grounded class of models for estimating fitness functions from observed data. These models assume that a sparse latent function is transformed by a monotonic nonlinearity to produce the measured fitness. Here we demonstrate that minimizing supervised contrastive loss functions, such as the Bradley-Terry loss, is a simple and flexible technique for extracting the sparse latent function implied by global epistasis. We argue, by way of a fitness-epistasis uncertainty principle, that the nonlinearities in global epistasis models can produce observed fitness functions that do not admit sparse representations and may therefore be inefficient to learn from observations with a Mean Squared Error (MSE) loss, as is common practice. We show that contrastive losses can accurately estimate a ranking function from limited data even in regimes where MSE is ineffective, and we validate the practical utility of this insight by demonstrating that contrastive loss functions consistently improve performance on benchmark tasks.
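As a rough illustration of the idea in the abstract, the following minimal NumPy sketch computes a pairwise Bradley-Terry loss and contrasts it with MSE. It is not the authors' implementation: the function names `bradley_terry_loss` and `mse_loss`, the use of all ordered pairs, and the example nonlinearity `exp(3 * latent)` are assumptions made for illustration only. The point it shows is that the Bradley-Terry loss depends on the observed fitness only through its ordering, so applying a strictly increasing nonlinearity to the labels leaves the loss unchanged, whereas MSE changes.

```python
import numpy as np


def bradley_terry_loss(scores, fitness):
    """Mean Bradley-Terry loss over all ordered pairs with fitness[i] > fitness[j].

    Hypothetical helper for illustration; `scores` are model outputs and
    `fitness` are observed measurements for the same sequences.
    """
    scores = np.asarray(scores, dtype=float)
    fitness = np.asarray(fitness, dtype=float)
    diff = scores[:, None] - scores[None, :]    # diff[i, j] = s_i - s_j
    mask = fitness[:, None] > fitness[None, :]  # pairs where i outranks j
    if not mask.any():
        return 0.0
    # -log(sigmoid(d)) = log(1 + exp(-d)), evaluated stably via logaddexp
    return float(np.mean(np.logaddexp(0.0, -diff[mask])))


def mse_loss(scores, fitness):
    """Plain mean squared error between model outputs and observed fitness."""
    scores = np.asarray(scores, dtype=float)
    fitness = np.asarray(fitness, dtype=float)
    return float(np.mean((scores - fitness) ** 2))


# Assumed toy setup: a latent function observed through a monotonic nonlinearity.
latent = np.array([-1.0, 0.0, 0.5, 2.0])
observed = np.exp(3.0 * latent)  # strictly increasing transform of the latent values

# The contrastive loss is identical for latent and transformed labels,
# because both induce the same ranking of the sequences.
bt_latent = bradley_terry_loss(latent, latent)
bt_observed = bradley_terry_loss(latent, observed)

# MSE, by contrast, penalizes the latent function for not matching the
# transformed values, even though it ranks every sequence correctly.
mse_latent = mse_loss(latent, latent)
mse_observed = mse_loss(latent, observed)
```

In practice the scores would come from a trainable model and the loss would be minimized over mini-batches of pairs; the all-pairs form above is only meant to make the ranking invariance explicit.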