Machine learning-guided directed evolution for protein engineering

Published 27 Nov 2018 in q-bio.BM | (1811.10775v2)

Abstract: Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function.

Citations (22)

Summary

  • The paper leverages machine learning to streamline directed evolution by predicting sequence-function relationships and reducing exhaustive sequence searches.
  • The paper details building tailored sequence-function models using diverse ML algorithms and nuanced protein sequence representations.
  • The paper demonstrates enhanced enzyme productivity and improved protein thermostability through ML-driven optimization of experimental iterations.

Overview of "Machine Learning-Guided Directed Evolution for Protein Engineering"

The paper "Machine Learning-Guided Directed Evolution for Protein Engineering" by Yang, Wu, and Arnold reviews machine learning (ML)-guided directed evolution, a paradigm for biological design in which models trained on measured variants guide the search for improved proteins. By using data-driven predictions of sequence-function relationships, the approach reduces the need for exhaustive searches of vast sequence spaces in which functional proteins are exceedingly rare.

Directed Evolution and Its Limitations

Directed evolution, inspired by natural evolutionary processes, has historically been employed to iteratively improve protein functions through diversification and selection protocols. These traditional methods, while effective, are constrained by the low throughput of screening techniques and the overwhelming scale of possible protein sequences, which render comprehensive searches impracticable. Furthermore, directed evolution typically neglects the data from unimproved sequences, potentially losing valuable information.
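To make the scale argument concrete, the short sketch below counts how many sequences exist for a protein of a given length, assuming the standard alphabet of 20 amino acids. The function names are illustrative and do not come from the paper.

```python
def sequence_space_size(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of the given length."""
    return alphabet_size ** length


def single_mutant_count(length: int, alphabet_size: int = 20) -> int:
    """Number of variants that differ from a parent at exactly one position."""
    return length * (alphabet_size - 1)


if __name__ == "__main__":
    # Even a 10-residue peptide has about 1e13 possible sequences.
    print(sequence_space_size(10))
    # A 300-residue protein has only 5700 single mutants, but the full
    # space has hundreds of digits when written out.
    print(single_mutant_count(300))
    print(len(str(sequence_space_size(300))))
```

The gap between the number of single mutants and the size of the full space is why screening alone cannot cover more than a tiny neighborhood of any parent sequence.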

Machine Learning Augmentation

The incorporation of ML into the directed evolution process offers a transformative advantage. By constructing models that learn from the experimental data, including both improved and unimproved variants, ML can predict promising mutations and sequences, thereby guiding the exploration of sequence space more efficiently. The paper discusses two key steps in this approach: (i) building sequence-function models using machine learning, and (ii) employing these models to optimize sequence selection for subsequent experimental iterations.

Building a Sequence-Function Model

Applying ML to protein engineering begins with choosing a model class that can accurately predict function from sequence. Choices range from simple linear models and decision trees to more sophisticated algorithms such as random forests, kernel methods, Gaussian processes, and deep learning techniques. The choice depends on factors such as the amount of available data, the need for model interpretability, and computational resources. The paper underscores the importance of sound model training, using techniques such as cross-validation for model evaluation and hyperparameter tuning.
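As an illustration of cross-validation for hyperparameter tuning, the sketch below picks a ridge regression penalty by k-fold cross-validation. Ridge regression, the candidate penalties, and all function names are assumptions chosen for brevity; the paper does not prescribe this particular model or procedure.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge weights: (X^T X + lam * I)^-1 X^T y. Requires lam > 0."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def cv_mse(X, y, lam, k=5, seed=0):
    """Mean squared error of ridge regression averaged over k held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the held-out one.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w = ridge_fit(X[train_idx], y[train_idx], lam)
        pred = X[test_idx] @ w
        errors.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(errors))


def select_lambda(X, y, candidates, k=5):
    """Return the penalty with the lowest cross-validated error, plus all scores."""
    scores = {lam: cv_mse(X, y, lam, k) for lam in candidates}
    return min(scores, key=scores.get), scores
```

Here `X` would be a matrix of encoded sequences (one row per measured variant) and `y` the measured function values. The same loop works for any model with a fit and predict step; only `ridge_fit` and the prediction line would change.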

Sequence Representation

The success of ML models also depends on how protein sequences are represented. The paper describes various vectorization methods, from one-hot encodings, which offer a simple yet effective baseline, to richer representations that incorporate evolutionary or structural information. The best representation is context-dependent and often has to be chosen empirically by comparing predictive accuracy on held-out data.
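A one-hot encoding can be sketched in a few lines. The version below assumes the 20 standard amino acids in alphabetical one-letter order and flattens the per-position vectors into a single list; the ordering and the function name are illustrative choices, not something the paper specifies.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> list:
    """Flattened one-hot vector of length len(sequence) * 20."""
    vector = [0] * (len(sequence) * len(AMINO_ACIDS))
    for pos, aa in enumerate(sequence):
        if aa not in AA_INDEX:
            raise ValueError(f"Unknown residue {aa!r} at position {pos}")
        # Each position owns a block of 20 slots; set the slot for its residue.
        vector[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1
    return vector
```

Each position contributes exactly one nonzero entry, so a sequence of length L yields a vector of length 20L containing L ones. All sequences passed to one model must have the same length for the resulting vectors to be comparable.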

Application and Implications

The use of ML-guided directed evolution is exemplified through detailed case studies. For instance, the researchers highlight a campaign that utilized partial least squares regression to significantly enhance enzyme productivity. Another study demonstrated the use of Gaussian processes and Bayesian optimization to efficiently explore protein thermostability, stressing the efficacy of ML in tackling challenging engineering problems where high-throughput screening is infeasible.
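The Gaussian process approach can be sketched as follows: fit a GP to the measured variants, then rank untested candidates by an acquisition score that balances predicted value against uncertainty. The sketch assumes an RBF kernel with unit prior variance, a zero-mean prior (so measurements should be centered first), and an upper-confidence-bound (UCB) rule. These are common choices for illustration and are not claimed to be the exact setup of the case study.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """RBF kernel matrix between rows of A and rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior(X_train, y_train, X_cand, noise=1e-2, length_scale=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at X_cand."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_cand, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance k(x, x) is 1 for this kernel
    return mean, np.sqrt(np.maximum(var, 0.0))


def select_by_ucb(X_train, y_train, X_cand, batch_size=1, beta=2.0):
    """Indices of the candidates with the highest mean + beta * std."""
    mean, std = gp_posterior(X_train, y_train, X_cand)
    return np.argsort(-(mean + beta * std))[:batch_size]
```

With `beta=0` the rule is purely greedy and picks the candidates with the highest predicted value; larger `beta` favors candidates far from the measured data, where the model is less certain. The selected variants would then be measured and added to the training set before the next round.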

These results suggest that ML can change both the efficiency and the scope of protein engineering. Beyond improving functional optimization, ML-guided search may enable the discovery of novel protein functions by exploring regions of sequence space that were previously inaccessible.

Future Prospects

Looking forward, the paper anticipates that ML, particularly generative models trained on vast databases of unlabeled protein sequences, could revolutionize the synthesis of novel proteins with bespoke functions. These models could infer the distribution of functional proteins, overcoming the limitations of current mutagenesis techniques and facilitating the design of entirely new proteins.

In conclusion, "Machine Learning-Guided Directed Evolution for Protein Engineering" makes a compelling case for adopting ML in protein engineering, supported by case studies in which ML-guided campaigns improved enzyme productivity and protein thermostability. As computational power and data accessibility continue to improve, the approach is likely to be refined further, potentially broadening the range of biotechnological applications in which protein engineering plays a pivotal role.
