Protein Language Models and Structure Prediction: Connection and Progression (2211.16742v1)

Published 30 Nov 2022 in q-bio.QM, cs.AI, and cs.LG

Abstract: The prediction of protein structures from sequences is an important task for function prediction, drug design, and understanding related biological processes. Recent advances have demonstrated the power of language models (LMs) in processing protein sequence databases; these models inherit the advantages of attention networks and capture useful information when learning representations for proteins. The past two years have witnessed remarkable success in tertiary protein structure prediction (PSP), including evolution-based and single-sequence-based PSP. Instead of energy-based models and sampling procedures, protein language model (pLM)-based pipelines appear to have emerged as the mainstream paradigm in PSP. Despite this fruitful progress, the PSP community needs a systematic and up-to-date survey to help bridge the gap between LMs in the NLP and PSP domains and to introduce their methodologies, advancements, and practical applications. To this end, in this paper, we first introduce the similarities between protein and human languages that allow LMs to be extended to pLMs and applied to protein databases. Then, we systematically review recent advances in LMs and pLMs from the perspectives of network architectures, pre-training strategies, applications, and commonly used protein databases. Next, different types of methods for PSP are discussed, particularly how pLM-based architectures function in the process of protein folding. Finally, we identify challenges faced by the PSP community and foresee promising research directions alongside the advances of pLMs. This survey aims to be a hands-on guide for researchers to understand PSP methods, develop pLMs, and tackle challenging problems in this field for practical purposes.

Authors (7)

Bozhen Hu
Jun Xia
Jiangbin Zheng
Cheng Tan
Yufei Huang
Yongjie Xu
Stan Z. Li

Citations (30)
