Structure-Informed Protein Language Model (2402.05856v1)

Published 7 Feb 2024 in q-bio.BM and cs.LG

Abstract: Protein language models are a powerful tool for learning protein representations through pre-training on vast protein sequence datasets. However, traditional protein language models lack explicit structural supervision, despite the relevance of structure to protein function. To address this issue, we integrate remote homology detection into training to distill structural information into protein language models without requiring explicit protein structures as input. We evaluate the impact of this structure-informed training on downstream protein function prediction tasks. Experimental results reveal consistent improvements in function annotation accuracy for EC number and GO term prediction. Performance on mutant datasets, however, varies with how closely the targeted property is related to protein structure. This underscores the importance of considering that relationship when applying structure-aware training to protein function prediction tasks. Code and model weights are available at https://github.com/DeepGraphLearning/esm-s.
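
The abstract does not spell out the training objective, so the following is only a minimal sketch of one way remote homology detection could supply structural supervision from sequence alone. It assumes the task is framed as fold classification: per-residue embeddings from a sequence encoder are mean-pooled, passed through a linear head, and scored with cross-entropy against a structure-derived fold label. The function names, the toy embeddings, and the three fold classes are all hypothetical and are not taken from the paper or its repository.

```python
import math

def mean_pool(residue_embeddings):
    """Average per-residue vectors into one protein-level vector."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[d] for vec in residue_embeddings) / n for d in range(dim)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fold_classification_loss(residue_embeddings, weights, bias, fold_label):
    """Cross-entropy of a linear fold classifier on the pooled embedding.

    Assumption: the fold label comes from a structural classification,
    so the structure acts only as a training target, never as an input.
    """
    pooled = mean_pool(residue_embeddings)
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    return -math.log(probs[fold_label])

# Toy stand-in for encoder output: 3 residues, embedding dimension 2.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical linear head over 3 fold classes.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bias = [0.0, 0.0, 0.0]

loss = fold_classification_loss(embeddings, weights, bias, fold_label=0)
```

In a real setup the gradient of this loss would flow back into the sequence encoder, which is how fold-level information could shape the learned representations even though no structure is ever fed to the model.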
