
Structure-informed Language Models Are Protein Designers (2302.01649v2)

Published 3 Feb 2023 in cs.LG

Abstract: This paper demonstrates that LLMs are strong structure-based protein designers. We present LM-Design, a generic approach to reprogramming sequence-based protein LLMs (pLMs), which have learned massive sequential evolutionary knowledge from the universe of natural protein sequences, to acquire an immediate capability to design preferable protein sequences for given folds. We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness. During inference, iterative refinement is performed to effectively optimize the generated protein sequences. Experiments show that LM-Design improves the state-of-the-art results by a large margin, leading to 4% to 12% accuracy gains in sequence recovery (e.g., 55.65%/56.63% on CATH 4.2/4.3 single-chain benchmarks, and >60% when designing protein complexes). We provide extensive and in-depth analyses, which verify that LM-Design can (1) indeed leverage both structural and sequential knowledge to accurately handle structurally non-deterministic regions, (2) benefit from scaling data and model size, and (3) generalize to other proteins (e.g., antibodies and de novo proteins).
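The abstract's two key ideas, a lightweight structural adapter grafted onto a sequence pLM and iterative refinement at inference, can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration (toy dimensions, a stand-in encoder instead of a real pretrained pLM, a mask-predict-style refinement loop); it is not the authors' implementation.

```python
# Hypothetical sketch of the LM-Design idea: a sequence pLM augmented with a
# lightweight structural adapter, refined iteratively at inference time.
# All module names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB = 21  # 20 amino acids + a mask token (assumption)


class StructuralAdapter(nn.Module):
    """Cross-attention from the pLM's hidden states to per-residue
    structure features, added residually so sequence knowledge is kept."""

    def __init__(self, d_model=64, d_struct=16, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(d_struct, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, h, struct_feats):
        s = self.proj(struct_feats)
        out, _ = self.attn(h, s, s)          # sequence states attend to structure
        return self.norm(h + out)            # residual "implant"


class LMDesignSketch(nn.Module):
    def __init__(self, d_model=64, d_struct=16):
        super().__init__()
        # Stand-in for a pretrained pLM encoder (frozen in the described method).
        self.embed = nn.Embedding(VOCAB, d_model)
        self.plm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, 4, 128, batch_first=True), 2)
        self.adapter = StructuralAdapter(d_model, d_struct)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens, struct_feats):
        h = self.plm(self.embed(tokens))
        h = self.adapter(h, struct_feats)
        return self.head(h)                  # per-position amino-acid logits


@torch.no_grad()
def iterative_refine(model, tokens, struct_feats, steps=3, frac=0.3):
    """Mask-predict-style refinement: at each step, predict all positions,
    then re-mask the least confident fraction and predict again."""
    for _ in range(steps):
        probs = model(tokens, struct_feats).softmax(-1)
        conf, pred = probs.max(-1)
        tokens = pred.clone()
        k = max(1, int(frac * tokens.shape[1]))
        low = conf.topk(k, largest=False).indices   # least confident positions
        tokens.scatter_(1, low, VOCAB - 1)          # re-mask them
    return model(tokens, struct_feats).argmax(-1)
```

A usage sketch: start from a fully masked sequence plus per-residue structure features for the target fold, e.g. `iterative_refine(LMDesignSketch(), torch.full((1, 10), VOCAB - 1), torch.randn(1, 10, 16))` returns a designed sequence of token indices of shape `(1, 10)`.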

Authors (6)
  1. Zaixiang Zheng (25 papers)
  2. Yifan Deng (11 papers)
  3. Dongyu Xue (9 papers)
  4. Yi Zhou (438 papers)
  5. Quanquan Gu (198 papers)
  6. Fei Ye (78 papers)
Citations (68)