
xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein (2401.06199v2)

Published 11 Jan 2024 in q-bio.QM, cs.AI, and cs.LG

Abstract: Protein language models have shown remarkable success in learning biological information from protein sequences. However, most existing models are limited by either autoencoding or autoregressive pre-training objectives, which makes them struggle to handle protein understanding and generation tasks concurrently. We propose a unified protein language model, xTrimoPGLM, to address these two types of tasks simultaneously through an innovative pre-training framework. Our key technical contribution is an exploration of the compatibility and the potential for joint optimization of the two types of objectives, which has led to a strategy for training xTrimoPGLM at an unprecedented scale of 100 billion parameters and 1 trillion training tokens. Our extensive experiments reveal that 1) xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories. The model also facilitates an atomic-resolution view of protein structures, leading to an advanced 3D structural prediction model that surpasses existing language-model-based tools. 2) xTrimoPGLM can not only generate de novo protein sequences following the principles of natural ones, but also perform programmable generation after supervised fine-tuning (SFT) on curated sequences. These results highlight the substantial capability and versatility of xTrimoPGLM in understanding and generating protein sequences, contributing to the evolving landscape of foundation models in protein science.
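The abstract's central technical claim is that an autoencoding (understanding) objective and an autoregressive (generation) objective can be jointly optimized in one backbone. The sketch below illustrates that idea in minimal PyTorch; it is not the authors' code. The `model` callable (anything returning per-position vocabulary logits), the `mask_spans` helper, the span-length and masking-rate values, and the unweighted sum of the two losses are all illustrative assumptions.

```python
# Minimal sketch (assumed names, not the paper's implementation) of jointly
# optimizing a span-prediction objective and an autoregressive objective on
# protein token sequences.
import torch
import torch.nn.functional as F

def mask_spans(tokens, mask_token_id, pad_token_id, span_len=3, p=0.15):
    """Hide random contiguous spans; targets hold the hidden ids, pad elsewhere."""
    corrupted = tokens.clone()
    targets = torch.full_like(tokens, pad_token_id)
    starts = torch.rand(tokens.shape, device=tokens.device) < (p / span_len)
    for b, t in starts.nonzero(as_tuple=False).tolist():
        end = min(t + span_len, tokens.size(1))
        targets[b, t:end] = tokens[b, t:end]
        corrupted[b, t:end] = mask_token_id
    return corrupted, targets

def joint_step(model, tokens, mask_token_id, pad_token_id):
    """One joint training step; `tokens` is a (batch, length) id tensor."""
    # Objective 1: span corruption -- recover hidden spans from bidirectional
    # context (the BERT/ESM-like "understanding" signal).
    corrupted, span_targets = mask_spans(tokens, mask_token_id, pad_token_id)
    mlm_logits = model(corrupted)  # (B, T, vocab)
    mlm_loss = F.cross_entropy(
        mlm_logits.reshape(-1, mlm_logits.size(-1)),
        span_targets.reshape(-1),
        ignore_index=pad_token_id,  # loss only on masked positions
    )
    # Objective 2: next-token prediction over the raw sequence
    # (the GPT-like "generation" signal).
    ar_logits = model(tokens[:, :-1])  # (B, T-1, vocab)
    ar_loss = F.cross_entropy(
        ar_logits.reshape(-1, ar_logits.size(-1)),
        tokens[:, 1:].reshape(-1),
        ignore_index=pad_token_id,
    )
    # Joint optimization: a simple unweighted sum; the actual recipe
    # (objective sampling ratios, span lengths, schedules) is more involved.
    return mlm_loss + ar_loss
```

Note that this simplification runs the two objectives as separate forward passes. In the GLM framework that xTrimoPGLM builds on, the masked spans are themselves generated autoregressively through a mixed bidirectional/causal attention mask, which is what makes the two objectives compatible within a single pass.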

Authors (15)
  1. Bo Chen (309 papers)
  2. Xingyi Cheng (20 papers)
  3. Pan Li (164 papers)
  4. Yangli-ao Geng (10 papers)
  5. Jing Gong (17 papers)
  6. Shen Li (77 papers)
  7. Zhilei Bei (2 papers)
  8. Xu Tan (164 papers)
  9. Boyan Wang (8 papers)
  10. Xin Zeng (20 papers)
  11. Chiming Liu (5 papers)
  12. Aohan Zeng (19 papers)
  13. Yuxiao Dong (119 papers)
  14. Jie Tang (302 papers)
  15. Le Song (140 papers)
Citations (70)
