Data-Efficient Protein 3D Geometric Pretraining via Refinement of Diffused Protein Structure Decoy (2302.10888v1)

Published 5 Feb 2023 in cs.LG, cs.AI, and q-bio.BM

Abstract: Learning meaningful protein representations is important for a variety of biological downstream tasks such as structure-based drug design. Following the success of protein sequence pretraining, pretraining on structural data, which is more informative, has become a promising research topic. However, protein structure pretraining faces three major challenges: insufficient sample diversity, physically unrealistic modeling, and the lack of protein-specific pretext tasks. To address these challenges, we present 3D Geometric Pretraining. In this paper, we propose a unified framework for protein pretraining together with a 3D geometric-based, data-efficient, and protein-specific pretext task: RefineDiff (Refine the Diffused Protein Structure Decoy). After pretraining our geometry-aware model with this task on limited data (less than 1% of that used by SOTA models), we obtained informative protein representations that achieve comparable performance on various downstream tasks.
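The RefineDiff pretext task, as described in the abstract, corrupts a native protein structure into a "diffused" decoy and trains the model to refine it back. The following is a minimal illustrative sketch of that idea, not the paper's actual method: it assumes a single Gaussian-noise corruption step on backbone coordinates and a plain coordinate MSE as the refinement objective, whereas the paper's diffusion schedule and loss may differ.

```python
import numpy as np


def make_decoy(coords, noise_scale=1.0, rng=None):
    """Create a diffused structure decoy by perturbing coordinates with
    Gaussian noise (one illustrative corruption step; hypothetical
    stand-in for the paper's diffusion process)."""
    rng = np.random.default_rng(rng)
    noise = rng.normal(scale=noise_scale, size=coords.shape)
    return coords + noise


def refinement_loss(predicted, native):
    """Coordinate-level MSE between a refined structure and the native
    structure -- a simple stand-in for the pretext objective."""
    return float(np.mean((predicted - native) ** 2))


# Toy example: a 5-residue "structure" with 3D coordinates.
native = np.zeros((5, 3))
decoy = make_decoy(native, noise_scale=0.5, rng=0)

# A perfect refiner would map the decoy back to the native coordinates,
# driving the loss to zero; the untouched decoy has nonzero loss.
print(refinement_loss(native, native))
print(refinement_loss(decoy, native))
```

During pretraining, a geometry-aware network would take the decoy as input and be optimized to minimize this loss; the learned representations are then reused for downstream tasks.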

Authors (6)
  1. Yufei Huang (81 papers)
  2. Lirong Wu (67 papers)
  3. Haitao Lin (63 papers)
  4. Jiangbin Zheng (25 papers)
  5. Ge Wang (214 papers)
  6. Stan Z. Li (223 papers)
Citations (13)
