
VQPL: Vector Quantized Protein Language (2310.04985v1)

Published 8 Oct 2023 in cs.CE

Abstract: Is there a foreign language describing protein sequences and structures simultaneously? Protein structures, represented by continuous 3D points, have long posed a challenge due to the contrasting modeling paradigms of discrete sequences. To represent protein sequence-structure as discrete symbols, we propose a VQProteinformer to project residue types and structures into a discrete space, supervised by a reconstruction loss to ensure information preservation. The sequential latent codes of residues introduce a new quantized protein language, transforming the protein sequence-structure into a unified modality. We demonstrate the potential of the created protein language on predictive and generative tasks, which may not only advance protein research but also establish a connection between the protein-related and NLP-related fields. The proposed method will be continually improved to unify more protein modalities, including text and point cloud.
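The quantization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' VQProteinformer: it assumes each residue has already been encoded as a small continuous vector, and it uses plain nearest-neighbour lookup against a fixed codebook. The names `quantize`, `reconstruct`, `reconstruction_loss`, the toy codebook, and the toy embeddings are all hypothetical and chosen only for illustration.

```python
import math

def quantize(embeddings, codebook):
    """Map each per-residue embedding to the index of its nearest codebook vector."""
    codes = []
    for vec in embeddings:
        distances = [math.dist(vec, entry) for entry in codebook]
        codes.append(distances.index(min(distances)))
    return codes

def reconstruct(codes, codebook):
    """Look up the codebook vector for each discrete code."""
    return [codebook[i] for i in codes]

def reconstruction_loss(embeddings, reconstructed):
    """Mean squared Euclidean distance between the inputs and their quantized versions."""
    total = sum(math.dist(a, b) ** 2 for a, b in zip(embeddings, reconstructed))
    return total / len(embeddings)

# Toy data (assumption): a 4-entry codebook and four 2-D "residue" embeddings.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
embeddings = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (1.1, 0.9)]

codes = quantize(embeddings, codebook)            # one discrete symbol per residue
loss = reconstruction_loss(embeddings, reconstruct(codes, codebook))
print(codes, loss)
```

The list `codes` plays the role of the discrete "protein language": one symbol per residue, in sequence order. In a trained model the encoder and the codebook would be learned jointly so that the reconstruction loss stays small; here both are fixed by hand.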

