Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
117 tokens/sec
GPT-4o
8 tokens/sec
Gemini 2.5 Pro Pro
47 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

DiffSDS: A language diffusion model for protein backbone inpainting under geometric conditions and constraints (2301.09642v1)

Published 22 Jan 2023 in q-bio.QM, cs.AI, and cs.LG

Abstract: Have you ever been troubled by the complexity and computational cost of SE(3) protein structure modeling and been amazed by the simplicity and power of LLMing? Recent work has shown promise in simplifying protein structures as sequences of protein angles; therefore, LLMs could be used for unconstrained protein backbone generation. Unfortunately, such simplification is unsuitable for the constrained protein inpainting problem, where the model needs to recover masked structures conditioned on unmasked ones, as it dramatically increases the computing cost of geometric constraints. To overcome this dilemma, we suggest inserting a hidden \textbf{a}tomic \textbf{d}irection \textbf{s}pace (\textbf{ADS}) upon the LLM, converting invariant backbone angles into equivalent direction vectors and preserving the simplicity, called Seq2Direct encoder ($\text{Enc}{s2d}$). Geometric constraints could be efficiently imposed on the newly introduced direction space. A Direct2Seq decoder ($\text{Dec}{d2s}$) with mathematical guarantees is also introduced to develop a \textbf{SDS} ($\text{Enc}{s2d}$+$\text{Dec}{d2s}$) model. We apply the SDS model as the denoising neural network during the conditional diffusion process, resulting in a constrained generative model--\textbf{DiffSDS}. Extensive experiments show that the plug-and-play ADS could transform the LLM into a strong structural model without loss of simplicity. More importantly, the proposed DiffSDS outperforms previous strong baselines by a large margin on the task of protein inpainting.

Citations (17)

Summary

We haven't generated a summary for this paper yet.