Sufficiency of sequence information for improving backbone generation in hybrid co-generation models

Determine whether integrating explicit sequence-generation processes into backbone structure generation within hybrid sequence-structure co-generation models, such as DiffAb, MultiFlow, and CarbonNovo, is sufficient to produce more plausible backbone structures than backbone-only methods like RFdiffusion, as measured by designability and diversity metrics.

Background

Hybrid co-generation models pair a discrete sequence-generation process (diffusion or flow matching) with a continuous backbone-generation process inside a single shared neural network, so that both modalities are generated jointly. DiffAb and MultiFlow, for example, add a discrete sequence process to a backbone generation model. CarbonNovo instead interleaves backbone diffusion with a structure-conditioned Markov random field that predicts the sequence.
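The shared-network idea can be illustrated with the forward (noising) half of a joint generative process. The sketch below is a toy written for this page, not code from DiffAb, MultiFlow, or CarbonNovo. It assumes a masking-style corruption for the sequence and a linear Gaussian interpolant for C-alpha coordinates, both indexed by one shared time `t`. Many models of this kind represent backbones as per-residue rigid frames (rotation plus translation), which this sketch omits. The names `corrupt` and `MASK` are hypothetical.

```python
import numpy as np

MASK = 20  # hypothetical mask-token index placed after the 20 standard amino acids (0-19)

def corrupt(seq, ca_coords, t, rng):
    """Jointly corrupt a sequence and a C-alpha trace at one shared time t in [0, 1].

    Assumed convention: t = 1 is clean data and t = 0 is pure noise.
    - Sequence (discrete): each residue is kept with probability t, else set to MASK.
    - Backbone (continuous): coordinates are linearly interpolated between
      standard Gaussian noise (t = 0) and the clean structure (t = 1).
    """
    seq = np.asarray(seq)
    ca_coords = np.asarray(ca_coords, dtype=float)
    keep = rng.random(seq.shape[0]) < t          # Bernoulli(t) keep mask per residue
    noisy_seq = np.where(keep, seq, MASK)
    noise = rng.standard_normal(ca_coords.shape)
    noisy_coords = t * ca_coords + (1.0 - t) * noise
    return noisy_seq, noisy_coords

rng = np.random.default_rng(0)
seq = rng.integers(0, 20, size=8)        # toy sequence of 8 residues
ca = rng.standard_normal((8, 3)) * 10.0  # toy C-alpha coordinates
noisy_seq, noisy_ca = corrupt(seq, ca, t=0.5, rng=rng)
# A shared denoising network (not shown) would take (noisy_seq, noisy_ca, t)
# and be trained to predict both the clean sequence and the clean coordinates.
```

In this toy both corruptions share `t`, so the network always sees sequence and structure at matching noise levels; whether such coupling actually yields more plausible backbones is the open question posed here.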

Despite this integration, Wang et al. note that these approaches still require paired sequence-structure training data and do not directly generate side-chain atom positions. More importantly, they report that current hybrid methods have not shown substantial gains in designability or diversity over backbone-only approaches such as RFdiffusion, which raises the question of whether adding sequence information is sufficient to make generated backbones more plausible.
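Designability and diversity are not defined in this section. One convention common in the backbone-generation literature, assumed here rather than taken from the source, calls a backbone designable when its self-consistency RMSD (scRMSD) is below 2 Å; scRMSD is the RMSD between the generated backbone and the structure predicted for a sequence designed onto it. Diversity is then measured as the number of structural clusters among designable samples (greedy clustering at a TM-score threshold of 0.5) divided by the number of designable samples. The sketch below assumes scRMSD values and a pairwise TM-score matrix were already computed by external tools; all function names are hypothetical.

```python
import numpy as np

def designability(sc_rmsd, threshold=2.0):
    """Fraction of samples whose scRMSD (in angstroms) is below `threshold`."""
    sc_rmsd = np.asarray(sc_rmsd, dtype=float)
    return float(np.mean(sc_rmsd < threshold))

def greedy_clusters(tm, threshold=0.5):
    """Count clusters: each sample joins the first existing cluster whose
    representative has TM-score >= threshold with it, else it founds a new one."""
    tm = np.asarray(tm, dtype=float)
    reps = []
    for i in range(tm.shape[0]):
        if not any(tm[i, r] >= threshold for r in reps):
            reps.append(i)
    return len(reps)

def diversity(sc_rmsd, tm, rmsd_threshold=2.0, tm_threshold=0.5):
    """Clusters among designable samples divided by the number of designable samples."""
    idx = np.flatnonzero(np.asarray(sc_rmsd, dtype=float) < rmsd_threshold)
    if idx.size == 0:
        return 0.0
    sub = np.asarray(tm, dtype=float)[np.ix_(idx, idx)]  # TM-scores among designable samples only
    return greedy_clusters(sub, tm_threshold) / idx.size

# Toy example: sample 2 is not designable; samples 0 and 1 are near-duplicates.
sc_rmsd = [1.0, 1.5, 3.0, 0.8]
tm = [[1.0, 0.8, 0.4, 0.3],
      [0.8, 1.0, 0.5, 0.2],
      [0.4, 0.5, 1.0, 0.6],
      [0.3, 0.2, 0.6, 1.0]]
print(designability(sc_rmsd), diversity(sc_rmsd, tm))
```

Under this convention, showing that sequence information helps would mean a hybrid model scoring higher than a backbone-only baseline such as RFdiffusion on both quantities, evaluated on the same protein lengths and sample counts.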

References

Furthermore, it is unclear whether adding sequence information to backbone generation methods is sufficient to generate more plausible backbone structures, as these methods do not demonstrate substantial gains in designability and diversity over RFdiffusion.

— Towards deep learning sequence-structure co-generation for protein design (2410.01773 - Wang et al., 2 Oct 2024) in Section 3.3 Hybrid co-generation models
