RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching (2405.18768v2)
Abstract: The growing significance of RNA engineering in diverse biological applications has spurred interest in developing AI methods for structure-based RNA design. While diffusion models have excelled in protein design, adapting them for RNA presents new challenges due to RNA's conformational flexibility and the computational cost of fine-tuning large structure prediction models. To this end, we propose RNAFlow, a flow matching model for protein-conditioned RNA sequence-structure design. Its denoising network integrates an RNA inverse folding model and a pre-trained RosettaFold2NA network for generation of RNA sequences and structures. The integration of inverse folding in the structure denoising process allows us to simplify training by fixing the structure prediction network. We further enhance the inverse folding model by conditioning it on inferred conformational ensembles to model dynamic RNA conformations. Evaluation on protein-conditioned RNA structure and sequence generation tasks demonstrates RNAFlow's advantage over existing RNA design methods.