
RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching (2405.18768v2)

Published 29 May 2024 in q-bio.BM and cs.LG

Abstract: The growing significance of RNA engineering in diverse biological applications has spurred interest in developing AI methods for structure-based RNA design. While diffusion models have excelled in protein design, adapting them for RNA presents new challenges due to RNA's conformational flexibility and the computational cost of fine-tuning large structure prediction models. To this end, we propose RNAFlow, a flow matching model for protein-conditioned RNA sequence-structure design. Its denoising network integrates an RNA inverse folding model and a pre-trained RosettaFold2NA network for generation of RNA sequences and structures. The integration of inverse folding in the structure denoising process allows us to simplify training by fixing the structure prediction network. We further enhance the inverse folding model by conditioning it on inferred conformational ensembles to model dynamic RNA conformations. Evaluation on protein-conditioned RNA structure and sequence generation tasks demonstrates RNAFlow's advantage over existing RNA design methods.


Summary

  • The paper introduces RNAFlow, which integrates inverse folding and flow matching to enable efficient, protein-conditioned RNA design without expensive fine-tuning.
  • It employs a noise-to-sequence geometric graph neural network and a fixed structure prediction model (RF2NA) to generate accurate RNA conformations.
  • Experimental results show superior performance in structure prediction and sequence recovery compared to state-of-the-art diffusion and sequence-only models.

Overview of RNAFlow: RNA Structure and Sequence Design via Inverse Folding-Based Flow Matching

The paper introduces RNAFlow, a flow matching model for protein-conditioned RNA sequence and structure design. The approach targets two obstacles: RNA's conformational flexibility and the computational cost of fine-tuning large structure prediction models. At the heart of RNAFlow is a denoising network that combines an RNA inverse folding model with the pre-trained RosettaFold2NA network (RF2NA). Because inverse folding is part of the structure denoising process, the structure prediction network can stay fixed, which simplifies training. The inverse folding model is further conditioned on inferred conformational ensembles so that it can account for RNA's dynamic conformations.

Context and Motivation

RNA engineering has applications ranging from synthetic riboswitch sensors to protein-targeting aptamers. Current experimental methods for high-throughput RNA selection are labor-intensive and time-consuming, which motivates deep learning models for targeted RNA design. While diffusion models have advanced protein design, applying them directly to RNA is hampered by RNA's inherent flexibility and by the cost of fine-tuning large structure prediction models. RNAFlow addresses this with protein-conditioned RNA sequence-structure design based on flow matching, using a formulation that keeps the structure predictor fixed instead of fine-tuning it.

Methodology

RNAFlow employs a conditional flow matching framework in which RNA generation is conditioned on the target protein. Its denoising network pairs Noise-to-Seq, a geometric graph neural network that performs inverse folding, with the pre-trained RF2NA structure predictor. Noise-to-Seq takes a noisy RNA backbone structure together with the protein context and predicts an RNA sequence; RF2NA then predicts a structure for that sequence.
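For readers unfamiliar with flow matching, the general idea can be written down in a few lines. The notation below is a generic sketch of one common formulation (linear interpolation with a network that predicts the clean sample); the symbols $x_0$, $x_1$, $f_\theta$, and $c$ are assumptions introduced here, and the paper's exact parameterization may differ.

```latex
% x_0: noise sample, x_1: observed RNA backbone, c: protein context (all assumed notation)
x_t = (1 - t)\,x_0 + t\,x_1, \qquad t \in [0, 1]

% The denoising network predicts the clean structure from the noisy one:
\hat{x}_1 = f_\theta(x_t, t, c)

% One Euler step of sampling moves x_t toward the prediction:
x_{t + \Delta t} = x_t + \frac{\Delta t}{1 - t}\,\bigl(\hat{x}_1 - x_t\bigr)
```

With this step rule, the final step ($t = 1 - \Delta t$) lands exactly on the last prediction $\hat{x}_1$.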

The network is trained through flow matching, which interpolates between noisy and observed RNA structures. During generation, RNAFlow alternates between sequence prediction through inverse folding and structure prediction using RF2NA, so that sequence and structure are designed together without expensive fine-tuning of the structure predictor. A variant, RNAFlow-Traj, conditions its sequence predictions on multiple RNA conformations derived from the flow matching trajectory.
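The alternation between inverse folding and structure prediction can be sketched as a short sampling loop. This is an illustrative sketch only, not the authors' implementation: `toy_inverse_fold` and `toy_fold` are hypothetical stand-ins for Noise-to-Seq and the frozen RF2NA, the backbone is reduced to one 3D point per nucleotide, and the Kabsch alignment step and the Euler update are assumptions taken from common flow matching practice.

```python
import numpy as np

def kabsch_align(mobile, target):
    """Rigidly superimpose `mobile` (N, 3) onto `target` (N, 3)."""
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(mob_c.T @ tgt_c)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_c @ rot.T + target.mean(axis=0)

def toy_inverse_fold(backbone, protein, t):
    """Hypothetical stand-in for Noise-to-Seq: noisy backbone -> sequence."""
    idx = (backbone[:, 0] > 0).astype(int) * 2 + (backbone[:, 1] > 0).astype(int)
    return "".join("ACGU"[i] for i in idx)

def toy_fold(seq, protein):
    """Hypothetical stand-in for a frozen structure predictor: sequence -> backbone."""
    radius = np.array([1.0 + 0.1 * "ACGU".index(c) for c in seq])
    i = np.arange(len(seq))
    return np.stack([radius * np.cos(0.6 * i), radius * np.sin(0.6 * i), 0.3 * i], axis=1)

def sample(protein, length, inverse_fold, fold, n_steps=5, seed=0):
    """Alternate sequence and structure prediction along a flow matching trajectory."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((length, 3))      # start from noise
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    seq = None
    for t, t_next in zip(ts[:-1], ts[1:]):
        seq = inverse_fold(x, protein, t)     # noisy structure -> sequence
        x1_hat = fold(seq, protein)           # sequence -> predicted clean structure
        x = kabsch_align(x, x1_hat)           # remove rigid-body mismatch (assumed step)
        x = x + (t_next - t) / (1.0 - t) * (x1_hat - x)  # Euler step toward prediction
    return seq, x

# The protein argument is unused by the toy stand-ins, so None is passed here.
seq, coords = sample(None, 8, toy_inverse_fold, toy_fold)
print(seq, coords.shape)
```

Because the structure predictor is only called, never updated, a loop of this shape needs gradients for the inverse folding model alone during training, which is the simplification the summary describes.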

Experimental Results and Analysis

RNAFlow outperforms diffusion-based and sequence-only baselines in both structure prediction (measured by RMSD and lDDT) and sequence recovery. Two experiments support this: a comparison against baseline models and a case study on designing RNA aptamers for G-protein-coupled receptor kinase 2 (GRK2).
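Two of the metrics named above are simple to compute. The sketch below shows sequence recovery (the fraction of positions where the designed nucleotide matches the native one) and RMSD after optimal rigid superposition with the Kabsch algorithm. The function names are illustrative, the inputs are assumed to be equal-length sequences and matching (N, 3) coordinate arrays, and the paper's exact evaluation protocol (atom selection, averaging over samples) is not reproduced here. lDDT is omitted.

```python
import numpy as np

def sequence_recovery(designed, native):
    """Fraction of positions where the designed nucleotide equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(designed, native)) / len(native)

def kabsch_rmsd(pred, ref):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    p = pred - pred.mean(axis=0)
    q = ref - ref.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against reflections so that only proper rotations are allowed.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Usage: five of six positions match, so recovery is 5/6.
print(sequence_recovery("GGAUCC", "GGAACC"))
```

Superposition matters because a generated structure can be correct up to a rotation and translation; without alignment, RMSD would penalize the arbitrary placement of the coordinates and not just the shape.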

On established datasets, RNAFlow generates protein-conditioned RNA structures and sequences with better recovery and structural alignment metrics than baselines such as MMDiff and LSTM-based models. Adding an output rescoring model further improves RNAFlow's ability to select the best designs among the generated samples, balancing novelty against fidelity.

Implications and Future Work

RNAFlow's development paves the way for more efficient and scalable RNA therapeutic design by demonstrating effective computational modeling of RNA dynamics and protein interactions. The model's capacity for high sequence and structure accuracy hints at its potential for broad applications in biotechnology and medicine, particularly in drug discovery and synthetic biology.

However, challenges remain, particularly in handling RNAs that exhibit extensive conformational diversity. Future research could improve protein-RNA folding predictions so that they capture full-atom interactions, which would enable the design of functionally intricate RNAs such as ribozymes and broaden RNAFlow's scope. Integration with molecular simulations or parallel architectures may also improve both the computational efficiency and the structural accuracy of RNA designs.