
Scalable and Interpretable Identification of Minimal Undesignable RNA Structure Motifs with Rotational Invariance (2402.17206v2)

Published 27 Feb 2024 in cs.DS

Abstract: RNA design aims to find a sequence that folds with highest probability into a designated target structure. However, certain structures are undesignable, meaning no sequence can fold into the target structure under the default (Turner) RNA folding model. Understanding the specific local structures (i.e., "motifs") that contribute to undesignability is crucial for refining RNA folding models and determining the limits of RNA designability. Despite its importance, this problem has received very little attention, and previous efforts are neither scalable nor interpretable. We develop a new theoretical framework for motif (un-)designability, and design scalable and interpretable algorithms to identify minimal undesignable motifs within a given RNA secondary structure. Our approach establishes motif undesignability by searching for rival motifs, rather than exhaustively enumerating all (partial) sequences that could potentially fold into the motif. Furthermore, we exploit rotational invariance in RNA structures to detect, group, and reuse equivalent motifs and to construct a database of unique minimal undesignable motifs. To achieve that, we propose a loop-pair graph representation for motifs and a recursive graph isomorphism algorithm for motif equivalence. Our algorithms successfully identify 24 unique minimal undesignable motifs among 18 undesignable puzzles from the Eterna100 benchmark. Surprisingly, we also find over 350 unique minimal undesignable motifs and 663 undesignable native structures in the ArchiveII dataset, drawn from a diverse set of RNA families. Our source code is available at https://github.com/shanry/RNA-Undesign and our web server is available at http://linearfold.org/motifs.
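To make the notion of local structures concrete, the following is a minimal sketch of the standard loop decomposition of a secondary structure written in dot-bracket notation. It is not the authors' algorithm or their loop-pair graph code; the function names `parse_pairs`, `loops`, and `classify` are hypothetical and chosen only for illustration. Each loop is reported with its closing pair, the pairs nested directly inside it, and its count of unpaired bases. The external loop (bases outside every pair) is ignored here, and bulges are counted as internal loops.

```python
def parse_pairs(structure):
    """Map every paired position to its partner (0-based indices)."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return pairs


def loops(structure):
    """List (closing_pair, child_pairs, unpaired_count) for each loop."""
    pairs = parse_pairs(structure)
    result = []
    for i, j in sorted(pairs.items()):
        if i > j:
            continue  # visit each pair once, from its opening position
        children, unpaired = [], 0
        k = i + 1
        while k < j:
            if k in pairs:
                # k opens a pair nested directly inside (i, j); skip past it
                children.append((k, pairs[k]))
                k = pairs[k] + 1
            else:
                unpaired += 1
                k += 1
        result.append(((i, j), children, unpaired))
    return result


def classify(children, unpaired):
    """Name a loop by its number of nested pairs and unpaired bases."""
    if not children:
        return "hairpin"
    if len(children) == 1:
        return "stack" if unpaired == 0 else "internal"
    return "multiloop"


if __name__ == "__main__":
    for closing, children, unpaired in loops("(((...)).)"):
        print(closing, classify(children, unpaired), children, unpaired)
```

Because every nested pair closes exactly one further loop, the output of `loops` can be read as a tree whose nodes are loops and whose edges are base pairs. That is only loosely in the spirit of the loop-pair graph the abstract mentions; the paper itself should be consulted for the actual definition.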


