Rethinking Performance Measures of RNA Secondary Structure Problems (2401.05351v1)
Published 4 Dec 2023 in q-bio.BM and cs.LG
Abstract: Accurate RNA secondary structure prediction is vital for understanding cellular regulation and disease mechanisms. Deep learning (DL) methods have surpassed traditional algorithms by predicting complex features like pseudoknots and multi-interacting base pairs. However, traditional distance measures struggle to handle such tertiary interactions, and the evaluation measures currently in use (F1 score, MCC) have limitations. We propose the Weisfeiler-Lehman graph kernel (WL) as an alternative metric. Adopting graph-based metrics like WL enables fair and accurate evaluation of RNA structure prediction algorithms. Further, WL provides informative guidance, as demonstrated in an RNA design experiment.
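To make the contrast between a pair-level score and a graph-kernel score concrete, the following is a minimal sketch and not the authors' implementation. It assumes a structure is given as a list of 0-based base-pair index tuples (which also admits pseudoknots and nucleotides with several partners), that graph nodes are labeled by nucleotide, that backbone and base-pair edges are distinguished by a type tag, and that the kernel is cosine-normalized. The function names `rna_graph`, `wl_features`, `wl_similarity`, and `f1_score` are hypothetical.

```python
from collections import Counter
from math import sqrt


def rna_graph(sequence, pairs):
    """Adjacency list with typed edges: 'b' = backbone, 'p' = base pair (assumed encoding)."""
    adj = {i: [] for i in range(len(sequence))}
    for i in range(len(sequence) - 1):
        adj[i].append((i + 1, "b"))
        adj[i + 1].append((i, "b"))
    for i, j in pairs:
        adj[i].append((j, "p"))
        adj[j].append((i, "p"))
    return adj


def wl_features(sequence, pairs, iterations=3):
    """Count node labels over successive Weisfeiler-Lehman relabeling rounds."""
    adj = rna_graph(sequence, pairs)
    labels = {i: sequence[i] for i in adj}  # round 0: nucleotide identity
    features = Counter(labels.values())
    for _ in range(iterations):
        new_labels = {}
        for node, neighbours in adj.items():
            # A node's new label is its old label plus the sorted multiset
            # of (edge type, neighbour label) pairs.
            signature = tuple(sorted((kind, labels[n]) for n, kind in neighbours))
            new_labels[node] = (labels[node], signature)
        labels = new_labels
        features.update(labels.values())
    return features


def wl_similarity(sequence, pairs_a, pairs_b, iterations=3):
    """Cosine-normalized WL subtree kernel between two structures of one sequence."""
    fa = wl_features(sequence, pairs_a, iterations)
    fb = wl_features(sequence, pairs_b, iterations)
    dot = sum(fa[k] * fb[k] for k in fa.keys() & fb.keys())
    norm = sqrt(sum(v * v for v in fa.values()) * sum(v * v for v in fb.values()))
    return dot / norm


def f1_score(true_pairs, pred_pairs):
    """Pair-level F1: a predicted pair counts only if it matches exactly."""
    true_set, pred_set = set(true_pairs), set(pred_pairs)
    tp = len(true_set & pred_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(true_set)
    return 2 * precision * recall / (precision + recall)


# Toy hairpin and a prediction whose helix is shifted by one position.
seq = "GGGAAACCC"
true_pairs = [(0, 8), (1, 7), (2, 6)]
shifted_pairs = [(0, 7), (1, 6), (2, 5)]

f1_shifted = f1_score(true_pairs, shifted_pairs)
wl_shifted = wl_similarity(seq, true_pairs, shifted_pairs)
wl_identical = wl_similarity(seq, true_pairs, true_pairs)
```

In this toy case the shifted helix shares no exact pair with the reference, so the pair-level F1 is zero, while the WL similarity stays strictly between 0 and 1 because much of the local graph neighborhood is preserved. The number of relabeling rounds (`iterations`) is a free parameter here, chosen arbitrarily for illustration.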