Median and Small Parsimony Problems on RNA trees
Abstract:

Motivation: Non-coding RNAs (ncRNAs) express their functions by adopting molecular structures. Specifically, RNA secondary structures serve as a relatively stable intermediate step before tertiary structures, offering a reliable signature of molecular function. Consequently, within an RNA functional family, secondary structures are generally more evolutionarily conserved than sequences. Conversely, homologous RNA families grouped within an RNA clan share ancestors but typically exhibit structural differences. Inferring the evolution of RNA structures within RNA families and clans is crucial for gaining insights into functional adaptations over time and for providing clues about the Ancient RNA World Hypothesis.

Results: We introduce the median problem and the small parsimony problem for ncRNA families, where secondary structures are represented as leaf-labelled trees. We utilize the Robinson-Foulds (RF) tree distance, which corresponds to a specific edit distance between RNA trees, and a new metric called the Internal-Leafset (IL) distance. While the RF tree distance compares the sets of leaves descending from the internal nodes of two RNA trees, the IL distance compares the collections of leaf-children of their internal nodes. The IL distance is better at capturing differences in the structural elements of RNAs than the RF distance, which is more focused on base pairs. We also consider a more general tree edit distance that allows the mapping of base pairs that are not perfectly aligned. We study the theoretical complexity of the median problem and the small parsimony problem under the three distance metrics and various biologically relevant constraints, and we present polynomial-time maximum parsimony algorithms for solving some versions of the problems. Our algorithms are applied to ncRNA families from the RFAM database, illustrating their practical utility.