Topological Sequence Analysis of Genomes: Delta Complex Approaches (2507.05452v1)
Abstract: Algebraic topology has been widely applied to point cloud data to capture geometric shapes and topological structures. However, its application to genome sequence analysis remains rare. In this work, we propose topological sequence analysis (TSA) techniques that construct $\Delta$-complexes and classifying spaces from genome sequences, leading to persistent homology and persistent path homology on those sequences. We also develop $\Delta$-complex-based persistent Laplacians to facilitate the topological spectral analysis of genome sequences. Finally, we demonstrate the utility of the proposed TSA approaches in phylogenetic analysis using Ebola virus sequences and whole bacterial genomes. The present TSA methods are more efficient than the earlier TSA model, k-mer topology, and thus have the potential to be applied to other time-consuming sequential data analyses, such as those in linguistics, literature, music, media, and social contexts.
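The abstract does not say how a $\Delta$-complex is built from a sequence, so the following is only a minimal sketch under a hypothetical construction, not the authors' method. It assumes vertices are the distinct symbols of the sequence and each adjacent pair (s[i], s[i+1]) contributes one oriented 1-simplex. Repeated pairs then give parallel edges and pairs such as "AA" give loops, which a $\Delta$-complex permits but a simplicial complex does not. The sketch computes the boundary matrix $\partial_1$, the Betti numbers $\beta_0 = |V| - \operatorname{rank}\partial_1$ and $\beta_1 = |E| - \operatorname{rank}\partial_1$, and the vertex Laplacian $L_0 = \partial_1 \partial_1^{T}$. It handles a single complex, not a filtration, so it is neither persistent homology nor a persistent Laplacian. All function names are illustrative.

```python
from fractions import Fraction


def delta_complex_1d(seq):
    """Hypothetical 1-dimensional Delta-complex from a sequence (assumption,
    not the paper's construction): vertices are the distinct symbols and
    there is one oriented edge per adjacent pair (seq[i], seq[i + 1])."""
    vertices = sorted(set(seq))
    edges = [(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]
    return vertices, edges


def boundary_matrix(vertices, edges):
    """Integer matrix of d_1: column j is head - tail of edge j.
    A loop (v, v) yields a zero column."""
    index = {v: i for i, v in enumerate(vertices)}
    B = [[0] * len(edges) for _ in vertices]
    for j, (tail, head) in enumerate(edges):
        B[index[head]][j] += 1
        B[index[tail]][j] -= 1
    return B


def rank(M):
    """Rank over the rationals via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def laplacian_0(B):
    """Vertex Laplacian L_0 = d_1 d_1^T (vertices have no lower boundary term)."""
    n = len(B)
    m = len(B[0]) if B else 0
    return [[sum(B[i][k] * B[j][k] for k in range(m)) for j in range(n)]
            for i in range(n)]


def betti_numbers(seq):
    """(beta_0, beta_1) of the hypothetical complex built from seq."""
    vertices, edges = delta_complex_1d(seq)
    r = rank(boundary_matrix(vertices, edges))
    return len(vertices) - r, len(edges) - r
```

For "ACGTAC" the cycle A→C→G→T→A plus the repeated edge A→C give two independent 1-cycles, so betti_numbers returns (1, 2); for "AAAA" each of the three loops is its own 1-cycle, giving (1, 3). The spectrum of $L_0$ (and, in higher dimensions, of $L_q = \partial_{q+1}\partial_{q+1}^{T} + \partial_q^{T}\partial_q$) is the kind of information a spectral analysis would read off.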