
Efficient Computation of Sequence Mappability (1807.11702v3)

Published 31 Jul 2018 in cs.DS

Abstract: In the $(k,m)$-mappability problem, for a given sequence $T$ of length $n$, the goal is to compute a table whose $i$th entry is the number of indices $j \ne i$ such that the length-$m$ substrings of $T$ starting at positions $i$ and $j$ have at most $k$ mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of $k=1$. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for $k=\mathcal{O}(1)$, works in $\mathcal{O}(n)$ space and, with high probability, in $\mathcal{O}(n \cdot \min\{m^k,\log^k n\})$ time. Our algorithm requires a careful adaptation of the $k$-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop $\mathcal{O}(n^2)$-time algorithms to compute all $(k,m)$-mappability tables for a fixed $m$ and all $k\in\{0,\ldots,m\}$ or a fixed $k$ and all $m\in\{k,\ldots,n\}$. Finally, we show that, for $k,m = \Theta(\log n)$, the $(k,m)$-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper that was presented at SPIRE 2018.
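To make the problem definition concrete, here is a minimal brute-force sketch of the $(k,m)$-mappability table. It is not the paper's algorithm: it compares every pair of length-$m$ substrings directly, taking $\mathcal{O}(n^2 m)$ time, and is only meant as a reference for the definition. The function name `naive_mappability` and the 0-based indexing are assumptions made for illustration.

```python
from typing import List


def naive_mappability(text: str, m: int, k: int) -> List[int]:
    """Brute-force (k, m)-mappability table (illustrative, 0-based).

    Entry i counts the start positions j != i whose length-m substring
    differs from the one starting at i in at most k positions.
    """
    starts = len(text) - m + 1  # number of length-m substrings
    if m <= 0 or starts <= 0:
        return []

    counts = [0] * starts
    for i in range(starts):
        # Hamming distance is symmetric, so visit each unordered pair once.
        for j in range(i + 1, starts):
            mismatches = 0
            for d in range(m):
                if text[i + d] != text[j + d]:
                    mismatches += 1
                    if mismatches > k:
                        break  # this pair already exceeds the budget
            if mismatches <= k:
                counts[i] += 1
                counts[j] += 1
    return counts


if __name__ == "__main__":
    # Length-2 substrings of "aabaa" are: aa, ab, ba, aa.
    print(naive_mappability("aabaa", 2, 0))  # [1, 0, 0, 1]
    print(naive_mappability("aabaa", 2, 1))  # [3, 2, 2, 3]
```

A quadratic baseline like this is mainly useful for checking faster implementations on small inputs.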

Authors (6)
  1. Panagiotis Charalampopoulos (34 papers)
  2. Costas S. Iliopoulos (24 papers)
  3. Tomasz Kociumaka (97 papers)
  4. Solon P. Pissis (52 papers)
  5. Jakub Radoszewski (52 papers)
  6. Juliusz Straszyński (9 papers)
Citations (1)
