
A Fast and Small Subsampled R-index (2103.15329v1)

Published 29 Mar 2021 in cs.DS

Abstract: The $r$-index (Gagie et al., JACM 2020) represented a breakthrough in compressed indexing of repetitive text collections, outperforming its alternatives by orders of magnitude. Its space usage, $\mathcal{O}(r)$ where $r$ is the number of runs in the Burrows-Wheeler Transform of the text, is however larger than Lempel-Ziv and grammar-based indexes, and makes it uninteresting in various real-life scenarios of milder repetitiveness. In this paper we introduce the $sr$-index, a variant that limits the space to $\mathcal{O}(\min(r,n/s))$ for a text of length $n$ and a given parameter $s$, at the expense of multiplying by $s$ the time per occurrence reported. The $sr$-index is obtained by carefully subsampling the text positions indexed by the $r$-index, in a way that we prove is still able to support pattern matching with guaranteed performance. Our experiments demonstrate that the $sr$-index sharply outperforms virtually every other compressed index on repetitive texts, both in time and space, even matching the performance of the $r$-index while using 1.5--3.0 times less space. Only some Lempel-Ziv-based indexes achieve better compression than the $sr$-index, using about half the space, but they are an order of magnitude slower.
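The core idea of the abstract — keeping only a subsampled set of the $r$-index's text-position samples so that their number is bounded by $\min(r, n/s)$ — can be illustrated with a loose sketch. This is not the paper's actual subsampling rule, just a hypothetical greedy variant showing how enforcing a minimum text distance $s$ between kept samples bounds their count by $n/s$ while never exceeding the original $r$ samples:

```python
def subsample(samples, s):
    """Illustrative sketch (hypothetical, not the paper's exact algorithm):
    keep a sampled text position only if it lies at least s positions
    after the previously kept one. The kept set then has at most
    min(len(samples), n/s) elements, mirroring the O(min(r, n/s)) bound."""
    kept = []
    for p in sorted(samples):
        if not kept or p - kept[-1] >= s:
            kept.append(p)
    return kept

# With s = 4, clustered samples collapse onto one representative each:
positions = [0, 2, 3, 10, 11]
print(subsample(positions, 4))  # [0, 10]
```

Queries that land on a removed sample must then walk up to $s$ steps to reach a kept one, which is the source of the factor-$s$ slowdown per reported occurrence mentioned in the abstract.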

Authors (3)
  1. Dustin Cobas (4 papers)
  2. Travis Gagie (123 papers)
  3. Gonzalo Navarro (121 papers)
Citations (10)
