
On Computing the Smallest Suffixient Set (2407.18753v1)

Published 26 Jul 2024 in cs.DS

Abstract: Let $T \in \Sigma^n$ be a text over alphabet $\Sigma$. A suffixient set $S \subseteq [n]$ for $T$ is a set of positions such that, for every one-character right-extension $T[i,j]$ of every right-maximal substring $T[i,j-1]$ of $T$, there exists $x \in S$ such that $T[i,j]$ is a suffix of $T[1,x]$. It was recently shown that, given a suffixient set of cardinality $q$ and an oracle offering fast random access on $T$ (for example, a straight-line program), there is a data structure of $O(q)$ words (on top of the oracle) that can quickly find all Maximal Exact Matches (MEMs) of any query pattern $P$ in $T$ with high probability. The paper introducing suffixient sets left open the problem of computing the smallest such set; in this paper, we solve this problem by describing a simple quadratic-time algorithm, an $O(n + \bar r|\Sigma|)$-time algorithm running in compressed working space ($\bar r$ is the number of runs in the Burrows-Wheeler transform of $T$ reversed), and an optimal $O(n)$-time algorithm computing the smallest suffixient set. We present an implementation of our compressed-space algorithm and show experimentally that it uses a small memory footprint on repetitive text collections.
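
To make the definition concrete, here is a minimal Python sketch that checks it by brute force on tiny inputs. It is not any of the algorithms from the paper: the search is exponential in the text length and is meant only to illustrate what a suffixient set is. It assumes that a substring is right-maximal when it is followed by at least two distinct characters in the text, that the empty string counts as a substring, and that no terminator character is appended; the function names `right_extensions`, `is_suffixient`, and `smallest_suffixient_set` are hypothetical.

```python
from itertools import combinations


def right_extensions(text):
    """Collect every one-character right-extension of a right-maximal substring.

    Assumption: a substring alpha is right-maximal when at least two
    distinct characters follow its occurrences in text.
    """
    followers = {}
    n = len(text)
    for i in range(n):
        for j in range(i, n):
            # alpha = text[i:j] (possibly empty) is followed by text[j]
            followers.setdefault(text[i:j], set()).add(text[j])
    return {
        alpha + c
        for alpha, chars in followers.items()
        if len(chars) >= 2
        for c in chars
    }


def is_suffixient(text, positions):
    """Check the definition directly; positions are 1-based, so T[1,x] is text[:x]."""
    prefixes = [text[:x] for x in positions]
    return all(
        any(p.endswith(ext) for p in prefixes)
        for ext in right_extensions(text)
    )


def smallest_suffixient_set(text):
    """Exhaustive search over subsets of [1..n] by increasing size (tiny inputs only)."""
    n = len(text)
    for size in range(n + 1):
        for candidate in combinations(range(1, n + 1), size):
            if is_suffixient(text, candidate):
                return set(candidate)


print(smallest_suffixient_set("abab"))  # {1, 2}
```

For `"abab"` only the empty string is right-maximal under these assumptions, so the right-extensions are `a` and `b`, and positions 1 and 2 cover them because `T[1,1] = a` and `T[1,2] = ab` end with `a` and `b` respectively.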

Authors (3)
  1. Davide Cenzato (14 papers)
  2. Francisco Olivares (8 papers)
  3. Nicola Prezza (59 papers)
