On Computing the Smallest Suffixient Set (2407.18753v1)

Published 26 Jul 2024 in cs.DS

Abstract: Let T in \Sigmaⁿ be a text over alphabet \Sigma. A suffixient set S \subseteq [n] for T is a set of positions such that, for every one-character right-extension T[i,j] of every right-maximal substring T[i,j-1] of T, there exists x in S such that T[i,j] is a suffix of T[1,x]. It was recently shown that, given a suffixient set of cardinality q and an oracle offering fast random access on T (for example, a straight-line program), there is a data structure of O(q) words (on top of the oracle) that can quickly find all Maximal Exact Matches (MEMs) of any query pattern P in T with high probability. The paper introducing suffixient sets left open the problem of computing the smallest such set; in this paper, we solve this problem by describing a simple quadratic-time algorithm, a O(n + \bar r|\Sigma|)-time algorithm running in compressed working space (\bar r is the number of runs in the Burrows-Wheeler transform of T reversed), and an optimal O(n)-time algorithm computing the smallest suffixient set. We present an implementation of our compressed-space algorithm and show experimentally that it uses a small memory footprint on repetitive text collections.

Authors (3)

Davide Cenzato (14 papers)
Francisco Olivares (8 papers)
Nicola Prezza (59 papers)

Summary

We haven't generated a summary for this paper yet.

Summarize Now

On Computing the Smallest Suffixient Set (2407.18753v1)

Summary

Related Papers