Papers
Topics
Authors
Recent
Search
2000 character limit reached

Computing Minimal Absent Words and Extended Bispecial Factors with CDAWG Space

Published 28 Feb 2024 in cs.DS and cs.FL | (2402.18090v3)

Abstract: A string $w$ is said to be a minimal absent word (MAW) for a string $S$ if $w$ does not occur in $S$ and any proper substring of $w$ occurs in $S$. We focus on non-trivial MAWs which are of length at least 2. Finding such non-trivial MAWs for a given string is motivated for applications in bioinformatics and data compression. Fujishige et al. [TCS 2023] proposed a data structure of size $\Theta(n)$ that can output the set $\mathsf{MAW}(S)$ of all MAWs for a given string $S$ of length $n$ in $O(n + |\mathsf{MAW}(S)|)$ time, based on the directed acyclic word graph (DAWG). In this paper, we present a more space efficient data structure based on the compact DAWG (CDAWG), which can output $\mathsf{MAW}(S)$ in $O(|\mathsf{MAW}(S)|)$ time with $O(\mathsf{e}\min)$ space, where $\mathsf{e}\min$ denotes the minimum of the sizes of the CDAWGs for $S$ and for its reversal $SR$. For any strings of length $n$, it holds that $\mathsf{e}\min < 2n$, and for highly repetitive strings $\mathsf{e}\min$ can be sublinear (up to logarithmic) in $n$. We also show that MAWs and their generalization minimal rare words have close relationships with extended bispecial factors, via the CDAWG.

Citations (1)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 2 tweets with 0 likes about this paper.