
Space-Efficient Online Computation of String Net Occurrences

Published 19 Nov 2024 in cs.DS | (2411.12160v1)

Abstract: A substring $u$ of a string $T$ is said to be a repeat if $u$ occurs at least twice in $T$. An occurrence $[i..j]$ of a repeat $u$ in $T$ is said to be a net occurrence if each of the substrings $aub = T[i-1..j+1]$, $au = T[i-1..j]$, and $ub = T[i..j+1]$ occurs exactly once in $T$. The occurrence $[i-1..j+1]$ of $aub$ is said to be an extended net occurrence of $u$. Let $T$ be an input string of length $n$ over an alphabet of size $\sigma$, and let $\mathsf{ENO}(T)$ denote the set of extended net occurrences of repeats in $T$. Guo et al. [SPIRE 2024] presented an online algorithm which can report $\mathsf{ENO}(T[1..i])$ in $T[1..i]$ in $O(n\sigma^2)$ time, for each prefix $T[1..i]$ of $T$. Very recently, Inenaga [arXiv 2024] gave a faster online algorithm that can report $\mathsf{ENO}(T[1..i])$ in optimal $O(\#\mathsf{ENO}(T[1..i]))$ time for each prefix $T[1..i]$ of $T$, where $\#S$ denotes the cardinality of a set $S$. Both of the aforementioned data structures can be maintained in $O(n \log \sigma)$ time and occupy $O(n)$ space, where the $O(n)$-space requirement comes from the suffix tree data structure. In this paper, we propose the following two space-efficient alternatives: (1) A sliding-window algorithm of $O(d)$ working space that can report $\mathsf{ENO}(T[i-d+1..i])$ in optimal $O(\#\mathsf{ENO}(T[i-d+1..i]))$ time for each sliding window $T[i-d+1..i]$ of size $d$ in $T$. (2) A CDAWG-based online algorithm of $O(e)$ working space that can report $\mathsf{ENO}(T[1..i])$ in optimal $O(\#\mathsf{ENO}(T[1..i]))$ time for each prefix $T[1..i]$ of $T$, where $e < 2n$ is the number of edges in the CDAWG for $T$. All of our proposed data structures can be maintained in $O(n \log \sigma)$ time for the input online string $T$. We also discuss how the extended net occurrences of repeats in $T$ can be fully characterized in terms of the minimal unique substrings (MUSs) in $T$.
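
To make the definition concrete, the following is a minimal brute-force sketch in Python. It is not the suffix-tree, sliding-window, or CDAWG algorithm described in the abstract; it simply checks the definition directly and is far slower than the stated bounds. The function names `net_occurrences` and `extended_net_occurrences` are hypothetical. The sketch assumes non-empty repeats and only considers occurrences $[i..j]$ with $1 < i$ and $j < n$, so that both neighbouring characters exist inside $T$ (no sentinel symbols are added). Positions are 1-based and inclusive, as in the abstract, with $au = T[i-1..j]$ and $ub = T[i..j+1]$.

```python
from collections import Counter


def net_occurrences(T: str) -> list[tuple[int, int]]:
    """Return the 1-based inclusive positions (i, j) of net occurrences in T.

    Brute force, for illustration only: counts every substring of T explicitly.
    Assumption: an occurrence needs a character on both sides inside T.
    """
    n = len(T)
    # Number of occurrences of every non-empty substring of T.
    counts = Counter(T[s:e] for s in range(n) for e in range(s + 1, n + 1))

    result = []
    for i in range(2, n):        # 1-based start, so that T[i-1] exists
        for j in range(i, n):    # 1-based end, so that T[j+1] exists
            u = T[i - 1:j]       # T[i..j]
            au = T[i - 2:j]      # T[i-1..j]
            ub = T[i - 1:j + 1]  # T[i..j+1]
            aub = T[i - 2:j + 1] # T[i-1..j+1]
            is_repeat = counts[u] >= 2
            all_unique = counts[au] == 1 and counts[ub] == 1 and counts[aub] == 1
            if is_repeat and all_unique:
                result.append((i, j))
    return result


def extended_net_occurrences(T: str) -> list[tuple[int, int]]:
    """Return ENO(T) as 1-based inclusive positions (i-1, j+1) of each aub."""
    return [(i - 1, j + 1) for (i, j) in net_occurrences(T)]
```

For example, in `"xabyabz"` the repeat `ab` occurs at `[2..3]` and `[5..6]`, and both occurrences are net because `xab`, `aby`, `yab`, and `abz` are each unique. In `"abab"` there is no net occurrence, since every repeat there has a one-sided extension that is itself repeated.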

Authors (2)
