Linear-size Suffix Tries and Linear-size CDAWGs Simplified and Improved
Abstract: The linear-size suffix tries (LSTries) [Crochemore et al., TCS 2016] are a version of suffix trees in which the edge labels are single characters, yet which can answer pattern matching queries in optimal time. Instead of explicitly storing the input text, LSTries have some extra non-branching internal nodes called type-2 nodes. The extended techniques are then used in the linear-size compact directed acyclic word graphs (LCDAWGs) [Takagi et al., SPIRE 2017], which can be stored in $O(el(T)+er(T))$ space (i.e. without the text), where $el(T)$ and $er(T)$ are the numbers of left- and right-extensions of the maximal repeats in the input text string $T$, respectively. In this paper, we present simpler alternatives to the aforementioned indexing structures, called the simplified LSTries (simLSTries) and the simplified LCDAWGs (simLCDAWGs), in which most of the type-2 nodes are removed. In particular, our simLCDAWGs require only $O(er(T))$ space and work on a weaker model of computation (i.e. the pointer machine model). This contrasts with the $O(er(T))$-space CDAWG representation of [Belazzougui and Cunial, SPIRE 2017], which works on the word RAM model.
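To make the quantities $el(T)$ and $er(T)$ concrete, here is a minimal brute-force sketch in Python. It is an illustration under stated assumptions, not the paper's construction: the function name `extension_counts` is hypothetical, the empty string is treated as a maximal repeat, and the start and end of the text count as distinct contexts when testing maximality but are not counted as extensions. The paper's exact convention (for example, a terminal end marker) may differ, and the sketch takes cubic time, so it only suits very short strings.

```python
from collections import defaultdict


def extension_counts(text: str) -> tuple[int, int]:
    """Brute-force (el, er) for a short string; hypothetical helper for illustration."""
    n = len(text)
    left = defaultdict(set)   # substring -> characters seen immediately before it
    right = defaultdict(set)  # substring -> characters seen immediately after it

    # Enumerate every occurrence text[i:j], including the empty string (j == i).
    for i in range(n + 1):
        for j in range(i, n + 1):
            w = text[i:j]
            # None stands for a text boundary (assumption: boundaries are contexts).
            left[w].add(text[i - 1] if i > 0 else None)
            right[w].add(text[j] if j < n else None)

    el = er = 0
    for w in left:
        # Assumed definition: w is a maximal repeat if it has at least two
        # distinct left contexts and at least two distinct right contexts.
        if len(left[w]) >= 2 and len(right[w]) >= 2:
            el += len(left[w] - {None})   # count only real characters
            er += len(right[w] - {None})
    return el, er
```

Under these assumptions, `extension_counts("abab")` finds the maximal repeats `""` and `"ab"` and returns `(3, 3)`.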
- D. Belazzougui and F. Cunial. Fast label extraction in the CDAWG. In Proceedings of the 24th International Symposium on String Processing and Information Retrieval, pages 161–175, 2017.
- M. A. Bender and M. Farach-Colton. The LCA problem revisited. In LATIN 2000, volume 1776, pages 88–94, 2000.
- M. A. Bender and M. Farach-Colton. The level ancestor problem simplified. Theor. Comput. Sci., 321(1):5–12, 2004.
- O. Berkman and U. Vishkin. Finding level-ancestors in trees. J. Comput. Syst. Sci., 48(2):214–230, 1994.
- The smallest automation recognizing the subwords of a text. Theoretical Computer Science, 40:31–55, 1985.
- Complete inverted files for efficient text retrieval and analysis. Journal of the ACM, 34(3):578–595, 1987.
- The smallest grammar problem. IEEE Trans. Inf. Theory, 51(7):2554–2576, 2005.
- R. Cole and R. Hariharan. Dynamic LCA queries on trees. SIAM J. Comput., 34(4):894–923, 2005.
- Crochemore et al. Linear-size suffix tries. Theoretical Computer Science, 638:171–178, 2016.
- Linear-time computation of DAWGs, symmetric indexing structures, and MAWs for integer alphabets. Theoretical Computer Science, 973:114093, 2023.
- Real-time traversal in grammar-based compressed files. In DCC 2005, page 458, 2005.
- Online Algorithms for Constructing Linear-size Suffix Trie. In CPM 2019, pages 30:1–30:19, 2019.
- Linear time online algorithms for constructing linear-size suffix trie. CoRR, abs/2301.04295, 2023.
- On-line construction of compact directed acyclic word graphs. Discrete Applied Mathematics, 146(2):156–179, 2005.
- J. Kärkkäinen. Personal communication, 2017. StringMasters 2017 in Tokyo.
- D. Kempa and N. Prezza. At the roots of dictionary compression: string attractors. In STOC, pages 827–840, 2018.
- Efficient computation of substring equivalence classes with suffix arrays. Algorithmica, 79(2):291–318, 2017.
- G. Navarro. Indexing highly repetitive string collections, part I: repetitiveness measures. ACM Comput. Surv., 54(2):29:1–29:31, 2022.
- J. Radoszewski and W. Rytter. On the structure of compacted subword graphs of Thue-Morse words and their applications. J. Discrete Algorithms, 11:15–24, 2012.
- W. Rytter. Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci., 302(1-3):211–222, 2003.
- W. Rytter. The structure of subword graphs and suffix trees of Fibonacci words. Theor. Comput. Sci., 363(2):211–223, 2006.
- Takagi et al. Linear-Size CDAWG: New Repetition-Aware Indexing and Grammar Compression. In SPIRE 2017, pages 304–316, 2017.
- Discovering instances of poetic allusion from anthologies of classical Japanese poems. Theor. Comput. Sci., 292(2):497–524, 2003.
- E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249–260, 1995.
- P. Weiner. Linear pattern matching algorithms. In Proceedings of the 14th Annual Symposium on Switching and Automata Theory, pages 1–11. IEEE, 1973.
- J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3):337–343, 1977.