
Linear-time Computation of DAWGs, Symmetric Indexing Structures, and MAWs for Integer Alphabets (2307.01428v1)

Published 4 Jul 2023 in cs.DS and cs.FL

Abstract: The directed acyclic word graph (DAWG) of a string $y$ of length $n$ is the smallest (partial) DFA which recognizes all suffixes of $y$, and it has only $O(n)$ nodes and edges. In this paper, we show how to construct the DAWG for the input string $y$ from the suffix tree for $y$, in $O(n)$ time for integer alphabets of polynomial size in $n$. In so doing, we first describe a folklore algorithm which, given the suffix tree for $y$, constructs the DAWG for the reversed string of $y$ in $O(n)$ time. Then, we present our algorithm that builds the DAWG for $y$ in $O(n)$ time for integer alphabets, from the suffix tree for $y$. We also show that a straightforward modification to our DAWG construction algorithm leads to the first $O(n)$-time algorithm for constructing the affix tree of a given string $y$ over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. We then discuss how our constructions can lead to linear-time algorithms for building other text indexing structures, such as linear-size suffix tries and symmetric CDAWGs, in the case of integer alphabets. As a further application of our $O(n)$-time DAWG construction algorithm, we show that the set $\mathsf{MAW}(y)$ of all minimal absent words (MAWs) of $y$ can be computed in optimal, input- and output-sensitive $O(n + |\mathsf{MAW}(y)|)$ time and $O(n)$ working space for integer alphabets.
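
To make the notion of a minimal absent word concrete, the following is a minimal brute-force sketch in Python. It assumes the standard definition (a word $w$ is a MAW of $y$ if $w$ does not occur in $y$ but every proper substring of $w$ does). It is not the paper's $O(n + |\mathsf{MAW}(y)|)$-time algorithm: it enumerates all substrings of $y$ explicitly and is only suitable for small inputs. The function name `minimal_absent_words` and the optional `alphabet` parameter are hypothetical names introduced for illustration.

```python
def minimal_absent_words(y, alphabet=None):
    """Brute-force MAW(y), for illustration only (not the paper's algorithm).

    A word w is reported if w does not occur in y while both w[:-1] and
    w[1:] occur in y, which implies that every proper substring of w occurs.
    If no alphabet is given, the set of characters of y is assumed.
    """
    sigma = sorted(set(y)) if alphabet is None else list(alphabet)
    n = len(y)
    # All distinct substrings (factors) of y, including the empty word.
    factors = {y[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    maws = set()
    for prefix in factors:          # candidate w[:-1] must occur in y
        for b in sigma:
            w = prefix + b
            # w must be absent, and its longest proper suffix must occur.
            if w not in factors and w[1:] in factors:
                maws.add(w)
    # Sort by length, then lexicographically, for a stable presentation.
    return sorted(maws, key=lambda w: (len(w), w))


if __name__ == "__main__":
    print(minimal_absent_words("abaab"))
```

Passing a larger alphabet, for example `minimal_absent_words("abaab", alphabet="abc")`, additionally reports single characters that never occur in the string. The sketch stores every distinct substring, so its space usage can grow quadratically in $n$, in contrast to the $O(n)$ working space stated in the abstract.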
