
Nearly Optimal Internal Dictionary Matching (2312.11873v3)

Published 19 Dec 2023 in cs.DS

Abstract: We study the internal dictionary matching (IDM) problem where a dictionary $\mathcal{D}$ containing $d$ substrings of a text $T$ is given, and each query concerns the occurrences of patterns in $\mathcal{D}$ in another substring of $T$. We propose a novel $O(n)$-sized data structure named Basic Substring Structure (BASS), where $n$ is the length of the text $T$. With BASS, we are able to handle all types of queries in the IDM problem in nearly optimal query and preprocessing time. Specifically, our results include:

  • The first algorithm that answers the CountDistinct query in $\tilde{O}(1)$ time with $\tilde{O}(n+d)$ preprocessing, where we need to compute the number of distinct patterns that exist in $T[l,r]$. Previously, the best result was $\tilde{O}(m)$ time per query after $\tilde{O}(n^2/m+d)$ or $\tilde{O}(nd/m+d)$ preprocessing, where $m$ is a chosen parameter.
  • Faster algorithms for two other types of internal queries. We improve the runtime for (1) Occurrence counting (Count) queries to $O(\log n/\log\log n)$ time per query with $O(n+d\sqrt{\log n})$ preprocessing, from $O(\log^2 n/\log\log n)$ time per query with $O(n\log n/\log\log n+d\log^{3/2} n)$ preprocessing; (2) Distinct pattern reporting (ReportDistinct) queries to $O(1+|\text{output}|)$ time per query, from $O(\log n+|\text{output}|)$ per query.

In addition, we match the optimal runtime in the remaining two types of queries, pattern existence (Exists) and occurrence reporting (Report). We also show that BASS is more generally applicable to other internal query problems.
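To make the five query types named in the abstract concrete, the following is a minimal brute-force sketch of their semantics. It is not BASS and does not attempt any of the running times above; it only serves as a reference definition. The function names, the 0-based inclusive index convention, and the representation of the dictionary as a list of `(a, b)` fragments of the text are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Fragment = Tuple[int, int]  # (a, b): the pattern T[a..b], 0-based, inclusive (assumed convention)


def _patterns(text: str, dictionary: List[Fragment]) -> List[str]:
    """Materialise each dictionary fragment as a string (assumes non-empty fragments)."""
    return [text[a:b + 1] for a, b in dictionary]


def _occurrences(text: str, pattern: str, l: int, r: int) -> List[int]:
    """Start positions of `pattern` lying entirely inside T[l..r]."""
    m = len(pattern)
    return [i for i in range(l, r - m + 2) if text.startswith(pattern, i)]


def report(text: str, dictionary: List[Fragment], l: int, r: int) -> List[Tuple[int, int]]:
    """Report: all (pattern index, start position) occurrences inside T[l..r]."""
    return [(j, i)
            for j, p in enumerate(_patterns(text, dictionary))
            for i in _occurrences(text, p, l, r)]


def exists(text: str, dictionary: List[Fragment], l: int, r: int) -> bool:
    """Exists: does any dictionary pattern occur inside T[l..r]?"""
    return any(_occurrences(text, p, l, r) for p in _patterns(text, dictionary))


def count(text: str, dictionary: List[Fragment], l: int, r: int) -> int:
    """Count: total number of pattern occurrences inside T[l..r]."""
    return len(report(text, dictionary, l, r))


def report_distinct(text: str, dictionary: List[Fragment], l: int, r: int) -> Set[str]:
    """ReportDistinct: the distinct patterns occurring at least once inside T[l..r]."""
    return {p for p in _patterns(text, dictionary) if _occurrences(text, p, l, r)}


def count_distinct(text: str, dictionary: List[Fragment], l: int, r: int) -> int:
    """CountDistinct: number of distinct patterns occurring inside T[l..r]."""
    return len(report_distinct(text, dictionary, l, r))


if __name__ == "__main__":
    T = "abaab"
    D = [(0, 0), (0, 1), (1, 2)]  # the patterns "a", "ab", "ba"
    print(count(T, D, 0, 4), count_distinct(T, D, 0, 4))
    print(report_distinct(T, D, 2, 3))
```

Each query here scans the whole dictionary against the queried substring, so the cost grows with both $d$ and the substring length. A baseline of this kind is mainly useful for checking a faster structure on small inputs.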

