Internal Pattern Matching in Small Space and Applications (2404.17502v1)

Published 26 Apr 2024 in cs.DS

Abstract: In this work, we consider pattern matching variants in small space, that is, in the read-only setting, where we want to bound the space usage on top of storing the strings. Our main contribution is a space-time trade-off for the Internal Pattern Matching (IPM) problem, where the goal is to construct a data structure over a string $S$ of length $n$ that allows one to answer the following type of queries: Compute the occurrences of a fragment $P$ of $S$ inside another fragment $T$ of $S$, provided that $|T| < 2|P|$. For any $\tau \in [1 \mathinner{.\,.} n/\log^2 n]$, we present a nearly-optimal $\tilde{O}(n/\tau)$-size data structure that can be built in $\tilde{O}(n)$ time using $\tilde{O}(n/\tau)$ extra space, and answers IPM queries in $O(\tau + \log n \log^3 \log n)$ time. IPM queries have been identified as a crucial primitive operation for the analysis of algorithms on strings. In particular, the complexities of several recent algorithms for approximate pattern matching are expressed with regard to the number of calls to a small set of primitive operations that include IPM queries; our data structure allows us to port these results to the small-space setting. We further showcase the applicability of our IPM data structure by using it to obtain space-time trade-offs for the longest common substring and circular pattern matching problems in the asymmetric streaming setting.
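To make the query semantics concrete, here is a brute-force baseline. It is a sketch for illustration only, not the paper's data structure: it scans $T$ directly, uses constant extra space on top of the read-only string, and takes $O(|T| \cdot |P|)$ time. It relies on the standard fact that when $|T| < 2|P|$ the occurrences of $P$ in $T$ form an arithmetic progression, so the whole answer can be reported in constant space. The function name `ipm_query_naive` and the (first position, difference, count) encoding are hypothetical choices made for this example.

```python
from typing import Optional, Tuple

# Hypothetical answer encoding: (first position, difference, count).
Progression = Tuple[int, int, int]


def ipm_query_naive(s: str, p_start: int, p_len: int,
                    t_start: int, t_len: int) -> Optional[Progression]:
    """Report the occurrences of P = s[p_start:p_start+p_len] inside
    T = s[t_start:t_start+t_len], as starting positions in s.

    Brute-force illustration of the IPM query semantics only: O(1) extra
    space, O(|T| * |P|) time. Returns None if P does not occur in T.
    """
    if p_len <= 0 or not t_len < 2 * p_len:
        raise ValueError("IPM queries require |P| >= 1 and |T| < 2|P|")
    if p_start < 0 or t_start < 0 or p_start + p_len > len(s) or t_start + t_len > len(s):
        raise ValueError("fragments must lie inside s")

    first, diff, count, last = -1, 0, 0, -1
    for i in range(t_start, t_start + t_len - p_len + 1):
        # Compare in place (no slicing), so neither P nor T is copied.
        if all(s[i + k] == s[p_start + k] for k in range(p_len)):
            if count == 0:
                first = i
            elif count == 1:
                diff = i - first
            else:
                # Guaranteed when |T| < 2|P|: consecutive gaps are all equal.
                assert i - last == diff
            last = i
            count += 1

    if count == 0:
        return None
    # For a single occurrence the difference is reported as 0.
    return (first, diff, count)


if __name__ == "__main__":
    # P = "abab" (s[0:4]) inside T = "bababab" (s[1:8]): occurrences at 2 and 4.
    print(ipm_query_naive("abababab", 0, 4, 1, 7))
```

The paper's contribution is to replace this $O(|T| \cdot |P|)$-time scan with a structure of size $\tilde{O}(n/\tau)$ that answers the same query in $O(\tau + \log n \log^3 \log n)$ time.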

Authors (3)
Citations (1)
