
Faster Algorithms for Text-to-Pattern Hamming Distances (2310.13174v3)

Published 19 Oct 2023 in cs.DS

Abstract: We study the classic Text-to-Pattern Hamming Distances problem: given a pattern $P$ of length $m$ and a text $T$ of length $n$, both over a polynomial-size alphabet, compute the Hamming distance between $P$ and $T[i\,..\,i+m-1]$ for every shift $i$, under the standard Word-RAM model with $\Theta(\log n)$-bit words.

- We provide an $O(n\sqrt{m})$ time Las Vegas randomized algorithm for this problem, beating the decades-old $O(n \sqrt{m \log m})$ running time [Abrahamson, SICOMP 1987]. We also obtain a deterministic algorithm, with a slightly higher $O(n\sqrt{m}(\log m\log\log m)^{1/4})$ running time. Our randomized algorithm extends to the $k$-bounded setting, with running time $O\big(n+\frac{nk}{\sqrt{m}}\big)$, removing all the extra logarithmic factors from earlier algorithms [Gawrychowski and Uznański, ICALP 2018; Chan, Golan, Kociumaka, Kopelowitz and Porat, STOC 2020].
- For the $(1+\epsilon)$-approximate version of Text-to-Pattern Hamming Distances, we give an $\tilde{O}(\epsilon^{-0.93}n)$ time Monte Carlo randomized algorithm, beating the previous $\tilde{O}(\epsilon^{-1}n)$ running time [Kopelowitz and Porat, FOCS 2015; Kopelowitz and Porat, SOSA 2018].

Our approximation algorithm exploits a connection with $3$SUM, and uses a combination of Fredman's trick, equality matrix product, and random sampling; in particular, we obtain new results on approximate counting versions of $3$SUM and Exact Triangle, which may be of independent interest. Our exact algorithms use a novel combination of hashing, bit-packed FFT, and recursion; in particular, we obtain a faster algorithm for computing the sumset of two integer sets, in the regime when the universe size is close to quadratic in the number of elements. We also prove a fine-grained equivalence between the exact Text-to-Pattern Hamming Distances problem and a range-restricted, counting version of $3$SUM.
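To make the problem statement concrete, the following is a minimal brute-force sketch of what is being computed: for every shift $i$, the number of positions where $P$ and $T[i\,..\,i+m-1]$ differ. It runs in $O(nm)$ time and is only a reference baseline, not any of the algorithms from the paper; the function name `hamming_distances` and the use of Python strings for $T$ and $P$ are assumptions made for illustration.

```python
def hamming_distances(text: str, pattern: str) -> list[int]:
    """Brute-force O(n*m) baseline (illustrative, not the paper's method).

    Returns a list whose i-th entry is the Hamming distance between
    `pattern` and the length-m window text[i:i+m], for every shift i
    in 0..n-m. If the pattern is longer than the text, the list is empty.
    """
    n, m = len(text), len(pattern)
    return [
        # Count mismatching positions between the window and the pattern.
        sum(1 for a, b in zip(text[i:i + m], pattern) if a != b)
        for i in range(n - m + 1)
    ]


if __name__ == "__main__":
    # One distance per shift of "abra" along "abracadabra".
    print(hamming_distances("abracadabra", "abra"))
```

A baseline like this is mainly useful for checking the output of faster implementations on small inputs.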

References (78)
  1. Karl R. Abrahamson. Generalized string matching. SIAM J. Comput., 16(6):1039–1051, 1987. doi:10.1137/0216067.
  2. Pattern matching in the Hamming distance with thresholds. Inf. Process. Lett., 111(14):674–677, 2011. doi:10.1016/j.ipl.2011.04.004.
  3. Efficient matching of nonrectangular shapes. Ann. Math. Artif. Intell., 4:211–224, 1991. doi:10.1007/BF01531057.
  4. A lower-variance randomized algorithm for approximate string matching. Inf. Process. Lett., 113(18):690–692, 2013. doi:10.1016/j.ipl.2013.06.005.
  5. Faster algorithms for string matching with $k$ mismatches. J. Algorithms, 50(2):257–275, 2004. doi:10.1016/S0196-6774(03)00097-X.
  6. Faster knapsack algorithms via bounded monotone min-plus-convolution. In Proc. 49th International Colloquium on Automata, Languages, and Programming (ICALP), volume 229, pages 31:1–31:21, 2022. doi:10.4230/LIPIcs.ICALP.2022.31.
  7. Elliptic curve fast Fourier transform (ECFFT) part I: Low-degree extension in time O(n log n) over all finite fields. In Proc. ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 700–737, 2023. doi:10.1137/1.9781611977554.ch30.
  8. Subquadratic algorithms for 3SUM. Algorithmica, 50(4):584–596, 2008. doi:10.1007/s00453-007-9036-3.
  9. Fast and compact regular expression matching. Theor. Comput. Sci., 409(3):486–496, 2008. doi:10.1016/j.tcs.2008.08.042.
  10. Sparse nonnegative convolution is equivalent to dense nonnegative convolution. In Proc. 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 1711–1724, 2021. doi:10.1145/3406325.3451090.
  11. Deterministic and Las Vegas algorithms for sparse nonnegative convolution. In Proc. 2022 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3069–3090, 2022. doi:10.1137/1.9781611977073.119.
  12. Top-$k$-convolution and the quest for near-linear output-sensitive subset sum. In Proc. 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 982–995, 2020. doi:10.1145/3357713.3384308.
  13. Fast $n$-fold boolean convolution via additive combinatorics. In Proc. 48th International Colloquium on Automata, Languages, and Programming (ICALP), volume 198, pages 41:1–41:17, 2021. doi:10.4230/LIPIcs.ICALP.2021.41.
  14. A fine-grained perspective on approximating subset sum and partition. In Proc. 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1797–1815, 2021. doi:10.1137/1.9781611976465.108.
  15. Faster regular expression matching. In Proc. 36th International Colloquium on Automata, Languages, and Programming (ICALP), pages 171–182, 2009. doi:10.1007/978-3-642-02927-1_16.
  16. The k-mismatch problem revisited. In Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2039–2052, 2016. doi:10.1137/1.9781611974331.ch142.
  17. Approximating text-to-pattern Hamming distances. In Proc. 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 643–656, 2020. doi:10.1145/3357713.3384266.
  18. Approximate string matching: A simpler faster algorithm. SIAM J. Comput., 31(6):1761–1782, 2002. doi:10.1137/S0097539700370527.
  19. On the change-making problem. In Proc. 3rd SIAM Symposium on Simplicity in Algorithms (SOSA), pages 38–42, 2020. doi:10.1137/1.9781611976014.7.
  20. Reducing 3SUM to convolution-3SUM. In Proc. 3rd Symposium on Simplicity in Algorithms (SOSA), pages 1–7, 2020. doi:10.1137/1.9781611976014.1.
  21. Timothy M. Chan. More algorithms for all-pairs shortest paths in weighted graphs. SIAM J. Comput., 39(5):2075–2089, 2010. doi:10.1137/08071990X.
  22. Timothy M. Chan. Approximation schemes for 0-1 knapsack. In Proc. 1st Symposium on Simplicity in Algorithms (SOSA), volume 61, pages 5:1–5:12, 2018. doi:10.4230/OASIcs.SOSA.2018.5.
  23. Timothy M. Chan. More logarithmic-factor speedups for 3SUM, (median,+)-convolution, and some geometric 3SUM-hard problems. ACM Trans. Algorithms, 16(1):7:1–7:23, 2020. doi:10.1145/3363541.
  24. Clustered integer 3SUM via additive combinatorics. In Proc. 47th Annual ACM Symposium on Theory of Computing (STOC), pages 31–40, 2015. doi:10.1145/2746539.2746568.
  25. Raphaël Clifford. Matrix multiplication and pattern matching under Hamming norm, 2009. URL: https://web.archive.org/web/20160818144748/http://www.cs.bris.ac.uk/Research/Algorithms/events/BAD09/BAD09/Talks/BAD09-Hammingnotes.pdf.
  26. A nearly quadratic-time FPTAS for knapsack. CoRR, abs/2308.07821, 2023. arXiv:2308.07821, doi:10.48550/arXiv.2308.07821.
  27. A subquadratic sequence alignment algorithm for unrestricted scoring matrices. SIAM J. Comput., 32(6):1654–1673, 2003. doi:10.1137/S0097539702402007.
  28. Pattern matching for spatial point sets. In Proc. 39th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 156–165, 1998. doi:10.1109/SFCS.1998.743439.
  29. Fredman’s trick meets dominance product: Fine-grained complexity of unweighted APSP, 3SUM counting, and more. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing (STOC), pages 419–432. ACM, 2023. doi:10.1145/3564246.3585237.
  30. Martin Dietzfelbinger. Universal hashing and $k$-wise independent random variables via integer arithmetic without primes. In Proc. 13th Annual Symposium on Theoretical Aspects of Computer Science (STACS), volume 1046, pages 569–580, 1996. doi:10.1007/3-540-60922-9_46.
  31. Approximating knapsack and partition via dense subset sums. In Proc. 2023 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2961–2979, 2023. doi:10.1137/1.9781611977554.ch113.
  32. Faster matrix multiplication via asymmetric hashing. CoRR, abs/2210.10173, 2022. To appear in FOCS 2023. arXiv:2210.10173, doi:10.48550/arXiv.2210.10173.
  33. Exploiting word-level parallelism for fast convolutions and their applications in approximate string matching. Eur. J. Comb., 34(1):38–51, 2013. doi:10.1016/j.ejc.2012.07.013.
  34. String matching and other products. In Complexity of Computation, RM Karp (editor), SIAM-AMS Proceedings, volume 7, pages 113–125, 1974.
  35. Michael L. Fredman. New bounds on the complexity of the shortest path problem. SIAM J. Comput., 5(1):83–89, 1976. doi:10.1137/0205006.
  36. Martin Fürer. How fast can we multiply large integers on an actual computer? In Proc. 11th Latin American Symposium on Theoretical Informatics (LATIN), volume 8392, pages 660–670. Springer, 2014. doi:10.1007/978-3-642-54423-1_57.
  37. Improved string matching with $k$ mismatches. SIGACT News, 17(4):52–54, 1986. doi:10.1145/8307.8309.
  38. Threesomes, degenerates, and love triangles. J. ACM, 65(4):22:1–22:25, 2018. doi:10.1145/3185378.
  39. Szymon Grabowski. New tabulation and sparse dynamic programming based techniques for sequence similarity problems. Discret. Appl. Math., 212:96–103, 2016. doi:10.1016/j.dam.2015.10.040.
  40. Dominance product and high-dimensional closest pair under $L_{\infty}$. In Proc. 28th International Symposium on Algorithms and Computation (ISAAC), volume 92, pages 39:1–39:12, 2017. doi:10.4230/LIPIcs.ISAAC.2017.39.
  41. Towards unified approximate pattern matching for Hamming and $L_1$ distance. In Proc. 45th International Colloquium on Automata, Languages, and Programming (ICALP), volume 107, pages 62:1–62:13, 2018. doi:10.4230/LIPIcs.ICALP.2018.62.
  42. Faster polynomial multiplication over finite fields. J. ACM, 63(6):52:1–52:23, 2017. doi:10.1145/3005344.
  43. Piotr Indyk. Faster algorithms for string matching problems: Matching the convolution bound. In Proc. 39th Annual Symposium on Foundations of Computer Science (FOCS), pages 166–173, 1998. doi:10.1109/SFCS.1998.743440.
  44. Ce Jin. An improved FPTAS for 0-1 knapsack. In Proc. 46th International Colloquium on Automata, Languages, and Programming (ICALP), volume 132, pages 76:1–76:14, 2019. doi:10.4230/LIPIcs.ICALP.2019.76.
  45. The one-way communication complexity of Hamming distance. Theory Comput., 4(1):129–135, 2008. doi:10.4086/toc.2008.v004a006.
  46. Howard J. Karloff. Fast algorithms for approximately counting mismatches. Inf. Process. Lett., 48(2):53–60, 1993. doi:10.1016/0020-0190(93)90177-B.
  47. Breaking the variance: Approximating the Hamming distance in $\tilde{O}(1/\varepsilon)$ time per alignment. In Proc. IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 601–613, 2015. doi:10.1109/FOCS.2015.43.
  48. A simple algorithm for approximating the text-to-pattern Hamming distance. In Proc. 1st Symposium on Simplicity in Algorithms (SOSA), volume 61, pages 10:1–10:5, 2018. doi:10.4230/OASIcs.SOSA.2018.10.
  49. Higher lower bounds from the 3SUM conjecture. In Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1272–1287, 2016. doi:10.1137/1.9781611974331.ch89.
  50. Novel polynomial basis with fast Fourier transform and its application to Reed-Solomon erasure codes. IEEE Trans. Inf. Theory, 62(11):6284–6299, 2016. doi:10.1109/TIT.2016.2608892.
  51. Approximate pattern matching with the $L_1$, $L_2$, and $L_{\infty}$ metrics. Algorithmica, 60(2):335–348, 2011. doi:10.1007/s00453-009-9345-9.
  52. Monochromatic triangles, intermediate matrix products, and convolutions. In Proc. 11th Innovations in Theoretical Computer Science Conference (ITCS), volume 151, pages 53:1–53:18, 2020. doi:10.4230/LIPIcs.ITCS.2020.53.
  53. François Le Gall and Florent Urrutia. Improved rectangular matrix multiplication using powers of the Coppersmith-Winograd tensor. In Proc. 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1029–1046, 2018. doi:10.1137/1.9781611975031.67.
  54. Hamming distance completeness. In Proc. 30th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 128, pages 14:1–14:17, 2019. doi:10.4230/LIPIcs.CPM.2019.14.
  55. Efficient string matching with $k$ mismatches. Theor. Comput. Sci., 43:239–249, 1986. doi:10.1016/0304-3975(86)90178-7.
  56. Fast parallel and serial approximate string matching. J. Algorithms, 10(2):157–169, 1989. doi:10.1016/0196-6774(89)90010-2.
  57. Xiao Mao. $(1-\varepsilon)$-approximation of knapsack in nearly quadratic time. CoRR, abs/2308.07004, 2023. arXiv:2308.07004, doi:10.48550/arXiv.2308.07004.
  58. Jiří Matoušek. Computing dominances in $E^n$. Inf. Process. Lett., 38(5):277–278, 1991. doi:10.1016/0020-0190(91)90071-O.
  59. A faster algorithm computing string edit distances. J. Comput. Syst. Sci., 20(1):18–31, 1980. doi:10.1016/0022-0000(80)90002-1.
  60. A subquadratic approximation scheme for partition. In Proc. 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 70–88, 2019. doi:10.1137/1.9781611975482.5.
  61. Gene Myers. A Four Russians algorithm for regular expression pattern matching. J. ACM, 39(2):432–448, 1992. doi:10.1145/128749.128755.
  62. Mihai Pătraşcu. Towards polynomial lower bounds for dynamic problems. In Proc. 42nd ACM Symposium on Theory of Computing (STOC), pages 603–610, 2010. doi:10.1145/1806689.1806772.
  63. Victor Shoup. New algorithms for finding irreducible polynomials over finite fields. In Proc. 29th Annual Symposium on Foundations of Computer Science (FOCS), pages 283–290, 1988. doi:10.1109/SFCS.1988.21944.
  64. Approximating approximate pattern matching. In Proc. 30th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 128, pages 15:1–15:13, 2019. doi:10.4230/LIPIcs.CPM.2019.15.
  65. Efficient approximate and dynamic matching of patterns using a labeling paradigm (extended abstract). In Proc. 37th Annual Symposium on Foundations of Computer Science (FOCS), pages 320–328, 1996. doi:10.1109/SFCS.1996.548491.
  66. Tadao Takaoka. Subcubic cost algorithms for the all pairs shortest path problem. Algorithmica, 20(3):309–318, 1998. doi:10.1007/PL00009198.
  67. Mikkel Thorup. Randomized sorting in $O(n \log\log n)$ time and linear space using addition, shift, and bit-wise boolean operations. J. Algorithms, 42(2):205–230, 2002. doi:10.1006/jagm.2002.1211.
  68. Przemysław Uznański. Approximating text-to-pattern distance via dimensionality reduction. In Proc. 31st Annual Symposium on Combinatorial Pattern Matching (CPM), volume 161, pages 29:1–29:11, 2020. doi:10.4230/LIPIcs.CPM.2020.29.
  69. Przemysław Uznański. Recent advances in text-to-pattern distance algorithms. In Beyond the Horizon of Computability - 16th Conference on Computability in Europe (CiE), volume 12098, pages 353–365, 2020. doi:10.1007/978-3-030-51466-2_32.
  70. Virginia Vassilevska Williams. Problem 2 on problem set 2 of CS367, October 15, 2015. URL: http://theory.stanford.edu/~virgi/cs367/hw2.pdf.
  71. Finding, minimizing, and counting weighted subgraphs. In Proc. 41st Annual ACM Symposium on Theory of Computing (STOC), pages 455–464, 2009. doi:10.1137/09076619X.
  72. Virginia Vassilevska Williams and R. Ryan Williams. Subcubic equivalences between path, matrix, and triangle problems. J. ACM, 65(5):27:1–27:38, 2018. doi:10.1145/3186893.
  73. New bounds for matrix multiplication: from alpha to omega. CoRR, abs/2307.07970, 2023. To appear in SODA 2024. arXiv:2307.07970, doi:10.48550/arXiv.2307.07970.
  74. Joachim von zur Gathen and Jürgen Gerhard. Modern Computer Algebra. Cambridge University Press, 2013.
  75. Improved approximation schemes for (un-)bounded subset-sum and partition. CoRR, abs/2212.02883, 2022. arXiv:2212.02883, doi:10.48550/arXiv.2212.02883.
  76. R. Ryan Williams. Faster all-pairs shortest paths via circuit complexity. SIAM J. Comput., 47(5):1965–1985, 2018. doi:10.1137/15M1024524.
  77. David P. Woodruff. Optimal space lower bounds for all frequency moments. In Proc. 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 167–175, 2004. URL: https://dl.acm.org/doi/10.5555/982792.982817.
  78. Raphael Yuster. Efficient algorithms on sets of permutations, dominance, and real-weighted APSP. In Proc. 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 950–957, 2009. URL: https://dl.acm.org/doi/10.5555/1496770.1496873.