Subsequences With Generalised Gap Constraints: Upper and Lower Complexity Bounds (2404.10497v1)
Abstract: For two strings u, v over some alphabet A, we investigate the problem of embedding u into v as a subsequence under the presence of generalised gap constraints. A generalised gap constraint is a triple (i, j, C_{i, j}), where 1 <= i < j <= |u| and C_{i, j} is a subset of A*. Embedding u as a subsequence into v such that (i, j, C_{i, j}) is satisfied means that if u[i] and u[j] are mapped to v[k] and v[l], respectively, then the induced gap v[k + 1..l - 1] must be a string from C_{i, j}. This generalises the setting recently investigated in [Day et al., ISAAC 2022], where only gap constraints of the form C_{i, i + 1} are considered, as well as the setting from [Kosche et al., RP 2022], where only gap constraints of the form C_{1, |u|} are considered. We show that subsequence matching under generalised gap constraints is NP-hard, and we complement this general lower bound with a thorough (parameterised) complexity analysis. Moreover, we identify several efficiently solvable subclasses that result from restricting the interval structure induced by the generalised gap constraints.
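The embedding condition in the abstract can be made concrete with a small brute-force check. This is a minimal sketch, not the paper's algorithm: the function name `find_embedding` and the representation of each set C_{i, j} as a Python predicate on the gap string are assumptions made for illustration. It enumerates every candidate tuple of positions, so it takes exponential time in general and is only meant for tiny inputs.

```python
from itertools import combinations
from typing import Callable, Dict, Optional, Tuple

# Hypothetical representation: the set C_{i,j} is given as a membership test.
GapPredicate = Callable[[str], bool]


def find_embedding(
    u: str,
    v: str,
    constraints: Dict[Tuple[int, int], GapPredicate],
) -> Optional[Tuple[int, ...]]:
    """Return 1-based positions in v that embed u and satisfy all constraints.

    constraints maps a pair (i, j) with 1 <= i < j <= len(u) to a predicate
    deciding whether a gap string belongs to C_{i,j}. Returns None if no
    embedding exists. Brute force over all strictly increasing position tuples.
    """
    for positions in combinations(range(1, len(v) + 1), len(u)):
        # The chosen positions must spell out u.
        if any(u[a] != v[p - 1] for a, p in enumerate(positions)):
            continue
        satisfied = True
        for (i, j), in_c in constraints.items():
            k, l = positions[i - 1], positions[j - 1]
            # 1-based gap v[k+1..l-1] is the 0-based slice v[k:l-1].
            if not in_c(v[k:l - 1]):
                satisfied = False
                break
        if satisfied:
            return positions
    return None


# u[1] and u[2] must be adjacent in v (the gap must be the empty string).
print(find_embedding("ab", "aab", {(1, 2): lambda gap: gap == ""}))
# The gap between u[1] and u[3] must not contain the letter 'x'.
print(find_embedding("abc", "abxc", {(1, 3): lambda gap: "x" not in gap}))
```

In the first call, the embedding at positions (1, 3) is rejected because its gap is "a", while (2, 3) has an empty gap and is accepted. In the second call, the only embedding of "abc" into "abxc" induces the gap "bx" between u[1] and u[3], so no valid embedding exists.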
- Tight hardness results for LCS and other sequence similarity measures. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 59–78, 2015. doi:10.1109/FOCS.2015.14.
- Consequences of faster alignment of sequences. In Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part I, pages 39–51, 2014. doi:10.1007/978-3-662-43948-7_4.
- Longest common subsequence with gap constraints. In Combinatorics on Words - 14th International Conference, WORDS 2023, Umeå, Sweden, June 12-16, 2023, Proceedings, pages 60–76, 2023. doi:10.1007/978-3-031-33180-0_5.
- A refined laser method and faster matrix multiplication. In Dániel Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10 - 13, 2021, pages 522–539. SIAM, 2021. doi:10.1137/1.9781611976465.32.
- Complex event recognition languages: Tutorial. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, DEBS 2017, Barcelona, Spain, June 19-23, 2017, pages 7–10, 2017. doi:10.1145/3093742.3095106.
- Practical variable length gap pattern matching. In Experimental Algorithms - 15th International Symposium, SEA 2016, St. Petersburg, Russia, June 5-8, 2016, Proceedings, pages 1–16, 2016. doi:10.1007/978-3-319-38851-9_1.
- Ricardo A. Baeza-Yates. Searching subsequences. Theor. Comput. Sci., 78(2):363–376, 1991.
- String matching with variable length gaps. Theor. Comput. Sci., 443:25–34, 2012. doi:10.1016/j.tcs.2012.03.029.
- Hans L. Bodlaender. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM J. Comput., 25(6):1305–1317, 1996. doi:10.1137/S0097539793251219.
- Hans L. Bodlaender. A partial k-arboretum of graphs with bounded treewidth. Theor. Comput. Sci., 209(1-2):1–45, 1998. doi:10.1016/S0304-3975(97)00228-4.
- Sketching, streaming, and fine-grained complexity of (weighted) LCS. In Proc. FSTTCS 2018, volume 122 of LIPIcs, pages 40:1–40:16, 2018.
- Multivariate fine-grained complexity of longest common subsequence. In Proc. SODA 2018, pages 1216–1235, 2018.
- Unshuffling a square is NP-hard. J. Comput. Syst. Sci., 80(4):766–776, 2014. doi:10.1016/j.jcss.2013.11.002.
- Fast indexes for gapped pattern matching. In SOFSEM 2020: Theory and Practice of Computer Science - 46th International Conference on Current Trends in Theory and Practice of Informatics, SOFSEM 2020, Limassol, Cyprus, January 20-24, 2020, Proceedings, pages 493–504, 2020. doi:10.1007/978-3-030-38919-2_40.
- Stephen A. Cook. The complexity of theorem-proving procedures. In Michael A. Harrison, Ranan B. Banerji, and Jeffrey D. Ullman, editors, Proceedings of the 3rd Annual ACM Symposium on Theory of Computing, May 3-5, 1971, Shaker Heights, Ohio, USA, pages 151–158. ACM, 1971. doi:10.1145/800157.805047.
- Introduction to Algorithms, 3rd Edition. MIT Press, 2009. URL: http://mitpress.mit.edu/books/introduction-algorithms.
- Pathwidth of outerplanar graphs. J. Graph Theory, 55(1):27–41, 2007. URL: https://doi.org/10.1002/jgt.20218, doi:10.1002/JGT.20218.
- Subsequences with gap constraints: Complexity bounds for matching and analysis problems. In 33rd International Symposium on Algorithms and Computation, ISAAC 2022, December 19-21, 2022, Seoul, Korea, pages 64:1–64:18, 2022. URL: https://doi.org/10.4230/LIPIcs.ISAAC.2022.64, doi:10.4230/LIPICS.ISAAC.2022.64.
- Crossing numbers and cutwidths. J. Graph Algorithms Appl., 7(3):245–251, 2003. URL: https://doi.org/10.7155/jgaa.00069, doi:10.7155/JGAA.00069.
- Construction of Aho Corasick automaton in linear time for integer alphabets. Inf. Process. Lett., 98(2):66–72, 2006. URL: https://doi.org/10.1016/j.ipl.2005.11.019, doi:10.1016/J.IPL.2005.11.019.
- Faster matrix multiplication via asymmetric hashing. CoRR, abs/2210.10173, 2022. arXiv:2210.10173, doi:10.48550/arXiv.2210.10173.
- Graph separation and search number. In Proc. 1983 Allerton Conf. on Communication, Control, and Computing, 1983.
- The vertex separation and search number of a graph. Inf. Comput., 113(1):50–79, 1994. URL: https://doi.org/10.1006/inco.1994.1064, doi:10.1006/INCO.1994.1064.
- Matching patterns with variables under Simon’s congruence. In Reachability Problems - 17th International Conference, RP 2023, Nice, France, October 11-13, 2023, Proceedings, pages 155–170, 2023. doi:10.1007/978-3-031-45286-4_12.
- Testing k-binomial equivalence. In Multidisciplinary Creativity, a collection of papers dedicated to G. Păun 65th birthday, pages 239–248, 2015. available in CoRR abs/1509.00622.
- Puzzling over subsequence-query extensions: Disjunction and generalised gaps. In Proceedings of the 15th Alberto Mendelzon International Workshop on Foundations of Data Management (AMW 2023), Santiago de Chile, Chile, May 22-26, 2023, 2023. URL: https://ceur-ws.org/Vol-3409/paper3.pdf.
- Complex event recognition in the big data era: a survey. VLDB J., 29(1):313–352, 2020. doi:10.1007/s00778-019-00557-w.
- Decidability, complexity, and expressiveness of first-order logic over the subword ordering. In Proc. LICS 2017, pages 1–12, 2017.
- Algorithms for computing the longest parameterized common subsequence. In Combinatorial Pattern Matching, 18th Annual Symposium, CPM 2007, London, Canada, July 9-11, 2007, Proceedings, pages 265–273, 2007. doi:10.1007/978-3-540-73437-6_27.
- On the complexity of k-SAT. J. Comput. Syst. Sci., 62(2):367–375, 2001. doi:10.1006/jcss.2000.1727.
- On the index of Simon’s congruence for piecewise testability. Inf. Process. Lett., 115(4):515–519, 2015.
- The height of piecewise-testable languages with applications in logical complexity. In Proc. CSL 2016, volume 62 of LIPIcs, pages 37:1–37:22, 2016.
- The height of piecewise-testable languages and the complexity of the logic of subwords. Log. Methods Comput. Sci., 15(2), 2019.
- Richard M. Karp. Reducibility among combinatorial problems. In Raymond E. Miller and James W. Thatcher, editors, Proceedings of a symposium on the Complexity of Computer Computations, held March 20-22, 1972, at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA, The IBM Research Symposia Series, pages 85–103. Plenum Press, New York, 1972. doi:10.1007/978-1-4684-2001-2_9.
- Discovering event queries from traces: Laying foundations for subsequence-queries with wildcards and gap-size constraints. In 25th International Conference on Database Theory, ICDT 2022, 29th March-1st April, 2022 Edinburgh, UK, 2022.
- Discovering multi-dimensional subsequence queries from traces - from theory to practice. In Datenbanksysteme für Business, Technologie und Web (BTW 2023), 20. Fachtagung des GI-Fachbereichs „Datenbanken und Informationssysteme“ (DBIS), 06.-10. März 2023, Dresden, Germany, Proceedings, pages 511–533, 2023. doi:10.18420/BTW2023-24.
- Subsequences in bounded ranges: Matching and analysis problems. In Anthony W. Lin, Georg Zetzsche, and Igor Potapov, editors, Reachability Problems - 16th International Conference, RP 2022, Kaiserslautern, Germany, October 17-21, 2022, Proceedings, volume 13608 of Lecture Notes in Computer Science, pages 140–159. Springer, 2022. doi:10.1007/978-3-031-19135-0_10.
- Combinatorial algorithms for subsequence matching: A survey. In Henning Bordihn, Géza Horváth, and György Vaszil, editors, Proceedings 12th International Workshop on Non-Classical Models of Automata and Applications, NCMA 2022, Debrecen, Hungary, August 26-27, 2022, volume 367 of EPTCS, pages 11–27, 2022. doi:10.4204/EPTCS.367.2.
- Dietrich Kuske. The subtrace order and counting first-order logic. In Proc. CSR 2020, volume 12159 of Lecture Notes in Computer Science, pages 289–302, 2020.
- Languages ordered by the subword order. In Proc. FOSSACS 2019, volume 11425 of Lecture Notes in Computer Science, pages 348–364, 2019.
- Computing the k-binomial complexity of the Thue-Morse word. In Proc. DLT 2019, volume 11647 of Lecture Notes in Computer Science, pages 278–291, 2019.
- Generalized Pascal triangle for binomial coefficients of words. Electron. J. Combin., 24(1.44):36 pp., 2017.
- Efficiently mining closed subsequences with gap constraints. In SDM, pages 313–322. SIAM, 2008.
- Efficient mining of gap-constrained subsequences and its various applications. ACM Trans. Knowl. Discov. Data, 6(1):2:1–2:39, 2012.
- David Maier. The complexity of some problems on subsequences and supersequences. J. ACM, 25(2):322–336, April 1978.
- Subword histories and Parikh matrices. J. Comput. Syst. Sci., 68(1):1–21, 2004.
- T.A.J. Nicholson. Permutation procedure for minimising the number of crossings in a network. Proceedings of the Institution of Electrical Engineers, 115:21–26(5), January 1968.
- Rohit J Parikh. Language generating devices. Quarterly Progress Report, 60:199–212, 1961.
- On the piecewise complexity of words and periodic words. In SOFSEM 2024: Theory and Practice of Computer Science - 48th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2024, Cochem, Germany, February 19-23, 2024, Proceedings, pages 456–470, 2024. doi:10.1007/978-3-031-52113-3_32.
- William E. Riddle. An approach to software system modelling and analysis. Comput. Lang., 4(1):49–66, 1979. doi:10.1016/0096-0551(79)90009-2.
- Another generalization of abelian equivalence: Binomial complexity of infinite words. Theor. Comput. Sci., 601:47–57, 2015.
- Arto Salomaa. Connections between subwords and certain matrix mappings. Theoret. Comput. Sci., 340(2):188–203, 2005.
- On arch factorization and subword universality for words and compressed words. In Combinatorics on Words - 14th International Conference, WORDS 2023, Umeå, Sweden, June 12-16, 2023, Proceedings, pages 274–287, 2023. doi:10.1007/978-3-031-33180-0_21.
- Shinnosuke Seki. Absoluteness of subword inequality is undecidable. Theor. Comput. Sci., 418:116–120, 2012.
- Alan C. Shaw. Software descriptions with flow expressions. IEEE Trans. Software Eng., 4(3):242–254, 1978. doi:10.1109/TSE.1978.231501.
- Imre Simon. Hierarchies of events with dot-depth one — Ph.D. thesis. University of Waterloo, 1972.
- Imre Simon. Piecewise testable events. In Autom. Theor. Form. Lang., 2nd GI Conf., volume 33 of LNCS, pages 214–222, 1975.
- Manfred Wiegers. Recognizing outerplanar graphs in linear time. In Gottfried Tinhofer and Gunther Schmidt, editors, Graphtheoretic Concepts in Computer Science, International Workshop, WG ’86, Bernried, Germany, June 17-19, 1986, Proceedings, volume 246 of Lecture Notes in Computer Science, pages 165–176. Springer, 1986. doi:10.1007/3-540-17218-1_57.
- Dan E. Willard. Log-logarithmic worst-case range queries are possible in space theta(n). Inf. Process. Lett., 17(2):81–84, 1983. doi:10.1016/0020-0190(83)90075-3.
- Ryan Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. Theor. Comput. Sci., 348(2-3):357–365, 2005. doi:10.1016/j.tcs.2005.09.023.
- Virginia Vassilevska Williams. On some fine-grained questions in algorithms and complexity, pages 3447–3487. URL: https://www.worldscientific.com/doi/abs/10.1142/9789813272880_0188, doi:10.1142/9789813272880_0188.
- Georg Zetzsche. The complexity of downward closure comparisons. In Proc. ICALP 2016, volume 55 of LIPIcs, pages 123:1–123:14, 2016.
- On complexity and optimization of expensive queries in complex event processing. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, pages 217–228, 2014. doi:10.1145/2588555.2593671.