Minimizing the Minimizers via Alphabet Reordering (2405.04052v1)

Published 7 May 2024 in cs.DS

Abstract: Minimizers sampling is one of the most widely-used mechanisms for sampling strings [Roberts et al., Bioinformatics 2004]. Let $S=S[1]\ldots S[n]$ be a string over a totally ordered alphabet $\Sigma$. Further let $w\geq 2$ and $k\geq 1$ be two integers. The minimizer of $S[i\mathinner{.\,.} i+w+k-2]$ is the smallest position in $[i,i+w-1]$ where the lexicographically smallest length-$k$ substring of $S[i\mathinner{.\,.} i+w+k-2]$ starts. The set of minimizers over all $i\in[1,n-w-k+2]$ is the set $\mathcal{M}_{w,k}(S)$ of the minimizers of $S$. We consider the following basic problem: Given $S$, $w$, and $k$, can we efficiently compute a total order on $\Sigma$ that minimizes $|\mathcal{M}_{w,k}(S)|$? We show that this is unlikely by proving that the problem is NP-hard for any $w\geq 2$ and $k\geq 1$. Our result provides theoretical justification as to why there exist no exact algorithms for minimizing the minimizers samples, while there exists a plethora of heuristics for the same purpose.
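
To make the definition above concrete, here is a minimal Python sketch that computes $\mathcal{M}_{w,k}(S)$ directly from the definition and then tries every total order on the alphabet. It is an illustration only, not the paper's construction: the function names `minimizers` and `best_order_bruteforce` and the `order` parameter are assumptions introduced here, and the exhaustive search takes time factorial in $|\Sigma|$, so it is only usable for tiny alphabets.

```python
from itertools import permutations


def minimizers(s, w, k, order=None):
    """Return the set of 1-based minimizer positions of s for parameters (w, k).

    `order` lists the alphabet from smallest to largest letter; it defaults
    to the usual sorted order. (Hypothetical helper, for illustration.)
    """
    if order is None:
        order = sorted(set(s))
    rank = {c: r for r, c in enumerate(order)}
    codes = [rank[c] for c in s]  # compare letters by their rank in `order`
    n = len(s)
    positions = set()
    # There are n - w - k + 2 windows, each of length w + k - 1.
    for i in range(n - w - k + 2):
        # Among the w candidate start positions, take the lexicographically
        # smallest length-k substring; ties go to the leftmost position.
        best = min(range(i, i + w), key=lambda j: (codes[j:j + k], j))
        positions.add(best + 1)  # report 1-based positions as in the abstract
    return positions


def best_order_bruteforce(s, w, k):
    """Try every total order on the alphabet of s and keep one with the fewest minimizers."""
    alphabet = sorted(set(s))
    return min(permutations(alphabet), key=lambda p: len(minimizers(s, w, k, p)))


if __name__ == "__main__":
    s, w, k = "aabbb", 2, 1
    print(sorted(minimizers(s, w, k)))              # default order a < b
    print(sorted(minimizers(s, w, k, order="ba")))  # reversed order b < a
    print(best_order_bruteforce(s, w, k))
```

For `S = "aabbb"`, `w = 2`, `k = 1`, the order a < b selects positions {1, 2, 3, 4}, while b < a selects {1, 3, 4}, so even on a two-letter alphabet the choice of order changes the sample size.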

References (24)
  1. Text indexing for long patterns: Anchors are all you need. Proc. VLDB Endow., 16(9):2117–2131, 2023. URL: https://www.vldb.org/pvldb/vol16/p2117-loukides.pdf, doi:10.14778/3598581.3598586.
  2. On the complexity of BWT-runs minimization via alphabet reordering. In Fabrizio Grandoni, Grzegorz Herman, and Peter Sanders, editors, 28th Annual European Symposium on Algorithms, ESA 2020, September 7-9, 2020, Pisa, Italy (Virtual Conference), volume 173 of LIPIcs, pages 15:1–15:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. URL: https://doi.org/10.4230/LIPIcs.ESA.2020.15, doi:10.4230/LIPICS.ESA.2020.15.
  3. On the representation of de Bruijn graphs. J. Comput. Biol., 22(5):336–352, 2015. URL: https://doi.org/10.1089/cmb.2014.0160, doi:10.1089/CMB.2014.0160.
  4. KMC 2: fast and resource-frugal k-mer counting. Bioinform., 31(10):1569–1576, 2015. URL: https://doi.org/10.1093/bioinformatics/btv022, doi:10.1093/BIOINFORMATICS/BTV022.
  5. Finding an optimal alphabet ordering for Lyndon factorization is hard. In Markus Bläser and Benjamin Monmege, editors, 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, March 16-19, 2021, Saarbrücken, Germany (Virtual Conference), volume 187 of LIPIcs, pages 35:1–35:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.STACS.2021.35, doi:10.4230/LIPICS.STACS.2021.35.
  6. Sampled suffix array with minimizers. Softw. Pract. Exp., 47(11):1755–1771, 2017. URL: https://doi.org/10.1002/spe.2481, doi:10.1002/SPE.2481.
  7. Differentiable learning of sequence-specific minimizer schemes with DeepMinimizer. J. Comput. Biol., 29(12):1288–1304, 2022. URL: https://doi.org/10.1089/cmb.2022.0275, doi:10.1089/CMB.2022.0275.
  8. Weighted minimizer sampling improves long read mapping. Bioinform., 36(Supplement-1):i111–i118, 2020. URL: https://doi.org/10.1093/bioinformatics/btaa435, doi:10.1093/BIOINFORMATICS/BTAA435.
  9. Richard M. Karp. Reducibility among combinatorial problems. In Raymond E. Miller and James W. Thatcher, editors, Proceedings of a symposium on the Complexity of Computer Computations, held March 20-22, 1972, at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA, The IBM Research Symposia Series, pages 85–103. Plenum Press, New York, 1972. URL: https://doi.org/10.1007/978-1-4684-2001-2_9, doi:10.1007/978-1-4684-2001-2_9.
  10. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev., 31(2):249–260, 1987. URL: https://doi.org/10.1147/rd.312.0249, doi:10.1147/RD.312.0249.
  11. Heng Li. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinform., 32(14):2103–2110, 2016. URL: https://doi.org/10.1093/bioinformatics/btw152, doi:10.1093/BIOINFORMATICS/BTW152.
  12. Heng Li. Minimap2: pairwise alignment for nucleotide sequences. Bioinform., 34(18):3094–3100, 2018. URL: https://doi.org/10.1093/bioinformatics/bty191, doi:10.1093/BIOINFORMATICS/BTY191.
  13. Bidirectional string anchors: A new string sampling mechanism. In Petra Mutzel, Rasmus Pagh, and Grzegorz Herman, editors, 29th Annual European Symposium on Algorithms, ESA 2021, September 6-8, 2021, Lisbon, Portugal (Virtual Conference), volume 204 of LIPIcs, pages 64:1–64:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. URL: https://doi.org/10.4230/LIPIcs.ESA.2021.64, doi:10.4230/LIPICS.ESA.2021.64.
  14. Bidirectional string anchors for improved text indexing and top-$k$ similarity search. IEEE Trans. Knowl. Data Eng., 35(11):11093–11111, 2023. URL: https://doi.org/10.1109/TKDE.2022.3231780, doi:10.1109/TKDE.2022.3231780.
  15. Compact universal k-mer hitting sets. In Martin C. Frith and Christian Nørgaard Storm Pedersen, editors, Algorithms in Bioinformatics - 16th International Workshop, WABI 2016, Aarhus, Denmark, August 22-24, 2016. Proceedings, volume 9838 of Lecture Notes in Computer Science, pages 257–268. Springer, 2016. URL: https://doi.org/10.1007/978-3-319-43681-4_21, doi:10.1007/978-3-319-43681-4_21.
  16. Feedback arc set problem and NP-hardness of minimum recurrent configuration problem of chip-firing game on directed graphs. Ann. Comb., 19:373–396, 2015. URL: https://link.springer.com/article/10.1007/s00026-015-0266-9, doi:10.1007/s00026-015-0266-9.
  17. Reducing storage requirements for biological sequence comparison. Bioinform., 20(18):3363–3369, 2004. URL: https://doi.org/10.1093/bioinformatics/bth408, doi:10.1093/bioinformatics/bth408.
  18. Winnowing: Local algorithms for document fingerprinting. In Alon Y. Halevy, Zachary G. Ives, and AnHai Doan, editors, Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, June 9-12, 2003, pages 76–85. ACM, 2003. URL: https://doi.org/10.1145/872757.872770, doi:10.1145/872757.872770.
  19. Space-efficient representation of genomic k-mer count tables. Algorithms Mol. Biol., 17(1):5, 2022. URL: https://doi.org/10.1186/s13015-022-00212-0, doi:10.1186/S13015-022-00212-0.
  20. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome biology, 15(3):R46, 2014.
  21. Daniel H. Younger. Minimum feedback arc sets for a directed graph. IEEE Transactions on Circuit Theory, 10(2):238–245, 1963. doi:10.1109/TCT.1963.1082116.
  22. Improved design and analysis of practical minimizers. Bioinform., 36(Supplement-1):i119–i127, 2020. URL: https://doi.org/10.1093/bioinformatics/btaa472, doi:10.1093/BIOINFORMATICS/BTAA472.
  23. Sequence-specific minimizers via polar sets. Bioinform., 37(Supplement):187–195, 2021. URL: https://doi.org/10.1093/bioinformatics/btab313, doi:10.1093/BIOINFORMATICS/BTAB313.
  24. Creating and using minimizer sketches in computational genomics. J. Comput. Biol., 30(12):1251–1276, 2023. URL: https://doi.org/10.1089/cmb.2023.0094, doi:10.1089/CMB.2023.0094.
