A step toward a reinforcement learning de novo genome assembler (2102.02649v4)

Published 2 Feb 2021 in q-bio.GN, cs.AI, and cs.LG

Abstract: De novo genome assembly is a relevant but computationally complex task in genomics. Although de novo assemblers have been used successfully in several genomics projects, there is still no 'best assembler', and the choice and setup of assemblers still rely on bioinformatics experts. Thus, as with other computationally complex problems, machine learning may emerge as an alternative (or complementary) way of developing more accurate and automated assemblers. Reinforcement learning has proven promising for solving complex tasks without supervision, such as games, and there is a pressing need to understand the limits of this approach on 'real' problems, such as the DNA fragment assembly (DFA) problem. This study aimed to shed light on the application of machine learning, using reinforcement learning (RL), to genome assembly. We expanded upon the sole previous approach found in the literature, carefully exploring the learning aspects of the proposed intelligent agent, which uses the Q-learning algorithm, and we provide insights for the next steps of automated genome assembly development. We improved the reward system and optimized the exploration of the state space through pruning and in collaboration with evolutionary computing. We tested the new approaches on 23 new, larger environments, all of which are publicly available online. Our results suggest consistent performance gains; however, we also found limitations, especially concerning the high dimensionality of the state and action spaces. Finally, we discuss paths toward efficient and automated genome assembly in real scenarios, considering successful RL applications, including deep reinforcement learning.
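The method the abstract refers to is tabular Q-learning over an environment whose states are partial orderings of sequencing reads and whose actions append an unused read. As a rough, self-contained illustration of that setup (not the authors' implementation), the Python sketch below uses three toy reads, a simple suffix-prefix overlap score as the reward signal (a stand-in for the alignment-based rewards the study improves on), and epsilon-greedy exploration; all constants and helper names are illustrative assumptions.

```python
import random
from collections import defaultdict

# Toy reads; a real run would load thousands of reads from a FASTA file.
READS = ["ATGGC", "GGCTA", "CTAAC"]

def overlap(a, b):
    """Length of the longest suffix of `a` matching a prefix of `b`
    (a simple stand-in for alignment-based reward scores)."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

# State: tuple of read indices placed so far. Action: index of the next read.
Q = defaultdict(float)
ALPHA, GAMMA, EPSILON, EPISODES = 0.1, 0.9, 0.2, 2000

for _ in range(EPISODES):
    state, remaining = (), set(range(len(READS)))
    while remaining:
        # Epsilon-greedy choice among the reads not yet placed.
        if random.random() < EPSILON:
            action = random.choice(sorted(remaining))
        else:
            action = max(remaining, key=lambda a: Q[(state, a)])
        # Reward: overlap between the previously placed read and the new one.
        reward = overlap(READS[state[-1]], READS[action]) if state else 0
        next_state = state + (action,)
        next_remaining = remaining - {action}
        # One-step Q-learning update:
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
        best_next = max((Q[(next_state, a)] for a in next_remaining), default=0.0)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state, remaining = next_state, next_remaining

# Greedy rollout of the learned policy yields a read ordering (the layout).
state, remaining = (), set(range(len(READS)))
while remaining:
    action = max(remaining, key=lambda a: Q[(state, a)])
    state, remaining = state + (action,), remaining - {action}
print("read order:", [READS[i] for i in state])
```

Even in this toy form, the number of distinct states grows factorially with the number of reads, which is exactly the dimensionality limitation the abstract reports for realistic inputs.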

Authors (7)
  1. Kleber Padovani (1 paper)
  2. Roberto Xavier (1 paper)
  3. Rafael Cabral Borges (1 paper)
  4. Anna Reali (1 paper)
  5. Annie Chateau (2 papers)
  6. Ronnie Alves (7 papers)
  7. Andre Carvalho (4 papers)
Citations (1)