
Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models (2405.06724v3)

Published 10 May 2024 in q-bio.MN, cs.AI, and cs.LG

Abstract: Techniques to autonomously drive research have been prominent in Computational Scientific Discovery, while Synthetic Biology is a field of science that focuses on designing and constructing new biological systems for useful purposes. Here we seek to apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery. Comprehensive databases of metabolic processes called genome-scale metabolic network models (GEMs) are often used to evaluate cellular engineering strategies to optimise target compound production. However, predicted host behaviours are not always correctly described by GEMs, often due to errors in the models. The task of learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To address these, we describe a novel approach called Boolean Matrix Logic Programming (BMLP) by leveraging boolean matrices to evaluate large logic programs. We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by guiding informative experimentation through active learning. In contrast to sub-symbolic methods, $BMLP_{active}$ encodes a state-of-the-art GEM of a widely accepted bacterial host in an interpretable and logical representation using datalog logic programs. Notably, $BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models to reliably engineer biological systems for producing useful compounds. It offers a realistic approach to creating a self-driving lab for microbial engineering.
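The abstract's central idea, using boolean matrices to evaluate logic programs, can be illustrated with a small sketch. This is not the paper's BMLP implementation. It only shows, under simplifying assumptions, how a recursive datalog program such as `path(X,Y) :- edge(X,Y). path(X,Y) :- path(X,Z), edge(Z,Y).` can be evaluated as a least fixpoint over boolean matrices. The function names and the four-node toy network are hypothetical and chosen purely for illustration.

```python
def bool_matmul(a, b):
    """Boolean matrix product: result[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    inner, cols = len(b), len(b[0])
    return [[any(row[k] and b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def bool_or(a, b):
    """Element-wise OR of two boolean matrices of the same shape."""
    return [[x or y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def transitive_closure(edge):
    """Iterate path = edge OR (path . edge) until nothing changes (least fixpoint)."""
    path = edge
    while True:
        nxt = bool_or(edge, bool_matmul(path, edge))
        if nxt == path:
            return path
        path = nxt


# Hypothetical toy network (an assumption, not data from the paper):
# a -> b -> c, with d unconnected.
names = ["a", "b", "c", "d"]
edge = [
    [False, True, False, False],   # a -> b
    [False, False, True, False],   # b -> c
    [False, False, False, False],  # c has no outgoing edges
    [False, False, False, False],  # d has no outgoing edges
]

closure = transitive_closure(edge)
reachable_from_a = [names[j] for j, hit in enumerate(closure[0]) if hit]
print(reachable_from_a)
```

Each row of the resulting matrix lists every node derivable as `path` from that row's node, so a single matrix holds all answers to the recursive query at once. A practical system would likely use packed bit vectors or sparse matrices rather than nested Python lists.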

Authors (4)
  1. Lun Ai
  2. Stephen H. Muggleton
  3. Shi-Shun Liang
  4. Geoff S. Baldwin
