Empowering Biomedical Discovery with AI Agents (2404.02831v2)

Published 3 Apr 2024 in cs.AI

Abstract: We envision "AI scientists" as systems capable of skeptical learning and reasoning that empower biomedical research through collaborative agents that integrate AI models and biomedical tools with experimental platforms. Rather than taking humans out of the discovery process, biomedical AI agents combine human creativity and expertise with AI's ability to analyze large datasets, navigate hypothesis spaces, and execute repetitive tasks. AI agents are poised to be proficient in various tasks, planning discovery workflows and performing self-assessment to identify and mitigate gaps in their knowledge. These agents use LLMs and generative models to feature structured memory for continual learning and use machine learning tools to incorporate scientific knowledge, biological principles, and theories. AI agents can impact areas ranging from virtual cell simulation, programmable control of phenotypes, and the design of cellular circuits to developing new therapies.

Authors (9)
  1. Shanghua Gao
  2. Ada Fang
  3. Yepeng Huang
  4. Valentina Giunchiglia
  5. Ayush Noori
  6. Jonathan Richard Schwarz
  7. Yasha Ektefaie
  8. Jovana Kondic
  9. Marinka Zitnik
Citations (23)

Summary

Empowering Biomedical Discovery with AI Agents

The paper "Empowering Biomedical Discovery with AI Agents" presents a comprehensive perspective on advancing biomedical research through the integration of AI agents. These agents are envisioned as sophisticated systems, capable of skeptical learning and reasoning, integrating ML tools with experimental platforms, and collaborating effectively with human scientists. This document outlines the potential and structure of such AI agents, discusses their autonomy levels, details the required components for their deployment, and addresses the challenges and ethical considerations involved.

Overview of AI Agents in Biomedical Research

The paper introduces the concept of "AI scientists": AI systems that complement human expertise with the ability to analyze extensive datasets, navigate hypothesis spaces, and execute repetitive tasks. These AI agents are proficient in self-assessment, planning discovery workflows, and incorporating scientific knowledge, biological principles, and theories. The agents can tackle a variety of tasks, from virtual cell simulation and programmable control of phenotypes to designing cellular circuits and developing new therapies.

Components of AI Agents

The architecture of AI agents integrates four key modules: perception, interaction, memory, and reasoning. These modules are essential for enabling the agents to engage with their environment and make informed decisions (a minimal code sketch of this structure follows the list):

  1. Perception Modules: These modules allow agents to interpret and assimilate information from various data modalities, including text, images, videos, longitudinal biosensor readouts, and genomic profiles. Agents can use both text-based interfaces and multi-modal perception capabilities to better model dynamic biological systems.
  2. Interaction Modules: Interaction capabilities are crucial for agents to communicate with scientists, other AI agents, and tools. These modules encompass agent-human interaction, multi-agent interaction, and tool use, allowing agents to execute specified tasks and collaborate on complex goals.
  3. Memory and Learning Modules: These modules provide the agents with the ability to store and recall information, as well as to learn from new data. Long-term memory stores essential knowledge, while short-term memory supports task execution by retaining relevant information temporarily.
  4. Reasoning Modules: Reasoning capabilities enable agents to plan experiments, make decisions on biological hypotheses, and synthesize concepts. The paper distinguishes between direct reasoning and reasoning with feedback, allowing agents to adjust their plans in response to experimental or human feedback.
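
To make the modular design concrete, the following is a minimal Python sketch of how the four modules might be composed into a single agent loop. All class and method names (Perception, Memory, Reasoning, Interaction, BiomedicalAgent) are hypothetical illustrations, not an interface defined in the paper.

```python
from dataclasses import dataclass, field

# Minimal illustrative sketch of the four-module agent architecture.
# All class and method names are hypothetical, not from the paper.

@dataclass
class Perception:
    """Converts raw inputs (text, images, omics profiles) into a shared representation."""
    def encode(self, observation: dict) -> str:
        # Placeholder: a real module would call modality-specific encoders.
        return " | ".join(f"{k}: {v}" for k, v in observation.items())

@dataclass
class Memory:
    """Long-term store of knowledge plus a short-term buffer for the current task."""
    long_term: list = field(default_factory=list)
    short_term: list = field(default_factory=list)

    def remember(self, item: str, durable: bool = False) -> None:
        (self.long_term if durable else self.short_term).append(item)

@dataclass
class Reasoning:
    """Plans next actions; could wrap an LLM with chain-of-thought or feedback loops."""
    def plan(self, context: str, memory: Memory) -> str:
        # Placeholder: a real module would query an LLM and critique its own plan.
        return f"Next step given: {context} (using {len(memory.long_term)} stored facts)"

@dataclass
class Interaction:
    """Handles communication with scientists, other agents, and external tools."""
    def act(self, plan: str) -> str:
        # Placeholder: a real module would dispatch tool calls or messages.
        return f"Executing: {plan}"

@dataclass
class BiomedicalAgent:
    perception: Perception = field(default_factory=Perception)
    memory: Memory = field(default_factory=Memory)
    reasoning: Reasoning = field(default_factory=Reasoning)
    interaction: Interaction = field(default_factory=Interaction)

    def step(self, observation: dict) -> str:
        context = self.perception.encode(observation)
        self.memory.remember(context)
        plan = self.reasoning.plan(context, self.memory)
        return self.interaction.act(plan)

agent = BiomedicalAgent()
print(agent.step({"assay": "Perturb-seq", "readout": "differential expression"}))
```

In a real system each placeholder would wrap an LLM, a tool API, or an experimental platform, but the composition pattern of perceive, remember, plan, and act would stay the same.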

Levels of Autonomy in AI Agents

The paper proposes a classification of AI agents into four levels of autonomy based on their proficiency in hypothesis generation, experimental design, and reasoning (a brief code sketch encoding the taxonomy follows the list):

  1. Level 0: No AI agent. Traditional ML models are used as tools without generating hypotheses.
  2. Level 1: AI agents act as research assistants, performing narrow and specific tasks defined by scientists.
  3. Level 2: AI agents function as collaborators, capable of forming hypotheses but limited to developing ideas within existing data trends and literature.
  4. Level 3: AI agents operate as scientists, generating novel hypotheses, devising experimental designs that uncover new capabilities, and drawing sophisticated conceptual links between findings.
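
For illustration only, this taxonomy could be encoded as a simple enumeration, for instance to tag agents in a benchmark or audit log. The names and one-line capability summaries below paraphrase the levels above and are not an artifact of the paper.

```python
from enum import IntEnum

class AgentAutonomy(IntEnum):
    """Levels of autonomy paraphrased from the paper's taxonomy (illustrative only)."""
    ML_TOOL = 0        # Level 0: ML models used as tools; no hypothesis generation
    ASSISTANT = 1      # Level 1: performs narrow tasks specified by scientists
    COLLABORATOR = 2   # Level 2: proposes hypotheses within existing data and literature
    SCIENTIST = 3      # Level 3: generates novel hypotheses and experimental designs

CAPABILITIES = {
    AgentAutonomy.ML_TOOL: "no hypothesis generation",
    AgentAutonomy.ASSISTANT: "executes scientist-defined tasks",
    AgentAutonomy.COLLABORATOR: "forms hypotheses grounded in prior data",
    AgentAutonomy.SCIENTIST: "novel hypotheses and experiment design",
}

for level in AgentAutonomy:
    print(f"Level {level.value} ({level.name}): {CAPABILITIES[level]}")
```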

Implications for Biomedical Research

The implementation of AI agents holds significant promise for accelerating the pace of biomedical discovery. A notable implication is the potential for AI agents to surface insights that would be difficult to obtain with traditional methods alone, for example by making predictions across temporal and spatial scales and deepening understanding of biological systems. Additionally, multi-agent systems enable collaborative workflows in which different agents contribute specialized knowledge and capabilities to solve problems that would be challenging for a single agent.
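
As a hedged sketch of that collaborative pattern, the snippet below passes a shared task state through a pipeline of specialized agents. The agent roles, state keys, and example strings are invented for illustration and do not come from the paper.

```python
from typing import Callable

# Hedged sketch of multi-agent collaboration: specialized agents each
# transform a shared task state. Agent names and roles are illustrative.

def hypothesis_agent(state: dict) -> dict:
    state["hypothesis"] = f"Gene X modulates phenotype in {state['context']}"
    return state

def experiment_design_agent(state: dict) -> dict:
    state["experiment"] = f"CRISPR knockout screen to test: {state['hypothesis']}"
    return state

def analysis_agent(state: dict) -> dict:
    state["analysis_plan"] = f"Differential expression analysis of {state['experiment']}"
    return state

def run_pipeline(state: dict, agents: list[Callable[[dict], dict]]) -> dict:
    # Each specialized agent contributes its expertise to the shared state.
    for agent in agents:
        state = agent(state)
    return state

result = run_pipeline(
    {"context": "fibrosis model"},
    [hypothesis_agent, experiment_design_agent, analysis_agent],
)
print(result)
```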

Challenges and Ethical Considerations

The paper acknowledges several challenges and ethical considerations in deploying AI agents:

  • Robustness and Reliability: Ensuring the reliability of AI predictions, minimizing errors, and addressing issues related to model sensitivity and hallucinatory outputs.
  • Evaluation Protocols: Developing rigorous frameworks for evaluating AI agents' performance, including considerations for reproducibility and transparency.
  • Dataset Generation: Creating and maintaining large, high-quality, multi-modal datasets necessary for training AI agents.
  • Governance and Regulation: Establishing comprehensive governance frameworks that balance innovation with accountability, ensuring ethical development and deployment of AI agents.
  • Risks and Safeguards: Addressing potential risks associated with autonomous experiments and ensuring that AI agents operate safely and within predefined ethical boundaries (a minimal illustrative pre-execution check follows this list).
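
To illustrate one possible form of safeguard, the sketch below screens a proposed agent action against a small rule set and a human sign-off requirement before it could reach any experimental platform. The rule keywords, function name, and action format are hypothetical and far simpler than what real deployments would require.

```python
# Illustrative sketch of a pre-execution safeguard: proposed agent actions are
# screened against simple rules before reaching any experimental platform.
# The rule set and action format are hypothetical, not from the paper.

BLOCKED_KEYWORDS = {"toxin synthesis", "pathogen enhancement", "unapproved human trial"}

def is_action_permitted(action: str, require_human_signoff: bool = True) -> tuple[bool, str]:
    lowered = action.lower()
    for keyword in BLOCKED_KEYWORDS:
        if keyword in lowered:
            return False, f"blocked: matches restricted category '{keyword}'"
    if require_human_signoff:
        return False, "held: awaiting human reviewer approval"
    return True, "permitted"

for proposal in ["Run knockout screen on cell line", "Initiate toxin synthesis protocol"]:
    allowed, reason = is_action_permitted(proposal, require_human_signoff=False)
    print(f"{proposal!r} -> {allowed} ({reason})")
```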

Conclusion

The paper underscores the transformative potential of AI agents in biomedical research, advocating for the integration of advanced AI capabilities to complement human creativity and expertise. By addressing the outlined challenges and implementing robust frameworks for evaluation and governance, AI agents can contribute substantially to the next era of scientific discovery, ultimately leading to innovative solutions in biomedicine. The vision presented in the paper offers a promising pathway toward a collaborative human-AI scientific enterprise.