This paper, "From Automation to Autonomy: A Survey on LLMs in Scientific Discovery" (Zheng et al., 19 May 2025 ), surveys the evolving role of LLMs in scientific research, charting their progression from basic automation tools to increasingly autonomous agents. The authors introduce a three-level taxonomy based on increasing autonomy: LLM as Tool (Level 1), LLM as Analyst (Level 2), and LLM as Scientist (Level 3). This framework is used to analyze LLM applications across the six stages of the scientific method: observation/problem definition, hypothesis development, experimentation/data collection, data analysis/interpretation, drawing conclusions, and iteration/refinement.
Three Levels of Autonomy
- LLM as Tool (Level 1): At this most basic level, LLMs automate specific, well-defined tasks within a single stage of the scientific method, augmenting human researchers by accelerating routine activities under direct supervision. Examples include summarizing literature, generating simple code snippets, or reformatting data. Autonomy is low: the model requires explicit prompts, and humans validate every output, so the focus is on enhancing human efficiency rather than replacing human judgment.
- Practical Implementations: Work at this level applies LLMs to automated literature search via Retrieval-Augmented Generation (RAG) frameworks (e.g., PaperQA [Lala2023PaperQA], LitLLM [agarwal2024litLLMtoolkitscientificliterature]); information aggregation into tables (e.g., ArxivDIGESTables [newman2024arxivdigestablessynthesizingscientificliterature]); generating initial research ideas or drafting hypothesis statements (e.g., IdeaBench [guo2024ideabenchbenchmarkinglargelanguage]); assisting experiment planning by generating protocols or executable code (e.g., BioPlanner [odonoghue2023bioplannerautomaticevaluationLLMs], SciCode [tian2024scicoderesearchcodingbenchmark]); organizing and analyzing data in tables (e.g., Chain-of-Table [wang2024chainoftableevolvingtablesreasoning]) and charts (e.g., ChartQA [masry2022chartqabenchmarkquestionanswering]); and providing feedback on or critiquing conclusions (e.g., ReviewerGPT [liu2023reviewergptexploratorystudyusing]). Tools such as AutomaTikZ [belouadi2024automatikztextguidedsynthesisscientific], which generates scientific figures from text, also fall here. A minimal sketch of such a RAG workflow appears in the first code example after this list.
- LLM as Analyst (Level 2): LLMs demonstrate increased autonomy here, functioning as passive agents that handle more complex information processing, data modeling, and analytical reasoning across a sequence of tasks, often with reduced human intervention at intermediate steps. They can analyze datasets, interpret simulation outputs, and iteratively refine models toward human-defined goals, while human researchers still define the problem and validate the final insights.
- Practical Implementations: This level typically involves LLM-based agents that orchestrate tool use across data analysis pipelines. Applications span several domains: automated machine learning (AutoML) experiment design and execution (e.g., MLAgentBench [huang2024mlagentbenchevaluatinglanguageagents], IMPROVE [xue2025improveiterativemodelpipeline]); statistical data modeling and hypothesis validation with code-executing agents (e.g., InfiAgent-DABench [hu2024infiagentdabenchevaluatingagentsdata], BLADE [gu2024bladebenchmarkinglanguagemodel]); and function discovery (symbolic regression) that leverages domain knowledge and iterative refinement (e.g., LLM-SR [shojaee2025LLMsrscientificequationdiscovery]). In the natural sciences, agents are explored for causal graph discovery in chemistry and social science, and for end-to-end biomedical research combining wet- and dry-lab experiments (e.g., Coscientist [boiko2023autonomous], BioResearcher [luo2024intentionimplementationautomatingbiomedical]). General benchmarks such as DiscoveryWorld [jansen2024discoveryworldvirtualenvironmentdeveloping] evaluate multi-stage agent capabilities. The propose-fit-score refinement loop behind LLM-SR-style equation discovery is sketched in the second code example after this list.
- LLM as Scientist (Level 3): This level represents a significant leap, where LLM-based systems operate as active agents capable of orchestrating and navigating multiple stages of the scientific discovery process with considerable independence. They can initiate research by formulating hypotheses, plan and execute experiments, analyze data, draw conclusions, and even propose new research avenues, requiring minimal human intervention throughout the cycle.
- Practical Implementations: Prototypes at this level are emerging, particularly in computational fields such as AI research. These systems typically employ agent-based frameworks to automate the entire research workflow, from literature review and idea generation through experimentation and data analysis to drafting research papers. Key distinctions lie in their "Idea Development" and "Iterative Refinement" strategies. Some systems start from human-defined objectives or reference papers (e.g., Agent Laboratory [schmidgall2025agentlaboratoryusingLLM], AI-Researcher [HKUDS2025AIResearcher]), while others generate diverse ideas from abstract prompts and evaluate their novelty internally (e.g., The AI Scientist [lu2024aiscientistfullyautomated] and AI Scientist-v2 [yamada2025aiscientistv2workshoplevelautomated]). Iterative refinement relies on feedback loops that are either primarily internal, using AI reviewers and evaluators (e.g., The AI Scientist), or that incorporate human expertise for macro-level guidance and re-evaluation of the core hypothesis (e.g., Zochi [Intology2025Zochi]). A skeleton of such a generate-experiment-review cycle is sketched in the third code example after this list.
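To ground Level 1, here is a minimal sketch of a retrieval-augmented literature QA tool in the spirit of PaperQA or LitLLM. It is illustrative only: `call_llm` is a hypothetical placeholder for any chat-completion client, and the keyword-overlap retriever stands in for the dense retrievers these systems actually use.

```python
# Level 1 sketch: retrieval-augmented literature QA (toy version).
# Assumptions: `call_llm` wraps some real LLM API; retrieval is naive
# keyword overlap rather than a proper dense retriever.

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: swap in a real chat-completion client."""
    raise NotImplementedError

def retrieve(question: str, abstracts: list[str], k: int = 3) -> list[str]:
    """Rank abstracts by word overlap with the question; keep the top k."""
    q_words = set(question.lower().split())
    ranked = sorted(abstracts, key=lambda a: -len(q_words & set(a.lower().split())))
    return ranked[:k]

def literature_qa(question: str, abstracts: list[str]) -> str:
    """Assemble retrieved excerpts into a grounded prompt; a human validates the answer."""
    context = "\n\n".join(retrieve(question, abstracts))
    prompt = (
        "Answer the question using only the excerpts below, citing them.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```

The human stays in the loop at both ends, posing the question and judging the answer, which is exactly what bounds Level-1 autonomy.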
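The Level-2 analyst pattern is essentially a propose-fit-score-refine loop. The toy below is loosely modeled on LLM-SR-style equation discovery; `propose_skeletons` is a hypothetical stand-in that enumerates fixed candidate forms instead of querying a model, while fitting and scoring use ordinary NumPy/SciPy.

```python
# Level 2 sketch: iterative equation discovery, loosely after LLM-SR.
# Assumption: in a real system the LLM proposes skeletons from domain
# knowledge and receives fit errors as feedback; here the proposals are fixed.

import numpy as np
from scipy.optimize import curve_fit

def propose_skeletons():
    """Hypothetical stand-in for LLM proposals: (name, f(x, *params), n_params)."""
    return [
        ("linear",      lambda x, a, b: a * x + b,               2),
        ("quadratic",   lambda x, a, b, c: a * x**2 + b * x + c, 3),
        ("exponential", lambda x, a, b: a * np.exp(b * x),       2),
    ]

def discover_equation(x, y):
    """Fit every proposed skeleton, score by mean squared error, keep the best."""
    best = None
    for name, f, n_params in propose_skeletons():
        try:
            params, _ = curve_fit(f, x, y, p0=np.ones(n_params), maxfev=5000)
        except RuntimeError:
            continue  # fit diverged; a real loop would feed this failure back to the LLM
        mse = float(np.mean((f(x, *params) - y) ** 2))
        if best is None or mse < best[0]:
            best = (mse, name, params)
    return best

x = np.linspace(0.0, 2.0, 50)
y = 1.5 * np.exp(0.8 * x)
print(discover_equation(x, y))  # the exponential skeleton should win with near-zero MSE
```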
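Finally, a Level-3 system closes the whole loop. The skeleton below is loosely patterned on the internal AI-reviewer cycle described for The AI Scientist; it shows control flow only, and `call_llm`, `run_experiment`, and the score-on-the-first-line convention are all hypothetical placeholders rather than the APIs of any surveyed system.

```python
# Level 3 sketch: an autonomous generate-experiment-review cycle.
# All stage functions are hypothetical placeholders for real agent components.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder for a real LLM client

def run_experiment(plan: str) -> str:
    raise NotImplementedError  # placeholder: execute generated code, collect metrics

def research_cycle(topic: str, max_rounds: int = 3, accept_score: float = 7.0) -> str:
    idea = call_llm(f"Propose a novel, testable research idea on: {topic}")
    draft, feedback = "", ""
    for _ in range(max_rounds):
        plan = call_llm(f"Design an experiment for this idea:\n{idea}\n{feedback}")
        results = run_experiment(plan)
        draft = call_llm(f"Write up the idea, method, and results:\n{idea}\n{plan}\n{results}")
        review = call_llm(f"Review this draft. Put a 0-10 score on the first line:\n{draft}")
        score = float(review.splitlines()[0])  # assumes the reviewer follows the prompt
        if score >= accept_score:
            break  # the internal reviewer is satisfied
        feedback = f"Reviewer critique to address:\n{review}"
    return draft  # in human-in-the-loop variants (e.g., Zochi), a person vets this
```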
Challenges and Future Directions
The survey identifies several pivotal challenges and future directions for LLMs in scientific discovery:
- Fully-Autonomous Research Cycle: Moving beyond single research instances to systems that can continuously iterate, identify new research questions based on findings, and strategically pursue long-term goals without human prompting.
- Robotic Automation: Integrating LLMs with physical robotic systems to enable autonomous experimentation in natural science domains (chemistry, biology, materials science), translating computational plans into physical actions.
- Transparency and Interpretability: Addressing the "black-box" nature of LLMs to ensure their reasoning is verifiable, conclusions are justifiable, and insights align with scientific principles, which is crucial for trust and validation.
- Continuous Self-Improvement: Developing systems that can learn from their research experiences, assimilate experimental outcomes, and adapt their strategies over time, potentially through online reinforcement learning frameworks [pmlr-v202-carta23a].
- Ethics and Societal Alignment: As AI systems gain autonomy, ethical constraints must be embedded in their design and operation to prevent misuse (e.g., generating harmful substances) and to ensure that advances serve human well-being.
The survey concludes by emphasizing that while significant progress has been made across all levels, the trajectory towards truly autonomous AI scientists presents complex challenges requiring further research, particularly in integrating physical world interaction, ensuring transparency, enabling continuous learning, and establishing robust ethical governance frameworks.