This paper, "From Automation to Autonomy: A Survey on LLMs in Scientific Discovery" (Zheng et al., 19 May 2025 ), surveys the evolving role of LLMs in scientific research, charting their progression from basic automation tools to increasingly autonomous agents. The authors introduce a three-level taxonomy based on increasing autonomy: LLM as Tool (Level 1), LLM as Analyst (Level 2), and LLM as Scientist (Level 3). This framework is used to analyze LLM applications across the six stages of the scientific method: observation/problem definition, hypothesis development, experimentation/data collection, data analysis/interpretation, drawing conclusions, and iteration/refinement.
Three Levels of Autonomy
- LLM as Tool (Level 1): At this most basic level, LLMs automate specific, well-defined tasks within a single stage of the scientific method, augmenting human researchers by accelerating routine activities under direct supervision. Examples include summarizing literature, generating simple code snippets, or reformatting data. Autonomy is low: the model requires explicit prompts, and humans validate every output, so the focus is on enhancing human efficiency rather than replacing human judgment.
- Practical Implementations: Work at this level applies LLMs to automated literature search via Retrieval-Augmented Generation (RAG) frameworks (e.g., PaperQA [Lala2023PaperQA], LitLLM [agarwal2024litLLMtoolkitscientificliterature]); information aggregation into tables (e.g., ArxivDIGESTables [newman2024arxivdigestablessynthesizingscientificliterature]); generating initial research ideas or drafting hypothesis statements (e.g., IdeaBench [guo2024ideabenchbenchmarkinglargelanguage]); assisting experiment planning by generating protocols or executable code (e.g., BioPlanner [odonoghue2023bioplannerautomaticevaluationLLMs], SciCode [tian2024scicoderesearchcodingbenchmark]); organizing and analyzing data in tables (e.g., Chain-of-Table [wang2024chainoftableevolvingtablesreasoning]) and charts (e.g., ChartQA [masry2022chartqabenchmarkquestionanswering]); and providing feedback on or critiquing conclusions (e.g., ReviewerGPT [liu2023reviewergptexploratorystudyusing]). Tools such as AutomaTikZ [belouadi2024automatikztextguidedsynthesisscientific], which generates scientific figures from text, also fall here. A minimal sketch of such a RAG workflow appears in the first code example after this list.
- LLM as Analyst (Level 2): LLMs demonstrate increased autonomy here, functioning as passive agents that handle more complex information processing, data modeling, and analytical reasoning across a sequence of tasks, often with reduced human intervention at intermediate steps. They can analyze datasets, interpret simulation outputs, and iteratively refine models toward human-defined goals, while human researchers still define the problem and validate the final insights.
- Practical Implementations: This level typically involves LLM-based agents that orchestrate tool use across data analysis pipelines. Applications span several domains: automated machine learning (AutoML) experiment design and execution (e.g., MLAgentBench [huang2024mlagentbenchevaluatinglanguageagents], IMPROVE [xue2025improveiterativemodelpipeline]); statistical data modeling and hypothesis validation with code-executing agents (e.g., InfiAgent-DABench [hu2024infiagentdabenchevaluatingagentsdata], BLADE [gu2024bladebenchmarkinglanguagemodel]); and function discovery (symbolic regression) that leverages domain knowledge and iterative refinement (e.g., LLM-SR [shojaee2025LLMsrscientificequationdiscovery]). In the natural sciences, agents are explored for causal graph discovery in chemistry and social science, and for end-to-end biomedical research combining wet- and dry-lab experiments (e.g., Coscientist [boiko2023autonomous], BioResearcher [luo2024intentionimplementationautomatingbiomedical]). General benchmarks such as DiscoveryWorld [jansen2024discoveryworldvirtualenvironmentdeveloping] evaluate multi-stage agent capabilities. The propose-fit-score refinement loop behind LLM-SR-style equation discovery is sketched in the second code example after this list.
- LLM as Scientist (Level 3): This level represents a significant leap, where LLM-based systems operate as active agents capable of orchestrating and navigating multiple stages of the scientific discovery process with considerable independence. They can initiate research by formulating hypotheses, plan and execute experiments, analyze data, draw conclusions, and even propose new research avenues, requiring minimal human intervention throughout the cycle.
- Practical Implementations: Prototypes at this level are emerging, particularly in computational fields such as AI research. These systems typically employ agent-based frameworks to automate the entire research workflow, from literature review and idea generation through experimentation and data analysis to drafting research papers. Key distinctions lie in their "Idea Development" and "Iterative Refinement" strategies. Some systems start from human-defined objectives or reference papers (e.g., Agent Laboratory [schmidgall2025agentlaboratoryusingLLM], AI-Researcher [HKUDS2025AIResearcher]), while others generate diverse ideas from abstract prompts and evaluate their novelty internally (e.g., The AI Scientist [lu2024aiscientistfullyautomated] and AI Scientist-v2 [yamada2025aiscientistv2workshoplevelautomated]). Iterative refinement relies on feedback loops that are either primarily internal, using AI reviewers and evaluators (e.g., The AI Scientist), or that incorporate human expertise for macro-level guidance and re-evaluation of the core hypothesis (e.g., Zochi [Intology2025Zochi]). A skeleton of such a generate-experiment-review cycle is sketched in the third code example after this list.
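To ground Level 1, here is a minimal sketch of a retrieval-augmented literature QA tool in the spirit of PaperQA or LitLLM. It is illustrative only: `call_llm` is a hypothetical placeholder for any chat-completion client, and the keyword-overlap retriever stands in for the dense retrievers these systems actually use.

```python
# Level 1 sketch: retrieval-augmented literature QA (toy version).
# Assumptions: `call_llm` wraps some real LLM API; retrieval is naive
# keyword overlap rather than a proper dense retriever.

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder: swap in a real chat-completion client."""
    raise NotImplementedError

def retrieve(question: str, abstracts: list[str], k: int = 3) -> list[str]:
    """Rank abstracts by word overlap with the question; keep the top k."""
    q_words = set(question.lower().split())
    ranked = sorted(abstracts, key=lambda a: -len(q_words & set(a.lower().split())))
    return ranked[:k]

def literature_qa(question: str, abstracts: list[str]) -> str:
    """Assemble retrieved excerpts into a grounded prompt; a human validates the answer."""
    context = "\n\n".join(retrieve(question, abstracts))
    prompt = (
        "Answer the question using only the excerpts below, citing them.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```

The human stays in the loop at both ends, posing the question and judging the answer, which is exactly what bounds Level-1 autonomy.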
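The Level-2 analyst pattern is essentially a propose-fit-score-refine loop. The toy below is loosely modeled on LLM-SR-style equation discovery; `propose_skeletons` is a hypothetical stand-in that enumerates fixed candidate forms instead of querying a model, while fitting and scoring use ordinary NumPy/SciPy.

```python
# Level 2 sketch: iterative equation discovery, loosely after LLM-SR.
# Assumption: in a real system the LLM proposes skeletons from domain
# knowledge and receives fit errors as feedback; here the proposals are fixed.

import numpy as np
from scipy.optimize import curve_fit

def propose_skeletons():
    """Hypothetical stand-in for LLM proposals: (name, f(x, *params), n_params)."""
    return [
        ("linear",      lambda x, a, b: a * x + b,               2),
        ("quadratic",   lambda x, a, b, c: a * x**2 + b * x + c, 3),
        ("exponential", lambda x, a, b: a * np.exp(b * x),       2),
    ]

def discover_equation(x, y):
    """Fit every proposed skeleton, score by mean squared error, keep the best."""
    best = None
    for name, f, n_params in propose_skeletons():
        try:
            params, _ = curve_fit(f, x, y, p0=np.ones(n_params), maxfev=5000)
        except RuntimeError:
            continue  # fit diverged; a real loop would feed this failure back to the LLM
        mse = float(np.mean((f(x, *params) - y) ** 2))
        if best is None or mse < best[0]:
            best = (mse, name, params)
    return best

x = np.linspace(0.0, 2.0, 50)
y = 1.5 * np.exp(0.8 * x)
print(discover_equation(x, y))  # the exponential skeleton should win with near-zero MSE
```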
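Finally, a Level-3 system closes the whole loop. The skeleton below is loosely patterned on the internal AI-reviewer cycle described for The AI Scientist; it shows control flow only, and `call_llm`, `run_experiment`, and the score-on-the-first-line convention are all hypothetical placeholders rather than the APIs of any surveyed system.

```python
# Level 3 sketch: an autonomous generate-experiment-review cycle.
# All stage functions are hypothetical placeholders for real agent components.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder for a real LLM client

def run_experiment(plan: str) -> str:
    raise NotImplementedError  # placeholder: execute generated code, collect metrics

def research_cycle(topic: str, max_rounds: int = 3, accept_score: float = 7.0) -> str:
    idea = call_llm(f"Propose a novel, testable research idea on: {topic}")
    draft, feedback = "", ""
    for _ in range(max_rounds):
        plan = call_llm(f"Design an experiment for this idea:\n{idea}\n{feedback}")
        results = run_experiment(plan)
        draft = call_llm(f"Write up the idea, method, and results:\n{idea}\n{plan}\n{results}")
        review = call_llm(f"Review this draft. Put a 0-10 score on the first line:\n{draft}")
        score = float(review.splitlines()[0])  # assumes the reviewer follows the prompt
        if score >= accept_score:
            break  # the internal reviewer is satisfied
        feedback = f"Reviewer critique to address:\n{review}"
    return draft  # in human-in-the-loop variants (e.g., Zochi), a person vets this
```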
Challenges and Future Directions
The survey identifies several pivotal challenges and future directions for LLMs in scientific discovery:
- Fully-Autonomous Research Cycle: Moving beyond single research instances to systems that can continuously iterate, identify new research questions based on findings, and strategically pursue long-term goals without human prompting.
- Robotic Automation: Integrating LLMs with physical robotic systems to enable autonomous experimentation in natural science domains (chemistry, biology, materials science), translating computational plans into physical actions.
- Transparency and Interpretability: Addressing the "black-box" nature of LLMs to ensure their reasoning is verifiable, conclusions are justifiable, and insights align with scientific principles, which is crucial for trust and validation.
- Continuous Self-Improvement: Developing systems that can learn from their research experiences, assimilate experimental outcomes, and adapt their strategies over time, potentially through online reinforcement learning frameworks [pmlr-v202-carta23a].
- Ethics and Societal Alignment: As AI systems gain autonomy, ethical constraints must be embedded in their design and operation to prevent misuse (e.g., generating harmful substances) and to ensure that advances serve human well-being.
The survey concludes by emphasizing that while significant progress has been made across all levels, the trajectory towards truly autonomous AI scientists presents complex challenges requiring further research, particularly in integrating physical world interaction, ensuring transparency, enabling continuous learning, and establishing robust ethical governance frameworks.