SciToolAgent: A Knowledge Graph-Driven Scientific Agent for Multi-Tool Integration (2507.20280v1)
Abstract: Scientific research increasingly relies on specialized computational tools, yet effectively utilizing these tools demands substantial domain expertise. While LLMs show promise in tool automation, they struggle to seamlessly integrate and orchestrate multiple tools for complex scientific workflows. Here, we present SciToolAgent, an LLM-powered agent that automates hundreds of scientific tools across biology, chemistry, and materials science. At its core, SciToolAgent leverages a scientific tool knowledge graph that enables intelligent tool selection and execution through graph-based retrieval-augmented generation. The agent also incorporates a comprehensive safety-checking module to ensure responsible and ethical tool usage. Extensive evaluations on a curated benchmark demonstrate that SciToolAgent significantly outperforms existing approaches. Case studies in protein engineering, chemical reactivity prediction, chemical synthesis, and metal-organic framework screening further demonstrate SciToolAgent's capability to automate complex scientific workflows, making advanced research tools accessible to both experts and non-experts.
Summary
- The paper presents a knowledge graph-driven agent that integrates hundreds of scientific tools using LLM-based planning and execution.
- It details a modular architecture with Planner, Executor, and Summarizer modules that automate multi-step workflows while ensuring safety.
- Empirical results demonstrate a 94% overall accuracy, outperforming baseline methods in complex, multi-tool scientific tasks.
Motivation and Problem Statement
The increasing reliance on specialized computational tools in scientific research has led to significant challenges in tool orchestration, especially for complex, multi-step workflows in domains such as biology, chemistry, and materials science. While LLMs have demonstrated utility in automating tool usage, existing agent frameworks are limited by small toolsets, lack of explicit modeling of tool dependencies, and insufficient attention to safety and ethical considerations. These limitations hinder the automation of advanced scientific workflows and restrict accessibility for non-experts.
SciToolAgent Architecture and Methodology
SciToolAgent is an LLM-powered agent designed to automate the integration and execution of hundreds of scientific tools. The core innovation is the Scientific Tool Knowledge Graph (SciToolKG), which encodes tool functionalities, input/output formats, dependencies, and safety levels. The agent architecture comprises three LLM-driven modules (Planner, Executor, and Summarizer) plus an integrated safety-checking module.
Figure 1: Overview of SciToolAgent, including the toolset, SciToolKG schema, and the end-to-end workflow from user query to solution synthesis.
SciToolKG Construction
SciToolKG is a directed graph G=(V,E), where nodes represent tools and attributes, and edges encode relationships such as functional dependencies and compatibility. The graph is manually curated, with attributes derived from tool documentation and expert input. This explicit modeling enables the agent to reason about tool selection and sequencing, overcoming the limitations of naive in-context learning.
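A minimal sketch of how such a graph might be held in memory. The attribute names (`input_format`, `output_format`, `safety_level`), the relation label `feeds_into`, and the example formats are hypothetical, and attributes are stored on tool nodes rather than as separate attribute nodes for brevity. Only the tool names ESMFold and DSSP come from the paper's case studies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToolNode:
    # Hypothetical attribute names mirroring the kinds of metadata SciToolKG records.
    name: str
    description: str
    input_format: str
    output_format: str
    safety_level: str = "low"


@dataclass
class ToolGraph:
    """Directed graph G = (V, E): tools are nodes; curated, typed relations are edges."""
    nodes: Dict[str, ToolNode] = field(default_factory=dict)
    edges: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def add_tool(self, tool: ToolNode) -> None:
        self.nodes[tool.name] = tool
        self.edges.setdefault(tool.name, [])

    def add_relation(self, src: str, relation: str, dst: str) -> None:
        # Edges are entered by a curator rather than inferred (the manual-curation step).
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown tool in edge {src!r} -> {dst!r}")
        self.edges[src].append((relation, dst))

    def successors(self, name: str, relation: Optional[str] = None) -> List[str]:
        # Tools reachable in one step, optionally filtered by relation type.
        return [dst for rel, dst in self.edges.get(name, []) if relation is None or rel == relation]


kg = ToolGraph()
kg.add_tool(ToolNode("ESMFold", "predict 3D structure from a protein sequence", "protein_sequence", "pdb"))
kg.add_tool(ToolNode("DSSP", "assign secondary structure from a 3D structure", "pdb", "ss_string"))
kg.add_relation("ESMFold", "feeds_into", "DSSP")
print(kg.successors("ESMFold"))  # ['DSSP']
```

Explicit, typed edges are what let a planner ask "what can consume this output?" directly, instead of hoping the LLM infers it from tool descriptions in its context window.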
Planner: Retrieval-Augmented Chain-of-Tools Generation
The Planner applies retrieval-augmented generation over SciToolKG to identify and sequence the tools relevant to a given query. The process involves:
- Full-graph retrieval: Semantic similarity between the query and tool metadata is computed to select top-k candidate tools.
- Sub-graph exploration: d-hop neighborhoods are explored to identify additional tools required for multi-step workflows.
- Tool combination and ranking: Candidate tools are ranked by combined similarity scores, optimizing for both relevance and complementarity.
- Chain-of-tools generation: LLMs generate an ordered sequence of tools, respecting dependencies and operational constraints.
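The four planning steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the paper's implementation. A bag-of-words cosine stands in for the semantic-embedding model. The combined score (own similarity plus a decayed share of the parent's score) is a hypothetical weighting. The final LLM ordering step is replaced by a deterministic topological sort over the (hypothetical) `feeds_into` dependency edges.

```python
import math
import re
from collections import Counter, deque
from graphlib import TopologicalSorter


def embed(text):
    # Stand-in for a sentence-embedding model: lowercase bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def plan(query, tools, feeds_into, k=2, d=1, decay=0.5, top_n=4):
    """tools: name -> description; feeds_into: name -> set of downstream tool names."""
    q = embed(query)
    sims = {name: cosine(q, embed(desc)) for name, desc in tools.items()}

    # 1. Full-graph retrieval: top-k tools by query/description similarity.
    seeds = sorted(sims, key=sims.get, reverse=True)[:k]

    # 2. Sub-graph exploration: breadth-first search up to d hops, ignoring edge direction.
    neighbors = {name: set() for name in tools}
    for src, dsts in feeds_into.items():
        for dst in dsts:
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    scores = {s: sims[s] for s in seeds}
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, hop = frontier.popleft()
        if hop >= d:
            continue
        for nb in sorted(neighbors[node]):
            if nb not in scores:
                # 3. Combined score (hypothetical weighting): the tool's own similarity
                #    plus a decayed share of the score of the tool that led to it.
                scores[nb] = sims[nb] + decay * scores[node]
                frontier.append((nb, hop + 1))

    selected = sorted(scores, key=scores.get, reverse=True)[:top_n]

    # 4. Chain of tools: topological order, so each tool runs after the tools feeding it.
    deps = {t: {u for u in selected if t in feeds_into.get(u, ())} for t in selected}
    return list(TopologicalSorter(deps).static_order())


tools = {
    "Chroma": "generate a novel protein sequence",
    "ESMFold": "predict protein 3D structure from sequence",
    "DSSP": "assign secondary structure from 3D structure",
    "ProteinForceGPT": "predict mechanical stability of a protein sequence",
    "MolWeightCalc": "compute molecular weight of a SMILES string",  # hypothetical distractor
}
feeds_into = {"Chroma": {"ESMFold", "ProteinForceGPT"}, "ESMFold": {"DSSP"}}
chain = plan("design a new protein sequence and report its secondary structure", tools, feeds_into)
print(chain)
```

With these toy descriptions, DSSP and ProteinForceGPT are not retrieved directly. They enter only through the 1-hop expansion, while the unrelated `MolWeightCalc` is left out. This is the kind of gap that sub-graph exploration is meant to close.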
Executor: Tool Invocation and Safety Monitoring
The Executor manages input preparation, tool invocation, error handling, and safety checks. Inputs are formatted per SciToolKG specifications, and outputs are processed for compatibility with downstream tools. The safety module cross-references outputs with a safeguard database (e.g., PubChem for hazardous compounds, UniProtKB for toxic proteins) using molecular similarity metrics (Tanimoto, Dice, Cosine) and sequence alignment (Smith-Waterman). Outputs exceeding a similarity threshold are flagged as potentially dangerous.
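A minimal sketch of the similarity-based safety check, under several assumptions. Molecular fingerprints are represented as hypothetical sets of on-bit indices (a real pipeline would compute them with a cheminformatics toolkit). Smith-Waterman uses simple match/mismatch/linear-gap scores instead of a substitution matrix such as BLOSUM62. Sequence scores are normalized by the hazard's self-alignment score, and the 0.8 threshold and entry names are placeholders. The PubChem and UniProtKB lookups themselves are not shown.

```python
import math


def tanimoto(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: set, b: set) -> float:
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0


def cosine(a: set, b: set) -> float:
    # Cosine similarity of binary fingerprint vectors.
    return len(a & b) / math.sqrt(len(a) * len(b)) if (a and b) else 0.0


def smith_waterman(s: str, t: str, match=2, mismatch=-1, gap=-2) -> int:
    """Best local-alignment score with a linear gap penalty (two-row dynamic programming)."""
    prev, best = [0] * (len(t) + 1), 0
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def flag_molecule(fp, hazard_fps, threshold=0.8, metric=tanimoto):
    # Flag every hazard whose fingerprint similarity meets the threshold.
    return [name for name, h in hazard_fps.items() if metric(fp, h) >= threshold]


def flag_sequence(seq, hazard_seqs, threshold=0.8):
    hits = []
    for name, h in hazard_seqs.items():
        # Normalize by the hazard's self-alignment so 1.0 means the hazard is fully contained.
        if smith_waterman(seq, h) / smith_waterman(h, h) >= threshold:
            hits.append(name)
    return hits


hazard_fps = {"hazard_mol_1": {1, 5, 9, 12, 40}}   # hypothetical entry
candidate_fp = {1, 5, 9, 12, 41}
print(flag_molecule(candidate_fp, hazard_fps, metric=tanimoto))  # []
print(flag_molecule(candidate_fp, hazard_fps, metric=dice))      # ['hazard_mol_1']
```

The same pair scores 0.67 under Tanimoto but 0.8 under Dice and cosine, so the choice of metric shifts what a fixed threshold flags. Checking several metrics is a conservative choice for a safety filter.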
Summarizer: Output Synthesis and Iterative Refinement
The Summarizer integrates outputs from the tool chain, verifies consistency, and generates the final response. If the solution is unsatisfactory, the Summarizer prompts the Planner for plan refinement, enabling iterative improvement.
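The Planner-Executor-Summarizer cycle with iterative refinement can be sketched as a bounded loop. The callables, the `Verdict` type, and `max_rounds` are hypothetical. In SciToolAgent each role is an LLM-driven module, whereas here they are plain functions so that the control flow is visible.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    satisfactory: bool
    answer: str
    feedback: str = ""


def run_agent(query, planner, executor, summarizer, max_rounds=3):
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback, history = "", []
    for _ in range(max_rounds):
        chain = planner(query, feedback)              # Planner: (re)build the chain of tools
        outputs = executor(chain)                     # Executor: run tools and safety checks
        verdict = summarizer(query, chain, outputs)   # Summarizer: synthesize and judge
        history.append((chain, verdict))
        if verdict.satisfactory:
            break
        feedback = verdict.feedback                   # route the critique back to the Planner
    return verdict.answer, history


# Toy stand-ins: the first plan omits a needed step, and the refined plan adds it.
def planner(query, feedback):
    return ["ToolA"] if not feedback else ["ToolA", "ToolB"]


def executor(chain):
    return {tool: f"{tool} output" for tool in chain}


def summarizer(query, chain, outputs):
    if "ToolB" in outputs:
        return Verdict(True, " + ".join(outputs.values()))
    return Verdict(False, "", feedback="missing a downstream analysis step")


answer, history = run_agent("toy query", planner, executor, summarizer)
print(answer, len(history))  # ToolA output + ToolB output 2
```

Capping the number of rounds keeps the cost of re-planning bounded when the Summarizer never becomes satisfied.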
Foundation Models and Fine-Tuning
SciToolAgent supports both proprietary (OpenAI GPT-4o, o1) and open-source (Qwen2.5-72B, Qwen2.5-7B) LLMs. Fine-tuning of Qwen2.5-7B with LoRA and domain-specific instructions significantly improves tool planning and execution, though a performance gap remains relative to larger models.
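For reference, standard LoRA freezes a weight matrix W0 (d x k) and learns a low-rank update, so that y = W0 x + (alpha / r) B A x with B of shape d x r and A of shape r x k. The sketch below shows that forward pass and the resulting trainable-parameter count in plain Python. The rank, scaling, and 4096 x 4096 layer size are illustrative assumptions, not the paper's fine-tuning configuration.

```python
def matmul(X, Y):
    # Naive matrix product on nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_forward(x, W0, A, B, alpha, r):
    """y = W0 x + (alpha / r) * B (A x); W0 stays frozen, only A and B are trained."""
    base = matmul(W0, x)
    delta = matmul(B, matmul(A, x))
    scale = alpha / r
    return [[b + scale * dv for b, dv in zip(rb, rd)] for rb, rd in zip(base, delta)]


def lora_param_counts(d, k, r):
    # Full fine-tuning of one d x k matrix vs. the r * (d + k) LoRA factors.
    return d * k, r * (d + k)


full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, f"{lora / full:.4%}")  # 16777216 65536 0.3906%
```

Because B is usually initialized to zero, the adapted layer initially reproduces the frozen model exactly. Training then moves only the small factors, which is what makes fine-tuning a 7B model affordable.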
Empirical Evaluation
SciToolEval Benchmark
A new benchmark, SciToolEval, was constructed with 531 scientific questions spanning single-tool (Level-1) and multi-tool (Level-2) tasks across multiple domains. Evaluation metrics include Pass Rate, Tool Planning Accuracy, and Final Answer Accuracy, with GPT-4o used for similarity-based answer evaluation.
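A minimal sketch of how the three metrics could be computed from per-question records. The definitions are assumptions for illustration: a pass means the run finished without a fatal tool error, planning accuracy uses exact match of the ordered tool chain, and answer correctness is taken as already judged (for example by an LLM grader). The records are toy inputs, not benchmark data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    executed_ok: bool            # run finished without a fatal tool error
    predicted_tools: List[str]   # ordered chain produced by the agent
    reference_tools: List[str]   # ordered chain in the benchmark annotation
    answer_correct: bool         # e.g. as judged by an LLM grader


def evaluate(records: List[Record]) -> dict:
    if not records:
        raise ValueError("no records to evaluate")
    n = len(records)
    return {
        "pass_rate": sum(r.executed_ok for r in records) / n,
        "tool_planning_accuracy": sum(r.predicted_tools == r.reference_tools for r in records) / n,
        "final_answer_accuracy": sum(r.answer_correct for r in records) / n,
    }


toy = [
    Record(True, ["A", "B"], ["A", "B"], True),
    Record(True, ["A"], ["A", "B"], False),
    Record(False, [], ["C"], False),
    Record(True, ["C"], ["C"], True),
]
print(evaluate(toy))
# {'pass_rate': 0.75, 'tool_planning_accuracy': 0.5, 'final_answer_accuracy': 0.5}
```

Separating the three metrics shows where a run fails. An agent can pass (no crash) while picking the wrong tools, or pick the right tools and still mis-summarize the answer.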
Comparative Results
Figure 2: SciToolAgent outperforms ReAct, Reflexion, ChemCrow, and CACTUS on SciToolEval in both single-tool and multi-tool settings, with the largest gains in complex, multi-step workflows.
SciToolAgent achieves an overall accuracy of 94%, surpassing state-of-the-art baselines by 10%. In Level-2 (multi-tool) tasks, it outperforms ReAct and Reflexion by 20% and 10% in final answer accuracy, respectively. The explicit modeling of tool dependencies via SciToolKG is critical for these gains. Among foundation models, OpenAI o1 yields the highest accuracy, while GPT-4o offers the best cost-performance trade-off.
Case Studies: Automated Scientific Workflows
Protein Design and Analysis
Figure 3: SciToolAgent autonomously orchestrates sequence generation, stability prediction, structure modeling, and secondary structure analysis for protein design.
The agent integrates Chroma for sequence generation, ProteinForceGPT for mechanical stability, ESMFold for structure prediction, ANM/ProDy for vibrational analysis, and DSSP for secondary structure, producing reliable, multi-faceted outputs. Baseline agents failed due to incorrect or missing tool selection.
Machine Learning-Based Chemical Reactivity Prediction
Figure 4: SciToolAgent automates feature engineering, model selection, and evaluation for chemical reactivity prediction, identifying optimal descriptors and algorithms.
The agent evaluates multiple molecular features and ML algorithms, determining that electrical descriptors and Random Forest yield the highest accuracy. Baseline agents suffered from tool redundancy and hallucinations.
Chemical Synthesis and Safety Analysis
Figure 5: SciToolAgent predicts reaction outcomes, generates product descriptions, checks patents, and performs safety assessments, issuing warnings for toxic products.
The safety module flags the hazardous products and issues warnings; the baseline agents, which lack such a module, failed to identify the toxic products.
MOF Materials Screening
Figure 6: SciToolAgent automates multi-criteria screening of MOFs, integrating property prediction, simulation, and market data retrieval.
The agent identifies candidates meeting thermal stability, adsorption, and price constraints, with structure visualization supporting downstream analysis. Baseline agents encountered input errors and hallucinations.
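The multi-criteria step of this workflow reduces to filtering on hard constraints and then ranking the survivors. In the sketch below, the field names, thresholds, and all candidate values are made-up placeholders. In the case study these properties come from prediction, simulation, and market-data tools rather than a hard-coded list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MOF:
    name: str
    decomposition_temp_c: float   # thermal-stability proxy
    uptake_mmol_per_g: float      # gas adsorption capacity
    price_usd_per_g: float        # market price


def screen(candidates: List[MOF], min_temp: float, min_uptake: float, max_price: float) -> List[MOF]:
    # Keep only candidates that satisfy every constraint, then rank by adsorption.
    passing = [
        m for m in candidates
        if m.decomposition_temp_c >= min_temp
        and m.uptake_mmol_per_g >= min_uptake
        and m.price_usd_per_g <= max_price
    ]
    return sorted(passing, key=lambda m: m.uptake_mmol_per_g, reverse=True)


# Hypothetical placeholder candidates, not real MOF data.
candidates = [
    MOF("MOF-A", 450, 6.1, 12.0),
    MOF("MOF-B", 320, 7.4, 8.0),    # fails the thermal constraint
    MOF("MOF-C", 500, 5.2, 30.0),   # fails the price constraint
    MOF("MOF-D", 410, 8.3, 15.0),
]
print([m.name for m in screen(candidates, min_temp=400, min_uptake=5.0, max_price=20.0)])
# ['MOF-D', 'MOF-A']
```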
Implementation Considerations and Limitations
- Manual SciToolKG construction: While effective, manual curation limits scalability. Automated knowledge extraction from literature and tool documentation is a promising direction.
- Model accessibility: Proprietary LLMs offer superior performance but may be inaccessible in resource-constrained settings. Fine-tuned open-source models partially bridge this gap.
- Extensibility: Standardized APIs and templates facilitate third-party tool integration. GUI-based registration is planned to lower barriers for non-programmers.
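To illustrate how a standardized template can ease third-party integration, here is a minimal sketch of a registration step. It validates a tool specification and proposes compatibility edges by matching one tool's output format to another's input format. The required fields, safety levels, and format strings are hypothetical, and the proposed edges would still go to a curator, consistent with the manual curation described earlier.

```python
from typing import Dict, List, Tuple

REQUIRED_FIELDS = {"name", "description", "input_format", "output_format", "safety_level"}
SAFETY_LEVELS = {"low", "medium", "high"}   # hypothetical levels


def validate_spec(spec: dict) -> List[str]:
    errors = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - spec.keys())]
    if "safety_level" in spec and spec["safety_level"] not in SAFETY_LEVELS:
        errors.append(f"invalid safety_level: {spec['safety_level']!r}")
    return errors


def register(registry: Dict[str, dict], spec: dict) -> List[Tuple[str, str]]:
    """Add a tool spec and return proposed (upstream, downstream) edges for review."""
    errors = validate_spec(spec)
    if errors:
        raise ValueError("; ".join(errors))
    registry[spec["name"]] = spec
    proposed = []
    for other in registry.values():
        if other["name"] == spec["name"]:
            continue
        if other["output_format"] == spec["input_format"]:
            proposed.append((other["name"], spec["name"]))
        if spec["output_format"] == other["input_format"]:
            proposed.append((spec["name"], other["name"]))
    return proposed


registry: Dict[str, dict] = {}
register(registry, {"name": "ESMFold", "description": "structure prediction",
                    "input_format": "protein_sequence", "output_format": "pdb", "safety_level": "low"})
edges = register(registry, {"name": "DSSP", "description": "secondary structure assignment",
                            "input_format": "pdb", "output_format": "ss_string", "safety_level": "low"})
print(edges)  # [('ESMFold', 'DSSP')]
```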
- Resource requirements: Multi-tool orchestration and LLM inference incur nontrivial computational and API costs, especially for large models and complex workflows.
Theoretical and Practical Implications
SciToolAgent demonstrates that explicit knowledge graph-driven tool orchestration, combined with LLM-based planning and execution, enables robust automation of complex scientific workflows. The integration of safety checks addresses critical ethical concerns in automated scientific discovery. The approach generalizes across domains and is extensible to new tools and tasks, provided the knowledge graph is maintained.
Theoretically, the work highlights the importance of structured, symbolic representations (SciToolKG) in augmenting LLM-based agents for compositional reasoning and planning. Practically, SciToolAgent lowers the barrier to advanced scientific research, making sophisticated computational tools accessible to a broader audience, including non-experts.
Future Directions
- Automated SciToolKG maintenance: Leveraging NLP and information extraction to scale and update the knowledge graph.
- Enhanced open-source LLMs: Further fine-tuning and instruction engineering to close the performance gap with proprietary models.
- Broader tool integration: Expanding to additional scientific domains and supporting more complex, cross-domain workflows.
- User interface improvements: GUI-based tool registration and workflow visualization to improve usability.
Conclusion
SciToolAgent establishes a robust framework for LLM-driven, knowledge graph-augmented scientific tool integration, achieving state-of-the-art performance on complex, multi-tool scientific tasks. The explicit modeling of tool dependencies and safety, combined with modular LLM-based planning and execution, enables reliable automation of advanced workflows. While challenges remain in scalability and model accessibility, the approach provides a strong foundation for democratizing scientific research and accelerating discovery through AI-driven automation.