SciToolAgent: A Knowledge Graph-Driven Scientific Agent for Multi-Tool Integration (2507.20280v1)

Published 27 Jul 2025 in cs.AI and cs.CL

Abstract: Scientific research increasingly relies on specialized computational tools, yet effectively utilizing these tools demands substantial domain expertise. While LLMs show promise in tool automation, they struggle to seamlessly integrate and orchestrate multiple tools for complex scientific workflows. Here, we present SciToolAgent, an LLM-powered agent that automates hundreds of scientific tools across biology, chemistry, and materials science. At its core, SciToolAgent leverages a scientific tool knowledge graph that enables intelligent tool selection and execution through graph-based retrieval-augmented generation. The agent also incorporates a comprehensive safety-checking module to ensure responsible and ethical tool usage. Extensive evaluations on a curated benchmark demonstrate that SciToolAgent significantly outperforms existing approaches. Case studies in protein engineering, chemical reactivity prediction, chemical synthesis, and metal-organic framework screening further demonstrate SciToolAgent's capability to automate complex scientific workflows, making advanced research tools accessible to both experts and non-experts.

Summary

The paper presents a knowledge graph-driven agent that integrates hundreds of scientific tools using LLM-based planning and execution.
It details a modular architecture with Planner, Executor, and Summarizer modules that automate multi-step workflows while ensuring safety.
Empirical results show 94% overall accuracy on the SciToolEval benchmark, outperforming baseline methods on complex, multi-tool scientific tasks.


Motivation and Problem Statement

The increasing reliance on specialized computational tools in scientific research has led to significant challenges in tool orchestration, especially for complex, multi-step workflows in domains such as biology, chemistry, and materials science. While LLMs have demonstrated utility in automating tool usage, existing agent frameworks are limited by small toolsets, lack of explicit modeling of tool dependencies, and insufficient attention to safety and ethical considerations. These limitations hinder the automation of advanced scientific workflows and restrict accessibility for non-experts.

SciToolAgent Architecture and Methodology

SciToolAgent is an LLM-powered agent designed to automate the integration and execution of hundreds of scientific tools. The core innovation is the Scientific Tool Knowledge Graph (SciToolKG), which encodes tool functionalities, input/output formats, dependencies, and safety levels. The agent architecture comprises three LLM-driven modules: Planner, Executor, and Summarizer, with an integrated safety-checking module.

Figure 1: Overview of SciToolAgent, including the toolset, SciToolKG schema, and the end-to-end workflow from user query to solution synthesis.

SciToolKG Construction

SciToolKG is a directed graph $G = (V, E)$, where nodes represent tools and attributes, and edges encode relationships such as functional dependencies and compatibility. The graph is manually curated, with attributes derived from tool documentation and expert input. This explicit modeling enables the agent to reason about tool selection and sequencing, overcoming the limitations of naive in-context learning.
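
As a rough illustration of the kind of structure described above, the following minimal sketch stores tools as nodes and labelled relationships as directed edges. It is not the SciToolKG schema: the class names, field names, tool names, and the `feeds_into` relation are all hypothetical, and attributes are kept as node fields rather than as separate attribute nodes for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ToolNode:
    """One tool in the graph. All field names are illustrative assumptions."""
    name: str
    description: str
    input_format: str
    output_format: str
    safety_level: str = "safe"


@dataclass
class ToolGraph:
    """Directed graph G = (V, E) with labelled edges between tools."""
    nodes: dict = field(default_factory=dict)  # tool name -> ToolNode
    edges: dict = field(default_factory=dict)  # tool name -> [(relation, target)]

    def add_tool(self, tool: ToolNode) -> None:
        self.nodes[tool.name] = tool
        self.edges.setdefault(tool.name, [])

    def add_edge(self, source: str, relation: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be registered tools")
        self.edges[source].append((relation, target))

    def successors(self, name: str) -> list:
        """Tools reachable from `name` by one outgoing edge."""
        return [target for _, target in self.edges.get(name, [])]

    def compatible(self, source: str, target: str) -> bool:
        """True when the source's output format matches the target's input format."""
        return self.nodes[source].output_format == self.nodes[target].input_format


# Hypothetical two-tool graph for illustration only.
kg = ToolGraph()
kg.add_tool(ToolNode("SequenceGenerator", "generate a protein sequence", "text", "sequence"))
kg.add_tool(ToolNode("StructurePredictor", "predict 3D structure from a sequence", "sequence", "pdb"))
kg.add_edge("SequenceGenerator", "feeds_into", "StructurePredictor")
```

Keeping input and output formats on the nodes is what lets a planner check, before execution, whether one tool's output can feed the next.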

Planner: Retrieval-Augmented Chain-of-Tools Generation

The Planner applies retrieval-augmented generation over SciToolKG to identify and sequence the tools relevant to a given query. The process has four steps:

Full-graph retrieval: Semantic similarity between the query and tool metadata is computed to select top-$k$ candidate tools.
Sub-graph exploration: $d$-hop neighborhoods are explored to identify additional tools required for multi-step workflows.
Tool combination and ranking: Candidate tools are ranked by combined similarity scores, optimizing for both relevance and complementarity.
Chain-of-tools generation: LLMs generate an ordered sequence of tools, respecting dependencies and operational constraints.
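
The first three steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: bag-of-words cosine similarity stands in for whatever semantic similarity the Planner uses, the ranking uses query similarity alone rather than a combined score, and the tool names, descriptions, and dependency edges are hypothetical. The final chain-of-tools step is performed by an LLM and is not shown.

```python
import math
from collections import Counter, deque


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for embedding similarity)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, descriptions: dict, k: int) -> list:
    """Full-graph retrieval: the k tools whose descriptions best match the query."""
    ranked = sorted(descriptions, key=lambda n: cosine(query, descriptions[n]), reverse=True)
    return ranked[:k]


def d_hop(seeds: list, edges: dict, d: int) -> set:
    """Sub-graph exploration: breadth-first search up to d hops from the seeds."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == d:
            continue
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


def rank_candidates(query: str, descriptions: dict, edges: dict, k: int = 1, d: int = 1) -> list:
    """Rank retrieved and explored tools by query similarity (ties broken by name)."""
    candidates = d_hop(top_k(query, descriptions, k), edges, d)
    return sorted(candidates, key=lambda n: (-cosine(query, descriptions[n]), n))


# Hypothetical tools; an edge points from a tool to one it depends on.
descriptions = {
    "SmilesToWeight": "compute molecular weight from smiles",
    "NameToSmiles": "convert chemical name to smiles",
    "ProteinFold": "predict protein structure from sequence",
}
edges = {"SmilesToWeight": ["NameToSmiles"]}
candidates = rank_candidates("molecular weight of aspirin", descriptions, edges, k=1, d=1)
```

In this toy example the query matches only `SmilesToWeight` directly; the one-hop exploration is what brings in `NameToSmiles`, which the query text alone would not retrieve.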

Executor: Tool Invocation and Safety Monitoring

The Executor manages input preparation, tool invocation, error handling, and safety checks. Inputs are formatted per SciToolKG specifications, and outputs are processed for compatibility with downstream tools. The safety module cross-references outputs with a safeguard database (e.g., PubChem for hazardous compounds, UniProtKB for toxic proteins) using molecular similarity metrics (Tanimoto, Dice, Cosine) and sequence alignment (Smith-Waterman). Outputs exceeding a similarity threshold are flagged as potentially dangerous.
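
The similarity measures named above can be sketched in a few lines. This is an illustrative sketch, not the agent's safety module: fingerprints are represented as plain sets of "on" bit indices, the Smith-Waterman scoring parameters (match 2, mismatch -1, linear gap -1) and the 0.8 threshold are assumptions, and the safeguard entries are made-up placeholders rather than PubChem or UniProtKB records.

```python
import math


def tanimoto(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B| over fingerprint bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: set, b: set) -> float:
    """2 |A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a: set, b: set) -> float:
    """|A ∩ B| / sqrt(|A| |B|)."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score with a linear gap penalty (assumed parameters)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def flag_similar(candidate: set, safeguard: dict, metric=tanimoto, threshold: float = 0.8) -> list:
    """Names of safeguard entries whose similarity to the candidate exceeds the threshold."""
    return sorted(name for name, fp in safeguard.items() if metric(candidate, fp) > threshold)


# Placeholder fingerprints, not real compounds.
safeguard = {"hazard_A": {1, 2, 3, 4, 5}, "hazard_B": {10, 11}}
flagged = flag_similar({1, 2, 3, 4, 5, 6}, safeguard)
```

In practice the fingerprints would come from a cheminformatics toolkit and the alignment from a bioinformatics library; the set-based versions here only make the arithmetic behind the threshold check explicit.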

Summarizer: Solution Synthesis and Plan Refinement

The Summarizer integrates outputs from the tool chain, verifies consistency, and generates the final response. If the solution is unsatisfactory, the Summarizer prompts the Planner for plan refinement, enabling iterative improvement.
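
The plan, execute, and summarize cycle with refinement might be organized roughly as in the sketch below. The function signatures, the feedback string, and the `max_rounds` cap are assumptions made for illustration; the source does not say how feedback is passed back to the Planner or how many refinement rounds are allowed. The three `demo_*` callables are stubs standing in for the LLM-driven modules.

```python
from typing import Callable, List, Optional, Tuple


def run_agent(
    query: str,
    plan: Callable[[str, Optional[str]], List[str]],
    execute: Callable[[List[str]], List[str]],
    summarize: Callable[[str, List[str]], Tuple[str, bool, Optional[str]]],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Run plan -> execute -> summarize, re-planning while the answer is unsatisfactory.

    Returns the last answer and the number of rounds used.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: Optional[str] = None
    answer = ""
    for round_index in range(1, max_rounds + 1):
        chain = plan(query, feedback)        # Planner: ordered tool chain
        outputs = execute(chain)             # Executor: run each tool
        answer, satisfactory, feedback = summarize(query, outputs)
        if satisfactory:
            return answer, round_index
    return answer, max_rounds


# Stub modules: the first plan is incomplete, the refined plan adds a tool.
def demo_plan(query: str, feedback: Optional[str]) -> List[str]:
    return ["ToolA"] if feedback is None else ["ToolA", "ToolB"]


def demo_execute(chain: List[str]) -> List[str]:
    return [f"{tool}-output" for tool in chain]


def demo_summarize(query: str, outputs: List[str]) -> Tuple[str, bool, Optional[str]]:
    ok = len(outputs) >= 2
    return " + ".join(outputs), ok, None if ok else "missing a second tool"


result = run_agent("demo query", demo_plan, demo_execute, demo_summarize)
```

Capping the number of rounds keeps the loop from running indefinitely when no plan satisfies the Summarizer.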

Foundation Models and Fine-Tuning

SciToolAgent supports both proprietary (OpenAI GPT-4o, o1) and open-source (Qwen2.5-72B, Qwen2.5-7B) LLMs. Fine-tuning of Qwen2.5-7B with LoRA and domain-specific instructions significantly improves tool planning and execution, though a performance gap remains relative to larger models.
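
For readers unfamiliar with LoRA, the standard formulation is summarized below. The symbols $W_0$, $A$, $B$, $r$, $\alpha$, $m$, and $n$ are introduced here for illustration; the source does not state the rank, scaling factor, or target layers used for Qwen2.5-7B.

```latex
W = W_0 + \Delta W = W_0 + \frac{\alpha}{r}\, B A,
\qquad
B \in \mathbb{R}^{m \times r},\quad
A \in \mathbb{R}^{r \times n},\quad
r \ll \min(m, n)
```

The pretrained weight $W_0 \in \mathbb{R}^{m \times n}$ stays frozen and only $A$ and $B$ are trained, so each adapted matrix contributes $r(m + n)$ trainable parameters instead of $mn$.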

Empirical Evaluation

SciToolEval Benchmark

A new benchmark, SciToolEval, was constructed with 531 scientific questions spanning single-tool (Level-1) and multi-tool (Level-2) tasks across multiple domains. Evaluation metrics include Pass Rate, Tool Planning Accuracy, and Final Answer Accuracy, with GPT-4o used for similarity-based answer evaluation.
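
One plausible way to compute the three metrics from per-question records is sketched below. The definitions are assumptions, since the source names the metrics without defining them: pass rate is taken as the fraction of questions whose tool chain ran to completion, tool planning accuracy as the fraction whose predicted chain exactly matches a reference chain, and final answer accuracy as the fraction judged correct (in the paper, by GPT-4o). The record fields and tool names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvalRecord:
    """Outcome for one benchmark question. Field names are illustrative."""
    completed: bool             # tool chain ran without an execution failure
    predicted_tools: List[str]  # ordered chain proposed by the agent
    reference_tools: List[str]  # ordered chain from the benchmark annotation
    answer_correct: bool        # verdict from the answer judge


def score(records: List[EvalRecord]) -> Dict[str, float]:
    """Aggregate the three metrics as simple fractions over all records."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to score")
    return {
        "pass_rate": sum(r.completed for r in records) / n,
        "tool_planning_accuracy": sum(
            r.predicted_tools == r.reference_tools for r in records
        ) / n,
        "final_answer_accuracy": sum(r.answer_correct for r in records) / n,
    }


# Made-up records for illustration; these are not benchmark results.
records = [
    EvalRecord(True, ["A", "B"], ["A", "B"], True),
    EvalRecord(True, ["A"], ["A", "B"], False),
    EvalRecord(True, ["C"], ["C"], True),
    EvalRecord(False, ["D"], ["D"], False),
]
metrics = score(records)
```

Exact-match comparison of tool chains is the strictest choice; a benchmark could also accept any chain from a set of equivalent references.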

Comparative Results

Figure 2: SciToolAgent outperforms ReAct, Reflexion, ChemCrow, and CACTUS on SciToolEval in both single-tool and multi-tool settings, with the largest gains in complex, multi-step workflows.

SciToolAgent achieves an overall accuracy of 94%, surpassing state-of-the-art baselines by 10%. In Level-2 (multi-tool) tasks, it outperforms ReAct and Reflexion by 20% and 10% in final answer accuracy, respectively. The explicit modeling of tool dependencies via SciToolKG is critical for these gains. Among foundation models, OpenAI o1 yields the highest accuracy, while GPT-4o offers the best cost-performance trade-off.

Case Studies: Automated Scientific Workflows

Protein Design and Analysis

Figure 3: SciToolAgent autonomously orchestrates sequence generation, stability prediction, structure modeling, and secondary structure analysis for protein design.

The agent integrates Chroma for sequence generation, ProteinForceGPT for mechanical stability, ESMFold for structure prediction, ANM/ProDy for vibrational analysis, and DSSP for secondary structure, producing reliable, multi-faceted outputs. Baseline agents failed due to incorrect or missing tool selection.

Machine Learning-Based Chemical Reactivity Prediction

Figure 4: SciToolAgent automates feature engineering, model selection, and evaluation for chemical reactivity prediction, identifying optimal descriptors and algorithms.

The agent evaluates multiple molecular features and ML algorithms, determining that electrical descriptors and Random Forest yield the highest accuracy. Baseline agents suffered from tool redundancy and hallucinations.

Chemical Synthesis and Safety Analysis

Figure 5: SciToolAgent predicts reaction outcomes, generates product descriptions, checks patents, and performs safety assessments, issuing warnings for toxic products.

The safety module successfully flags hazardous outputs, a capability absent in baseline agents, which failed to identify toxic products.

MOF Materials Screening

Figure 6: SciToolAgent automates multi-criteria screening of MOFs, integrating property prediction, simulation, and market data retrieval.

The agent identifies candidates meeting thermal stability, adsorption, and price constraints, with structure visualization supporting downstream analysis. Baseline agents encountered input errors and hallucinations.

Implementation Considerations and Limitations

Manual SciToolKG construction: While effective, manual curation limits scalability. Automated knowledge extraction from literature and tool documentation is a promising direction.
Model accessibility: Proprietary LLMs offer superior performance but may be inaccessible in resource-constrained settings. Fine-tuned open-source models partially bridge this gap.
Extensibility: Standardized APIs and templates facilitate third-party tool integration. GUI-based registration is planned to lower barriers for non-programmers.
Resource requirements: Multi-tool orchestration and LLM inference incur nontrivial computational and API costs, especially for large models and complex workflows.

Theoretical and Practical Implications

SciToolAgent demonstrates that explicit knowledge graph-driven tool orchestration, combined with LLM-based planning and execution, enables robust automation of complex scientific workflows. The integration of safety checks addresses critical ethical concerns in automated scientific discovery. The approach generalizes across domains and is extensible to new tools and tasks, provided the knowledge graph is maintained.

Theoretically, the work highlights the importance of structured, symbolic representations (SciToolKG) in augmenting LLM-based agents for compositional reasoning and planning. Practically, SciToolAgent lowers the barrier to advanced scientific research, making sophisticated computational tools accessible to a broader audience, including non-experts.

Future Directions

Automated SciToolKG maintenance: Leveraging NLP and information extraction to scale and update the knowledge graph.
Enhanced open-source LLMs: Further fine-tuning and instruction engineering to close the performance gap with proprietary models.
Broader tool integration: Expanding to additional scientific domains and supporting more complex, cross-domain workflows.
User interface improvements: GUI-based tool registration and workflow visualization to improve usability.

Conclusion

SciToolAgent establishes a robust framework for LLM-driven, knowledge graph-augmented scientific tool integration, achieving state-of-the-art performance on complex, multi-tool scientific tasks. The explicit modeling of tool dependencies and safety, combined with modular LLM-based planning and execution, enables reliable automation of advanced workflows. While challenges remain in scalability and model accessibility, the approach provides a strong foundation for democratizing scientific research and accelerating discovery through AI-driven automation.
