SciToolAgent: Graph-Driven Scientific Automation

Updated 2 November 2025

SciToolAgent is an LLM-based agent that automates complex, multi-step scientific workflows using a curated, domain-structured knowledge graph.
It features a modular architecture—planner, executor, summarizer, and safety module—to ensure tool interoperability and responsible execution.
Empirical evaluations report 94% overall accuracy on the SciToolEval benchmark, a significant improvement over conventional LLM tool-agent approaches.

SciToolAgent is an LLM-based agent system designed for the intelligent automation and orchestration of hundreds of scientific tools spanning biology, chemistry, and materials science. Its core innovation is the use of a domain-structured knowledge graph to drive both the selection and the sequencing of tools, enabling reliable construction and execution of complex, multi-step scientific workflows. The platform integrates rigorous safety-checking modules to ensure ethical and responsible tool use, and it is validated on a curated benchmark and diverse real-world case studies.

1. Knowledge Graph-Driven Architecture

At the heart of SciToolAgent is SciToolKG, a scientific tool knowledge graph. It is a directed graph $G = (V, E)$ whose node set $V$ represents tools and attribute entities and whose edge set $E$ encodes rich semantic relationships such as functional dependencies, category hierarchies, input/output compatibility, and safety risk levels. Each of the 500+ scientific tools incorporated (e.g., BLAST, ESMFold, RDKit, MOFSimplify) is annotated with a functional description, input/output schemas, data formats, a safety class, and inter-tool dependencies.
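To make the structure concrete, here is a minimal Python sketch of a tool knowledge graph with labeled, directed edges and a $k$-hop neighborhood query of the kind the planner relies on. The class and field names (`ToolNode`, `ToolKG`, the relation label `output_feeds`) and the tool names `SeqFetch` and `SecStruct` are hypothetical illustrations, not SciToolKG's published schema.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToolNode:
    """One tool node; field names are illustrative, not SciToolKG's actual schema."""
    name: str
    description: str
    inputs: set[str]
    outputs: set[str]
    safety_class: str = "low"  # assumed labels: "low" or "high"


@dataclass
class ToolKG:
    """Directed graph G = (V, E) whose edges carry a relation label."""
    nodes: dict[str, ToolNode] = field(default_factory=dict)
    edges: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def add_tool(self, tool: ToolNode) -> None:
        self.nodes[tool.name] = tool
        self.edges.setdefault(tool.name, [])

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        # Store the edge src -[relation]-> dst in src's adjacency list.
        self.edges.setdefault(src, []).append((relation, dst))

    def k_hop(self, start: str, k: int) -> set[str]:
        """Nodes reachable from `start` in at most k directed hops (start excluded)."""
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:  # breadth-first search, so depth is the hop count
            node, depth = frontier.popleft()
            if depth == k:
                continue
            for _relation, nbr in self.edges.get(node, []):
                if nbr not in seen:
                    seen.add(nbr)
                    frontier.append((nbr, depth + 1))
        return seen - {start}


kg = ToolKG()
kg.add_tool(ToolNode("SeqFetch", "retrieve a protein sequence", {"uniprot_id"}, {"sequence"}))
kg.add_tool(ToolNode("ESMFold", "predict 3D structure", {"sequence"}, {"pdb"}))
kg.add_tool(ToolNode("SecStruct", "assign secondary structure", {"pdb"}, {"ss_string"}))
kg.add_edge("SeqFetch", "output_feeds", "ESMFold")
kg.add_edge("ESMFold", "output_feeds", "SecStruct")
print(sorted(kg.k_hop("SeqFetch", 1)))  # ['ESMFold']
print(sorted(kg.k_hop("SeqFetch", 2)))  # ['ESMFold', 'SecStruct']
```

A real graph would also hold attribute-entity nodes (formats, categories, risk levels) and several relation types; the adjacency-list layout keeps the neighborhood query simple.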

This knowledge graph underpins all major SciToolAgent processes: semantic ranking and retrieval of tools for a given user query, dependency resolution for sequential tool execution, and provision of tool metadata for retrieval-augmented planning. The graph is constructed manually and curated by experts, which makes the current system highly reliable but poses scalability challenges for future extension.
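Dependency resolution can be sketched as a topological sort over dependency edges. The example below uses Python's standard `graphlib` module; the `resolve_order` helper and the dictionary encoding of dependencies are assumptions for illustration, not SciToolAgent's implementation.

```python
from graphlib import CycleError, TopologicalSorter


def resolve_order(requires: dict[str, set[str]]) -> list[str]:
    """Order tools so that each runs after every tool whose output it consumes.

    `requires` maps a tool to the tools it depends on (a hypothetical encoding
    of SciToolKG dependency edges).
    """
    try:
        # static_order() yields each node only after all of its predecessors.
        return list(TopologicalSorter(requires).static_order())
    except CycleError as err:
        # err.args[1] holds the detected cycle as a list of nodes.
        raise ValueError(f"circular tool dependency: {err.args[1]}") from err


deps = {
    "ESMFold": {"SeqFetch"},
    "SecStruct": {"ESMFold"},
    "EnergyCalc": {"ESMFold"},
}
print(resolve_order(deps))
```

Any order returned satisfies every dependency; a cycle in the dependency edges raises an error instead of yielding an unrunnable plan.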

2. Workflow: Query, Planning, Execution, and Summarization

SciToolAgent operates as a modular agent stack with distinct functional blocks:

Planner: Receives a user query and assembles a set of highly relevant candidate tools by combining full-graph and sub-graph retrieval (semantic similarity over tool descriptions and their neighborhood context). Candidate tool combinations $\mathbb{T}_\text{comb}$ are ranked by their joint similarity to the user intent across $k$-hop neighborhoods, enforcing both high relevance and chain compatibility; the executed chain is drawn from these combinations:

$\mathbb{T}_\text{chain} = \{T_1 \rightarrow T_2 \rightarrow \ldots \rightarrow T_m\} \subseteq \mathbb{T}_\text{comb}$

Executor: Sequentially runs each tool in the prescribed chain, using the LLM to extract and validate inputs, handle outputs, and adapt to errors or failures. The Executor continuously references SciToolKG to ensure that the outputs of one tool are valid inputs to the next (enforcing data interoperability and workflow correctness).
Summarizer: Aggregates stepwise tool outputs, checks for logical/semantic consistency, de-duplicates, synthesizes user-facing responses, and—if the result is suboptimal—triggers replanning.
Safety module: Integrated into all execution paths, it identifies and mitigates hazardous outputs (e.g., toxic chemicals or proteins) by scoring them against curated hazard databases: for molecules, the maximum over the database of the mean Tanimoto, Dice, and cosine coefficients; for proteins, the Smith-Waterman alignment score. Workflows are interrupted and flagged if risk thresholds are breached.
Memory module: Archives historical queries, plans, and outputs for context-aware iterative research.
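The Executor's interoperability check and the Summarizer's replanning trigger can be illustrated with a minimal sketch. It assumes each tool declares the named data types it consumes and produces (a hypothetical schema); `Step`, `check_chain`, and `execute` are illustrative names, and the lambdas stand in for real tool calls such as ESMFold.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One tool invocation in a planned chain (hypothetical schema)."""
    name: str
    inputs: set[str]
    outputs: set[str]
    run: Callable[[dict], dict]


def check_chain(chain: list[Step], provided: set[str]) -> None:
    """Static check: every input comes from the query or from an earlier step."""
    available = set(provided)
    for step in chain:
        missing = step.inputs - available
        if missing:
            raise ValueError(f"{step.name}: no upstream source for {sorted(missing)}")
        available |= step.outputs


def execute(chain: list[Step], query_data: dict) -> tuple[dict, Optional[str]]:
    """Run the chain in order; return (data, name of the failed step or None)."""
    check_chain(chain, set(query_data))
    data = dict(query_data)  # do not mutate the caller's dict
    for step in chain:
        try:
            result = step.run({key: data[key] for key in step.inputs})
        except Exception:
            return data, step.name  # runtime failure: signal replanning
        if not step.outputs.issubset(result):
            return data, step.name  # declared outputs missing: signal replanning
        data.update(result)
    return data, None


fold = Step("ESMFold", {"sequence"}, {"pdb"}, lambda d: {"pdb": "model of " + d["sequence"]})
ss = Step("SecStruct", {"pdb"}, {"ss_string"}, lambda d: {"ss_string": "HHHEEC"})
data, failed = execute([fold, ss], {"sequence": "MKTAYIAK"})
print(failed, data["ss_string"])  # None HHHEEC
```

Returning the failed step's name instead of raising leaves the decision to replan with the caller, which matches the Summarizer's role in the stack.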

3. Graph-Based Retrieval-Augmented Tool Planning

The tool chain planning process exploits both global and local graph structure. Full-graph retrieval computes a direct similarity score $S(q, T_i)$ between the user query $q$ and each tool $T_i$. For local refinement, subgraph exploration is performed within the neighborhoods of top candidate tools, scoring candidate tool pairs or chains on concatenated tool metadata:

$S'(q, T_i, T_j) = S(q, T_i) \times S(q, T_i \oplus T_j)$

where $T_i \oplus T_j$ denotes the concatenation of the metadata of $T_i$ and $T_j$. This formulation not only identifies relevant tools but also assembles execution chains that honor explicit data and operational dependencies. The process produces an ordered, dependency-respecting workflow plan, moving away from the error-prone trial-and-error strategies of earlier LLM-based agents.
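A toy version of the two-stage scoring follows. It assumes $S$ is cosine similarity and uses bag-of-words vectors as a stand-in for a learned text-embedding model, takes $\oplus$ to be plain concatenation of tool descriptions, and explores only 1-hop neighbors rather than general $k$-hop neighborhoods; `embed`, `sim`, and `rank_pairs` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector: a stand-in (assumption) for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def sim(a: Counter, b: Counter) -> float:
    """Cosine similarity S between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_pairs(query: str, desc: dict[str, str],
               neighbors: dict[str, list[str]], top_n: int = 3):
    q = embed(query)
    # Full-graph retrieval: S(q, T_i) for every tool in the graph.
    score = {tool: sim(q, embed(text)) for tool, text in desc.items()}
    top = sorted(score, key=score.get, reverse=True)[:top_n]
    # Subgraph exploration: S'(q, T_i, T_j) = S(q, T_i) * S(q, T_i ⊕ T_j),
    # with ⊕ taken here as plain concatenation of the two descriptions.
    pairs = {}
    for ti in top:
        for tj in neighbors.get(ti, []):
            joint = sim(q, embed(desc[ti] + " " + desc[tj]))
            pairs[(ti, tj)] = score[ti] * joint
    return sorted(pairs.items(), key=lambda kv: kv[1], reverse=True)


desc = {
    "ESMFold": "predict protein 3D structure from sequence",
    "SecStruct": "assign secondary structure to a protein structure",
    "RDKit": "compute molecular descriptors from SMILES",
}
neighbors = {"ESMFold": ["SecStruct"], "SecStruct": [], "RDKit": []}
ranked = rank_pairs("fold this protein sequence and report its secondary structure",
                    desc, neighbors)
print(ranked[0][0])  # ('ESMFold', 'SecStruct')
```

Multiplying by $S(q, T_i)$ keeps a pair from ranking highly merely because its neighbor is relevant when the anchor tool itself is not.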

4. Safety-Checking Mechanisms

Safety is enforced at each high-risk step by cross-referencing outputs against databases such as PubChem (chemicals) and UniProtKB (proteins). Given a candidate output $x$ and a hazard database $D$, the agent computes:

For molecules,

$\hat{S}_\text{mol}(x, D) = \max_{y \in D} \tfrac{1}{3} (\text{Tanimoto}(x, y) + \text{Dice}(x, y) + \text{Cosine}(x, y))$
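The molecular score can be sketched for binary fingerprints represented as sets of on-bit indices, where the three coefficients have simple closed forms in terms of set sizes and overlap. Fingerprint generation (in practice, for example, circular fingerprints from a cheminformatics toolkit) is out of scope, and the function names are illustrative assumptions.

```python
import math


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 0.0


def dice(a: frozenset[int], b: frozenset[int]) -> float:
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a: frozenset[int], b: frozenset[int]) -> float:
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def mol_hazard_score(x: frozenset[int], hazard_db: list[frozenset[int]]) -> float:
    """Max over y in D of the mean of the Tanimoto, Dice and cosine coefficients."""
    return max(
        ((tanimoto(x, y) + dice(x, y) + cosine(x, y)) / 3 for y in hazard_db),
        default=0.0,  # an empty database never flags anything
    )


query = frozenset({1, 2})
db = [frozenset({2, 3}), frozenset({7, 8, 9})]
print(round(mol_hazard_score(query, db), 3))  # 0.444
```

For the first database entry the overlap is one bit, giving Tanimoto $1/3$, Dice $1/2$, and cosine $1/2$, whose mean is $4/9$.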

For proteins,

$\hat{S}_\text{prot}(x, D) = \max_{y \in D} \text{Smith-Waterman}(x, y)$
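A plain Smith-Waterman local-alignment score is sketched below. The match, mismatch, and linear gap values are illustrative assumptions (protein search normally uses a substitution matrix such as BLOSUM62 with affine gaps). Because a raw alignment score is unbounded, the sketch divides by the smaller self-alignment score so that it can be compared with a threshold in $[0, 1]$; that normalization is also an assumption, not a method stated in the source.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score with a linear gap penalty, O(len(a) * len(b))."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix H
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so alignments can restart.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def normalized_sw(x: str, y: str) -> float:
    """Raw score divided by the smaller self-alignment score (assumed normalization)."""
    denom = min(smith_waterman(x, x), smith_waterman(y, y))
    return smith_waterman(x, y) / denom if denom else 0.0


def prot_hazard_score(x: str, hazard_db: list[str]) -> float:
    """Max over y in D of the normalized Smith-Waterman score."""
    return max((normalized_sw(x, y) for y in hazard_db), default=0.0)


print(smith_waterman("XXACGTXX", "ACGT"))  # 8
print(prot_hazard_score("MKTAYIAK", ["MKTAYIAK", "GGGG"]))  # 1.0
```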

Any result whose score exceeds the risk threshold ($\delta = 0.95$) triggers a warning or aborts the workflow. Safety checks run only for tools labeled high-risk in SciToolKG, and for those tools they run on every invocation. This systematic integration of safety checking distinguishes SciToolAgent from prior academic and commercial tool-agent frameworks, which largely omit this layer.
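The gating rule can be written as a small function. It assumes a risk label string per tool and a pluggable scoring function (for example, one of the molecular or protein scores defined in this section); `safety_gate`, `exact_match_score`, and the placeholder hazard string are hypothetical.

```python
from typing import Callable, Collection

DELTA = 0.95  # risk threshold delta from the text


def safety_gate(tool_risk: str, output: str, hazard_db: Collection[str],
                score_fn: Callable[[str, Collection[str]], float],
                delta: float = DELTA) -> bool:
    """Return True if the workflow may continue, False if it must be flagged or aborted."""
    if tool_risk != "high":
        return True  # only tools labeled high-risk in the graph are screened
    return score_fn(output, hazard_db) <= delta  # exceeding delta blocks the step


def exact_match_score(output: str, hazard_db: Collection[str]) -> float:
    """Toy scorer for the demo; a real check would use a similarity-based score."""
    return 1.0 if output in hazard_db else 0.0


hazards = {"toxic-smiles-placeholder"}
print(safety_gate("high", "toxic-smiles-placeholder", hazards, exact_match_score))  # False
print(safety_gate("low", "toxic-smiles-placeholder", hazards, exact_match_score))  # True
```

Skipping low-risk tools keeps the per-step overhead small, at the cost of relying on the graph's risk labels being correct.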

5. Benchmark Evaluation and Empirical Performance

Evaluation on the SciToolEval benchmark (531 queries: 152 single-tool and 379 multi-tool tasks) shows that SciToolAgent achieves 94% overall accuracy, approximately 10 percentage points above state-of-the-art LLM tool-agent baselines (e.g., ReAct, Reflexion). On multi-tool tasks, the agent achieves 10–20% higher final-answer accuracy and requires fewer execution attempts, which is attributed to graph-guided planning.

Ablations over the backbone model show that OpenAI o1 and GPT-4o (the default) give the best results, but domain-adapted, instruction-tuned open-source LLMs (e.g., Qwen2.5-7B FT) can close much of the gap when paired with SciToolAgent's pipeline. This suggests that planning and orchestration are bottlenecked more by workflow logic and graph fidelity than by raw model scale.

6. Case Studies: Applications in Protein, Chemistry, Materials Science

Extensive case studies demonstrate SciToolAgent’s domain reach:

Protein analysis: Multi-stage pipelines (sequence → folding prediction → force/energy calculation → frequency analysis → secondary structure) are planned and executed without manual input or tool selection.
Reactivity and ML screening: For amide condensation reactions, the agent benchmarks alternative molecular features and ML classifiers and identifies the best-performing feature/classifier combinations, whereas baseline agents fail to construct syntactically and semantically valid tool chains.
Chemical synthesis and risk: The agent handles syntheses of aspirin and 4-chlorophenol, with the safety module aborting workflows whose hazardous outcomes would otherwise pass undetected.
MOF screening: Automatic multi-property filtering, data extraction, and market validation through expert tool planning, demonstrating integration of simulation, data mining, and retrieval tools in long chains.

Empirical data show that baselines repeatedly mis-sequence steps, select unrelated tools, or fail on inter-tool data format mismatches—failure modes that SciToolAgent’s knowledge-graph-driven planning avoids.

7. Implications, Accessibility, and Extensibility

SciToolAgent exemplifies a new class of LLM-based scientific agents that combine an explicit representation of domain knowledge (via SciToolKG) with data-driven, safety-aware orchestration. This enables non-expert users to access advanced computational research capabilities without deep tool or workflow knowledge, promoting reproducibility and responsible conduct. However, the manual construction and ongoing curation of SciToolKG remain a potential bottleneck for long-term extensibility. A plausible implication is the need for semi- or fully automated systems that extract tool attributes and dependencies at scale from documentation or the scientific literature. The platform also offers a template for incorporating safety protocols into future agentic scientific automation.

References:

The full methodology, empirical evaluation, and all technical claims summarized here can be verified in "SciToolAgent: A Knowledge Graph-Driven Scientific Agent for Multi-Tool Integration" (Ding et al., 27 Jul 2025). For source code and configuration data, see the repository and supplementary information referenced in the original publication.


