SciAgent: A Unified Multi-Agent Scientific System

Updated 23 February 2026

SciAgent is a unified multi-agent system combining hierarchical meta-control with specialized domain workers to deliver expert-level scientific reasoning.
It dynamically assembles tailored reasoning pipelines through sub-agents executing symbolic, numeric, and multimodal tasks across STEM disciplines.
Evaluated on STEM Olympiad challenges, SciAgent achieves state-of-the-art performance, demonstrating cross-disciplinary adaptability and precision.

SciAgent is a unified multi-agent system for generalistic scientific reasoning, designed to achieve expert-level performance and adaptability across a broad range of scientific disciplines. The architecture combines hierarchical meta-control, specialized domain Worker Systems, and modular sub-agent execution to dynamically assemble and refine tailored reasoning pipelines for each problem. This system is evaluated on a suite of STEM Olympiad challenges and interdisciplinary benchmarks, demonstrating both domain generality and state-of-the-art performance. SciAgent represents a concrete step toward generalistic scientific intelligence: AI systems capable of coherent, cross-disciplinary reasoning at expert human levels (Li et al., 11 Nov 2025).

1. Hierarchical Architecture and Organization

SciAgent utilizes a three-layer hierarchical structure:

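Coordinator Agent (meta level): Observes the problem $P$, infers a domain vector $D$ over the categories $\{\text{Math, Physics, Chemistry, General}\}$ and a complexity score $c$, then selects a Worker System via

$\text{Route}(D,c) = \arg\max_{W\in\mathcal{W}} [\alpha\,\mathrm{sim}(D,W)+\beta\,\phi(c,W)]$

where $\mathcal{W}$ is the set of available Worker Systems, $\mathrm{sim}(D,W)$ measures how well Worker $W$ matches the inferred domain, $\phi(c,W)$ measures how well $W$ suits complexity $c$, and $\alpha$, $\beta$ weight the two terms.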

Worker Systems (domain level): Ensembles specialized for distinct scientific domains, e.g., mathematics, physics, chemistry, or general examinations. Each Worker decomposes the Coordinator’s abstract plan into a concrete pipeline of Sub-agents.
Sub-agents (execution level): Responsible for symbolic deduction, conceptual modeling, numerical computation, image analysis, proof verification, or domain summarization.
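The routing rule can be sketched in Python. This is a minimal illustration, not the authors' implementation: the `WorkerProfile` class, the cosine-similarity choice for `sim`, the distance-based `phi`, the per-Worker target complexities, and the weights `alpha=0.7`, `beta=0.3` are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass

DOMAINS = ("Math", "Physics", "Chemistry", "General")  # order of entries in D


@dataclass
class WorkerProfile:
    """Hypothetical routing profile for one Worker System."""
    name: str
    domain_affinity: tuple[float, ...]  # one weight per entry in DOMAINS
    target_complexity: float            # complexity level the Worker is tuned for


def sim(d: tuple[float, ...], w: WorkerProfile) -> float:
    """Cosine similarity between the inferred domain vector D and the Worker's affinity."""
    dot = sum(a * b for a, b in zip(d, w.domain_affinity))
    norm = math.hypot(*d) * math.hypot(*w.domain_affinity)
    return dot / norm if norm else 0.0


def phi(c: float, w: WorkerProfile) -> float:
    """Hypothetical complexity fit: 1.0 at the Worker's target level, decaying with distance."""
    return 1.0 / (1.0 + abs(c - w.target_complexity))


def route(d, c, workers, alpha=0.7, beta=0.3):
    """Route(D, c) = argmax over W of alpha * sim(D, W) + beta * phi(c, W)."""
    return max(workers, key=lambda w: alpha * sim(d, w) + beta * phi(c, w))


workers = [
    WorkerProfile("Math Olympiad", (1, 0, 0, 0), 0.9),
    WorkerProfile("Physics Olympiad", (0, 1, 0, 0), 0.8),
    WorkerProfile("Chemistry Olympiad", (0, 0, 1, 0), 0.8),
    WorkerProfile("General Exam", (0.25, 0.25, 0.25, 0.25), 0.4),
]
# A mostly-physics problem of high complexity goes to the Physics Worker.
print(route((0.1, 0.8, 0.1, 0.0), 0.85, workers).name)  # → Physics Olympiad
```

Because the score is a weighted sum, raising `beta` lets complexity fit override a weak domain match; the sketch leaves both weights as free parameters since the source does not report them.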

This separation of meta-reasoning (task routing), domain-specific planning, and low-level execution mirrors established principles in human and artificial reasoning, enabling dynamic task allocation and extensibility (Li et al., 11 Nov 2025).

2. Agent Roles and Worker System Composition

Each Worker System consists of distinct sub-agent types with tightly defined roles:

Worker Type	Core Sub-Agents	Distinctive Functionality
Math Olympiad	Generator, Improver, Reviewer	Proof generation and formal review
Physics Olympiad	Generator, Image Analyser, Reviewer, Summarizer	Equation derivation, diagram interpretation
Chemistry Olympiad	Generator, Molecule Recognition, SMILES Verify, Knowledge, Breakdown, Review, Summarizer	Reaction hypothesis, SMILES processing, knowledge lookup, and decomposition
General Exam	Generator, Breakdown, Image Analyze, Review	Mixed symbolic/numeric and multimodal tasks

Within each Worker, Sub-agents execute in a dynamically assembled pipeline, informed by problem characteristics and controlled through feedback and review signals. Agents pass structured JSON-like messages, with fields for sender, receiver, type (e.g., THOUGHT, ACTION, OBSERVATION, FEEDBACK), and payload containing intermediate artifacts (Li et al., 11 Nov 2025).
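A message with these four fields can be modeled as a small dataclass with a JSON encoding. The field names and the four message types come from the description above; the `AgentMessage` and `MessageType` classes, the JSON wire format, and the example payload are assumptions, since the exact schema SciAgent uses is not given.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Any


class MessageType(str, Enum):
    THOUGHT = "THOUGHT"
    ACTION = "ACTION"
    OBSERVATION = "OBSERVATION"
    FEEDBACK = "FEEDBACK"


@dataclass
class AgentMessage:
    """Hypothetical inter-agent message with sender, receiver, type, and payload fields."""
    sender: str
    receiver: str
    type: MessageType
    payload: dict[str, Any] = field(default_factory=dict)  # intermediate artifacts

    def to_json(self) -> str:
        data = asdict(self)
        data["type"] = self.type.value  # serialize the enum as its plain string value
        return json.dumps(data)

    @classmethod
    def from_json(cls, raw: str) -> "AgentMessage":
        data = json.loads(raw)
        return cls(data["sender"], data["receiver"],
                   MessageType(data["type"]), data.get("payload", {}))


msg = AgentMessage("Reviewer", "Improver", MessageType.FEEDBACK,
                   {"score": 0.6, "issues": ["Step 3 assumes the sequence is bounded"]})
wire = msg.to_json()
assert AgentMessage.from_json(wire) == msg  # lossless round trip
```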

3. Dynamic Pipeline Assembly and Reasoning Process

The system employs problem decomposition and iterative refinement:

Task decomposition: The Coordinator decomposes $P$ into subtasks $\{T_i\}$, where $T_i = f_i(P)$. Planning within each Worker results in pipelines $[A_1 \xrightarrow{\rho_1} A_2 \xrightarrow{\rho_2} \dots]$, with routing edges $\rho_i$ triggered by feedback.
Pipeline execution: Agents iterate through pipeline stages, refining outputs based on the results of symbolic computation, model generation, verification, or summarization.
Message passing: Each iteration involves a message exchange that updates the pipeline context and determines termination (e.g., when the Reviewer's assessment meets a pass threshold).
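The execute, review, and refine cycle can be sketched as a forward pass over pipeline stages followed by a Reviewer/Improver feedback loop that stops once a pass threshold is met. The agent names follow the Math Olympiad Worker in the table above; the dictionary context, the toy `quality` score, `threshold=0.8`, and `max_rounds=5` are hypothetical stand-ins for LLM calls and real review criteria.

```python
from typing import Any, Callable

Context = dict[str, Any]
Agent = Callable[[Context], Context]


def run_pipeline(stages: list[tuple[str, Agent]], reviewer: Agent, improver: Agent,
                 threshold: float = 0.8, max_rounds: int = 5) -> Context:
    """Run forward stages, then loop Reviewer -> Improver until the score passes threshold."""
    ctx: Context = {"trace": []}
    for name, agent in stages:            # forward edges A1 -> A2 -> ...
        ctx = agent(ctx)
        ctx["trace"].append(name)
    for rnd in range(1, max_rounds + 1):
        ctx = reviewer(ctx)
        ctx["trace"].append("Reviewer")
        if ctx["score"] >= threshold:     # termination: Reviewer pass threshold
            return {**ctx, "status": "accepted", "rounds": rnd}
        if rnd < max_rounds:              # feedback edge: route the draft back for refinement
            ctx = improver(ctx)
            ctx["trace"].append("Improver")
    return {**ctx, "status": "rejected", "rounds": max_rounds}


# Toy agents: "quality" stands in for a draft solution's correctness.
def generator(ctx: Context) -> Context:
    return {**ctx, "quality": 0.5}


def improver(ctx: Context) -> Context:
    return {**ctx, "quality": ctx["quality"] + 0.2}


def reviewer(ctx: Context) -> Context:
    return {**ctx, "score": ctx["quality"]}


result = run_pipeline([("Generator", generator)], reviewer, improver)
print(result["status"], result["rounds"])  # → accepted 3
```

The round cap is a design choice worth keeping in any such loop: without it, a Reviewer that never passes a draft would cycle indefinitely.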

Representative intermediate artifacts include symbolic expressions (e.g., vector calculus integrals for electromagnetism problems), numerical differential equation objects, and molecular structure encodings for chemistry tasks. Sub-agents can initiate internal loops for self-review or external calls for validation (e.g., SMILES syntax and valence checks in chemistry) (Li et al., 11 Nov 2025).
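A production validation call would normally go through a cheminformatics toolkit such as RDKit, whose `Chem.MolFromSmiles` returns `None` when a string cannot be parsed or sanitized (which includes valence errors). The stdlib-only function below is a hypothetical, deliberately simplified syntax pre-check for illustration: it covers only organic-subset and bracket atoms, balanced branches, and paired ring-closure labels, and performs no valence or aromaticity checking.

```python
import re

# Tokenizer for a simplified SMILES subset: bracket atoms, organic-subset atoms,
# bond/disconnect symbols, ring-closure labels (1-9 or %nn) and branch parentheses.
_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#:$/\\.]|%\d{2}|\d|[()]")
_BONDS = set("-=#:$/\\.")


def smiles_syntax_ok(smiles: str) -> bool:
    """Hypothetical syntax pre-check; it does NOT validate valence or aromaticity."""
    pos, depth = 0, 0
    open_rings: set[str] = set()
    seen_atom = False
    while pos < len(smiles):
        m = _TOKEN.match(smiles, pos)
        if m is None:
            return False                  # character outside the supported subset
        tok, pos = m.group(), m.end()
        if tok == "(":
            if not seen_atom:
                return False              # a branch must follow an atom
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return False              # unmatched closing parenthesis
        elif tok.isdigit() or tok.startswith("%"):
            if not seen_atom:
                return False              # ring label before any atom
            open_rings ^= {tok}           # first occurrence opens, second closes
        elif tok not in _BONDS:
            seen_atom = True              # organic-subset or bracket atom
    return seen_atom and depth == 0 and not open_rings


print(smiles_syntax_ok("CC(=O)O"), smiles_syntax_ok("c1ccccc1"), smiles_syntax_ok("C1CC"))  # → True True False
```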

4. Training, Specialization, and Knowledge Integration

The backbone for all Sub-agents is Gemini 2.5 Pro, pretrained on diverse data sources, on top of which specialization and knowledge integration are layered:

Sub-agent specialization: Proof generation and review agents are fine-tuned on proof logs (Lean4), molecule and SMILES agents use rule-based systems and supervised examples, and image analysers use vision encoders (CLIP-style) adapted to scientific diagrams.
Retrieval augmentation: Episodic memory is supported in select domains for similarity-driven retrieval of related problems. Retrieval is triggered when a new problem's similarity to stored problems exceeds a domain-specific threshold.
Optimization: All agents use standard cross-entropy losses and regularization inherited from the pretrained LLM; no new loss functions are introduced.
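Threshold-gated episodic retrieval can be sketched as follows, assuming each problem is represented by an embedding vector and tagged with its domain. The `EpisodicMemory` class, cosine similarity as the metric, the toy 3-dimensional embeddings, and the per-domain thresholds are all hypothetical; domains without a threshold entry model the "select domains" where retrieval is disabled.

```python
import math
from dataclasses import dataclass, field


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


@dataclass
class EpisodicMemory:
    """Hypothetical per-domain episodic store; only domains listed in `thresholds` retrieve."""
    thresholds: dict[str, float]
    episodes: list[tuple[str, list[float], str]] = field(default_factory=list)

    def add(self, domain: str, embedding: list[float], note: str) -> None:
        self.episodes.append((domain, embedding, note))

    def retrieve(self, domain: str, query: list[float], k: int = 2) -> list[str]:
        tau = self.thresholds.get(domain)
        if tau is None:
            return []  # retrieval is not enabled for this domain
        scored = [(cosine(query, emb), note) for d, emb, note in self.episodes if d == domain]
        hits = sorted((pair for pair in scored if pair[0] >= tau),
                      key=lambda pair: pair[0], reverse=True)
        return [note for _, note in hits[:k]]  # top-k episodes above the threshold


memory = EpisodicMemory(thresholds={"Math": 0.9, "Physics": 0.8})
memory.add("Physics", [1.0, 0.0, 0.2], "projectile with quadratic drag: integrate the ODE numerically")
memory.add("Physics", [0.0, 1.0, 0.0], "RC circuit transient: solve a first-order linear ODE")
print(memory.retrieve("Physics", [0.9, 0.1, 0.2]))
```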

This layered knowledge integration ensures rapid adaptation to new domains and enables consistent performance across symbolic, numeric, and multimodal reasoning tasks (Li et al., 11 Nov 2025).

5. Performance on Scientific Benchmarks

SciAgent attains or exceeds gold-medal human performance across multiple Olympiad and benchmark challenges. Evaluation uses the official scoring, with results validated by both LLM-based graders and human experts.

Task/Benchmark (max points)	Human reference (gold-medal average or top score)	SciAgent score
IMO 2025 (42 pts)	35.94 (average)	36
IMC 2025 (100 pts)	89.08 (average)	100
IPhO 2025 (30 pts)	23.4 (avg), 29.2 (top)	25.0
IPhO 2024 (30 pts)	25.3 (avg), 29.4 (top)	27.6
CPhO 2025 (320 pts)	199 (gold record)	264
IChO 2025	n/a	qualitative success
HLE (STEM mix)	n/a	consistently correct
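The margins implied by the numeric rows of the table can be checked with a few lines of arithmetic. The figures below are copied from the table; the human reference is the gold-medal average for IMO, IMC, and IPhO and the gold record for CPhO, while IChO and HLE are omitted because no numeric scores are given. The `margins` helper is purely illustrative.

```python
# (benchmark, maximum points, human reference score, SciAgent score), copied from the table.
RESULTS = [
    ("IMO 2025", 42, 35.94, 36),
    ("IMC 2025", 100, 89.08, 100),
    ("IPhO 2025", 30, 23.4, 25.0),
    ("IPhO 2024", 30, 25.3, 27.6),
    ("CPhO 2025", 320, 199, 264),
]


def margins(rows):
    """Return (benchmark, SciAgent share of max points, margin over the human reference)."""
    return [(name, agent / max_pts, agent - human) for name, max_pts, human, agent in rows]


for name, share, margin in margins(RESULTS):
    print(f"{name}: {share:.1%} of max points, {margin:+.2f} vs human reference")
```

By this arithmetic, the IMO margin over the gold-medal average is only 0.06 points, while the CPhO margin over the gold record is 65 points.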

Ablation studies show an average performance drop of 15–20 percentage points when the Coordinator is disabled, and a reduction of roughly 10 points in mathematics when self-review loops are removed. No formal $p$-values are reported; the claims rest on distributional overlap and consistency with human-verified results (Li et al., 11 Nov 2025).

6. Generality, Adaptability, and Theoretical Claims

Domain generality: The Coordinator–Worker–Sub-agent paradigm unifies symbolic and numeric reasoning. The same meta-logic flexibly routes math argumentation or physics modeling tasks to appropriate Worker specialists.
Reasoning adaptability: Pipeline graphs are configured dynamically for each problem and adapted via feedback. Sub-agent interaction patterns also adapt to the domain: short ReAct loops for multimodal perception, structured review for symbolic proofs.
Theoretical assertion: Within LLM-based paradigms, the authors posit hierarchical meta-control paired with modular sub-agents as necessary and sufficient for robust cross-domain transfer, arguing that flat ensembles converge slowly or saturate on new scientific formalisms (Li et al., 11 Nov 2025).
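A short ReAct loop interleaves THOUGHT, ACTION, and OBSERVATION steps until the agent emits a final answer or exhausts a step budget. The sketch below illustrates the general ReAct pattern, not SciAgent's code: `scripted_policy` is a hypothetical stand-in for an LLM call, and the `measure_angle` tool is a hypothetical stand-in for an image-analysis sub-agent reading an angle off a diagram.

```python
import math
from typing import Any

Step = tuple[str, Any]


def react_loop(policy, tools, question, max_steps=4):
    """Alternate THOUGHT -> ACTION -> OBSERVATION until the policy chooses 'finish'."""
    history: list[Step] = [("QUESTION", question)]
    for _ in range(max_steps):
        thought, action, arg = policy(history)
        history.append(("THOUGHT", thought))
        if action == "finish":
            return arg, history
        history.append(("ACTION", f"{action}({arg!r})"))
        history.append(("OBSERVATION", tools[action](arg)))
    return None, history  # step budget exhausted without an answer


def scripted_policy(history):
    """Stand-in for an LLM: first look at the diagram, then answer a = g * sin(theta)."""
    observations = [value for kind, value in history if kind == "OBSERVATION"]
    if not observations:
        return "I need the incline angle from the figure.", "measure_angle", "fig1"
    theta = math.radians(observations[-1])
    return "Frictionless incline, so a = g sin(theta).", "finish", round(9.8 * math.sin(theta), 2)


tools = {"measure_angle": lambda figure_id: 30.0}  # hypothetical image-analysis tool
answer, trace = react_loop(scripted_policy, tools, "Acceleration of a block on the incline in fig1?")
print(answer)  # → 4.9
```

The step budget keeps perception loops short, which matches the contrast drawn above with the longer structured review used for symbolic proofs.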

7. Limitations and Prospective Enhancements

Principal limitations of the present SciAgent implementation include:

Chemistry and biology capabilities remain underdeveloped, and the lack of public baselines (IChO/IBO) limits quantitative evaluation in these domains.
Absence of end-to-end multimodal reasoning; each input type (text, equation, image) is processed by a distinct sub-agent, without fusion at the representation level.
Agents are trained disjointly; reinforcement-learning–based co-training and routing-policy optimization remain open opportunities.
Lack of persistent, evolving world model and inter-agent negotiation, restricting incremental learning and creativity.

Prospective research avenues include co-training agents, evolving routing policies via reinforcement learning, incorporating persistent memory, and enabling world-model emergence (Li et al., 11 Nov 2025).

SciAgent’s Coordinator–Worker–Sub-agent hierarchy, dynamic pipeline assembly, and domain-adaptive specialization establish a scalable paradigm for generalistic scientific reasoning, providing robust benchmark performance and cross-disciplinary extensibility.

References

Li et al. (2025). SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning.