SciAgent: Unified Scientific Reasoning
- SciAgent is a unified multi-agent system for expert scientific reasoning, dynamically constructing problem-solving pipelines across disciplines.
- It leverages hierarchical agent coordination, meta-reasoning, and feedback-driven adaptivity to iteratively refine solutions for complex tasks.
- Empirical evaluations demonstrate SciAgent’s robust performance, often surpassing human gold-medalist benchmarks in math, physics, and chemistry challenges.
SciAgent refers to a unified multi-agent system for generalistic scientific reasoning, capable of expert-level, domain-adaptive problem solving across mathematics, physics, chemistry, and other scientific domains. In contrast to prior systems focused on narrow, handcrafted automation, SciAgent dynamically composes and refines reasoning pipelines by leveraging hierarchical agent coordination, meta-reasoning, and feedback-driven adaptivity. The system consistently attains or surpasses human gold-medalist performance on top-tier benchmarks, demonstrating cross-disciplinary generality and robust reasoning adaptability (Li et al., 11 Nov 2025).
1. Hierarchical Multi-Agent Architecture
At its core, SciAgent is structured as a three-tier hierarchy analogous to a research team:
- Coordinator Agent (Meta Level):
  - Performs domain classification (mathematics, physics, chemistry, general exam), estimates problem difficulty, and selects an appropriate toolchain.
  - Issues a routing call:

        call ← Coordinator.solve(problem_text)
        domain, strategy ← classify(problem_text)
        Worker ← select_worker(domain, strategy)
        return Worker.solve(problem_text, strategy)

- Worker Systems (Domain Level):
  - Each Worker System instantiates a domain-specific scientific paradigm.
  - Math Olympiad Worker: Interleaves Generator, Improver, and Reviewer agents in a reasoning–review cycle for symbolic deduction.
  - Physics Olympiad Worker: Runs a ReAct-style loop (Thought–Action–Observation) among Generate, Image Analyser, Summarizer, and Reviewer agents for conceptual modeling, diagram interpretation, and derivation.
  - Chemistry Olympiad Worker: Extends ReAct with Molecule Recognition, SMILES Verify, and Chemistry Knowledge agents for molecular and chemical-equation reasoning.
  - General Exam Worker: Integrates Generate, Breakdown, Image Analyser, and Review agents to address mid-level multimodal tasks.
- Sub-Agents (Execution Level):
  - Responsible for specialized reasoning steps:
    - Symbolic Deduction: Algebraic manipulation, proof-step generation, chain-of-thought construction.
    - Conceptual Modeling: Scenario translation to equations/laws (e.g., Maxwell’s equations, reaction mechanisms).
    - Numerical Computation: Code execution, series expansions, difference-equation solves, eigenvalue calculations.
    - Verification/Summarization: Consistency checking and result summarization.
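The Coordinator's routing call can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword-based `classify`, the word-count difficulty heuristic, the `WORKERS` table, and `make_worker` are hypothetical stand-ins for components that are presumably LLM-backed in the real system.

```python
from typing import Callable, Dict, Tuple

# Hypothetical keyword table; an assumption made only for this sketch.
DOMAIN_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "mathematics": ("prove", "integer", "polynomial"),
    "physics": ("velocity", "force", "circuit"),
    "chemistry": ("molecule", "reaction", "smiles"),
}


def classify(problem_text: str) -> Tuple[str, str]:
    """Return (domain, strategy) using simple keyword matching."""
    text = problem_text.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(word in text for word in keywords):
            # Assumed difficulty heuristic: long statements get a deeper strategy.
            strategy = "deep_review" if len(text.split()) > 30 else "direct"
            return domain, strategy
    return "general_exam", "direct"


def make_worker(name: str) -> Callable[[str, str], str]:
    """Build a placeholder worker that only reports how it was invoked."""
    def solve(problem_text: str, strategy: str) -> str:
        return f"{name} worker solving with strategy '{strategy}'"
    return solve


WORKERS: Dict[str, Callable[[str, str], str]] = {
    domain: make_worker(domain)
    for domain in ("mathematics", "physics", "chemistry", "general_exam")
}


def coordinator_solve(problem_text: str) -> str:
    """Mirror of the routing call: classify, select a worker, delegate."""
    domain, strategy = classify(problem_text)
    worker = WORKERS[domain]
    return worker(problem_text, strategy)


print(coordinator_solve("Prove that every integer n > 1 has a prime factor."))
```

The point of the sketch is the shape of the control flow: classification and worker selection happen once at the meta level, and everything after that is delegated to the selected Worker.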
Agents coordinate via structured message passing and critique–revision loops within each Worker. For example, the Math Worker conducts the following sequence:
- solution₀ ← Generator.propose(problem)
- solution₁ ← Improver.refine(solution₀)
- verdict ← Reviewer.check(solution₁)
- If verdict = pass, then return solution₁; else Generator.correct(verdict)
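A toy version of this critique–revision loop is sketched below. It is an illustration only: the task (find a positive integer whose square equals a given number), the `Verdict` type, the `max_rounds` budget, and the bodies of every method are assumptions, and `correct` takes the current solution as an extra argument for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    passed: bool
    feedback: str = ""


class Generator:
    def propose(self, problem: int) -> int:
        return 40  # deliberately imperfect first draft

    def correct(self, solution: int, verdict: Verdict) -> int:
        # Move the draft in the direction the Reviewer's feedback suggests.
        return solution + 1 if verdict.feedback == "too low" else solution - 1


class Improver:
    def refine(self, solution: int) -> int:
        return abs(solution)  # toy refinement: keep the positive root


class Reviewer:
    def check(self, problem: int, solution: int) -> Verdict:
        square = solution * solution
        if square == problem:
            return Verdict(True)
        return Verdict(False, "too low" if square < problem else "too high")


def math_worker_solve(problem: int, max_rounds: int = 10) -> Optional[int]:
    """Find a positive integer x with x * x == problem via propose/refine/review."""
    generator, improver, reviewer = Generator(), Improver(), Reviewer()
    solution = generator.propose(problem)
    for _ in range(max_rounds):
        solution = improver.refine(solution)
        verdict = reviewer.check(problem, solution)
        if verdict.passed:
            return solution
        solution = generator.correct(solution, verdict)
    return None  # no accepted solution within the round budget


print(math_worker_solve(1764))  # prints 42
```

The Reviewer never sees the answer; it only checks a property of the candidate, which is what lets the loop catch a wrong draft and steer the Generator toward a correct one.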
2. Dynamic Pipeline Construction and Adaptivity
Upon receiving a problem, the Coordinator Agent assigns a high-level “strategy” token that directs the Worker to self-assemble an adaptive reasoning pipeline instead of following a static script. The Worker iteratively invokes sub-agents and adapts the pipeline based on intermediate feedback until the solution state converges:
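
    def solve(problem, strategy):
        # Build the initial sub-agent pipeline from the Coordinator's strategy token.
        pipeline = instantiate_subagents(strategy)
        state = initialize_ctx(problem)
        while not state.converged():
            for agent in pipeline:
                delta = agent.act(state)  # each sub-agent proposes an update
                state.update(delta)
            # Re-plan the pipeline from the feedback gathered in this pass.
            pipeline = adapt_pipeline(state.feedback)
        return state.final_answer()
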
The adapt_pipeline method dynamically inserts sub-agents (e.g., Modeler, Verifier) to address residuals or check additional constraints, supporting tightly coupled symbolic, conceptual, and numerical inference throughout the solution process.
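To make the loop concrete, here is a small runnable sketch on a toy task (find x with x·x equal to a target, using Newton steps). Everything in it is assumed for illustration: the `State` fields, the `solver` and `verifier` agents, the string-valued feedback signal, and the `max_rounds` guard are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class State:
    """Working context for a toy task: find x with x * x == target."""
    target: float
    x: float = 1.0
    residual: float = float("inf")
    feedback: str = ""
    trace: List[str] = field(default_factory=list)

    def update(self, delta: Dict[str, object]) -> None:
        for key, value in delta.items():
            setattr(self, key, value)

    def converged(self) -> bool:
        return self.residual < 1e-9


Agent = Callable[[State], Dict[str, object]]


def solver(state: State) -> Dict[str, object]:
    """One Newton step for x * x = target; flags its output as unverified."""
    return {"x": 0.5 * (state.x + state.target / state.x), "feedback": "unverified"}


def verifier(state: State) -> Dict[str, object]:
    """Measure the residual so the loop can decide whether it has converged."""
    return {"residual": abs(state.x * state.x - state.target), "feedback": ""}


def adapt_pipeline(pipeline: List[Agent], feedback: str) -> List[Agent]:
    """Append the verifier once a sub-agent reports unverified output."""
    if feedback == "unverified" and verifier not in pipeline:
        return pipeline + [verifier]
    return pipeline


def solve(target: float, max_rounds: int = 50) -> State:
    state = State(target=target)
    pipeline: List[Agent] = [solver]
    for _ in range(max_rounds):
        if state.converged():
            break
        for agent in pipeline:
            state.update(agent(state))
            state.trace.append(agent.__name__)
        pipeline = adapt_pipeline(pipeline, state.feedback)
    return state


result = solve(2.0)
print(round(result.x, 6), result.trace[:3])
```

The pipeline starts with only the solver; the verifier is inserted after the first pass because the solver's feedback marks its output as unverified, which is visible in the first three entries of `result.trace`.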
3. Algorithmic Principles and Mathematical Foundations
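At the system level, the coordination policy can be formalized as a mapping from a problem to a domain, a strategy, and the Worker that handles it. Within a Worker, the solution’s intermediate state is recursively updated by the output of each sub-agent in turn.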
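One way to write this down, consistent with the routing call and the solve loop shown earlier, is sketched below. The symbols are notation assumed here for illustration and may differ from the paper's: $p$ is the problem, $d$ the domain, $\sigma$ the strategy, $W$ the selected Worker, $\hat{y}$ its answer, $s_t$ the intermediate state, $a_t$ the sub-agent invoked at step $t$, and $\oplus$ the state-update operation.

```latex
\begin{aligned}
(d, \sigma) &= \operatorname{classify}(p), \\
W &= \operatorname{select\_worker}(d, \sigma), \\
\hat{y} &= W(p, \sigma), \\[4pt]
\Delta_t &= a_t(s_t), \qquad s_{t+1} = s_t \oplus \Delta_t .
\end{aligned}
```

The first three lines restate the Coordinator's routing call; the last line restates the inner loop of the Worker, where each sub-agent produces an update that is merged into the state.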
Sub-agent operations include:
- Symbolic Solving: Representing proof obligations as sequents and generating transformation chains by applying rewrite rules.
- Parameter Optimization: For modeling, solving a parameter-fitting optimization problem.
- Verification: Enforcing error constraints, or error checks in Taylor expansions.
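As an example of the kind of error check a verification sub-agent might run on a Taylor expansion, the sketch below accepts a truncated series for exp(x) only when the Lagrange remainder bound is below a tolerance. The function names and the choice of exp as the target function are assumptions for illustration, not details from the paper.

```python
import math


def taylor_exp(x: float, n_terms: int) -> float:
    """Sum the first n_terms terms of the Maclaurin series of exp(x)."""
    return sum(x**k / math.factorial(k) for k in range(n_terms))


def lagrange_bound(x: float, n_terms: int) -> float:
    """Upper bound on the truncation error: e^{max(x, 0)} * |x|^n / n!."""
    return math.exp(max(x, 0.0)) * abs(x) ** n_terms / math.factorial(n_terms)


def verify_expansion(x: float, n_terms: int, tol: float) -> bool:
    """Accept the approximation only if the guaranteed error bound is below tol."""
    return lagrange_bound(x, n_terms) <= tol


x, tol = 0.5, 1e-4
for n in (4, 6):
    approx = taylor_exp(x, n)
    print(n, approx, verify_expansion(x, n, tol))
```

With these inputs the four-term expansion is rejected and the six-term expansion is accepted, since the bound falls from roughly 4.3e-3 to roughly 3.6e-5.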
4. Empirical Performance Across Scientific Benchmarks
SciAgent's performance was systematically evaluated on Olympiad-style and advanced scientific benchmarks:
| Benchmark (total points) | SciAgent Score | Human Gold Avg | Human Max | Notable Observations |
|---|---|---|---|---|
| IMO 2025 (42) | 36 | 35.94 | 42 | Surpassed avg. gold-level |
| IMC 2025 (100) | 100 (perfect) | 89.08 | 100 | Perfect score |
| IPhO 2025 (30) | 25.0 | 23.4 | 29.2 | Competitive with highest achievers |
| IPhO 2024 (30) | 27.6 | 25.3 | 29.4 | — |
| CPhO 2025 (320) | 264 | 199 | 320 | Strongly outperforming record |
| IChO 2025 | Correct mechanisms, SMILES, stoichiometry | — | — | No human aggregate yet |
| HLE Benchmark | Consistent correct solutions | LLMs fail | — | Succeeded where standard LLMs failed |
In each setting, SciAgent utilized the identical core protocol, with Worker Systems added for chemistry and general exams only, evidencing its domain generality. The system corrected complex reasoning errors missed by standard LLMs, including missing multiplicative constants in formula derivations.
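The margins over the human gold-medalist average can be read directly off the table. The short script below does that arithmetic for the rows that report both numbers; the `RESULTS` dictionary and the `margin` helper are introduced here for illustration, and the values are copied from the table.

```python
# (SciAgent score, human gold-medalist average), copied from the table.
RESULTS = {
    "IMO 2025": (36.0, 35.94),
    "IMC 2025": (100.0, 89.08),
    "IPhO 2025": (25.0, 23.4),
    "IPhO 2024": (27.6, 25.3),
    "CPhO 2025": (264.0, 199.0),
}


def margin(benchmark: str) -> float:
    """SciAgent score minus the gold-medalist average, rounded to 2 decimals."""
    score, gold_avg = RESULTS[benchmark]
    return round(score - gold_avg, 2)


for name in RESULTS:
    print(f"{name}: {margin(name):+.2f} points over the gold-medalist average")
```

The gap ranges from 0.06 points on IMO 2025 to 65 points on CPhO 2025, so "surpassing the gold average" covers both a near tie and a wide margin.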
5. Domain Generality and Ablation Studies
SciAgent does not rely on bespoke task-specific pipelines; instead, its unified meta-level design supports ready extension. An ablation removing the Coordinator (i.e., flattening all agents into a single layer) led to a 10–15% decrease in composite score for mixed mathematics–physics problem sets, substantiating the importance of meta-planning and routing. Disabling the Verification Sub-agent resulted in a doubling of numerical error rates, highlighting the necessity of persistent self-checking.
Importantly, the system required only the introduction of new Worker Systems for domain expansion (e.g., chemistry, general exams). There was no requirement for hand-engineered pipelines for individual competitions.
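The extension pattern described here, where a new domain is supported by adding a Worker rather than rewriting the pipeline, can be illustrated with a small registry. This is a hypothetical sketch: `register_worker`, `route`, and the worker bodies are invented for the example and are not the system's actual interfaces.

```python
from typing import Callable, Dict

Worker = Callable[[str], str]
_REGISTRY: Dict[str, Worker] = {}


def register_worker(domain: str) -> Callable[[Worker], Worker]:
    """Decorator that makes a Worker available to the router under a domain name."""
    def decorator(fn: Worker) -> Worker:
        _REGISTRY[domain] = fn
        return fn
    return decorator


def route(domain: str, problem: str) -> str:
    """Dispatch to the registered Worker; unknown domains raise KeyError."""
    if domain not in _REGISTRY:
        raise KeyError(f"no Worker registered for domain '{domain}'")
    return _REGISTRY[domain](problem)


@register_worker("mathematics")
def math_worker(problem: str) -> str:
    return f"[math] {problem}"


# Extending coverage later means registering one more Worker;
# the routing code above is left unchanged.
@register_worker("chemistry")
def chemistry_worker(problem: str) -> str:
    return f"[chem] {problem}"


print(route("chemistry", "Balance H2 + O2 -> H2O"))
```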
6. Limitations and Future Directions
Despite robust performance in mathematics and physics, the Chemistry Worker has not yet achieved human-equivalent aggregate scores in IChO, and biology modeling is largely constrained to image classification. Planned extensions include:
- Integration with laboratory automation and simulation platforms for seamless end-to-end scientific discovery.
- Enhanced support for multimodal tools (e.g., spectrum analyzers, computer vision for lab notebooks).
- Persistent memory for cross-session learning and cumulative expertise.
- Broadening domain coverage to biology, interdisciplinary materials science, and beyond.
7. Significance and Implications
SciAgent constitutes a concrete step toward generalistic scientific intelligence: a system capable of interpretable, cross-disciplinary inference and verifiable solutions at expert human levels, attained through self-assembling multi-agent pipelines, role-specialization, and rigorous meta-reasoning. Its demonstrated adaptability and systematic architecture position it as a benchmark for future research on autonomous, general-purpose scientific AI (Li et al., 11 Nov 2025).