PhysicsMinions: Multimodal Physics Solver

Updated 21 January 2026
  • PhysicsMinions is a coevolutionary, multimodal, multi-agent system designed to solve complex Olympiad-level physics problems.
  • It integrates Visual, Logic, and Review Studios to extract diagrams, perform symbolic computations, and verify solutions through iterative refinement.
  • The framework enhances base model performance, achieving gold-medal-level accuracy by combining structured feedback and dual-stage verification.

PhysicsMinions is a coevolutionary, multimodal, multi-agent system designed to achieve state-of-the-art performance on Olympiad-level physics problems, particularly those presented in major international competitions such as the International Physics Olympiad (IPhO). The framework emphasizes multi-agent orchestration, multimodal perception, structured solution refinement, and dual-stage verification, achieving open-source gold-medal-level performance on tasks that demand complex symbolic reasoning, multimodal understanding, and iterative problem solving (Yu et al., 29 Sep 2025, Chen et al., 17 Nov 2025).

1. System Architecture and Components

PhysicsMinions consists of three interconnected "studios," each responsible for a specialized facet of solving Olympiad physics problems:

  • Visual Studio: Parses and structurally models all diagrammatic and data-driven content. Its internal pipeline consists of:
    • Inspector: Classifies figures (plots, schematics, free-body diagrams, etc.) and translates their contents to structured JSON, e.g., axis metadata, component lists.
    • Introspector (Image): Audits the JSON, enforces consistency (e.g., unit adherence), fills minor gaps with confidence annotations, and ensures self-contained representation.
    • Verifier (Image): Reconciles the finalized JSON with the source image, generating bug reports if mismatches arise.
  • Logic Studio: Manages symbolic and numerical solution processes:
    • Solver: Consumes the problem statement and validated diagram JSON, producing a two-part response—(1) a summary verdict and boxed answer, and (2) a detailed derivation in LaTeX.
    • Introspector (Self-Improve): Tightens the derivation, standardizes notation, corrects computational and logical flaws, focusing special attention on issues highlighted by the Verifiers' bug reports.
  • Review Studio: Implements a dual-stage verification cascade:
    • Physics-Verifier: Checks unit consistency, correct constant usage, and contextual appropriateness (e.g., ensuring quantities align with domain expectations such as force or energy).
    • General-Verifier: Performs deep, stepwise logical auditing—ensuring completeness, subpart matching, error-free algebraic manipulations, and correct inference chains.
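
To make the Visual Studio hand-off concrete, here is a minimal sketch of what an Inspector record and an Introspector-style audit might look like. The schema (field names such as `figure_type`, `axes`, `unit`, `confidence`), the sample values, and the `audit_figure` helper are all illustrative assumptions; the source only states that figures are translated to structured JSON containing items such as axis metadata and component lists.

```python
import json

# Hypothetical Inspector output for a plot; every field name and value is an assumption.
figure_json = """
{
  "figure_type": "plot",
  "axes": {
    "x": {"label": "frequency", "unit": "MHz", "range": [0, 100]},
    "y": {"label": "absorption", "unit": null, "range": [0, 1]}
  },
  "components": [],
  "confidence": 0.93
}
"""

def audit_figure(record):
    """Return a list of bug-report strings for one record (illustrative checks only)."""
    issues = []
    if record.get("figure_type") not in {"plot", "schematic", "free_body_diagram"}:
        issues.append("unknown figure_type")
    for name, axis in record.get("axes", {}).items():
        if not axis.get("unit"):
            issues.append(f"axis '{name}' has no unit")
        lo, hi = axis.get("range", (0, 0))
        if lo >= hi:
            issues.append(f"axis '{name}' has an empty range")
    return issues

record = json.loads(figure_json)
bug_report = audit_figure(record)
print(bug_report)
```

An empty list would mean the record is passed downstream unchanged; a non-empty list plays the role of the bug report that the Verifier and Introspector exchange.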

The studios operate within a coevolutionary feedback loop, iteratively exchanging candidate solutions and bug reports. Convergence is enforced via a "consecutive verification" (CV) criterion: a draft must pass both Review Studio verifiers CV consecutive times before acceptance. The iterative structure enables self-correction and grounds the solution in both domain and general logical validity.

For text-only problems (as in some deployments), Visual Studio is disabled and the workflow proceeds via text prompt engineering and reviewer logic using specialized models such as P1 (Chen et al., 17 Nov 2025).

2. Iterative Refinement and Mathematical Formalization

PhysicsMinions' optimization scheme is cast as a feedback-driven search over the space of textual solutions, targeting satisfaction of both domain-specific and general logical constraints. Let $S$ denote a candidate solution, and define indicator functions:

  • $V_{\rm phy}(S) = 1$ iff $S$ passes Physics-Verifier,
  • $V_{\rm gen}(S) = 1$ iff $S$ passes General-Verifier.

The system seeks $S^*$ with $V_{\rm phy}(S^*) = V_{\rm gen}(S^*) = 1$. At refinement step $k$, $S_k$ is updated using a correction operator $T_\theta$, parameterized by the introspector and informed by the latest bug report $B_k$:

$$S_{k+1} = T_\theta(S_k, B_k)$$

The global objective is defined as:

$$L(S) = \alpha \cdot (1 - V_{\rm phy}(S)) + \beta \cdot (1 - V_{\rm gen}(S)), \quad \alpha, \beta > 0$$

with the process iteratively reducing $L(S)$, terminating when $L(S_k) = 0$ is observed over $CV$ consecutive $k$.
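
As a small worked illustration of the objective and its stopping rule, the sketch below evaluates the loss on a made-up trace of verifier outcomes and finds the first step at which the loss has been zero for CV consecutive steps. The weights (alpha = beta = 1), the trace, and the helper names are assumptions chosen for illustration, not values from the source.

```python
def loss(v_phy, v_gen, alpha=1.0, beta=1.0):
    """L(S) = alpha * (1 - V_phy(S)) + beta * (1 - V_gen(S)) for binary verifier outcomes."""
    return alpha * (1 - v_phy) + beta * (1 - v_gen)

def first_accepted_step(outcomes, cv):
    """Return the first index k at which L(S_k) = 0 has held for cv consecutive steps, else None."""
    streak = 0
    for k, (v_phy, v_gen) in enumerate(outcomes):
        streak = streak + 1 if loss(v_phy, v_gen) == 0 else 0
        if streak >= cv:
            return k
    return None

# Hypothetical trace of (V_phy, V_gen) over six refinement steps.
trace = [(0, 0), (1, 0), (1, 1), (0, 1), (1, 1), (1, 1)]
losses = [loss(p, g) for p, g in trace]
accepted = first_accepted_step(trace, cv=2)
print(losses, accepted)
```

Because alpha and beta are strictly positive, the loss is zero exactly when both verifiers pass, so the stopping rule does not depend on the particular weights chosen.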

Pseudocode for the studio loop:

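def solve(Q, image, CV, max_iter):
    I = VisualExtract(image)                 # Visual Studio: validated diagram JSON
    S = Solver(Q, I)                         # initial solution
    c, f = 0, 0                              # consecutive passes, consecutive failures
    for _ in range(max_iter):
        S = IntrospectorImprove(S)           # self-improvement pass
        pass_phy, report_phy = PhysicsVerify(S)
        if not pass_phy:
            S = IntrospectorImprove(S + report_phy)
            f, c = f + 1, 0
            if f >= CV:                      # CV failures in a row: re-solve from scratch
                S, f, c = Solver(Q, I), 0, 0
            continue
        pass_gen, report_gen = GeneralVerify(S)
        if not pass_gen:
            S = IntrospectorImprove(S + report_gen)
            f, c = f + 1, 0
            if f >= CV:
                S, f, c = Solver(Q, I), 0, 0
            continue
        c, f = c + 1, 0                      # both verifiers passed
        if c >= CV:                          # CV consecutive passes: accept
            return S
    return S                                 # budget exhausted: return the latest draft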
Here “+” denotes prompt concatenation.
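
The loop can be exercised end to end by replacing the agents with toy stand-ins. The sketch below is only that: drafts are plain strings, bugs are literal tags, and `make_verifier`, `improve`, and `toy_solver` are hypothetical stubs, not the real studios. It runs with CV=3 purely so that the two seeded failures do not trigger the restart branch, which fires after CV failures in a row.

```python
def run_loop(question, figure, solver, improve, verify_phy, verify_gen, cv=2, max_iter=20):
    """Studio loop from the pseudocode above, with the agents passed in as callables."""
    draft = solver(question, figure)
    passes = fails = 0
    for _ in range(max_iter):
        draft = improve(draft)
        for verify in (verify_phy, verify_gen):    # physics check first, then general check
            ok, report = verify(draft)
            if not ok:
                draft = improve(draft + report)    # "+" is prompt concatenation
                fails, passes = fails + 1, 0
                if fails >= cv:                    # repeated failures: re-solve from scratch
                    draft = solver(question, figure)
                    fails = passes = 0
                break
        else:                                      # no break: both verifiers passed
            passes, fails = passes + 1, 0
            if passes >= cv:
                return draft, True
    return draft, False

def make_verifier(tag):
    """Stub verifier: fails while the draft still carries the literal tag '[<tag>-bug]'."""
    def verify(text):
        ok = f"[{tag}-bug]" not in text
        return ok, ("" if ok else f" FIX:{tag}")
    return verify

def improve(text):
    """Stub introspector: removes a bug tag only when a matching report was appended."""
    for tag in ("unit", "gap"):
        if f"FIX:{tag}" in text:
            text = text.replace(f" [{tag}-bug]", "").replace(f" FIX:{tag}", "")
    return text

def toy_solver(question, figure):
    # Placeholder draft seeded with one physics bug and one logic gap.
    return "t = f(q_a, eta, K) [unit-bug] [gap-bug]"

final, accepted = run_loop("Q", None, toy_solver, improve,
                           make_verifier("unit"), make_verifier("gap"), cv=3)
print(final, accepted)
```

In this toy run the first two iterations each fail one verifier and are repaired from its report, and the next three iterations pass both checks, which satisfies the consecutive-verification criterion.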

3. Concrete Workflow Examples

Two representative Olympiad problem traces illustrate the pipeline:

A. IPhO Q1-C4: Visual Table Extraction

  • Problem: Identify $x$-coordinates of the three peaks in a frequency-absorption plot.
  • Visual Studio emits structured JSON documenting peak coordinates with confidence scores.
  • Logic Studio reads JSON and directly reports the three $x$ values as boxed answers.
  • Review Studio passes the solution without need for further derivation or correction.
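
The read-and-report step in this trace can be sketched as follows. The JSON layout, the peak positions, and the confidence threshold are placeholders invented for illustration; the actual problem data and schema are not given in the source.

```python
import json

# Hypothetical Visual Studio output; the schema and the numbers are placeholders.
payload = json.loads("""
{"peaks": [
  {"x": 42.0, "confidence": 0.97},
  {"x": 17.5, "confidence": 0.91},
  {"x": 63.2, "confidence": 0.88}
]}
""")

def boxed_peak_answers(data, min_confidence=0.5):
    """Sort the extracted peak x-coordinates and wrap each one in a LaTeX boxed command."""
    xs = sorted(p["x"] for p in data["peaks"] if p["confidence"] >= min_confidence)
    return [f"\\boxed{{{x:g}}}" for x in xs]

answers = boxed_peak_answers(payload)
print(answers)
```

No derivation is involved here, which is why a problem of this kind can pass review on the first draft once the extraction itself is trusted.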

B. IPhO Q3-A6: Physics Derivation Failure and Correction

  • Problem: Symbolic derivation of a characteristic time $t$, given $q_a$ and constants $\eta, K$.
  • Single-model baseline produces a result with incorrect units.
  • Physics-Verifier flags unit mismatch in $t$; Introspector corrects variable substitution, returning to the correct formula and verified unit structure.
  • General-Verifier then identifies additional derivation gaps, prompting further introspector-led re-derivation. Acceptance follows once both passes succeed.
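
The kind of unit check that catches such an error can be sketched with plain exponent bookkeeping. Since the source does not give the formula or the dimensions of $q_a$, $\eta$, and $K$, the example below uses a different, well-known case (the pendulum time scale $\sqrt{l/g}$) as a stand-in; the helper names and the report format are assumptions.

```python
from fractions import Fraction

def dim(M=0, L=0, T=0):
    """Dimension as exponents of (mass, length, time)."""
    return (Fraction(M), Fraction(L), Fraction(T))

def mul(a, b):
    """Multiplying quantities adds their dimension exponents."""
    return tuple(x + y for x, y in zip(a, b))

def power(a, n):
    """Raising a quantity to the power n scales its exponents by n."""
    return tuple(x * Fraction(n) for x in a)

def physics_verify(candidate, expected):
    """Return (passed, report) for a dimensional comparison."""
    if candidate == expected:
        return True, ""
    got = "M^{} L^{} T^{}".format(*candidate)
    want = "M^{} L^{} T^{}".format(*expected)
    return False, f"unit mismatch: got {got}, expected {want}"

TIME = dim(T=1)
length = dim(L=1)
g = dim(L=1, T=-2)                       # acceleration

wrong = mul(length, power(g, -1))        # l / g has dimension T^2
right = power(wrong, Fraction(1, 2))     # sqrt(l / g) has dimension T

print(physics_verify(wrong, TIME))
print(physics_verify(right, TIME))
```

The failing report is what an introspector would receive as a bug report; the check says nothing about numerical prefactors, which is one reason a second, general verifier is still needed.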

In both examples, the iterative review and correction loop systematically improves the solution, in contrast to direct one-shot inference.

4. Evaluation Metrics and Empirical Results

PhysicsMinions demonstrates significant performance gains on the HiPhO benchmark, which spans recent editions of global Olympiads:

| Model | IPhO | APhO | EuPhO | NBPhO | PanPhO | PanMech | F=MA | Golds (out of 7) |
|---|---|---|---|---|---|---|---|---|
| Gemini-2.5-FT (single) | 20.2 | 27.4 | 13.2 | 29.0 | 44.6 | 60.5 | 17.8 | 6 |
| + PhysicsMinions | 21.5 | 28.0 | 16.5 | 33.3 | 57.8 | 72.0 | 24.0 | 7 |
| Intern-S1 (single) | 15.9 | 21.7 | 9.0 | 23.0 | 41.1 | 60.4 | 18.4 | 2 |
| + PhysicsMinions | 20.8 | 25.2 | 10.1 | 28.9 | 46.8 | 68.7 | 22.7 | 6 |
| Qwen2.5VL-32B (single) | 9.9 | 16.5 | 6.9 | 15.3 | 22.5 | 28.1 | 7.6 | 0 |
| + PhysicsMinions | 12.4 | 17.7 | 9.0 | 21.0 | 29.5 | 36.0 | 12.0 | 2 |
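
The per-contest gains implied by these rows can be computed directly. The snippet below only re-enters the scores listed above and subtracts them; the dictionary layout and helper name are illustrative choices.

```python
contests = ["IPhO", "APhO", "EuPhO", "NBPhO", "PanPhO", "PanMech", "F=MA"]

# (single-model scores, scores with PhysicsMinions), copied from the table above.
scores = {
    "Gemini-2.5-FT": ([20.2, 27.4, 13.2, 29.0, 44.6, 60.5, 17.8],
                      [21.5, 28.0, 16.5, 33.3, 57.8, 72.0, 24.0]),
    "Intern-S1":     ([15.9, 21.7, 9.0, 23.0, 41.1, 60.4, 18.4],
                      [20.8, 25.2, 10.1, 28.9, 46.8, 68.7, 22.7]),
    "Qwen2.5VL-32B": ([9.9, 16.5, 6.9, 15.3, 22.5, 28.1, 7.6],
                      [12.4, 17.7, 9.0, 21.0, 29.5, 36.0, 12.0]),
}

def gains(single, agentic):
    """Absolute score gain per contest, rounded to one decimal place."""
    return [round(a - s, 1) for s, a in zip(single, agentic)]

for model, (single, agentic) in scores.items():
    per_contest = dict(zip(contests, gains(single, agentic)))
    best = max(per_contest, key=per_contest.get)
    print(f"{model}: {per_contest} (largest gain: {best})")
```

Every entry is positive, so each listed model improves on each listed contest when wrapped in PhysicsMinions.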

On the 2025 IPhO, the open-source Intern-S1 with PhysicsMinions achieves a Pass@32 score of 26.8/30 (4th of 406 contestants), outperforming the best single-model score of 22.7/30 (22nd place). PhysicsMinions is reported as the first open-source system to win a gold medal under the IPhO average-score metric.

Agentic ablation:

  • Pass@k scaling (Intern-S1, IPhO): Pass@1=15.9, Pass@32=26.8 with PhysicsMinions; single-model Pass@32=22.7.
  • Comparative agentic methods: PhysicsMinions > Self-Refine (×3) > Best-of-3 > Self-MoA.

5. Integration with Specialized Physics Reasoning Models (P1 Family)

The P1 series of models, notably P1-235B-A22B, are trained via reinforcement learning using group sequence policy optimization (GSPO) with truncated importance sampling. PhysicsMinions integrates P1 as both the Solver Minion and Introspector, with the Physics-Verifier leveraging symbolic checkers (SymPy) for rigorous algebraic and dimensionality validation (Chen et al., 17 Nov 2025).

P1’s RL setup:

  • State: full context of the problem and generated tokens.
  • Action: next token.
  • Reward: aggregate binary correctness per boxed sub-answer.
  • Objective: maximize $J(\pi_\theta) = \mathbb{E}_{\tau\sim\pi_\theta} \left[ \sum_{t=0}^T r(s_t, a_t) \right]$.
  • Loss: GSPO surrogate with length-normalized importance ratio and clipped weights.
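
For readers unfamiliar with GSPO, the following is a minimal sketch of a sequence-level clipped surrogate in its generally published form: a length-normalized importance ratio per response, group-normalized advantages, and PPO-style clipping. It is not P1's training code. The clipping width `eps`, the use of group mean and standard deviation for the advantages, and the omission of truncated importance sampling are assumptions made to keep the example short.

```python
import math

def gspo_surrogate(logp_new, logp_old, rewards, eps=0.2):
    """
    logp_new, logp_old: one list of token log-probabilities per response,
                        under the current and the old policy respectively.
    rewards:            one scalar reward per response in the group.
    Returns the clipped sequence-level surrogate (a quantity to maximize).
    """
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    advantages = [(r - mean) / (std + 1e-8) for r in rewards]

    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        # Length-normalized sequence ratio: (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|)
        ratio = math.exp((sum(new) - sum(old)) / len(new))
        clipped = min(max(ratio, 1 - eps), 1 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / g

# Toy group of two responses with two tokens each (made-up numbers).
old = [[-1.0, -1.0], [-1.0, -1.0]]
new = [[-0.5, -0.5], [-1.0, -1.0]]
value = gspo_surrogate(new, old, rewards=[1.0, 0.0])
print(value)
```

Normalizing the log-ratio by response length keeps long and short responses on a comparable scale, and clipping is applied to the whole sequence rather than to individual tokens.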

Deployment in PhysicsMinions yields non-trivial inference-time gains:

  • P1-235B-A22B alone achieves average 35.9/57.2 (12 gold + 1 silver, HiPhO).
  • Combined with PhysicsMinions: 38.4/57.2, 12 gold + 1 silver, #1 overall (surpassing Gemini-2.5-Pro at 37.7).
  • At IPhO 2025: P1-235B-A22B+PhysicsMinions achieves 23.2/30 (top single-model result).

6. Generalization, Scaling, and Ablation Studies

PhysicsMinions’ improvements scale with base model capability: large models (Gemini-2.5-FT, Intern-S1) see absolute score gains of up to +6 points, while mid-sized models (Qwen2.5VL-32B) gain +2–3 points. Visual Studio is indispensable for multimodal problems: omitting it drops performance by up to 4.9 points. Review Studio ablation removes up to 2.4 points (depending on which verifier is withheld).

Varying the stopping criterion (CV) shows that CV=2 maximizes accuracy (e.g., 20.8 on IPhO for Intern-S1): lower CV leads to under-verification, while higher CV leads to over-iteration and diminishing returns.

7. Key Insights, Limitations, and Prospective Extensions

Insights:

  • Coevolutionary feedback, combining solution synthesis with iterative, bug-report–driven correction, consistently breaks the performance ceiling of single-pass models.
  • Structured extraction of diagrammatic information is critical to robust multimodal physics reasoning.
  • Dual-stage verification (Physics-Verifier and General-Verifier) offers substantially higher error coverage than monolithic systems.

Limitations:

  • Visual Studio may misinterpret fine-grained chart details without further calibration.
  • Computational costs scale by $2\times$ to $3\times$ compared to direct inference.
  • Simple problems may incur unnecessary correction cycles due to heuristic stopping.

Future Directions:

  • Integration of advanced chart-digitization and computer vision modules for <1% diagram extraction error.
  • Coupling with external symbolic and numerical solvers (e.g., SageMath, SciPy) for further algebraic robustness.
  • Adapting the coevolutionary multi-agent paradigm to other Olympiad domains, including mathematical proof and complex biological/engineering diagrams.

PhysicsMinions exemplifies a systematic, robust approach to high-level scientific problem solving, setting a new standard for open-source performance in complex symbolic and multimodal benchmarks and providing a platform for generalized, agentic problem solving in STEM contexts (Yu et al., 29 Sep 2025, Chen et al., 17 Nov 2025).
