Towards Verifiable and Self-Correcting AI Physicists for Quantum Many-Body Simulations
Abstract: Recent advances in automated scientific discovery have shown promise across frontier research domains, and agent systems driven by LLMs are emerging as powerful tools for physics research. In practice, however, LLM-driven scientific research is prone to hallucinations, which highlights the need for reliable verification and error-correction mechanisms. Here we introduce PhysVEC, an automated multi-agent framework for verifiable and error-correcting AI-driven physics research. PhysVEC incorporates a programming verifier and a scientific verifier to ensure both coding correctness and physical validity, and it provides human-auditable evidence at each stage. We curate QMB100, an end-to-end, research-level benchmark dataset consisting of 100 tasks extracted from 21 high-impact articles on quantum many-body physics. We evaluate PhysVEC with four frontier LLMs and find that it significantly outperforms baselines in both programming tests and scientific tests across all LLMs and task categories. PhysVEC demonstrates effective inference-time scaling and delivers accurate physical predictions through integrated verification and error-correction mechanisms, paving the way for reliable and interpretable AI physicists.
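The abstract names two verification stages and an error-correction mechanism but does not say how they are wired together. The Python sketch below shows one plausible arrangement, under stated assumptions: a generator proposes a candidate, a programming check runs first, a scientific check runs only if the first passes, and any failure message is fed back into the next attempt. Every name here (`verify_and_correct`, `Attempt`, `run_toy_example`, the toy verifiers) is hypothetical and not taken from PhysVEC. The "candidate" is a string claiming a probability, standing in for generated simulation code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: a verifier maps a candidate to (passed, feedback).
Verifier = Callable[[str], Tuple[bool, str]]


@dataclass
class Attempt:
    """One verifier verdict, kept so a human can audit the run afterwards."""
    candidate: str
    stage: str  # "programming" or "scientific"
    passed: bool
    feedback: str


def verify_and_correct(
    generate: Callable[[Optional[str]], str],
    programming_verifier: Verifier,
    scientific_verifier: Verifier,
    max_rounds: int = 3,
) -> Tuple[Optional[str], List[Attempt]]:
    """Return (accepted candidate or None, audit log of every verdict)."""
    log: List[Attempt] = []
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(feedback)  # feedback is None on the first round
        ok, feedback = programming_verifier(candidate)
        log.append(Attempt(candidate, "programming", ok, feedback))
        if not ok:
            continue  # retry with the programming feedback
        ok, feedback = scientific_verifier(candidate)
        log.append(Attempt(candidate, "scientific", ok, feedback))
        if ok:
            return candidate, log
    return None, log


def run_toy_example(max_rounds: int = 3) -> Tuple[Optional[str], List[Attempt]]:
    # Canned drafts stand in for an LLM; a real generator would use the feedback.
    drafts = iter(["0.5.1", "1.7", "0.25"])

    def generate(feedback: Optional[str]) -> str:
        return next(drafts)

    def programming_verifier(candidate: str) -> Tuple[bool, str]:
        # Toy stand-in for "does the code run": does the text parse as a number?
        try:
            float(candidate)
        except ValueError:
            return False, "candidate does not parse as a number"
        return True, "parsed"

    def scientific_verifier(candidate: str) -> Tuple[bool, str]:
        # Toy stand-in for a physical constraint: a probability lies in [0, 1].
        if 0.0 <= float(candidate) <= 1.0:
            return True, "value is a valid probability"
        return False, "probability outside [0, 1]"

    return verify_and_correct(
        generate, programming_verifier, scientific_verifier, max_rounds
    )


if __name__ == "__main__":
    accepted, log = run_toy_example()
    print("accepted:", accepted)
    for attempt in log:
        print(attempt.stage, attempt.candidate, attempt.passed, attempt.feedback)
```

In this toy run the first draft fails the programming check, the second passes it but fails the scientific check, and the third is accepted. Raising `max_rounds` spends more inference-time compute on correction, which is one simple way to read the abstract's "inference-time scaling" claim. The returned log is a minimal analogue of per-stage, human-auditable evidence.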