Papers
Topics
Authors
Recent
Search
2000 character limit reached

Towards a Large Physics Benchmark

Published 29 Jul 2025 in physics.data-an, cs.AI, hep-ph, physics.comp-ph, and physics.hist-ph | (2507.21695v1)

Abstract: We introduce a benchmark framework developed by and for the scientific community to evaluate, monitor and steer LLM development in fundamental physics. Building on philosophical concepts of scientific understanding and creativity, we develop a scoring system in which each question is scored by an expert for its correctness, difficulty, and surprise. The questions are of three forms: (i) multiple-choice questions for conceptual understanding, (ii) analytical problems requiring mathematical derivation, and (iii) openended tasks requiring complex problem solving. Our current dataset contains diverse set of examples, including a machine learning challenge to classify high-energy physics events, such as the four top quark signal. To ensure continued relevance, we propose a living benchmark, where physicists contribute questions, for instance alongside new publications. We invite contributions via: http://www.physicsbenchmarks.org/. We hope that this benchmark will enable a targeted AI development that can make a meaningful contribution to fundamental physics research.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We found no open problems mentioned in this paper.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.

alphaXiv

  1. Towards a Large Physics Benchmark (8 likes, 0 questions)