Towards a Large Physics Benchmark (2507.21695v1)
Abstract: We introduce a benchmark framework developed by and for the scientific community to evaluate, monitor and steer LLM development in fundamental physics. Building on philosophical concepts of scientific understanding and creativity, we develop a scoring system in which each question is scored by an expert for its correctness, difficulty, and surprise. The questions are of three forms: (i) multiple-choice questions for conceptual understanding, (ii) analytical problems requiring mathematical derivation, and (iii) openended tasks requiring complex problem solving. Our current dataset contains diverse set of examples, including a machine learning challenge to classify high-energy physics events, such as the four top quark signal. To ensure continued relevance, we propose a living benchmark, where physicists contribute questions, for instance alongside new publications. We invite contributions via: http://www.physicsbenchmarks.org/. We hope that this benchmark will enable a targeted AI development that can make a meaningful contribution to fundamental physics research.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.