Cause of performance decline of o3-mini across later physics topics

Determine why the accuracy of OpenAI’s o3-mini on story problems declines as topics progress through Halliday and Resnick’s Fundamentals of Physics Vol. 1, particularly in waves and thermodynamics.

Background

The results table shows near-perfect performance on early mechanics chapters, but lower success rates in later chapters: 87% and 76% in Waves I and II, and declines in kinetic theory and thermodynamics (96% → 92% → 88%). The discussion suggests possible contributing factors such as increased mathematical complexity and potential underrepresentation of these topics in training data.

The authors explicitly note that the reason for this trend has not been identified and remains an unresolved question, motivating future work on broader topic coverage (including Vol. 2) and more challenging problems, as well as testing updated models like o4-mini.

References

The question of why the model performance of o3-mini was dropping as the topics progressed remains open.

AI Reasoning Models for Problem Solving in Physics (2508.20941 - Bralin et al., 28 Aug 2025) in Section 6: Limitations and Future Work