Reliability of AI reasoning models for physics problem solving
Determine whether reasoning systems built on large language models, such as OpenAI’s o3-mini, can be considered reliable for physics problem solving under the study’s working definition of reliability: repeatedly producing correct answers across introductory physics story problems and topics.
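One way to make this working definition concrete is sketched below. It assumes each problem is posed to a model several times and each answer is graded simply as correct or incorrect. The `ProblemResult` record, the rule that a problem counts only if every repeated run is correct, and the per-topic `threshold` are hypothetical choices for illustration, not the study's evaluation protocol.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProblemResult:
    """Graded outcomes of repeated runs on one problem (hypothetical schema)."""
    topic: str
    problem_id: str
    runs_correct: list[bool]  # one entry per repeated attempt


def all_runs_correct(result: ProblemResult) -> bool:
    # Hypothetical rule: a problem counts as solved only if every run was correct.
    return len(result.runs_correct) > 0 and all(result.runs_correct)


def reliability_by_topic(results: list[ProblemResult]) -> dict[str, float]:
    """Fraction of problems in each topic whose repeated runs were all correct."""
    solved: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for r in results:
        total[r.topic] += 1
        solved[r.topic] += int(all_runs_correct(r))
    return {topic: solved[topic] / total[topic] for topic in total}


def is_reliable(results: list[ProblemResult], threshold: float = 1.0) -> bool:
    """Hypothetical criterion: every topic meets the threshold (1.0 means no misses)."""
    scores = reliability_by_topic(results)
    return bool(scores) and all(score >= threshold for score in scores.values())


if __name__ == "__main__":
    demo = [
        ProblemResult("kinematics", "k1", [True, True, True]),
        ProblemResult("kinematics", "k2", [True, False, True]),
        ProblemResult("energy", "e1", [True, True, True]),
    ]
    print(reliability_by_topic(demo))
    print(is_reliable(demo))
```

With the demo data, one of the two kinematics problems fails on a single run, so kinematics scores 0.5 and the strict criterion reports the model as not reliable even though most individual answers are correct. Requiring every run to be correct is what separates repeated correctness from average accuracy; a looser threshold would trade that strictness for tolerance of occasional errors.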
References
Until then, the question of reasoning models' reliability for the purposes of problem solving will remain open.
                — AI Reasoning Models for Problem Solving in Physics
                
                (2508.20941 - Bralin et al., 28 Aug 2025) in Discussion and Conclusions