Cause of GPT-4’s notably strong performance in the Matter at Extremes module
Investigate the factors underlying GPT-4’s strong performance on assessments in the Matter at Extremes module (covering topics such as particle colliders and superconductivity), specifically testing whether this is due to a limited variety of canonical question forms or to fortuitous alignment with the distributions present in GPT-4’s training data.
References
We were impressed by the quality of answers here but are unclear why this might be the case - perhaps a limited set of questions (or variations thereof) exist, or perhaps the nature of the questions is fortuitously coincidental with GPT-4's training set.
— Can ChatGPT pass a physics degree? Making a case for reformation of assessment of undergraduate degrees
(2412.01312 - Pimbblet et al., 2 Dec 2024) in Section 4, “Matter at Extremes”