- The paper demonstrates GPT-4’s strong performance in computational tasks, achieving 93% in Intermediate Quantum Mechanics with Advanced Computation and 85% in Matter at Extremes.
- The paper employs a maximal cheating methodology by deconstructing questions and modifying prompts to reveal GPT-4’s proficiency and shortcomings in multi-step reasoning and interdisciplinary tasks.
- The paper calls for assessment reform by advocating for robust in-person lab work and oral examinations to maintain academic integrity in the era of advanced AI.
Evaluating the Capabilities of GPT-4 in Passing a Physics Degree
This paper by Pimbblet and Morrell investigates the capability of OpenAI's GPT-4 to complete the assessments contained within a UK Physics undergraduate (BSc with Honours) program. By employing a "maximal cheating" approach, the researchers explored the extent to which GPT-4 could complete exams and coursework and achieve a passing grade. Despite notable successes in specific areas, GPT-4 was unable to pass the degree as a whole because of its inability to engage in the compulsory, hands-on laboratory elements and viva assessments, both of which require real-time interaction and demonstrated understanding.
The methodology employed for this assessment was particularly thorough. The researchers played to GPT-4's strengths by breaking questions into smaller steps, prompting it to expand on its answers, rephrasing prompts, and coaching it toward correct solutions. Challenges nevertheless arose in multi-step problem solving, complex reasoning tasks, and questions that cut across disciplines. In practical settings such as laboratory work, which demands physical interaction with apparatus, GPT-4's lack of autonomy left it unable to succeed.
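The question-decomposition tactic described above can be sketched as a simple prompt-chaining loop. This is an illustrative sketch, not the authors' actual tooling: the `ask` parameter is a hypothetical stand-in for a single call to a chat model (e.g. via an LLM API), and the sub-step wording is invented for the example.

```python
def decompose_and_solve(question, sub_steps, ask):
    """Tackle a multi-step physics question one sub-step at a time,
    carrying each earlier answer forward as context for the next prompt.

    `ask(prompt) -> str` is a placeholder for one chat-model call.
    """
    context = f"Question: {question}\n"
    answers = []
    for i, step in enumerate(sub_steps, start=1):
        prompt = context + f"Step {i}: {step}\nAnswer this step only."
        answer = ask(prompt)
        answers.append(answer)
        # Chain the result forward so later steps can build on it.
        context += f"Step {i} answer: {answer}\n"
    return answers


if __name__ == "__main__":
    # Stub model for demonstration: echoes the sub-step it was asked.
    stub = lambda prompt: prompt.splitlines()[-2]
    print(decompose_and_solve(
        "A ball is thrown upward at 10 m/s; find the time to apex and the maximum height.",
        ["Find the time to reach the apex.", "Find the maximum height."],
        stub,
    ))
```

The design choice here mirrors the paper's observation: GPT-4 handles isolated sub-steps far better than monolithic multi-stage problems, so the human operator does the decomposition and the model solves each piece in turn.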
Key Findings
- Performance Insights: GPT-4 excelled in computational tasks, programming exercises, and factual recall. Tasks that involved the direct application of mathematical formulas or programming algorithms were completed with high accuracy. GPT-4 obtained notably high grades in modules like Intermediate Quantum Mechanics with Advanced Computation (93%) and Matter at Extremes (85%).
- Limitations: Critical skills such as reasoning through multi-stage problems, interpreting graphical data, and engaging in live, interactive assessments (e.g., vivas) were areas where GPT-4 underperformed. Notably, it could not take part in hands-on laboratory tasks or demonstrate understanding in oral examinations, both compulsory components of the degree.
- Implications for Assessment Design: These findings underscore the need for reform in academic assessment, particularly a shift toward testing methods that are robust to AI facilitation. Retaining in-person examinations, practical labs, and oral defenses helps preserve academic integrity in the face of advancing AI capabilities.
Implications and Future Directions
The paper recommends two potential pathways in response to these challenges: increasing the robustness of assessments via in-person methods or embedding AI as a tool to enhance learning, thereby requiring educational curricula to adapt. While the former secures academic standards by mitigating unauthorized AI use, the latter approach could empower students to use AI effectively and ethically. This indicates an evolving educational landscape where AI literacy becomes paramount alongside traditional academic skills.
Speculating on future developments, the evolving landscape of AI in education suggests a need for continuous reassessment of curricular structures. As AI capabilities expand, particularly into the areas where current models struggle, educational institutions must carefully balance the integration of AI tools against maintaining rigorous academic standards. The outcomes of this physics case study hint at broader applicability across disciplines, warranting deeper exploration of both the integration and the resistance strategies needed within academia.
In conclusion, this paper provides a comprehensive evaluation of GPT-4's capabilities within an academic framework, emphasizing the pressing need for educators and institutions to adapt assessment practices in response to AI advances. The dual strategy approach—adapting assessments to either exclude or include AI tools responsibly—provides a meaningful roadmap for future educational strategies. These considerations underscore the complex interaction between AI deployment and pedagogical integrity, marking a transformative phase in higher education.