Using Large Language Models to Assign Partial Credit to Students' Explanations of Problem-Solving Process: Grade at Human Level Accuracy with Grading Confidence Index and Personalized Student-facing Feedback

Published 9 Dec 2024 in physics.ed-ph | (2412.06910v2)

Abstract: This study examines the feasibility and potential advantages of using LLMs, in particular GPT-4o, to perform partial credit grading of large numbers of student written responses to introductory level physics problems. Students were instructed to write down verbal explanations of their reasoning process when solving one conceptual and two numerical calculation problems on in class exams. The explanations were then graded according to a 3-item rubric with each item grades as binary (1 or 0). We first demonstrate that machine grading using GPT-4o with no examples nor reference answer can reliably agree with human graders on 70%-80% of all cases, which is equal to or higher than the level at which two human graders agree with each other. Two methods are essential for achieving this level of accuracy: 1. Adding explanation language to each rubric item that targets the errors of initial machine grading. 2. Running the grading process 5 times and taking the most frequent outcome. Next, we show that the variation in outcomes across 5 machine grading attempts as measured by the Shannon Entropy can serve as a grading confidence index, allowing a human instructor to identify ~40% of all potentially incorrect gradings by reviewing just 10 - 15% of all responses. Finally, we show that it is straightforward to use GPT-4o to write clear explanations of the partial credit grading outcomes. Those explanations can be used as feedback for students, which will allow students to understand their grades and raise different opinions when necessary. Almost all feedback messages generated were rated 3 or above on a 5-point scale by two experienced instructors. The entire grading and feedback generating process cost roughly $5 per 100 student answers, which shows immense promise for automating labor-intensive grading process by a combination of machine grading with human input and supervision.

Abstract PDF Upgrade to Chat

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Open Problems

We found no open problems mentioned in this paper.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Generate Now

Using Large Language Models to Assign Partial Credit to Students' Explanations of Problem-Solving Process: Grade at Human Level Accuracy with Grading Confidence Index and Personalized Student-facing Feedback

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Authors (2)

Collections

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research

Using Large Language Models to Assign Partial Credit to Students' Explanations of Problem-Solving Process: Grade at Human Level Accuracy with Grading Confidence Index and Personalized Student-facing Feedback

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Related Papers

Authors (2)

Collections

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research