Alignment of LLM Instructional Behavior with Expert Human Tutors

The goal is to determine how closely the instructional behavior of large language models aligns with that of expert human tutors when responding to student errors in mathematics tutoring contexts.

Background

The paper investigates how LLMs compare to human tutors in math remediation dialogues, focusing on instructional strategies such as restating/revoicing and pressing for accuracy, as well as linguistic features like lexical diversity, readability, politeness, and agency.

The authors frame the central motivation around uncertainty about how well LLMs emulate expert human instructional behavior when responding to student errors. To address this, they perform controlled, turn-level comparisons across expert human tutors, novice human tutors, and multiple LLMs.
