Analyzing Large Language Models for Classroom Discussion Assessment (2406.08680v1)
Published 12 Jun 2024 in cs.CL
Abstract: Automatically assessing classroom discussion quality is becoming increasingly feasible with the help of new NLP advancements such as LLMs. In this work, we examine how the assessment performance of 2 LLMs interacts with 3 factors that may affect performance: task formulation, context length, and few-shot examples. We also explore the computational efficiency and predictive consistency of the 2 LLMs. Our results suggest that all 3 factors affect the performance of the tested LLMs and that consistency is related to predictive performance. We recommend an LLM-based assessment approach that strikes a good balance among predictive performance, computational efficiency, and consistency.
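
The abstract does not spell out how the 3 factors were operationalized, but a minimal sketch can make them concrete. The Python below (not the authors' code; the function name `build_prompt`, the rubric text, and the example transcripts are all hypothetical) shows one plausible way to vary task formulation, context length, and the number of few-shot examples when constructing an LLM prompt for rating a discussion segment.

```python
# A minimal sketch, assuming a hypothetical prompt-construction setup.
# RUBRIC, FEW_SHOT_EXAMPLES, and build_prompt are illustrative names,
# not taken from the paper.

RUBRIC = "Rate the discussion segment for quality on a 1-4 scale."

FEW_SHOT_EXAMPLES = [
    ("Teacher: Why does the author repeat this image?\n"
     "Student: Maybe to show the character is trapped.", 4),
    ("Teacher: What page are we on?\nStudent: Twelve.", 1),
]

def build_prompt(segment: str,
                 task: str = "rating",        # task formulation: "rating" or "binary"
                 context_turns: int = 10,     # context length: transcript turns kept
                 num_shots: int = 0) -> str:  # few-shot examples: how many to include
    """Assemble one assessment prompt under the three experimental factors."""
    # Context length: keep only the last `context_turns` turns of the transcript.
    turns = segment.splitlines()[-context_turns:]

    # Task formulation: fine-grained scale rating vs. a coarser binary judgment.
    if task == "rating":
        instruction = RUBRIC
    else:
        instruction = "Answer 'high' or 'low' for the discussion quality."

    # Few-shot examples: prepend up to `num_shots` labeled examples.
    shots = "\n\n".join(
        f"Example:\n{text}\nLabel: {label}"
        for text, label in FEW_SHOT_EXAMPLES[:num_shots]
    )

    parts = [instruction, shots, "Segment:\n" + "\n".join(turns), "Label:"]
    return "\n\n".join(p for p in parts if p)

if __name__ == "__main__":
    transcript = ("Teacher: What is the theme?\n"
                  "Student: Loss, I think.\n"
                  "Teacher: Why do you say that?")
    print(build_prompt(transcript, task="rating", context_turns=2, num_shots=1))
```

Sweeping over the three arguments (`task`, `context_turns`, `num_shots`) and scoring each resulting prompt with each LLM would yield the kind of factor-by-model performance comparison the abstract describes; repeating identical prompts would probe predictive consistency.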