Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs (2401.06431v2)

Published 12 Jan 2024 in cs.CL and cs.AI

Abstract: Receiving timely and personalized feedback is essential for second-language learners, especially when human instructors are unavailable. This study explores the effectiveness of LLMs, including both proprietary and open-source models, for Automated Essay Scoring (AES). Through extensive experiments with public and private datasets, we find that while LLMs do not surpass conventional state-of-the-art (SOTA) grading models in performance, they exhibit notable consistency, generalizability, and explainability. We propose an open-source LLM-based AES system, inspired by the dual-process theory. Our system offers accurate grading and high-quality feedback, at least comparable to that of fine-tuned proprietary LLMs, in addition to its ability to alleviate misgrading. Furthermore, we conduct human-AI co-grading experiments with both novice and expert graders. We find that our system not only automates the grading process but also enhances the performance and efficiency of human graders, particularly for essays where the model has lower confidence. These results highlight the potential of LLMs to facilitate effective human-AI collaboration in the educational context, potentially transforming learning experiences through AI-generated feedback.
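
One way to read the dual-process framing in the abstract is as a two-stage pipeline: a fast pass that produces an initial score, followed by a slower, reflective pass that reviews that score, generates feedback, and flags likely misgrading. The sketch below (in Python) is only a hedged illustration of that reading; the function names and the revision rule are assumptions, not the paper's implementation.

def dual_process_grade(essay, fast_score, reflective_review):
    """Two-stage grading: a quick scoring pass, then a reflective review.

    fast_score(essay) -> int: hypothetical cheap LLM call returning a score.
    reflective_review(essay, score) -> dict: hypothetical slower LLM call that
    may return "revised_score" and "feedback" keys.
    """
    initial = fast_score(essay)
    review = reflective_review(essay, initial)
    final = review.get("revised_score", initial)
    return {
        "initial_score": initial,
        "final_score": final,
        "feedback": review.get("feedback", ""),
        # A large revision between the two passes can flag potential misgrading.
        "flagged_for_review": abs(final - initial) >= 2,
    }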

Introduction

Educational institutions around the globe are constantly seeking innovative ways to provide timely and personalized feedback to learners, particularly in language education. As reliance on automated tools to supplement language learning grows, Automated Essay Scoring (AES) systems have garnered significant attention. Developing and deploying such systems is especially important in contexts with high student-to-teacher ratios, where individual feedback from educators becomes a logistical challenge. This focus has led to the exploration of LLMs as tools for AES, with their capabilities assessed against those of human instructors and traditional AES methodologies.

Enhancing AES with LLMs

LLMs such as GPT-4 and fine-tuned GPT-3.5 have made substantial strides in essay grading. While they do not surpass conventional state-of-the-art grading models in raw accuracy, they exhibit notable consistency, generalizability, and, critically, interpretability. An AES system powered by these LLMs can offer detailed explanations for its scores, a feature that commonly available AES tools often lack. Particularly where grading criteria are complex, such as evaluating the logical structure of an essay, LLMs prove adept at understanding and adhering to intricate guidelines.
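
To make the scoring-with-explanations idea concrete, here is a minimal sketch of prompting an LLM to grade an essay against a rubric and return a score plus a short rationale. The rubric text, the JSON reply format, and the call_llm placeholder are illustrative assumptions, not the paper's actual prompts or code.

import json

RUBRIC = """Score the essay from 0 to 10.
Criteria: task response, coherence and logical structure, lexical range, and grammatical accuracy."""

def build_grading_prompt(essay: str) -> str:
    # Combine the rubric, the essay, and an instruction to answer in JSON.
    return (
        f"{RUBRIC}\n\n"
        f"Essay:\n{essay}\n\n"
        'Reply with JSON: {"score": <integer>, "explanation": "<one short paragraph>"}'
    )

def grade_essay(essay: str, call_llm) -> dict:
    """Ask the model for a score plus an explanation and parse the reply.

    call_llm is a placeholder for whatever chat-completion client is in use;
    it takes a prompt string and returns the model's text reply.
    """
    reply = call_llm(build_grading_prompt(essay))
    result = json.loads(reply)
    # Clamp to the rubric's range in case the model drifts outside it.
    result["score"] = max(0, min(10, int(result["score"])))
    return result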

Human-AI Collaborative Grading

Human evaluation experiments complementing this research highlight the collaborative strengths of AI and human graders. The paper reports that LLM-generated feedback can significantly improve the grading accuracy of novices, bringing their performance close to that of expert graders, and that experts maintain greater scoring consistency and efficiency with the AI's assistance; the gains are most pronounced for essays on which the model has lower confidence. This finding is pivotal because it illustrates that AI-generated feedback does not merely replace the human element but enhances it, promoting a synergy that could redefine educational assessment.
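
As a rough illustration of that collaboration, the sketch below keeps high-confidence machine scores and routes low-confidence ones to human graders. The confidence estimate (agreement across repeated gradings), the threshold, and every name here are illustrative assumptions rather than the paper's actual design.

from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class GradedEssay:
    essay_id: str
    model_scores: list  # scores from several sampled gradings of the same essay

    @property
    def score(self) -> float:
        # Final machine score: the mean of the sampled gradings.
        return mean(self.model_scores)

    @property
    def confidence(self) -> float:
        # Lower spread across samples means higher confidence (1.0 = perfect agreement).
        return 1.0 / (1.0 + pstdev(self.model_scores))

def triage(essays, threshold=0.5):
    """Accept high-confidence machine scores; queue the rest for human review."""
    auto_scored, needs_human = [], []
    for essay in essays:
        (auto_scored if essay.confidence >= threshold else needs_human).append(essay)
    return auto_scored, needs_human

For example, an essay graded [7, 7, 7] across samples would be accepted automatically, while one graded [4, 8, 6] would be sent to a human reviewer.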

Conclusion and Future Directions

In conclusion, the research positions LLMs as formidable allies in language education and, specifically, in automated essay scoring. Integrating these tools makes the grading process more effective while supporting educators and learners in a more personalized way. It also opens a new dialogue on the future of educational technology, where the boundaries of AI assistance continue to expand, offering a nuanced model of support for both students and teachers.

As the field of LLMs continues to evolve, the possibilities for refashioning educational tools and methodologies are vast. Further investigation is warranted to explore and understand the full scope of LLMs' abilities and to refine their collaborative roles within diverse educational settings. This research paves the way for future studies aimed at unraveling the nuanced dynamics of human-AI interactions and their implications for pedagogy and learning experiences.

Authors (7)
  1. Changrong Xiao
  2. Wenxing Ma
  3. Sean Xin Xu
  4. Kunpeng Zhang
  5. Yufang Wang
  6. Qi Fu
  7. Qingping Song