
Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression (2406.01198v1)

Published 3 Jun 2024 in cs.CL and cs.AI

Abstract: Automated essay scoring (AES) involves predicting a score that reflects the writing quality of an essay. Most existing AES systems produce only a single overall score. However, users and L2 learners expect scores across different dimensions (e.g., vocabulary, grammar, coherence) for English essays in real-world applications. To address this need, we have developed two models that automatically score English essays across multiple dimensions by employing fine-tuning and other strategies on two large datasets. The results demonstrate that our systems achieve impressive performance in evaluation using three criteria: precision, F1 score, and Quadratic Weighted Kappa. Furthermore, our system outperforms existing methods in overall scoring.

Authors (2)
  1. Kun Sun
  2. Rong Wang
Citations (2)

Summary

Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression

The paper "Automatic Essay Multi-dimensional Scoring with Fine-tuning and Multiple Regression" addresses a nuanced challenge in the field of automated essay scoring (AES), moving beyond holistic scoring to develop a system that assesses essays across multiple dimensions such as vocabulary, grammar, and coherence. This work is grounded in the recognition that second-language (L2) learners and educators require more detailed feedback to enhance educational outcomes.

Key Contributions and Methodologies

The authors developed two AES models by applying fine-tuning strategies and multiple regression techniques to BERT-based classifiers. By selecting RoBERTa and DistilBERT, models well regarded for their text classification capabilities, the researchers aimed to leverage existing strengths in language understanding to enhance AES effectiveness. Their approach configures these models with a dual-head architecture for classification and regression tasks, facilitating a multi-dimensional evaluation of essay quality; a sketch of such a setup appears below. Additionally, contrastive learning is incorporated to effectively process supplemental information relating to essay prompts, topics, and requirements.
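To make the architecture concrete, here is a minimal sketch of how a dual-head scorer of this kind might be assembled in PyTorch with Hugging Face transformers. The number of scoring dimensions, the number of score bands, and the joint loss weighting are illustrative assumptions, not the paper's reported configuration.

```python
# Minimal sketch of a dual-head (classification + regression) essay scorer on
# top of RoBERTa. Dimension count, band count, and loss weighting are assumed
# for illustration; they are not the paper's exact configuration.
import torch.nn as nn
from transformers import RobertaModel

NUM_DIMENSIONS = 6   # e.g., vocabulary, grammar, coherence, ... (assumed)
NUM_CLASSES = 9      # discrete score bands per dimension (assumed)

class DualHeadEssayScorer(nn.Module):
    def __init__(self, backbone: str = "roberta-base"):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(backbone)
        hidden = self.encoder.config.hidden_size
        # Classification head: one set of band logits per scoring dimension.
        self.cls_head = nn.Linear(hidden, NUM_DIMENSIONS * NUM_CLASSES)
        # Regression head: one continuous score per dimension.
        self.reg_head = nn.Linear(hidden, NUM_DIMENSIONS)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # first-token pooled representation
        logits = self.cls_head(pooled).view(-1, NUM_DIMENSIONS, NUM_CLASSES)
        scores = self.reg_head(pooled)
        return logits, scores

def dual_loss(logits, scores, band_labels, score_targets, alpha=0.5):
    """Joint objective: cross-entropy on discrete bands plus MSE on
    continuous scores. The 50/50 weighting is an assumption."""
    ce = nn.functional.cross_entropy(
        logits.reshape(-1, NUM_CLASSES), band_labels.reshape(-1))
    mse = nn.functional.mse_loss(scores, score_targets)
    return alpha * ce + (1 - alpha) * mse
```

In this sketch the two heads share one encoder, so the classification signal (discrete score bands) and the regression signal (continuous scores) regularize each other; how the paper balances the two objectives is not specified here, hence the assumed weighting.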

The reported results demonstrate notable performance improvements across multiple metrics, specifically precision, F1 score, and Quadratic Weighted Kappa (QWK). The systems were rigorously validated on sizable datasets, including ELLIPSE and IELTS, furnishing a comprehensive framework for multi-dimensional AES that aligns closely with real-world educational requirements. The models showed consistent and reliable performance across both datasets, indicating robust generalization capability.

Evaluation and Results

The research presents a thorough evaluation of the proposed AES models. For example, in Study 1, the RoBERTa-based model obtained QWK scores exceeding 0.8 in multiple dimensions of the ELLIPSE dataset. These results were corroborated by similarly high scores reported in Study 2 on the IELTS dataset, indicating strong model accuracy and utility across different educational settings and essay qualities.
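As a concrete illustration of the evaluation protocol, the snippet below shows how QWK, precision, and F1 can be computed per dimension with scikit-learn. The toy labels and the rounding of regression outputs to discrete score bands are assumptions for demonstration, not data from the paper.

```python
# Sketch of computing the reported metrics for one scoring dimension with
# scikit-learn. The gold/predicted bands below are made-up toy values.
from sklearn.metrics import cohen_kappa_score, precision_score, f1_score

y_true = [3, 4, 2, 5, 4, 3]   # gold score bands (toy example)
y_pred = [3, 4, 3, 5, 3, 3]   # model predictions, rounded to bands (assumed)

# Quadratic Weighted Kappa penalizes large disagreements more than small ones.
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
precision = precision_score(y_true, y_pred, average="macro", zero_division=0)
f1 = f1_score(y_true, y_pred, average="macro")
print(f"QWK={qwk:.3f}  precision={precision:.3f}  F1={f1:.3f}")
```

QWK is the standard agreement metric in AES because it treats scores as ordinal: predicting a 3 when the gold band is 4 costs far less than predicting a 1, which plain accuracy would not capture.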

Compared to preceding AES systems, which often produced only a single holistic score, this multi-dimensional approach marks a clear improvement in both effectiveness and alignment with user expectations. The new AES system also outperforms existing methods in overall scoring, with enhanced reliability and comprehensiveness.

Practical and Theoretical Implications

Practically, this work provides a significant extension of the capabilities of AES systems. The introduction of a multi-dimensional grading framework represents a step forward in educational technologies, offering richer, more detailed, and actionable feedback that is essential for effective language learning support for L2 learners.

From a theoretical perspective, the research contributes to the ongoing discourse on transformer-based models and their adaptation for specialized NLP tasks. It demonstrates that pre-existing model architectures, once fine-tuned, can perform well in domains beyond their initial design, pointing to further applications in other graded-assessment settings.

Speculations on Future Developments

Looking ahead, this research opens avenues for further exploration in AES. Future work could expand the framework to include other dimensions of writing, incorporate more granular linguistic features, or explore unsupervised methods for score prediction where labeled data is limited. Additionally, the integration of more varied datasets could enhance the generalizability and adaptability of these systems across wider educational contexts and demographic variables.

In conclusion, this paper makes a significant contribution to the field of automated essay assessment by introducing a robust and flexible multi-dimensional scoring system, setting a benchmark for future research and developments in automatic grading systems and educational technology.
