Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash 90 tok/s
Gemini 2.5 Pro 53 tok/s Pro
GPT-5 Medium 41 tok/s
GPT-5 High 42 tok/s Pro
GPT-4o 109 tok/s
GPT OSS 120B 477 tok/s Pro
Kimi K2 222 tok/s Pro
2000 character limit reached

RDBE: Reasoning Distillation-Based Evaluation Enhances Automatic Essay Scoring (2407.13781v1)

Published 3 Jul 2024 in cs.CL, cs.CY, and cs.LG

Abstract: Recently, various encoder-only and encoder-decoder pre-trained models like BERT and T5 have been applied to automatic essay scoring (AES) as small LLMs. However, existing studies have primarily treated this task akin to a classification problem, focusing solely on outputting scores in the target text without offering interpretations for the generated scores. Departing from the approaches, we introduce Reasoning Distillation-Based Evaluation (RDBE), which integrates interpretability to elucidate the rationale behind model scores while enhancing performance through initial reasoning. This interpretive capability is acquired during training by leveraging generated reasoning from a LLM to distill a small LLM (SLM). Our experimental results demonstrate the efficacy of RDBE across all scoring rubrics considered in the dataset. RDBE outperforms both zero-shot LLM generation and generation from a baseline fine-tuned model, establishing itself as state-of-the-art in the corresponding dataset. This highlights its practical interpretative output and enhanced performance.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (13)
  1. J. Cohen. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological bulletin, 70(4):213, 1968.
  2. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.
  3. Autoregressive score generation for multi-trait essay scoring. arXiv preprint arXiv:2403.08332, 2024.
  4. F. Dong and Y. Zhang. Automatic features for essay scoring–an empirical study. In Proceedings of the 2016 conference on empirical methods in natural language processing, pages 1072–1077, 2016.
  5. Longt5: Efficient text-to-text transformer for long sequences. arXiv preprint arXiv:2112.07916, 2021.
  6. Large language models are reasoning teachers. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14852–14882, 2023.
  7. Distilling step-by-step! outperforming larger language models with less training data and smaller model sizes. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8003–8017, 2023.
  8. Many hands make light work: Using essay traits to automatically score essays. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1485–1495, 2022.
  9. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research, 21(140):1–67, 2020.
  10. Automated english digital essay grader using machine learning. In 2019 IEEE International Conference on Engineering, Technology and Education (TALE), pages 1–6. IEEE, 2019.
  11. On the use of bert for automated essay scoring: Joint learning of multi-scale essay representation. arXiv preprint arXiv:2205.03835, 2022.
  12. Enhancing automated essay scoring performance via fine-tuning pre-trained language models with combination of regression and ranking. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1560–1569, 2020.
  13. Dress: Dataset for rubric-based essay scoring on efl writing. arXiv preprint arXiv:2402.16733, 2024.
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube