
Autoregressive Score Generation for Multi-trait Essay Scoring (2403.08332v1)

Published 13 Mar 2024 in cs.CL and cs.AI

Abstract: Recently, encoder-only pre-trained models such as BERT have been successfully applied to automated essay scoring (AES) to predict a single overall score. However, these models have yet to be explored for multi-trait AES, possibly due to the inefficiency of replicating BERT-based models for each trait. Breaking away from the existing sole use of encoders, we propose autoregressive prediction of multi-trait scores (ArTS), incorporating a decoding process by leveraging the pre-trained T5. Unlike prior regression or classification methods, we redefine AES as a score-generation task, allowing a single model to predict multiple scores. During decoding, each subsequent trait prediction can benefit from conditioning on the preceding trait scores. Experimental results demonstrate the efficacy of ArTS, showing over 5% average improvements across both prompts and traits.
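
The abstract frames AES as sequence generation: a single T5 model reads the essay and decodes the trait scores one after another, so each later score is conditioned on the earlier ones. As a rough illustration, here is a minimal sketch of that formulation using Hugging Face's `transformers`; the prompt template, trait names, and linearized score format below are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch of ArTS-style multi-trait score generation with T5.
# Assumptions (not given in the abstract): the input prompt, the trait
# names, their decoding order, and the score linearization are illustrative.
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

essay = "Computers can help students learn because ..."
# Text-to-text framing: the essay goes in, a score sequence comes out.
inputs = tokenizer("score the essay: " + essay,
                   return_tensors="pt", truncation=True, max_length=512)

# During fine-tuning, the target is a linearized trait-score sequence,
# e.g. "content 4 organization 3 overall 8", so each trait score is
# decoded conditioned on the previously generated trait scores.
target = tokenizer("content 4 organization 3 overall 8",
                   return_tensors="pt").input_ids
loss = model(**inputs, labels=target).loss  # standard seq2seq loss

# At inference, greedy decoding emits the trait scores autoregressively.
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the scores are emitted as ordinary tokens, adding or reordering traits only changes the target string, not the model; this is one plausible reading of how a single generative model can cover all traits without per-trait replication.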
