Autoregressive Score Generation for Multi-trait Essay Scoring (2403.08332v1)
Abstract: Recently, encoder-only pre-trained models such as BERT have been successfully applied in automated essay scoring (AES) to predict a single overall score. However, studies have yet to explore these models in multi-trait AES, possibly because replicating a BERT-based model for each trait is inefficient. Breaking away from the existing exclusive use of encoders, we propose autoregressive prediction of multi-trait scores (ArTS), which incorporates a decoding process by leveraging the pre-trained T5. Unlike prior regression or classification methods, we redefine AES as a score-generation task, allowing a single model to predict multiple scores. During decoding, each subsequent trait prediction can benefit from conditioning on the preceding trait scores. Experimental results demonstrate the efficacy of ArTS, with average improvements of over 5% across both prompts and traits.
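To make the score-generation framing concrete, here is a minimal sketch of how multiple trait scores might be serialized into one target string for a text-to-text model and parsed back after decoding. The trait names, the "trait score" text format, the "nan" placeholder for unscored traits, and the helper names `encode_scores` and `decode_scores` are all assumptions for illustration; the abstract does not specify the format ArTS uses.

```python
from typing import Dict, List, Optional

# Hypothetical trait order; later traits are generated after earlier ones,
# so a decoder can condition on the scores already produced.
TRAITS: List[str] = ["content", "organization", "conventions", "overall"]


def encode_scores(scores: Dict[str, Optional[int]]) -> str:
    """Serialize trait scores into a single target string (assumed format)."""
    parts = []
    for trait in TRAITS:
        value = scores.get(trait)
        # Traits without a score are written as "nan" (an assumed placeholder).
        parts.append(f"{trait} {'nan' if value is None else value}")
    return ", ".join(parts)


def decode_scores(text: str) -> Dict[str, Optional[int]]:
    """Parse a generated string back into trait scores, skipping malformed chunks."""
    result: Dict[str, Optional[int]] = {trait: None for trait in TRAITS}
    for chunk in text.split(","):
        tokens = chunk.strip().rsplit(" ", 1)
        if len(tokens) != 2:
            continue  # not a "trait score" pair
        trait, raw = tokens
        if trait in result and raw.isdigit():
            result[trait] = int(raw)
    return result


if __name__ == "__main__":
    target = encode_scores({"content": 3, "organization": 4, "overall": 5})
    print(target)
    print(decode_scores(target))
```

Under this framing, a single sequence-to-sequence model would be trained on (essay, target string) pairs, and the ordering of `TRAITS` determines which scores are available as context when each later score is generated.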