Neural Automated Writing Evaluation with Corrective Feedback (2402.17613v2)
Abstract: The use of technology in second language learning and teaching has become ubiquitous. For the assessment of writing specifically, automated writing evaluation (AWE) and grammatical error correction (GEC) have become popular and effective methods for enhancing writing proficiency and delivering instant, individualized feedback to learners. Built on NLP and machine learning algorithms, AWE and GEC systems have been developed separately: GEC provides language learners with automated corrective feedback, while AWE provides more accurate and unbiased scoring that would otherwise be subject to examiners' judgment. In this paper, we propose an integrated system for automated writing evaluation with corrective feedback as a means of bridging the gap between AWE and GEC results for second language learners. This system enables language learners to simulate essay writing tests: a student writes and submits an essay, and the system returns an assessment of the writing along with suggested grammatical error corrections. Given that automated scoring and grammatical correction are more efficient and cost-effective than human grading, this integrated system would also alleviate the burden of manually correcting large numbers of essays.