Evaluation of ChatGPT Feedback on ELL Writers' Coherence and Cohesion (2310.06505v1)
Abstract: Since its launch in November 2022, ChatGPT has had a transformative effect on education: students use it to help with homework assignments, and teachers actively employ it in their teaching practices, including as a tool for grading and generating feedback on students' essays. In this study, we evaluated the quality of ChatGPT-generated feedback on the coherence and cohesion of essays written by English Language Learner (ELL) students. We selected 50 argumentative essays and generated feedback on coherence and cohesion using the ELLIPSE rubric. To evaluate the feedback, we used a two-step approach: first, each sentence in the feedback was classified into subtypes based on its function (e.g., positive reinforcement, problem statement); next, its accuracy and usability were evaluated according to these types. Both the analysis of feedback types and the evaluation of accuracy and usability revealed that most feedback sentences were highly abstract and generic, failing to provide concrete suggestions for improvement. Accuracy in detecting major problems, such as repetitive ideas and the inaccurate use of cohesive devices, depended on superficial linguistic features and was often incorrect. In conclusion, ChatGPT, without specific training for the feedback-generation task, does not provide effective feedback on ELL students' coherence and cohesion.
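The two-step evaluation described in the abstract can be sketched in code. This is a minimal illustrative Python sketch, not the authors' actual pipeline: the paper's sentence classification was an annotation task, whereas the keyword rules, type labels, and function names below are all hypothetical stand-ins used only to show the shape of the procedure (classify each feedback sentence by function, then aggregate accuracy judgments per type).

```python
# Hypothetical sketch of a two-step feedback evaluation:
# (1) classify each feedback sentence by its function,
# (2) tally human accuracy judgments per sentence type.
# The labels and keyword rules are illustrative assumptions only.

FEEDBACK_TYPES = ["positive_reinforcement", "problem_statement", "suggestion", "generic"]

def classify_sentence(sentence: str) -> str:
    """Toy rule-based classifier standing in for manual annotation."""
    s = sentence.lower()
    if any(w in s for w in ("well done", "good", "effectively")):
        return "positive_reinforcement"
    if any(w in s for w in ("however", "lacks", "repetitive", "unclear")):
        return "problem_statement"
    if any(w in s for w in ("consider", "try", "could")):
        return "suggestion"
    return "generic"

def evaluate_feedback(sentences, accuracy_labels):
    """Group sentences by type and compute the per-type accuracy rate
    from human judgments (None when a type has no sentences)."""
    stats = {t: {"n": 0, "accurate": 0} for t in FEEDBACK_TYPES}
    for sent, accurate in zip(sentences, accuracy_labels):
        t = classify_sentence(sent)
        stats[t]["n"] += 1
        stats[t]["accurate"] += int(accurate)
    return {t: (v["accurate"] / v["n"] if v["n"] else None)
            for t, v in stats.items()}

if __name__ == "__main__":
    feedback = [
        "The essay presents its main ideas effectively.",
        "However, some points are repetitive across paragraphs.",
        "Consider using more varied cohesive devices.",
    ]
    # Human judgments of whether each feedback sentence is accurate.
    labels = [True, False, True]
    print(evaluate_feedback(feedback, labels))
```

In the paper's setting the classification and accuracy judgments were made by human evaluators; a sketch like this only makes explicit how per-type accuracy and usability rates would be aggregated once those judgments exist.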
- Su-Youn Yoon
- Eva Miszoglad
- Lisa R. Pierce