
Evaluation of ChatGPT Feedback on ELL Writers' Coherence and Cohesion (2310.06505v1)

Published 10 Oct 2023 in cs.CL and cs.AI

Abstract: Since its launch in November 2022, ChatGPT has had a transformative effect on education: students use it to help with homework assignments, and teachers actively employ it in their teaching practices, including as a tool for grading and generating feedback on students' essays. In this study, we evaluated the quality of ChatGPT-generated feedback on the coherence and cohesion of essays written by English Language Learner (ELL) students. We selected 50 argumentative essays and generated feedback on coherence and cohesion using the ELLIPSE rubric. We evaluated the feedback in two steps: first, each sentence in the feedback was classified into a subtype based on its function (e.g., positive reinforcement, problem statement); next, its accuracy and usability were evaluated according to these types. Both the analysis of feedback types and the evaluation of accuracy and usability revealed that most feedback sentences were highly abstract and generic, failing to provide concrete suggestions for improvement. Accuracy in detecting major problems, such as repetitive ideas and the inaccurate use of cohesive devices, depended on superficial linguistic features and was often incorrect. In conclusion, ChatGPT, without specific training for the feedback generation task, does not offer effective feedback on ELL students' coherence and cohesion.
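The two-step evaluation described in the abstract can be sketched in code. The subtype labels and keyword cues below are hypothetical illustrations, not the authors' actual annotation scheme or classifier:

```python
# Hypothetical sketch of the paper's two-step evaluation scheme:
# (1) classify each feedback sentence into a functional subtype,
# (2) tally subtype counts, so that accuracy and usability can then
#     be rated per type. Keyword cues here are purely illustrative.

def classify_sentence(sentence: str) -> str:
    """Assign a feedback sentence to a coarse functional subtype."""
    s = sentence.lower()
    if any(cue in s for cue in ("well done", "good job", "effectively")):
        return "positive reinforcement"
    if any(cue in s for cue in ("however", "lacks", "repetitive", "unclear")):
        return "problem statement"
    if any(cue in s for cue in ("consider", "try", "could", "should")):
        return "suggestion"
    return "other"

def tally_feedback(sentences):
    """Count how many feedback sentences fall into each subtype."""
    counts = {}
    for sent in sentences:
        label = classify_sentence(sent)
        counts[label] = counts.get(label, 0) + 1
    return counts

feedback = [
    "The essay is organized effectively.",
    "However, some ideas are repetitive.",
    "Consider using more varied cohesive devices.",
]
print(tally_feedback(feedback))
```

In the study itself this classification was done by human annotators rather than keyword matching; the sketch only shows the shape of the pipeline (sentence-level labeling followed by per-type aggregation).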

Authors (3)
  1. Su-Youn Yoon (2 papers)
  2. Eva Miszoglad (1 paper)
  3. Lisa R. Pierce (1 paper)
Citations (7)