How Can I Improve? Using GPT to Highlight the Desired and Undesired Parts of Open-ended Responses (2405.00291v1)

Published 1 May 2024 in cs.CL, cs.AI, and cs.HC

Abstract: Automated explanatory feedback systems play a crucial role in facilitating learning for a large cohort of learners by offering feedback that incorporates explanations, significantly enhancing the learning process. However, delivering such explanatory feedback in real-time poses challenges, particularly when high classification accuracy for domain-specific, nuanced responses is essential. Our study leverages the capabilities of LLMs, specifically Generative Pre-Trained Transformers (GPT), to explore a sequence labeling approach focused on identifying components of desired and less desired praise for providing explanatory feedback within a tutor training dataset. Our aim is to equip tutors with actionable, explanatory feedback during online training lessons. To investigate the potential of GPT models for providing explanatory feedback, we employed two commonly used approaches: prompting and fine-tuning. To quantify the quality of highlighted praise components identified by GPT models, we introduced a Modified Intersection over Union (M-IoU) score. Our findings demonstrate that: (1) the M-IoU score effectively correlates with human judgment in evaluating sequence quality; (2) two-shot prompting on GPT-3.5 achieved decent performance in recognizing effort-based (M-IoU of 0.46) and outcome-based praise (M-IoU of 0.68); and (3) our optimally fine-tuned GPT-3.5 model achieved M-IoU scores of 0.64 for effort-based praise and 0.84 for outcome-based praise, aligning with the satisfaction levels evaluated by human coders. Our results show promise for using GPT models to provide tutors with feedback that highlights the specific elements of their open-ended responses that are desirable or could use improvement.
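The central evaluation idea in the abstract is measuring how well a model-highlighted praise span overlaps with the span a human coder annotated. As a minimal illustrative sketch of that idea, the snippet below computes plain token-level intersection-over-union between a predicted and an annotated set of token indices; the specific modification behind the paper's M-IoU score is not described in the abstract, so this is only a baseline analogue, and the example response, indices, and function name are hypothetical.

```python
# Illustrative sketch only: the paper introduces a Modified IoU (M-IoU) for
# scoring highlighted praise components, but the exact modification is not
# given in the abstract. This shows plain token-level IoU as the underlying idea.

def token_iou(predicted: set[int], gold: set[int]) -> float:
    """Overlap between model-highlighted and human-annotated token indices."""
    if not predicted and not gold:
        return 1.0  # both spans empty: treat as perfect agreement (assumption)
    union = predicted | gold
    return len(predicted & gold) / len(union)

# Hypothetical tutor response, tokenized by whitespace:
# 0:You 1:showed 2:great 3:effort 4:on 5:this 6:problem
predicted = {2, 3, 4}  # model highlights "great effort on"
gold = {3}             # human coder highlights "effort"
print(round(token_iou(predicted, gold), 2))  # 1 shared token / 3 in union = 0.33
```

A score of 1.0 means the highlighted span matches the annotation exactly, while partial overlaps (as above) receive proportional credit, which is consistent with the abstract's claim that the score correlates with graded human judgments of sequence quality.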

Authors (8)
  1. Jionghao Lin (36 papers)
  2. Eason Chen (23 papers)
  3. Zeifei Han (1 paper)
  4. Ashish Gurung (7 papers)
  5. Danielle R. Thomas (11 papers)
  6. Wei Tan (55 papers)
  7. Ngoc Dang Nguyen (8 papers)
  8. Kenneth R. Koedinger (21 papers)
Citations (2)