Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning (2308.03234v2)

Published 7 Aug 2023 in cs.CL

Abstract: Multiple-choice questions (MCQs) are ubiquitous at almost all levels of education because they are easy to administer and grade, and they are a reliable form of assessment. An important aspect of MCQs is the distractors, i.e., incorrect options that are designed to target specific misconceptions or insufficient knowledge among students. To date, the task of crafting high-quality distractors has largely remained a labor-intensive process for teachers and learning content designers, which limits scalability. In this work, we explore the task of automated distractor and corresponding feedback message generation in math MCQs using LLMs. We establish a formulation of these two tasks and propose a simple, in-context learning-based solution. Moreover, we propose generative AI-based metrics for evaluating the quality of the feedback messages. We conduct extensive experiments on these tasks using a real-world MCQ dataset. Our findings suggest that there is substantial room for improvement in automated distractor and feedback generation; based on these findings, we outline several directions for future work.
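The in-context learning approach described in the abstract can be sketched as a few-shot prompt: each in-context example pairs a math question with a distractor and a feedback message explaining the misconception the distractor targets, and the model is asked to continue the pattern for a new question. The function name, prompt layout, and example data below are illustrative assumptions, not the authors' actual prompt or dataset.

```python
# Minimal sketch of few-shot (in-context) distractor + feedback generation
# for math MCQs. The prompt format and example data are hypothetical.

EXAMPLES = [
    {
        "question": "What is 3/4 + 1/4?",
        "answer": "1",
        "distractor": "4/8",
        "feedback": (
            "It looks like you added both the numerators and the "
            "denominators. When the denominators already match, only "
            "the numerators are added."
        ),
    },
]

def build_prompt(target_question: str, target_answer: str) -> str:
    """Assemble a few-shot prompt ending at the point where the LLM
    should generate a distractor (and then a feedback message)."""
    parts = []
    for ex in EXAMPLES:
        parts.append(
            f"Question: {ex['question']}\n"
            f"Correct answer: {ex['answer']}\n"
            f"Distractor: {ex['distractor']}\n"
            f"Feedback: {ex['feedback']}\n"
        )
    # The target item is left incomplete so the model fills it in.
    parts.append(
        f"Question: {target_question}\n"
        f"Correct answer: {target_answer}\n"
        f"Distractor:"
    )
    return "\n".join(parts)

prompt = build_prompt("What is 1/2 + 1/3?", "5/6")
print(prompt)
```

The resulting string would be sent to an LLM completion endpoint; retrieving the in-context examples most similar to the target question (e.g., via sentence embeddings) is a common refinement of this setup.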

Authors (7)
  1. Hunter McNichols (7 papers)
  2. Wanyong Feng (8 papers)
  3. Jaewook Lee (44 papers)
  4. Alexander Scarlatos (16 papers)
  5. Digory Smith (7 papers)
  6. Simon Woodhead (16 papers)
  7. Andrew Lan (48 papers)
Citations (18)
