AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails (2402.09216v3)
Abstract: LLMs have found several use cases in education, ranging from automatic question generation to essay evaluation. In this paper, we explore the potential of using LLMs to author Intelligent Tutoring Systems. A common pitfall of LLMs is that they stray from the desired pedagogical strategy, for example by leaking the answer to the student, and in general provide no guarantees about their behavior. We posit that while LLMs with certain guardrails can take the place of subject experts, the overall pedagogical design still needs to be handcrafted for the best learning results. Based on this principle, we create a sample end-to-end tutoring system named MWPTutor, which uses LLMs to fill in the state space of a pre-defined finite state transducer. This approach retains the structure and pedagogy of traditional tutoring systems, which learning scientists have developed over the years, while adding the flexibility of LLM-based approaches. Through a human evaluation study on two datasets based on math word problems, we show that our hybrid approach achieves a better overall tutoring score than an instructed, but otherwise free-form, GPT-4. MWPTutor is fully modular, so the community can improve its performance by improving individual modules or by supplying different teaching strategies for it to follow.
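The idea of a handcrafted finite state transducer whose outputs are filled in by a language model can be illustrated with a minimal sketch. This is not MWPTutor's implementation: the state names, the `TutorFST` class, the `generate` callable (a stand-in for an LLM call), and the substring-based guardrail are all assumptions made for illustration, since the abstract does not specify the actual state space or guardrails.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical state names; the real state space is not described in the abstract.
ASK, HINT, DONE = "ask", "hint", "done"


@dataclass
class TutorFST:
    answer: str
    # (state, student_input) -> tutor text; stands in for an LLM call.
    generate: Callable[[str, str], str]
    state: str = ASK

    def guard(self, text: str) -> str:
        # Illustrative guardrail: refuse to emit text that contains the answer.
        # A plain substring check is crude and only meant as a placeholder.
        if self.answer in text:
            return "Let's look at the problem again step by step."
        return text

    def step(self, student_input: str) -> str:
        # Transitions are fixed by hand; only the wording comes from `generate`.
        if self.state == DONE:
            return "We have already finished this problem."
        if student_input.strip() == self.answer:
            self.state = DONE
            return "Correct, well done!"
        self.state = HINT
        return self.guard(self.generate(self.state, student_input))


def leaky_generator(state: str, student_input: str) -> str:
    # A deliberately bad generator that leaks the answer.
    return "The answer is 12."


tutor = TutorFST(answer="12", generate=leaky_generator)
print(tutor.step("10"))  # the guardrail replaces the leaked answer
print(tutor.step("12"))  # a correct answer moves the tutor to the final state
```

In this sketch the transition logic never depends on the generated text, so a misbehaving generator can affect the wording of a hint but not the path the tutor takes through its states.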