
Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs (2404.18978v1)

Published 29 Apr 2024 in cs.LG, cs.AI, and cs.CY

Abstract: There is growing interest in developing learner models to enhance learning and teaching experiences in educational environments. However, existing work has primarily focused on structured environments that rely on meticulously crafted representations of tasks, thereby limiting the agent's ability to generalize skills across tasks. In this paper, we aim to enhance the generalization capabilities of agents in open-ended text-based learning environments by integrating Reinforcement Learning (RL) with Large Language Models (LLMs). We investigate three types of agents: (i) RL-based agents that utilize natural language for state and action representations to find the best interaction strategy, (ii) LLM-based agents that leverage the model's general knowledge and reasoning through prompting, and (iii) hybrid LLM-assisted RL agents that combine these two strategies to improve agents' performance and generalization. To support the development and evaluation of these agents, we introduce PharmaSimText, a novel benchmark derived from the PharmaSim virtual pharmacy environment designed for practicing diagnostic conversations. Our results show that RL-based agents excel in task completion but are weak at asking high-quality diagnostic questions. In contrast, LLM-based agents perform better at asking diagnostic questions but fall short in completing the task. Finally, hybrid LLM-assisted RL agents enable us to overcome these limitations, highlighting the potential of combining RL and LLMs to develop high-performing agents for open-ended learning environments.
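
The abstract does not say how the hybrid agents combine the two strategies, so the following is only a minimal sketch of one possible combination, not the paper's method: a suggester (standing in for an LLM) narrows the list of text actions, and a tabular Q-learning agent picks among the shortlisted ones. Every name here (`keyword_suggester`, `HybridAgent`, the example strings) is hypothetical, and the keyword-overlap ranking is a deliberately simple placeholder for a real LLM prompt.

```python
from collections import defaultdict
from typing import Callable, List

def keyword_suggester(state: str, actions: List[str], k: int = 2) -> List[str]:
    """Hypothetical stand-in for an LLM: rank actions by word overlap with the state.

    A real system would prompt a language model here; this is an assumption
    made only so the sketch runs without external services.
    """
    state_words = set(state.lower().split())
    # sorted() is stable, so actions with equal overlap keep their input order.
    ranked = sorted(actions, key=lambda a: -len(state_words & set(a.lower().split())))
    return ranked[:k]

class HybridAgent:
    """Tabular Q-learning over text states, restricted to suggested actions."""

    def __init__(self, suggester: Callable[[str, List[str]], List[str]]):
        self.q = defaultdict(float)  # maps (state text, action text) to a value
        self.suggester = suggester

    def act(self, state: str, actions: List[str]) -> str:
        # Fall back to all actions if the suggester returns nothing.
        candidates = self.suggester(state, actions) or actions
        # Greedy choice among the shortlisted candidates only.
        return max(candidates, key=lambda a: self.q[(state, a)])

    def update(self, state: str, action: str, reward: float,
               next_state: str, next_actions: List[str],
               alpha: float = 0.5, gamma: float = 0.9) -> None:
        # Standard one-step Q-learning update.
        best_next = max((self.q[(next_state, a)] for a in next_actions), default=0.0)
        key = (state, action)
        self.q[key] += alpha * (reward + gamma * best_next - self.q[key])

if __name__ == "__main__":
    # Made-up strings for illustration; not taken from PharmaSimText.
    state = "customer describes a persistent cough"
    actions = ["ask about cough duration", "offer sunscreen", "ask about customer age"]
    agent = HybridAgent(keyword_suggester)
    print(agent.act(state, actions))
```

In this sketch the suggester acts as a hard filter, so an action it never shortlists cannot be chosen regardless of its learned value. A softer design, such as adding the suggester's score to the Q-value, would be an equally reasonable assumption.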

Authors (3)
  1. Bahar Radmehr
  2. Adish Singla
  3. Tanja Käser