
Towards Modeling Learner Performance with Large Language Models (2403.14661v1)

Published 29 Feb 2024 in cs.CY, cs.CL, and cs.LG

Abstract: Recent work exploring the capabilities of pre-trained LLMs has demonstrated their ability to act as general pattern machines by completing complex token sequences representing a wide array of tasks, including time-series prediction and robot control. This paper investigates whether the pattern recognition and sequence modeling capabilities of LLMs can be extended to the domain of knowledge tracing, a critical component in the development of intelligent tutoring systems (ITSs) that tailor educational experiences by predicting learner performance over time. In an empirical evaluation across multiple real-world datasets, we compare two approaches to using LLMs for this task (zero-shot prompting and model fine-tuning) with existing, non-LLM approaches to knowledge tracing. While LLM-based approaches do not achieve state-of-the-art performance, fine-tuned LLMs surpass naive baseline models and perform on par with standard Bayesian Knowledge Tracing approaches across multiple metrics. These findings suggest that the pattern recognition capabilities of LLMs can be used to model complex learning trajectories, opening a novel avenue for applying LLMs to educational contexts. The paper concludes with a discussion of the implications for future research, suggesting that further refinements and a deeper understanding of LLMs' predictive mechanisms could lead to enhanced performance on knowledge tracing tasks.
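To make the comparison in the abstract concrete, the sketch below (not the authors' code; the binary encoding and all parameter values are illustrative assumptions) shows the two ingredients side by side: serializing a learner's correct/incorrect history into a token sequence that an LLM could be prompted to continue, and one update step of standard Bayesian Knowledge Tracing, the classical baseline the fine-tuned LLMs are reported to match.

```python
# Minimal sketch, assuming a simple binary encoding of responses and
# illustrative BKT parameters; none of these details come from the paper.

def to_prompt(history: list[int]) -> str:
    """Serialize correct (1) / incorrect (0) attempts as a token sequence
    that an LLM can be asked to continue (zero-shot prompting)."""
    return ",".join(str(o) for o in history) + ","

def bkt_step(p_know: float, correct: int,
             p_learn: float = 0.1, p_guess: float = 0.2,
             p_slip: float = 0.1) -> tuple[float, float]:
    """One Bayesian Knowledge Tracing step: return P(correct) before the
    observation and the updated mastery estimate P(L) after it."""
    p_correct = p_know * (1 - p_slip) + (1 - p_know) * p_guess
    if correct:  # posterior P(mastered | correct) via Bayes' rule
        posterior = p_know * (1 - p_slip) / p_correct
    else:        # posterior P(mastered | incorrect)
        posterior = p_know * p_slip / (1 - p_correct)
    return p_correct, posterior + (1 - posterior) * p_learn  # learning transition

history = [0, 1, 1, 0, 1]
print(to_prompt(history))  # "0,1,1,0,1," -> ask the LLM for the next token

p_know = 0.3  # assumed prior mastery P(L0)
for obs in history:
    _, p_know = bkt_step(p_know, obs)
p_next, _ = bkt_step(p_know, 1)  # the prediction does not depend on the observation
print(f"BKT P(next correct) = {p_next:.2f}")
```

In the zero-shot setting, the LLM's next-token distribution over "0"/"1" plays the role of the BKT prediction; the fine-tuning approach instead trains the model on many such response sequences before evaluation.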

Authors (6)
  1. Seyed Parsa Neshaei (8 papers)
  2. Richard Lee Davis (1 paper)
  3. Adam Hazimeh (1 paper)
  4. Bojan Lazarevski (1 paper)
  5. Pierre Dillenbourg (11 papers)
  6. Tanja Käser (45 papers)
Citations (4)