
Large Language Models for Education: A Survey and Outlook (2403.18105v2)

Published 26 Mar 2024 in cs.CL and cs.AI

Abstract: The advent of LLMs has brought in a new era of possibilities in the realm of education. This survey paper summarizes the various technologies of LLMs in educational settings from multifaceted perspectives, encompassing student and teacher assistance, adaptive learning, and commercial tools. We systematically review the technological advancements in each perspective, organize related datasets and benchmarks, and identify the risks and challenges associated with deploying LLMs in education. Furthermore, we outline future research opportunities, highlighting the potential promising directions. Our survey aims to provide a comprehensive technological picture for educators, researchers, and policymakers to harness the power of LLMs to revolutionize educational practices and foster a more effective personalized learning environment.

This paper, "LLMs for Education: A Survey and Outlook" (Wang et al., 26 Mar 2024), provides a comprehensive overview of the applications, technologies, datasets, challenges, and future directions of LLMs in educational settings. It categorizes the landscape from a technological perspective, focusing on practical implementation and the impact of LLMs on students, teachers, and adaptive learning systems.

The survey organizes the applications of LLMs in education into a technology-centric taxonomy:

  1. Study Assisting: LLMs provide timely support to students during independent learning.
    • Question Solving (QS): LLMs act as powerful solvers for questions across various subjects (math, law, medicine, finance, programming). Techniques like Chain-of-Thought (CoT) prompting [wei2022chain], few-shot learning, and leveraging external tools or multi-agent conversations [wu2023autogen] improve performance on complex problems, and external verifier modules can rectify intermediate reasoning errors [cobbe2021training] (see the first sketch after this list).
    • Error Correction (EC): LLMs offer instant feedback on student errors, particularly in language grammar and programming code. Fine-tuning LLMs on hybrid datasets [fan2023grammargpt] and using models trained on code like Codex have shown effectiveness in correcting errors in text and introductory programming assignments [zhang2022repairing]. Few-shot example generation pipelines can boost performance and stability in bug fixing [do2023using].
    • Confusion Helper (CH): Instead of giving direct answers, LLMs generate pedagogical guidance and hints. Techniques include guided question generation via input conditioning or reinforcement learning, and adapting explanations to student profiles (age, education level, desired level of detail) through instructional prompts [rooein2023know]. While promising, LLM-generated hints may still lack the depth and personalization of human tutors [pardos2023learning].
  2. Teach Assisting: LLMs help instructors reduce burdensome routine workloads.
    • Question Generation (QG): LLMs are used to automatically generate educational questions, including reading comprehension and multiple-choice questions (MCQs). Fine-tuning with supplemental materials, employing controllable text generation approaches [xiao2023evaluating], integrating generation control modules [doughty2024comparative], and aligning prompts with educational taxonomies [lee2023few] help produce relevant and diverse questions [zhou2023learning].
    • Automatic Grading (AG): LLMs can automate scoring for open-ended questions and essays, moving beyond simple semantic comparison. Prompt-tuning approaches that supply context, rubrics, and worked examples perform satisfactorily [yancey2023rating], and integrating CoT lets LLMs return detailed comments alongside scores [xiao2024automation] (see the second sketch after this list). Multimodal LLMs can even grade responses involving handwritten text and images [li2024automated], and cross-prompt pre-finetuning achieves comparable performance with limited labeled samples [funayama2023reducing].
    • Material Creation (MC): LLMs assist teachers in creating high-quality educational content like asynchronous course materials, language learning resources, and interactive worked examples. Human-in-the-loop processes [leiker2023prototyping], zero-shot prompting, prompt chaining, and one-shot learning strategies are explored for this purpose [jury2024evaluating].
  3. Adaptive Learning: LLMs contribute to personalized learning experiences.
    • Knowledge Tracing (KT): LLMs generate auxiliary information for student records and question text. Extracting knowledge keywords from questions [ni2023enhancing] helps address cold-start scenarios, and predicting question difficulty from text and knowledge concepts [lee2023difficulty] compensates for missing metadata. LLMs can even simulate students' incorrect responses by reasoning over deliberately distorted facts conditioned on knowledge profiles [sonkar2023deduction].
    • Content Personalizing (CP): LLMs generate customized learning content and paths, including dynamic learning paths based on knowledge mastery [kuo2023leveraging], paths that incorporate knowledge concept structures [kabir2023LLM], and questions contextualized to student interests through prompt engineering [yadav2023contextualizing] (see the third sketch after this list). Chat-based LLMs can also explain learning recommendations, often using Knowledge Graphs (KGs) as context [abu2024knowledge].
  4. Education Toolkit: The paper also surveys various commercial LLM-powered tools, categorizing them by function: Chatbots (e.g., ChatGPT, Bard, Perplexity), Content Creation (e.g., Curipod, Diffit, MagicSchool), Teaching Aides (e.g., gotFeedback, Grammarly, ChatPDF), Quiz Generators (e.g., QuestionWell, Formative AI, Quizizz AI), and Collaboration Tools (e.g., summarize.tech, Parlay Genie).
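
To ground the Question Solving techniques above, here is a minimal sketch of CoT prompting combined with an external verifier pass in the spirit of [cobbe2021training]. The `call_llm` function, prompt templates, and retry loop are illustrative assumptions, not the interface of any cited system.

```python
# Sketch: chain-of-thought prompting with an external verifier pass.
# `call_llm` is a hypothetical stand-in for any chat-completion API.

def call_llm(prompt: str) -> str:
    """Placeholder: route the prompt to an LLM and return its text reply."""
    raise NotImplementedError

COT_TEMPLATE = (
    "Solve the problem step by step, then give the final answer on a line "
    "starting with 'Answer:'.\n\nProblem: {question}"
)

VERIFY_TEMPLATE = (
    "Check each step of the solution below for arithmetic or logical errors. "
    "Reply 'VALID' if the reasoning is sound; otherwise name the first flawed "
    "step.\n\nProblem: {question}\n\nSolution:\n{solution}"
)

def solve_with_verification(question: str, max_attempts: int = 3) -> str:
    """Generate a CoT solution and accept it only once the verifier agrees."""
    solution = ""
    for _ in range(max_attempts):
        solution = call_llm(COT_TEMPLATE.format(question=question))
        verdict = call_llm(
            VERIFY_TEMPLATE.format(question=question, solution=solution))
        if verdict.strip().upper().startswith("VALID"):
            return solution
    return solution  # fall back to the last attempt if verification never passes
```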
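The rubric-and-exemplar prompting pattern described for Automatic Grading can be sketched the same way; the prompt wording, `grade` helper, and score parsing below are illustrative assumptions, reusing the hypothetical `call_llm` stub from the previous sketch.

```python
# Sketch: rubric-conditioned grading with chain-of-thought feedback.
# Reuses the hypothetical `call_llm` stub from the previous sketch.

GRADING_TEMPLATE = """You are grading a short-answer response.

Question: {question}
Rubric (0 to {max_score} points): {rubric}
Full-credit example answer: {exemplar}

Student answer: {answer}

Work through the rubric criteria one by one, then output a line
'Score: <integer>' followed by a brief comment addressed to the student."""

def grade(question: str, rubric: str, exemplar: str, answer: str,
          max_score: int = 5) -> tuple[int, str]:
    """Return (score, full reply); the reply doubles as formative feedback."""
    reply = call_llm(GRADING_TEMPLATE.format(
        question=question, rubric=rubric, exemplar=exemplar,
        answer=answer, max_score=max_score))
    # Pull out the score line; the remainder is the student-facing comment.
    score_line = next(line for line in reply.splitlines()
                      if line.startswith("Score:"))
    return int(score_line.removeprefix("Score:").strip()), reply
```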
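For Content Personalizing, interest-contextualized question generation largely reduces to prompt construction over a student profile; a minimal sketch follows, with all profile fields and wording hypothetical.

```python
# Sketch: prompt construction for interest-contextualized question generation.
# Profile fields and wording are illustrative; reuses the `call_llm` stub above.

def personalized_question(concept: str, interest: str, grade_level: str) -> str:
    """Ask the LLM to dress a target concept in a student's interest area."""
    prompt = (
        f"Write one practice question for a {grade_level} student that tests "
        f"the concept '{concept}', set in a scenario involving {interest}. "
        "Keep the underlying skill unchanged; only the framing should vary."
    )
    return call_llm(prompt)

# e.g. personalized_question("linear equations", "basketball statistics", "8th-grade")
```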

The paper also discusses the availability of Datasets and Benchmarks. Many publicly available datasets exist, particularly for text-rich tasks such as Question Solving (QS), Error Correction (EC), Question Generation (QG), and Automatic Grading (AG). These datasets span subjects, education levels, languages, and modalities (text, image, table), with the provided examples covering mathematics, science, linguistics, and computer science; a summary table is included in the Appendix.

Critical Risks and Potential Challenges are identified based on the responsible AI framework [Microsoft2024ResponsibleAI]:

  • Fairness and Inclusiveness: Bias in LLM training data can produce demographically skewed outputs and unequal access or quality for underrepresented groups [weidinger2021ethical]. Mitigation strategies include reinforced unbiased prompts [li2023fairness] and fine-tuning [caliskanpsychological].
  • Reliability and Safety: LLMs suffer from hallucinations, toxic output, and inconsistencies [ji2023survey]; temporal misalignment [cheng2024dated] and misinformation spread via deepfakes [shoaib2023deepfakes] add further risk. Mitigations include metacognitive approaches to error correction [tan2024tuningfree] and Retrieval-Augmented Generation (RAG) to improve factual accuracy [gao2024retrievalaugmented] (a minimal RAG sketch follows this list).
  • Transparency and Accountability: The black-box nature of LLMs raises concerns about plagiarism, cheating, and difficulty in tracking the source of generated content [Milano2023]. New assessment frameworks are needed [macneil2024imagining]. Techniques like identifying LLM-human mixed texts [gao2024LLMasacoauthor] and incorporating citations [huang2023citation] are explored.
  • Privacy and Security: Protecting personally identifiable information is crucial, especially in educational settings [das2024security]. Concerns include data tracking, profiling, and the privacy risks of deepfakes [shoaib2023deepfakes]. Mitigation involves RAG + fine-tuning frameworks [hicke2023aita] and providing users with data deletion options [masikisiki223investigating].
  • Over-Dependency on LLMs: Students may become overly reliant on LLMs for tasks like writing or problem solving, potentially hindering the development of critical thinking and independent skills [Milano2023]. Moderated usage and tools designed to stimulate critical thinking are suggested remedies [krupp2023challenges, adewumi2023procot].
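
To illustrate the RAG mitigation named under Reliability and Safety, the sketch below grounds answers in retrieved course material. The `embed` and `call_llm` functions are hypothetical placeholders (any sentence-embedding model and chat API would do), and the similarity search is a deliberately naive linear scan rather than a production retrieval stack.

```python
# Sketch: retrieval-augmented generation for grounding answers in course
# material. `embed` and `call_llm` are hypothetical placeholders.
import math

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder chat-completion call

def embed(text: str) -> list[float]:
    raise NotImplementedError  # placeholder sentence-embedding model

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def answer_with_rag(question: str, documents: list[str], k: int = 3) -> str:
    """Condition the LLM on the k passages most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(embed(d), q_vec),
                    reverse=True)
    context = "\n---\n".join(ranked[:k])
    prompt = (
        "Answer using ONLY the course material below; reply 'not covered' if "
        f"it is insufficient.\n\nMaterial:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```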

Finally, the paper outlines promising Future Directions:

  • Pedagogical Interest Aligned LLMs: Developing models that align with human instructors' behavior and incorporate multi-discipline knowledge using techniques like RAG [gao2023retrieval] and fine-tuning on pedagogical datasets.
  • LLM-Multi-Agent Education Systems: Designing collaborative frameworks where multiple LLM agents work together to solve complex educational tasks, potentially involving human instructors as agents [yang2024content].
  • Multimodal and Multilingual Support: Advancing LLMs to process and integrate diverse data sources (text, image, etc.) [wu2023next] and to understand cultural and regional nuances for equitable global education access.
  • Edge Computing and Efficiency: Optimizing lightweight LLMs for deployment on edge devices to reduce latency, enable offline access, and enhance privacy in resource-constrained environments.
  • Efficient Training of Specialized Models: Creating domain-specific LLMs (e.g., for math or science) that offer deep knowledge and accuracy in specific fields while being cost-effective to train.
  • Ethical and Privacy Considerations: Developing robust frameworks and guidelines for responsible LLM deployment, focusing on data security, student privacy, bias mitigation, and transparency.

In conclusion, the survey highlights the transformative potential of LLMs in education across various applications while also emphasizing the critical technical and ethical challenges that must be addressed for their effective and responsible deployment. The identified future directions point towards developing more pedagogically aligned, collaborative, multimodal, multilingual, efficient, and ethically sound LLM systems for education.

References (203)
  1. QuestionWell. 2024. QuestionWell. https://www.questionwell.org/
  2. Knowledge tracing: A survey. Comput. Surveys 55, 11 (2023), 1–37.
  3. Knowledge Graphs as Context Sources for LLM-Based Explanations of Learning Recommendations. arXiv preprint arXiv:2403.03008 (2024).
  4. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
  5. ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs). arXiv:2312.09801 [cs.CL]
  6. Formative AI. 2024a. Formative AI. https://www.formative.com/
  7. Quizizz AI. 2024b. Quizizz AI. https://quizizz.com/home/quizizz-ai
  8. Mathqa: Towards interpretable math word problem solving with operation-based formalisms. arXiv preprint arXiv:1905.13319 (2019).
  9. Adapting to the Impact of AI in Scientific Writing: Balancing Benefits and Drawbacks while Developing Policies and Regulations. arXiv:2306.06699 [q-bio.OT]
  10. Evaluating the Quality of LLM-Generated Explanations for Logical Errors in CS1 Student Programs. In Proceedings of the 16th Annual ACM India Compute Conference. 49–54.
  11. Educational data mining to predict students’ academic performance: A survey study. Education and Information Technologies 28, 1 (2023), 905–971.
  12. Learning to reuse distractors to support multiple choice question generation in education. IEEE Transactions on Learning Technologies (2022).
  13. TOEFL11: A corpus of non-native English. ETS Research Report Series 2013, 2 (2013), i–15.
  14. Michael Bommarito II and Daniel Martin Katz. 2022. GPT takes the bar exam. arXiv preprint arXiv:2212.14402 (2022).
  15. GutenTag: an NLP-driven tool for digital humanities research in the Project Gutenberg corpus. In Proceedings of the Fourth Workshop on Computational Linguistics for Literature. 42–47.
  16. The BEA-2019 shared task on grammatical error correction. In Proceedings of the fourteenth workshop on innovative use of NLP for building educational applications. 52–75.
  17. Chahat Raj, Anjishnu Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, and Ziwei Zhu. [n. d.]. A Psychological View to Social Bias in LLMs: Evaluation and Mitigation.
  18. LearningQ: a large-scale dataset for educational question generation. In Proceedings of the international AAAI conference on web and social media, Vol. 12.
  19. Artificial intelligence in education: A review. IEEE Access 8 (2020), 75264–75278.
  20. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. arXiv preprint arXiv:2211.12588 (2022).
  21. Theoremqa: A theorem-driven question answering dataset. In The 2023 Conference on Empirical Methods in Natural Language Processing.
  22. Exploring the potential of large language models (llms) in learning on graphs. arXiv preprint arXiv:2307.03393 (2023).
  23. Dated Data: Tracing Knowledge Cutoffs in Large Language Models. arXiv:2403.12958 [cs.CL]
  24. Few-Shot Fairness: Unveiling LLM’s Potential for Fairness-Aware Classification. arXiv:2402.18502 [cs.CL]
  25. Systematic literature review on opportunities, challenges, and future research recommendations of artificial intelligence in education. Computers and Education: Artificial Intelligence 4 (2023), 100118.
  26. Think you have solved question answering? try arc, the ai2 reasoning challenge. arXiv preprint arXiv:1803.05457 (2018).
  27. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168 (2021).
  28. Predicting student performance from LMS data: A comparison of 17 blended courses using Moodle LMS. IEEE Transactions on Learning Technologies 10, 1 (2016), 17–29.
  29. Conker. 2024. Conker. https://www.conker.ai/
  30. Demographic predictors of students’ science participation over the age of 16: An Australian case study. Research in Science Education 50, 1 (2020), 361–373.
  31. Education Copilot. 2024. Education Copilot. https://educationcopilot.com/
  32. Copy.ai. 2024. Copy.ai. https://www.copy.ai/
  33. Neural grammatical error correction for romanian. In 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 625–631.
  34. Chatlaw: Open-source legal large language model with integrated external knowledge bases. arXiv preprint arXiv:2306.16092 (2023).
  35. Xiliang Cui and Bao-lin Zhang. 2011. The principles for building the “international corpus of learner chinese”. Applied Linguistics 2 (2011), 100–108.
  36. Curipod. 2024. Curipod. http://www.curipod.com/ai
  37. Security and Privacy Challenges of Large Language Models: A Survey. arXiv:2402.00888 [cs.CL]
  38. Developing NLP tools with a new corpus of learner Spanish. In Proceedings of the Twelfth Language Resources and Evaluation Conference. 7238–7243.
  39. Computing education in the era of generative AI. Commun. ACM 67, 2 (2024), 56–67.
  40. Independent student learning aided by computers: an acceptable alternative to lectures? Computers & Education 35, 3 (2000), 223–241.
  41. Diffit. 2024. Diffit. https://beta.diffit.me/
  42. Tung Do Viet and Konstantin Markov. 2023. Using Large Language Models for Bug Localization and Fixing. In 2023 12th International Conference on Awareness Science and Technology (iCAST). IEEE, 192–197.
  43. A comparative study of AI-generated (GPT-4) and human-crafted MCQs in programming education. In Proceedings of the 26th Australasian Computing Education Conference. 114–123.
  44. FlaCGEC: A Chinese Grammatical Error Correction Dataset with Fine-grained Linguistic Annotation. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 5321–5325.
  45. Eduaide.ai. 2024. Eduaide.ai. https://www.eduaide.ai/
  46. Recommender systems in the era of large language models (llms). arXiv preprint arXiv:2307.02046 (2023).
  47. Grammargpt: Exploring open-source llms for native chinese grammatical error correction with supervised fine-tuning. In CCF International Conference on Natural Language Processing and Chinese Computing. Springer, 69–80.
  48. Experts’ view on challenges and needs for fairness in artificial intelligence for education. In International Conference on Artificial Intelligence in Education. Springer, 243–255.
  49. Logits of API-Protected LLMs Leak Proprietary Information. arXiv:2403.09539 [cs.CL]
  50. Reducing the cost: Cross-prompt pre-finetuning for short answer scoring. In International conference on artificial intelligence in education. Springer, 78–89.
  51. LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase. arXiv:2401.05952 [cs.CL]
  52. Pal: Program-aided language models. In International Conference on Machine Learning. PMLR, 10764–10799.
  53. Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv:2312.10997 [cs.CL]
  54. Parlay Genie. 2024. Parlay Genie. https://parlayideas.com/
  55. Khanq: A dataset for generating deep questions in education. In Proceedings of the 29th International Conference on Computational Linguistics. 5925–5938.
  56. Google. 2024. Google Bard. https://bard.google.com
  57. gotFeedback. 2024. gotFeedback. https://www.gotlearning.com/gotfeedback/
  58. Grammarly. 2024. Grammarly. https://app.grammarly.com/
  59. Roman Grundkiewicz and Marcin Junczys-Dowmunt. 2014. The wiked error corpus: A corpus of corrective wikipedia edits and its application to grammatical error correction. In Advances in Natural Language Processing: 9th International Conference on NLP, PolTAL 2014, Warsaw, Poland, September 17-19, 2014. Proceedings 9. Springer, 478–490.
  60. Exploring the potential of chatgpt in automated code refinement: An empirical study. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering. 1–13.
  61. Eduqg: A multi-format multiple-choice dataset for the educational domain. IEEE Access 11 (2023), 20885–20896.
  62. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300 (2020).
  63. Measuring mathematical problem solving with the math dataset. arXiv preprint arXiv:2103.03874 (2021).
  64. AI-TA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs. arXiv:2311.02775 [cs.LG]
  65. How well do computers solve math word problems? large-scale dataset construction and evaluation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 887–896.
  66. Jie Huang and Kevin Chen-Chuan Chang. 2023. Citation: A Key to Building Responsible and Accountable Large Language Models. arXiv:2307.02185 [cs.CL]
  67. C-eval: A multi-level multi-discipline chinese evaluation suite for foundation models. Advances in Neural Information Processing Systems 36 (2024).
  68. Survey of hallucination in natural language generation. Comput. Surveys 55, 12 (2023), 1–38.
  69. What disease does this patient have? a large-scale open domain question answering dataset from medical exams. Applied Sciences 11, 14 (2021), 6421.
  70. Position Paper: What Can Large Language Models Tell Us about Time Series Analysis. arXiv preprint arXiv:2402.02713 (2024).
  71. Evaluating LLM-generated Worked Examples in an Introductory Programming Course. In Proceedings of the 26th Australasian Computing Education Conference. 77–86.
  72. Defects4J: A database of existing faults to enable controlled testing studies for Java programs. In Proceedings of the 2014 international symposium on software testing and analysis. 437–440.
  73. Md Rayhan Kabir and Fuhua Lin. 2023. An LLM-Powered Adaptive Practicing System. In AIED 2023 workshop on Empowering Education with LLMs-the Next-Gen Interface and Content Generation, AIED.
  74. Firuz Kamalov and Ikhlaas Gurrib. 2023. A new era of artificial intelligence in education: A multifaceted revolution. arXiv preprint arXiv:2305.18303 (2023).
  75. ChatGPT for good? On opportunities and challenges of large language models for education. Learning and individual differences 103 (2023), 102274.
  76. How novices use LLM-based code generators to solve CS1 coding tasks in a self-paced learning environment. In Proceedings of the 23rd Koli Calling International Conference on Computing Education Research. 1–12.
  77. Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review. arXiv:2401.01519 [cs.LG]
  78. A diagram is worth a dozen images. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14. Springer, 235–251.
  79. Khanmigo. 2024. Khanmigo. https://blog.khanacademy.org/teacher-khanmigo/
  80. Textbook question answering with multi-modal context graph understanding and self-supervised open-set comprehension. arXiv preprint arXiv:1811.00232 (2018).
  81. Data mining and education. Wiley Interdisciplinary Reviews: Cognitive Science 6, 4 (2015), 333–353.
  82. Critical success factors and challenges for individual digital study assistants in higher education: A mixed methods analysis. Education and Information Technologies 28, 4 (2023), 4475–4503.
  83. Osama Koraishi. 2023. Teaching English in the age of AI: Embracing ChatGPT to optimize EFL materials and assessment. Language Education and Technology 3, 1 (2023).
  84. Challenges and Opportunities of Moderating Usage of Large Language Models in Education. arXiv:2312.14969 [cs.HC]
  85. Machine Learning in Education - A Survey of Current Research Trends. Annals of DAAAM & Proceedings 29 (2018).
  86. Leveraging LLMs for Adaptive Testing and Learning in Taiwan Adaptive Learning Platform (TALP). (2023).
  87. Race: Large-scale reading comprehension dataset from examinations. arXiv preprint arXiv:1704.04683 (2017).
  88. A. Latham and S. Goltz. 2019. A Survey of the General Public’s Views on the Ethics of Using AI in Education. In Artificial Intelligence in Education (Lecture Notes in Computer Science, Vol. 11625), S. Isotani, E. Millán, A. Ogan, P. Hastings, B. McLaren, and R. Luckin (Eds.). Springer, Cham. https://doi.org/10.1007/978-3-030-23204-7_17
  89. Artificial general intelligence (AGI) for education. arXiv preprint arXiv:2304.12479 (2023).
  90. The ManyBugs and IntroClass benchmarks for automated repair of C programs. IEEE Transactions on Software Engineering 41, 12 (2015), 1236–1256.
  91. Few-shot is enough: exploring ChatGPT prompt engineering method for automatic question generation in english education. Education and Information Technologies (2023), 1–33.
  92. Difficulty-Focused Contrastive Learning for Knowledge Tracing with a Large Language Model-Based Difficulty Prediction. arXiv preprint arXiv:2312.11890 (2023).
  93. Prototyping the use of Large Language Models (LLMs) for adult learning content creation at scale. arXiv preprint arXiv:2306.01815 (2023).
  94. Automated Feedback for Student Math Responses Based on Multi-Modality and Fine-Tuning. In Proceedings of the 14th Learning Analytics and Knowledge Conference. 763–770.
  95. Bringing Generative AI to Adaptive Learning in Education. arXiv preprint arXiv:2402.14601 (2024).
  96. Cmmlu: Measuring massive multitask language understanding in chinese. arXiv preprint arXiv:2306.09212 (2023).
  97. Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework. arXiv:2403.08743 [cs.CL]
  98. Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges. arXiv preprint arXiv:2401.08664 (2023).
  99. Your Large Language Model is Secretly a Fairness Proponent and You Should Prompt it Like One. arXiv:2402.12150 [cs.CL]
  100. A Survey on Fairness in Large Language Models. arXiv:2308.10149 [cs.CL]
  101. Yunqi Li and Yongfeng Zhang. 2023. Fairness of ChatGPT. arXiv:2305.18569 [cs.LG]
  102. Automating code review activities by large-scale pre-training. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1035–1047.
  103. Distractor generation for multiple choice questions using learning to rank. In Proceedings of the thirteenth workshop on innovative use of NLP for building educational applications. 284–290.
  104. Can large language models reason about medical questions? Patterns (2023).
  105. QuixBugs: A multi-lingual program repair benchmark set based on the Quixey Challenge. In Proceedings Companion of the 2017 ACM SIGPLAN international conference on systems, programming, languages, and applications: software for humanity. 55–56.
  106. A comprehensive survey on deep learning techniques in educational data mining. arXiv preprint arXiv:2309.04761 (2023).
  107. Investigating the effect of an adaptive learning intervention on students’ learning. Educational technology research and development 65 (2017), 1605–1625.
  108. Automatic short answer grading via multiway attention networks. In Artificial Intelligence in Education: 20th International Conference, AIED 2019, Chicago, IL, USA, June 25-29, 2019, Proceedings, Part II 20. Springer, 169–173.
  109. Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models. arXiv:2402.17177 [cs.CV]
  110. Using the Concept of Game-Based Learning in Education. International Journal of Emerging Technologies in Learning (2020).
  111. Inter-GPS: Interpretable geometry problem solving with formal language and symbolic reasoning. arXiv preprint arXiv:2105.04165 (2021).
  112. Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning. arXiv preprint arXiv:2209.14610 (2022).
  113. Iconqa: A new benchmark for abstract diagram understanding and visual language reasoning. arXiv preprint arXiv:2110.13214 (2021).
  114. Imagining Computing Education Assessment after Generative AI. arXiv:2401.04601 [cs.HC]
  115. Personalized education in the artificial intelligence era: what to expect next. IEEE Signal Processing Magazine 38, 3 (2021), 37–50.
  116. MagicSchool.ai. 2024. MagicSchool.ai. https://www.magicschool.ai/
  117. On the educational impact of chatgpt: Is artificial intelligence ready to obtain a university degree?. In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1. 47–53.
  118. Investigating the Efficacy of Large Language Models in Reflective Assessment Methods through Chain of Thoughts Prompting. arXiv:2310.00272 [cs.CL]
  119. A diverse corpus for evaluating and developing English math word problem solvers. arXiv preprint arXiv:2106.15772 (2021).
  120. Microsoft. 2024. Bing Chat. https://learn.microsoft.com/en-us/training/modules/enhance-teaching-learning-bing-chat/
  121. Microsoft. 2024. What is responsible AI? Accessed: 2024-02-08.
  122. Large language models challenge the future of higher education. Nature Machine Intelligence 5 (2023), 333–334. https://doi.org/10.1038/s42256-023-00644-2
  123. Examining national trends in educational placements for students with significant disabilities. Remedial and Special Education 38, 1 (2017), 3–12.
  124. Czech grammar error correction with a large and diverse corpus. Transactions of the Association for Computational Linguistics 10 (2022), 452–467.
  125. Deep learning recommendation model for personalization and recommendation systems. arXiv preprint arXiv:1906.00091 (2019).
  126. The CoNLL-2014 shared task on grammatical error correction. In Proceedings of the eighteenth conference on computational natural language learning: shared task. 1–14.
  127. Enhancing student performance prediction on learnersourced questions with sgnn-llm synergy. arXiv preprint arXiv:2309.13500 (2023).
  128. Nolej. 2024. Nolej. https://nolej.io/
  129. Large Language Model (LLM) Bias Index–LLMBI. arXiv preprint arXiv:2312.14769 (2023).
  130. OpenAI. 2024. OpenAI Chat. https://chat.openai.com
  131. OpenAI. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).
  132. Training language models to follow instructions with human feedback. Advances in neural information processing systems 35 (2022), 27730–27744.
  133. Medmcqa: A large-scale multi-subject multi-choice dataset for medical domain question answering. In Conference on health, inference, and learning. PMLR, 248–260.
  134. Zachary A Pardos and Shreya Bhandari. 2023. Learning gain differences between ChatGPT and human tutor generated algebra hints. arXiv preprint arXiv:2302.06871 (2023).
  135. Chat PDF. 2024. Chat PDF. https://www.chatpdf.com/
  136. Perplexity AI. 2024. Perplexity AI. https://perplexity.ai
  137. Pi.ai. 2024. Pi. http://www.Pi.ai
  138. Large language models for education: Grading open-ended questions using chatgpt. In Proceedings of the XXXVII Brazilian Symposium on Software Engineering. 293–302.
  139. Comparing different approaches to generating mathematics explanations using large language models. In International Conference on Artificial Intelligence in Education. Springer, 290–295.
  140. Programme for International Student Assessment (PISA): PISA 2000 Technical Report. OECD Publishing.
  141. Cristobal Romero and Sebastian Ventura. 2007. Educational data mining: A survey from 1995 to 2005. Expert systems with applications 33, 1 (2007), 135–146.
  142. Cristóbal Romero and Sebastián Ventura. 2010. Educational data mining: a review of the state of the art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (applications and reviews) 40, 6 (2010), 601–618.
  143. Cristobal Romero and Sebastian Ventura. 2013. Data mining in education. Wiley Interdisciplinary Reviews: Data mining and knowledge discovery 3, 1 (2013), 12–27.
  144. Cristobal Romero and Sebastian Ventura. 2020. Educational data mining and learning analytics: An updated survey. Wiley interdisciplinary reviews: Data mining and knowledge discovery 10, 3 (2020), e1355.
  145. Know Your Audience: Do LLMs Adapt to Different Age and Education Levels? arXiv preprint arXiv:2312.02065 (2023).
  146. A simple recipe for multilingual grammatical error correction. arXiv preprint arXiv:2106.03830 (2021).
  147. Alla Rozovskaya and Dan Roth. 2019. Grammar error correction in morphologically rich languages: The case of Russian. Transactions of the Association for Computational Linguistics 7 (2019), 1–17.
  148. Large scale analytics of global and regional MOOC providers: Differences in learners’ demographics, preferences, and perceptions. Computers & Education 180 (2022), 104426.
  149. Thrilled by your progress! Large language models (GPT-4) no longer struggle to pass assessments in higher education programming courses. In Proceedings of the 2023 ACM Conference on International Computing Education Research-Volume 1. 78–92.
  150. Deepfakes, Misinformation, and Disinformation in the Era of Frontier AI, Generative AI, and Large AI Models. arXiv:2311.17394 [cs.CR]
  151. Automatic Generation of Socratic Questions for Learning to Solve Math Word Problems. ([n. d.]).
  152. Shashank Sonkar and Richard G Baraniuk. 2023. Deduction under Perturbed Evidence: Probing Student Simulation (Knowledge Tracing) Capabilities of Large Language Models. (2023).
  153. Christian Stab and Iryna Gurevych. 2014. Annotating argument components and relations in persuasive essays. In Proceedings of COLING 2014, the 25th international conference on computational linguistics: Technical papers. 1501–1510.
  154. summarize.tech. 2024. summarize.tech. https://www.summarize.tech/
  155. Hao Sun. 2023. Offline prompt evaluation and optimization with inverse reinforcement learning. arXiv preprint arXiv:2309.06553 (2023).
  156. Predicting challenge moments from students’ discourse: A comparison of GPT-4 to two traditional natural language processing approaches. In Proceedings of the 14th Learning Analytics and Knowledge Conference (LAK ’24). ACM. https://doi.org/10.1145/3636555.3636905
  157. Teo Susnjak. 2022. ChatGPT: The end of online exam integrity? arXiv preprint arXiv:2212.09292 (2022).
  158. Oleksiy Syvokon and Olena Nahorna. 2021. UA-GEC: Grammatical error correction and fluency corpus for the ukrainian language. arXiv preprint arXiv:2103.16997 (2021).
  159. Towards applying powerful large ai models in classroom teaching: Opportunities, challenges and prospects. arXiv preprint arXiv:2305.03433 (2023).
  160. Tuning-Free Accountable Intervention for LLM Deployment – A Metacognitive Approach. arXiv:2403.05636 [cs.AI]
  161. Large language models in medicine. Nature medicine 29, 8 (2023), 1930–1940.
  162. Jörg Tiedemann. 2020. The Tatoeba Translation Challenge–Realistic Data Sets for Low Resource and Multilingual MT. arXiv preprint arXiv:2010.06354 (2020).
  163. Analyzing the quality of submissions in online programming courses. In 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET). IEEE, 271–282.
  164. Goblin Tools. 2024. Goblin Tools. https://goblin.tools/
  165. Introduction to SIGHAN 2015 bake-off for Chinese spelling check. In Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing. 32–37.
  166. An empirical study on learning bug-fixing patches in the wild via neural machine translation. ACM Transactions on Software Engineering and Methodology (TOSEM) 28, 4 (2019), 1–29.
  167. Twee. 2024. Twee. https://twee.com/
  168. Shyam Upadhyay and Ming-Wei Chang. 2016. Annotating derivations: A new evaluation strategy and dataset for algebra word problems. arXiv preprint arXiv:1609.07197 (2016).
  169. Large language models are implicitly topic models: Explaining and finding good demonstrations for in-context learning. In Workshop on Efficient Systems for Foundation Models@ ICML2023.
  170. Deep neural solver for math word problems. In Proceedings of the 2017 conference on empirical methods in natural language processing. 845–854.
  171. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022), 24824–24837.
  172. Ethical and social risks of harm from Language Models. arXiv:2112.04359 [cs.CL]
  173. Crowdsourcing multiple choice science questions. arXiv preprint arXiv:1707.06209 (2017).
  174. Autogen: Enabling next-gen llm applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155 (2023).
  175. Bloomberggpt: A large language model for finance. arXiv preprint arXiv:2303.17564 (2023).
  176. An empirical study on challenging math problem solving with gpt-4. arXiv preprint arXiv:2306.01337 (2023).
  177. From Automation to Augmentation: Large Language Models Elevating Essay Scoring Landscape. arXiv preprint arXiv:2401.06431 (2024).
  178. Evaluating reading comprehension exercises generated by LLMs: A showcase of ChatGPT in education applications. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023). 610–625.
  179. A Review of Data Mining in Personalized Education: Current Trends and Future Prospects. arXiv preprint arXiv:2402.17236 (2024).
  180. Superclue: A comprehensive chinese large language model benchmark. arXiv preprint arXiv:2307.15020 (2023).
  181. FCGEC: fine-grained corpus for Chinese grammatical error correction. arXiv preprint arXiv:2210.12364 (2022).
  182. Fantastic Questions and Where to Find Them: FairytaleQA–An Authentic Dataset for Narrative Comprehension. arXiv preprint arXiv:2203.13947 (2022).
  183. Contextualizing problems to student interests at scale in intelligent tutoring system using large language models. arXiv preprint arXiv:2306.00190 (2023).
  184. Practical and ethical challenges of large language models in education: A systematic scoping review. British Journal of Educational Technology 55, 1 (2024), 90–112.
  185. Rating short l2 essays on the cefr scale with gpt-4. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023). 576–584.
  186. Fingpt: Open-source financial large language models. arXiv preprint arXiv:2306.06031 (2023).
  187. A new dataset and method for automatically grading ESOL texts. In Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies. 180–189.
  188. Automatic generation of headlines for online math questions. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 9490–9497.
  189. How well do Large Language Models perform in Arithmetic tasks? arXiv preprint arXiv:2304.02015 (2023).
  190. Large language models for robotics: A survey. arXiv preprint arXiv:2311.07226 (2023).
  191. Repairing bugs in python assignments using large language models. arXiv preprint arXiv:2209.14876 (2022).
  192. M3exam: A multilingual, multimodal, multilevel benchmark for examining large language models. Advances in Neural Information Processing Systems 36 (2024).
  193. Evaluating the Performance of Large Language Models on GAOKAO Benchmark.
  194. Does Correction Remain A Problem For Large Language Models? arXiv preprint arXiv:2308.01776 (2023).
  195. Overview of ctc 2021: Chinese text correction for native speakers. arXiv preprint arXiv:2208.05681 (2022).
  196. Retrieval-Augmented Generation for AI-Generated Content: A Survey. arXiv:2402.19473 [cs.CV]
  197. Ape210k: A large-scale and template-rich dataset of math word problems. arXiv preprint arXiv:2009.11506 (2020).
  198. Agieval: A human-centric benchmark for evaluating foundation models. arXiv preprint arXiv:2304.06364 (2023).
  199. Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification. In The International Conference on Learning Representations.
  200. ”The teachers are confused as well”: A Multiple-Stakeholder Ethics Discussion on Large Language Models in Computing Education. arXiv:2401.12453 [cs.CY]
  201. Learning by Analogy: Diverse Questions Generation in Math Word Problem. In Findings of the Association for Computational Linguistics: ACL 2023. 11091–11104.
  202. Red teaming chatgpt via jailbreaking: Bias, robustness, reliability and toxicity. arXiv preprint arXiv:2301.12867 (2023).
  203. Niina Zuber and Jan Gogoll. 2023. Vox Populi, Vox ChatGPT: Large Language Models, Education and Democracy. arXiv:2311.06207 [cs.CY]
Authors (8)
  1. Shen Wang (111 papers)
  2. Tianlong Xu (11 papers)
  3. Hang Li (277 papers)
  4. Chaoli Zhang (24 papers)
  5. Joleen Liang (2 papers)
  6. Jiliang Tang (204 papers)
  7. Philip S. Yu (592 papers)
  8. Qingsong Wen (139 papers)
Citations (38)