
Knowledge Distillation of LLM for Automatic Scoring of Science Education Assessments (2312.15842v3)

Published 26 Dec 2023 in cs.CL and cs.AI

Abstract: This study proposes a method for knowledge distillation (KD) of fine-tuned LLMs into smaller, more efficient, and accurate neural networks. We specifically target the challenge of deploying these models on resource-constrained devices. Our methodology involves training the smaller student model (Neural Network) using the prediction probabilities (as soft labels) of the LLM, which serves as a teacher model. This is achieved through a specialized loss function tailored to learn from the LLM's output probabilities, ensuring that the student model closely mimics the teacher's performance. To validate the performance of the KD approach, we utilized a large dataset, 7T, containing 6,684 student-written responses to science questions and three mathematical reasoning datasets with student-written responses graded by human experts. We compared accuracy with state-of-the-art (SOTA) distilled models, TinyBERT, and artificial neural network (ANN) models. Results show that the KD approach achieves 3% and 2% higher scoring accuracy than ANN and TinyBERT, respectively, and accuracy comparable to the teacher model. Furthermore, the student model, at only 0.03M parameters, is 4,000 times smaller than the teacher model and 10x faster at inference than TinyBERT. The significance of this research lies in its potential to make advanced AI technologies accessible in typical educational settings, particularly for automatic scoring.

Introduction

The integration of AI and LLMs into education has resulted in significant advancements in personalized learning experiences. LLMs show promise in enhancing learning by providing personalized content, supporting individual learners, and facilitating automatic scoring systems. However, their deployment faces challenges due to their large size and computational requirements. This is particularly problematic for educational environments with hardware limitations, such as mobile devices, tablets, and school-provided laptops without high-end graphics processing units or tensor processing units.

Knowledge Distillation for LLM

The paper examines the distillation of knowledge from fine-tuned LLMs into smaller, more manageable models, allowing them to run on devices with fewer resources. This process follows a "teacher-student" learning approach, in which a smaller "student" model is trained on the predictive probabilities generated by a larger LLM, the "teacher" model. A specialized loss function instructs the student model not only to mimic the LLM's performance but also to do so with a fraction of the computational requirements.
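The teacher-student training described above can be sketched as follows. This is a minimal illustration of the standard distillation loss (Hinton et al., 2015), not the paper's exact loss function, whose details the summary does not specify; the temperature, weighting `alpha`, and function names are assumptions for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature before normalizing; a higher
    # temperature softens the distribution, exposing the teacher's
    # relative confidence across score categories.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of (a) the KL divergence between the temperature-
    softened teacher and student distributions (the soft-label term)
    and (b) ordinary cross-entropy against the human-assigned score
    (the hard-label term)."""
    t_probs = softmax(teacher_logits, temperature)
    s_probs = softmax(student_logits, temperature)
    # Soft-label term: KL(teacher || student), scaled by T^2 so its
    # gradient magnitude stays comparable to the hard-label term.
    kl = sum(t * math.log(t / s) for t, s in zip(t_probs, s_probs))
    soft_loss = (temperature ** 2) * kl
    # Hard-label term: cross-entropy on the expert-graded label.
    hard_probs = softmax(student_logits)
    hard_loss = -math.log(hard_probs[true_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

A student whose logits already match the teacher's incurs zero soft-label loss, so training pressure shifts entirely to agreeing with the human graders; in practice the soft labels carry extra signal about near-miss score categories that hard labels alone cannot provide.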

Empirical Validation

Researchers conducted experiments using a significant dataset, labeled 7T, containing thousands of student-written responses to science questions, along with three additional datasets in the educational domain. The smaller models trained via knowledge distillation were compared with original neural network models in terms of their accuracy in automatic scoring. The distilled student models displayed comparable accuracy to the LLM when tested on the 7T dataset and higher accuracy than original neural networks on other datasets while being considerably smaller in size and computational load.

Impact on Education

The implications of using distilled models in education are far-reaching. By enabling effective automatic scoring systems to operate on less powerful hardware, this research contributes to the democratization of AI in education. It offers a viable solution to the problem of deploying sophisticated AI models in resource-constrained environments and aligns with the increasing demand for personalized learning and adaptive assessment tools in the educational sector. The paper provides a proof of concept for the successful application of knowledge distillation in educational technology, highlighting its potential to transform educational assessment practices and ensure equitable access to advanced AI technologies in typical school settings.

Authors
  1. Ehsan Latif
  2. Luyang Fang
  3. Ping Ma
  4. Xiaoming Zhai