The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education (2404.02444v1)

Published 3 Apr 2024 in cs.CL and cs.AI

Abstract: Assessing instruction quality is a fundamental component of any improvement effort in the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers' expertise and idiosyncratic factors, preventing teachers from getting timely and frequent feedback. Unlike prior research, which mostly focuses on low-inference instructional practices one at a time, this paper presents the first study that leverages NLP techniques to assess multiple high-inference instructional practices in two distinct educational settings: in-person K-12 classrooms and simulated performance tasks for pre-service teachers. This is also the first study that applies NLP to measure a teaching practice widely acknowledged to be particularly effective for students with special needs. We confront two challenges inherent in NLP-based instructional analysis: noisy, lengthy input data and highly skewed distributions of human ratings. Our results suggest that pretrained language models (PLMs) achieve performance comparable to the agreement level of human raters for variables that are more discrete and require lower inference, but their efficacy diminishes with more complex teaching practices. Interestingly, using only teachers' utterances as input yields strong results for student-centered variables, alleviating common concerns over the difficulty of collecting and transcribing high-quality student speech data in in-person teaching settings. Our findings highlight both the potential and the limitations of current NLP techniques in the education domain, opening avenues for further exploration.
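
To make the setup concrete, below is a minimal sketch of the kind of pipeline the abstract describes: teacher utterances from one observation segment are fed to a pretrained encoder with a classification head that predicts a rubric-style rating. The model choice (roberta-base), the 4-level rating scale, and the example utterances are illustrative assumptions, not the authors' actual configuration, and the classification head would need fine-tuning on human-rated transcripts before its predictions are meaningful.

```python
# Illustrative sketch only: scoring a transcript segment with a pretrained
# encoder plus a classification head. The model name, the 4-level rating
# scale, and the toy utterances are assumptions; the head is untrained here
# and would need fine-tuning on rated transcripts to produce useful scores.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-base"    # assumed encoder; any BERT-style PLM could be substituted
NUM_RATING_LEVELS = 4          # hypothetical 1-4 observation rubric

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_RATING_LEVELS
)
model.eval()

# Teacher utterances only, concatenated per segment: the abstract notes that
# this input alone works well, sidestepping noisy student speech data.
teacher_utterances = [
    "Let's work through this problem one step at a time.",
    "First, underline the numbers the question gives us.",
    "Now, what operation do we need? Turn to your partner and explain why.",
]
segment = " ".join(teacher_utterances)

# Classroom transcripts are long and noisy; BERT-style encoders cap input at
# 512 tokens, so longer segments must be truncated or split into chunks.
inputs = tokenizer(segment, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_level = logits.argmax(dim=-1).item() + 1  # map label 0-3 back to rating 1-4
print(f"Predicted instructional-quality rating: {predicted_level}")
```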

Authors (5)
  1. Paiheng Xu (14 papers)
  2. Jing Liu (526 papers)
  3. Nathan Jones (16 papers)
  4. Julie Cohen (2 papers)
  5. Wei Ai (48 papers)
Citations (2)
