
SIGHT: A Large Annotated Dataset on Student Insights Gathered from Higher Education Transcripts (2306.09343v1)

Published 15 Jun 2023 in cs.CL and cs.AI

Abstract: Lectures are a learning experience for both students and teachers. Students learn from teachers about the subject material, while teachers learn from students about how to refine their instruction. However, online student feedback is unstructured and abundant, making it challenging for teachers to learn and improve. We take a step towards tackling this challenge. First, we contribute a dataset for studying this problem: SIGHT is a large dataset of 288 math lecture transcripts and 15,784 comments collected from the Massachusetts Institute of Technology OpenCourseWare (MIT OCW) YouTube channel. Second, we develop a rubric for categorizing feedback types using qualitative analysis. Qualitative analysis methods are powerful in uncovering domain-specific insights, but they are costly to apply to large data sources. To overcome this challenge, we propose a set of best practices for using LLMs to cheaply classify the comments at scale. We observe a striking correlation between the model's and humans' annotations: categories with consistent human annotations ($>0.9$ inter-rater reliability, IRR) also display higher human-model agreement ($>0.7$), while categories with less consistent human annotations ($0.7$ to $0.8$ IRR) correspondingly demonstrate lower human-model agreement ($0.3$ to $0.5$). These techniques uncover useful student feedback from thousands of comments, costing around $\$0.002$ per comment. We conclude by discussing exciting future directions on using online student feedback and improving automated annotation techniques for qualitative research.
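
To make the agreement figures in the abstract concrete, here is a minimal sketch of how agreement between two annotators (two humans, or a human and a model) might be scored on one binary feedback category. The abstract does not say which statistic the authors use, so Cohen's kappa is an assumption here, and the label lists are made-up illustrative data, not values from SIGHT. The last lines work out the total annotation cost implied by the abstract's per-comment figure.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for one category (1 = comment belongs to the category).
human = [1, 1, 0, 0, 1, 0, 1, 0]
model = [1, 1, 0, 0, 1, 0, 0, 1]
kappa = cohen_kappa(human, model)

# Cost implied by the abstract: 15,784 comments at about $0.002 each.
estimated_cost = 15_784 * 0.002

print(f"kappa = {kappa:.2f}")
print(f"estimated cost = ${estimated_cost:.2f}")
```

On the toy labels the annotators agree on 6 of 8 comments, and chance agreement is 0.5, so kappa comes out at 0.5. At the stated rate, labeling all 15,784 comments once would cost roughly \$31.57.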

Authors (4)
  1. Rose E. Wang
  2. Pawan Wirawarn
  3. Noah Goodman
  4. Dorottya Demszky