A Large-Scale Real-World Evaluation of LLM-Based Virtual Teaching Assistant (2506.17363v1)

Published 20 Jun 2025 in cs.CY and cs.AI

Abstract: Virtual Teaching Assistants (VTAs) powered by LLMs have the potential to enhance student learning by providing instant feedback and facilitating multi-turn interactions. However, empirical studies on their effectiveness and acceptance in real-world classrooms are limited, leaving their practical impact uncertain. In this study, we develop an LLM-based VTA and deploy it in an introductory AI programming course with 477 graduate students. To assess how student perceptions of the VTA's performance evolve over time, we conduct three rounds of comprehensive surveys at different stages of the course. Additionally, we analyze 3,869 student-VTA interaction pairs to identify common question types and engagement patterns. We then compare these interactions with traditional student-human instructor interactions to evaluate the VTA's role in the learning process. Through a large-scale empirical study and interaction analysis, we assess the feasibility of deploying VTAs in real-world classrooms and identify key challenges for broader adoption. Finally, we release the source code of our VTA system, fostering future advancements in AI-driven education: https://github.com/sean0042/VTA.

Summary

Evaluation and Deployment of an LLM-Based Virtual Teaching Assistant

The paper "A Large-Scale Real-World Evaluation of LLM-Based Virtual Teaching Assistant" by Kweon et al. presents an empirical study covering the development, deployment, and evaluation of a Virtual Teaching Assistant (VTA) powered by LLMs. The study focuses on the VTA's use in a graduate-level AI programming course in South Korea. Its overarching objective is to assess the feasibility of integrating such VTAs into real-world educational settings and to understand how student-VTA interactions differ from traditional student-instructor interactions.

Study Design and Implementation

The research was conducted over a 14-week course with 477 enrolled students and involved deploying an LLM-based VTA developed specifically for this study. The system aimed to provide real-time, contextually relevant responses to student questions, reducing the workload on human instructors and improving the learning environment. The authors implemented the VTA with a combination of open-source Python libraries, including LangChain, Streamlit, and LangSmith, configured for Retrieval-Augmented Generation (RAG). Responses were generated with the help of a vector database built from processed course materials. The database was updated throughout the semester to follow the course timeline and contained over 1,500 document chunks by the semester's end.
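The retrieval step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, substitutes a toy bag-of-words vector for a real embedding model, and every name in it (`VectorStore`, `chunk_text`, `build_prompt`, the sample texts) is hypothetical. It only shows the general RAG pattern of chunking materials, indexing the chunks, and placing the best-matching chunks into the prompt.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text, max_words=40):
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory store of (chunk, vector) pairs."""

    def __init__(self):
        self.entries = []

    def add_document(self, text, max_words=40):
        # New course material can be added at any point in the semester.
        for chunk in chunk_text(text, max_words):
            self.entries.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        # Rank all chunks by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, store, k=2):
    """Assemble an LLM prompt from the retrieved chunks (the LLM call is omitted)."""
    context = "\n".join(f"- {c}" for c in store.search(question, k))
    return f"Answer using only the course material below.\n{context}\nQuestion: {question}"


store = VectorStore()
store.add_document("Gradient descent updates model parameters by stepping against the gradient of the loss.")
store.add_document("A Python list comprehension builds a new list from an iterable in a single expression.")
question = "How does gradient descent update parameters?"
prompt = build_prompt(question, store, k=1)
print(prompt)
```

In a real deployment the `embed` function would be replaced by a learned embedding model and the prompt would be sent to an LLM; both steps are left out here because the summary does not specify them.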

Empirical Evaluation

The assessment comprised three survey rounds measuring helpfulness, trustworthiness, appropriateness, and comfort. Ratings were benchmarked against those for human instructor interactions to provide a comparative perspective. The study also drew on a large body of interaction data: 3,869 student-VTA question-response pairs, which were analyzed to uncover usage patterns and to highlight potential barriers to wider VTA adoption.

Key Findings and Analysis

  1. Student Engagement: The study found significant variation in how much students interacted with the VTA. Students with limited prior coding and machine learning experience engaged with it more. This suggests that VTAs can be effective tools for personalized learning support, particularly for students from non-technical disciplines.
  2. Comparison with Human Instructors: Students interacted with the VTA approximately 25 times more often than with human instructors. Theory-related questions were notably more prevalent with the VTA, indicating that students might feel more comfortable discussing conceptual material with a non-judgmental, always-available AI system.
  3. Perception Analyses: Trust in, and satisfaction with, the VTA increased after deployment, although its trustworthiness lagged behind that of human instructors. Frequent users in particular reported improvements in perceived helpfulness and comfort over time. The VTA was seen as promoting a more inclusive environment, especially for students who hesitated to ask human instructors questions.
  4. Challenges and Limitations: Two challenges remain to be addressed: slow responses, which resulted from interface design rather than computational delay, and hallucinations. Furthermore, the VTA's effectiveness outside programming-focused domains has yet to be validated.

Implications and Future Directions

The research offers practical insight into the growing feasibility of AI-based educational tools in large classroom settings. While the VTA's potential as an adjunct to human instruction is evident, challenges persist in how reliable students perceive it to be and in how natural the interaction feels. The authors release the VTA source code to encourage further research and refinement in this domain.

Future directions may involve adding streaming output to the VTA for more responsive interaction and employing hybrid retrieval methods to improve the accuracy of the retrieved educational content. The applicability of similar systems to subjects that rely heavily on qualitative assessment (e.g., humanities) remains an interesting avenue for future investigation. The paper underscores the need to keep improving the interaction quality of VTAs while maintaining, or ideally improving, educational efficacy.
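The summary does not say which hybrid retrieval method is meant. One common way to combine a keyword-based ranking with an embedding-based ranking is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the authors' plan. The document IDs and both input rankings are hypothetical, and `k=60` is simply the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    documents are returned in order of descending total score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Break ties by document ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword search and an embedding search.
keyword_ranking = ["lecture3", "lab2", "lecture1"]
semantic_ranking = ["lab2", "lecture5", "lecture3"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)  # ['lab2', 'lecture3', 'lecture5', 'lecture1']
```

Documents that rank well in both lists (`lab2`, `lecture3`) rise to the top, which is the behavior a hybrid scheme is meant to provide when either retriever alone misses relevant material.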
