Empowering Vocabulary Learning Through Teaching AI: Using LLMs as a Student to Perform Learning by Teaching in Vocabulary Acquisition

Published 20 Apr 2026 in cs.HC | (2604.17893v1)

Abstract: "Learning by Teaching (LbT)" helps learners deepen their understanding by explaining concepts to others, with questions playing a vital role in identifying knowledge gaps and reinforcing comprehension. However, existing systems for generating such questions often rely on rigid templates and are expensive to build. To overcome these limitations, we developed a system using LLMs to create dynamic, contextually relevant questions for LbT. In our English vocabulary learning study, we examined which learner characteristics best leverage the system's benefits. Our results showed improved memory retention over traditional methods at three and seven days of testing, with ten participants. Additionally, we identified traits linked to better learning outcomes, highlighting the potential for tailored approaches. These findings support the development of scalable, cost-effective solutions to enhance LbT methods across various fields.


Authors (8)

Summary

The paper reports that using LLMs as simulated students in Learning by Teaching improves both immediate and delayed (three- and seven-day) vocabulary retention over a baseline, in a study with ten participants.
The paper outlines an innovative system architecture, using GPT-4o to generate context-aware questions that stimulate active learner engagement and detailed responses.
The paper highlights practical benefits, including personalized pacing and scalability, while addressing challenges of cognitive overload and repetitive questioning.

LLMs as Conversational Students in Learning by Teaching for Vocabulary Acquisition

Learning by Teaching in Digital Education

Learning by Teaching (LbT) has been theoretically and empirically validated as an effective way to improve comprehension and retention in educational settings, particularly because it requires learners to articulate their knowledge and answer questions posed by others. Traditional implementations introduce logistical and psychological barriers, such as the stress of peer interaction and the difficulty of finding a suitable partner to teach. This study addresses these bottlenecks by having LLMs play the student role, enabling interactive, scalable LbT for English vocabulary acquisition (2604.17893).

System Architecture: LLMs as Simulated Students

The system leverages GPT-4o as a dynamic question generator and conversational partner, emulating a beginner-level English student. At each learning iteration, participants correct a deliberately erroneous sentence containing the target vocabulary or idiom and provide explanations. The LLM, prompted with student-like characteristics and prior learner responses, generates contextually relevant questions. Learners answer these questions, thus iteratively reinforcing conceptual understanding and addressing knowledge gaps.

Figure 1: The operational flow of the proposed LbT system, integrating LLMs as conversational students for adaptive question generation.
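The interaction loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the persona text, the `TeachingSession` class, and the `stub_model` function are all hypothetical, and the stub stands in for whatever GPT-4o call the real system makes.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical persona text; the paper's actual prompt is not reproduced here.
STUDENT_PERSONA = (
    "You are a beginner-level English student. "
    "Ask one short question about the explanation you were just given."
)


@dataclass
class TeachingSession:
    target: str               # vocabulary item or idiom being taught
    erroneous_sentence: str   # deliberately wrong sentence shown to the learner
    history: List[str] = field(default_factory=list)  # prior learner responses

    def build_prompt(self) -> str:
        """Combine the persona, the task context, and prior learner responses."""
        lines = [
            STUDENT_PERSONA,
            f"Target expression: {self.target}",
            f"Sentence to correct: {self.erroneous_sentence}",
        ]
        lines += [f"Teacher said: {turn}" for turn in self.history]
        return "\n".join(lines)

    def next_question(self, learner_text: str,
                      ask_model: Callable[[str], str]) -> str:
        """Record the learner's latest response and request a follow-up question."""
        self.history.append(learner_text)
        return ask_model(self.build_prompt())


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): numbers the question by turn."""
    turns = prompt.count("Teacher said:")
    return f"Question {turns}: can you give me another example?"
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real deployment would replace `stub_model` with a function that sends the prompt to the chosen model.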

In contrast, the baseline system omits LLM-generated interactions, focusing solely on correcting sentences without conversational reinforcement.

Figure 2: The baseline learning system, devoid of LLM-mediated question generation and interaction.

Experimental Protocol and Interface

Ten university students participated in a crossover design consisting of pretest, LbT interactions, and multiple posttests administered at intervals (immediate, three days, seven days). The learning interface presented multiple-choice questions based on vocabulary from standardized proficiency tests (Eiken), with results and justifications provided per session.

Figure 3: User interface for the pretest phase, featuring multiple-choice selection and immediate feedback.

Memory Retention and Learning Outcomes

Posttest scores favor the proposed system over the baseline in both immediate and delayed recall: the percentage of correct answers is higher at the immediate posttest and at the three- and seven-day posttests.

Figure 4: Differential performance in percentage of correct answers between the LLM-mediated system and the baseline across successive test intervals.
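As a rough illustration of the comparison plotted in Figure 4, the per-interval gap between the two conditions could be computed as below. The interval labels, function names, and data layout are assumptions made for this sketch, and it contains no data from the study.

```python
from typing import Dict, List

# Interval labels are assumptions chosen to match the three posttests.
INTERVALS = ("immediate", "day3", "day7")


def percent_correct(answers: List[bool]) -> float:
    """Percentage of correct answers in one posttest."""
    if not answers:
        raise ValueError("no answers recorded")
    return 100.0 * sum(answers) / len(answers)


def condition_gap(proposed: Dict[str, List[bool]],
                  baseline: Dict[str, List[bool]]) -> Dict[str, float]:
    """Proposed-minus-baseline difference in percent correct, per interval."""
    return {
        interval: percent_correct(proposed[interval]) - percent_correct(baseline[interval])
        for interval in INTERVALS
    }
```

A positive value for an interval means the LLM-mediated condition scored higher than the baseline at that test.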

The gain is particularly pronounced in users who actively engage in the conversational loop, submitting detailed input and maintaining a balanced cognitive load. The correlation between number of words entered per interaction and learning outcome highlights the role of cognitive engagement.

Figure 5: Relationship between average word count per interaction and overall system engagement.
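The summary does not say which correlation statistic was used. Assuming a plain Pearson coefficient, the relationship between words entered per interaction and learning outcome could be computed as in this sketch, where the function name and input layout are illustrative only.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold each participant's average word count per interaction and `ys` the corresponding score gain. With only ten participants, any such coefficient carries wide uncertainty.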

Interaction Analysis and System Limitations

While most participants benefited from the LLM-mediated approach, differences in interaction behavior led to varying outcomes. Some users experienced cognitive overload, especially when they were exposed to excessive question generation or attempted to learn large volumes of material at once, which reduced their performance (notably participant p5). In addition, running GPT-4o deterministically (temperature set to zero) led to repetitive questioning and reduced perceived conversational diversity, as the duplicated questions documented in the appendix show.
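One simple way to quantify the repetition problem is to normalize each generated question and count exact repeats. The sketch below is an assumption about how such a check might look, not something described in the paper; it catches only near-verbatim duplicates, not paraphrases.

```python
import re
from collections import Counter
from typing import Dict, List


def normalize(question: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    stripped = re.sub(r"[^a-z0-9\s]", "", question.lower())
    return " ".join(stripped.split())


def repeated_questions(questions: List[str]) -> Dict[str, int]:
    """Map each normalized question asked more than once to its count."""
    counts = Counter(normalize(q) for q in questions)
    return {q: n for q, n in counts.items() if n > 1}


def repetition_rate(questions: List[str]) -> float:
    """Fraction of questions that duplicate an earlier one (0.0 if none asked)."""
    if not questions:
        return 0.0
    unique = len({normalize(q) for q in questions})
    return 1.0 - unique / len(questions)
```

A system could use such a check to regenerate a question when it repeats an earlier one, or to decide when a nonzero temperature is worth the loss of reproducibility.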

Practical and Theoretical Implications

This study establishes LLMs as viable, adaptive, and cost-effective simulated students for LbT paradigms. The system is generalizable beyond vocabulary acquisition, supporting scalable deployment across domains requiring explanatory learning. Practically, LLM-mediated LbT circumvents psychological and logistical challenges inherent to human peer teaching, enabling personalized pacing and content adaptation.

Theoretically, engagement with simulated students via LLMs aligns with constructivist pedagogical principles, as learners must actively resolve knowledge discrepancies. The observed retention improvements substantiate the hypothesis that dynamic interaction, rather than rote correction, promotes deeper memory encoding. Importantly, tailoring conversational complexity and feedback to individual learner characteristics is critical, and future research should focus on adaptive calibration via cognitive modeling.

Future Directions

Evaluating the system with larger cohorts would improve statistical reliability, and extending it to other domains and languages would test how well the approach generalizes. Automatically adjusting conversational difficulty and question novelty based on real-time interaction metrics may mitigate cognitive overload. Integrating affective feedback analysis and dynamic prompt adaptation could further improve educational efficacy. Long-term retention and transfer effects in real-world settings also warrant investigation.

Conclusion

In this ten-participant study, LLMs deployed as simulated students in an interactive Learning by Teaching framework improved vocabulary acquisition and memory retention over a correction-only baseline. The findings highlight both the potential for personalized, scalable educational interventions and the need for adaptive system design that accommodates heterogeneous learner behaviors. The approach offers a foundation for future digital education systems that use LLMs for dynamic, explanation-driven teaching interactions.

