TeachQuiz Protocol: Evaluating Knowledge Transfer
- TeachQuiz Protocol is defined as a framework that isolates and quantifies effective knowledge transfer by comparing pre- and post-intervention learner performance.
- The methodology employs a two-phase approach, selective unlearning followed by controlled delivery of the educational content, so that the resulting score reflects knowledge recovery directly rather than surface properties of the artifact.
- Empirical results in Code2Video demonstrate up to 40% improvement in knowledge recovery, validating the protocol as a robust benchmark for educational content efficacy.
TeachQuiz Protocol is an educational framework and evaluation metric designed to directly quantify knowledge transfer within learning systems, with particular relevance to environments incorporating generated educational content such as videos, quizzes, or interactive lecture modules. Its formalization in recent research centers on an end-to-end methodology to assess whether educational interventions (e.g., videos generated from code, adaptive quizzes) effectively reclaim or transmit targeted concepts, especially to agents or learners whose prior knowledge has been selectively suppressed. This approach isolates pedagogical efficacy from collateral factors, providing a stringent benchmark for content delivery and learning mechanisms.
1. Conceptual Foundation and Motivation
TeachQuiz Protocol is constructed to address specific deficiencies in traditional educational content evaluation that focus narrowly on surface-level properties (such as visual quality, layout, or code efficiency) (Chen et al., 1 Oct 2025). The key innovation is measuring the actual reacquisition of domain knowledge: it does not merely register whether content is visually appealing or structurally sound, but whether it substantively enables knowledge recovery when a learner (or vision-LLM, VLM) is intentionally reset to an unlearned state regarding specific concepts.
In the framework as used in Code2Video (Chen et al., 1 Oct 2025), the central question is: given an educational artifact (video, quiz, etc.), and a "learner" forcibly deconditioned from the target concept via an unlearning algorithm, can the artifact reliably propagate necessary knowledge such that the learner can answer concept-specific questions accurately post-exposure?
2. Protocol Architecture and Methodological Workflow
The TeachQuiz Protocol consists of two key stages:
- Unlearning Phase: The learner (typically a VLM) is subjected to prompt-based selective unlearning, denoted $P_{\mathrm{unlearn}}$. This blocks canonical knowledge of the target concept $c$ using shadow sets (definitions, formulas, aliases, exemplar solutions, etc.). As a result, any quiz posed should elicit an "INSUFFICIENT EVIDENCE" response whenever the answer depends solely on blocked information.
- Knowledge Recovery Phase: The learner is immediately presented with the educational content (e.g., video $V$), mediated by a controlled prompt. A quiz set $Q$ is administered to determine whether the learner has reconstructed $c$ purely from video-grounded evidence.
The key metric, the TeachQuiz score, is then defined as

$$\mathrm{TQ} = \mathrm{Acc}_{\mathrm{post}} - \mathrm{Acc}_{\mathrm{pre}},$$

where $\mathrm{Acc}_{\mathrm{pre}}$ is the fraction of correct answers in the unlearned state and $\mathrm{Acc}_{\mathrm{post}}$ is the fraction after video exposure.
This protocol is strictly segregated from other metrics, such as aesthetic scores or code runtime efficiency, ensuring that gains are attributable solely to the instructional value of the artifact.
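The score computation described above can be sketched in a few lines of Python. The function and variable names here (`teachquiz_score`, `answers_pre`, `answers_post`) are illustrative only and not taken from the Code2Video codebase:

```python
def accuracy(answers, labels):
    """Fraction of quiz answers matching the ground-truth labels."""
    assert len(answers) == len(labels)
    return sum(a == y for a, y in zip(answers, labels)) / len(labels)

def teachquiz_score(answers_pre, answers_post, labels):
    """TeachQuiz score: accuracy gain after exposure to the artifact.

    answers_pre  -- answers given in the unlearned state (before the video)
    answers_post -- answers given after watching the video
    labels       -- ground-truth answers for the quiz set
    """
    return accuracy(answers_post, labels) - accuracy(answers_pre, labels)

# Toy run: 1/5 correct before exposure, 4/5 correct after.
labels = ["A", "C", "B", "D", "A"]
pre = ["INSUFFICIENT EVIDENCE"] * 4 + ["A"]  # mostly abstains, one lucky hit
post = ["A", "C", "B", "D", "B"]             # one residual error
print(round(teachquiz_score(pre, post, labels), 3))  # 0.6
```

A correctly unlearned model should score near zero accuracy in the pre phase, so in practice the score is dominated by the post-exposure term.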
3. Selective Unlearning Mechanisms
Selective unlearning is operationalized via prompt engineering that disables access to predefined knowledge bases. In the Code2Video framework, this involves shadow sets covering all relevant representations of the target concept $c$ (mathematical formulas, textual definitions, aliases, visual exemplars). Unlearning creates an artificial knowledge void, providing a controlled environment in which content effectiveness is directly measurable. When the VLM is prompted without any educational artifact, it must correctly abstain from answers that require blocked knowledge.
This suggests that TeachQuiz can be adapted to other modalities where knowledge unlearning can be strictly enforced.
4. Application in Educational Content Generation and Evaluation
Originally applied in Code2Video (Chen et al., 1 Oct 2025), TeachQuiz Protocol serves both as an internal development benchmark and as a comparative evaluation tool for generated educational content. Experiments tested Planner–Coder–Critic agentic workflows against direct code generation baselines. Notably:
- The Code2Video full pipeline yielded up to 40% improvement in TeachQuiz scores over direct code generation.
- In human studies, generated videos sometimes outperformed professionally produced tutorials in measured knowledge recovery.
TeachQuiz thus enforces a high bar for educational systems: only those that induce genuine knowledge reacquisition pass stringent pedagogical tests.
5. Mathematical Notation and Operational Criteria
The formal underpinnings of TeachQuiz rely on canonical set-theoretic and response functions. The core evaluation can be written with:
- $c$: target concept
- $V$: educational video produced for $c$
- $Q = \{q_1, \dots, q_n\}$: quiz set specific to $c$, with ground-truth labels $Y = \{y_1, \dots, y_n\}$
- $A(\cdot)$: VLM answering function

Compute:

$$\mathrm{Acc}_{\mathrm{pre}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\big[A(q_i \mid P_{\mathrm{unlearn}}) = y_i\big]$$

and

$$\mathrm{Acc}_{\mathrm{post}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\big[A(q_i \mid P_{\mathrm{unlearn}}, V) = y_i\big].$$
This directly isolates the gain due to content-mediated knowledge transfer.
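Putting the notation together, the end-to-end evaluation reduces to two passes over the quiz set: one in the unlearned state and one with the video evidence appended. In this sketch, `ask_vlm` stands in for the answering function $A$ and is a hypothetical stub, not a real model call:

```python
from typing import Callable, Sequence

def evaluate_teachquiz(
    ask_vlm: Callable[[str, str], str],  # (question, context) -> answer
    quiz: Sequence[str],
    labels: Sequence[str],
    unlearn_prompt: str,
    video_prompt: str,
) -> float:
    """Run both protocol phases and return the TeachQuiz score."""
    def acc(context: str) -> float:
        correct = sum(ask_vlm(q, context) == y for q, y in zip(quiz, labels))
        return correct / len(quiz)

    acc_pre = acc(unlearn_prompt)                          # unlearned state
    acc_post = acc(unlearn_prompt + "\n" + video_prompt)   # after exposure
    return acc_post - acc_pre

# Toy stub: "answers" correctly only when video evidence is in the context.
def stub_vlm(question: str, context: str) -> str:
    return "42" if "video" in context else "INSUFFICIENT EVIDENCE"

score = evaluate_teachquiz(
    stub_vlm,
    quiz=["Q1", "Q2"],
    labels=["42", "42"],
    unlearn_prompt="forget the concept",
    video_prompt="video transcript: the answer is 42",
)
print(score)  # 1.0
```

Because the unlearning prompt is held fixed across both passes, the difference cancels out any residual model knowledge and isolates the contribution of the artifact.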
6. Empirical Findings and Comparative Performance
Experimental data from Code2Video (Chen et al., 1 Oct 2025) demonstrate:
- Clear stratification of knowledge recovery by protocol: agentic pipelines surpass non-agentic or direct generation methods on TeachQuiz scores.
- The metric is robust against confounding by VLM prior knowledge: unlearning ensures that only the intended concept transmission is measured.
- Results suggest that code-based, visually structured content is particularly effective for concept reacquisition, contingent on Planner–Coder–Critic agent design.
This suggests that agentic frameworks, when evaluated by the TeachQuiz Protocol, are well-suited for scalable and interpretable educational video production.
7. Significance and Future Directions
TeachQuiz Protocol advances the state of the art in educational content evaluation by tying content effectiveness directly to measurable knowledge recovery. Because gains are attributable to the artifact alone, future research can compare and iterate on educational systems with precision. Further work, as outlined in Code2Video (Chen et al., 1 Oct 2025), may involve more granular unlearning strategies, finer-grained video-grounded questioning, extension to proprietary VLMs, or human-in-the-loop evaluations.
The protocol’s strict criteria encourage the advancement of educational designs that prioritize knowledge clarity, rigor, and instructional coherence, serving as a foundation for next-generation pedagogical research and system deployment.