Artificial General Teacher (AGT)

Updated 22 May 2026

Artificial General Teacher (AGT) is an educational system concept that applies AGI principles to provide human-level, cross-domain instruction with multimodal, adaptive capabilities.
It integrates large language models, knowledge graphs, and vision modules to generate stepwise explanations and dynamic pedagogical dialogues tailored to individual learners.
Its architecture combines curriculum decomposition, Bayesian learner modeling, and real-time feedback loops to optimize conceptual understanding and personalized learning outcomes.

An Artificial General Teacher (AGT) is an educational system built on AGI principles that exhibits human-level versatility, cross-domain pedagogical competence, autonomous instructional planning, and adaptability to individual learners’ needs and affective states. Unlike conventional, task-specific intelligent tutoring systems, the AGT paradigm aims not only to solve educational problems across diverse subjects but also to generate stepwise explanations, engage in dynamic pedagogical dialogue, visually ground its instruction, and orchestrate individualized, data-driven teaching trajectories (Latif et al., 2023, Nguyen-Truong et al., 3 Apr 2026).

1. Core Definition and Theoretical Foundation

At its foundation, an Artificial General Teacher is characterized by four canonical capabilities: (1) semantic understanding of educational content, (2) autonomous problem solving, (3) generative, stepwise natural-language explanation, and (4) visually grounded or context-appropriate instructional scaffolding. The AGT embodies AGI attributes—domain generality, robust reasoning, adaptive planning, and multimodal perception/action—configured for educational objectives, such as maximizing student conceptual mastery, engagement, and metacognitive skill (Latif et al., 2023, Nguyen-Truong et al., 3 Apr 2026). In visual domains (e.g., geometry), the “act of pointing” to relevant structures is mathematically formalized as the Referring Image Segmentation (RIS) problem: given a schematic diagram and a language expression, the AGT produces a pixel-level binary mask localizing the referenced element (Nguyen-Truong et al., 3 Apr 2026).
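
As a compact restatement of the RIS formulation above, the mapping can be written as follows. The symbols $f_\theta$, $I$, $e$, $M$, $H$, and $W$ are notation assumed here for illustration and are not taken from the cited work:

$$
f_\theta : (I, e) \mapsto M, \qquad I \in \mathbb{R}^{H \times W \times 3}, \qquad M \in \{0, 1\}^{H \times W},
$$

where $I$ is the schematic diagram, $e$ is the language expression, and $M_{uv} = 1$ if and only if pixel $(u, v)$ belongs to the element that $e$ refers to.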

2. System Architectures and Key Modules

AGT architectures typically integrate several tightly coupled components:

LLM Backbone: Foundation models (e.g., GPT-4, Florence-2, Qwen-VL) provide the base for natural language understanding, dialogue management, and generative explanation (Latif et al., 2023, Nguyen-Truong et al., 3 Apr 2026).
Knowledge Representation and Reasoning: Curriculum ontologies, domain knowledge graphs, and probabilistic logical engines enable concept-level inference and multi-turn, curriculum-aligned planning (Latif et al., 2023).
Learner Modeling Subsystem: Bayesian Knowledge Tracing models or dynamic Bayesian networks estimate and update the student’s latent mastery distribution per skill, adapted continuously with new observations (Latif et al., 2023).
Planning and Decision-Making Engine: Hierarchical planners combine symbolic curriculum sequencing with model-based deep reinforcement learning, optimizing cumulative pedagogical utility, typically defined as mastery gains penalized by cognitive load (Latif et al., 2023, Liu et al., 24 Dec 2025).
Multimodal Perception and Action: Integration of vision models for RIS, voice/affect sensing, and a multimodal interaction interface for dialog and visual pointing (Nguyen-Truong et al., 3 Apr 2026).
Memory and Reflection Loops: Persistent memory modules, implemented as high-dimensional vector databases or explicit declarative/procedural memory, archive interaction tuples for retrieval-augmented planning and continual self-improvement (Liu et al., 24 Dec 2025, Jinxin et al., 2023).
Persona and Skill Management: Persona managers encode “Big Five” or teaching-style facets as explicit tree structures, ensuring agent identity stability and differentiated role-play in classroom simulation (Jinxin et al., 2023).
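
To make the learner-modeling component more concrete, the following is a minimal sketch of the standard Bayesian Knowledge Tracing update for a single skill. The parameter values, the `BKTParams` class, and the function names are illustrative assumptions, not details of any cited system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BKTParams:
    # All default values are illustrative assumptions.
    p_init: float = 0.2    # prior probability the skill is already mastered
    p_learn: float = 0.15  # probability of acquiring the skill after one practice step
    p_slip: float = 0.1    # probability of a wrong answer despite mastery
    p_guess: float = 0.2   # probability of a correct answer without mastery


def bkt_update(p_mastery: float, correct: bool, params: BKTParams) -> float:
    """Return the mastery probability after observing one response."""
    if correct:
        num = p_mastery * (1 - params.p_slip)
        den = num + (1 - p_mastery) * params.p_guess
    else:
        num = p_mastery * params.p_slip
        den = num + (1 - p_mastery) * (1 - params.p_guess)
    posterior = num / den  # Bayes rule on the observed response
    # Account for the chance of learning during this practice step.
    return posterior + (1 - posterior) * params.p_learn


def trace(responses: List[bool], params: Optional[BKTParams] = None) -> List[float]:
    """Run the update over a sequence of responses, starting from the prior."""
    params = params or BKTParams()
    p = params.p_init
    history = []
    for correct in responses:
        p = bkt_update(p, correct, params)
        history.append(p)
    return history


if __name__ == "__main__":
    print(trace([True, True, False, True]))
```

A correct answer raises the mastery estimate and an incorrect one lowers it, with the slip and guess parameters controlling how strongly each observation counts. A dynamic Bayesian network generalizes this by linking several skills.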

Specific system instantiations, such as the multi-agent “AgentTutor” (comprising curriculum decomposition, learner assessment, dynamic strategy, teaching reflection, and knowledge/experience memory modules), organize these subsystems to implement a Markov decision-process–style tutoring policy optimizing expected final learning outcomes (Liu et al., 24 Dec 2025).
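
One way to write the objective implied by the planning description in this section is sketched below. The notation is assumed for illustration and is not taken from the cited papers: $\Delta m_t$ is the mastery gain at step $t$, $c_t$ the cognitive load, $\lambda \ge 0$ a trade-off weight, $\gamma \in (0, 1]$ a discount factor, and $T$ the session horizon.

$$
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T} \gamma^{t} \left( \Delta m_t - \lambda\, c_t \right) \right]
$$

Setting $\lambda = 0$ and $\gamma = 1$ reduces the sum to the total mastery gained over the session, which corresponds to optimizing the expected final learning outcome.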

3. Methods for Personalization, Adaptation, and Multi-Turn Feedback

AGT systems synthesize explicit learner models, adaptive pedagogical policies, and interactive mechanisms for personalized instruction:

Curriculum Decomposition: LLM-powered decomposition modules produce multi-level hierarchical objectives (e.g., using Bloom’s taxonomy) from high-level goals, enumerating sub-skills and cognitive targets (Liu et al., 24 Dec 2025).
Assessment and State Tracking: Fine-grained formative assessment assigns each subgoal a proficiency score $p_j$ and current Bloom level $l_j$; evolving proficiency graphs guide instructional pacing and focus (Liu et al., 24 Dec 2025).
Adaptive Planning: Agent planners (e.g., LATS—Language Agent Tree Search) explore instructional action trees, balancing exploitation of known effective actions and exploration of novel strategies, using UCT or value-based sampling (Liu et al., 24 Dec 2025). In classroom simulation, genetic adaptation evolves pedagogical policy vectors to maximize aggregated learner scores, supporting the emergence of differentiated teaching styles and equitable outcomes (Sanyal et al., 25 May 2025).
Personalized Retrieval-Augmented Generation: Modules like Persona-RAG match retrievals to individual student learning styles and reasoning preferences, outscoring vanilla RAG in conceptual and analysis-based task performance (Sanyal et al., 25 May 2025).
Teaching Reflection and Metacognitive Scaffolding: Direct Preference Optimization techniques, offline RL objectives, or memory-augmented dialogue loops provide strategic hinting, adaptive question difficulty, and reflective formative feedback (Liu et al., 24 Dec 2025, Jinxin et al., 2023).
Visual Grounding: In geometry, AGTs fine-tuned on synthetic, mask-annotated procedural data can resolve language descriptions to pixel-wise masks, supporting visually grounded, stepwise explanations—critical for mathematical domains where spatial reasoning is required (Nguyen-Truong et al., 3 Apr 2026).
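
The exploration and exploitation balance mentioned under Adaptive Planning can be illustrated with the standard UCT selection rule. This is a generic sketch, not the LATS implementation: the `ActionNode` class, the action names, and the exploration constant are assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ActionNode:
    """A candidate instructional action with its search statistics (hypothetical)."""
    name: str
    visits: int = 0
    total_value: float = 0.0


def uct_score(node: ActionNode, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Mean value plus an exploration bonus that shrinks as the node is visited."""
    if node.visits == 0:
        return math.inf  # always try unvisited actions first
    mean = node.total_value / node.visits
    return mean + c * math.sqrt(math.log(parent_visits) / node.visits)


def select_action(children: List[ActionNode], c: float = math.sqrt(2)) -> ActionNode:
    """Pick the child with the highest UCT score."""
    parent_visits = sum(child.visits for child in children)
    return max(children, key=lambda child: uct_score(child, parent_visits, c))


if __name__ == "__main__":
    children = [
        ActionNode("worked_example", visits=10, total_value=6.0),
        ActionNode("hint", visits=2, total_value=1.0),
    ]
    print(select_action(children).name)         # exploration bonus favors the rarely tried action
    print(select_action(children, c=0.0).name)  # pure exploitation favors the higher mean
```

With `c = 0` the rule always repeats the action with the best average outcome so far. Larger values of `c` push the planner toward less-tried teaching strategies.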

4. Pedagogical Abilities, Evaluation Protocols, and Performance Metrics

AGT evaluation draws on several complementary frameworks:

Three Principal Pedagogical Abilities (Tack et al., 2022):
- “Speak like a teacher”: clarity, register, and educational tone.
- “Understand the student”: conversational uptake, expansion on prior utterances.
- “Help the student”: direct effect on learning gains, including scaffolding, diagnosis, and hinting.
Bayesian Bradley–Terry Modeling: Human raters compare responses from AGTs and human teachers; ability scores ($\alpha_{ikl}$) are estimated for each pedagogical dimension, and ability gaps ($\Delta\alpha$) quantify the shortfall or superiority of models (Tack et al., 2022).
Task-Specific Metrics and Benchmarks:
- Visual Grounding: Intersection-over-Union (IoU), Buffered IoU for thin structure alignment (e.g., Florence-2: 49% IoU, 85% BIoU on geometry RIS) (Nguyen-Truong et al., 3 Apr 2026).
- Coding/Problem Solving: Pass@1 for first-attempt code correctness (e.g., AgentTutor: 92.7% HumanEval, 89.4% MBPP) (Liu et al., 24 Dec 2025).
- Dialogue and Engagement: Time-on-task, help-request frequency, expert ratings on relevance, feedback quality, adaptability (Liu et al., 24 Dec 2025, Tack et al., 2022).
- Equity and Fairness: Disparate impact ratio, variance of scores across simulated or real learners (Sanyal et al., 25 May 2025).
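
The pairwise-comparison modeling listed above can be sketched as follows. This is a simplified illustration: it computes a MAP point estimate under a zero-mean Gaussian prior by gradient ascent, rather than the full Bayesian posterior, and the respondent labels, prior width, and step size are assumptions.

```python
import math
from typing import Dict, List, Tuple


def win_probability(alpha_a: float, alpha_b: float) -> float:
    """Bradley-Terry probability that respondent a is preferred over b."""
    return 1.0 / (1.0 + math.exp(-(alpha_a - alpha_b)))


def fit_abilities(
    comparisons: List[Tuple[str, str]],
    prior_sd: float = 1.0,
    lr: float = 0.1,
    steps: int = 500,
) -> Dict[str, float]:
    """MAP ability estimates from (winner, loser) pairs.

    Assumes an independent zero-mean Gaussian prior on each ability.
    """
    names = sorted({name for pair in comparisons for name in pair})
    alpha = {name: 0.0 for name in names}
    for _ in range(steps):
        # Gradient of the log prior.
        grad = {name: -alpha[name] / prior_sd ** 2 for name in names}
        # Gradient of the log-likelihood of each observed preference.
        for winner, loser in comparisons:
            p = win_probability(alpha[winner], alpha[loser])
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        for name in names:
            alpha[name] += lr * grad[name]
    return alpha


if __name__ == "__main__":
    # Hypothetical ratings: the human teacher is preferred in 7 of 10 comparisons.
    data = [("human_teacher", "agt")] * 7 + [("agt", "human_teacher")] * 3
    alpha = fit_abilities(data)
    gap = alpha["agt"] - alpha["human_teacher"]
    print(alpha, gap)
```

A negative gap for the model means raters preferred the human teacher more often on that dimension. Fitting one such model per pedagogical ability gives one gap per ability.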

These metrics inform architectural ablations and directly guide the refinement of both policy modules and model training protocols.
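
To illustrate why a buffered variant of IoU matters for thin structures such as line segments, the sketch below compares plain IoU with a tolerance-based version. The buffered definition used here (dilating both masks by a fixed pixel radius before computing IoU) is an assumption for illustration and may differ from the definition in the cited work.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]
Pixels = Set[Tuple[int, int]]


def mask_to_pixels(mask: Mask) -> Pixels:
    """Collect the (row, col) coordinates of all foreground pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def _ratio(p: Pixels, t: Pixels) -> float:
    union = p | t
    return len(p & t) / len(union) if union else 1.0


def iou(pred: Mask, target: Mask) -> float:
    """Plain Intersection-over-Union of two binary masks."""
    return _ratio(mask_to_pixels(pred), mask_to_pixels(target))


def dilate(pixels: Pixels, radius: int, shape: Tuple[int, int]) -> Pixels:
    """Grow a pixel set by `radius` in every direction, clipped to the image."""
    h, w = shape
    out = set()
    for r, c in pixels:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    out.add((rr, cc))
    return out


def buffered_iou(pred: Mask, target: Mask, radius: int = 1) -> float:
    """IoU after dilating both masks (assumed definition, see lead-in)."""
    shape = (len(target), len(target[0]))
    p = dilate(mask_to_pixels(pred), radius, shape)
    t = dilate(mask_to_pixels(target), radius, shape)
    return _ratio(p, t)


if __name__ == "__main__":
    # A one-pixel-thick horizontal line, predicted one row too low.
    target = [[1 if r == 2 else 0 for _ in range(5)] for r in range(5)]
    pred = [[1 if r == 3 else 0 for _ in range(5)] for r in range(5)]
    print(iou(pred, target), buffered_iou(pred, target))
```

In this toy case a one-pixel offset drives plain IoU to zero although the prediction is visually close. The buffered score tolerates the small misalignment.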

5. Cognitive Architectures and Memory Systems

Recent AGT frameworks implement cognitive architectures inspired by established models (e.g., ACT*), comprising:

Memory Systems:
- Working Memory ($M_w$): buffer of immediate events and observations.
- Declarative Memory ($M_d$): accumulation of facts, formed via chain-of-thought summarization.
- Procedural Memory ($M_p$): “how-to” fragments, formed through chain-of-action summarization.
Reflection and Planning:
- Reflection modules ($F_{\text{ref}}$) prompt the LLM over $M_d$ and a skill library $L$.
- Planning modules ($F_{\text{plan}}$) operate over $M_p$ and $L$, composing future actions.
- Actions are determined by combining current reflection, planning, and working memory through a prompting procedure ($F_{\text{act}}$) (Jinxin et al., 2023).
Persona and Consistency Control: Persona trees instantiate multidimensional teacher identities; consistency checkers identify and rectify persona drift by querying and restoring personality or style vectors (Jinxin et al., 2023).
Skill Library Operations: Skill selection is modeled as a softmax over semantic similarity in embedding space, supporting explainable, grounded pedagogical moves in both reflection and execution (Jinxin et al., 2023).
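
The softmax-over-similarity skill selection described above can be sketched as follows. The skill names, the toy three-dimensional embeddings, and the temperature value are assumptions for the example; a real system would use embeddings from a trained encoder.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def skill_distribution(
    query: List[float],
    skills: Dict[str, List[float]],
    temperature: float = 0.1,
) -> Dict[str, float]:
    """Softmax over cosine similarities between the query and each skill."""
    sims = {name: cosine(query, emb) for name, emb in skills.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {name: math.exp((s - top) / temperature) for name, s in sims.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


if __name__ == "__main__":
    # Hypothetical skill library with toy embeddings.
    skills = {
        "socratic_question": [1.0, 0.0, 0.0],
        "worked_example": [0.0, 1.0, 0.0],
        "analogy": [0.7, 0.7, 0.0],
    }
    probs = skill_distribution([0.9, 0.1, 0.0], skills)
    print(max(probs, key=probs.get), probs)
```

Because the selection is a distribution over named skills, the similarity scores behind each pedagogical move can be inspected, which is one way to support the explainability described above. Lower temperatures make the choice closer to a hard argmax.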

6. Limitations and Open Research Challenges

Current AGT implementations exhibit notable limitations:

Generality and Domain Shift: Fine-tuned models, even with innovative data engines and buffered evaluation, remain limited to synthetic or schematic input forms (e.g., 2D geometry diagrams) and may fail on real-world or handwritten data (Nguyen-Truong et al., 3 Apr 2026).
Policy and State Expressivity: Pedagogical policy spaces are discretized and coarse; true AGT-level expressivity necessitates hierarchical, continuous policy frameworks capable of interactive, multimodal, real-time adaptation (Sanyal et al., 25 May 2025).
Lifelong Learning and Memory Updating: Current memory modules lack explicit decay or reinforcement mechanisms; lifelong skill acquisition, consolidation, and safe continual learning protocols remain open tasks (Jinxin et al., 2023).
Modeling Evolving Student State: Most frameworks assume fixed or session-level student traits; robust online inference over dynamic, multi-dimensional learner states, including affect, misconceptions, and goal drift, requires further research (Sanyal et al., 25 May 2025, Liu et al., 24 Dec 2025).
Ethical, Privacy, and Fairness Imperatives: Responsible AGT deployment demands rigorous bias auditing, privacy-respecting inference (e.g., on-device affect sensing), open learner models, and independent ethical auditing (Latif et al., 2023).

Addressing these open challenges entails interdisciplinary advances in language, vision, reinforcement learning, curriculum design, cognitive diagnostics, ethics, and policy.

7. Prospects for AGT Realization

The trajectory toward full Artificial General Teacher capability involves the progressive integration of:

Hierarchical, cross-domain curriculum and assessment cycles informed by real-world educational theory (Liu et al., 24 Dec 2025).
Rich, multi-agent orchestration balancing individual and group learning dynamics (Jinxin et al., 2023).
Dynamic, high-dimensional personalization encompassing cognitive, affective, and social factors (Sanyal et al., 25 May 2025, Latif et al., 2023).
Visual, symbolic, and language-based instruction and explanation tools unified in a single architecture (Nguyen-Truong et al., 3 Apr 2026).
Robust, transparent, and secure data, memory, and skill traceability for lifelong operation and auditability (Liu et al., 24 Dec 2025).
Meta-learning and self-improving pedagogical planners capable of rapid domain transfer and continual adaptation (Sanyal et al., 25 May 2025).

The current AGT prototypes (AgentTutor, CGMI, Florence-2-based geometry explainers, and Persona-RAG teachers) each instantiate key AGT functionalities but fall short of comprehensive domain generality, continual learning, and robust real-world deployment. Further progress will require new algorithms for student modeling, reward formulation, lifelong skill management, and the formalization of safety and fairness principles at every layer of the AGT stack (Liu et al., 24 Dec 2025, Jinxin et al., 2023, Nguyen-Truong et al., 3 Apr 2026, Latif et al., 2023, Sanyal et al., 25 May 2025, Tack et al., 2022).