CogEvo-Edu: Adaptive Hierarchical Tutoring
- CogEvo-Edu is a hierarchical, multi-agent tutoring system that integrates a Cognitive Perception Layer, Knowledge Evolution Layer, and Meta-Control Layer.
- It employs dual memory for personalized learning and dynamic chunk valuation to maintain high factual precision across complex STEM domains.
- Empirical evaluations in DSP education show substantial improvements in knowledge delivery, memory consistency, and adaptive teaching strategies.
CogEvo-Edu is a hierarchical, multi-agent educational system that couples retrieval, memory, and adaptive control to advance conversational LLM tutoring in complex STEM domains. Synthesizing a cognitive evolution perspective, CogEvo-Edu departs from standard static retrieval-augmented generation (RAG) pipelines by integrating three tightly coupled architectural layers—the Cognitive Perception Layer (CPL), Knowledge Evolution Layer (KEL), and Meta-Control Layer (MCL)—to deliver adaptive, long-horizon, and personalized tutoring experiences. The system’s empirical validation centers on digital signal processing (DSP) education, where it demonstrates substantial improvements in both knowledge delivery and student model adaptivity compared to prior approaches (Wu et al., 29 Nov 2025).
1. Hierarchical Architecture Overview
CogEvo-Edu’s architecture is defined by three distinct but interrelated layers:
- Cognitive Perception Layer (CPL): Maintains a dual-memory student model by splitting state into Short-Term Sensory Memory $M_{\text{ST}}$ and Long-Term Cognitive Memory $M_{\text{LT}}$. The short-term memory captures the most recent $k$ question-answer (QA) turns, $M_{\text{ST}} = \{(q_{t-k+1}, a_{t-k+1}), \dots, (q_t, a_t)\}$, while the long-term profile stores structured, confidence-weighted features $M_{\text{LT}} = \{(k_i, v_i, c_i)\}$, where $k_i$ is the feature key (e.g., “weak on Z-transforms”), $v_i$ the value, and $c_i$ the confidence.
- Knowledge Evolution Layer (KEL): Manages a dynamic knowledge base $\mathcal{K} = \{d_j\}$, with each chunk $d_j$ (text, code, derivation) annotated with a spatiotemporal value $V(d_j)$ that drives chunk activation, compression, and lifecycle management.
- Meta-Control Layer (MCL): Orchestrates a set of specialist teaching agents (e.g., Explanation, Diagnosis, Question-Gen) using a parameterized policy $\pi_\theta$. The MCL executes a dual-loop optimization: an inner loop for micro-step teaching (reinforcement learning over $\theta$), and an outer loop that meta-optimizes the policy and the CPL/KEL hyperparameters $\lambda$.
At every interaction, CPL updates $M_{\text{ST}}$ and $M_{\text{LT}}$, KEL selects the most relevant chunks from $\mathcal{K}$, and MCL selects an agent and teaching action based on the resulting system state.
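The sketch below illustrates one way this layered state could be organized in code. The class and field names (QATurn, Feature, Chunk, SystemState) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class QATurn:
    question: str
    answer: str

@dataclass
class Feature:
    key: str           # e.g., "weak on Z-transforms"
    value: str         # textual description of the observation
    confidence: float  # confidence c_i in [0, 1]

@dataclass
class Chunk:
    chunk_id: str
    content: str         # text, code, or derivation
    freq: int = 0        # retrieval frequency
    last_access: float = 0.0
    value: float = 0.0   # spatiotemporal value V(d_j)

@dataclass
class SystemState:
    short_term: list[QATurn] = field(default_factory=list)     # CPL: M_ST
    long_term: list[Feature] = field(default_factory=list)     # CPL: M_LT
    knowledge: dict[str, Chunk] = field(default_factory=dict)  # KEL: K
    current_query: str = ""                                     # present concept or query
```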
2. Cognitive Perception Layer: Dual Memory and Consolidation
Dual-Memory Student Modeling
CPL’s short-term memory $M_{\text{ST}}$ and long-term profile $M_{\text{LT}}$ support temporal abstraction and profile stability. A newly extracted candidate feature $f_{\text{new}} = (k_{\text{new}}, v_{\text{new}}, c_{\text{new}})$ is merged into $M_{\text{LT}}$ using a consolidation operator $\oplus$: $M_{\text{LT}} \leftarrow M_{\text{LT}} \oplus f_{\text{new}}$.
Confidence-Weighted Consolidation
Feature matching is performed using cosine similarity between feature-key embeddings: if the similarity to an existing key $k_i$ exceeds a threshold $\tau$, the new feature reinforces the matched entry; otherwise, it corrects the old feature. The matched confidence $c_i$ is then updated toward the new evidence at a rate set by the learning rate $\eta$. This dynamic enables rapid, low-overhead self-correction and high-fidelity personalization under context constraints.
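A minimal sketch of this consolidation step is given below, reusing the `Feature` dataclass from the earlier sketch. The embedding function `embed`, the threshold `tau`, and the exponential-moving-average form of the confidence update are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def consolidate(long_term: list[Feature], new: Feature, embed,
                tau: float = 0.85, eta: float = 0.3) -> None:
    """Merge a candidate feature into the long-term profile M_LT in place."""
    new_emb = embed(new.key)
    best, best_sim = None, -1.0
    for feat in long_term:
        sim = cosine(embed(feat.key), new_emb)
        if sim > best_sim:
            best, best_sim = feat, sim
    if best is not None and best_sim > tau:
        # Reinforce/correct the matched feature; EMA-style confidence update (assumed form)
        best.confidence = (1 - eta) * best.confidence + eta * new.confidence
        best.value = new.value  # newer observation overrides the stored value
    else:
        # No sufficiently similar entry: store as a new long-term feature
        long_term.append(new)
```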
3. Knowledge Evolution Layer: Dynamic Chunk Valuation and Management
Spatiotemporal Value Function
Each chunk’s retrieval utility is quantified by a spatiotemporal value $V(d_j)$ that combines the chunk’s retrieval frequency $f_j$, the time $\Delta t_j$ since its last access (discounted by a decay constant $\delta$), and its semantic density $\rho_j$, computed from the chunk’s embedding $e_j$.
Lifecycle Management
The knowledge base is partitioned by value thresholds into active, compressible, and prunable subsets $\mathcal{K}_{\text{active}}$, $\mathcal{K}_{\text{compress}}$, and $\mathcal{K}_{\text{prune}}$. Chunks in $\mathcal{K}_{\text{active}}$ remain fully indexed, those in $\mathcal{K}_{\text{prune}}$ are deleted, and $\mathcal{K}_{\text{compress}}$ items undergo LLM-based semantic compression into shorter summaries. This maintains a balance between storage cost and retrieval quality.
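The following sketch shows one possible valuation and lifecycle pass over the `Chunk` objects from the earlier sketch. The multiplicative value (frequency times exponential recency decay times semantic density), the two thresholds, and the `density_fn`/`compress_fn` callables are assumptions for illustration.

```python
import math

def chunk_value(freq: int, dt: float, density: float, decay: float = 0.05) -> float:
    """Assumed spatiotemporal value: frequency x recency decay x semantic density."""
    return freq * math.exp(-decay * dt) * density

def lifecycle_pass(chunks: dict[str, Chunk], now: float, density_fn, compress_fn,
                   keep_thresh: float = 1.0, drop_thresh: float = 0.1) -> None:
    """Partition chunks into active / compress / prune sets and act on each in place."""
    for cid in list(chunks):
        c = chunks[cid]
        c.value = chunk_value(c.freq, now - c.last_access, density_fn(c.content))
        if c.value >= keep_thresh:
            continue                              # K_active: stays fully indexed
        elif c.value < drop_thresh:
            del chunks[cid]                       # K_prune: deleted
        else:
            c.content = compress_fn(c.content)    # K_compress: LLM-based semantic compression
```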
4. Meta-Control Layer: Hierarchical Decision and Adaptation
Markov Decision Formulation
MCL models tutoring as a hierarchical Markov Decision Process (MDP) with system state $s_t$ (comprising the student memories, the active knowledge chunks, and the present concept or query $q_t$), action $a_t$, and reward $r_t$.
Actions encode the choice of lead agent, teaching strategy, content difficulty, and retrieval/compression policy.
Dual-Loop Optimization
- Inner Loop: With hyperparameters $\lambda$ fixed, maximizes the discounted expected reward $J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_t \gamma^t r_t\right]$; policy-gradient methods (e.g., REINFORCE) are applied to update $\theta$.
- Outer Loop: Jointly adapts $\theta$ and the CPL/KEL hyperparameters $\lambda$ (e.g., the consolidation learning rate $\eta$, the decay constant $\delta$, and the value thresholds) to maximize the long-term reward $J(\theta, \lambda)$. Parameter updates take the form $\theta \leftarrow \theta + \alpha_{\text{inner}} \nabla_\theta J$ and $\lambda \leftarrow \lambda + \alpha_{\text{outer}} \nabla_\lambda J$, where gradients with respect to $\lambda$ can be estimated by finite differences or evolutionary strategies.
Pseudocode Outline
```
Initialize θ, λ
repeat (outer-loop epochs):
    Collect a batch of interaction trajectories {τ_i} under policy π_θ and hyperparams λ
    For each trajectory τ_i:
        Compute cumulative reward R(τ_i) via LLM-as-a-Judge on DSP-EduBench
    Estimate ∇_θ J and ∇_λ J from {τ_i, R(τ_i)}
    θ ← θ + α_inner ∇_θ J
    λ ← λ + α_outer ∇_λ J
until convergence
```
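The outline above can be made concrete with a short Python sketch. The trajectory collector `collect`, the `judge` scoring function, the REINFORCE-style estimator (including the assumed `grad_logprob` field on each trajectory), and the finite-difference perturbation size are all stand-in assumptions rather than the paper's implementation.

```python
import numpy as np

def dual_loop(collect, judge, epochs: int = 50, batch: int = 8,
              alpha_inner: float = 1e-2, alpha_outer: float = 1e-3, eps: float = 1e-2):
    """Dual-loop sketch: REINFORCE-style update for θ, finite-difference estimate of ∇_λ J."""
    theta = np.zeros(16)           # policy parameters (placeholder dimension)
    lam = np.array([0.3, 0.05])    # hyperparameters, e.g., (eta, decay)

    for _ in range(epochs):
        # Inner loop: collect trajectories under (θ, λ) and score them with the judge
        trajs = [collect(theta, lam) for _ in range(batch)]
        rewards = np.array([judge(t) for t in trajs])       # R(τ_i) via LLM-as-a-Judge
        baseline = rewards.mean()

        # REINFORCE-style gradient for θ: advantage-weighted log-prob gradients
        grad_theta = np.zeros_like(theta)
        for t, r in zip(trajs, rewards):
            grad_theta += (r - baseline) * t["grad_logprob"]  # assumed per-trajectory field
        theta += alpha_inner * grad_theta / batch

        # Outer loop: central finite differences over each hyperparameter in λ
        grad_lam = np.zeros_like(lam)
        for k in range(len(lam)):
            d = np.zeros_like(lam); d[k] = eps
            j_plus = np.mean([judge(collect(theta, lam + d)) for _ in range(batch)])
            j_minus = np.mean([judge(collect(theta, lam - d)) for _ in range(batch)])
            grad_lam[k] = (j_plus - j_minus) / (2 * eps)
        lam += alpha_outer * grad_lam

    return theta, lam
```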
5. Joint Hyperparameter Adaptation and System Dynamics
The meta-optimization outer loop dynamically adapts both CPL and KEL rates:
- If the profile update speed (governed by $\eta$) is insufficient for effective personalization, the meta-learner increases $\eta$.
- If KEL pruning (controlled by the decay constant $\delta$ and the value thresholds) is too aggressive and compromises factual correctness, those parameters can be tuned for improved retention.
- All adaptations target improvements in the aggregate objective $J(\theta, \lambda)$ as measured by downstream task performance.
This suggests that automated joint adaptation across memory, value, and policy layers supports more robust and flexible system dynamics than static or single-agent designs.
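One way these adaptation rules could be expressed is sketched below, assuming the per-axis judge scores are available as a dict; the metric names, score threshold, and adjustment factors are illustrative, not taken from the paper.

```python
def adapt_hyperparams(lam: dict, scores: dict, lo: float = 7.0) -> dict:
    """Heuristic sketch of the outer loop's joint adaptation of CPL/KEL rates."""
    lam = dict(lam)
    if scores["personalization"] < lo:   # profile updates too slow: consolidate faster
        lam["eta"] = min(1.0, lam["eta"] * 1.2)
    if scores["factual"] < lo:           # pruning too aggressive: retain more knowledge
        lam["decay"] *= 0.8
        lam["drop_thresh"] *= 0.8
    return lam
```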
6. DSP-EduBench: Domain Evaluation and Results
Benchmark Design
DSP-EduBench evaluates system behaviors in DSP education. The core features comprise:
- Content domains: theory (derivations), intuition, and code (MATLAB/Python)
- Resource base: heterogeneous texts, proofs, code
- Simulated student personas:
  - UserA (novice): definition amnesia
  - UserB (medium): logical mistake susceptibility
  - UserC (advanced): implementation-driven focus
- Scripts: long-horizon concept discrimination, diagnosis, debugging
- Annotations: correct solutions, required knowledge, optimal strategies
Evaluation Procedure
A three-model LLM-as-a-Judge ensemble (GLM-4.5, DeepSeek-V3.1, Qwen3-max) scores each dialogue turn from 1–10 along six axes:
- Knowledge Precision: Factual Correctness, Contextual Relevance
- Cognitive Coherence: Memory Consistency, Personalization Alignment
- Pedagogical Strategy: Knowledge Guidance, Strategy Flexibility
The mean across the six metrics yields the overall score.
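A minimal sketch of the score aggregation, assuming each judge model returns a dict with the six 1-10 axis scores; how the judges are prompted and invoked is omitted.

```python
AXES = ["factual", "contextual", "memory", "personalization", "guidance", "strategy"]

def turn_score(judge_outputs: list[dict]) -> dict:
    """Average each axis across the three judge models, then compute the overall mean."""
    per_axis = {a: sum(j[a] for j in judge_outputs) / len(judge_outputs) for a in AXES}
    per_axis["average"] = sum(per_axis[a] for a in AXES) / len(AXES)
    return per_axis
```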
Comparative Results
| Experiment | Factual | Contextual | Memory | Personalization | Guidance | Strategy | Average |
|---|---|---|---|---|---|---|---|
| LLM Only | 5.8 | 6.5 | 4.2 | 4.8 | 5.5 | 5.1 | 5.32 |
| Static RAG | 8.4 | 6.9 | 4.5 | 5.0 | 5.8 | 5.2 | 5.97 |
| Simple Memory | 6.1 | 6.7 | 7.6 | 6.8 | 6.0 | 5.5 | 6.45 |
| Single Agent | 7.9 | 7.5 | 6.2 | 5.8 | 6.2 | 4.9 | 6.42 |
| CogEvo-Edu (Ours) | 9.3 | 9.1 | 9.5 | 9.2 | 8.9 | 9.4 | 9.23 |
KEL’s value-based pruning and compression boost factual correctness and contextual relevance, while CPL’s structured consolidation substantially raises memory consistency and personalization versus simple memory. MCL’s meta-optimized orchestration outperforms single-agent policies in knowledge guidance and strategic flexibility, suggesting the importance of coupled, adaptive architecture for educational LLMs.
7. Significance and Implications
CogEvo-Edu demonstrates that treating retrieval, memory formation, and pedagogical control as a coupled cognitive evolution process yields consistent, substantial gains in complex tutoring domains where sustained personalization and deep conceptual understanding are required. The integrated dual-memory models, dynamic knowledge value functions, and meta-control dual-loop provide a scalable and robust foundation for future educational LLM systems. The results documented in DSP-EduBench validate the approach and provide a reproducible benchmark for the field (Wu et al., 29 Nov 2025).