CogEvo-Edu: Adaptive Hierarchical Tutoring
- CogEvo-Edu is a hierarchical, multi-agent tutoring system that integrates a Cognitive Perception Layer, Knowledge Evolution Layer, and Meta-Control Layer.
- It employs dual memory for personalized learning and dynamic chunk valuation to maintain high factual precision across complex STEM domains.
- Empirical evaluations in DSP education show substantial improvements in knowledge delivery, memory consistency, and adaptive teaching strategies.
CogEvo-Edu is a hierarchical, multi-agent educational system that couples retrieval, memory, and adaptive control to advance conversational LLM tutoring in complex STEM domains. Synthesizing a cognitive evolution perspective, CogEvo-Edu departs from standard static retrieval-augmented generation (RAG) pipelines by integrating three tightly coupled architectural layers—the Cognitive Perception Layer (CPL), Knowledge Evolution Layer (KEL), and Meta-Control Layer (MCL)—to deliver adaptive, long-horizon, and personalized tutoring experiences. The system’s empirical validation centers on digital signal processing (DSP) education, where it demonstrates substantial improvements in both knowledge delivery and student model adaptivity compared to prior approaches (Wu et al., 29 Nov 2025).
1. Hierarchical Architecture Overview
CogEvo-Edu’s architecture is defined by three distinct but interrelated layers:
- Cognitive Perception Layer (CPL): Maintains a dual-memory student model by splitting state into Short-Term Sensory Memory $M_{\text{ST}}$ and Long-Term Cognitive Memory $M_{\text{LT}}$. The short-term memory captures the most recent $k$ question-answer (QA) turns, $M_{\text{ST}} = \{(q_{t-k+1}, a_{t-k+1}), \dots, (q_t, a_t)\}$, while the long-term profile stores structured, confidence-weighted features $M_{\text{LT}} = \{(k_i, v_i, c_i)\}$, where $k_i$ is the feature key (e.g., “weak on Z-transforms”), $v_i$ the value, and $c_i$ the confidence.
- Knowledge Evolution Layer (KEL): Manages a dynamic knowledge base $\mathcal{K} = \{d_j\}$, with each chunk $d_j$ (text, code, derivation) annotated with a spatiotemporal value $V(d_j)$ that drives chunk activation, compression, and lifecycle management.
- Meta-Control Layer (MCL): Orchestrates a set of specialist teaching agents (e.g., Explanation, Diagnosis, Question-Gen) using a parameterized policy $\pi_\theta$. The MCL executes a dual-loop optimization: an inner loop for micro-step teaching (reinforcement learning over $\theta$), and an outer loop that meta-optimizes the policy and the CPL/KEL hyperparameters $\lambda$.
At every interaction, CPL updates $M_{\text{ST}}$ and $M_{\text{LT}}$, KEL selects the most relevant chunks from $\mathcal{K}$, and MCL selects an agent and teaching action based on the resulting system state.
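The sketch below illustrates one way this layered state could be organized in code. The class and field names (QATurn, Feature, Chunk, SystemState) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class QATurn:
    question: str
    answer: str

@dataclass
class Feature:
    key: str           # e.g., "weak on Z-transforms"
    value: str         # textual description of the observation
    confidence: float  # confidence c_i in [0, 1]

@dataclass
class Chunk:
    chunk_id: str
    content: str         # text, code, or derivation
    freq: int = 0        # retrieval frequency
    last_access: float = 0.0
    value: float = 0.0   # spatiotemporal value V(d_j)

@dataclass
class SystemState:
    short_term: list[QATurn] = field(default_factory=list)     # CPL: M_ST
    long_term: list[Feature] = field(default_factory=list)     # CPL: M_LT
    knowledge: dict[str, Chunk] = field(default_factory=dict)  # KEL: K
    current_query: str = ""                                     # present concept or query
```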
2. Cognitive Perception Layer: Dual Memory and Consolidation
Dual-Memory Student Modeling
CPL’s short-term memory $M_{\text{ST}}$ and long-term profile $M_{\text{LT}}$ support temporal abstraction and profile stability. A newly extracted candidate feature $f_{\text{new}} = (k_{\text{new}}, v_{\text{new}}, c_{\text{new}})$ is merged into $M_{\text{LT}}$ using a consolidation operator $\oplus$: $M_{\text{LT}} \leftarrow M_{\text{LT}} \oplus f_{\text{new}}$.
Confidence-Weighted Consolidation
Feature matching is performed using cosine similarity between feature-key embeddings: if the similarity to an existing key $k_i$ exceeds a threshold $\tau$, the new feature reinforces the matched entry; otherwise, it corrects the old feature. The matched confidence $c_i$ is then updated toward the new evidence at a rate set by the learning rate $\eta$. This dynamic enables rapid, low-overhead self-correction and high-fidelity personalization under context constraints.
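A minimal sketch of this consolidation step is given below, reusing the `Feature` dataclass from the earlier sketch. The embedding function `embed`, the threshold `tau`, and the exponential-moving-average form of the confidence update are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def consolidate(long_term: list[Feature], new: Feature, embed,
                tau: float = 0.85, eta: float = 0.3) -> None:
    """Merge a candidate feature into the long-term profile M_LT in place."""
    new_emb = embed(new.key)
    best, best_sim = None, -1.0
    for feat in long_term:
        sim = cosine(embed(feat.key), new_emb)
        if sim > best_sim:
            best, best_sim = feat, sim
    if best is not None and best_sim > tau:
        # Reinforce/correct the matched feature; EMA-style confidence update (assumed form)
        best.confidence = (1 - eta) * best.confidence + eta * new.confidence
        best.value = new.value  # newer observation overrides the stored value
    else:
        # No sufficiently similar entry: store as a new long-term feature
        long_term.append(new)
```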
3. Knowledge Evolution Layer: Dynamic Chunk Valuation and Management
Spatiotemporal Value Function
Each chunk’s retrieval utility is quantified by a spatiotemporal value $V(d_j)$ that combines the chunk’s retrieval frequency $f_j$, the time $\Delta t_j$ since its last access (discounted by a decay constant $\delta$), and its semantic density $\rho_j$, computed from the chunk’s embedding $e_j$.
Lifecycle Management
The knowledge base is partitioned by value thresholds into active, compressible, and prunable subsets $\mathcal{K}_{\text{active}}$, $\mathcal{K}_{\text{compress}}$, and $\mathcal{K}_{\text{prune}}$. Chunks in $\mathcal{K}_{\text{active}}$ remain fully indexed, those in $\mathcal{K}_{\text{prune}}$ are deleted, and $\mathcal{K}_{\text{compress}}$ items undergo LLM-based semantic compression into shorter summaries. This maintains a balance between storage cost and retrieval quality.
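The following sketch shows one possible valuation and lifecycle pass over the `Chunk` objects from the earlier sketch. The multiplicative value (frequency times exponential recency decay times semantic density), the two thresholds, and the `density_fn`/`compress_fn` callables are assumptions for illustration.

```python
import math

def chunk_value(freq: int, dt: float, density: float, decay: float = 0.05) -> float:
    """Assumed spatiotemporal value: frequency x recency decay x semantic density."""
    return freq * math.exp(-decay * dt) * density

def lifecycle_pass(chunks: dict[str, Chunk], now: float, density_fn, compress_fn,
                   keep_thresh: float = 1.0, drop_thresh: float = 0.1) -> None:
    """Partition chunks into active / compress / prune sets and act on each in place."""
    for cid in list(chunks):
        c = chunks[cid]
        c.value = chunk_value(c.freq, now - c.last_access, density_fn(c.content))
        if c.value >= keep_thresh:
            continue                              # K_active: stays fully indexed
        elif c.value < drop_thresh:
            del chunks[cid]                       # K_prune: deleted
        else:
            c.content = compress_fn(c.content)    # K_compress: LLM-based semantic compression
```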
4. Meta-Control Layer: Hierarchical Decision and Adaptation
Markov Decision Formulation
MCL models tutoring as a hierarchical Markov Decision Process (MDP) with system state $s_t$ (comprising the student memories, the active knowledge chunks, and the present concept or query $q_t$), action $a_t$, and reward $r_t$.
Actions encode the choice of lead agent, teaching strategy, content difficulty, and retrieval/compression policy.
Dual-Loop Optimization
- Inner Loop: With hyperparameters $\lambda$ fixed, maximizes the discounted expected reward $J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_t \gamma^t r_t\right]$; policy-gradient methods (e.g., REINFORCE) are applied to update $\theta$.
- Outer Loop: Jointly adapts $\theta$ and the CPL/KEL hyperparameters $\lambda$ (e.g., the consolidation learning rate $\eta$, the decay constant $\delta$, and the value thresholds) to maximize the long-term reward $J(\theta, \lambda)$. Parameter updates take the form $\theta \leftarrow \theta + \alpha_{\text{inner}} \nabla_\theta J$ and $\lambda \leftarrow \lambda + \alpha_{\text{outer}} \nabla_\lambda J$, where gradients with respect to $\lambda$ can be estimated by finite differences or evolutionary strategies.
Pseudocode Outline
```
Initialize θ, λ
repeat (outer-loop epochs):
    Collect a batch of interaction trajectories {τ_i} under policy π_θ and hyperparams λ
    For each trajectory τ_i:
        Compute cumulative reward R(τ_i) via LLM-as-a-Judge on DSP-EduBench
    Estimate ∇_θ J and ∇_λ J from {τ_i, R(τ_i)}
    θ ← θ + α_inner ∇_θ J
    λ ← λ + α_outer ∇_λ J
until convergence
```
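The outline above can be made concrete with a short Python sketch. The trajectory collector `collect`, the `judge` scoring function, the REINFORCE-style estimator (including the assumed `grad_logprob` field on each trajectory), and the finite-difference perturbation size are all stand-in assumptions rather than the paper's implementation.

```python
import numpy as np

def dual_loop(collect, judge, epochs: int = 50, batch: int = 8,
              alpha_inner: float = 1e-2, alpha_outer: float = 1e-3, eps: float = 1e-2):
    """Dual-loop sketch: REINFORCE-style update for θ, finite-difference estimate of ∇_λ J."""
    theta = np.zeros(16)           # policy parameters (placeholder dimension)
    lam = np.array([0.3, 0.05])    # hyperparameters, e.g., (eta, decay)

    for _ in range(epochs):
        # Inner loop: collect trajectories under (θ, λ) and score them with the judge
        trajs = [collect(theta, lam) for _ in range(batch)]
        rewards = np.array([judge(t) for t in trajs])       # R(τ_i) via LLM-as-a-Judge
        baseline = rewards.mean()

        # REINFORCE-style gradient for θ: advantage-weighted log-prob gradients
        grad_theta = np.zeros_like(theta)
        for t, r in zip(trajs, rewards):
            grad_theta += (r - baseline) * t["grad_logprob"]  # assumed per-trajectory field
        theta += alpha_inner * grad_theta / batch

        # Outer loop: central finite differences over each hyperparameter in λ
        grad_lam = np.zeros_like(lam)
        for k in range(len(lam)):
            d = np.zeros_like(lam); d[k] = eps
            j_plus = np.mean([judge(collect(theta, lam + d)) for _ in range(batch)])
            j_minus = np.mean([judge(collect(theta, lam - d)) for _ in range(batch)])
            grad_lam[k] = (j_plus - j_minus) / (2 * eps)
        lam += alpha_outer * grad_lam

    return theta, lam
```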
5. Joint Hyperparameter Adaptation and System Dynamics
The meta-optimization outer loop dynamically adapts both CPL and KEL rates:
- If the profile update speed (governed by $\eta$) is insufficient for effective personalization, the meta-learner increases $\eta$.
- If KEL pruning (controlled by the decay constant $\delta$ and the value thresholds) is too aggressive and compromises factual correctness, those parameters can be tuned for improved retention.
- All adaptations target improvements in the aggregate objective $J(\theta, \lambda)$ as measured by downstream task performance.
This suggests that automated joint adaptation across memory, value, and policy layers supports more robust and flexible system dynamics than static or single-agent designs.
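One way these adaptation rules could be expressed is sketched below, assuming the per-axis judge scores are available as a dict; the metric names, score threshold, and adjustment factors are illustrative, not taken from the paper.

```python
def adapt_hyperparams(lam: dict, scores: dict, lo: float = 7.0) -> dict:
    """Heuristic sketch of the outer loop's joint adaptation of CPL/KEL rates."""
    lam = dict(lam)
    if scores["personalization"] < lo:   # profile updates too slow: consolidate faster
        lam["eta"] = min(1.0, lam["eta"] * 1.2)
    if scores["factual"] < lo:           # pruning too aggressive: retain more knowledge
        lam["decay"] *= 0.8
        lam["drop_thresh"] *= 0.8
    return lam
```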
6. DSP-EduBench: Domain Evaluation and Results
Benchmark Design
DSP-EduBench evaluates system behaviors in DSP education. The core features comprise:
- Content domains: theory (derivations), intuition, and code (MATLAB/Python)
- Resource base: heterogeneous texts, proofs, code
- Simulated student personas:
  - UserA (novice): definition amnesia
  - UserB (medium): logical mistake susceptibility
  - UserC (advanced): implementation-driven focus
- Scripts: long-horizon concept discrimination, diagnosis, debugging
- Annotations: correct solutions, required knowledge, optimal strategies
Evaluation Procedure
A three-model LLM-as-a-Judge ensemble (GLM-4.5, DeepSeek-V3.1, Qwen3-max) scores each dialogue turn from 1–10 along six axes:
- Knowledge Precision: Factual Correctness, Contextual Relevance
- Cognitive Coherence: Memory Consistency, Personalization Alignment
- Pedagogical Strategy: Knowledge Guidance, Strategy Flexibility
The mean across the six metrics yields the overall score.
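A minimal sketch of the score aggregation, assuming each judge model returns a dict with the six 1-10 axis scores; how the judges are prompted and invoked is omitted.

```python
AXES = ["factual", "contextual", "memory", "personalization", "guidance", "strategy"]

def turn_score(judge_outputs: list[dict]) -> dict:
    """Average each axis across the three judge models, then compute the overall mean."""
    per_axis = {a: sum(j[a] for j in judge_outputs) / len(judge_outputs) for a in AXES}
    per_axis["average"] = sum(per_axis[a] for a in AXES) / len(AXES)
    return per_axis
```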
Comparative Results
| Experiment | Factual | Contextual | Memory | Personalization | Guidance | Strategy | Average |
|---|---|---|---|---|---|---|---|
| LLM Only | 5.8 | 6.5 | 4.2 | 4.8 | 5.5 | 5.1 | 5.32 |
| Static RAG | 8.4 | 6.9 | 4.5 | 5.0 | 5.8 | 5.2 | 5.97 |
| Simple Memory | 6.1 | 6.7 | 7.6 | 6.8 | 6.0 | 5.5 | 6.45 |
| Single Agent | 7.9 | 7.5 | 6.2 | 5.8 | 6.2 | 4.9 | 6.42 |
| CogEvo-Edu (Ours) | 9.3 | 9.1 | 9.5 | 9.2 | 8.9 | 9.4 | 9.23 |
KEL’s value-based pruning and compression boost factual correctness and contextual relevance, while CPL’s structured consolidation substantially raises memory consistency and personalization versus simple memory. MCL’s meta-optimized orchestration outperforms single-agent policies in knowledge guidance and strategic flexibility, suggesting the importance of coupled, adaptive architecture for educational LLMs.
7. Significance and Implications
CogEvo-Edu demonstrates that treating retrieval, memory formation, and pedagogical control as a coupled cognitive evolution process yields consistent, substantial gains in complex tutoring domains where sustained personalization and deep conceptual understanding are required. The integrated dual-memory models, dynamic knowledge value functions, and meta-control dual-loop provide a scalable and robust foundation for future educational LLM systems. The results documented in DSP-EduBench validate the approach and provide a reproducible benchmark for the field (Wu et al., 29 Nov 2025).