CogEvo-Edu: Adaptive Hierarchical Tutoring

Updated 6 December 2025
  • CogEvo-Edu is a hierarchical, multi-agent tutoring system that integrates a Cognitive Perception Layer, Knowledge Evolution Layer, and Meta-Control Layer.
  • It employs dual memory for personalized learning and dynamic chunk valuation to maintain high factual precision across complex STEM domains.
  • Empirical evaluations in DSP education show substantial improvements in knowledge delivery, memory consistency, and adaptive teaching strategies.

CogEvo-Edu is a hierarchical, multi-agent educational system that couples retrieval, memory, and adaptive control to advance conversational LLM tutoring in complex STEM domains. Adopting a cognitive-evolution perspective, it departs from standard static retrieval-augmented generation (RAG) pipelines by integrating three tightly coupled architectural layers: the Cognitive Perception Layer (CPL), the Knowledge Evolution Layer (KEL), and the Meta-Control Layer (MCL). Together these layers deliver adaptive, long-horizon, and personalized tutoring. The system's empirical validation centers on digital signal processing (DSP) education, where it demonstrates substantial improvements in both knowledge delivery and student-model adaptivity compared to prior approaches (Wu et al., 29 Nov 2025).

1. Hierarchical Architecture Overview

CogEvo-Edu’s architecture is defined by three distinct but interrelated layers:

  1. Cognitive Perception Layer (CPL): Maintains a dual-memory student model by splitting state into Short-Term Sensory Memory and Long-Term Cognitive Memory. The short-term memory $\mathcal{H}_t$ captures the most recent $w$ question-answer (QA) turns:

$$\mathcal{H}_t = \{(q_i, a_i)\}_{i=t-w}^{t}\,.$$

The long-term profile $\mathcal{P}_t$ stores structured, confidence-weighted features:

$$\mathcal{P}_t = \{(k_j, v_j, \omega_j)\}_{j=1}^{M},$$

where $k_j$ is the feature key (e.g., "weak on Z-transforms"), $v_j$ the value, and $\omega_j \in [0,1]$ the confidence.

  2. Knowledge Evolution Layer (KEL): Manages a dynamic knowledge base $\mathcal{K} = \{c_i\}_{i=1}^{N}$, with each chunk $c_i$ (text, code, derivation) annotated with a spatiotemporal value $V(c_i)$ that drives chunk activation, compression, and lifecycle management.
  3. Meta-Control Layer (MCL): Orchestrates a set of specialist teaching agents (e.g., Explanation, Diagnosis, Question-Gen) using a parameterized policy $\pi_\theta$. The MCL executes a dual-loop optimization: an inner loop for micro-step teaching (reinforcement learning) and an outer loop for meta-optimizing policy and hyperparameters.

At every interaction, CPL updates $\mathcal{P}_t$, KEL selects the relevant active subset $\mathcal{K}_t^{\text{act}}$, and MCL selects an agent and policy based on the system state.
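
To make these interfaces concrete, the following is a minimal Python sketch of the per-layer state; the class and field names (ProfileFeature, Chunk, TutorState, and the default window width w) are illustrative assumptions, not identifiers from the paper.

```python
from dataclasses import dataclass
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical containers for the CPL/KEL state; names are illustrative.

@dataclass
class ProfileFeature:          # one (k_j, v_j, omega_j) entry of P_t
    key: str                   # e.g. "weak on Z-transforms"
    value: str
    confidence: float          # omega_j in [0, 1]

@dataclass
class Chunk:                   # one chunk c_i of the knowledge base K
    text: str
    embedding: List[float]     # e_i
    retrieval_count: int = 0   # f(c_i)
    last_access: float = 0.0   # timestamp used for Delta t_i

@dataclass
class TutorState:
    history: Deque[Tuple[str, str]]        # H_t: last w (question, answer) turns
    profile: Dict[str, ProfileFeature]     # P_t, keyed by feature key
    knowledge: List[Chunk]                 # K

def new_state(w: int = 8) -> TutorState:
    """Empty state with a sliding QA window of width w (w is an assumed default)."""
    return TutorState(history=deque(maxlen=w), profile={}, knowledge=[])
```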

2. Cognitive Perception Layer: Dual Memory and Consolidation

Dual-Memory Student Modeling

CPL's short-term memory $\mathcal{H}_t$ and long-term profile $\mathcal{P}_t$ support temporal abstraction and profile stability. Newly extracted candidate features, $F_{\text{new}} = \mathrm{LLM}_{\text{extract}}(\mathcal{H}_t)$, are merged into $\mathcal{P}_t$ using an operator $\Psi$:

$$\mathcal{P}_{t+1} = \Psi(\mathcal{P}_t, \mathcal{H}_t) = \mathcal{P}_t \oplus F_{\text{new}}\,.$$

Confidence-Weighted Consolidation

Feature matching is performed using cosine similarity: if ${\rm sim}(f_{\text{new}}, f_{\text{old}}) > \tau_{\text{match}}$, the new feature reinforces the matched profile entry; otherwise it corrects the old feature. The confidence update follows:

$$\omega_{\rm new} = \begin{cases} \omega_{\rm old} + \eta\,(1 - \omega_{\rm old}) & \text{(reinforcement)} \\ \omega_{\rm old} - \eta\,\omega_{\rm old} & \text{(correction)} \end{cases}$$

with $\eta$ as the learning rate. This dynamic enables rapid, low-overhead self-correction and high-fidelity personalization under context constraints.
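
A minimal Python sketch of this consolidation step is given below, collapsing each profile entry to a key and confidence for brevity; the embed callable and the tau_match and eta defaults are assumptions, and the LLM extraction of F_new is taken as already done upstream.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def consolidate(profile, new_features, embed, tau_match=0.8, eta=0.3):
    """Merge LLM-extracted features F_new into the profile P_t (the Psi operator).

    profile: dict feature-key -> confidence omega; new_features: list of feature keys;
    embed maps text to a vector. tau_match and eta are assumed defaults, not paper values.
    """
    for f_new in new_features:
        if not profile:
            profile[f_new] = eta                      # first observation (assumed initial confidence)
            continue
        # closest existing feature by cosine similarity
        best = max(profile, key=lambda f_old: cosine(embed(f_new), embed(f_old)))
        sim = cosine(embed(f_new), embed(best))
        omega = profile[best]
        if sim > tau_match:                           # reinforcement
            profile[best] = omega + eta * (1.0 - omega)
        else:                                         # correction of the old feature
            profile[best] = omega - eta * omega
            profile[f_new] = eta                      # also keep the new observation (assumption)
    return profile
```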

3. Knowledge Evolution Layer: Dynamic Chunk Valuation and Management

Spatiotemporal Value Function

Each chunk’s retrieval utility is quantified:

$$V(c_i) = \alpha\,\frac{f(c_i)}{\max_j f(c_j)} + \beta\,\exp\!\left(-\frac{\Delta t_i}{\tau_{\rm decay}}\right) + \gamma\,\mathcal{D}_{\rm sem}(c_i)$$

where $f(c_i)$ is the retrieval frequency, $\Delta t_i$ the time since last access, $\tau_{\rm decay}$ a decay constant, and $\mathcal{D}_{\rm sem}(c_i)$ the semantic density:

$$\mathcal{D}_{\rm sem}(c_i) = \frac{1}{k}\sum_{c_j\in \mathrm{KNN}(c_i)} \cos(\mathbf{e}_i, \mathbf{e}_j)\,,$$

with $\mathbf{e}_i$ as the chunk's embedding.
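
Continuing the sketch, the value can be computed directly from the chunk fields introduced earlier (retrieval_count, last_access, embedding) and the cosine helper above; the alpha, beta, gamma, tau_decay, and k defaults are placeholders that the outer loop would tune.

```python
import time
import numpy as np

def chunk_value(chunks, i, alpha=0.4, beta=0.3, gamma=0.3, tau_decay=3600.0, k=5, now=None):
    """Spatiotemporal value V(c_i): normalized frequency + recency decay + semantic density."""
    now = time.time() if now is None else now
    c = chunks[i]

    # normalized retrieval frequency  f(c_i) / max_j f(c_j)
    max_freq = max(ch.retrieval_count for ch in chunks) or 1
    freq_term = c.retrieval_count / max_freq

    # exponential recency decay  exp(-Delta t_i / tau_decay)
    recency_term = np.exp(-(now - c.last_access) / tau_decay)

    # semantic density: mean cosine similarity to the k nearest neighbours
    e_i = np.asarray(c.embedding)
    sims = sorted(
        (cosine(e_i, np.asarray(other.embedding)) for j, other in enumerate(chunks) if j != i),
        reverse=True,
    )
    density_term = float(np.mean(sims[:k])) if sims else 0.0

    return alpha * freq_term + beta * recency_term + gamma * density_term
```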

Lifecycle Management

The knowledge base is partitioned:

$$\begin{cases} \mathcal{K}_{\rm act} = \{c_i : V(c_i) \geq \theta_{\rm solid}\} \\ \mathcal{K}_{\rm sol} = \{c_i : \theta_{\rm forget} \leq V(c_i) < \theta_{\rm solid}\} \\ \mathcal{K}_{\rm del} = \{c_i : V(c_i) < \theta_{\rm forget}\} \end{cases}$$

Chunks in $\mathcal{K}_{\rm act}$ remain fully indexed, those in $\mathcal{K}_{\rm del}$ are deleted, and items in $\mathcal{K}_{\rm sol}$ undergo LLM-based semantic compression:

$$c_i' \leftarrow \mathrm{LLM}_{\rm summ}(c_i)$$

This maintains a balance between $O(|\mathcal{K}|)$ storage and retrieval quality.
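
Lifecycle management then reduces to a thresholded partition, sketched below; the threshold defaults and the summarize callable are placeholders (the paper specifies LLM-based summarization but not a particular model or prompt).

```python
def manage_lifecycle(chunks, value_fn, summarize, theta_solid=0.6, theta_forget=0.2):
    """Partition K into active / consolidated / deleted tiers and compress the middle tier.

    value_fn(chunks, i) -> V(c_i); summarize(text) -> shorter text (LLM call, stubbed here).
    Threshold defaults are assumptions to be tuned by the outer loop.
    """
    active, consolidated = [], []
    for i, c in enumerate(chunks):
        v = value_fn(chunks, i)
        if v >= theta_solid:                 # K_act: keep fully indexed
            active.append(c)
        elif v >= theta_forget:              # K_sol: compress semantically
            c.text = summarize(c.text)
            consolidated.append(c)
        # else: V(c_i) < theta_forget -> K_del, chunk is dropped
    return active + consolidated
```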

4. Meta-Control Layer: Hierarchical Decision and Adaptation

Markov Decision Formulation

MCL models tutoring as a hierarchical Markov Decision Process (MDP), with system state $s_t = (\mathcal{P}_t, \mathcal{K}_t^{\rm act}, x_t)$, action $a_t \sim \pi_\theta(a_t \mid s_t)$, and reward $r_t = R(s_t, a_t, s_{t+1})$, where $x_t$ is the current concept or query.

Actions encode the choice of lead agent, teaching strategy, content difficulty, and retrieval/compression policy.
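
One possible encoding of the state and action spaces is sketched below; the agent names come from the list above, while the strategy, difficulty, and retrieval fields are an assumed parameterization used only for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class Agent(Enum):               # specialist agents named in the paper
    EXPLANATION = "explanation"
    DIAGNOSIS = "diagnosis"
    QUESTION_GEN = "question_gen"

@dataclass
class Action:                    # a_t drawn from pi_theta(a_t | s_t)
    lead_agent: Agent
    strategy: str                # e.g. "worked-example", "socratic" (illustrative values)
    difficulty: int              # content difficulty level
    retrieve_top_k: int          # retrieval policy knob (assumed parameterization)
    compress: bool               # whether to trigger KEL compression this step

@dataclass
class State:                     # s_t = (P_t, K_t^act, x_t)
    profile: dict
    active_chunks: list
    query: str
```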

Dual-Loop Optimization

  • Inner Loop: Maximizes discounted expected reward under fixed hyperparameters:

$$J_{\rm inner}(\theta \mid \lambda) = \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=0}^{T} \gamma^t r_t \right]$$

Policy-gradient methods (e.g., REINFORCE) are applied to update $\theta$.

  • Outer Loop: Jointly adapts $\theta$ and the CPL/KEL hyperparameters $\lambda = \{\eta, \alpha, \beta, \gamma, \theta_{\rm solid}, \theta_{\rm forget}, \tau_{\rm decay}\}$ to maximize the long-term reward

$$J(\theta, \lambda) = \mathbb{E}\left[\sum_t r_t\right]$$

Parameter updates:

$$\theta \leftarrow \theta + \alpha_{\rm inner}\nabla_\theta J\,, \quad \lambda \leftarrow \lambda + \alpha_{\rm outer}\nabla_\lambda J$$

Gradients with respect to $\lambda$ can be estimated by finite-difference or evolutionary strategies.

Pseudocode Outline

Initialize θ, λ
repeat (outer-loop epochs):
  Collect a batch of interaction trajectories {τ_i} under policy π_θ and hyperparams λ
  For each trajectory τ_i:
    Compute cumulative reward R(τ_i) via LLM-as-a-Judge on DSP-EduBench
  Estimate ∇_θ J and ∇_λ J from {τ_i, R(τ_i)}
  θ ← θ + α_inner ∇_θ J
  λ ← λ + α_outer ∇_λ J
until convergence
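
A minimal Python rendering of this outline is sketched below: a score-function (REINFORCE) estimate of $\nabla_\theta J$ and a central finite-difference estimate of $\nabla_\lambda J$, as described above, with collect_trajectories, grad_log_prob, and J_hat standing in for the interaction environment, the policy's log-probability gradient, and a judged rollout batch (all assumed interfaces).

```python
import numpy as np

def reinforce_grad(trajectories, grad_log_prob):
    """Score-function estimate of grad_theta J from sampled trajectories.

    Each trajectory is (states, actions, R), with R the cumulative judged reward;
    grad_log_prob(s, a) returns grad_theta log pi_theta(a | s). Assumed interfaces.
    """
    grads = []
    for states, actions, R in trajectories:
        g = sum(grad_log_prob(s, a) for s, a in zip(states, actions))
        grads.append(R * g)
    return np.mean(grads, axis=0)

def finite_diff_lambda_grad(J_hat, lam, eps=1e-2):
    """Central finite-difference estimate of grad_lambda J; J_hat(lam) scores a rollout batch."""
    grad = np.zeros_like(lam)
    for i in range(len(lam)):
        d = np.zeros_like(lam)
        d[i] = eps
        grad[i] = (J_hat(lam + d) - J_hat(lam - d)) / (2 * eps)
    return grad

def dual_loop(theta, lam, collect_trajectories, grad_log_prob, J_hat,
              alpha_inner=1e-2, alpha_outer=1e-3, epochs=100):
    """Outer-loop epochs wrapping inner-loop policy-gradient updates (hedged sketch)."""
    for _ in range(epochs):
        trajs = collect_trajectories(theta, lam)      # rollouts scored by the LLM-as-a-Judge ensemble
        theta = theta + alpha_inner * reinforce_grad(trajs, grad_log_prob)
        lam = lam + alpha_outer * finite_diff_lambda_grad(J_hat, lam)
    return theta, lam
```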

5. Joint Hyperparameter Adaptation and System Dynamics

The meta-optimization outer loop dynamically adapts both CPL and KEL rates:

  • If the profile update speed (controlled by $\eta$) is insufficient for effective personalization, the meta-learner increases $\eta$.
  • If KEL pruning (controlled by $\theta_{\rm forget}$) is too aggressive and compromises factual correctness, the weights $(\alpha, \beta, \gamma)$ and thresholds can be tuned for improved retention.
  • All adaptations target improvements in the aggregate objective $J(\theta, \lambda)$ as measured by downstream task performance.

This suggests that automated joint adaptation across memory, value, and policy layers supports more robust and flexible system dynamics than static or single-agent designs.

6. DSP-EduBench: Domain Evaluation and Results

Benchmark Design

DSP-EduBench evaluates system behaviors in DSP education. The core features comprise:

  • Content domains: theory (derivations), intuition, and code (MATLAB/Python)
  • Resource base: heterogeneous texts, proofs, code
  • Simulated student personas:
    • UserA (novice): definition amnesia
    • UserB (medium): logical mistake susceptibility
    • UserC (advanced): implementation-driven focus
  • Scripts: long-horizon concept discrimination, diagnosis, debugging
  • Annotations: correct solutions, required knowledge, optimal strategies

Evaluation Procedure

A three-model LLM-as-a-Judge ensemble (GLM-4.5, DeepSeek-V3.1, Qwen3-max) scores each dialogue turn from 1–10 along six axes:

  • Knowledge Precision: Factual Correctness, Contextual Relevance
  • Cognitive Coherence: Memory Consistency, Personalization Alignment
  • Pedagogical Strategy: Knowledge Guidance, Strategy Flexibility

The mean across the six metrics yields the overall score.
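
As a small illustration of this aggregation, assuming each judge returns a dict of six axis scores per turn (the axis key names below are illustrative):

```python
from statistics import mean

# axis keys are illustrative shorthands for the six metrics listed above
AXES = ["factual", "contextual", "memory", "personalization", "guidance", "strategy"]

def overall_score(turn_scores):
    """turn_scores: list (one per judge x turn) of {axis: score in 1..10}.

    Averages over judges and turns per axis, then over the six axes."""
    per_axis = {a: mean(s[a] for s in turn_scores) for a in AXES}
    return mean(per_axis.values()), per_axis
```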

Comparative Results

Experiment          Factual  Contextual  Memory  Personalization  Guidance  Strategy  Average
LLM Only              5.8       6.5        4.2         4.8           5.5       5.1      5.32
Static RAG            8.4       6.9        4.5         5.0           5.8       5.2      5.97
Simple Memory         6.1       6.7        7.6         6.8           6.0       5.5      6.45
Single Agent          7.9       7.5        6.2         5.8           6.2       4.9      6.42
CogEvo-Edu (Ours)     9.3       9.1        9.5         9.2           8.9       9.4      9.23

KEL’s value-based pruning and compression boost factual correctness and contextual relevance, while CPL’s structured consolidation substantially raises memory consistency and personalization versus simple memory. MCL’s meta-optimized orchestration outperforms single-agent policies in knowledge guidance and strategic flexibility, suggesting the importance of coupled, adaptive architecture for educational LLMs.

7. Significance and Implications

CogEvo-Edu demonstrates that treating retrieval, memory formation, and pedagogical control as a coupled cognitive evolution process yields consistent, substantial gains in complex tutoring domains where sustained personalization and deep conceptual understanding are required. The integrated dual-memory models, dynamic knowledge value functions, and meta-control dual-loop provide a scalable and robust foundation for future educational LLM systems. The results on DSP-EduBench validate the approach and provide a reproducible benchmark for the field (Wu et al., 29 Nov 2025).

References

  1. Wu et al., 29 Nov 2025.
