GraphRAG-Induced Dual Knowledge Structure Graphs for Personalized Learning Path Recommendation

Published 27 Jun 2025 in cs.IR | (2506.22303v2)

Abstract: Learning path recommendation seeks to provide learners with a structured sequence of learning items (e.g., knowledge concepts or exercises) to optimize their learning efficiency. Despite significant efforts in this area, most existing methods primarily rely on prerequisite relationships, which present two major limitations: 1) Requiring prerequisite relationships between knowledge concepts, which are difficult to obtain due to the cost of expert annotation, hindering the application of current learning path recommendation methods. 2) Relying on a single, sequentially dependent knowledge structure based on prerequisite relationships implies that difficulties at any stage can cause learning blockages, which in turn disrupt subsequent learning processes. To address these challenges, we propose a novel approach, GraphRAG-Induced Dual Knowledge Structure Graphs for Personalized Learning Path Recommendation (KnowLP), which enhances learning path recommendations by incorporating both prerequisite and similarity relationships between knowledge concepts. Specifically, we introduce a knowledge concept structure graph generation module EDU-GraphRAG that adaptively constructs knowledge concept structure graphs for different educational datasets, significantly improving the generalizability of learning path recommendation methods. We then propose a Discrimination Learning-driven Reinforcement Learning (DLRL) module, which mitigates the issue of blocked learning paths, further enhancing the efficacy of learning path recommendations. Finally, we conduct extensive experiments on three benchmark datasets, demonstrating that our method not only achieves state-of-the-art performance but also provides interpretable reasoning for the recommended learning paths.

Summary

  • The paper presents KnowLP, a framework that uses dual knowledge graphs, combining prerequisite and similarity relationships, to recommend learning paths.
  • The methodology leverages generative LLM-based text refinement and GraphRAG to construct comprehensive KC structure graphs even in datasets lacking expert annotations.
  • Multi-agent reinforcement learning, featuring P-Agent, S-Agent, and D-Agent, optimizes personalized learning paths and demonstrates significant performance gains over SOTA methods.

Dual Knowledge Structure Graphs and Discrimination Learning for Personalized Learning Path Recommendation

Motivation and Problem Setting

Personalized learning path recommendation (LPR) aims to sequence educational content so that each learner reaches mastery efficiently. Traditional methods rely predominantly on knowledge concept (KC) prerequisite graphs supplied by domain experts. However, most real-world educational datasets lack explicit or complete prerequisite graphs, rendering prior LPR algorithms either inapplicable or suboptimal. Moreover, exclusive dependence on prerequisite relationships often leads to the "blocked phenomenon," where learners stall because they confuse similar KCs, an effect described in cognitive science as a failure of discrimination learning. Addressing both sparse KC structure graphs and the need for discrimination learning is critical for advancing adaptive LPR systems.

Figure 1: Comparison between traditional prerequisite-only methods (top) and the proposed KnowLP approach incorporating similarity-based discrimination learning (bottom).

Methodological Advances

KC Structure Graph Generation Using TextGrad and GraphRAG

The proposed framework, KnowLP, begins by constructing comprehensive KC structure graphs comprising both prerequisite and similarity relationships. Since most educational datasets lack detailed KC descriptions, generative LLMs are used to write an explanatory text for each KC. To mitigate LLM hallucination and ensure high-fidelity knowledge extraction, TextGrad, a prompt-based output refinement method, is applied iteratively, filtering and improving the explanations through LLM self-critique cycles.
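
The generate, critique, revise cycle described above can be sketched roughly as follows. This is a minimal illustration and not the TextGrad library's API: `generate`, `critique`, and `revise` are hypothetical callables standing in for LLM calls, and the stopping rule (empty feedback or a round limit) is an assumption.

```python
from typing import Callable


def refine_explanation(
    kc_name: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> str:
    """Iteratively refine an LLM-written explanation of one knowledge concept."""
    text = generate(kc_name)
    for _ in range(max_rounds):
        feedback = critique(kc_name, text)
        if not feedback:  # empty feedback: the critic found nothing to fix
            break
        text = revise(kc_name, text, feedback)
    return text


# Toy stand-ins for the LLM calls (hypothetical), only to make the sketch runnable.
def toy_generate(kc: str) -> str:
    return f"{kc} is a concept."


def toy_critique(kc: str, text: str) -> str:
    return "" if "example" in text else "Add a concrete example."


def toy_revise(kc: str, text: str, feedback: str) -> str:
    return text + " For example, 1/2 + 1/3 = 5/6."


refined = refine_explanation(
    "Fraction addition", toy_generate, toy_critique, toy_revise
)
```

In a real pipeline the three callables would wrap prompts to the same or different LLMs; the loop structure stays the same.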

Subsequently, the EDU-GraphRAG module, a domain-adapted version of the GraphRAG framework, parses the refined KC texts to extract both prerequisite and similarity relationships via entity/relation extraction and community-level graph summarization. This yields well-covered, dense KC structure graphs even for datasets that come with no initial structure. The result is a pair of knowledge graphs that encode hierarchical (prerequisite) and associative (similarity) structure respectively.
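
One way to hold the two relation types side by side is sketched below. The class name, the relation labels, and the example triples are all hypothetical; the source does not specify a storage format. Prerequisite edges are kept directed, while similarity edges are stored in both directions because similarity is symmetric.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DualKCGraph:
    """Directed prerequisite edges plus undirected similarity edges."""

    # kc -> KCs that list it as a prerequisite
    prereq: dict = field(default_factory=lambda: defaultdict(set))
    # kc -> KCs judged similar to it
    similar: dict = field(default_factory=lambda: defaultdict(set))

    def add_relation(self, head: str, relation: str, tail: str) -> None:
        if relation == "prerequisite":  # head should be learned before tail
            self.prereq[head].add(tail)
        elif relation == "similarity":  # symmetric, so store both directions
            self.similar[head].add(tail)
            self.similar[tail].add(head)
        else:
            raise ValueError(f"unknown relation: {relation}")


# Hypothetical triples, in the shape an extraction step might emit them.
triples = [
    ("fractions", "prerequisite", "fraction addition"),
    ("fraction addition", "similarity", "fraction multiplication"),
]

graph = DualKCGraph()
for head, rel, tail in triples:
    graph.add_relation(head, rel, tail)
```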

Figure 2: Overview of the KnowLP framework showing dual graph generation (left) and sequential path generation via multi-agent RL (right).

Reinforcement Learning for Learning Path Recommendation

KnowLP frames learning path generation as a sequential decision-making problem in a personalized environment. The key design is a tri-agent RL architecture:

  • Prerequisite Agent (P-Agent): Generates core learning sequences adhering to prerequisite constraints using Proximal Policy Optimization (PPO). The action space is dynamically pruned to feasible paths via a backward traversal that identifies optimal initial KCs, thus reducing sample complexity and mitigating local minima.
  • Similarity Agent (S-Agent): Triggered when mastery improvement over sequential KC recommendations falls below a tunable threshold τ. The agent then injects sequences of similar KCs into the path to promote discrimination learning, leveraging the similarity subgraph.
  • Difficulty Agent (D-Agent): Matches each KC recommendation to available exercises by aligning item difficulty (modeled via embedding) with the learner's current mastery, as tracked by a Difficulty-Integrated Knowledge Tracing (DIMKT) model.
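
The non-learned parts of the three agents can be illustrated with a few helper functions. This is a simplified sketch under stated assumptions, not the paper's implementation: the function names are hypothetical, mastery and difficulty are treated as scalars in [0, 1] (the source models difficulty via embeddings), and the comparison window for the threshold τ is an assumed parameter. The PPO policy itself is omitted.

```python
def reachable_prerequisites(prereq_of: dict, targets: set) -> set:
    """Backward traversal: collect the targets and every KC they depend on.

    `prereq_of` maps a KC to the set of its direct prerequisites. The
    returned set is the pruned action space for a P-Agent-style policy.
    """
    seen, stack = set(), list(targets)
    while stack:
        kc = stack.pop()
        if kc in seen:
            continue
        seen.add(kc)
        stack.extend(prereq_of.get(kc, ()))
    return seen


def s_agent_triggered(mastery_history: list, tau: float, window: int = 2) -> bool:
    """True when mastery gained over the last `window` steps is below tau."""
    if len(mastery_history) <= window:
        return False  # not enough history to measure a gain yet
    return mastery_history[-1] - mastery_history[-1 - window] < tau


def pick_exercise(difficulty: dict, mastery: float) -> str:
    """Choose the exercise whose scalar difficulty is closest to mastery."""
    return min(difficulty, key=lambda ex: abs(difficulty[ex] - mastery))


# Hypothetical toy data.
prereq_of = {
    "fraction addition": {"fractions"},
    "fractions": {"division"},
    "decimals": {"division"},
}
pruned = reachable_prerequisites(prereq_of, {"fraction addition"})
stalled = s_agent_triggered([0.30, 0.42, 0.43, 0.44], tau=0.05)
chosen = pick_exercise({"e1": 0.2, "e2": 0.5, "e3": 0.9}, mastery=0.44)
```

Here "decimals" is excluded from the pruned set because the target does not depend on it, and the flat mastery curve over the last two steps is what would hand control to the S-Agent.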

At each decision stage, the composite state vector incorporates historical learner performance, current knowledge estimates, and learning goals, and each agent selects actions according to its own policy. The reward at each step is the change in predicted mastery, so that the policy maximizes the normalized gain over a session.
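
The reward described above can be written down in a few lines. The source does not give the exact formula, so the normalization below is an assumption: it follows the form commonly used for the E_p metric in the LPR literature, (E_end - E_start) / (E_max - E_start). Function names are hypothetical.

```python
def step_reward(mastery_before: float, mastery_after: float) -> float:
    """Per-step reward: change in predicted mastery of the learning goals."""
    return mastery_after - mastery_before


def normalized_gain(start: float, end: float, max_score: float = 1.0) -> float:
    """Session-level gain, scaled by how much room for improvement existed."""
    if max_score <= start:
        return 0.0  # learner already at the ceiling; no gain is possible
    return (end - start) / (max_score - start)


# Predicted mastery after each recommended item in one toy session.
trajectory = [0.4, 0.5, 0.5, 0.7]
rewards = [step_reward(a, b) for a, b in zip(trajectory, trajectory[1:])]
session_gain = normalized_gain(trajectory[0], trajectory[-1])
```

Because the per-step rewards telescope, their sum equals the raw end-minus-start improvement; the normalization only rescales it so that learners with different starting mastery are comparable.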

Figure 3: Example explanation sequence for a recommended learning path, illustrating stepwise agent decisions and rationales.

Empirical Analysis

Graph Generation and Structure Comparison

KnowLP generates KC structure graphs with substantially better density and coverage than the expert-provided graphs. On the Junyi and MOOCCubeX datasets, the generated graphs capture more KCs and richer prerequisite relations than the graphs shipped with these public datasets, which are in places incomplete or overly dense.

Figure 4: Comparison between original and KnowLP-generated KC structure graphs, quantifying node and edge counts.

Figure 5: Downstream performance comparison using ground-truth vs. generated structure graphs; KnowLP graphs consistently enable superior or at least parity LPR quality.

Ablation and Simulation Studies

Ablating the S-Agent causes significant performance degradation, supporting the hypothesis that similarity-driven interventions are essential to avoiding blocked progression, particularly for long learning sequences (20 steps).

In simulation experiments using knowledge evolution simulators (KES), KnowLP outperforms prior SOTA models (e.g., SRC, GEHRL, DLPR), remaining robust under dynamic learner behavior and unseen exercise sequences.

Figure 6: Learning outcome improvements in simulated online education scenario, comparing advanced LPR methods on KES-generated behavioral data.

Interpretation and Explainability

EDU-GraphRAG also produces community-level summaries supporting path-level explainability. For each recommended path, the underlying KC dependencies and similarity-based discriminations are made explicit, enhancing interpretability for stakeholders and facilitating trust in automated recommendations.

Figure 7: Impact of applying TextGrad; post-TextGrad explanations effectively separate closely related KCs, reducing confusion in both graph induction and downstream recommendations.

Quantitative Results

KnowLP achieves statistically significant gains in normalized learning objective improvement (E_p) across three benchmarks (Junyi, MOOCCubeX, ASSIST09), consistently outperforming KC correlation-based, vanilla RL-based, and advanced hierarchical RL (HRL) LPR algorithms. The margin is most pronounced for longer path recommendations and in scenarios with incomplete or absent structure graphs.

Notably, removing the similarity agent results in a 20–60% drop in efficacy across all settings, and KnowLP-generated structure graphs match or even slightly exceed the performance of manually annotated expert graphs.

Theoretical and Practical Implications

This work demonstrates a paradigm shift from single-structure LPR (prerequisites only) to dual-structure approaches encoding both prerequisite and similarity relations. The empirical findings validate cognitive science theory that interleaved discrimination learning between similar concepts is essential for robust knowledge transfer, especially in complex domains with high concept overlap. The framework's reliance on scalable LLM-based explanation generation and retrieval-augmented graph induction paves the way for fully automated LPR in domains lacking any prior structure or exhaustive expert annotation.

Practically, this approach generalizes to any context where item structures are latent and concept similarity impedes or accelerates mastery. Applications extend beyond education to adaptive training, content curation, and skill acquisition pipelines in enterprise and industrial settings.

Future Research Directions

Prospects for continued research include:

  • Refinement of similarity metrics and more sophisticated adaptive thresholds τ for triggering discrimination interventions.
  • Integration with multimodal knowledge tracing (e.g., video, audio KCs).
  • Exploration of meta-learning for agent policy transfer across domains or cohorts.
  • Deployment in large-scale MOOCs with real-time learner feedback and human-in-the-loop explainability.

Conclusion

The KnowLP framework combines automated dual-structure KC graph induction, discrimination learning, and multi-agent reinforcement learning in a unified, explainable pipeline for personalized LPR. Empirical evidence underscores its effectiveness and extensibility, addressing longstanding data and modeling gaps in adaptive educational systems.
