Knowledge Tracing

Updated 22 June 2026

Knowledge tracing is a computational task that models student mastery over skills through analysis of learning interactions.
Modern KT methods employ deep neural networks, structured domain knowledge, and multi-modal data to improve prediction accuracy.
Advances in KT enhance interpretability and personalization, leading to adaptive curricula and actionable insights for educators.

Knowledge tracing (KT) is the computational task of modeling a student’s evolving mastery of skills or "knowledge concepts" (KCs) over time based on their learning interactions, with the primary goal of predicting future performance. This paradigm underpins intelligent tutoring systems and online learning platforms by enabling individualized recommendations, adaptive curriculum sequencing, and formative assessment. The KT literature spans classical probabilistic models, logistic/latent factor models, deep neural architectures, and—more recently—frameworks that integrate structured prior knowledge, multi-modal data, and pre-trained LLMs.

1. Formal Problem Definition and Historical Trajectory

At its core, KT observes, for each student $i$, a sequence $\{(q_i^t, c_i^t, r_i^t)\}_{t=1}^T$ where $q_i^t$ is the question index, $c_i^t$ is the knowledge concept index, and $r_i^t \in\{0,1\}$ indicates correctness. The central computational goal is to estimate the probability that the student will answer the next question correctly: $\hat y_i^{t+1} = P(r_i^{t+1}=1 \mid q_i^{\leq t}, c_i^{\leq t}, r_i^{\leq t}).$ Early approaches such as Bayesian Knowledge Tracing (BKT) modeled each KC as a two-state hidden Markov model with parameters for prior mastery, learning, guessing, and slipping (Abdelrahman et al., 2022, Shen et al., 2021). Logistic and factor models, such as Performance Factors Analysis (PFA) and Item Response Theory (IRT), incorporated counts of prior successes/failures, student and item embeddings, and multi-KC dependencies.
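The two-state BKT model described above can be illustrated with a short sketch. It uses the standard BKT posterior and transition updates for a single KC; the class and function names, and the parameter values in the usage note, are illustrative assumptions rather than fitted estimates from any cited work.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    """Per-KC BKT parameters (all assumed to lie strictly between 0 and 1)."""
    p_init: float   # prior probability that the KC is already mastered
    p_learn: float  # probability of moving from unmastered to mastered after a step
    p_guess: float  # probability of a correct answer while unmastered
    p_slip: float   # probability of an incorrect answer while mastered


def predict_correct(p_mastery: float, params: BKTParams) -> float:
    """P(next response is correct) given the current mastery belief."""
    return p_mastery * (1 - params.p_slip) + (1 - p_mastery) * params.p_guess


def update_mastery(p_mastery: float, correct: int, params: BKTParams) -> float:
    """Bayes update on the observed response, followed by the learning transition."""
    if correct:
        num = p_mastery * (1 - params.p_slip)
        den = num + (1 - p_mastery) * params.p_guess
    else:
        num = p_mastery * params.p_slip
        den = num + (1 - p_mastery) * (1 - params.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * params.p_learn


def trace(responses, params: BKTParams):
    """Return the prediction made before each response and the final mastery belief."""
    p = params.p_init
    predictions = []
    for r in responses:
        predictions.append(predict_correct(p, params))
        p = update_mastery(p, r, params)
    return predictions, p
```

For example, `trace([1, 1, 0, 1], BKTParams(0.3, 0.2, 0.2, 0.1))` returns one prediction per response; a correct answer raises the mastery belief and an incorrect answer lowers it before the learning transition is applied.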

The emergence of Deep Knowledge Tracing (DKT) (Abdelrahman et al., 2022) marked a shift to end-to-end sequence models (RNNs/LSTMs) that learn latent knowledge states from raw interaction sequences, resulting in substantial improvements in predictive accuracy on large-scale datasets (Hicke, 2023). This evolution led to a spectrum of neural and hybrid architectures that further increased model capacity and flexibility (Shen et al., 2021).

2. Contemporary Methods: Deep, Structured, and Semantic KT

2.1 Neural Sequence Models and Memory-Augmented Variants

Canonical deep models (e.g., DKT, DKVMN) embed each interaction,

$e_{q_i^t} = \mathrm{Embed}(q_i^t),\quad x_i^t = [e_{q_i^t}; r_i^t],$

and update a latent student state via an RNN or memory mechanism to produce a correctness prediction for the next time step (Lee et al., 2024). Subsequent innovations introduced key–value memories (DKVMN), self-attention layers (SAKT, SAINT, AKT), and memory slots explicitly aligned with KCs for interpretability and fine-grained mastery tracking (Shen et al., 2021, Abdelrahman et al., 2022).
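The embed-concatenate-update loop can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses a plain tanh recurrent cell instead of the LSTM typically used in DKT, the weights are random and untrained, and `TinyDKT` and its helper functions are hypothetical names rather than an API from any cited work.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix as a list of rows (untrained, for illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyDKT:
    """Toy recurrent knowledge tracer: one hidden state per student sequence."""

    def __init__(self, n_questions, embed_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.embed = make_matrix(n_questions, embed_dim, rng)     # question embeddings
        self.w_in = make_matrix(hidden_dim, embed_dim + 1, rng)   # acts on [e_q; r]
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)     # recurrent weights
        self.w_out = make_matrix(n_questions, hidden_dim, rng)    # one logit per question

    def step(self, hidden, question, response):
        """Concatenate the question embedding with the response, then update the state."""
        x = self.embed[question] + [float(response)]
        pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, hidden))]
        return [math.tanh(v) for v in pre]

    def predict(self, hidden):
        """Predicted probability of a correct answer for every question."""
        return [sigmoid(z) for z in matvec(self.w_out, hidden)]

    def run(self, interactions):
        """interactions: list of (question_index, correctness) pairs."""
        hidden = [0.0] * self.hidden_dim
        predictions = []
        for question, response in interactions:
            hidden = self.step(hidden, question, response)
            predictions.append(self.predict(hidden))
        return predictions
```

Calling `TinyDKT(5).run([(0, 1), (3, 0), (2, 1)])` yields, after each interaction, a vector of correctness probabilities over all five questions; in a trained model the entry for the next question would be compared against the observed response.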

2.2 Structure-Informed and Side-Information Models

Integrating domain knowledge, such as skill-to-skill relational graphs or hierarchical concept trees, has been shown to enhance both accuracy and robustness in data-sparse or cold-start regimes. Methods using expert-labeled or learned skill graphs as regularization targets in neural architectures (via auxiliary projection heads or node embeddings) guide the learned space to reflect human expertise and improve generalization to new or rare KCs (Kim et al., 2023). Probabilistic frameworks such as Knowledge-Tree-based Knowledge Tracing (KT$^2$) model mastery over hierarchical (tree-structured) KCs using Hidden Markov Trees, showing state-of-the-art AUC in low-resource, online-update settings (Gao et al., 11 Jun 2025).

Additionally, side-information, including question–question or question–KC graphs, is incorporated as pre-trained embeddings or Laplacian regularizers to enforce smoothness and leverage structural proximity across items (Wang et al., 2019).
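A Laplacian regularizer of this kind penalizes embeddings that differ across connected items. The sketch below computes the penalty in two equivalent ways, edge by edge and as the trace form $\mathrm{tr}(E^\top L E)$ with $L = D - A$, for an undirected weighted graph with each edge listed once. The function names and the toy graph in the usage note are assumptions for illustration, not the formulation of any specific cited method.

```python
def laplacian_penalty(embeddings, edges):
    """Sum over edges (i, j, w) of w * ||e_i - e_j||^2."""
    total = 0.0
    for i, j, weight in edges:
        total += weight * sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
    return total


def laplacian_matrix(n_nodes, edges):
    """Graph Laplacian L = D - A for an undirected graph, each edge listed once."""
    lap = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j, weight in edges:
        lap[i][i] += weight
        lap[j][j] += weight
        lap[i][j] -= weight
        lap[j][i] -= weight
    return lap


def trace_form(embeddings, lap):
    """tr(E^T L E), computed as sum_ij L_ij * <e_i, e_j>."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if lap[i][j] != 0.0:
                dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
                total += lap[i][j] * dot
    return total
```

With `embeddings = [[1, 0], [1, 0], [0, 1]]` and `edges = [(0, 1, 1.0), (1, 2, 2.0)]`, both forms give the same value: the first edge contributes nothing because its endpoints share an embedding, while the second is penalized in proportion to its weight. Adding such a term to the training loss pulls structurally close items toward similar representations.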

Recent advances leverage the semantic richness of item and concept text, as well as student-generated content (e.g., code or natural language queries). Language model-based Knowledge Tracing (LKT) fully replaces ID-based embeddings with pre-trained language models (PLMs) such as BERT or RoBERTa, encoding each $(\text{concept}, \text{question}, \text{response})$ tuple into a text sequence: $\mathbf{x}_i = [\texttt{[CLS]}; \mathbf{c}_i^1; \mathbf{q}_i^1; \mathbf{r}_i^1; \dots; \mathbf{c}_i^T; \mathbf{q}_i^T; \mathbf{r}_i^T; \texttt{[EOS]}]$ and fine-tuning the PLM to predict masked correctness tokens. LKT achieves strong out-of-domain/cold-start performance, significantly surpassing neural KT baselines on benchmarks where semantic alignment is crucial (Lee et al., 2024).
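The text-sequence encoding can be made concrete with a small sketch that assembles the input string before tokenization. The special-token strings, the words used for responses, and the choice to mask only the final response are assumptions made for illustration; the source specifies only the overall layout and that correctness tokens are masked and predicted.

```python
CLS, EOS, MASK = "[CLS]", "[EOS]", "[MASK]"


def build_lkt_sequence(history, mask_last=True):
    """Flatten (concept_text, question_text, correct) tuples into one text sequence.

    Returns the sequence and the labels for the masked response positions.
    Hypothetical helper: the response wording and masking policy are assumptions.
    """
    parts = [CLS]
    labels = []
    for index, (concept, question, correct) in enumerate(history):
        is_target = mask_last and index == len(history) - 1
        if is_target:
            response = MASK            # the model must predict this position
            labels.append(int(correct))
        else:
            response = "correct" if correct else "incorrect"
        parts.extend([concept, question, response])
    parts.append(EOS)
    return " ".join(parts), labels
```

For a two-step history such as `[("fractions", "What is 1/2 + 1/4?", 1), ("decimals", "Convert 0.75 to a fraction.", 0)]`, the helper emits the first response as text and replaces the second with the mask token, returning its true label separately as the fine-tuning target.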

Multi-modal extensions also process student code, problem-solving traces, or natural language queries, fusing heterogeneous signals to model deeper aspects of understanding and diagnose misconception classes (Kim et al., 22 Jan 2025, Park et al., 29 Nov 2025). Frameworks such as SQKT employ specialized code and skill extractors alongside multi-head attention, while KT-PSP utilizes LLMs to extract process-level mathematical proficiency indicators (Kim et al., 22 Jan 2025, Park et al., 29 Nov 2025).

3. Advances in Interpretability, Personalization, and Error Diagnosis

Interpretability remains a central requirement for trustworthy KT. Models increasingly provide token-level attribution through attention mechanisms (revealing which words or concepts drive predictions), explicit inference paths (mapping student errors to prior interactions and low-mastery KCs), and local model-agnostic explanations (e.g., LIME) (Lee et al., 2024, Yue et al., 2023). Attention-based KT architectures make the model's reasoning partially transparent by exposing which past skills or exercises contribute most to current correctness estimates.
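Attention-based attribution of this kind reduces to inspecting a softmax over similarity scores between the current query and past interactions. The following sketch uses scaled dot-product attention with hand-picked vectors; the function names and the toy vectors in the usage note are assumptions for illustration and do not reproduce any particular published architecture.

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over past interaction keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def top_contributors(query, keys, labels, k=2):
    """Return the k past interactions with the largest attention weight."""
    weights = attention_weights(query, keys)
    ranked = sorted(zip(labels, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Given `query = [1.0, 0.0]` and keys `[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]` labeled by exercise name, `top_contributors` ranks the first exercise highest because its key aligns with the query. Such weights indicate where the model attends, which is suggestive but not proof of causal influence on the prediction.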

Personalization is addressed by clustering students dynamically by segmental ability traces, hierarchically modeling latent learning profiles, or maintaining individualized parameter sets (e.g., in iBKT, KT$^2$) (Yue et al., 2023, Gao et al., 11 Jun 2025). Fine-grained diagnostics, such as option tracing, move beyond correctness to predict specific distractor choices, enabling the system to infer latent misconception clusters and tailor feedback accordingly (Ghosh et al., 2021).

4. Evaluation Protocols, Benchmarks, and Quantitative Comparisons

Standard evaluation protocols apply cross-validation on large-scale, real-world datasets such as ASSISTments, Statics2011, XES3G5M, Junyi Academy, EdNet, and Eedi2020, reporting AUC and accuracy on next-response prediction (Gao et al., 11 Jun 2025, Lee et al., 2024, Hicke, 2023, Wang et al., 2022). Table 1 illustrates comparative AUC values for neural and structure-augmented KT models:

Model	ASSIST2009	Statics2011	XES3G5M-T	Junyi
DKT	0.7852	0.7659	0.7852	0.7443
AKT	0.8216	0.8115	--	0.7867
QAKT	0.8171	0.8209	--	0.7908
LKT	0.8508±0.0021†	--	0.8508	--
KT$^2$	--	--	0.733‡	--

†RoBERTa-based LKT, zero-shot cold-start; ‡incremental, low-resource regime (Lee et al., 2024, Jia et al., 2023, Gao et al., 11 Jun 2025).
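The AUC and accuracy figures used in such comparisons can be computed directly from predicted probabilities and observed correctness. The sketch below uses the pairwise (Mann-Whitney) definition of AUC with ties counted as one half; the 0.5 decision threshold for accuracy is a common convention assumed here, and the data in the usage note are made up for illustration.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of responses whose thresholded prediction matches the observed label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random correct response is scored above a random incorrect one."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC is undefined when only one class is present")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5        # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

With labels `[1, 0, 1, 0]` and predictions `[0.9, 0.3, 0.4, 0.6]`, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75, while accuracy at the 0.5 threshold is 0.5. The quadratic pairwise loop is chosen for clarity; rank-based implementations are preferable on large datasets.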

Recent works systematically show that augmenting KT with semantic PLM features (Lee et al., 2024), domain-knowledge priors (Kim et al., 2023, Gao et al., 11 Jun 2025), or expert-encoded skill graphs improves predictive performance, particularly in data-sparse or cold-start settings.

5. Open Research Directions and Ongoing Challenges

Despite rapid methodological advances, KT remains challenged by:

Generalization and cold start: Leveraging semantic content (text, images, code), hierarchical skill structures, or learned skill graphs to enable robust prediction on new students, skills, or questions (Lee et al., 2024, Gao et al., 11 Jun 2025).
Interpretable and actionable feedback: Developing transparent models that deliver human-interpretable mastery trajectories, error types, and proficiency profiles, enabling educators to intervene effectively (Yue et al., 2023, Park et al., 29 Nov 2025).
Data sparsity and privacy: Addressing scenarios with limited or strictly partitioned data (e.g., federated KT) and optimizing for privacy-preserving deployment (Suresh et al., 2022).
Richer modalities: Integrating non-binary supervision sources (open-ended responses, solution traces, time-series features) and multi-modal content into the KT pipeline (Kim et al., 22 Jan 2025, Park et al., 29 Nov 2025).
Learning-to-teach: Unifying KT with curriculum adaptation and optimal teaching policies, possibly by coupling with reinforcement learning agents (Abdelrahman et al., 2022).

6. Practical Impact and Applications

Knowledge tracing underpins a wide range of intelligent educational technologies: adaptive exercise/recommendation engines, personalized learning path generators, error diagnosis modules, and formative assessment tools. Recent large-scale KT challenges and benchmarks have led to reproducible, robust baselines and established metrics for fair comparison (Hicke, 2023, Abdelrahman et al., 2022). Open-source libraries such as EduKTM and EduData further standardize implementations, facilitating method development and evaluation (Shen et al., 2021).

In computer science and mathematics education, KT systems that semantically model questions, code, and student queries achieve large gains over correctness-only models in both in-domain and cross-domain settings, because these inputs carry mastery signals orthogonal to correctness (Kim et al., 22 Jan 2025). In settings where open-ended solution processes or mathematical proficiency must be assessed, KT frameworks now incorporate multi-stage LLM pipelines to extract and trace fine-grained learning trajectories (Park et al., 29 Nov 2025).

7. Comparative Perspective and Future Outlook

From BKT's interpretable Markovian belief updates to RoBERTa-driven LKT's cold-start semantic tracing, KT has transformed from simple HMMs to highly expressive, structure- and context-aware architectures (Lee et al., 2024). Methodological progress is now driven by the integration of structured domain knowledge (graphs, hierarchies), multimodal data fusion, and pre-trained deep representation learning. Cutting-edge models achieve interpretable KT that remains robust under cold-start conditions, with avenues open for transfer learning, cross-domain generalization, and closed-loop teaching interventions.

Open problems remain in XAI for KT, federated and fair KT, and real-time adaptation to diverse learners. The KT research agenda increasingly aims for cognitive fidelity, actionable interpretability, and practical scalability in rich, heterogeneous, and privacy-constrained educational environments (Abdelrahman et al., 2022, Lee et al., 2024).