KT-PSP: Process-Aware Knowledge Tracing
- KT-PSP is a knowledge tracing method that enriches traditional models by incorporating detailed data from students’ problem-solving processes.
- It leverages diverse inputs—such as code submissions, handwritten solutions, and OCR traces—to extract intermediate proficiency signals using advanced LLM-based pipelines.
- Empirical results show that KT-PSP frameworks consistently improve metrics like AUC and RMSE, enabling more robust, diagnostic, and personalized feedback.
Knowledge Tracing Leveraging Problem-Solving Process (KT-PSP) denotes a class of knowledge tracing methodologies that incorporate detailed student problem-solving process data, beyond mere correctness labels, to build granular, multidimensional models of learner proficiency and improve prediction fidelity. Recent KT-PSP frameworks utilize diverse sources of process data including code submissions, stepwise mathematical solutions, procedural annotations, and solution traces, extracting and leveraging domain-specific intermediate signals for both enhanced predictive accuracy and improved interpretability. These approaches address the limitations of classical KT models (e.g., DKT, BKT), which operate exclusively on correctness/outcome sequences, by explicitly modeling students’ in-problem behaviors, thereby facilitating personalized, diagnostic feedback in adaptive learning systems.
1. Problem Formulation and Motivation
KT-PSP extends the standard knowledge tracing paradigm by augmenting the student interaction sequence. Classical KT models a student’s knowledge evolution via
$$X_t = \{(q_1, c_1, r_1), \dots, (q_t, c_t, r_t)\},$$
where $q_i$ is the $i$th problem, $c_i$ its knowledge concept(s), and $r_i \in \{0,1\}$ the correctness label. The prediction task is to estimate
$$P(r_{t+1} = 1 \mid q_{t+1}, X_t).$$
In KT-PSP, each interaction is further annotated with the observed problem-solving process, leading to
$$X_t^{\text{PSP}} = \{(q_1, c_1, r_1, p_1), \dots, (q_t, c_t, r_t, p_t)\},$$
where $p_i$ encodes the student’s process trace (e.g., source code, handwritten solution steps, multi-step logs). The predictive model then leverages the expanded history to estimate
$$P(r_{t+1} = 1 \mid q_{t+1}, X_t^{\text{PSP}})$$
and, in advanced KT-PSP frameworks such as StatusKT, incorporates process-derived intermediate proficiency vectors $s_t \in [0,1]^D$, where $D$ is the number of proficiency dimensions (e.g., conceptual understanding, procedural fluency) (Park et al., 29 Nov 2025).
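As a minimal sketch, the augmented interaction tuple $(q_t, c_t, r_t, p_t)$ might be represented as a record like the following; the field names and sample traces are illustrative, not from the cited papers:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Interaction:
    """One KT-PSP interaction: problem, concept tags, correctness, process trace."""
    problem_id: int                  # q_t: the t-th problem
    concepts: List[int]              # c_t: knowledge concept tag(s)
    correct: int                     # r_t in {0, 1}
    process: str                     # p_t: raw trace (code, solution steps, logs)
    proficiency: List[float] = field(default_factory=list)  # optional s_t in [0,1]^D

history = [
    Interaction(12, [3, 7], 1, "x^2 - 4 = (x-2)(x+2); x = +-2"),
    Interaction(15, [7], 0, "x^2 + 1 = 0; x = 1", proficiency=[0.6, 0.4, 0.7, 0.2]),
]
assert history[1].correct == 0 and len(history[1].proficiency) == 4
```

A correctness-only KT model would see just the `correct` field; KT-PSP models also consume `process` (and, in StatusKT-style frameworks, `proficiency`).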
KT-PSP is motivated by the empirical limitations of correctness-only KT, which fails to exploit the diagnostic value in partial solutions, error types, intermediate states, and domain-specific behaviors that constitute the bulk of student learning dynamics. Early work on Code-DKT (Shi et al., 2022) and KCQRL (Ozyurt et al., 2024) illustrates the substantial AUC gains achievable by embedding process-aware signals into the sequential KT pipeline.
2. Datasets and Problem-Solving Process Representations
The principal benchmark for mathematical KT-PSP is the KT-PSP-25 dataset (Park et al., 29 Nov 2025), comprising 22,289 digital math sessions with OCR-transcribed solution traces, knowledge concept tags, correctness, and timing metadata. Each process trace $p_t$ is a multi-line (≥5 lines), logically ordered LaTeX representation of a student’s handwritten solution, curated for high process fidelity. Problems span 2,696 items and 490 distinct knowledge concepts, supporting fine-grained mapping between solution steps and conceptual targets.
In programming KT-PSP, the Code-DKT dataset encompasses 410 students and 50 Java programming problems, where each process trace is the raw source code of a student submission (Shi et al., 2022).
For general math KT, process annotation can be generated automatically using LLM-based chain-of-thought prompts, followed by step-wise knowledge concept (KC) mapping and semantic alignment (Ozyurt et al., 2024). Automated tools extract solution steps, annotate KCs, and align process steps to domain concepts, supplying detailed input representations for downstream KT modules.
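The step-to-KC mapping stage can be sketched as follows; `call_llm` is a hypothetical stand-in for an LLM client, and the prompt, catalog, and stub are illustrative only, not KCQRL's actual interface:

```python
def annotate_steps(solution_steps, kc_catalog, call_llm):
    """Map each extracted solution step to a knowledge concept from the catalog.
    `call_llm` is a hypothetical LLM client expected to return a catalog entry."""
    return [(step,
             call_llm(f"Which concept in {kc_catalog} does this step use? {step}"))
            for step in solution_steps]

steps = ["expand (x-2)(x+2)", "solve x^2 = 4"]
catalog = ["factoring", "square roots"]
# Deterministic stub so the sketch runs without an LLM backend.
stub = lambda prompt: "factoring" if "expand" in prompt else "square roots"
pairs = annotate_steps(steps, catalog, stub)
assert pairs == [("expand (x-2)(x+2)", "factoring"),
                 ("solve x^2 = 4", "square roots")]
```

The resulting (step, KC) pairs are the kind of aligned input representation the downstream KT module consumes.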
3. Architectures and Process-Sensitive Modeling
KT-PSP methodologies instantiate several architectural innovations to process and exploit problem-solving process data. The principal architectural modules include:
- Process Feature Extraction: In Code-DKT (Shi et al., 2022), code submissions are parsed into ASTs, and code-paths are sampled and encoded via an attention-weighted code2vec variant, yielding a process feature vector for each attempt. For mathematical problem solving (StatusKT (Park et al., 29 Nov 2025)), OCR-extracted solution traces feed into LLM-based pipelines that decompose the process into natural-language proficiency indicators and map evidence of mastery to explicit scores.
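The idea of turning a raw submission into process features can be illustrated with Python's stdlib `ast` module; note that Code-DKT itself parses Java and encodes sampled code paths via an attention-weighted code2vec variant, so this bag-of-node-types is a deliberately crude stand-in:

```python
import ast
from collections import Counter

def ast_node_bag(source: str) -> Counter:
    """Bag-of-AST-node-types feature for one code submission (a simplified
    stand-in for path-based code embeddings such as code2vec)."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

features = ast_node_bag("def add(a, b):\n    return a + b\n")
assert features["FunctionDef"] == 1 and features["Return"] == 1
```

Even this coarse vector distinguishes, say, a submission that defines a loop from one that hard-codes a constant, which is exactly the signal correctness labels discard.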
- Intermediate Signal Construction: StatusKT (Park et al., 29 Nov 2025) deploys a three-stage LLM pipeline:
- Teacher LLM: Generates lists of problem-specific proficiency indicators covering four distinct dimensions: conceptual understanding (CU), strategic competence (SC), procedural fluency (PF), and adaptive reasoning (AR).
- Student LLM: Maps each indicator to a candidate response, as if answering rubric questions with student solutions.
- Teacher LLM: Evaluates responses for indicator satisfaction, yielding a binary score per indicator. Averaged by dimension, these form the MP (mathematical proficiency) ratio vector $s_t \in [0,1]^4$, which serves as an auxiliary input to the KT model at each timestep.
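The three-stage loop above can be sketched as follows; `call_llm` is a hypothetical LLM client, the prompts and scoring format are assumptions rather than the paper's exact templates, and a deterministic stub lets the example run offline:

```python
from statistics import mean

DIMENSIONS = ("CU", "SC", "PF", "AR")  # conceptual understanding, strategic
                                       # competence, procedural fluency,
                                       # adaptive reasoning

def mp_ratio_vector(problem, solution_trace, call_llm):
    """Three-stage Teacher/Student/Teacher scoring yielding the MP ratio vector."""
    scores = {}
    for dim in DIMENSIONS:
        # Stage 1 (Teacher): problem-specific indicators for this dimension.
        indicators = call_llm(f"List {dim} indicators for: {problem}")
        # Stage 2 (Student): address each indicator using the solution trace.
        responses = [call_llm(f"Given solution: {solution_trace}\nAddress: {ind}")
                     for ind in indicators]
        # Stage 3 (Teacher): binary satisfaction per indicator, averaged per dim.
        scores[dim] = mean(
            1.0 if call_llm(f"Is '{ind}' satisfied by '{resp}'? yes/no") == "yes"
            else 0.0
            for ind, resp in zip(indicators, responses))
    return scores

def stub_llm(prompt):
    """Deterministic stand-in so the sketch runs without an LLM backend."""
    if prompt.startswith("List"):
        return ["indicator-1", "indicator-2"]
    if prompt.startswith("Given"):
        return "worked response"
    return "yes"

vec = mp_ratio_vector("Solve x^2 = 4", "x = 2 or x = -2", stub_llm)
assert vec == {"CU": 1.0, "SC": 1.0, "PF": 1.0, "AR": 1.0}
```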
- Process-Enhanced KT Backbone: The status-aware KT model concatenates or injects process-derived features (e.g., code embeddings, the MP ratio vector $s_t$) with traditional correctness histories and passes them through LSTM, Transformer, or memory-augmented architectures (e.g., DKVMN), enhancing both predictive accuracy and explanatory transparency (Park et al., 29 Nov 2025, Shi et al., 2022).
- Task-Specific Objective Functions: Models optimize a composite loss, typically of the form
$$\mathcal{L} = \mathcal{L}_{\text{pred}} + \lambda \, \mathcal{L}_{\text{prof}},$$
balancing response prediction and proficiency regression (Park et al., 29 Nov 2025).
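A plain-Python sketch of such a composite objective, assuming binary cross-entropy for response prediction and mean-squared error for proficiency regression (the pairing and the weighting `lam` are illustrative assumptions):

```python
import math

def composite_loss(r, r_hat, s, s_hat, lam=0.5):
    """Composite loss: BCE over next-response predictions plus lam * MSE over
    proficiency estimates. `lam` is an illustrative weight, not the paper's value."""
    bce = -sum(ri * math.log(pi) + (1 - ri) * math.log(1 - pi)
               for ri, pi in zip(r, r_hat)) / len(r)
    mse = sum((si - shi) ** 2 for si, shi in zip(s, s_hat)) / len(s)
    return bce + lam * mse

loss = composite_loss(r=[1, 0], r_hat=[0.9, 0.2],
                      s=[0.8, 0.3], s_hat=[0.7, 0.4])
assert loss > 0.0
```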
4. Model Learning, Evaluation, and Empirical Results
KT-PSP frameworks are typically trained and evaluated on split datasets using metrics such as AUC and RMSE for next-response correctness, and mean-squared error for proficiency prediction. Notable empirical findings:
- StatusKT (Park et al., 29 Nov 2025) achieves consistent performance improvements (ΔAUC up to 0.0149) across ten canonical KT algorithms on KT-PSP-25, including DKT, DKVMN, SAINT, AKT, stableKT, and robustKT. Gains are robust to process noise and task type, and ablations show that both the proficiency regression loss and the LLM-based process signal extraction are critical.
- Code-DKT (Shi et al., 2022) yields 3.07–4.00% AUC improvement over DKT across five programming assignments, with maximum gains on tasks exhibiting recurring code structures and concept overlap.
- KCQRL (Ozyurt et al., 2024) demonstrates generalized process-level gains across 15 KT models (relative AUC increases 1–7%) by employing automated, process-anchored KC annotation and contrastively aligned question representations.
- GRATE (Wang et al., 2022) introduces adaptive attempt aggregation and rank-based temporal smoothing to mitigate process noise in complex, multi-concept problems, securing statistically significant improvements over standard tensor factorization and memory-network KT models.
A summary of selected quantitative results appears below:
| Model / Approach | Dataset | Baseline AUC | KT-PSP AUC | ΔAUC |
|---|---|---|---|---|
| StatusKT (DKVMN) | KT-PSP-25 | 0.6095 | 0.6220 | +0.0125 |
| Code-DKT | Java Assignments | See text | See text | +3.07–4.00% |
| KCQRL (IEKT) | XES3G5M | 82.24 | 82.82 | +0.58 |
| GRATE | MasteryGrids | ≈0.688 | 0.7035 | +0.0155 |
These consistent AUC and RMSE gains confirm the hypothesis that rich process information provides actionable, predictive signal unavailable to correctness-only KT models.
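The AUC metric reported throughout can be computed directly from its ranking definition: the probability that a randomly chosen positive (correct) response is scored above a randomly chosen negative one, with ties counted as half. A self-contained sketch:

```python
def auc(labels, scores):
    """AUC = P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 0, 1, 0, 1]            # true correctness labels
p = [0.9, 0.3, 0.8, 0.6, 0.4]  # model's predicted probabilities
assert abs(auc(y, p) - 5/6) < 1e-9  # 5 of 6 pos-neg pairs ordered correctly
```

This quadratic pairwise form is fine for illustration; production evaluation pipelines use rank-based implementations such as scikit-learn's `roc_auc_score`.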
5. Intermediate Signals and Interpretability
A key advantage of KT-PSP is the explicit modeling and prediction of human-interpretable proficiency vectors at each timestep. In StatusKT (Park et al., 29 Nov 2025), for each problem and student, the model provides estimated scores for Conceptual Understanding (CU), Strategic Competence (SC), Procedural Fluency (PF), and Adaptive Reasoning (AR). These are derived by evaluating LLM-generated, process-specific indicators against the student’s solution trace, facilitating strand-level diagnostics with native interpretability.
For example, a predicted vector such as
$$\hat{s}_t = (\text{CU}, \text{SC}, \text{PF}, \text{AR}) = (0.9, 0.6, 0.7, 0.2)$$
indicates high conceptual mastery but low adaptive reasoning, guiding targeted instructional feedback. Case studies in (Park et al., 29 Nov 2025) demonstrate the validity of this multidimensional scoring for formative assessment.
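Strand-level scores of this kind can be turned into targeted feedback with a simple threshold rule (a toy illustration; the 0.5 threshold is an assumption, and only the CU/SC/PF/AR strand names come from the source):

```python
STRANDS = {"CU": "conceptual understanding", "SC": "strategic competence",
           "PF": "procedural fluency", "AR": "adaptive reasoning"}

def weak_strands(s_hat, threshold=0.5):
    """Flag proficiency strands scoring below `threshold` for targeted feedback."""
    return [STRANDS[k] for k, v in s_hat.items() if v < threshold]

feedback = weak_strands({"CU": 0.9, "SC": 0.6, "PF": 0.7, "AR": 0.2})
assert feedback == ["adaptive reasoning"]
```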
Process-derived embeddings in Code-DKT (Shi et al., 2022) and KCQRL (Ozyurt et al., 2024) similarly contribute to interpretability by identifying domain-relevant code patterns or semantic skills aligned with knowledge concepts or solution-step representations.
6. Extensions: Process Granularity and Aggregation
KT-PSP research pursues finer process granularity while addressing signal noise and redundancy. GRATE (Wang et al., 2022) introduces dynamic attempt aggregation via a rank-based tensor factorization:
- Attempts are automatically merged to eliminate high-noise or uninformative time slices, smoothing knowledge traces while preserving key transitions.
- A soft monotonicity constraint stabilizes predicted mastery trajectories, mitigating the effects of slips and guesses.
- The approach provides interpretable Q-matrices mapping problems to latent concepts, frequently uncovering skill clusters orthogonal to textbook topic labels.
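The soft monotonicity idea above can be illustrated as a hinge penalty on downward jumps in a predicted mastery trajectory (a toy sketch, not GRATE's actual tensor factorization objective; the slack value is an assumption):

```python
def monotonicity_penalty(trajectory, slack=0.05):
    """Penalize mastery drops larger than `slack` between consecutive attempts,
    damping slips/guesses while tolerating small fluctuations."""
    return sum(max(0.0, prev - cur - slack)
               for prev, cur in zip(trajectory, trajectory[1:]))

smooth = [0.2, 0.35, 0.5, 0.48, 0.6]  # small dip within slack: no penalty
noisy  = [0.2, 0.6, 0.1, 0.7, 0.3]    # large dips: penalized

assert monotonicity_penalty(smooth) == 0.0
assert monotonicity_penalty(noisy) > 0.0
```

In GRATE this kind of term is one component of the factorization loss; adding it to the objective biases recovered knowledge traces toward plausible, mostly-increasing mastery curves.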
A plausible implication is that adaptive process aggregation will become essential as KT-PSP frameworks scale to more complex, open-ended domains.
7. Connections to Related Methodologies and Future Directions
KT-PSP unifies and extends multiple methodological lines:
- LLM-based semantic annotation and stepwise process interpretation (Ozyurt et al., 2024, Park et al., 29 Nov 2025).
- Domain-specific feature extraction from raw code or solution logs (Shi et al., 2022).
- Tensor-based and memory-augmented KT for multi-concept, multi-step interactions (Wang et al., 2022).
- Deep contrastive learning for embedding alignment and noise reduction (Ozyurt et al., 2024).
Emerging themes include integration of multimodal process data (e.g., speech, gesture, collaborative logs), real-time proficiency estimation for adaptive intervention, and generalized frameworks for process-sensitive KT beyond mathematics and programming. As dataset scale and LLM sophistication increase, KT-PSP is positioned to support next-generation personalizable tutoring systems with both accurate predictions and fine-grained, explainable feedback.
References:
- "Tracing Mathematical Proficiency Through Problem-Solving Processes" (Park et al., 29 Nov 2025)
- "Code-DKT: A Code-based Knowledge Tracing Model for Programming Tasks" (Shi et al., 2022)
- "Automated Knowledge Concept Annotation and Question Representation Learning for Knowledge Tracing" (Ozyurt et al., 2024)
- "Knowledge Tracing for Complex Problem Solving: Granular Rank-Based Tensor Factorization" (Wang et al., 2022)