CPADP: Adaptive Dropout Prediction for Online Courses
- CPADP is a framework that adaptively predicts dropout in LLM-driven online courses by analyzing chapter-level student interactions.
- It employs a three-stage prediction architecture (Zero-Shot, Few-Shot, Fine-Tuned) to optimize performance as more progress data accrues.
- Personalized interventions, such as LLM-generated recall emails, effectively re-engage at-risk learners and boost course completion rates.
Course-Progress-Adaptive Dropout Prediction (CPADP) is a prediction-and-intervention framework devised for LLM-driven interactive online courses, particularly Massive AI-empowered Courses (MAIC). MAIC platforms leverage multi-agent LLMs to create highly dynamic, text-intensive educational environments; within them, CPADP predicts and reduces learner dropout by adaptively analyzing students’ interaction histories and intervening with personalized, content-aware recalls. The framework achieves high predictive accuracy using chapter-level, progress-gated predictors and downstream re-engagement agents that integrate deeply with the course’s dialogic structure (Wang et al., 24 Aug 2025).
1. Formalization of Dropout and Student Interaction
CPADP operates on granular, chapter-level representations of learner progression through a $K$-chapter MAIC. For student $s$:
- $\mathcal{C} = \{c_1, \dots, c_K\}$ is the set of course chapters.
- $\mathcal{C}_s \subseteq \mathcal{C}$ are the chapters completed by $s$ (i.e., the instructor agent has presented all slides).
- $p_s = |\mathcal{C}_s| / K \in [0, 1]$ is normalized course progress.
- Dropout is defined as $y_s = 1$ if $|\mathcal{C}_s| < K$ (failure to complete all chapters); $y_s = 0$ otherwise.
- $H_s^{(k)}$ is the ordered sequence of interaction messages and timestamps up to the start of chapter $c_k$.
A single student’s record may yield multiple training instances: an index pair $(k_{\text{start}}, k_{\text{end}})$ with $k_{\text{start}} < k_{\text{end}}$ demarcates the start of the interaction history window and the end of the dropout prediction window, enabling temporally localized predictions as learners progress.
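To make the formalization concrete, here is a minimal sketch of the label and windowing logic. The record fields, chapter count, and helper names are illustrative assumptions; the paper does not publish code.

```python
from dataclasses import dataclass

K = 12  # hypothetical number of chapters in the course


@dataclass
class StudentRecord:
    completed_chapters: set[int]        # indices of chapters fully presented
    messages: list[tuple[float, str]]   # (timestamp, text), time-ordered


def progress(rec: StudentRecord, k_total: int = K) -> float:
    """Normalized course progress p_s = |C_s| / K."""
    return len(rec.completed_chapters) / k_total


def dropout_label(rec: StudentRecord, k_total: int = K) -> int:
    """y_s = 1 if the student failed to complete all chapters."""
    return int(len(rec.completed_chapters) < k_total)


def training_instances(rec: StudentRecord, chapter_starts: list[float]):
    """One instance per (k_start, k_end) pair: the history window holds all
    messages sent before chapter k_start began; the label is observed over the
    prediction window ending at chapter k_end."""
    instances = []
    for k_start in range(1, len(chapter_starts)):
        for k_end in range(k_start + 1, len(chapter_starts) + 1):
            history = [m for m in rec.messages if m[0] < chapter_starts[k_start - 1]]
            instances.append((history, k_start, k_end))
    return instances
```

One student record thus expands into several temporally localized instances, which is how 186 students can yield 1,201 labeled examples.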
2. Feature Engineering and Representation
CPADP transforms interaction logs into structured features grouped into:
- Textual Interaction Metrics
  - $f_{\text{msg}}$: average messages per completed chapter.
  - $f_{\text{len}}$: average token length per message.
  - $f_{\text{tfidf}}$: TF–IDF vectorized content embedding.
- Time-Based Features
  - $f_{\text{time}}$: mean inter-message interval within the history window, or time elapsed since the last message at chapter start.
- Progress Indicators
  - $f_{\text{prog}} = p_s$: normalized fraction of chapters finished before the current chapter.
These are concatenated into a feature vector $\mathbf{x}_s = [f_{\text{msg}}, f_{\text{len}}, f_{\text{tfidf}}, f_{\text{time}}, f_{\text{prog}}]$. For variants using pretrained LLM (PLM) encodings, $f_{\text{tfidf}}$ is replaced by a PLM embedding $f_{\text{plm}}$, yielding $\mathbf{x}_s^{\text{plm}}$.
CPADP’s architecture does not leverage demographics or self-reported learner traits, as empirical evidence indicates interaction features are more predictive of dropout (Wang et al., 24 Aug 2025).
3. Three-Stage Progress-Adaptive Prediction Architecture
Prediction in CPADP is modulated by a gating function $g(p_s)$ based on course progress $p_s$, partitioned into intervals $[0, \tau_1)$, $[\tau_1, \tau_2)$, and $[\tau_2, 1]$:
| Model Stage | Input Regime | Description |
|---|---|---|
| Zero-Shot (ZS) | $p_s \in [0, \tau_1)$ | LLM-based prediction with an unprimed prompt |
| Few-Shot (FS) | $p_s \in [\tau_1, \tau_2)$ | LLM-based prediction with a prompt including a small, curated set of labeled cases; $\tau_1$ corresponds to the chapter where 100 labels accrue |
| Fine-Tuned (FT) | $p_s \in [\tau_2, 1]$ | PLM encoder with an MLP head, trained on accumulated labels |
Formally, for a given pair $(\mathbf{x}_s, p_s)$, the prediction is
$$\hat{y}_s = g(p_s)(\mathbf{x}_s),$$
where $g(p_s)$ selects the ZS, FS, or FT predictor based on the interval containing $p_s$.
Within FT, the MLP classifier operates as
$$\hat{y}_s = \sigma\left(\mathbf{W}_2\,\mathrm{ReLU}(\mathbf{W}_1 \mathbf{x}_s + \mathbf{b}_1) + \mathbf{b}_2\right),$$
or, in the simplest logistic form,
$$\hat{y}_s = \sigma(\mathbf{w}^\top \mathbf{x}_s + b).$$
For ZS/FS, $\hat{y}_s$ is parsed directly from the LLM’s response to a prompt constructed over $H_s$.
A plausible implication is that the staged approach balances label scarcity at earlier course stages with model specificity at later ones.
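The gating logic can be sketched in a few lines. The thresholds below are illustrative placeholders; the paper's actual cutoffs (e.g., the 100-label point) are determined by label accrual, not fixed fractions:

```python
TAU1, TAU2 = 0.2, 0.5  # illustrative progress thresholds, not the paper's values


def select_stage(p_s: float) -> str:
    """Progress-gated model selection: ZS early, FS mid-course, FT late."""
    if p_s < TAU1:
        return "zero_shot"
    if p_s < TAU2:
        return "few_shot"
    return "fine_tuned"


def predict(x_s, p_s, models):
    """Dispatch to the stage-appropriate predictor: y_hat = g(p_s)(x_s)."""
    return models[select_stage(p_s)](x_s)
```

Because the dispatch is a pure function of $p_s$, new stages (or retrained FT checkpoints) can be swapped in without touching upstream feature extraction.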
4. Training Objective, Optimization, and Experimental Results
The FT (PLM+MLP) stage employs a weighted cross-entropy objective
$$\mathcal{L} = -\frac{1}{N}\sum_{n=1}^{N}\left[w_1\, y_n \log \hat{y}_n + w_0\,(1 - y_n)\log(1 - \hat{y}_n)\right],$$
where class weights $w_0$ and $w_1$ address an approximately 40% dropout class imbalance.
Key experimental details:
- Dataset: 186 students, 1,201 labeled instances.
- Split: 80% train (961 instances), 20% test (240 instances); cross-validation inside training.
- PLM+MLP training: learning rate , batch size 16, 3–5 epochs.
- No explicit validation set; hyperparameters determined by cross-validation on train.
- Metrics: Precision, Recall, F1, Accuracy.
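A NumPy sketch of the weighted objective above; the specific weight values are illustrative, not those used in the paper:

```python
import numpy as np


def weighted_bce(y_true, y_pred, w0=1.0, w1=1.5, eps=1e-7):
    """Weighted binary cross-entropy: w1 scales the dropout (positive) term,
    w0 the retention term, to counter class imbalance."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # guard against log(0)
    loss = -(w1 * y_true * np.log(y_pred) + w0 * (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(loss))
```

With $w_0 = w_1 = 1$ this reduces to standard binary cross-entropy; raising $w_1$ penalizes missed dropouts more heavily, trading precision for recall on the minority-sensitive class.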
Performance summary:
| Model | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|
| PLM+MLP (FT) | 0.966 | 0.906 | 0.935 | 0.954 |
| GPT-4 (FS) | — | — | — | 0.779 |
| GPT-4 (ZS) | — | — | — | 0.716 |
The 95.4% held-out accuracy confirms a substantial improvement over the zero-shot and few-shot baselines (Wang et al., 24 Aug 2025).
5. Personalized Recall Agent for Intervention
A downstream intervention mechanism, the Email-Agent, is invoked upon high-risk prediction. Built on an LLM (e.g., GPT-4), the recall agent:
- Incorporates the student’s name, last completed topic/slide, and relevant interaction snippets
- Adopts a motivational, supportive tone with previews of upcoming course content
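A sketch of how such a recall prompt might be composed before being sent to the LLM. The template and field names are illustrative; the paper does not publish its prompts:

```python
def build_recall_prompt(name, last_topic, snippets, next_topic):
    """Compose an LLM prompt for a personalized, content-aware recall email."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        f"Write a short, warm re-engagement email to {name}, who last completed "
        f"the chapter on '{last_topic}'.\n"
        f"Reference these moments from their past discussions:\n{context}\n"
        f"Preview the upcoming chapter on '{next_topic}' and encourage them to "
        f"return. Keep the tone motivational and supportive."
    )
```

Grounding the prompt in the student's own interaction snippets is what distinguishes this from a generic reminder email.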
Emails are dispatched mid-semester. Post-intervention analysis of 17 post-email logins shows that the 8 students recalled by the agent had been offline for an average of ~52 days and stood at ~0.75 progress, while the remaining 9 self-initiated logins had been offline only ~7.7 days and reached 2.56 progress on average. This suggests the recall agent efficiently targets dormant, at-risk learners rather than already-active students.
6. Empirical Insights, Generalization, and Future Directions
Empirical analysis demonstrates that:
- Interaction activity and text-derived embeddings are more predictive for dropout than demographic or psychographic data.
- A staged, progress-adaptive approach—ZS, FS, then FT—yields optimal cost–accuracy balance in practical deployments.
- PLM-based embeddings systematically outperform sparse TF–IDF features once sufficient label data accrues.
- Content-personalized, LLM-generated recalls can drive re-engagement among students most likely to drop out.
Although validated in a single MAIC course, the CPADP pipeline is described as modular and transferable to any interactive, conversation-logging online course. Future research directions outlined include broadening multi-course validation, adding richer temporal/social signal extraction, and rigorously A/B testing interventions across modalities (emails, in-platform prompts, SMS).
7. Significance within LLM-Enhanced Online Education
CPADP stands as a comprehensive, LLM-native framework integrating granular interaction analysis, dynamically staged predictive modeling, and automated, just-in-time personalized interventions. Its canonical formulation and empirical results set a functional template for dropout risk management in next-generation LLM-augmented education platforms, providing a technically robust and extensible methodology for both prediction and active retention management (Wang et al., 24 Aug 2025).