Developing Authentic Simulated Learners for Mathematics Teacher Learning: Insights from Three Approaches with Large Language Models

Published 6 Apr 2026 in cs.HC | (2604.04361v1)

Abstract: LLM simulations, where LLMs act as students with varying approaches to learning tasks, can support teachers' noticing of student thinking. However, simulations using zero- or few-shot prompting often yield inauthentic knowledge and language, directing teachers to unrealistic reasoning. We evaluate three approaches (Fine-tuning, Multi-agent, and Direct Preference Optimization; DPO) to improve the authenticity and pedagogical utility of simulated students. All approaches improve cognitive and linguistic authenticity, compared with few-shot prompts. Interviews with elementary mathematics pre-service teachers and researchers (\textit{n} = 8) reveal distinct pedagogical affordances. The fine-tuned model produces realistic, brief responses but limits opportunities to extend students' thinking. Meanwhile, the multi-agent and DPO approaches generate explicit reasoning behind student strategies. We discuss implications for designing LLM simulations that balance authenticity with instructional utility for teacher learning.

Abstract PDF Upgrade to Chat

Authors (7)

Summary

The paper demonstrates that advanced LLM adaptation methods measurably improve both cognitive plausibility and linguistic realism in math learner simulations.
The fine-tuning approach leverages real student data, while multi-agent and DPO methods incorporate reflective critique to produce contextually appropriate responses.
Educator feedback indicates that DPO achieves high authenticity and diverse reasoning, offering actionable insights for scalable teacher training simulations.

Developing Authentic Simulated Learners for Mathematics Teacher Education: Comparative Analysis of Fine-Tuning, Multi-Agent, and DPO Approaches

Introduction

The increasing integration of LLM-based simulations in Practice-Based Teacher Education (PBTE) addresses significant limitations of traditional training modalities by providing scalable, dynamic opportunities for teacher candidates to develop professional noticing of student mathematical thinking. However, prior LLM approaches relying on zero- or few-shot prompting inadequately mirror authentic student cognition and discourse, often resulting in verbose, overly sophisticated, or inconsistent simulations. This paper systematically evaluates three methods to enhance the authenticity and educational utility of simulated students: Fine-tuning on domain-specific learner data, Multi-agent architectures with reflective critique and self-correction loops, and Direct Preference Optimization (DPO) using synthetic preference pairs.

Figure 1: Prompt and interface for the simulated agent "Josh" targeting elementary fraction sense.

Comparative Methodological Overview

The study centers on a well-established mathematics diagnostic task—finding a fraction between 2/3 and 7/8—undertaken by pre-service teachers (PSTs) interacting with simulated fifth-grade student "Josh" in multi-turn chat-based settings. Across numerous authentic and inauthentic interaction instances, authenticity of LLM outputs is evaluated along two axes: Cognitive plausibility (consistency with the student profile, logical reasoning congruency, and typical solution pathways) and Linguistic realism (age-appropriate tone, avoidance of disciplinary jargon, and variability in surface features).

The three LLM adaptation strategies are as follows:

Fine-tuning: GPT-4.1 is adapted on a curated corpus comprising authentic student utterances from PST interactions, AI-tutoring transcripts (e.g., Khan Academy), and annotated classroom dialogue datasets.
Multi-Agent Architecture: A hierarchical system where an Initial Responder generates candidate outputs, an Evaluator critiques those outputs using the authenticity framework, and a Refiner revises responses, iteratively enhancing plausibility and alignment.
Direct Preference Optimization (DPO): Synthetic preference pairs are generated via reflexive agent critique and revision, with the LLM directly trained on these preferences using the DPO algorithm, bypassing explicit reward model construction.
Figure 2: Overview of approaches showing LLM selection and feedback-driven refinement mechanisms.

Quantitative Outcomes on Authenticity

Rigorous double-coded evaluation with Generalized Linear Mixed Models (GLMM) and McNemar’s tests reveals that all three advanced methods substantially outperform few-shot prompting baselines in both cognitive and linguistic authenticity (Fine-tuning: $p=.007$ / $p<.001$ ; Multi-Agent: $p<.001$ / $p=.003$ ; DPO: $p=.013$ / $p<.001$ ). DPO achieves the highest descriptive performance, consistently generating 100% authentic language and 88.7% authentic cognition on the test set. However, no statistically significant differences emerge between the advanced approaches in aggregate GLMM analyses, indicating convergent effectiveness across methods in the current simulation context.

Figure 3: Performance of each approach across cognition and language authenticity dimensions.

Qualitative Analysis: Educator Perceptions and Pedagogical Implications

Structured interviews with teacher candidates and mathematics education researchers reveal nuanced affordances of each approach beyond aggregate authenticity scores:

Fine-Tuning consistently produces concise, demotivated student-like responses characteristic of classroom disengagement, prompting PSTs to use more open-ended elicitation strategies. However, such brevity can restrict opportunities for scaffolded reasoning.
Multi-Agent outputs are linguistically rich and display explicit meta-cognitive reasoning. While this can simulate high-performing students or support teachers in tracing thinking processes, it sometimes results in inauthentic verbosity and overly sophisticated explanations.
DPO is systematically preferred in educator rankings for its capacity to simulate diverse reasoning trajectories and articulate uncertainty in contextually appropriate ways. However, both DPO and Multi-Agent versions occasionally manifest uncertainty inconsistently with prior agent knowledge, which some participants find unrealistic.
Figure 4: User preference and perceived authenticity rankings for the three approaches.

Adaptive questioning emerges as a key instructional outcome: Fine-tuned agents promote more probing and open-ended scaffolding, whereas Multi-Agent/DPO agents facilitate deeper explanatory questioning (e.g., "why" and "how" prompts). All versions support PSTs’ reflection on the spectrum of authentic student responses and adaptive teacher noticing skills, but agent expressiveness must be balanced to avoid demotivation or cognitive overload.

Discussion

The findings demonstrate that Fine-tuning, Multi-Agent, and DPO pipelines each remediate the core authenticity limitations of prompt-only LLM student simulations. Each mechanism entails intrinsic trade-offs: Fine-tuning capitalizes on real classroom data but is limited in adaptivity when data are scarce; Multi-Agent systems excel in generating explicit reasoning but suffer from greater latency and verbosity; DPO generalizes well with limited paired preferences and aligns closely with desired student behaviors when carefully scaffolded. The study underscores the necessity of selecting and adapting techniques based on the target pedagogical context and the specific goals of simulation-based teacher training.

Notably, DPO and reflexive Multi-Agent feedback present a scalable frontier for educational LLM simulation, especially in domains where reliable annotated student data are limited. However, handling nuanced uncertainty expression and reasoning trajectories remains an open challenge, as does scaling simulation complexity to multi-student classroom settings.

Conclusion

This study provides robust evidence that advanced LLM adaptation methods—Fine-tuning, Multi-Agent collaboration, and Direct Preference Optimization—can each significantly improve the authenticity and pedagogical usefulness of simulated learners in mathematics teacher education. DPO approaches are favored in head-to-head educator comparisons, but all evaluated methods offer distinct advantages and implementation trade-offs. Future research should pursue integration with knowledge tracing mechanisms to ameliorate uncertainty misalignment, expand to collaborative multi-agent simulation of diverse classrooms, and iteratively ground agent preferences in expanded human-annotated datasets. As LLM-based teacher education tools mature, careful framework design to balance authenticity, instructional feedback, and operational efficiency will be essential for maximizing impact on professional noticing and classroom responsiveness.

Reference: See "Developing Authentic Simulated Learners for Mathematics Teacher Learning: Insights from Three Approaches with LLMs" (2604.04361).

Markdown Report Issue