MathEDU: Towards Adaptive Feedback for Student Mathematical Problem-Solving (2505.18056v1)

Published 23 May 2025 in cs.CL

Abstract: Online learning enhances educational accessibility, offering students the flexibility to learn anytime, anywhere. However, a key limitation is the lack of immediate, personalized feedback, particularly in helping students correct errors in math problem-solving. Several studies have investigated the applications of LLMs in educational contexts. In this paper, we explore the capabilities of LLMs to assess students' math problem-solving processes and provide adaptive feedback. The MathEDU dataset is introduced, comprising authentic student solutions annotated with teacher feedback. We evaluate the model's ability to support personalized learning in two scenarios: one where the model has access to students' prior answer histories, and another simulating a cold-start context. Experimental results show that the fine-tuned model performs well in identifying correctness. However, the model still faces challenges in generating detailed feedback for pedagogical purposes.

Summary

  • The paper explores using large language models (LLMs) to provide adaptive feedback for students solving mathematical problems, evaluating their capabilities on the MathEDU dataset of annotated student solutions.
  • Key findings indicate that larger LLMs like Llama3 70B perform better in assessing answer accuracy but struggle with generating detailed pedagogical feedback, even with fine-tuning.
  • The research suggests future work should focus on enhancing LLMs' ability to generate concise and accurate feedback, potentially by integrating dynamic student profiles or expanding analysis to other educational disciplines.

Analyzing MathEDU: Adaptive Feedback in Student Mathematical Problem-Solving

The paper "MathEDU: Towards Adaptive Feedback for Student Mathematical Problem-Solving" introduces an innovative exploration into leveraging LLMs to improve the educational process for students engaged in mathematical problem-solving tasks. It focuses on the development and evaluation of an AI-based system aimed at providing personalized and adaptive feedback to students as they navigate mathematical problems. This model is evaluated within the context of the MathEDU dataset, which comprises authentic student solutions accompanied by expert teacher annotations.

Dataset and Methodology

The MathEDU dataset is a pivotal component of this research, containing 4,048 annotated entries where student solutions to GRE-level mathematical problems have been meticulously reviewed and graded by mathematics experts. The dataset not only includes the final accuracy of student responses but also records the detailed problem-solving processes provided by each student, along with teacher feedback identifying errors and offering corrective guidance. Error types are categorized systematically into "Wrong Mathematical Operation/Concept," "Calculation Error," "Incomplete Answer," and others. This structured approach provides the necessary foundation for evaluating LLMs' capacity to analyze student reasoning and provide constructive feedback.
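
To make the dataset's structure concrete, the sketch below models one annotated entry. The field names and dataclass layout are hypothetical illustrations inferred from the paper's description, not the dataset's actual format; only the error-category labels are taken from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class ErrorType(Enum):
    # Category labels as reported in the paper's error taxonomy.
    WRONG_OPERATION_OR_CONCEPT = "Wrong Mathematical Operation/Concept"
    CALCULATION_ERROR = "Calculation Error"
    INCOMPLETE_ANSWER = "Incomplete Answer"
    OTHER = "Other"  # the paper lists further categories ("and others")

@dataclass
class MathEDURecord:
    """One annotated entry; field names are hypothetical, structure per the paper."""
    student_id: str
    attempt_index: int                 # hypothetical ordering, used for history splits
    problem: str                       # GRE-level problem statement
    solution_steps: list[str]          # the student's written solving process
    final_answer: str
    is_correct: bool                   # expert-graded final accuracy
    error_types: list[ErrorType] = field(default_factory=list)
    teacher_feedback: Optional[str] = None  # corrective guidance when incorrect
```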

The methodology fine-tunes LLMs with LoRA adapters under three training paradigms: single-task, multi-task, and end-to-end. These are tested under two dataset-splitting strategies that simulate real-world deployment: one in which the model has access to students' prior answering histories, and a cold-start setting with no prior data. Models are evaluated in both zero-shot and few-shot prompting settings. Llama3 8B, Llama3 70B, and GPT-3.5 serve as the primary models, compared against a more advanced model, o1-mini.
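
The following sketch illustrates this setup: the two splitting scenarios and a LoRA configuration via Hugging Face PEFT. The split logic, the 80/20 fraction, the choice of target modules, and all hyperparameters are illustrative assumptions, not the paper's reported settings.

```python
# pip install transformers peft
from collections import defaultdict

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

def split_records(records, train_frac=0.8):
    """Sketch of the two evaluation scenarios described in the paper.

    Prior-history: each student's earlier attempts train, later ones test,
    so the model can condition on that student's past work. Cold-start:
    entire students are held out, so no history exists at test time.
    Assumes MathEDURecord-style objects with student_id and attempt_index.
    """
    by_student = defaultdict(list)
    for r in records:
        by_student[r.student_id].append(r)

    # Scenario 1: prior answering history available (within-student split).
    hist_train, hist_test = [], []
    for attempts in by_student.values():
        attempts.sort(key=lambda r: r.attempt_index)
        cut = int(len(attempts) * train_frac)
        hist_train += attempts[:cut]
        hist_test += attempts[cut:]

    # Scenario 2: cold start (hold out whole students).
    students = sorted(by_student)
    cut = int(len(students) * train_frac)
    cold_train = [r for s in students[:cut] for r in by_student[s]]
    cold_test = [r for s in students[cut:] for r in by_student[s]]
    return (hist_train, hist_test), (cold_train, cold_test)

# LoRA adaptation of a base model; hyperparameter values are placeholders.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # a common choice for Llama-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

The cold-start split is the stricter test of the two, since it prevents the model from leaning on any per-student history at evaluation time.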

Key Findings

The results indicate that larger LLMs, specifically Llama3 70B, exhibit significantly better performance in assessing answer accuracy and identifying correct problem-solving paths compared to smaller models such as Llama3 8B. While fine-tuning enhances model effectiveness, especially in identifying correctness, it remains limited in generating detailed pedagogical feedback, highlighting an area for further improvement.

For the error identification task, LLMs had difficulty accurately pinpointing erroneous steps, particularly when rationales were introduced, potentially due to format discrepancies between the rationales and student solutions. The end-to-end model, however, made markedly better use of rationales.

Feedback generation proved the most demanding task. Large models tend to produce verbose, unfocused feedback, whereas fine-tuned models frequently underperform because of their limited reasoning capabilities.

Implications and Future Directions

This research underscores several implications for the future development of AI-driven educational tools. While LLMs already show promise in automating answer accuracy assessments, additional refinement in their ability to generate adaptive feedback is essential for practical applications. This includes enhancing models' interpretative capabilities to produce concise yet accurate pedagogical suggestions for diverse student problem-solving styles.

Future work should explore integrating dynamic student profiles into model architectures or embedding strategies, to better represent individual learning progress and variation in problem-solving style. Additionally, extending the analysis beyond mathematics to other educational disciplines may yield broader insights into LLMs' applicability.

Ultimately, this paper presents a methodical approach to exploring LLMs' potential as educational assistants, particularly in mathematics. It challenges existing models to adapt their reasoning and feedback capabilities to better serve the needs of educators and students alike in fostering a more personalized and effective learning environment.