Affordably Fine-tuned LLMs Provide Better Answers to Course-specific MCQs (2501.05891v2)

Published 10 Jan 2025 in cs.CL and cs.AI

Abstract: In education, the capability of LLMs to generate human-like text has inspired work on how they can increase the efficiency of learning and teaching. We study the affordability of these models for educators and students by investigating how LLMs answer multiple-choice questions (MCQs) under hardware constraints and with different refinement techniques. We explore this space using generic pre-trained LLMs (the 7B, 13B, and 70B variants of LLaMA-2) to answer 162 undergraduate-level MCQs from a course on Programming Languages (PL) -- the MCQ dataset is a contribution of this work, which we make publicly available. Specifically, we dissect how different factors, such as using readily available material -- (parts of) the course's textbook -- for fine-tuning and quantisation (to decrease resource usage), can change the accuracy of the responses. The main takeaway is that smaller, textbook-fine-tuned models outperform generic larger ones (whose pre-training requires substantial resources), making the use of LLMs to answer MCQs affordable in terms of both resources and materials.

Summary

  • The paper demonstrates that fine-tuning quantised LLaMA-2 models significantly boosts performance on undergraduate programming MCQs.
  • The study uses the LoRA and QLoRA methods to reduce memory usage, enabling cost-effective fine-tuning and deployment on resource-constrained hardware.
  • Results show that smaller models, especially the 13B variant, can rival larger models when fine-tuned with course-specific content.

Affordably Fine-tuned LLMs Provide Better Answers to Course-specific MCQs

In the examination of educational applications of LLMs, the paper "Affordably Fine-tuned LLMs Provide Better Answers to Course-specific MCQs" provides a meticulous analysis of how LLMs can be tailored to perform efficiently within specific academic domains. The authors, Bianca Raimondi, Saverio Giallorenzo, and Maurizio Gabbrielli, focus on the domain of Programming Languages (PL) at the undergraduate level, exploring the ability of variously fine-tuned LLaMA-2 models to answer multiple-choice questions (MCQs) effectively in this context.

The central premise of the paper lies in the affordability and practicality of leveraging smaller, fine-tuned LLMs over their larger pre-trained counterparts. The authors evaluated three variants of LLaMA-2 (7B, 13B, and 70B) by gauging their performance on a newly developed dataset of 162 MCQs spanning various PL topics. Notably, this collection of MCQs is itself a contribution of the paper and has been made publicly accessible.
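
To make the evaluation concrete, the sketch below shows one plausible way to score a model on such an MCQ dataset. The prompt format, the generate helper, and the dataset fields are illustrative assumptions, not the paper's exact protocol.

    # Hypothetical MCQ-accuracy harness. `generate` is an assumed helper that
    # wraps model inference (prompt in, text out); dataset fields are assumed.
    import re

    def ask_mcq(generate, question: str, options: dict[str, str]) -> str | None:
        prompt = question + "\n" + "\n".join(f"{k}) {v}" for k, v in options.items())
        prompt += "\nAnswer with a single letter:"
        reply = generate(prompt)
        match = re.search(r"\b([A-D])\b", reply.upper())  # first standalone A-D
        return match.group(1) if match else None

    def accuracy(generate, mcqs: list[dict]) -> float:
        hits = sum(ask_mcq(generate, q["question"], q["options"]) == q["answer"]
                   for q in mcqs)
        return hits / len(mcqs)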

Methodological Overview

The research employs a structured approach, initially assessing the competence of the generic LLaMA-2 variants. The findings indicate that the lighter models, specifically the 7B and 13B quantised versions, can perform comparably to larger ones when fine-tuned on course-specific material, such as textbooks or course notes. This fine-tuning process relies on the LoRA and QLoRA methods, which reduce the number of parameters that require updating, thereby minimizing memory usage and making the process feasible on resource-constrained hardware.
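
As a rough illustration of what such a setup involves, the following sketch combines 4-bit quantisation with LoRA adapters using the HuggingFace transformers, peft, and bitsandbytes libraries. The model name, adapter rank, and target modules are illustrative choices, not the paper's reported configuration.

    # Illustrative QLoRA setup: 4-bit base model + low-rank adapters.
    # Hyperparameters below are placeholders, not the paper's exact settings.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model_id = "meta-llama/Llama-2-7b-hf"

    # Load the base model with 4-bit (NF4) quantised weights.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
    model = prepare_model_for_kbit_training(model)

    # LoRA trains small low-rank adapter matrices instead of the full weights.
    lora_config = LoraConfig(
        r=16,                                 # adapter rank (illustrative)
        lora_alpha=32,                        # scaling factor
        target_modules=["q_proj", "v_proj"],  # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of all parameters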

A notable aspect of this methodology is its inquiry into the quantised alternatives of LLaMA-2, which offer a means to deploy these models on cost-effective hardware configurations, aligning with the paper's emphasis on affordability. The research underscores the potential of these quantised, fine-tuned models to maintain or even surpass the performance of larger pre-trained models when answering course-specific MCQs, positing a practical advantage for educational settings.
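
The affordability argument rests on simple arithmetic about weight storage. The back-of-the-envelope estimate below (weights only; activations and the KV cache add more) shows why 4-bit quantisation brings even the 13B variant within reach of consumer GPUs.

    # Approximate memory for model weights alone (illustrative estimate).
    def weight_gb(params_billion: float, bits_per_weight: int) -> float:
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    for size in (7, 13, 70):
        print(f"LLaMA-2 {size}B: fp16 ~ {weight_gb(size, 16):.0f} GB, "
              f"4-bit ~ {weight_gb(size, 4):.1f} GB")
    # LLaMA-2 7B:  fp16 ~ 14 GB,  4-bit ~ 3.5 GB
    # LLaMA-2 13B: fp16 ~ 26 GB,  4-bit ~ 6.5 GB
    # LLaMA-2 70B: fp16 ~ 140 GB, 4-bit ~ 35.0 GB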

Results and Implications

The experimental results affirm that fine-tuning the smaller LLaMA-2 variants significantly enhances their accuracy. In particular, the 13B quantised variant consistently exceeded expectations, demonstrating the potential of quantised fine-tuning to produce accurate and efficient pedagogical tools. Importantly, the research reveals that these models achieve marked improvements in domain-specific accuracy when the relevant textbook chapters are used for fine-tuning.

One crucial observation is the models' susceptibility to catastrophic forgetting, a phenomenon where smaller models tend to lose previously acquired information after fine-tuning. The authors mitigated this by choosing appropriate hyperparameters and fine-tuning strategies. Additionally, the correlation analysis within the paper highlights the varying impacts of these hyperparameters, showing that quantisation, the learning rate, and the choice of fine-tuning dataset profoundly influence model performance.
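
A simple way to picture this ablation is as a sweep over the three influential factors. The grid below is a hypothetical scaffold (the values and the train_and_eval driver are placeholders), not the paper's actual experimental grid.

    # Hypothetical sweep over the factors the paper identifies as influential.
    from itertools import product

    quant_bits = [4, 8, 16]                # quantisation levels
    learning_rates = [1e-5, 2e-5, 1e-4]    # lower rates can curb catastrophic forgetting
    datasets = ["full_textbook", "relevant_chapters"]

    for bits, lr, data in product(quant_bits, learning_rates, datasets):
        run_name = f"llama2-13b_q{bits}_lr{lr}_{data}"
        print(run_name)
        # train_and_eval(run_name, bits, lr, data)  # hypothetical driver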

The theoretical implications signal a refined understanding of optimal model size and tuning strategies within educational systems. Practically, the ability of institutions to deploy these refined LLMs on consumer hardware heralds a democratization of AI tools in learning environments, enhancing educational access and customization without imposing significant cost burdens.

Future Prospects

The paper identifies several avenues for future exploration, such as expanding this investigatory framework to encompass diverse academic disciplines beyond Programming Languages. This could involve adapting multimodal models, integrating visual content, and evaluating the blend of AI-generated pedagogical support with human oversight in teaching practices.

Moreover, the work points towards developing strategies that could alleviate the hardware demands of larger models, potentially expanding the utility of such sophisticated LLMs in varied educational contexts. Investigating cloud-based solutions or more economical resource allocation strategies further represents a pragmatic direction for scaling these benefits.

Overall, the paper contributes a substantive analysis of the utility and practicality of fine-tuned LLMs in educational settings, presenting a compelling case for their potential to support and enhance academic learning outcomes within resource-constrained environments.
