Birbal: An efficient 7B instruct-model fine-tuned with curated datasets (2403.02247v1)

Published 4 Mar 2024 in cs.CL

Abstract: LLMOps incur significant costs due to hardware requirements, hindering their widespread accessibility. Additionally, a lack of transparency in model training methods and data contributes to the majority of models being non-reproducible. To tackle these challenges, the LLM Efficiency Challenge was introduced at the NeurIPS Workshop, aiming to adapt foundation models to a diverse set of tasks via fine-tuning on a single GPU (RTX 4090 or A100 with 40GB) within a 24-hour timeframe. In this system description paper, we introduce Birbal, our Mistral-7B based winning model, fine-tuned on a single RTX 4090 for 16 hours. Birbal's success lies in curating high-quality instructions covering diverse tasks, resulting in a 35% performance improvement over the second-best, Qwen-14B based submission.


Summary

  • The paper demonstrates a 35% performance improvement over the second-best (Qwen-14B based) submission by fine-tuning Mistral-7B on a single RTX 4090 within the challenge's 24-hour limit.
  • The methodology emphasizes meticulous dataset curation and leverages a 4-bit QLoRA fine-tuning approach to optimize performance.
  • The work underscores the viability of efficient LLM fine-tuning under resource constraints, democratizing access to advanced AI models.

Efficiency in LLM Fine-Tuning Demonstrated by Birbal on a Single GPU

Introduction to Birbal's Success

LLMs have delivered notable few-shot advances across a wide range of NLP tasks. Amid this progress, high operational costs and limited reproducibility, stemming from undisclosed training methods and data, remain persistent obstacles. Addressing these issues, the LLM Efficiency Challenge was conceived, focusing on fine-tuning an open-source foundation model on a single GPU within a 24-hour limit. This paper introduces "Birbal," a Mistral-7B based model that won the challenge with a 35% performance improvement over its nearest competitor by leveraging a carefully curated dataset.

LLM Efficiency Challenge

The challenge asked participants to adapt an open-source base LLM to a broad spectrum of tasks, emphasizing efficiency and accessibility. Models such as Mistral-7B had to be fine-tuned under strict hardware and time constraints (a single RTX 4090 or 40GB A100 for at most 24 hours), using only open-source data. The initiative demonstrated that meaningful LLM adaptation is feasible without extensive computational resources.

Our Approach

Design Choices

Given the competition's constraints, we chose Mistral-7B as the base model because it offered the best balance between size and performance within the available memory budget. Our strategy was twofold: invest effort in dataset curation rather than hardware-level optimization, and prioritize high-quality, task-oriented data over sheer quantity.
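
As a rough back-of-the-envelope illustration of this memory budget (not a measurement from the paper), the sketch below estimates why a 4-bit quantized 7B model fits comfortably within the 24 GB of an RTX 4090; the LoRA adapter size and optimizer-state breakdown are assumed values.

```python
# Rough memory estimate for a 4-bit 7B model on a 24 GB RTX 4090.
# All numbers are illustrative assumptions, not figures from the paper.

def estimate_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory in GiB needed to store n_params at a given precision."""
    return n_params * bytes_per_param / 1024**3

base = estimate_gib(7.2e9, 0.5)         # Mistral-7B weights at 4 bits (~0.5 byte/param)
adapter = estimate_gib(2e7, 2 + 4 + 8)  # assumed ~20M LoRA params: bf16 weights + fp32 grads + Adam states

print(f"4-bit base weights:       ~{base:.1f} GiB")
print(f"LoRA adapter + optimizer: ~{adapter:.1f} GiB")
# The remaining budget is consumed by activations and CUDA overhead,
# which scale with sequence length and batch size.
```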

Data Curation

The curation process assembled a diverse dataset designed for broad task coverage, built by carefully selecting and sampling from existing open datasets into a unified collection of prompts and responses spanning various NLP domains. Dataset sizes for fine-tuning were chosen so that the required number of epochs could be completed within the time limit, ensuring efficient use of resources.
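
To make this curation step concrete, here is a minimal sketch of how a mixed instruction set might be assembled; the source files, field names, per-source caps, and output format are hypothetical stand-ins, not the exact recipe used for Birbal.

```python
# Illustrative sketch: sample from several open instruction datasets into a
# unified prompt/response collection. File names and caps are hypothetical.
import json
import random

random.seed(0)

def to_record(instruction: str, context: str, output: str) -> dict:
    """Normalize one example into a single prompt/response pair."""
    prompt = instruction if not context else f"{instruction}\n\n{context}"
    return {"prompt": prompt, "response": output}

# Per-source caps so no single dataset dominates the mix (assumed values).
sources = {
    "dolly.jsonl": 10_000,
    "flan_sample.jsonl": 50_000,
    "oasst.jsonl": 20_000,
}

mixed = []
for path, cap in sources.items():
    with open(path) as f:
        rows = [json.loads(line) for line in f]
    random.shuffle(rows)
    mixed.extend(
        to_record(r["instruction"], r.get("input", ""), r["output"])
        for r in rows[:cap]
    )

random.shuffle(mixed)
with open("birbal_mix.jsonl", "w") as f:
    for r in mixed:
        f.write(json.dumps(r) + "\n")
```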

Fine-Tuning Methodology

Fine-tuning employed 4-bit QLoRA, configured to keep the Mistral-7B model within the memory and time budget. Dataset sizes were adjusted so that the necessary epochs completed within the stipulated 24-hour window, and benchmarking against validation sets guided the selection of the optimal checkpoint for the competition submission.
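
For concreteness, the following is a minimal sketch of what a 4-bit QLoRA setup looks like with Hugging Face transformers, peft, and bitsandbytes; the LoRA rank, alpha, dropout, target modules, and checkpoint name are illustrative defaults rather than the paper's exact configuration, and the training loop itself is omitted.

```python
# Minimal 4-bit QLoRA setup for a Mistral-7B style model (hyperparameters
# are assumed defaults, not the competition configuration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "mistralai/Mistral-7B-v0.1"

# Load the base model with NF4 4-bit quantization and bf16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters to the attention projections; only these are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a small fraction of the 7B parameters
```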

Evaluation and Results

The Birbal models underwent rigorous assessment across multiple evaluation stages, demonstrating strong performance on a diverse array of tasks. Although not every variant advanced beyond the initial stages, Birbal-200K excelled, underscoring the effectiveness of the dataset curation and fine-tuning strategy in achieving high efficiency on a single GPU.

Conclusion and Broader Impact

The development of Birbal exemplifies how strategic dataset curation and fine-tuning approaches can significantly elevate LLM performance under stringent resource constraints. This work contributes to democratizing access to efficient LLM fine-tuning, potentially broadening participation in cutting-edge AI research. Nonetheless, it also underscores the inherent biases present within base models and source datasets, raising crucial considerations for future endeavors in this domain.

Acknowledgments and Reproducibility

The success of the Birbal model could not have been achieved without support from Lambda Labs for compute resources. Our commitment to transparency and reproducibility is evidenced by the public availability of datasets, fine-tuning scripts, and model artifacts, facilitating further exploration and application within the research community.