LESS: Efficient Data Selection for Targeted Instruction Tuning in LLMs
Introduction to LESS
LLMs have gained significant traction as general-purpose chatbots, capable of generating human-like text from provided instructions. However, real-world applications that demand specialized capabilities, such as advanced reasoning, require sifting through extensive instruction tuning datasets to identify and use only the most relevant data. This setting, termed "targeted instruction tuning," aims to develop specific skills in an LLM without training on the entire dataset, which may contain irrelevant or even counterproductive examples.
The proposed solution is LESS (Low-rank gradiEnt Similarity Search), a method for selecting influential data from large instruction tuning datasets. LESS estimates data influence with an optimizer-aware formulation and performs a low-rank gradient similarity search to find the examples most likely to improve the model's performance on a given target task.
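Concretely, the influence of a candidate training example z on a target validation example z' is approximated by how well their gradients align. A simplified sketch of the score, loosely following the paper's formulation, is

\[
\mathrm{Inf}(z, z') \;\approx\; \sum_{i=1}^{N} \bar{\eta}_i \,\cos\!\big(\nabla \ell(z'; \theta_i),\ \Gamma(z; \theta_i)\big),
\]

where the sum runs over checkpoints θ_i of a brief warmup training run, \bar{\eta}_i is the average learning rate at epoch i, and Γ is the optimizer-aware update direction described in the next subsection. Examples whose gradients consistently point in the same direction as the target task's gradients receive high scores and are kept.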
LESS: The Underlying Mechanism
Compatibility with Instruction Tuning
At its core, LESS adapts existing influence estimation methods to work with the Adam optimizer and with variable-length instruction data. These adaptations matter because classical influence formulations assume plain SGD updates, whereas LLMs are almost always fine-tuned with Adam, whose momentum and adaptive per-parameter learning rates change the update direction. LESS therefore scores examples using the Adam update direction rather than the raw gradient, and it uses a length-normalized (cosine) similarity so that short sequences, which tend to have larger gradient norms, are not systematically favored.
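As an illustration of the optimizer-aware part, the sketch below shows how an Adam-style update direction could be computed from a per-example gradient and the optimizer's moment estimates. The function name and signature are hypothetical, not LESS's actual interface:

```python
import torch

def adam_update_direction(grad, exp_avg, exp_avg_sq, step,
                          beta1=0.9, beta2=0.999, eps=1e-8):
    """Approximate the direction Adam would move the parameters in,
    given a per-example gradient and the optimizer's moment estimates.
    Illustrative sketch only; names and signature are assumptions."""
    # Update the biased first and second moment estimates as Adam would.
    m = beta1 * exp_avg + (1 - beta1) * grad
    v = beta2 * exp_avg_sq + (1 - beta2) * grad.pow(2)
    # Apply Adam's bias correction.
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)
    # This preconditioned direction replaces the raw gradient when scoring examples.
    return m_hat / (v_hat.sqrt() + eps)
```

The key point is that two examples are compared by the directions Adam would actually move the model in, not by their raw gradients.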
Efficiency Through LoRA and Random Projections
To address the computational and storage overhead of full-model gradients, LESS uses LoRA (Low-Rank Adaptation) together with random projections to construct a gradient datastore of low-dimensional gradient features. Dataset selection then operates on these compact features, and because the datastore can be reused for new target tasks, the up-front cost of building it is amortized.
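The following sketch illustrates the compression step under stated assumptions: per-example LoRA gradients are flattened and multiplied by a fixed random matrix so that every example gets a small, comparable feature vector. Names are illustrative, and in practice a memory-efficient projection implementation would be used rather than materializing a dense matrix:

```python
import torch

def project_gradient(lora_grad: torch.Tensor, proj_dim: int = 8192, seed: int = 0) -> torch.Tensor:
    """Compress a flattened LoRA gradient into a low-dimensional feature
    via a Johnson-Lindenstrauss-style random projection (illustrative sketch)."""
    d = lora_grad.numel()
    # A shared seed ensures every example is projected with the same matrix,
    # so inner products between features approximate those between full gradients.
    gen = torch.Generator().manual_seed(seed)
    proj = torch.randn(d, proj_dim, generator=gen) / proj_dim ** 0.5
    return lora_grad.flatten() @ proj

# Building the reusable datastore: one projected gradient per training example, e.g.
# datastore = torch.stack([project_gradient(g) for g in per_example_lora_grads])
```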
Transferable Knowledge Across Models
A significant advantage of LESS is its ability to select data using gradients from smaller models to induce strong performance in larger models or even different model families. This transferability is crucial for practical applications where computational resources may be limited.
Interpretable Data Selection
LESS diverges from traditional methods that often rely on surface form cues for data selection. Instead, it focuses on identifying data that showcases similar reasoning and skill types required for the target task. This approach ensures that the selected data aligns more closely with the specific capabilities being targeted, rather than merely matching on language or topic.
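To make the selection step concrete, here is a minimal sketch, assuming projected gradient features have already been computed for both the candidate pool and a handful of target-task validation examples (function and variable names are illustrative, not LESS's actual interface):

```python
import torch
import torch.nn.functional as F

def select_top_k(train_feats: torch.Tensor, target_feats: torch.Tensor, k: int) -> torch.Tensor:
    """Rank candidate training examples by their average cosine similarity to the
    target task's validation gradient features and return the indices of the top k.
    train_feats:  (N, d) projected gradients of candidate training examples.
    target_feats: (M, d) projected gradients of target-task validation examples."""
    train_feats = F.normalize(train_feats, dim=-1)
    target_feats = F.normalize(target_feats, dim=-1)
    scores = (train_feats @ target_feats.T).mean(dim=-1)  # (N,) alignment scores
    return scores.topk(k).indices

# Keep e.g. the 5% of examples best aligned with the target task:
# chosen = select_top_k(datastore, target_datastore, k=int(0.05 * len(datastore)))
```

Because the ranking is driven by gradient alignment rather than lexical overlap, an example can be selected even when its surface text looks nothing like the target task, as long as it exercises the same underlying skill.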
Experimental Findings and Implications
The effectiveness of LESS is demonstrated on diverse downstream tasks, where training on only a 5% subset selected by LESS often outperforms training on the full dataset. This underscores the potential of LESS to enable more focused and efficient training, especially when the available dataset is far larger than the in-domain data a specialized task actually needs.
Additionally, the ability of LESS to select transferable data across models introduces a promising avenue for reducing the computational costs associated with data selection and model training. Smaller models can be utilized to curate training datasets for larger, more complex models, facilitating a more resource-efficient workflow without compromising performance.
The Road Ahead
While LESS presents a significant advance in targeted instruction tuning for LLMs, several avenues remain open for further exploration. These include extending LESS for real-time model adaptation, optimizing the algorithm for even greater efficiency, and investigating its potential for reducing unintended model biases by selectively focusing on data that promotes fairness and inclusivity.
In summary, LESS stands as a testament to the potential of intelligent data selection in unlocking more specialized and efficient capabilities within the field of LLMs, paving the way for their broader application across a myriad of tasks demanding high degrees of specificity and complexity.