Diversity-driven Data Selection for Language Model Tuning through Sparse Autoencoder (2502.14050v2)

Published 19 Feb 2025 in cs.CL, cs.AI, and cs.LG

Abstract: Instruction tuning data are often quantity-saturated due to the large volume of data collection and fast model iteration, leaving data selection important but underexplored. Existing quality-driven data selection methods, such as LIMA (NeurIPS 2023) and AlpaGasus (ICLR 2024), generally ignore the equal importance of data diversity and complexity. In this work, we aim to design a diversity-aware data selection strategy and creatively propose using sparse autoencoders (SAEs) to tackle the challenge of data diversity measure. In addition, SAEs can also provide more interpretability of model behavior and explain, e.g., the surprising effectiveness of selecting the longest response (ICML 2024). Using effective data selection, we experimentally prove that models trained on our selected data can outperform other methods in terms of model capabilities, reduce training cost, and potentially gain more control over model behaviors. We prove that SAEs can serve as a good alternative to diversity measure and design our method to be scalable for potential industrial large-scale pruning, and we will also release our trained SAEs for use by the broader community.

Diversity-driven Data Selection for LLM Tuning through Sparse Autoencoder

The paper "Diversity-driven Data Selection for LLM Tuning through Sparse Autoencoder" presents an approach to optimizing data selection for instruction tuning of LLMs. Instruction tuning is instrumental in aligning LLMs with human instructions, but the abundance of data produced by large-scale collection and rapid model iteration makes coreset selection crucial yet understudied. The authors address this gap by emphasizing that data diversity and complexity matter as much as quality, a point overlooked by existing quality-driven methods such as LIMA and AlpaGasus. To this end, they propose a strategy that uses sparse autoencoders (SAEs) to measure data diversity, which also provides interpretability into model behavior.
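The core idea of using an SAE as a diversity measure can be illustrated with a minimal sketch. The paper's actual SAE architecture and thresholds are not specified in this summary, so the dimensions, random weights, and activation threshold below are hypothetical stand-ins; the point is only the mechanism: encode hidden states into sparse feature activations and treat the set of distinct features a dataset activates as a diversity proxy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: model hidden size d, SAE dictionary size m >> d.
d, m = 16, 64
W_enc = rng.normal(scale=0.1, size=(d, m))  # untrained encoder weights (illustrative)
b_enc = np.zeros(m)

def sae_features(h):
    """Encode a hidden state h into sparse feature activations (ReLU encoder)."""
    return np.maximum(h @ W_enc + b_enc, 0.0)

def active_feature_set(h, threshold=0.05):
    """Indices of SAE features that fire above a threshold for one example."""
    return set(np.flatnonzero(sae_features(h) > threshold))

# Diversity proxy for a candidate dataset: the number of distinct SAE
# features its examples activate in total.
hiddens = rng.normal(size=(10, d))  # stand-in for LLM activations of 10 examples
covered = set().union(*(active_feature_set(h) for h in hiddens))
print(len(covered))  # count of distinct activated features
```

Because trained SAE features tend toward monosemanticity, a larger covered set plausibly corresponds to a wider spread of concepts in the selected data, which is what the selection algorithms below exploit.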

Key Contributions and Findings

  1. Diversity-Aware Data Selection: The paper introduces a diversity-aware approach to data selection using sparse autoencoders, setting a new direction for optimizing instruction-tuning data. Diversity is measured through the monosemantic features an SAE activates, which are largely independent of one another and thus characterize each example accurately.
  2. Algorithms for Data Selection: Two novel algorithms are introduced: SAE-GreedSelect, which selects a limited number of data entries by maximizing feature utilization, and SAE-SimScale, which scales up selection through similarity-based sampling. Both emphasize efficiency and effectiveness and prove superior in various experimental scenarios.
  3. Empirical Validation: Models trained on the selected datasets outperform comparative methods in instruction-following capability while reducing training cost and improving control over model behavior. These advantages hold across datasets including Alpaca and WizardLM_evol_instruct_70k, and the analysis offers insight into why simple heuristics such as selecting the longest responses are effective.
  4. Scalability and Flexibility: The proposed methods are not only effective but also scalable across data sizes. SAE-SimScale in particular yields the best results on larger datasets, highlighting its robustness.
  5. Comprehensive Evaluation: Evaluation spans IFEval for strict adherence to complex instructions, LLM- and human-as-a-judge qualitative assessment, and knowledge-intensive benchmarks such as MMLU and ARC.
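The summary describes SAE-GreedSelect as "maximizing feature utilization" under a selection budget but does not spell out its exact objective, so the sketch below is a hypothetical greedy coverage formulation of that idea: repeatedly pick the example whose SAE feature set adds the most not-yet-covered features. The `feature_sets` input is assumed to come from an SAE pass over each candidate example's activations.

```python
def greedy_feature_cover(feature_sets, budget):
    """Greedy coverage-style selection over SAE feature sets.

    feature_sets: list of sets of activated SAE feature indices, one per
    candidate training example.
    budget: maximum number of examples to select.
    Returns the indices of selected examples, in selection order.
    """
    covered, selected = set(), []
    remaining = list(range(len(feature_sets)))
    while remaining and len(selected) < budget:
        # Pick the candidate with the largest marginal feature gain.
        best = max(remaining, key=lambda i: len(feature_sets[i] - covered))
        if not feature_sets[best] - covered:
            break  # no candidate adds any new feature
        selected.append(best)
        covered |= feature_sets[best]
        remaining.remove(best)
    return selected

feats = [{0, 1, 2}, {2, 3}, {4}, {0, 1}]
print(greedy_feature_cover(feats, budget=2))  # → [0, 1]
```

This greedy scheme is a plausible reading of the budgeted variant; SAE-SimScale would instead relax the hard coverage objective into similarity-based sampling so that selection remains tractable at larger scales.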

Implications and Speculations for Future Developments

This research potentially shifts the paradigm in LLM instruction tuning by prioritizing data diversity and leveraging SAEs for interpretable feature extraction. The implications extend beyond mere performance improvement; they suggest a pathway to achieving more controlled, efficient, and scalable instruction-tuning processes.

The approach can theoretically be adapted to optimize various model objectives or constraints, potentially leading to advances in AI's adaptability to diverse and complex human instructions. By elucidating the role of feature richness and diversity, this paper adds a layer of transparency and interpretability to LLM operations, crucial for informed decision-making and robust AI system development.

Future work could expand this framework to other fine-tuning areas, such as preference data selection or safety and bias mitigation strategies. Enhancing the versatility and application scope of these algorithms could further the development of more generalized AI systems adaptable to an extensive range of tasks and scenarios.

In sum, this paper presents significant advancements in using SAEs for data selection, providing clear benefits in LLM tuning and opening up new avenues for research and application in the field of data-centric AI solutions.

Authors (8)
  1. Xianjun Yang
  2. Shaoliang Nie
  3. Lijuan Liu
  4. Suchin Gururangan
  5. Ujjwal Karn
  6. Rui Hou
  7. Madian Khabsa
  8. Yuning Mao