An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models (2401.06692v3)
Abstract: Supervised finetuning (SFT) on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities observed in modern LLMs. However, the annotation effort required to produce high-quality responses for instructions is becoming prohibitively expensive, especially as the number of tasks spanned by instruction datasets continues to increase. Active learning is effective at identifying useful subsets of samples to annotate from an unlabeled pool, but its high computational cost remains a barrier to widespread adoption in the context of LLMs. To mitigate the annotation cost of SFT and circumvent the computational bottlenecks of active learning, we propose using experimental design. Experimental design techniques select the most informative samples to label, typically by maximizing some notion of uncertainty and/or diversity. In our work, we implement a framework that evaluates several existing and novel experimental design techniques, and we find that these methods consistently yield significant gains in label efficiency with little computational overhead. On generative tasks, our methods achieve the same generalization performance with only $50\%$ of the annotation cost required by random sampling.
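As a rough illustration of the diversity-driven selection the abstract alludes to (not the paper's exact procedure), the sketch below picks a subset of unlabeled prompts to annotate using a k-center greedy rule over precomputed prompt embeddings. The function name `k_center_greedy`, the `budget` parameter, and the assumption that embeddings come from a single forward pass of the base model are all illustrative choices, not details taken from the paper.

```python
# Minimal sketch (assumptions noted above): diversity-based prompt selection
# via k-center greedy over precomputed embeddings, a standard coreset-style
# heuristic. Not the paper's exact selection rule.
import numpy as np


def k_center_greedy(embeddings: np.ndarray, budget: int, seed: int = 0) -> list[int]:
    """Greedily pick `budget` points that cover the embedding space.

    Each new pick is the point farthest from the current selection,
    which encourages a diverse subset of prompts for annotation.
    """
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    selected = [int(rng.integers(n))]  # arbitrary first center
    dists = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    for _ in range(budget - 1):
        nxt = int(dists.argmax())  # farthest point from current selection
        selected.append(nxt)
        # update each point's distance to its nearest selected center
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return selected


if __name__ == "__main__":
    # stand-in for prompt embeddings from the base model
    fake_embeddings = np.random.default_rng(1).normal(size=(1000, 64))
    picks = k_center_greedy(fake_embeddings, budget=50)
    print(f"Selected {len(picks)} prompts for annotation, e.g. indices {picks[:5]}")
```

An uncertainty-driven variant would instead rank prompts by a score such as mean token negative log-likelihood under the base model. Either signal can be computed with a single forward pass per prompt, which is what keeps the overhead low relative to iterative active learning loops that retrain the model between selection rounds.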
Authors: Gantavya Bhatt, Yifang Chen, Arnav M. Das, Jifan Zhang, Sang T. Truong, Stephen Mussmann, Yinglun Zhu, Jeffrey Bilmes, Simon S. Du, Kevin Jamieson, Jordan T. Ash, Robert D. Nowak