SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection (2402.16705v2)
Abstract: Instruction tuning (IT) is crucial for tailoring LLMs towards human-centric interactions. Recent advancements have shown that careful selection of a small, high-quality subset of IT data can significantly enhance the performance of LLMs. Despite this, common approaches often rely on additional models or data, which increases costs and limits widespread adoption. In this work, we propose a novel approach, termed SelectIT, that capitalizes on the foundational capabilities of the LLM itself. Specifically, we exploit the intrinsic uncertainty present in LLMs to more effectively select high-quality IT data, without the need for extra resources. Furthermore, we introduce a curated IT dataset, Selective Alpaca, created by applying SelectIT to the Alpaca-GPT4 dataset. Empirical results demonstrate that IT with Selective Alpaca substantially enhances model abilities. The robustness of SelectIT has also been corroborated across various foundation models and domain-specific tasks. Our findings suggest that longer and more computationally intensive IT data may serve as superior sources of IT, offering valuable insights for future research in this area. Data, code, and scripts are freely available at https://github.com/Blue-Raincoat/SelectIT.
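The abstract describes the core idea only at a high level. The sketch below illustrates one way such an uncertainty-aware self-rating signal could be computed without any external model or data: the base LLM is prompted to rate each IT sample on a 1-to-K scale, and the softmax over the rating-token logits, compared across a few paraphrased rating prompts, yields both a quality score and an uncertainty estimate. The prompt wording, the scale K, the certainty weighting, and the model name are illustrative assumptions, not the exact SelectIT formulation.

```python
# Hypothetical sketch of uncertainty-aware self-rating for IT data selection.
# Not the official SelectIT implementation; prompts, K, and weights are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

K = 5  # rating scale 1..K
RATING_PROMPTS = [
    "Rate the quality of the following instruction-response pair from 1 to 5. "
    "Reply with a single digit.\n\n{sample}\n\nScore:",
    "On a scale of 1 (poor) to 5 (excellent), how good is this training example? "
    "Answer with one digit.\n\n{sample}\n\nScore:",
]
# Token ids for the digits "1".."K", used to read rating probabilities off the logits.
SCORE_TOKEN_IDS = [tokenizer.encode(str(k), add_special_tokens=False)[-1] for k in range(1, K + 1)]

@torch.no_grad()
def rate_sample(sample_text: str) -> float:
    """Score one IT sample with the base LLM itself; higher means keep."""
    scores = []
    for template in RATING_PROMPTS:
        prompt = template.format(sample=sample_text)
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
        logits = model(input_ids).logits[0, -1]            # next-token logits
        probs = torch.softmax(logits[SCORE_TOKEN_IDS], dim=-1)
        expected = sum((k + 1) * p.item() for k, p in enumerate(probs))  # E[score] over 1..K
        confidence = probs.max().item()                    # peakedness as a crude certainty proxy
        scores.append(expected * confidence)               # weight the score by the model's certainty
    # Disagreement across prompt paraphrases is a second uncertainty signal; penalize it.
    mean = sum(scores) / len(scores)
    spread = max(scores) - min(scores)
    return mean - 0.5 * spread

# Usage: score every sample in the IT dataset with rate_sample(...) and keep
# the top-ranked fraction for fine-tuning.
```

The design intent is simply to show that both signals used for selection (the model's expected rating and its uncertainty about that rating) can be read directly from the LLM's own token probabilities, which is the property the abstract highlights as avoiding extra resources.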