RECOST: External Knowledge Guided Data-efficient Instruction Tuning (2402.17355v1)
Abstract: In the current landscape of LLMs, instruction tuning is an essential step. Given its high compute overhead, data-efficient instruction tuning has been proposed to reduce the training data size at this stage by selecting only high-quality instructional data. Nevertheless, we argue that most current data-efficient instruction-tuning methods depend heavily on the quality of the original instruction-tuning dataset: on datasets synthesized by LLMs, a common scenario in this field, dirty samples may even be selected with higher probability than clean ones. To address these challenges, we utilize external knowledge (relevant examples or paragraphs) to evaluate LLM-synthesized samples with an in-context-based relative predictive entropy. Based on this new metric, we propose a framework, dubbed **RECOST**, which integrates external-knowledge-based re-ranking and diversity-consistent sampling into a single pipeline. Through extensive experiments on several synthetic datasets (Alpaca and Alpaca-gpt4), we demonstrate the effectiveness of our method and achieve even better results with only **1%** of the full dataset.
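To make the re-ranking idea concrete, below is a minimal Python sketch of how an "in-context relative" score could be computed and used to select data, based only on the abstract's description; it is not the authors' exact formulation. It uses mean token negative log-likelihood under a small causal LM (GPT-2 as a stand-in) as a proxy for predictive entropy, takes the drop in that loss when retrieved external examples are prepended as the relative score, and omits the diversity-consistent sampling stage. All prompt formats, function names, and the `retrieve` callback are illustrative assumptions.

```python
# Hedged sketch of external-knowledge-guided re-ranking, NOT the paper's exact method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def response_nll(prompt: str, response: str) -> float:
    """Mean negative log-likelihood of the response tokens given the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # score only the response span
    return model(full_ids, labels=labels).loss.item()

def relative_score(instruction: str, response: str, retrieved: list[str]) -> float:
    """Higher = prepending external knowledge makes the response more
    predictable, which this sketch treats as evidence the sample is clean."""
    plain = response_nll(instruction + "\n", response)
    context = "\n\n".join(retrieved) + "\n\n" + instruction + "\n"
    return plain - response_nll(context, response)

def rerank(samples, retrieve, keep_frac=0.01):
    """Keep the top fraction of samples by relative score; `retrieve` is any
    external-knowledge retriever (e.g. BM25 over a reference corpus)."""
    scored = [(relative_score(s["instruction"], s["output"],
                              retrieve(s["instruction"])), s) for s in samples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    keep = max(1, int(len(scored) * keep_frac))
    return [s for _, s in scored[:keep]]
```

With `keep_frac=0.01`, `rerank` mirrors the abstract's 1%-of-the-dataset setting; in the full framework this ranking would additionally be combined with diversity-consistent sampling rather than taken top-k alone.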
Authors: Qi Zhang, Yiming Zhang, Haobo Wang, Junbo Zhao