RECOST: External Knowledge Guided Data-efficient Instruction Tuning (2402.17355v1)

Published 27 Feb 2024 in cs.CL

Abstract: In the current landscape of LLMs, instruction tuning is an essential step. Given its high compute overhead, data-efficient instruction tuning has been proposed to reduce the training set size by selecting high-quality instructional data. Nevertheless, we argue that most current data-efficient instruction-tuning methods are highly dependent on the quality of the original instruction-tuning dataset. On datasets synthesized by LLMs, a common scenario in this field, dirty samples are even selected with higher probability than clean ones. To address this, we use external knowledge (relevant examples or paragraphs) to evaluate LLM-synthesized samples via an in-context-based relative predictive entropy. Based on this metric, we propose RECOST, a framework that integrates external-knowledge-based re-ranking and diversity-consistent sampling into a single pipeline. Through extensive experiments on several synthetic datasets (Alpaca and Alpaca-gpt4), we demonstrate the effectiveness of our method and achieve even better results with only 1% of the full dataset.
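
The abstract names the metric but does not define it here; the sketch below shows one plausible reading in Python, in which a sample's token-level predictive entropy conditioned on retrieved external knowledge is compared against its unconditioned entropy. The helper names (avg_token_entropy, relative_predictive_entropy) and the exact form of the comparison (a difference) are assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def avg_token_entropy(model, tokenizer, prompt: str, response: str) -> float:
    """Mean predictive entropy over the response tokens given the prompt.

    Hypothetical helper: the abstract does not give the exact formula,
    so this is one plausible token-level reading.
    """
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits          # (1, seq_len, vocab)
    # Logits at position t predict token t+1, so the distributions for the
    # response tokens start at the last prompt position. (Tokenizer boundary
    # effects at the prompt/response join are ignored in this sketch.)
    resp_logits = logits[0, prompt_len - 1 : -1]
    log_probs = F.log_softmax(resp_logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # per-token entropy
    return entropy.mean().item()

def relative_predictive_entropy(model, tokenizer, instruction, response, knowledge):
    """Assumed form: the entropy drop when external knowledge is prepended
    as in-context evidence. A large drop suggests the retrieved knowledge
    'explains' the sample, which a re-ranker could use to down-weight
    dirty synthetic samples."""
    base = avg_token_entropy(model, tokenizer, instruction, response)
    cond = avg_token_entropy(model, tokenizer, knowledge + "\n" + instruction, response)
    return base - cond
```

On top of such scores, the pipeline described in the abstract would re-rank the synthetic pool by this metric and then apply diversity-consistent sampling (for instance, preferring high-scoring samples while enforcing embedding-space coverage) to select the final subset.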

Authors (4)
  1. Qi Zhang (784 papers)
  2. Yiming Zhang (128 papers)
  3. Haobo Wang (45 papers)
  4. Junbo Zhao (86 papers)