DavIR: Data Selection via Implicit Reward for Large Language Models (2310.13008v2)

Published 16 Oct 2023 in cs.LG, cs.AI, and cs.CL

Abstract: We introduce DavIR, a model-based data selection method for post-training LLMs. DavIR generalizes Reducible Holdout Loss to the core-set selection problem of causal language modeling, and quantifies the learnability of a given datum with respect to a pre-trained LLM based on the relative reduction in loss during fine-tuning, a metric we show to be closely related to the implicit reward model described in Direct Preference Optimization (DPO). We show that 6% of the Alpaca dataset selected with DavIR can steer both the LLaMA and Gemma model families to produce superior performance compared to the same models trained on the full 52K dataset. We also show that the Alpaca dataset compressed with DavIR can be combined with the GSM8K dataset to effectively balance open-domain freeform QA and mathematical reasoning capabilities. Finally, we apply the DavIR objective to DPO and develop a normalized DavIR-DPO objective, which improves the alignment performance of the Zephyr-7B-SFT model by 8% (relative) on AlpacaEval compared against training with the vanilla DPO objective.
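The abstract does not spell out DavIR's scoring rule, but its core idea, scoring each training example by how much its loss drops under fine-tuning relative to its loss under the pre-trained reference model, can be sketched in a few lines. The following is a minimal illustration assuming a relative-reduction score of the form (L_ref - L_ft) / L_ref and a top-fraction selection rule; the function names, the normalization, and the toy losses are assumptions, not the paper's definitions.

    from typing import Sequence

    def davir_scores(ref_losses: Sequence[float],
                     ft_losses: Sequence[float],
                     eps: float = 1e-8) -> list[float]:
        # Relative loss reduction per example: (L_ref - L_ft) / (L_ref + eps).
        # Assumed form of the "learnability" score; not the paper's exact formula.
        return [(lr - lf) / (lr + eps) for lr, lf in zip(ref_losses, ft_losses)]

    def select_top_fraction(examples: Sequence[str],
                            scores: Sequence[float],
                            fraction: float = 0.06) -> list[str]:
        # Keep the top `fraction` of examples by score (6% echoes the abstract).
        k = max(1, int(len(examples) * fraction))
        ranked = sorted(zip(scores, examples), key=lambda p: p[0], reverse=True)
        return [ex for _, ex in ranked[:k]]

    # Toy usage with made-up per-example losses; in practice these would come
    # from evaluating each example under the reference and fine-tuned checkpoints.
    examples = [f"instruction_{i}" for i in range(100)]
    ref_losses = [2.0 + 0.01 * i for i in range(100)]
    ft_losses = [1.0 + 0.015 * i for i in range(100)]
    subset = select_top_fraction(examples, davir_scores(ref_losses, ft_losses))
    print(len(subset), subset[:3])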

Authors (8)
  1. Haotian Zhou (8 papers)
  2. Tingkai Liu (9 papers)
  3. Qianli Ma (77 papers)
  4. Jianbo Yuan (33 papers)
  5. Pengfei Liu (191 papers)
  6. Yang You (173 papers)
  7. Hongxia Yang (130 papers)
  8. Yufeng Zhang (67 papers)
Citations (6)