
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning (2402.10110v2)

Published 15 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Instruction tuning is critical to LLMs for achieving better instruction following and task adaptation capabilities but its success heavily relies on the training data quality. Many recent methods focus on improving the data quality but often overlook the compatibility of the data with the student model being finetuned. This paper introduces Selective Reflection-Tuning, a novel paradigm that synergizes a teacher LLM's reflection and introspection for improving existing data quality with the data selection capability of the student LLM, to automatically refine existing instruction-tuning data. This teacher-student collaboration produces high-quality and student-compatible instruction-response pairs, resulting in sample-efficient instruction tuning and LLMs of superior performance. Selective Reflection-Tuning is a data augmentation and synthesis method that generally improves LLM finetuning and self-improvement without collecting brand-new data. We apply our method to Alpaca and WizardLM data and achieve much stronger and top-tier 7B and 13B LLMs.

Selective Reflection-Tuning: An In-Depth Analysis

The paper presents a novel approach, Selective Reflection-Tuning, aimed at optimizing the instruction tuning process for LLMs. This methodology leverages a collaborative teacher-student model framework to enhance the quality of instruction-response data efficiently, without the need to source new datasets.

Core Concepts

Selective Reflection-Tuning merges data synthesis with data selection so that improved training data remains compatible with the student model. The reflection capability of a teacher model is interlinked with the selection capability of the student model, and the focus is on refining existing instruction-tuning data rather than generating new data from scratch. This synergy produces high-quality, student-compatible instruction-response pairs, yielding substantial improvements in LLM performance with sample-efficient training.

Methodological Framework

The paper delineates a two-phase process comprising Selective Instruction Reflection and Selective Response Reflection. In the first phase, the teacher model reflects on each instruction against a set of criteria and proposes a revised instruction-response pair; the student model scores the original and revised pairs with the Instruction-Following Difficulty (IFD) metric and decides which version to keep. In the second phase, the teacher reflects on the response, and the student applies the reversed IFD (r-IFD) metric to decide whether to adopt the revision, ensuring the recycled data stays aligned with its own statistical characteristics:

  • Instruction-Following Difficulty (IFD): measures how much an instruction actually helps the student model predict the corresponding response; a higher IFD marks a harder, more informative sample.
  • Reversed IFD (r-IFD): measures how feasible it is to infer the instruction from the response; a lower r-IFD indicates the pair is better matched to the student model's capacity (both scores are formalized below).
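
Both scores can be written as perplexity ratios under the student model θ. The notation below is a reconstruction following the IFD formulation in the data-selection literature the paper builds on, not a verbatim excerpt from the paper:

```latex
% IFD and reversed IFD for an instruction-response pair (Q, A),
% scored with the student model parameters \theta (reconstructed notation).
\mathrm{IFD}_{\theta}(Q, A) = \frac{\mathrm{PPL}_{\theta}(A \mid Q)}{\mathrm{PPL}_{\theta}(A)},
\qquad
\text{r-IFD}_{\theta}(Q, A) = \frac{\mathrm{PPL}_{\theta}(Q \mid A)}{\mathrm{PPL}_{\theta}(Q)}
```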

Through this loop of teacher reflection and student evaluation, the pipeline yields coherent, effective instruction-response datasets and demonstrates a practical route to self-improvement in LLMs. A minimal sketch of the scoring and selection loop follows.
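
The sketch below computes IFD and r-IFD with a Hugging Face causal LM and applies one plausible acceptance rule (prefer higher-IFD instructions and lower-r-IFD responses). The model name "gpt2" is only a stand-in for the student model, and reflect_instruction / reflect_response are hypothetical stubs for the teacher-LLM calls; the paper's exact prompts, templates, and thresholds may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder student model; the paper finetunes LLaMA-family models
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


@torch.no_grad()
def perplexity(target: str, condition: str = "") -> float:
    """Perplexity of `target`, optionally conditioned on a `condition` prefix.
    The loss is averaged over target tokens only."""
    target_ids = tokenizer(target, return_tensors="pt").input_ids
    if condition:
        cond_ids = tokenizer(condition, return_tensors="pt").input_ids
        input_ids = torch.cat([cond_ids, target_ids], dim=1)
        labels = input_ids.clone()
        labels[:, : cond_ids.shape[1]] = -100  # ignore the conditioning prefix in the loss
    else:
        input_ids = target_ids
        labels = target_ids.clone()
    loss = model(input_ids=input_ids, labels=labels).loss  # mean NLL of target tokens
    return float(torch.exp(loss))


def ifd(instruction: str, response: str) -> float:
    # Higher IFD: the instruction provides little shortcut for predicting the
    # response, i.e. the pair is harder and more informative for the student.
    return perplexity(response, condition=instruction) / perplexity(response)


def r_ifd(instruction: str, response: str) -> float:
    # Lower r-IFD: the response makes the instruction easier to recover,
    # i.e. the pair is more feasible / better matched to the student.
    return perplexity(instruction, condition=response) / perplexity(instruction)


def recycle(sample, reflect_instruction, reflect_response):
    """Two-phase selective reflection for one (instruction, response) sample.
    `reflect_instruction` / `reflect_response` are stand-ins for teacher-LLM calls."""
    q, a = sample
    # Phase 1: keep the reflected instruction only if it is harder for the student.
    q_new, a_new = reflect_instruction(q, a)
    if ifd(q_new, a_new) > ifd(q, a):
        q, a = q_new, a_new
    # Phase 2: keep the reflected response only if it is more feasible for the student.
    a_new = reflect_response(q, a)
    if r_ifd(q, a_new) < r_ifd(q, a):
        a = a_new
    return q, a
```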

Numerical Insights

Applying the method to existing datasets such as Alpaca and WizardLM yielded significant performance gains in the resulting LLMs, as measured on standard benchmarks including AlpacaEval and the Hugging Face Open LLM Leaderboard. Notably, models trained on the refined datasets matched or surpassed larger models while using substantially less training data.

Implications and Future Directions

This research has significant implications for both theory and practice in LLM development, enabling more precise and resource-efficient training. The IFD and r-IFD scores provide a nuanced way to tailor data to a specific model, addressing a gap in existing approaches that disregard the compatibility between the data and the target LLM.

Future research could explore extending this reflection-tuning framework to diverse LLM architectures and heterogeneous datasets. Investigations into automating reflection criteria or further refining selection metrics may yield heightened adaptability and efficiency across varying AI landscapes.

Overall, Selective Reflection-Tuning marks a significant step forward in LLM instruction tuning, promising streamlined, resource-conscious, and highly effective training that keeps the data closely matched to the student model's own characteristics.

References (60)
  1. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862.
  2. Constitutional ai: Harmlessness from ai feedback. arXiv preprint arXiv:2212.08073.
  3. Alpagasus: Training a better alpaca with fewer data.
  4. Claude2-alpaca: Instruction tuning datasets distilled from claude. https://github.com/Lichang-Chen/claude2-alpaca.
  5. Cheng-Han Chiang and Hung-yi Lee. 2023. Can large language models be an alternative to human evaluations? In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15607–15631, Toronto, Canada. Association for Computational Linguistics.
  6. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality.
  7. Scaling instruction-finetuned language models. ArXiv, abs/2210.11416.
  8. Think you have solved question answering? try arc, the ai2 reasoning challenge.
  9. Free dolly: Introducing the world’s first truly open instruction-tuned llm.
  10. Flashattention: Fast and memory-efficient exact attention with io-awareness.
  11. Qlora: Efficient finetuning of quantized llms. arXiv preprint arXiv:2305.14314.
  12. Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233.
  13. GLM: General language model pretraining with autoregressive blank infilling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 320–335, Dublin, Ireland. Association for Computational Linguistics.
  14. Alpacafarm: A simulation framework for methods that learn from human feedback.
  15. A framework for few-shot language model evaluation.
  16. Measuring massive multitask language understanding. In International Conference on Learning Representations.
  17. Large language models can self-improve. arXiv preprint arXiv:2210.11610.
  18. UNIFIEDQA: Crossing format boundaries with a single QA system. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1896–1907, Online. Association for Computational Linguistics.
  19. Diederik P. Kingma and Jimmy Ba. 2017. Adam: A method for stochastic optimization.
  20. Look at the first sentence: Position bias in question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1109–1121, Online. Association for Computational Linguistics.
  21. Rlaif: Scaling reinforcement learning from human feedback with ai feedback. arXiv preprint arXiv:2309.00267.
  22. Generative judge for evaluating alignment.
  23. Reflection-tuning: Recycling data for better instruction-tuning. In NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following.
  24. Superfiltering: Weak-to-strong data filtering for fast instruction-tuning. ArXiv, abs/2402.00530.
  25. From quantity to quality: Boosting llm performance with self-guided data selection for instruction tuning. ArXiv, abs/2308.12032.
  26. Self-alignment with instruction backtranslation. arXiv preprint arXiv:2308.06259.
  27. Alpacaeval: An automatic evaluator of instruction-following models. https://github.com/tatsu-lab/alpaca_eval.
  28. TruthfulQA: Measuring how models mimic human falsehoods. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3214–3252, Dublin, Ireland. Association for Computational Linguistics.
  29. What makes good data for alignment? a comprehensive study of automatic data selection in instruction tuning.
  30. G-eval: Nlg evaluation using gpt-4 with better human alignment.
  31. The flan collection: Designing data and methods for effective instruction tuning. ArXiv, abs/2301.13688.
  32. Cross-task generalization via natural language crowdsourcing instructions. arXiv preprint arXiv:2104.08773.
  33. Orca 2: Teaching small language models how to reason.
  34. OpenAI. 2023. Gpt-4 technical report.
  35. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems, volume 35, pages 27730–27744. Curran Associates, Inc.
  36. Automatically correcting large language models: Surveying the landscape of diverse self-correction strategies.
  37. Instruction tuning with gpt-4. arXiv preprint arXiv:2304.03277.
  38. Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics.
  39. Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca.
  40. Xwin-LM Team. 2023. Xwin-lm.
  41. Llama 2: Open foundation and fine-tuned chat models.
  42. Zephyr: Direct distillation of lm alignment.
  43. Koala: An index for quantifying overlaps with pre-training corpora.
  44. Openchat: Advancing open-source language models with mixed-quality data. arXiv preprint arXiv:2309.11235.
  45. Large language models are not fair evaluators.
  46. Shepherd: A critic for language model generation.
  47. Self-instruct: Aligning language models with self-generated instructions. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13484–13508, Toronto, Canada. Association for Computational Linguistics.
  48. Super-NaturalInstructions: Generalization via declarative instructions on 1600+ NLP tasks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5085–5109, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  49. Finetuned language models are zero-shot learners. In International Conference on Learning Representations.
  50. Chain-of-thought prompting elicits reasoning in large language models.
  51. Lamini-lm: A diverse herd of distilled models from large-scale instructions.
  52. Wizardlm: Empowering large language models to follow complex instructions.
  53. Rethinking the instruction quality: Lift is what you need.
  54. Tree of thoughts: Deliberate problem solving with large language models.
  55. CrossFit: A few-shot learning challenge for cross-task generalization in NLP. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7163–7189, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  56. Selfee: Iterative self-revising llm empowered by self-feedback generation. Blog post.
  57. HellaSwag: Can a machine really finish your sentence? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4791–4800, Florence, Italy. Association for Computational Linguistics.
  58. Instruction tuning for large language models: A survey.
  59. Judging llm-as-a-judge with mt-bench and chatbot arena.
  60. Lima: Less is more for alignment.
Authors (6)
  1. Ming Li (787 papers)
  2. Lichang Chen (30 papers)
  3. Jiuhai Chen (26 papers)
  4. Shwai He (23 papers)
  5. Jiuxiang Gu (73 papers)
  6. Tianyi Zhou (172 papers)
Citations (30)