Rethinking the Instruction Quality: LIFT is What You Need (2312.11508v2)

Published 12 Dec 2023 in cs.CL and cs.AI

Abstract: Instruction tuning, a specialized technique to enhance LLM performance via instruction datasets, relies heavily on the quality of the data employed. Existing quality-improvement methods alter instruction data through dataset expansion or curation. However, the expansion method risks data redundancy, potentially compromising LLM performance, while the curation approach confines the LLM's potential to the original dataset. Our aim is to surpass the original data quality without encountering these shortcomings. To achieve this, we propose LIFT (LLM Instruction Fusion Transfer), a novel and versatile paradigm designed to elevate instruction quality. LIFT strategically broadens the data distribution to encompass more high-quality subspaces and eliminates redundancy, concentrating on high-quality segments across the overall data subspaces. Experimental results demonstrate that, even with a limited quantity of high-quality instruction data selected by our paradigm, LLMs not only consistently maintain robust performance across various tasks but also surpass some state-of-the-art results, highlighting the significant improvement in instruction quality achieved by our paradigm.
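The abstract describes LIFT as a two-phase paradigm: first expand the data to broaden its distribution, then curate it down to a compact, high-quality, non-redundant subset. The sketch below is a minimal, hypothetical illustration of such an expand-then-curate pipeline, not the authors' exact procedure. The callables `rewrite_fn`, `embed_fn`, and `quality_fn` are stand-ins (e.g., an LLM-based rewriter, a sentence embedder, and an LLM-based quality scorer), and the greedy cosine-similarity filter is one plausible way to eliminate redundancy while keeping high-quality segments.

```python
import numpy as np

def expand(dataset, rewrite_fn, k=2):
    """Phase 1 (expansion): broaden the data distribution by adding
    k rewritten variants of each instruction. rewrite_fn stands in
    for an LLM-based rewriter; here it is a hypothetical callable."""
    expanded = list(dataset)
    for inst in dataset:
        expanded.extend(rewrite_fn(inst) for _ in range(k))
    return expanded

def curate(instructions, embed_fn, quality_fn, budget, sim_threshold=0.9):
    """Phase 2 (curation): keep only high-quality, non-redundant items.
    Greedily pick instructions by descending quality score, skipping any
    candidate whose embedding is too close (cosine similarity above
    sim_threshold) to an already-selected instruction."""
    embs = np.stack([embed_fn(x) for x in instructions])
    embs /= np.linalg.norm(embs, axis=1, keepdims=True)  # unit-normalize
    order = np.argsort([-quality_fn(x) for x in instructions])
    kept_idx = []
    for i in order:
        # redundancy check against everything selected so far
        if kept_idx and np.max(embs[kept_idx] @ embs[i]) > sim_threshold:
            continue
        kept_idx.append(i)
        if len(kept_idx) == budget:
            break
    return [instructions[i] for i in kept_idx]

# Toy stand-ins so the sketch runs end to end; a real pipeline would
# use an LLM rewriter, a sentence embedder, and an LLM quality scorer.
def toy_embed(s):
    rng = np.random.default_rng(abs(hash(s)) % 2**32)
    return rng.normal(size=64)

seed = ["Summarize this article.", "Write a sorting function in Python."]
pool = expand(seed, rewrite_fn=lambda s: s + " Be concise.", k=1)
picked = curate(pool, toy_embed, quality_fn=len, budget=2)
print(picked)
```

The budget parameter reflects the paper's finding that a limited quantity of carefully selected instructions can suffice; the threshold trades off diversity against redundancy.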

Authors (7)
  1. Yang Xu (277 papers)
  2. Yongqiang Yao (21 papers)
  3. Yufan Huang (20 papers)
  4. Mengnan Qi (5 papers)
  5. Maoquan Wang (7 papers)
  6. Bin Gu (86 papers)
  7. Neel Sundaresan (38 papers)
Citations (25)