Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration (2310.09168v3)

Published 13 Oct 2023 in cs.CL

Abstract: Instruction-tuning can be substantially optimized through enhanced diversity, resulting in models capable of handling a broader spectrum of tasks. However, existing data employed for such tuning often exhibit an inadequate coverage of individual domains, limiting the scope for nuanced comprehension and interactions within these areas. To address this deficiency, we propose Explore-Instruct, a novel approach to enhance the data coverage to be used in domain-specific instruction-tuning through active exploration via LLMs. Built upon representative domain use cases, Explore-Instruct explores a multitude of variations or possibilities by implementing a search algorithm to obtain diversified and domain-focused instruction-tuning data. Our data-centric analysis validates the effectiveness of this proposed approach in improving domain-specific instruction coverage. Moreover, our model's performance demonstrates considerable advancements over multiple baselines, including those utilizing domain-specific data enhancement. Our findings offer a promising opportunity to improve instruction coverage, especially in domain-specific contexts, thereby advancing the development of adaptable LLMs. Our code, model weights, and data are public at https://github.com/fanqiwan/Explore-Instruct.

Authors (6)
  1. Fanqi Wan (20 papers)
  2. Xinting Huang (36 papers)
  3. Tao Yang (520 papers)
  4. Xiaojun Quan (52 papers)
  5. Wei Bi (62 papers)
  6. Shuming Shi (126 papers)
Citations (15)