Diverse and Fine-Grained Instruction-Following Ability Exploration with Synthetic Data (2407.03942v1)
Abstract: Instruction following is particularly crucial for LLMs to support diverse user requests. While existing work has made progress in aligning LLMs with human preferences, evaluating their instruction-following capabilities remains a challenge due to the complexity and diversity of real-world user instructions. Existing evaluation methods focus on general skills and suffer from two main shortcomings: a lack of fine-grained task-level evaluation and reliance on a single form of instruction expression. To address these problems, this paper introduces DINGO, a fine-grained and diverse instruction-following evaluation dataset that has two main advantages: (1) DINGO is based on a manually annotated, fine-grained, multi-level category tree with 130 nodes derived from real-world user requests; (2) DINGO includes diverse instructions generated by both GPT-4 and human experts. Through extensive experiments, we demonstrate that DINGO not only provides a more challenging and comprehensive evaluation for LLMs, but also offers task-level, fine-grained directions for further improving them.
- Zihui Gu
- Xingwu Sun
- Fengzong Lian
- Zhanhui Kang
- Cheng-Zhong Xu
- Ju Fan