Diverse and Fine-Grained Instruction-Following Ability Exploration with Synthetic Data (2407.03942v1)

Published 4 Jul 2024 in cs.AI, cs.CL, and cs.HC

Abstract: Instruction-following is particularly crucial for LLMs to support diverse user requests. While existing work has made progress in aligning LLMs with human preferences, evaluating their instruction-following capabilities remains a challenge due to the complexity and diversity of real-world user instructions. Existing evaluation methods focus on general skills but suffer from two main shortcomings: a lack of fine-grained, task-level evaluation and a reliance on a single instruction expression. To address these problems, this paper introduces DINGO, a fine-grained and diverse instruction-following evaluation dataset with two main advantages: (1) DINGO is based on a manually annotated, fine-grained, multi-level category tree with 130 nodes derived from real-world user requests; (2) DINGO includes diverse instructions generated by both GPT-4 and human experts. Through extensive experiments, we demonstrate that DINGO not only provides a more challenging and comprehensive evaluation for LLMs, but also offers task-level, fine-grained directions for further improving LLMs.
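The multi-level category tree described above can be pictured with a small sketch. The node names, fields, and example instructions below are illustrative assumptions, not the actual DINGO taxonomy; the sketch only shows the general idea of leaf nodes standing for fine-grained task categories, each holding multiple instruction phrasings.

```python
from dataclasses import dataclass, field

@dataclass
class CategoryNode:
    """One node in a multi-level instruction-category tree (names are hypothetical)."""
    name: str
    children: list["CategoryNode"] = field(default_factory=list)
    # Diverse phrasings of the same task, analogous to DINGO's
    # GPT-4- and expert-written instruction variants.
    instructions: list[str] = field(default_factory=list)

    def leaf_count(self) -> int:
        # A leaf represents one fine-grained task category.
        if not self.children:
            return 1
        return sum(child.leaf_count() for child in self.children)

# Toy tree: two top-level categories, one with two fine-grained subtasks.
root = CategoryNode("instructions", children=[
    CategoryNode("writing", children=[
        CategoryNode("summarization", instructions=[
            "Summarize this article.",
            "Give me a TL;DR of the text.",
        ]),
        CategoryNode("email drafting", instructions=[
            "Draft a polite follow-up email.",
        ]),
    ]),
    CategoryNode("coding", instructions=["Fix the bug in this function."]),
])

print(root.leaf_count())  # 3 fine-grained task categories in this toy tree
```

In the paper's setting the tree has 130 nodes; evaluating per leaf is what enables the task-level, fine-grained feedback the abstract mentions.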

Authors (6)
  1. Zihui Gu (7 papers)
  2. Xingwu Sun (32 papers)
  3. Fengzong Lian (10 papers)
  4. Zhanhui Kang (45 papers)
  5. Cheng-Zhong Xu (45 papers)
  6. Ju Fan (26 papers)
Citations (1)