
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (2407.03978v3)

Published 4 Jul 2024 in cs.CL and cs.AI

Abstract: Instruction following is one of the fundamental capabilities of LLMs. As the ability of LLMs is constantly improving, they have been increasingly applied to deal with complex human instructions in real-world scenarios. Therefore, how to evaluate the ability of complex instruction-following of LLMs has become a critical research problem. Existing benchmarks mainly focus on modeling different types of constraints in human instructions while neglecting the composition of different constraints, which is an indispensable constituent in complex instructions. To this end, we propose ComplexBench, a benchmark for comprehensively evaluating the ability of LLMs to follow complex instructions composed of multiple constraints. We propose a hierarchical taxonomy for complex instructions, including 4 constraint types, 19 constraint dimensions, and 4 composition types, and manually collect a high-quality dataset accordingly. To make the evaluation reliable, we augment LLM-based evaluators with rules to effectively verify whether generated texts can satisfy each constraint and composition. Furthermore, we obtain the final evaluation score based on the dependency structure determined by different composition types. ComplexBench identifies significant deficiencies in existing LLMs when dealing with complex instructions with multiple constraints composition.
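The abstract says the final score follows a dependency structure set by the composition types, but it does not spell out the rule. The sketch below is one plausible reading, not the authors' implementation: each constraint check counts as satisfied only if it passes and every check it depends on also passes. The function name `dependency_aware_score`, the check names, and the dependency map are all hypothetical and chosen for illustration.

```python
from typing import Dict, List


def dependency_aware_score(
    results: Dict[str, bool],
    depends_on: Dict[str, List[str]],
) -> float:
    """Return the fraction of checks that pass together with all their prerequisites.

    Assumption (not taken from the paper): a check whose prerequisite fails
    is treated as failed, even if its own raw result is True.
    The dependency graph is assumed to be acyclic.
    """
    memo: Dict[str, bool] = {}

    def effective(check: str) -> bool:
        # A check is effectively satisfied only if it passed on its own
        # and every check it depends on is effectively satisfied too.
        if check not in memo:
            prerequisites = depends_on.get(check, [])
            memo[check] = results[check] and all(effective(p) for p in prerequisites)
        return memo[check]

    if not results:
        return 0.0
    return sum(effective(c) for c in results) / len(results)


# Hypothetical raw verdicts for one instruction with four constraints.
results = {
    "is_email": True,
    "under_100_words": True,
    "formal_tone": False,
    "signed_off_formally": True,
}
# Hypothetical dependencies: a check is only meaningful if its parent holds.
depends_on = {
    "under_100_words": ["is_email"],
    "signed_off_formally": ["formal_tone"],
}

score = dependency_aware_score(results, depends_on)
print(score)
```

Here `signed_off_formally` passes on its own but is discounted because `formal_tone` failed, so two of the four checks count. A plain average of the raw verdicts would instead credit three of four, which is the kind of difference a dependency-aware aggregation is meant to expose.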

Authors (14)
  1. Bosi Wen (8 papers)
  2. Pei Ke (37 papers)
  3. Xiaotao Gu (32 papers)
  4. Lindong Wu (3 papers)
  5. Hao Huang (153 papers)
  6. Jinfeng Zhou (15 papers)
  7. Wenchuang Li (1 paper)
  8. Binxin Hu (1 paper)
  9. Wendy Gao (1 paper)
  10. Jiaxin Xu (22 papers)
  11. Yiming Liu (53 papers)
  12. Jie Tang (302 papers)
  13. Hongning Wang (107 papers)
  14. Minlie Huang (225 papers)
Citations (14)