Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models (2407.08440v4)
Abstract: Although LLMs have demonstrated strong instruction-following ability, they are further supposed to be controlled and guided by rules in real-world scenarios to be safe, accurate, and intelligent. This demands that LLMs possess inferential rule-following capability. However, no prior work has clearly evaluated this capability: previous studies fail to distinguish inferential rule-following scenarios from instruction-following scenarios. Therefore, this paper first clarifies the concept of inferential rule-following and proposes a comprehensive benchmark, RuleBench, to evaluate a diversified range of inferential rule-following abilities. Our experimental results on a variety of LLMs show that they are still limited in following rules. Our analysis of the evaluation results provides insights into improving LLMs toward better inferential rule-following intelligent agents. We further propose Inferential Rule-Following Tuning (IRFT). The experimental results show that through IRFT, LLMs can learn abstract rule-following abilities from purely synthetic data and then generalize to RuleBench. The data and code can be found at: https://anonymous.4open.science/r/LLM-rule-following-B3E3/
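To make the inferential rule-following setting concrete, below is a minimal, hypothetical sketch (not the paper's released pipeline) of how purely synthetic training examples of the kind IRFT learns from could be generated: sample an abstract if-then rule over placeholder predicates, instantiate a matching fact, and take the rule's conclusion as the supervision target. All names here (`PREDICATES`, `ENTITIES`, `make_example`) are illustrative assumptions, not identifiers from the paper's code.

```python
import random

# Hypothetical sketch of IRFT-style synthetic data generation (assumed
# structure, not the paper's actual pipeline): each example pairs an
# abstract if-then rule with one matching fact, and the target is the
# conclusion obtained by applying the rule to that fact.

PREDICATES = ["P", "Q", "R", "S"]          # abstract predicate symbols
ENTITIES = ["alice", "bob", "carol", "dave"]  # placeholder constants

def make_example(rng: random.Random) -> dict:
    antecedent, consequent = rng.sample(PREDICATES, 2)
    entity = rng.choice(ENTITIES)
    rule = f"If {antecedent}(x) then {consequent}(x)."
    fact = f"{antecedent}({entity})."
    prompt = (
        f"Rule: {rule}\n"
        f"Fact: {fact}\n"
        "Question: What can be inferred? Answer with a single atom."
    )
    target = f"{consequent}({entity})."  # conclusion of applying the rule
    return {"prompt": prompt, "target": target}

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        ex = make_example(rng)
        print(ex["prompt"])
        print("->", ex["target"], "\n")
```

Because the rules use only abstract symbols, any rule-following behavior a model acquires on such data cannot come from memorized world knowledge, which is what lets the paper test whether the learned ability transfers to the natural-language rules in RuleBench.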
Authors: Wangtao Sun, Chenxiang Zhang, Xueyou Zhang, Ziyang Huang, Haotian Xu, Pei Chen, Shizhu He, Jun Zhao, Kang Liu, Xuanqing Yu