
GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation (2502.05911v1)

Published 9 Feb 2025 in cs.CL

Abstract: Refusal-Aware Instruction Tuning (RAIT) aims to enhance LLMs by improving their ability to refuse to answer questions beyond their knowledge, thereby reducing hallucinations and improving reliability. Effective RAIT must address two key challenges: first, effectively rejecting unknown questions to minimize hallucinations; second, avoiding over-refusal so that questions that can be correctly answered are not rejected, thereby maintaining the helpfulness of LLM outputs. In this paper, we address these two challenges by deriving insightful observations from a gradient-based perspective and proposing the Gradient-driven Refusal-Aware Instruction Tuning framework (GRAIT), which (1) employs gradient-driven sample selection to effectively minimize hallucinations and (2) introduces an adaptive weighting mechanism during fine-tuning to reduce the risk of over-refusal, achieving a balance between accurate refusals and useful responses. Experimental evaluations on open-ended and multiple-choice question answering tasks demonstrate that GRAIT significantly outperforms existing RAIT methods in overall performance. The source code and data will be available at https://github.com/opendatalab/GRAIT .
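The abstract names two mechanisms (gradient-driven sample selection and adaptive loss weighting) without implementation detail. The sketch below is a minimal PyTorch illustration of one plausible reading of those two ideas, not the paper's actual algorithm: the cosine-similarity selection rule, the `1 - p_correct` weight, and all function names are assumptions made for illustration, using a toy linear model in place of an LLM.

```python
# Minimal sketch of the two ideas in the GRAIT abstract:
# (1) gradient-driven sample selection, (2) adaptive loss weighting.
# All specifics here are illustrative guesses, not the paper's released code.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for an LLM: a linear classifier over feature vectors.
model = nn.Linear(16, 2)
loss_fn = nn.CrossEntropyLoss()

def grad_vector(x, y):
    """Flattened per-sample loss gradient (the gradient-based signal)."""
    model.zero_grad()
    loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
    loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

# --- (1) Gradient-driven sample selection -------------------------------
# Score each candidate sample by how similar its gradient is to a
# reference gradient, and keep only the most aligned samples for
# refusal-style tuning (an assumed selection criterion).
data = [(torch.randn(16), torch.tensor(1)) for _ in range(32)]
ref = grad_vector(torch.randn(16), torch.tensor(0))  # reference sample

cos = nn.CosineSimilarity(dim=0)
scores = [cos(grad_vector(x, y), ref).item() for x, y in data]
keep = [d for d, s in zip(data, scores) if s > 0.0]
print(f"selected {len(keep)}/{len(data)} samples")

# --- (2) Adaptive weighting during fine-tuning ---------------------------
# Down-weight samples the model already answers confidently, so that
# correctly answerable questions are not pushed toward refusal (one
# plausible reading of "reduce the risk of over-refusal").
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for x, y in keep:
    logits = model(x.unsqueeze(0))
    p_correct = logits.softmax(-1)[0, y].item()
    weight = 1.0 - p_correct          # adaptive weight in [0, 1]
    loss = weight * loss_fn(logits, y.unsqueeze(0))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In this reading, the per-sample gradient serves double duty: its direction decides which examples enter the refusal-tuning set, while the model's current confidence scales how strongly each selected example is trained on, which is how the two challenges (minimizing hallucination and avoiding over-refusal) could be balanced in a single fine-tuning pass.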

Authors (9)
  1. Runchuan Zhu (10 papers)
  2. Zinco Jiang (2 papers)
  3. Jiang Wu (58 papers)
  4. Zhipeng Ma (12 papers)
  5. Jiahe Song (3 papers)
  6. Fengshuo Bai (11 papers)
  7. Dahua Lin (336 papers)
  8. Lijun Wu (113 papers)
  9. Conghui He (114 papers)