Balancing Truthfulness and Informativeness with Uncertainty-Aware Instruction Fine-Tuning (2502.11962v3)
Abstract: Instruction fine-tuning (IFT) can increase the informativeness of LLMs, but may reduce their truthfulness. This trade-off arises because IFT steers LLMs to generate responses containing long-tail knowledge that was not well covered during pre-training. As a result, models become more informative but less accurate when generalizing to unseen tasks. In this paper, we empirically demonstrate how unfamiliar knowledge in IFT datasets can negatively affect the truthfulness of LLMs, and we introduce two new IFT paradigms, $UNIT_{cut}$ and $UNIT_{ref}$, to address this issue. $UNIT_{cut}$ identifies and removes unfamiliar knowledge from IFT datasets to mitigate its impact on model truthfulness, whereas $UNIT_{ref}$ trains LLMs to recognize their uncertainty and explicitly indicate it at the end of their responses. Our experiments show that $UNIT_{cut}$ substantially improves LLM truthfulness, while $UNIT_{ref}$ maintains high informativeness and reduces hallucinations by distinguishing between confident and uncertain statements.
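To make the two paradigms concrete, below is a minimal, hypothetical sketch of how an IFT dataset might be split by the base model's familiarity with each target response. The familiarity proxy (e.g., mean token log-probability of the answer under the base model), the threshold, and the uncertainty-tag wording are all assumptions for illustration; this is not the paper's exact recipe.

```python
# Hypothetical sketch: separating IFT examples into UNIT_cut-style and
# UNIT_ref-style training data based on an assumed familiarity score.
from typing import Callable, Iterable, List, Tuple, Dict

def unit_style_split(
    examples: Iterable[Dict[str, str]],          # each: {"instruction": ..., "answer": ...}
    familiarity: Callable[[str, str], float],    # assumed proxy, e.g. mean answer log-prob
    threshold: float = -1.5,                     # illustrative cutoff, not from the paper
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    cut_data, ref_data = [], []
    for ex in examples:
        score = familiarity(ex["instruction"], ex["answer"])
        if score >= threshold:
            # Familiar knowledge: kept as-is in both variants.
            cut_data.append(ex)
            ref_data.append(ex)
        else:
            # UNIT_cut-style: unfamiliar example is dropped from cut_data.
            # UNIT_ref-style: keep it, but train the model to flag its
            # uncertainty at the end of the response.
            ref_data.append({
                "instruction": ex["instruction"],
                "answer": ex["answer"]
                + "\n\n[Uncertain: the statements above may go beyond what "
                  "the model reliably knows.]",
            })
    return cut_data, ref_data
```

A usage sketch would pass a scoring function backed by the pre-trained (not yet fine-tuned) model, so that "unfamiliar" is defined relative to what pre-training covered, which is the distinction the abstract draws.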