
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning

Published 4 Jun 2025 in cs.CL and cs.AI | (2506.04051v1)

Abstract: LLMs currently respond to every prompt. However, they can produce incorrect answers when they lack knowledge or capability -- a problem known as hallucination. We instead propose post-training an LLM to generate content only when confident in its correctness and to otherwise (partially) abstain. Specifically, our method, HALT, produces capability-aligned post-training data that encodes what the model can and cannot reliably generate. We generate this data by splitting responses of the pretrained LLM into factual fragments (atomic statements or reasoning steps), and use ground truth information to identify incorrect fragments. We achieve capability-aligned finetuning responses by either removing incorrect fragments or replacing them with "Unsure from Here" -- according to a tunable threshold that allows practitioners to trade off response completeness and mean correctness of the response's fragments. We finetune four open-source models for biography writing, mathematics, coding, and medicine with HALT for three different trade-off thresholds. HALT effectively trades off response completeness for correctness, increasing the mean correctness of response fragments by 15% on average, while resulting in a 4% improvement in the F1 score (mean of completeness and correctness of the response) compared to the relevant baselines. By tuning HALT for highest correctness, we train a single reliable Llama3-70B model with correctness increased from 51% to 87% across all four domains while maintaining 53% of the response completeness achieved with standard finetuning.

Summary

  • The paper presents HALT (High Accuracy, Less Talk), a capability-aligned finetuning method that aligns LLM responses with the model's intrinsic capabilities, reducing hallucination risks.
  • It decomposes outputs into atomic, verifiable fragments and employs an evaluator LLM so that only content the model can reliably generate remains in the finetuning targets.
  • Empirical results show a 17% increase in correctness for Llama3-70B and a 4% improvement in F1 score, highlighting enhanced model reliability.

High Accuracy, Less Talk (HALT): Enhancing LLM Reliability through Capability-Aligned Finetuning

The paper "High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning" presents HALT, a post-training approach that improves LLM response accuracy by aligning outputs with the model's intrinsic capabilities. The research addresses the prevalent problem of hallucination, where models generate incorrect or misleading content, which is especially harmful in critical domains such as law and medicine. HALT finetunes LLMs to withhold content they cannot reliably generate, striking a tunable balance between response completeness and correctness.

Key Contributions

  1. Capability-Aligned Finetuning: HALT improves LLM reliability by keeping responses within the bounds of the model's pretrained capabilities. A tunable confidence threshold lets practitioners choose between more complete (aggressive) and more conservative responses.
  2. Fragmentation Methodology: HALT decomposes model outputs into atomic, verifiable fragments (factual statements or reasoning steps), so that only content the model predicts accurately is retained. This methodology is particularly beneficial in fields requiring high factual precision.
  3. Evaluation Framework: An evaluator LLM verifies each fragment against ground-truth data, ensuring that only factual fragments remain in the finetuning targets. This prevents hallucinations from propagating during finetuning by excluding content the model does not reliably know.
  4. Empirical Validation: Extensive evaluations on four open-source models (Llama3-8B, Llama3-70B, Gemma2-9B, and Mistral-7B) across biography writing, mathematics, coding, and medicine demonstrate improvements in accuracy and adaptability. According to the results, HALT increases correctness by an average of 17% for Llama3-70B and improves the F1 score by 4%, outperforming standard finetuning baselines.
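The target-construction step described in the contributions above can be sketched roughly as follows. This is a minimal illustration based on the summary, not the paper's released code: the function name, interface, and the prefix-thresholding logic (keep the longest prefix whose mean fragment correctness stays at or above the threshold, then abstain) are assumptions.

```python
# Illustrative sketch of HALT-style finetuning-target construction.
# Assumption: each response is already split into ordered atomic
# fragments, and an evaluator LLM has marked each fragment as
# correct or incorrect against ground truth.

ABSTAIN = "Unsure from Here"

def halt_target(fragments, correct_flags, threshold):
    """Build a finetuning target whose running mean fragment
    correctness stays at or above `threshold`.

    fragments     : list[str]  -- atomic statements / reasoning steps
    correct_flags : list[bool] -- evaluator verdict per fragment
    threshold     : float      -- completeness/correctness trade-off
    """
    kept, n_correct = [], 0
    for frag, ok in zip(fragments, correct_flags):
        n_correct += ok
        # Mean correctness if this fragment were kept as well.
        mean_correct = n_correct / (len(kept) + 1)
        if mean_correct < threshold:
            # The model cannot reliably continue from here:
            # truncate the response and abstain.
            kept.append(ABSTAIN)
            break
        kept.append(frag)
    return " ".join(kept)
```

A higher threshold yields shorter, more conservative targets (more abstention); a lower threshold preserves more of the original response at the cost of retaining some incorrect fragments.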

Results and Implications

The experiments show that HALT improves response correctness across diverse datasets, including Wikipedia-style biographies, mathematical problems, coding tasks, and medical queries. The tunable completeness-correctness trade-off allows HALT to adapt flexibly to different deployment scenarios, such as prioritizing correctness in high-stakes environments.
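The trade-off can be quantified with per-response correctness and completeness scores. The sketch below is a simplified illustration: the function name is an assumption, and the combined score is written as the conventional harmonic-mean F1 of correctness and completeness, which the abstract describes loosely as their mean.

```python
def response_metrics(correct_flags, n_full_fragments):
    """Rough completeness/correctness metrics for an abstaining response.

    correct_flags    : list[bool] -- per-fragment evaluator verdicts
                                     for the emitted (possibly truncated)
                                     response
    n_full_fragments : int        -- fragment count of the complete,
                                     non-abstaining response
    """
    # Mean correctness of the fragments actually emitted.
    correctness = (sum(correct_flags) / len(correct_flags)
                   if correct_flags else 1.0)
    # Fraction of the full response that was retained.
    completeness = len(correct_flags) / n_full_fragments
    # F1-style combination of the two (harmonic mean).
    f1 = 2 * correctness * completeness / (correctness + completeness)
    return correctness, completeness, f1
```

For example, emitting 3 of 6 fragments with one of the three incorrect gives correctness 2/3, completeness 1/2, and F1 4/7.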

The findings suggest that HALT's methodology could set new standards for finetuning LLMs in domains requiring a high degree of factuality and precision. From a practical standpoint, HALT's advantage lies in its ability to generate high-confidence responses while transparently acknowledging limitations ("Unsure from Here"), reducing the risks hallucination poses in critical applications.

Future Developments

The research community might further explore HALT's adaptability and extend its use cases to more complex, multimodal applications requiring a deeper understanding of model competency and contextual awareness. Additionally, refining dependency graphs and improving evaluator modules could further reduce hallucination risks, paving the way for multipurpose, reliable AI systems.

In sum, the paper outlines a compelling strategy for fostering LLMs that judiciously interact with their knowledge limits, elevating both their practical usability and trustworthiness in academic and professional settings.
