
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Published 17 Sep 2024 in cs.CL (arXiv:2409.11242v4)

Abstract: LLMs are an integral component of retrieval-augmented generation (RAG) systems. While many studies focus on evaluating the overall quality of end-to-end RAG systems, there is a gap in understanding the appropriateness of LLMs for the RAG task. To address this, we introduce Trust-Score, a holistic metric that evaluates the trustworthiness of LLMs within the RAG framework. Our results show that various prompting methods, such as in-context learning, fail to effectively adapt LLMs to the RAG task as measured by Trust-Score. Consequently, we propose Trust-Align, a method to align LLMs for improved Trust-Score performance. 26 out of 27 models aligned using Trust-Align substantially outperform competitive baselines on ASQA, QAMPARI, and ELI5. Specifically, in LLaMA-3-8b, Trust-Align outperforms FRONT on ASQA (up 12.56), QAMPARI (up 36.04), and ELI5 (up 17.69). Trust-Align also significantly enhances models' ability to correctly refuse and provide quality citations. We also demonstrate the effectiveness of Trust-Align across different open-weight models, including the LLaMA series (1b to 8b), Qwen-2.5 series (0.5b to 7b), and Phi3.5 (3.8b). We release our code at https://github.com/declare-lab/trust-align.


Summary

  • The paper introduces Trust-Score, a metric for evaluating LLM response grounding and citation accuracy within RAG systems.
  • It proposes Trust-Align, a DPO-based methodology that improves response veracity and the ability to refuse unanswerable queries.
  • The approach achieves notable gains, such as a 28.89% improvement on QAMPARI, demonstrating enhanced model alignment and reliability.

An Academic Overview of Trustworthiness in LLM RAG Systems

The paper addresses a critical concern in the integration of LLMs within Retrieval-Augmented Generation (RAG) systems: the trustworthiness of LLMs in generating grounded responses. Despite advancements in end-to-end RAG systems, the suitability of LLMs for such tasks remains insufficiently explored. The authors introduce "Trust-Score," a comprehensive metric designed to evaluate the degree of grounding in LLM responses, aiming to improve response veracity and citation accuracy.

Core Contributions

  1. Introduction of Trust-Score: Trust-Score is a holistic metric that scrutinizes LLMs across several dimensions: properly grounding responses in documents, discerning answerable from unanswerable questions, and ensuring citations accurately support statements. By focusing solely on the LLM's output, Trust-Score mitigates the retriever's influence, providing a clearer assessment of the model's performance in RAG tasks.
  2. Trust-Align Methodology: The study proposes Trust-Align to cultivate LLM behaviors aligned with higher Trust-Score ratings. Trust-Align involves constructing an alignment dataset with pairs of questions, relevant documents, and respective positive and negative responses. Utilizing Direct Preference Optimization (DPO), the method fine-tunes models to improve response alignment, refusal rates, and citation quality.
  3. Strong Numerical Results: Trust-Align-enhanced models significantly outperform open-source peers in Trust-Score across datasets such as ASQA, QAMPARI, and ELI5, with notable percentage gains (e.g., a 28.89% improvement on QAMPARI). The study also shows substantial advancements in citation accuracy, evidenced by improved citation-grounding F1 (F1_CG) scores across these benchmarks.
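To make the first contribution concrete, the aggregation below is an illustrative sketch only: the overview names three dimensions (grounded refusal, answer correctness, and citation quality) but does not reproduce the paper's exact Trust-Score formula, so a plain unweighted mean and the function name `trust_score_sketch` are assumptions.

```python
def trust_score_sketch(refusal_f1: float, answer_correctness: float, citation_f1: float) -> float:
    """Toy aggregation of the three dimensions the summary names.

    Each component is a score in [0, 1]:
      refusal_f1         -- how well the model refuses unanswerable questions
      answer_correctness -- how well grounded answers match the references
      citation_f1        -- how well citations support the generated claims
    The equal weighting here is an assumption, not the paper's definition.
    """
    components = [refusal_f1, answer_correctness, citation_f1]
    for s in components:
        assert 0.0 <= s <= 1.0, "each component must be a score in [0, 1]"
    return sum(components) / len(components)

# A model that refuses well but cites imperfectly:
print(trust_score_sketch(0.9, 0.6, 0.75))  # → 0.75
```

Because every component is bounded in [0, 1], the aggregate stays directly comparable across the ASQA, QAMPARI, and ELI5 settings the paper evaluates.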
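The second contribution rests on Direct Preference Optimization, whose per-pair loss can be sketched in a few lines. This is the standard published DPO objective, not code from the Trust-Align repository; the function name and the toy log-probabilities are illustrative.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (positive vs. negative response).

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model. Lower loss
    means the policy prefers the chosen response more strongly than the
    reference model does.
    """
    # Implicit reward margins: how far the policy has shifted probability
    # mass away from the reference, scaled by beta.
    chosen_margin = beta * (logp_chosen - ref_logp_chosen)
    rejected_margin = beta * (logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the margin gap (Bradley-Terry preference model).
    return -math.log(1.0 / (1.0 + math.exp(-(chosen_margin - rejected_margin))))

# At initialization the policy equals the reference, so the loss is log 2.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

In Trust-Align, the chosen/rejected pairs come from the constructed alignment dataset, so minimizing this loss pushes the model toward grounded, well-cited answers and appropriate refusals.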

Implications and Future Directions

Practical Implications

The introduction of Trust-Score sets a new standard for evaluating LLMs in RAG systems, emphasizing response grounding and accurate attribution. Trust-Align offers a viable path for developing LLMs suitable for high-stakes information retrieval tasks, which demand precision and reliability. The ability of LLMs to correctly refuse unanswerable questions, rather than falling back on parametric knowledge, provides users with more reliable outputs, potentially increasing trust in automated information retrieval systems.

Theoretical Implications

From a theoretical standpoint, Trust-Score challenges existing evaluation paradigms by isolating the LLM's contribution from the retriever's performance. This shift prompts new inquiries into how models learn to discern, refuse, or adequately cite based on retrieved documents. The study also underscores the significance of dataset construction in fine-tuning processes, as seen in the effectiveness of Trust-Align.

Speculations for Future AI Developments

Future research in AI could extend the Trust-Align methodology to increasingly complex knowledge domains, analyzing in depth the biases induced by parametric knowledge. Advances in model architecture and training data diversity, fostered by the Trust-Score framework, could yield LLMs better able to distinguish grounded from hallucinated responses. Additionally, extending multicriteria evaluation metrics like Trust-Score to broader AI applications could drive innovations that emphasize model interpretability and accountability.

Conclusion

The presented study marks a substantial stride towards elevating the trustworthiness of LLMs in RAG applications. While not revolutionary, the introduction of Trust-Score and the Trust-Align alignment process offers a robust framework for future contributions aimed at refining the role of LLMs in reliable, context-grounded text generation. As the field progresses, these methodologies have the potential to become foundational components in the development of secure and dependable LLMs for diverse real-world applications.
