TrustLLM: Trustworthiness in Large Language Models (2401.05561v6)

Published 10 Jan 2024 in cs.CL

Abstract: LLMs, exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Ensuring the trustworthiness of LLMs therefore emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs on TrustLLM, drawing on over 30 datasets. Our findings show, first, that trustworthiness and utility (i.e., functional effectiveness) are in general positively related. Second, proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs; however, a few open-source LLMs come very close to proprietary ones. Third, some LLMs may be so heavily calibrated towards exhibiting trustworthiness that they compromise their utility, mistakenly treating benign prompts as harmful and consequently refusing to respond. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness: knowing which specific trustworthiness technologies have been employed is crucial for analyzing their effectiveness.

Introduction

The landscape of natural language processing and artificial intelligence has been transformed by the development of LLMs. These models have demonstrated exceptional capabilities in various language tasks, leading to their widespread adoption across industries. However, the growth in utility and application scope is paralleled by increasing concerns about the trustworthiness of LLMs. Issues such as transparency, ethical alignment, and robustness to adversarial inputs have prompted researchers to thoroughly evaluate the trustworthiness of these models.

Trustworthiness Dimensions

A pivotal aspect of trustworthiness in LLMs is the set of principles spanning eight dimensions: truthfulness, safety, fairness, robustness, privacy, machine ethics, transparency, and accountability. These principles guide the comprehensive analysis of trustworthiness and serve as a benchmark for assessing LLMs. The TrustLLM paper positions these dimensions at the core of its benchmark framework, aiming to evaluate LLMs against these multifaceted criteria, as sketched below.
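
To make the split between principles and measurement concrete, here is a minimal Python sketch distinguishing the six benchmarked dimensions from the two principle-only ones. The dimension names follow the paper; the data structure itself is our own illustration, not code from the TrustLLM toolkit.

```python
# Illustrative only: the eight TrustLLM principle dimensions, split into
# the six the benchmark evaluates and the two covered as principles only.
BENCHMARKED = {
    "truthfulness", "safety", "fairness",
    "robustness", "privacy", "machine ethics",
}
PRINCIPLE_ONLY = {"transparency", "accountability"}
ALL_DIMENSIONS = BENCHMARKED | PRINCIPLE_ONLY

print(f"{len(BENCHMARKED)} of {len(ALL_DIMENSIONS)} dimensions benchmarked")
```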

Assessment Approach

In the evaluation process, LLMs are subjected to a series of tests designed to probe their capacities in handling conceptually and ethically challenging scenarios. The benchmark comprises over 30 datasets and examines both proprietary and open-source LLMs on tasks that are either closed-ended, with ground-truth labels, or open-ended, without definitive answers. By applying prompts meticulously crafted to minimize prompt sensitivity and to provide explicit instructions, the paper ensures that the evaluation captures a reliable measure of each model's performance across the key dimensions.
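
The sketch below illustrates the two evaluation modes just described: exact-match scoring against ground-truth labels for closed-ended tasks, and judge-based scoring for open-ended ones. The `model` callable, the dataset format, and the `judge` scorer are hypothetical stand-ins for illustration, not the TrustLLM toolkit's actual API.

```python
from typing import Callable

def evaluate_closed_ended(model: Callable[[str], str],
                          dataset: list[dict]) -> float:
    """Closed-ended: score responses against ground-truth labels
    (exact match here; real metrics vary per dataset)."""
    correct = 0
    for item in dataset:
        answer = model(item["prompt"])
        correct += int(answer.strip().lower() == item["label"].lower())
    return correct / len(dataset)

def evaluate_open_ended(model: Callable[[str], str],
                        dataset: list[dict],
                        judge: Callable[[str, str], float]) -> float:
    """Open-ended: no definitive answer, so each response is scored
    by a judge (a human rater or an evaluator model) on a 0-1 scale."""
    scores = [judge(item["prompt"], model(item["prompt"]))
              for item in dataset]
    return sum(scores) / len(scores)
```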

Insights from Evaluation

The paper identifies several patterns in LLM behavior across the examined dimensions. It reveals a positive relationship between the trustworthiness and utility of LLMs, indicating that models that perform strongly on functional tasks also tend to align better with ethical and safety norms. However, the paper also uncovers cases of over-alignment, where some LLMs, in their pursuit of trustworthiness, become overly cautious to the detriment of practical utility. Proprietary LLMs generally outperform open-source ones in trustworthiness, though a few open-source models compete closely, demonstrating that high trustworthiness can be achieved without proprietary mechanisms. Finally, the paper highlights that transparency in trustworthiness technologies is integral, advocating transparent model architectures and decision-making processes to foster a more human-trusted AI landscape.
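
As a toy illustration of the first finding, the snippet below rank-correlates per-model utility scores against trustworthiness scores; all numbers are fabricated for demonstration, and a coefficient near +1 simply mirrors the shape of the reported positive relationship.

```python
def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation (ties ignored for brevity)."""
    def ranks(vs: list[float]) -> list[int]:
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Fabricated scores for five hypothetical models, ordered the same way.
utility         = [0.82, 0.75, 0.60, 0.55, 0.40]
trustworthiness = [0.90, 0.78, 0.65, 0.50, 0.45]
print(f"Spearman rho = {spearman(utility, trustworthiness):.2f}")  # 1.00
```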

Conclusion

The "TRUST LLM" paper serves as a foundational work in understanding and improving the trustworthiness of LLMs. By identifying strengths and weaknesses across various trustworthiness dimensions, this paper does not only inform future development of more reliable and ethical LLMs but also underlines the need for an industry-wide effort to advance the field. Through continued research and the establishment of clear benchmarks, we can steer the evolution of LLMs towards models that are not only functionally robust but also ethically sound and societally beneficial.

Authors (70)
  1. Lichao Sun (186 papers)
  2. Yue Huang (171 papers)
  3. Haoran Wang (141 papers)
  4. Siyuan Wu (18 papers)
  5. Qihui Zhang (13 papers)
  6. Chujie Gao (9 papers)
  7. Yixin Huang (7 papers)
  8. Wenhan Lyu (5 papers)
  9. Yixuan Zhang (94 papers)
  10. Xiner Li (17 papers)
  11. Zhengliang Liu (91 papers)
  12. Yixin Liu (108 papers)
  13. Yijue Wang (6 papers)
  14. Zhikun Zhang (39 papers)
  15. Bhavya Kailkhura (108 papers)
  16. Caiming Xiong (337 papers)
  17. Chaowei Xiao (110 papers)
  18. Chunyuan Li (122 papers)
  19. Eric Xing (127 papers)
  20. Furong Huang (150 papers)
Citations (140)

GitHub

  1. TrustLLM-Benchmark (16 stars)