Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models (2312.06315v1)

Published 11 Dec 2023 in cs.CL, cs.CY, and cs.LG

Abstract: Warning: This paper contains content that may be offensive or upsetting. There has been a significant increase in the usage of LLMs in various applications, both in their original form and through fine-tuned adaptations. As a result, LLMs have gained popularity and are being widely adopted by a large user community. However, one of the concerns with LLMs is the potential generation of socially biased content. The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability. In this work, we propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs (e.g., GPT-4 \cite{openai2023gpt4}) to assess bias in models. We also introduce prompts called Bias Attack Instructions, which are specifically designed for evaluating model bias. To enhance the credibility and interpretability of bias evaluation, our framework not only provides a bias score but also offers detailed information, including bias types, affected demographics, keywords, reasons behind the biases, and suggestions for improvement. We conduct extensive experiments to demonstrate the effectiveness and usability of our bias evaluation framework.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (26)
  1. Redditbias: A real-world resource for bias evaluation and debiasing of conversational language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1941–1955.
  2. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901.
  3. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186.
  4. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311.
  5. Measuring fairness with biased rulers: A comparative study on bias metrics for pre-trained language models. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1693–1706.
  6. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  7. Emilio Ferrara. 2023. Should chatgpt be biased? challenges and risks of bias in large language models. arXiv preprint arXiv:2304.03738.
  8. Clary Krekula. 2007. The intersection of age and gender: Reworking gender theory and social gerontology. Current Sociology, 55(2):155–171.
  9. Bum Chul Kwon and Nandana Mihindukulasooriya. 2022. An empirical study on pseudo-log-likelihood bias measures for masked language models using paraphrased sentences. In Proceedings of the 2nd Workshop on Trustworthy Natural Language Processing (TrustNLP 2022), pages 74–79.
  10. On measuring social biases in sentence encoders. arXiv preprint arXiv:1903.10561.
  11. Stereoset: Measuring stereotypical bias in pretrained language models. arXiv preprint arXiv:2004.09456.
  12. Crows-pairs: A challenge dataset for measuring social biases in masked language models. arXiv preprint arXiv:2010.00133.
  13. OpenAI. 2022. Chatgpt.
  14. OpenAI. 2023. Gpt-4 technical report.
  15. Instruction tuning with gpt-4. arXiv preprint arXiv:2304.03277.
  16. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
  17. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551.
  18. Deborah L Rhode. 2010. The beauty bias: The injustice of appearance in life and law. Oxford University Press.
  19. Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100.
  20. Safety assessment of chinese large language models. arXiv preprint arXiv:2304.10436.
  21. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
  22. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.
  23. Self-instruct: Aligning language model with self generated instructions. arXiv preprint arXiv:2212.10560.
  24. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682.
  25. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068.
  26. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043.
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Jiaxu Zhao (6 papers)
  2. Meng Fang (100 papers)
  3. Shirui Pan (197 papers)
  4. Wenpeng Yin (69 papers)
  5. Mykola Pechenizkiy (118 papers)
Citations (7)
Youtube Logo Streamline Icon: https://streamlinehq.com