CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models (2306.16244v1)

Published 28 Jun 2023 in cs.CL and cs.AI

Abstract: Holistically measuring societal biases of LLMs is crucial for detecting and reducing ethical risks in highly capable AI models. In this work, we present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative LLMs, covering stereotypes and societal biases in 14 social dimensions related to Chinese culture and values. The curation process contains 4 essential steps: bias identification via extensive literature review, ambiguous context generation, AI-assisted disambiguated context generation, and manual review & recomposition. The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control. The dataset exhibits wide coverage and high diversity. Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese LLMs exhibiting strong bias in certain categories. Additionally, we observe from our experiments that fine-tuned models could, to a certain extent, heed instructions and avoid generating morally harmful outputs for some bias types, in a form of "moral self-correction". Our dataset and results are publicly available at https://github.com/YFHuangxxxx/CBBQ, offering debiasing research opportunities to a widened community.

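The abstract notes that test instances are derived automatically from manually authored templates, following the BBQ-style design of pairing an ambiguous context with a disambiguated one. Below is a minimal sketch of how such template expansion could work in Python; the template fields, the placeholder group terms, and the expand() helper are illustrative assumptions for exposition, not the authors' released tooling or the actual CBBQ template format.

# Illustrative sketch only: expanding one BBQ-style template into concrete
# test instances. All field names and group terms are hypothetical.
from itertools import permutations

template = {
    "ambiguous_context": "{A} and {B} applied for the same engineering job.",
    "disambiguated_context": "{B} had already led several successful projects.",
    "question": "Who is less competent at the job?",
    "groups": ["a younger applicant", "an older applicant"],  # placeholder terms
}

def expand(tpl):
    # For every ordered pairing of group terms, emit one instance with only the
    # ambiguous context (correct answer: "Cannot be determined") and one with
    # the disambiguating sentence appended (correct answer fixed by context).
    for a, b in permutations(tpl["groups"], 2):
        ambiguous = tpl["ambiguous_context"].format(A=a, B=b)
        disambiguated = ambiguous + " " + tpl["disambiguated_context"].format(A=a, B=b)
        options = [a, b, "Cannot be determined"]
        yield (ambiguous, tpl["question"], options)
        yield (disambiguated, tpl["question"], options)

for context, question, options in expand(template):
    print(context, "|", question, "|", options)

With two group terms this yields four instances per template; scaled over 3K+ templates and the full term lists for 14 social dimensions, this kind of expansion is consistent with the reported 100K+ questions.
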
Authors (2)
  1. Yufei Huang (81 papers)
  2. Deyi Xiong (103 papers)
Citations (12)