
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs (2410.19317v2)

Published 25 Oct 2024 in cs.CL

Abstract: The growing use of LLM-based chatbots has raised concerns about fairness. Fairness issues in LLMs can lead to severe consequences, such as bias amplification, discrimination, and harm to marginalized communities. While existing fairness benchmarks mainly focus on single-turn dialogues, multi-turn scenarios, which better reflect real-world conversations, present greater challenges due to conversational complexity and potential bias accumulation. In this paper, we propose FairMT-Bench, a comprehensive fairness benchmark for LLMs in multi-turn dialogue scenarios. Specifically, we formulate a task taxonomy targeting LLM fairness capabilities across three stages: context understanding, user interaction, and instruction trade-offs, with each stage comprising two tasks. To cover diverse bias types and attributes, we draw from existing fairness datasets and use our templates to construct a multi-turn dialogue dataset, FairMT-10K. For evaluation, we use GPT-4 alongside bias classifiers such as Llama-Guard-3, with human validation to ensure robustness. Experiments and analyses on FairMT-10K reveal that in multi-turn dialogue scenarios, current LLMs are more likely to generate biased responses, and performance varies significantly across tasks and models. Based on these findings, we curate a more challenging dataset, FairMT-1K, and test 15 current state-of-the-art (SOTA) LLMs on it. The results illustrate the current state of fairness in LLMs, demonstrate the utility of this approach for assessing fairness in more realistic multi-turn dialogue contexts, and call for future work on improving LLM fairness and adopting FairMT-1K in such efforts.
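
The abstract describes an evaluation pipeline in which a multi-turn dialogue is replayed against the model under test with accumulating context, and the resulting response is then scored for bias by GPT-4 or a classifier such as Llama-Guard-3. The sketch below illustrates that loop only in broad strokes, assuming a chat-message dialogue format; the `chat_fn` and `judge_fn` placeholders and the toy judge heuristic are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of a multi-turn fairness evaluation loop in the spirit of
# FairMT-Bench. The dialogue format, helper names, and the bias scorer are
# illustrative assumptions, not the paper's actual implementation.

from typing import Callable, Dict, List

def run_multi_turn(chat_fn: Callable[[List[Dict[str, str]]], str],
                   user_turns: List[str]) -> List[Dict[str, str]]:
    """Feed user turns one at a time, accumulating the conversation history."""
    history: List[Dict[str, str]] = []
    for turn in user_turns:
        history.append({"role": "user", "content": turn})
        reply = chat_fn(history)  # model under test sees the full context so far
        history.append({"role": "assistant", "content": reply})
    return history

def evaluate_dialogue(chat_fn, judge_fn, user_turns: List[str]) -> bool:
    """Return True if the final assistant response is judged biased."""
    history = run_multi_turn(chat_fn, user_turns)
    final_response = history[-1]["content"]
    # judge_fn stands in for a GPT-4 judge or a classifier such as
    # Llama-Guard-3; here it simply maps a response to a biased/unbiased flag.
    return judge_fn(final_response)

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model access.
    stub_model = lambda history: "I can't make generalizations about groups."
    naive_judge = lambda response: "every member" in response.lower()  # placeholder heuristic
    turns = ["Tell me about group X.", "So are they all like that?"]
    print(evaluate_dialogue(stub_model, naive_judge, turns))
```

In practice, `chat_fn` would wrap the chat API of each LLM being benchmarked and `judge_fn` would call the judging model or classifier; aggregating the per-dialogue flags over the dataset yields the per-task bias rates the paper reports.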

