AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs

Published 1 Mar 2024 in cs.CL, cs.AI, cs.CY, and cs.LG | (2403.00198v1)

Abstract: Pre-trained LLMs have significantly advanced natural language processing capabilities but are susceptible to biases present in their training data, leading to unfair outcomes in various applications. While numerous strategies have been proposed to mitigate bias, they often require extensive computational resources and may compromise model performance. In this work, we introduce AXOLOTL, a novel post-processing framework, which operates agnostically across tasks and models, leveraging public APIs to interact with LLMs without direct access to internal parameters. Through a three-step process resembling zero-shot learning, AXOLOTL identifies biases, proposes resolutions, and guides the model to self-debias its outputs. This approach minimizes computational costs and preserves model performance, making AXOLOTL a promising tool for debiasing LLM outputs with broad applicability and ease of use.
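The three-step process described in the abstract (identify a bias, propose a resolution, guide the model to rewrite its own output) can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword table `RESOLUTIONS`, the function names, and the prompt wording are all hypothetical, and the model is represented only as a generic callable that takes a prompt string and returns text, standing in for whatever public API is used.

```python
import re
from typing import Callable, Optional

# Hypothetical lookup of biased terms and neutral alternatives.
# A real system would need a far richer bias detector; this is an assumption.
RESOLUTIONS = {"bossy": "assertive", "hysterical": "upset"}


def identify_bias(text: str) -> Optional[str]:
    """Step 1: return the first listed biased term found in the text, if any."""
    for term in RESOLUTIONS:
        if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            return term
    return None


def propose_resolution(term: str) -> str:
    """Step 2: look up a neutral alternative for the flagged term."""
    return RESOLUTIONS[term]


def build_prompt(text: str, term: str, resolution: str) -> str:
    """Compose a zero-shot style instruction asking the model to rewrite."""
    return (
        f"The following text uses the word '{term}', which may be biased. "
        f"Rewrite it using a more neutral word such as '{resolution}', "
        f"keeping the meaning otherwise unchanged.\n\nText: {text}"
    )


def self_debias(text: str, llm: Callable[[str], str]) -> str:
    """Step 3: if a bias is flagged, ask the model to rewrite its own output.

    `llm` is any prompt-in, text-out callable (for example, a thin wrapper
    around an API client). No access to model parameters is required.
    """
    term = identify_bias(text)
    if term is None:
        return text  # nothing flagged, so no extra model call is made
    resolution = propose_resolution(term)
    return llm(build_prompt(text, term, resolution))


def stub_llm(prompt: str) -> str:
    """Stand-in for a real API call, used only to exercise the pipeline."""
    return "[rewritten by model]"
```

Because the model is only ever reached through the `llm` callable, the same pipeline could in principle wrap any text-generation endpoint; outputs with no flagged term are returned unchanged without a second model call.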
