Thinking Fair and Slow: On the Efficacy of Structured Prompts for Debiasing Language Models (2405.10431v1)

Published 16 May 2024 in cs.CL

Abstract: Existing debiasing techniques are typically training-based or require access to a model's internals and output distributions, making them inaccessible to end-users who want to adapt LLM outputs to their particular needs. In this study, we examine whether structured prompting techniques can offer opportunities for fair text generation. We evaluate a comprehensive, end-user-focused, iterative debiasing framework that applies System 2 thinking processes to prompts to induce logical, reflective, and critical text generation, with single, multi-step, instruction, and role-based variants. By systematically evaluating many LLMs across many datasets and prompting strategies, we show that the more complex System 2-based Implicative Prompts significantly improve over other techniques, demonstrating lower mean bias in the outputs with competitive performance on downstream tasks. Our work offers research directions on the design and potential of end-user-focused evaluative frameworks for LLM use.
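The abstract names four prompting variants (single, multi-step, instruction, and role-based). As a rough illustration of what such variants might look like in practice, here is a minimal Python sketch; the prompt wording, the `query_llm` callable, and all function names are assumptions made for illustration, not the authors' actual templates.

```python
# A minimal sketch of the prompting variants named in the abstract.
# The prompt wording, the query_llm callable, and the function names
# are illustrative assumptions, not the paper's exact templates.
from typing import Callable

# Assumed System 2-style instruction: ask the model to reason slowly
# and critically before generating.
SYSTEM2_PREFIX = (
    "Before answering, think slowly and deliberately: identify any "
    "stereotypes or unfair assumptions the request might invite, "
    "reflect on them critically, and only then respond without them."
)

def single_step_prompt(task: str) -> str:
    """Single variant: one prompt bundling the System 2 instruction."""
    return f"{SYSTEM2_PREFIX}\n\nTask: {task}"

def instruction_prompt(task: str) -> str:
    """Instruction variant: an explicit fairness directive."""
    return f"Respond without gender, racial, or other social bias.\n\nTask: {task}"

def role_based_prompt(task: str) -> str:
    """Role-based variant: assign a fairness-minded persona."""
    return (
        "You are a careful, impartial writer committed to treating "
        f"all social groups fairly.\n\nTask: {task}"
    )

def multi_step_debias(task: str, query_llm: Callable[[str], str]) -> str:
    """Multi-step variant: generate a draft, self-critique it for bias,
    then revise, each as a separate model call."""
    draft = query_llm(f"Task: {task}")
    critique = query_llm(
        "List any stereotypes or social biases in this text:\n" + draft
    )
    return query_llm(
        "Rewrite the text to address the listed issues, preserving its "
        f"meaning.\n\nText: {draft}\n\nIssues: {critique}"
    )
```

Only the multi-step variant is iterative; the other three are single calls, consistent with the abstract's contrast between single and multi-step prompting.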

Authors (7)
  1. Shaz Furniturewala (7 papers)
  2. Surgan Jandial (14 papers)
  3. Abhinav Java (11 papers)
  4. Pragyan Banerjee (2 papers)
  5. Simra Shahid (11 papers)
  6. Sumit Bhatia (30 papers)
  7. Kokil Jaidka (24 papers)
Citations (2)