Your Large Language Model is Secretly a Fairness Proponent and You Should Prompt it Like One (2402.12150v1)
Abstract: The widespread adoption of LLMs underscores the urgent need to ensure their fairness. However, LLMs frequently present dominant viewpoints while ignoring alternative perspectives from minority parties, resulting in potential biases. We hypothesize that these fairness-violating behaviors occur because LLMs express their viewpoints through a human-like persona that reflects the majority of their training data. In response, we validate that prompting LLMs with specific roles enables them to express diverse viewpoints. Building on this observation, we develop FairThinking, a pipeline designed to automatically generate roles that lead LLMs to articulate diverse perspectives and produce fairer expressions. To evaluate FairThinking, we create a dataset of one thousand items covering three fairness-related topics and conduct experiments on GPT-3.5, GPT-4, Llama2, and Mistral to demonstrate its superior performance.
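The core idea of role-conditioned prompting can be illustrated with a minimal sketch: query the same model while it plays several stakeholder roles, then ask it to synthesize the collected viewpoints. The sketch below assumes the OpenAI Python client; the roles, question, and aggregation prompt are invented for illustration and are not the actual FairThinking prompts, which the paper's pipeline generates automatically.

```python
# Minimal sketch of role-conditioned prompting for diverse viewpoints.
# Assumes the OpenAI Python client (>=1.0) and OPENAI_API_KEY in the environment.
# The roles and prompts below are illustrative, not the FairThinking pipeline itself.
from openai import OpenAI

client = OpenAI()

QUESTION = "Should school uniforms be mandatory?"

# Hypothetical stakeholder roles; FairThinking generates such roles automatically.
ROLES = [
    "a student who values self-expression",
    "a parent concerned about cost and equality",
    "a school administrator focused on discipline",
]

def ask_as(role: str, question: str) -> str:
    """Query the model while it argues from a specific stakeholder's perspective."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"You are {role}. Argue from this perspective."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# Collect one viewpoint per role, then ask the model to produce a balanced synthesis.
viewpoints = [f"[{r}] {ask_as(r, QUESTION)}" for r in ROLES]
summary = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Summarize the viewpoints below into a balanced answer that represents all parties fairly."},
        {"role": "user", "content": "\n\n".join(viewpoints)},
    ],
)
print(summary.choices[0].message.content)
```

Compared with a single direct query, conditioning on several roles surfaces minority perspectives that the model would otherwise omit, which is the behavior the paper evaluates at scale.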