Few-Shot Fairness: Unveiling LLM's Potential for Fairness-Aware Classification (2402.18502v1)

Published 28 Feb 2024 in cs.CL

Abstract: Employing Large Language Models (LLMs) in downstream applications such as classification is crucial, especially for smaller companies that lack the expertise and resources required to fine-tune a model. Fairness in LLMs helps ensure inclusivity and equal representation across factors such as race and gender, and promotes responsible AI deployment. As the use of LLMs becomes increasingly prevalent, it is essential to assess whether they can generate fair outcomes when subjected to fairness considerations. In this study, we introduce a framework outlining fairness regulations aligned with various fairness definitions, with each definition modulated by varying degrees of abstraction. We explore the configuration for in-context learning and the procedure for selecting in-context demonstrations using retrieval-augmented generation (RAG), while incorporating fairness rules into the process. Experiments conducted with different LLMs indicate that GPT-4 delivers superior results in terms of both accuracy and fairness compared to other models. This work is one of the early attempts to achieve fairness in prediction tasks by utilizing LLMs through in-context learning.
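The abstract describes selecting in-context demonstrations via retrieval and injecting fairness rules into the prompt. Below is a minimal, hypothetical sketch of that idea, not the paper's actual implementation: a simple TF-IDF retriever picks the labeled examples most similar to a test instance as demonstrations, and a fairness instruction is prepended to the few-shot prompt. The example data, the `FAIRNESS_RULE` text, and the helper names (`retrieve_demonstrations`, `build_prompt`) are illustrative assumptions.

```python
# Sketch of fairness-aware few-shot prompting with RAG-style demonstration
# selection. All names and data here are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical labeled pool, e.g. serialized rows of a tabular dataset
# such as UCI Adult income prediction (">50K" vs. "<=50K").
POOL = [
    ("age: 37, education: Bachelors, hours-per-week: 60", ">50K"),
    ("age: 23, education: HS-grad, hours-per-week: 20", "<=50K"),
    ("age: 45, education: Masters, hours-per-week: 40", ">50K"),
    ("age: 19, education: Some-college, hours-per-week: 15", "<=50K"),
]

# An illustrative fairness rule; the paper varies such rules across
# fairness definitions and levels of abstraction.
FAIRNESS_RULE = (
    "When predicting, do not let protected attributes such as race or gender "
    "influence the outcome; treat otherwise-similar individuals alike."
)

def retrieve_demonstrations(query: str, k: int = 2):
    """Return the k pool examples most similar to the query (RAG-style)."""
    texts = [text for text, _ in POOL]
    vec = TfidfVectorizer().fit(texts + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(texts))[0]
    top = sims.argsort()[::-1][:k]
    return [POOL[i] for i in top]

def build_prompt(query: str, k: int = 2) -> str:
    """Compose fairness rule + retrieved demonstrations + test instance."""
    lines = [FAIRNESS_RULE, ""]
    for text, label in retrieve_demonstrations(query, k):
        lines.append(f"Input: {text}\nLabel: {label}")
    lines.append(f"Input: {query}\nLabel:")
    return "\n".join(lines)

if __name__ == "__main__":
    # The resulting prompt would then be sent to an LLM (e.g. GPT-4) for prediction.
    print(build_prompt("age: 41, education: Bachelors, hours-per-week: 45"))
```

In this sketch the retriever and the fairness rule are independent pieces, so either can be swapped (e.g. dense embeddings instead of TF-IDF, or a stricter fairness definition) without changing the prompt-construction step.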

Authors (4)
  1. Garima Chhikara (3 papers)
  2. Anurag Sharma (6 papers)
  3. Kripabandhu Ghosh (34 papers)
  4. Abhijnan Chakraborty (35 papers)
Citations (10)