Queer People are People First: Deconstructing Sexual Identity Stereotypes in Large Language Models (2307.00101v1)

Published 30 Jun 2023 in cs.CL and cs.AI

Abstract: LLMs are trained primarily on minimally processed web text, which exhibits the same wide range of social biases held by the humans who created that content. Consequently, text generated by LLMs can inadvertently perpetuate stereotypes towards marginalized groups, like the LGBTQIA+ community. In this paper, we perform a comparative study of how LLMs generate text describing people with different sexual identities. Analyzing bias in the text generated by an LLM using regard score shows measurable bias against queer people. We then show that a post-hoc method based on chain-of-thought prompting using SHAP analysis can increase the regard of the sentence, representing a promising approach towards debiasing the output of LLMs in this setting.

Deconstructing Sexual Identity Stereotypes in LLMs

The paper "Queer People are People First: Deconstructing Sexual Identity Stereotypes in LLMs" presents a meticulous exploration of biases intrinsic to LLMs with a focus on sexual identity stereotypes. As models trained predominantly on minimally processed web text, LLMs can inherently perpetuate the societal biases encapsulated in their training data. The authors identify the significant concern that such biases lead to skewed portrayals of individuals based on sexual identity, potentially propagating harmful stereotypes against marginalized groups like the LGBTQIA+ community.

Methodology and Findings

The paper begins by posing two primary research questions: (1) whether pre-trained LLMs exhibit measurable bias against queer individuals, and (2) whether these biases can be mitigated while preserving contextual integrity using a post-hoc debiasing method.

For bias detection, the authors curate a set of gender-neutral prompts derived from the WikiBio dataset, ensuring contextual meaningfulness. These prompts are supplemented with specific sexual identity "trigger words" to probe for bias in the LLM's generated outputs. The analysis reveals that outputs associated with queer identities consistently include references to struggles and societal challenges, whereas those associated with straight identities emphasize achievements and positive attributes.
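
The paper's exact prompt templates are not reproduced here, but the construction step can be sketched as follows. The templates, trigger-word list, and the `build_prompts` helper below are illustrative assumptions, not the authors' curated WikiBio prompts.

```python
# Minimal sketch of the prompt-construction step described above.
# The templates and trigger words are illustrative placeholders, not the
# exact gender-neutral prompts the authors curated from WikiBio.

TEMPLATES = [
    "{person} is a well-known writer who",
    "{person} grew up in a small town and",
    "{person} has worked for many years as",
]

# Sexual-identity "trigger words" inserted into otherwise neutral prompts.
TRIGGER_WORDS = ["queer", "gay", "lesbian", "bisexual", "straight"]

def build_prompts(templates, triggers):
    """Pair every gender-neutral template with every identity descriptor."""
    prompts = []
    for template in templates:
        for trigger in triggers:
            person = f"This {trigger} person"
            prompts.append({"identity": trigger,
                            "prompt": template.format(person=person)})
    return prompts

if __name__ == "__main__":
    for p in build_prompts(TEMPLATES, TRIGGER_WORDS)[:5]:
        print(p["identity"], "->", p["prompt"])
```

Holding the rest of the prompt fixed while varying only the identity term is what makes the downstream comparisons of generated text across sexual identities meaningful.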

To quantify these differences, the authors employ several methods, including word cloud analyses, pointwise mutual information (PMI), and t-SNE visualizations. In particular, the regard score, a measure of social perception used in prior studies, quantifies how the LLM's outputs differ across sexual identities. The analysis exposes a tendency of LLMs to assign lower regard scores to queer-associated outputs, reflecting heteronormative biases present in the training corpus.
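
As a concrete reference for the PMI analysis, the sketch below estimates PMI(identity, word) from co-occurrence counts over generated outputs. The toy documents and the `pmi_scores` helper are assumptions for illustration rather than the paper's implementation; the regard score itself comes from a separate pre-trained classifier and is not recomputed here.

```python
import math
from collections import Counter

def pmi_scores(documents):
    """PMI(identity, word) = log2(p(identity, word) / (p(identity) * p(word))),
    estimated from which words co-occur with which identity label across the
    generated outputs. documents: list of (identity_label, token_list) pairs."""
    pair_counts, identity_counts, word_counts = Counter(), Counter(), Counter()
    total = 0
    for identity, tokens in documents:
        for word in set(tokens):            # count each word once per output
            pair_counts[(identity, word)] += 1
            identity_counts[identity] += 1
            word_counts[word] += 1
            total += 1
    return {
        (identity, word): math.log2((c / total) /
                                    ((identity_counts[identity] / total) *
                                     (word_counts[word] / total)))
        for (identity, word), c in pair_counts.items()
    }

# Toy example: "struggle" co-occurs only with the queer-labelled output.
docs = [
    ("queer", "they faced struggle and discrimination early on".split()),
    ("straight", "they won an award for their early work".split()),
]
print(pmi_scores(docs)[("queer", "struggle")])
```

A high PMI between an identity term and words like "struggle" is exactly the kind of association the word-cloud and t-SNE views surface qualitatively.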

Debiasing Approach

Acknowledging the potential repercussions of biased language generation, the authors explore a novel technique for debiasing LLM outputs through a post-hoc correction mechanism. The approach is framed as a text-to-text neural style transfer problem, in which the generation style is adjusted after the LLM has produced its output. SHAP (SHapley Additive exPlanations) analysis identifies the words that drive a sentence's regard score down, and chain-of-thought (CoT) prompting is then used to rewrite the sentence with higher regard. This is done while preserving essential contextual elements, such as the recognition of queer struggles, so that the rewrite balances acknowledging minority challenges with a more positive portrayal.
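
A minimal sketch of this post-hoc loop is shown below. The lexicon-based `regard_score`, the occlusion-style attribution (a lightweight stand-in for the SHAP values used in the paper), and the rewrite prompt wording are all illustrative assumptions, not the authors' actual classifier, explainer, or chain-of-thought template.

```python
# Hedged sketch of the post-hoc correction step: find the words that pull a
# sentence's regard down, then ask the LLM to rewrite it via a CoT prompt.

NEGATIVE_REGARD_LEXICON = {"struggle", "struggled", "discrimination", "rejected"}

def regard_score(sentence: str) -> float:
    """Toy regard proxy (stand-in for a neural regard classifier):
    fraction of tokens that are not negative-regard words."""
    tokens = [t.strip(".,").lower() for t in sentence.split()]
    negatives = sum(t in NEGATIVE_REGARD_LEXICON for t in tokens)
    return 1.0 - negatives / max(len(tokens), 1)

def low_regard_words(sentence: str, k: int = 3) -> list[str]:
    """Occlusion-style attribution (proxy for SHAP values): words whose
    removal raises the regard score the most."""
    tokens = sentence.split()
    base = regard_score(sentence)
    deltas = []
    for i, tok in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        deltas.append((regard_score(reduced) - base, tok))
    return [tok for delta, tok in sorted(deltas, reverse=True)[:k] if delta > 0]

def build_cot_rewrite_prompt(sentence: str, flagged: list[str]) -> str:
    """Chain-of-thought rewrite prompt fed back to the LLM (wording assumed)."""
    return (
        f"Sentence: {sentence}\n"
        f"The words {flagged} lower the regard of this sentence.\n"
        "Think step by step: first note which struggles should still be "
        "acknowledged, then rewrite the sentence so the person is described "
        "with higher regard while keeping that context."
    )

sentence = "They struggled with discrimination before finding work as a teacher."
print(build_cot_rewrite_prompt(sentence, low_regard_words(sentence)))
```

Keeping the flagged struggle-related words visible in the rewrite prompt, rather than simply deleting them, is what allows regard to be raised without erasing the context the authors want preserved.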

The debiasing process proved effective; regard scores for queer-associated outputs became more comparable to those for straight-associated outputs, without eroding the narrative context that acknowledges struggles inherent to queer identities. The paper demonstrates that, through CoT and SHAP-based interventions, LLMs can produce outputs that support a more affirmative representation of diverse sexual identities.

Implications and Future Directions

This paper offers insights into the challenges and possibilities of mitigating biases in LLMs, particularly regarding representational biases against queer individuals. The work underscores the importance of improving data pre-processing and developing robust evaluation metrics that better capture the nuanced manifestations of bias in textual output.

The proposed methodology holds promise for broader application across other domains of identity-based bias in AI systems. Future research could explore implicit biases that arise without explicit trigger terms and assess how well these debiasing techniques scale to demographic attributes beyond sexual identity. Moreover, refined methods for gender-neutralizing datasets could further reduce biased learning from historical training data.

Overall, the paper emphasizes the potential AI systems hold in both perpetuating and alleviating societal biases, highlighting the significant responsibility borne by researchers and developers in crafting these systems. As advancements in LLM capabilities continue, ongoing efforts in algorithmic fairness and inclusive representation remain vital in fostering equitable AI technologies.

Authors
  1. Harnoor Dhingra
  2. Preetiha Jayashanker
  3. Sayali Moghe
  4. Emma Strubell