The paper "Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation" presents an investigation into how language generation technologies, such as LLMs, may inadvertently marginalize or discriminate against Transgender and Non-Binary (TGNB) individuals through biases in generated text. The paper provides a comprehensive evaluation framework for measuring such biases and introduces a dataset named TANGO, purpose-built to assess misgendering and harmful language in response to gender disclosures.
The authors gather data from the Nonbinary Wiki to construct TANGO, which consists of real-world text instances and templates related to TGNB experiences. The dataset includes one set of prompts for examining pronoun consistency (to detect misgendering) and another set for measuring potentially harmful responses to disclosures of gender identity. Results indicate widespread misgendering by LLMs, especially when prompts include lesser-known TGNB-specific pronouns (neopronouns), and show that LLMs are prone to generating harmful responses to gender disclosures, particularly for non-binary and gender-fluid identities.
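To make the misgendering probe concrete, here is a minimal Python sketch of the general idea: a template declares a referent's pronouns, and a continuation is flagged if it uses a pronoun outside that declared family. The pronoun families, template wording, and detection heuristic are hypothetical stand-ins for illustration, not the authors' actual TANGO templates or metrics.

```python
import re

# Illustrative pronoun families; the actual TANGO prompts and pronoun sets
# are drawn from the Nonbinary Wiki and may differ from these.
PRONOUN_FAMILIES = {
    "they": {"they", "them", "their", "theirs", "themself", "themselves"},
    "xe":   {"xe", "xem", "xyr", "xyrs", "xemself"},
    "ey":   {"ey", "em", "eir", "eirs", "emself"},
}

# Binary pronouns are tracked so that a switch to them counts as
# misgendering a referent whose declared family is non-binary.
BINARY_PRONOUNS = {"he", "him", "his", "himself", "she", "her", "hers", "herself"}
ALL_PRONOUNS = BINARY_PRONOUNS | {p for fam in PRONOUN_FAMILIES.values() for p in fam}

def build_prompt(name: str, family: str) -> str:
    """Fill a simple antecedent template that declares the referent's pronouns."""
    return f"{name} uses {family} pronouns. {name} is a writer, and"

def is_misgendered(continuation: str, family: str) -> bool:
    """Return True if the continuation uses any pronoun outside the declared family."""
    tokens = set(re.findall(r"[a-z]+", continuation.lower()))
    used = tokens & ALL_PRONOUNS
    return bool(used - PRONOUN_FAMILIES[family])

# Example: a generation that slips into "he" despite a declared "xe" referent.
prompt = build_prompt("Casey", "xe")
print(prompt)
print(is_misgendered("he has published two novels.", "xe"))   # True (misgendered)
print(is_misgendered("xe has published two novels.", "xe"))   # False (consistent)
```

In practice the paper evaluates open-ended generations from LLMs rather than fixed strings, but the same pronoun-consistency check applies to each sampled continuation.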
The paper also finds that generated text is less harmful when prompts use binary gender pronouns, revealing a bias toward binary gender norms. In addition, LLMs struggle with the grammatical rules governing neopronouns, pointing to broader issues of pronoun recognition and representation in AI systems. A case study with ChatGPT underscores the need for further research and development of more inclusive language technologies.
The research warns against the erasure of TGNB identities and suggests avenues for future improvement, such as pretraining with more diverse corpora, refining tokenizers to preserve the structural integrity of TGNB pronouns, and employing in-context learning techniques with various TGNB examples. The authors call for centering marginalized voices in AI and recommend increased scrutiny of the normative assumptions behind toxicity annotation and LLM development.
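As an illustration of the tokenizer concern, the sketch below (not from the paper) uses the Hugging Face transformers GPT-2 tokenizer to check whether neopronouns survive as single subword tokens or get fragmented; the pronoun list and model choice are assumptions made purely for demonstration.

```python
# Minimal sketch: inspect how a BPE tokenizer segments pronouns.
# Fragmentation of neopronouns into multiple subword pieces is one
# possible source of the structural issues the paper highlights.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

pronouns = ["she", "he", "they", "xe", "xem", "xyr", "ey", "em", "eir", "fae", "faer"]
for p in pronouns:
    # A leading space matters for BPE vocabularies trained on raw text.
    pieces = tokenizer.tokenize(" " + p)
    status = "intact" if len(pieces) == 1 else f"split into {len(pieces)} pieces"
    print(f"{p!r}: {pieces} ({status})")
```

A tokenizer refined along the lines the authors suggest would keep such pronouns intact, preserving their structural integrity during pretraining and generation.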