Re-contextualizing Fairness in NLP: The Case of India
The paper sheds light on the pressing need to address fairness in NLP within specific geo-cultural contexts, using India as a case study. The research explores the social and technical intricacies of assessing and improving fairness in NLP models and datasets, taking into account India's distinct socio-cultural fabric.
Overview of the Paper
The authors begin by acknowledging that much of the existing research on fairness in NLP is skewed toward Western norms, leaving a gap when it comes to applying these findings to non-Western contexts like India. They identify several axes of social disparities prevalent in India—such as Region, Caste, Gender, Religion, Ability, and Gender Identity and Sexual Orientation—and argue that these require distinct consideration in fairness research. The paper focuses specifically on Region and Religion as demonstrative axes of bias.
Methodology and Findings
To probe the biases inherent in NLP models, the authors use identity terms, personal names, and dialectal features as proxies for demographic subgroups. They conduct an empirical analysis of prediction models to surface biases in sentiment and stereotype associations. For instance, the paper shows that sentiment prediction models exhibit systematic sentiment shifts when regional identity terms are swapped in otherwise identical sentences, revealing implicit biases tied to particular Indian regions.
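A perturbation test of this kind can be approximated in a few lines of code. The sketch below is only illustrative: it assumes an off-the-shelf Hugging Face sentiment pipeline, and the templates and regional identity terms are hypothetical examples rather than the paper's actual evaluation set. It measures how the predicted sentiment shifts when only the identity slot changes.

```python
# Minimal sketch of perturbation-based sentiment bias testing.
# Assumes the `transformers` library; templates and identity terms are illustrative.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # default English sentiment model

# Hypothetical neutral templates with a slot for a regional identity term.
templates = [
    "My neighbour is {term}.",
    "The new manager is {term}.",
]
identity_terms = ["Bihari", "Tamilian", "Punjabi", "Bengali"]  # example regions

def signed_score(result):
    """Map the pipeline output to a signed sentiment score in [-1, 1]."""
    return result["score"] if result["label"] == "POSITIVE" else -result["score"]

# Baseline: the same templates with a generic filler instead of an identity term.
baseline = [signed_score(classifier(t.format(term="a person"))[0]) for t in templates]

# Report the mean sentiment shift for each identity term relative to the baseline.
for term in identity_terms:
    scores = [signed_score(classifier(t.format(term=term))[0]) for t in templates]
    shifts = [s - b for s, b in zip(scores, baseline)]
    print(f"{term:10s} mean sentiment shift: {sum(shifts) / len(shifts):+.3f}")
```

A consistently negative shift for one identity term would be the kind of signal the paper's analysis looks for, though the actual study uses a much larger set of templates and terms.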
The authors also curate and annotate a dataset of stereotypical associations for Region and Religion, which they use to identify and analyze biases in large NLP corpora and models. The results indicate a substantial prevalence of stereotypes in model predictions, and these biases become more pronounced when inputs include Indian names and dialectal features, underscoring how strongly measured bias depends on contextually relevant inputs.
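As a rough illustration of how such a stereotype resource might be applied to a model, the sketch below queries a masked language model and checks whether entries from a toy stereotype lexicon appear among its top completions. The model name, prompts, and lexicon here are placeholders, not the paper's annotated dataset or its actual measurement protocol.

```python
# Rough sketch: check whether a masked LM surfaces terms from a stereotype lexicon.
# Assumes the `transformers` library; the lexicon and prompt are hypothetical examples.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical stereotype lexicon: identity term -> annotated stereotypical attributes.
stereotype_lexicon = {
    "Punjabi": {"loud", "rich"},
    "Bengali": {"intellectual", "argumentative"},
}

for identity, attributes in stereotype_lexicon.items():
    prompt = f"{identity} people are very [MASK]."
    predictions = fill_mask(prompt, top_k=20)
    surfaced = [p["token_str"].strip() for p in predictions
                if p["token_str"].strip().lower() in attributes]
    print(f"{identity}: stereotype terms in top-20 completions -> {surfaced or 'none'}")
```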
Implications
The implications of this paper are both practical and theoretical. Practically, it emphasizes the need for NLP models to be evaluated and fine-tuned with context-specific data to ensure fairness across diverse geographies and cultures. Theoretically, the paper advocates a more global understanding of fairness that transcends Western paradigms, arguing that ethical AI must be aligned with local values and societal norms.
The paper proposes a holistic agenda for re-contextualizing NLP fairness in India, focusing on three key aspects: accounting for social disparities, bridging technological gaps across languages, and aligning fairness interventions with local value systems. The authors call for participatory approaches in data annotation, inclusion of diverse demographics in research, and culturally aligned justice models.
Future Directions
The research invites further exploration of intersectional biases and the development of resources covering more axes and languages. It underscores the need for multilingual fairness evaluation strategies that account for differences in how bias manifests across languages. Additionally, given India's normative frameworks around fairness, particularly its restorative justice systems, future work could explore how these can inform computational ethics.
In summary, the paper presents an in-depth investigation into fairness in NLP, uniquely contextualized within the Indian socio-cultural landscape. It posits a framework that could be adapted for similar research in other non-Western contexts, urging the NLP community to consider broader, more inclusive definitions of fairness.