The "Colonial Impulse" of Natural Language Processing: An Audit of Bengali Sentiment Analysis Tools and Their Identity-based Biases (2401.10535v1)

Published 19 Jan 2024 in cs.CL, cs.CY, cs.HC, and cs.LG

Abstract: While colonization has sociohistorically impacted people's identities across various dimensions, those colonial values and biases continue to be perpetuated by sociotechnical systems. One category of sociotechnical systems--sentiment analysis tools--can also perpetuate colonial values and bias, yet less attention has been paid to how such tools may be complicit in perpetuating coloniality, although they are often used to guide various practices (e.g., content moderation). In this paper, we explore potential bias in sentiment analysis tools in the context of Bengali communities that have experienced and continue to experience the impacts of colonialism. Drawing on identity categories most impacted by colonialism amongst local Bengali communities, we focused our analytic attention on gender, religion, and nationality. We conducted an algorithmic audit of all sentiment analysis tools for Bengali, available on the Python package index (PyPI) and GitHub. Despite similar semantic content and structure, our analyses showed that in addition to inconsistencies in output from different tools, Bengali sentiment analysis tools exhibit bias between different identity categories and respond differently to different ways of identity expression. Connecting our findings with colonially shaped sociocultural structures of Bengali communities, we discuss the implications of downstream bias of sentiment analysis tools.


Summary

  • The paper audits Bengali sentiment analysis tools for biases, rooted in colonial legacies, around gender, religion, and nationality.
  • It applies an algorithmic audit using the Bengali Identity Bias Evaluation Dataset (BIBED) to measure identity-based biases in the outputs of publicly available tools.
  • Findings call for intersectional collaboration and more inclusive design to mitigate social biases in language technology.

Introduction

Natural language processing (NLP) has advanced rapidly in sentiment analysis, enabling machines to interpret human emotions conveyed through text. Yet the domain's reliance on reducing complex emotional experience to quantified scores can miss nuance and inadvertently encode social and technical biases. Research attention and resources are also unevenly distributed across languages, disadvantaging non-English languages such as Bengali. This paper analyzes biases in Bengali sentiment analysis (BSA) tools, centering on identity categories profoundly affected by colonialism.
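For readers unfamiliar with the technique, a minimal lexicon-based scorer illustrates the basic shape of sentiment analysis: text in, polarity score out. The two-word lexicon below is purely illustrative and is not how the audited tools work; real Bengali tools rely on larger lexicons or trained classifiers.

```python
# Minimal lexicon-based sentiment scorer: the simplest instance of the
# technique the audited tools implement. The two-word lexicon is purely
# illustrative; production tools use larger lexicons or trained models.
GOOD, BAD = "ভালো", "খারাপ"  # Bengali for "good" and "bad"
POLARITY = {GOOD: 1.0, BAD: -1.0}

def analyze(text: str) -> float:
    """Average the polarity of known words; 0.0 means neutral/unknown."""
    hits = [POLARITY[w] for w in text.split() if w in POLARITY]
    return sum(hits) / len(hits) if hits else 0.0

print(analyze(f"কাজটা {GOOD} হয়েছে"))  # -> 1.0 ("the work turned out well")
```

The audit's core question follows directly from this interface: do semantically equivalent sentences that differ only in an expressed identity receive the same score?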

Literature Review

Critical discourse on NLP has underscored an imbalance in research focus and resources across languages, starkly visible when English-language tools are compared with Bengali ones. Drawing on the concept of sociotechnical systems, the paper argues that sentiment analysis tools are interwoven with social interaction, shaped by both their developers and their usage contexts. Biases embedded in these tools can reproduce colonial ideologies and identity categorizations. Prior work notes that while identities are multidimensional, colonialism has historically reshaped self-perception around gender, religion, and nationality in Bengali societies; this paper interrogates how sentiment tools process these identity expressions.

Methods

The paper performs an algorithmic audit of sentiment analysis tools sourced from the Python Package Index (PyPI) and GitHub. The identity expressions examined are gender, religion, and nationality, reflecting the Bengali community's historical experience of colonization. The existing Bengali Identity Bias Evaluation Dataset (BIBED) serves as the benchmark for evaluating whether BSA tools consistently discriminate against particular identities. By querying the tools and comparing their outputs, the paper quantifies bias and examines its relationship to developer demographics.
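To make the audit procedure concrete, the sketch below shows one plausible shape for such an evaluation loop. The CSV column names, the `tools` registry, and the uniform `predict` interface are assumptions made for illustration; each audited package exposes its own API, and BIBED's actual format may differ.

```python
# Sketch of an identity-swap audit loop, assuming a BIBED-style CSV of
# sentence pairs that are semantically identical but express different
# identities. Column names and the predict() interface are hypothetical.
import csv
from collections import defaultdict

def audit(tools: dict, pairs_path: str) -> dict:
    """For each tool, measure how often its sentiment label flips when
    only the identity expressed in a sentence changes."""
    flips, totals = defaultdict(int), defaultdict(int)
    with open(pairs_path, encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for name, tool in tools.items():
                label_a = tool.predict(row["sentence_identity_a"])
                label_b = tool.predict(row["sentence_identity_b"])
                totals[name] += 1
                if label_a != label_b:  # same semantics, different verdict
                    flips[name] += 1
    return {name: flips[name] / max(totals[name], 1) for name in tools}
```

A tool that treated all identities alike would show a flip rate near zero; systematic gaps between identity pairs are the signal such an audit looks for.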

Results and Discussion

The findings are telling. When fed sentences with identical semantic content, different BSA tools produce inconsistent outputs, undercutting the claim of universality that often underpins sentiment analysis methodologies. Significant biases were detected toward specific identities across gender, religion, and nationality. Notably, however, while the BSA tools exhibit bias, the paper found no clear link between that bias and the developers' demographic backgrounds.
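Score gaps of this kind are commonly assessed with non-parametric statistics. As an illustration, the snippet below compares sentiment score distributions for two identity groups with a Mann-Whitney U test; the score arrays are placeholders, not data from the paper, and the paper's own test choices may differ.

```python
# Illustrative check for a score gap between two identity groups using a
# non-parametric Mann-Whitney U test (SciPy). The arrays are placeholders.
from scipy.stats import mannwhitneyu

scores_identity_a = [0.62, 0.55, 0.71, 0.48, 0.66]  # tool scores, identity A
scores_identity_b = [0.41, 0.38, 0.52, 0.30, 0.45]  # same sentences, identity B

stat, p_value = mannwhitneyu(scores_identity_a, scores_identity_b,
                             alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.4f}")
# A small p-value suggests the two identity expressions receive
# systematically different sentiment scores from the tool under audit.
```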

The paper's discussion brings to light the "colonial impulse" in sentiment analysis tools, which reflect colonial power dynamics by favoring certain identities over others. It calls for intersectional collaboration among developers so that design processes are inclusive and account for bias in sentiment analysis. The findings matter most for downstream applications such as content moderation, where biased tools could amplify social divisions and hamper inclusive user engagement.

Conclusion

In conclusion, the paper provides a critical examination of BSA tools, showing how colonial values persist in technology. Emphasizing the need for diversity and intersectionality in technology development, the work calls for an engineering activism cognizant not just of technical prowess but also of the social fabric into which technology is inevitably woven. This resonates within the CHI community, situating the paper at the confluence of technology, critical theory, and social justice.