
Machines Do See Color: A Guideline to Classify Different Forms of Racist Discourse in Large Corpora (2401.09333v2)

Published 17 Jan 2024 in cs.CL and cs.LG

Abstract: Current methods to identify and classify racist language in text rely on small-n qualitative approaches or large-n approaches focusing exclusively on overt forms of racist discourse. This article provides a step-by-step generalizable guideline to identify and classify different forms of racist discourse in large corpora. In our approach, we start by conceptualizing racism and its different manifestations. We then contextualize these racist manifestations to the time and place of interest, which allows researchers to identify their discursive form. Finally, we apply XLM-RoBERTa (XLM-R), a cross-lingual model for supervised text classification with a cutting-edge contextual understanding of text. We show that XLM-R and XLM-R-Racismo, our pretrained model, outperform other state-of-the-art approaches in classifying racism in large corpora. We illustrate our approach using a corpus of tweets relating to the Ecuadorian indígena community between 2018 and 2021.


Summary

  • The paper introduces a comprehensive methodology using Transformer-based models like XLM-RoBERTa to detect both overt and covert racist discourse within expansive text corpora.
  • It employs a four-step process that fuses robust theoretical frameworks with advanced BERT-based architectures to capture nuanced racial language in context.
  • Empirical analysis in the Ecuadorian context reveals that overt racist content is more likely from peripheral network actors, highlighting social risks and mobilization dynamics.

Introduction

Analyzing racist discourse in large text corpora poses methodological challenges, partly because the language that carries such discourse is nuanced and evolves over time. Existing approaches rely either on small-n qualitative analysis or on large-n methods that focus exclusively on overt racism, which leaves subtler manifestations largely unmeasured. To fill this gap, the paper combines a theoretical framework for racism with state-of-the-art machine learning, notably the Transformer-based cross-lingual model XLM-RoBERTa (XLM-R), to detect and categorize both covert and overt racism in large corpora.

Conceptual Foundation

A solid theoretical footing is essential for identifying racism in text. The paper draws on established definitions that treat racism not merely as an interaction between individuals but as a pervasive structural phenomenon shaped by historical and societal context. Its conceptualization combines ideologies of racial domination with prevailing racial stereotypes, which together shape how racist language manifests. From this foundation the authors build a coding scheme that defines overt and covert racism in the corpus under study, so that a context-sensitive classifier can be trained to handle the intricacies of racist discourse.

Methodological Advancements

The paper proposes a four-step process: conceptualize racism and its manifestations, contextualize those manifestations to the time and place of interest, identify their overt and covert discursive forms, and apply supervised text classification with current NLP techniques. BERT-based models, which are particularly good at using context, supplement traditional methods. They are built on the Transformer architecture, whose self-attention mechanism weighs each word against the other words in a passage. The result is a context-dependent representation of each word, which is essential for classifying nuanced discourse.
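To make the idea of context-dependent word representations concrete, the following is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is an illustration only, not the paper's model: the token names, the three-dimensional embedding values, and the use of identity projections (no learned query, key, or value matrices) are all assumptions chosen to keep the example small.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the raw embeddings
    (identity projections), unlike a trained Transformer layer.
    """
    dim = len(embeddings[0])
    outputs = []
    for query in embeddings:
        # Similarity of this token to every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights = softmax(scores)
        # Weighted average of all token vectors in the sequence.
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, embeddings))
            for i in range(dim)
        ])
    return outputs

# Hypothetical static embeddings; the values are made up for illustration.
STATIC = {
    "term": [1.0, 0.0, 0.0],
    "context_a": [0.0, 1.0, 0.0],
    "context_b": [0.0, 0.0, 1.0],
}

def contextual_vector(tokens, target):
    """Return the attention output for `target` within `tokens`."""
    vectors = [STATIC[token] for token in tokens]
    return self_attention(vectors)[tokens.index(target)]

in_context_a = contextual_vector(["term", "context_a"], "term")
in_context_b = contextual_vector(["term", "context_b"], "term")
print(in_context_a)
print(in_context_b)
```

The same token, `term`, starts from one static vector but ends up with a different representation in each sequence, because its output mixes in the vectors of its neighbors. A static embedding cannot make that distinction, which is why context-aware models are better suited to language whose meaning depends on its surroundings.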

Empirical Application and Insights

The authors demonstrate the approach in the Ecuadorian context. Fine-tuned versions of XLM-R identify covert and overt racist discourse more accurately than more traditional NLP methods. The methodology is applied to a corpus of tweets about the Ecuadorian indígena community. The research finds that overt racist content, and to some degree covert content, is less likely to come from central figures within social networks, which is consistent with the social risks that overt racial language entails. The paper also shows that challenges to the socio-racial status quo trigger a marked increase in racist discourse, which points to the use of such language as a mobilization tool against perceived threats to in-group hegemony.
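A comparison between central and peripheral accounts could be set up along the following lines. This is a hedged sketch, not the authors' procedure: the retweet edges, the user names, the predicted labels, the use of in-degree as the centrality measure, and the threshold are all hypothetical, and the toy numbers do not reproduce any result from the paper.

```python
from collections import Counter

# Hypothetical retweet edges: (retweeting_user, original_author).
EDGES = [
    ("u2", "u1"), ("u3", "u1"), ("u4", "u1"),
    ("u5", "u1"), ("u5", "u2"), ("u6", "u3"),
]

# Hypothetical classifier output: one predicted label per tweet, per user.
PREDICTED = {
    "u1": ["none", "none", "covert", "none"],
    "u2": ["none", "none"],
    "u3": ["none", "covert"],
    "u4": ["overt", "none"],
    "u5": ["overt", "covert", "none"],
    "u6": ["overt", "overt"],
}

def in_degree(edges):
    """Count how often each author is retweeted (assumed centrality proxy)."""
    return Counter(author for _, author in edges)

def split_by_centrality(edges, users, threshold=1):
    """Split users into central (in-degree >= threshold) and peripheral."""
    degree = in_degree(edges)
    central = [u for u in users if degree[u] >= threshold]
    peripheral = [u for u in users if degree[u] < threshold]
    return central, peripheral

def label_share(users, predicted, label):
    """Share of tweets by `users` that carry the predicted `label`."""
    labels = [tag for user in users for tag in predicted[user]]
    if not labels:
        return 0.0
    return sum(tag == label for tag in labels) / len(labels)

central, peripheral = split_by_centrality(EDGES, PREDICTED)
central_share = label_share(central, PREDICTED, "overt")
peripheral_share = label_share(peripheral, PREDICTED, "overt")
print(central, central_share)
print(peripheral, peripheral_share)
```

In-degree is only one of several possible centrality measures. An actual analysis would also need to account for differences in how much each account tweets, rather than comparing raw shares as this sketch does.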

Conclusion

The approach goes beyond detecting racist language: it shows how Transformer-based models can help interpret complex socio-political phenomena in large text corpora. The paper underscores the importance of combining sound theoretical foundations with advanced computational models to capture both the covert and the overt forms of contemporary racism. The template it establishes can also serve as a starting point for studying other complex discursive patterns in political science and beyond.
