
Think Outside the Data: Colonial Biases and Systemic Issues in Automated Moderation Pipelines for Low-Resource Languages

Published 23 Jan 2025 in cs.CL and cs.HC | (2501.13836v2)

Abstract: Most social media users come from non-English speaking countries in the Global South, where much of the harmful content appears in local languages. Yet, current AI-driven moderation systems struggle with low-resource languages spoken in these regions. This work examines the systemic challenges in building automated moderation tools for these languages. We conducted semi-structured interviews with 22 AI experts working on detecting harmful content in four low-resource languages: Tamil (South Asia), Swahili (East Africa), Maghrebi Arabic (North Africa), and Quechua (South America). Our findings show that beyond the well-known data scarcity in local languages, technical issues--such as outdated machine translation systems, sentiment and toxicity models grounded in Western values, and unreliable language detection technologies--undermine moderation efforts. Even with more data, current LLMs and preprocessing pipelines--primarily designed for English--struggle with the morphological richness, linguistic complexity, and code-mixing of these languages. As a result, automated moderation in Tamil, Swahili, Arabic, and Quechua remains fraught with inaccuracies and blind spots. Based on our findings, we argue that these limitations are not just technical gaps but reflect deeper structural inequities that continue to reproduce historical power imbalances. We conclude by discussing multi-stakeholder approaches to improve automated moderation for low-resource languages.
