
Impoverished Language Technology: The Lack of (Social) Class in NLP (2403.03874v1)

Published 6 Mar 2024 in cs.CL, cs.AI, and cs.CY

Abstract: Since Labov's (1964) foundational work on the social stratification of language, linguistics has dedicated concerted efforts towards understanding the relationships between socio-demographic factors and language production and perception. Despite the large body of evidence identifying significant relationships between socio-demographic factors and language production, relatively few of these factors have been investigated in the context of NLP technology. While age and gender are well covered, Labov's initial target, socio-economic class, is largely absent. We survey the existing NLP literature and find that only 20 papers even mention socio-economic status. However, the majority of those papers do not engage with class beyond collecting information on annotator demographics. Given this research lacuna, we provide a definition of class that can be operationalised by NLP researchers, and argue for including socio-economic class in future language technologies.

References (42)
  1. Constructing a psychometric testbed for fair natural language processing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3748–3758, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  2. Crowdsourcing speech data for low-resource languages from low-income workers. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2819–2826, Marseille, France. European Language Resources Association.
  3. Relationship of subjective and objective social status with psychological and physiological functioning: Preliminary data in healthy, white women. Health psychology, 19(6):586.
  4. You write like you eat: Stylistic variation as a predictor of social stratification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2583–2593, Florence, Italy. Association for Computational Linguistics.
  5. Emily M Bender and Batya Friedman. 2018. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6:587–604.
  6. Basil Bernstein. 1960. Language and social class. The British journal of sociology, 11(3):271–276.
  7. Pierre Bourdieu. 2018. Distinction: a social critique of the judgement of taste. In Inequality Classic Readings in Race, Class, and Gender, pages 287–318. Routledge.
  8. ConvAbuse: Data, analysis, and benchmarks for nuanced abuse detection in conversational AI. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7388–7403, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
  9. Eve V Clark and Marisa Casillas. 2015. First language acquisition. In The Routledge handbook of linguistics, pages 311–328. Routledge.
  10. Amanda Cole. 2022. Crowdsourced participants’ accuracy at identifying the social class of speakers from South East England. In Proceedings of the 2nd Workshop on Novel Incentives in Data Collection from People: models, implementations, challenges and results within LREC 2022, pages 38–45, Marseille, France. European Language Resources Association.
  11. Rosemary Crompton. 2008. Class and stratification. Journal of Social Policy, 38:361–362.
  12. Stefania Degaetano-Ortlieb. 2018. Stylistic variation over 200 years of court proceedings according to gender and social class. In Proceedings of the Second Workshop on Stylistic Variation, pages 1–10, New Orleans. Association for Computational Linguistics.
  13. Best practices in conceptualizing and measuring social class in psychological research. Analyses of Social Issues and Public Policy, 13(1):77–113.
  14. Penelope Eckert. 2012. Three waves of variation study: The emergence of meaning in the study of sociolinguistic variation. Annual review of Anthropology, 41(1):87–100.
  15. A survey of race, racism, and anti-racism in NLP. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1905–1925, Online. Association for Computational Linguistics.
  16. Exploring stylistic variation with age and income on Twitter. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 313–319, Berlin, Germany. Association for Computational Linguistics.
  17. Assessing socioeconomic status of Twitter users: A survey. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 388–398, Varna, Bulgaria. INCOMA Ltd.
  18. The remarkable benefit of user-level aggregation for lexical-based population-level predictions. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1167–1172, Brussels, Belgium. Association for Computational Linguistics.
  19. Annika Grützner-Zahn and Georg Rehm. 2022. Introducing the digital language equality metric: Contextual factors. In Proceedings of the Workshop Towards Digital Language Equality within the 13th Language Resources and Evaluation Conference, pages 13–26, Marseille, France. European Language Resources Association.
  20. The evolution of occupational identity in twitter biographies. In Proceedings of the Int. AAAI Conf. on Weblogs and Social Media (ICWSM 2024). Forthcoming.
  21. Temporal orientation of tweets for predicting income of users. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 659–665, Vancouver, Canada. Association for Computational Linguistics.
  22. Morphological complexity of children narratives in eight languages. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4729–4738, Marseille, France. European Language Resources Association.
  23. Ganesh Jawahar and Djamé Seddah. 2019. Contextualized diachronic word representations. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pages 35–47, Florence, Italy. Association for Computational Linguistics.
  24. Cross-lingual syntactic variation over age and gender. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pages 103–112, Beijing, China. Association for Computational Linguistics.
  25. Michael W Kraus and Nicole M Stephens. 2012. A road map for an emerging psychology of social class. Social and Personality Psychology Compass, 6(9):642–656.
  26. William Labov. 1964. The social stratification of English in New York city. Ph.D. thesis, Columbia University.
  27. Extracting socioeconomic patterns from the news: Modelling text and outlet importance jointly. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, pages 13–17, Baltimore, MD, USA. Association for Computational Linguistics.
  28. Speaking up: Accents and social mobility. Technical report, The Sutton Trust.
  29. Socially aware bias measurements for Hindi language representations. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1041–1052, Seattle, United States. Association for Computational Linguistics.
  30. Detecting urgency in multilingual medical SMS in Kenya. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 68–75, Online. Association for Computational Linguistics.
  31. Survey: Computational sociolinguistics: A Survey. Computational Linguistics, 42(3):537–593.
  32. An analysis of the user occupational class through Twitter content. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1754–1764, Beijing, China. Association for Computational Linguistics.
  33. Peter Saunders. 1990. Social class and stratification. Routledge.
  34. A new model of social class? Findings from the BBC’s Great British Class Survey experiment. Sociology, 47(2):219–250.
  35. Socioeconomic status and mortality. Diabetes Care, 36(1):49–55.
  36. Bev Skeggs. 1997. Formations of class & gender: Becoming Respectable. Sage Publications Ltd.
  37. The Danish Gigaword corpus. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 413–421, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
  38. WASSA 2021 shared task: Predicting empathy and emotion in reaction to news stories. In Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 92–104, Online. Association for Computational Linguistics.
  39. Elisa Usategui Basozábal et al. 1992. La sociolingüística de Basil Bernstein y sus implicaciones en el ámbito escolar. Revista de educación.
  40. What does the language of foods say about us? In Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019), pages 87–96, Hong Kong. Association for Computational Linguistics.
  41. At the intersection of NLP and sustainable development: Exploring the impact of demographic-aware text representations in modeling value on a corpus of interviews. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2007–2021, Marseille, France. European Language Resources Association.
  42. Residualized factor adaptation for community social media prediction tasks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3560–3569, Brussels, Belgium. Association for Computational Linguistics.
Authors (3)
  1. Amanda Cercas Curry (18 papers)
  2. Zeerak Talat (24 papers)
  3. Dirk Hovy (57 papers)
