
Classist Tools: Social Class Correlates with Performance in NLP (2403.04445v1)

Published 7 Mar 2024 in cs.CL

Abstract: Since the foundational work of William Labov on the social stratification of language (Labov, 1964), linguistics has made concentrated efforts to explore the links between sociodemographic characteristics and language production and perception. But while there is strong evidence for socio-demographic characteristics in language, they are infrequently used in NLP. Age and gender are somewhat well represented, but Labov's original target, socioeconomic status, is noticeably absent. And yet it matters. We show empirically that NLP disadvantages less-privileged socioeconomic groups. We annotate a corpus of 95K utterances from movies with social class, ethnicity and geographical language variety and measure the performance of NLP systems on three tasks: language modelling, automatic speech recognition, and grammar error correction. We find significant performance disparities that can be attributed to socioeconomic status as well as ethnicity and geographical differences. With NLP technologies becoming ever more ubiquitous and quotidian, they must accommodate all language varieties to avoid disadvantaging already marginalised groups. We argue for the inclusion of socioeconomic class in future language technologies.

References (37)
  1. Jonathan Anderson. 1983. Lix and rix: Variations on a little-known readability index. Journal of Reading, 26(6):490–496.
  2. XLS-R: Self-supervised cross-lingual speech representation learning at scale. arXiv preprint arXiv:2111.09296.
  3. wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in neural information processing systems, 33:12449–12460.
  4. Emily M. Bender and Batya Friedman. 2018. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6:587–604.
  5. Basil Bernstein. 1960. Language and social class. The British journal of sociology, 11(3):271–276.
  6. Mary Bucholtz and Kira Hall. 2005. Identity and interaction: A sociocultural linguistic approach. Discourse studies, 7(4-5):585–614.
  7. Eve V Clark and Marisa Casillas. 2015. First language acquisition. In The Routledge handbook of linguistics, pages 311–328. Routledge.
  8. Meri Coleman and Ta Lin Liau. 1975. A computer readability formula designed for machine scoring. Journal of Applied Psychology, 60(2):283.
  9. Are AI systems biased against the poor? A machine learning analysis using Word2Vec and GloVe embeddings. AI & society, pages 1–16.
  10. Penelope Eckert. 2012. Three waves of variation study: The emergence of meaning in the study of sociolinguistic variation. Annual review of Anthropology, 41(1):87–100.
  11. A survey of race, racism, and anti-racism in NLP. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1905–1925, Online. Association for Computational Linguistics.
  12. Exploring stylistic variation with age and income on Twitter. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 313–319, Berlin, Germany. Association for Computational Linguistics.
  13. Rudolph Flesch. 1948. A new readability yardstick. Journal of applied psychology, 32(3):221.
  14. Demystifying prompts in language models via perplexity estimation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10136–10148, Singapore. Association for Computational Linguistics.
  15. Robert Gunning. 1968. The Technique of Clear Writing. McGraw-Hill Book Company, New York.
  16. Mistral 7B. arXiv preprint arXiv:2310.06825.
  17. Cross-lingual syntactic variation over age and gender. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pages 103–112, Beijing, China. Association for Computational Linguistics.
  18. Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease formula) for Navy enlisted personnel.
  19. William Labov. 1964. The social stratification of English in New York city. Ph.D. thesis, Columbia University.
  20. Qiuana Lopez and Mary Bucholtz. 2017. “How my hair look?” Linguistic authenticity and racialized gender and sexuality on The Wire. Journal of Language and Sexuality, 6(1):1–29.
  21. Alec W McHoul. 1987. An initial investigation of the usability of fictional conversation for doing conversation analysis. Semiotica, 67(1-2):83–104.
  22. From WER and RIL to MER and WIL: Improved evaluation measures for connected speech recognition. In Interspeech 2004, pages 2765–2768. ISCA.
  23. JFLEG: A fluency corpus and benchmark for grammatical error correction. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 229–234, Valencia, Spain. Association for Computational Linguistics.
  24. Stanza: A python natural language processing toolkit for many human languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 101–108, Online. Association for Computational Linguistics.
  25. Paulo Quaglio. 2008. Television dialogue and natural conversation. Corpora and discourse, pages 189–210.
  26. Robust speech recognition via large-scale weak supervision. In International Conference on Machine Learning, pages 28492–28518. PMLR.
  27. Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics.
  28. John R Rickford. 1986. The need for new approaches to social class analysis in sociolinguistics. Language and communication, 6(3):215–221.
  29. A new model of social class? Findings from the BBC's Great British Class Survey experiment. Sociology, 47(2):219–250.
  30. Socioeconomic status and mortality. Diabetes Care, 36(1):49–55.
  31. Automated readability index. AMRL-TR. Aerospace Medical Research Laboratories, pages 1–14.
  32. Anastasia G Stamou. 2014. A literature review on the mediation of sociolinguistic style in television and cinematic fiction: Sustaining the ideology of authenticity. Language and Literature, 23(2):118–140.
  33. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
  34. Zephyr: Direct distillation of LM alignment. arXiv preprint arXiv:2310.16944.
  35. Elisa Usategui Basozábal et al. 1992. La sociolingüística de Basil Bernstein y sus implicaciones en el ámbito escolar. Revista de Educación.
  36. Melanie Weirich and Adrian P Simpson. 2018. Gender identity is indexed and perceived in speech. PLoS One, 13(12):e0209226.
  37. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.
Authors (4)
  1. Amanda Cercas Curry (18 papers)
  2. Giuseppe Attanasio (21 papers)
  3. Zeerak Talat (24 papers)
  4. Dirk Hovy (57 papers)
Citations (3)

Summary

Exploring the Impact of Socioeconomic Status on NLP Performance

Introduction

NLP systems are indispensable tools in the modern digital landscape, offering capabilities ranging from language modeling and automatic speech recognition to grammar correction. Developing inclusive NLP technologies necessitates understanding and addressing performance disparities across diverse demographic groups. Recognizing the significant but often overlooked impact of socioeconomic status (SES) on language use, Amanda Cercas Curry, Giuseppe Attanasio, Zeerak Talat, and Dirk Hovy investigate how NLP tools perform across different SES groups. Their paper, "Classist Tools: Social Class Correlates with Performance in NLP," presents empirical evidence that current NLP technologies disadvantage less-privileged socioeconomic groups.

Dataset and Methodology

The research team undertook a comprehensive study, annotating 95K utterances from movie scripts with social class, ethnicity, and geographical language variety. This novel dataset provided a foundation for analyzing NLP system performance across three critical tasks: language modeling, automatic speech recognition, and grammar error correction.
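For the speech-recognition task, per-group performance comparisons of this kind are conventionally scored with word error rate (WER): the word-level edit distance between a reference transcript and the system's hypothesis, normalized by reference length. A minimal sketch of the metric (an illustrative implementation, not the authors' code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four reference words -> WER of 0.25
print(wer("how my hair look", "how my hair looks"))
```

Comparing such scores across the class, ethnicity, and dialect annotations is what surfaces group-level disparities; a system can have a low average WER overall while still erring far more often on non-standard varieties.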

Utilizing popular television shows and movies allowed for the ethically responsible collection of data representing a spectrum of socioeconomic statuses, ethnic backgrounds, and dialects. Shows were selected to cover a balanced representation, including both dominant and marginalized groups across different SES strata and geographical regions (primarily the US and UK).

Findings and Discussion

Socioeconomic Status and Language Variation

The paper confirms that socioeconomic status significantly impacts linguistic expression, as echoed in past sociolinguistic research. This impact manifests in various linguistic features, including lexicon, syntax, and style, which arguably should be considered in the design and deployment of NLP systems.

Performance Disparities in NLP Tasks

The empirical analysis across different NLP tasks reveals significant performance disparities attributable to differences in socioeconomic status, as well as ethnicity and geographical language variations. For instance, automatic speech recognition systems demonstrated higher error rates for lower SES groups and non-standard dialects. Similarly, LLMs exhibited higher perplexity scores—indicating lower "expectedness" or acceptability—for utterances attributed to lower SES, suggesting an inherent bias towards more privileged sociolects.
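Perplexity, the language-modeling metric referenced above, is the exponentiated negative mean per-token log-probability: the higher a model's perplexity on an utterance, the less "expected" that utterance is under the model. A small sketch of the computation (hypothetical token probabilities for illustration, not data from the paper):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp of the negative mean per-token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# If a model assigns each of 3 tokens probability 0.25,
# perplexity is exactly 4 (the effective branching factor).
uniform = [math.log(0.25)] * 3
print(perplexity(uniform))

# A sociolect the model finds less expected gets lower token
# probabilities, hence higher perplexity:
less_expected = [math.log(0.05), math.log(0.10), math.log(0.08)]
print(perplexity(less_expected) > perplexity(uniform))
```

In practice the per-token log-probabilities come from the evaluated model itself; systematically higher perplexity on lower-SES utterances is what signals the bias toward privileged sociolects.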

Implications for Fairness in NLP

These findings prompt critical reflection on the inclusivity and fairness of NLP technologies. As NLP systems become increasingly embedded in everyday digital interactions, there is a pressing need to ensure that these technologies do not perpetuate or exacerbate existing social inequalities. The research articulates a call to action for incorporating socio-demographic characteristics, such as socioeconomic status, into the design, development, and evaluation of NLP systems.

Concluding Thoughts

The study by Cercas Curry et al. represents an important step towards understanding and mitigating biases in NLP systems related to socioeconomic status. By highlighting the performance disparities and their potential implications, the research underscores the importance of developing NLP technologies that are inclusive and equitable across all social strata. Looking forward, it paves the way for future investigations into socio-demographic factors in NLP, advocating for a more holistic approach to inclusivity in technology design and application.