Beyond Hate Speech: NLP's Challenges and Opportunities in Uncovering Dehumanizing Language (2402.13818v1)

Published 21 Feb 2024 in cs.CL

Abstract: Dehumanization, characterized as a subtle yet harmful manifestation of hate speech, involves denying individuals their human qualities and often results in violence against marginalized groups. Despite significant progress in Natural Language Processing across various domains, its application in detecting dehumanizing language is limited, largely due to the scarcity of publicly available annotated data for this domain. This paper evaluates the performance of cutting-edge NLP models, including GPT-4, GPT-3.5, and LLAMA-2, in identifying dehumanizing language. Our findings reveal that while these models demonstrate potential, achieving a 70% accuracy rate in distinguishing dehumanizing language from broader hate speech, they also display biases. They are over-sensitive in classifying other forms of hate speech as dehumanization for a specific subset of target groups, while more frequently failing to identify clear cases of dehumanization for other target groups. Moreover, leveraging one of the best-performing models, we automatically annotated a larger dataset for training more accessible models. However, our findings indicate that these models currently do not meet the high-quality data generation threshold necessary for this task.

References (33)
  1. Open-source large language models outperform crowd workers and approach ChatGPT in text-annotation tasks. arXiv preprint arXiv:2307.02179.
  2. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416.
  3. Race-based biases in judgments of social pain. Journal of Experimental Social Psychology, 88:103964.
  4. Is GPT-3 a good data annotator? In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11173–11195, Toronto, Canada. Association for Computational Linguistics.
  5. Markus Eberts and Adrian Ulges. 2020. Span-based joint entity and relation extraction with transformer pre-training. In 24th European Conference on Artificial Intelligence - ECAI 2020, pages 2006–2013. IOS Press.
  6. Toward transformer-based NLP for extracting psychosocial indicators of moral disengagement. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 43.
  7. ChatGPT outperforms crowd-workers for text-annotation tasks. arXiv preprint arXiv:2303.15056.
  8. Liberals and conservatives rely on different sets of moral foundations. Journal of personality and social psychology, 96(5):1029.
  9. John Hagan and Wenona Rymond-Richmond. 2008. The collective dynamics of racial dehumanization and genocidal victimization in Darfur. American Sociological Review, 73(6):875–902.
  10. Lasana T Harris and Susan T Fiske. 2015. Dehumanized perception. Zeitschrift für Psychologie.
  11. Nick Haslam. 2006. Dehumanization: An integrative review. Personality and social psychology review, 10(3):252–264.
  12. Subhuman, inhuman, and superhuman: Contrasting humans with nonhumans in three cultures. Social cognition, 26(2):248–258.
  13. Nick Haslam and Steve Loughnan. 2014. Dehumanization and infrahumanization. Annual review of psychology, 65:399–423.
  14. AnnoLLM: Making large language models to be better crowdsourced annotators. arXiv preprint arXiv:2303.16854.
  15. Nour S. Kteily and Alexander P. Landry. 2022. Dehumanization: trends, insights, and challenges. Trends in Cognitive Sciences, 26(3):222–240.
  16. Can large language models aid in annotating speech emotional data? Uncovering new frontiers. arXiv preprint arXiv:2307.06090.
  17. The emotional side of prejudice: The attribution of secondary emotions to ingroups and outgroups. Personality and social psychology review, 4(2):186–197.
  18. RoBERTa: A robustly optimized BERT pretraining approach.
  19. A framework for the computational linguistic analysis of dehumanization. Frontiers in Artificial Intelligence, 3.
  20. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
  21. Saif Mohammad. 2018. Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 English words. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 174–184.
  22. Anthony Oberschall. 1997. Vojislav Seselj’s nationalist propaganda: contents, techniques, aims and impacts, 1990-1994. How mass media propaganda impacts on ordinary people’s acceptance and participation in collective violence, and how Seselj’s nationalist propaganda promoted and justified coercion and violence by the Serbs against non-Serbs.
  23. Differential association of uniquely and non uniquely human emotions with the ingroup and the outgroup. Group Processes & Intergroup Relations, 5(2):105–117.
  24. Connotation frames: A data-driven investigation. arXiv preprint arXiv:1506.02739.
  25. Connotation frames: A data-driven investigation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 311–321, Berlin, Germany. Association for Computational Linguistics.
  26. Connotation frames of power and agency in modern films. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2329–2334.
  27. From humans to machines: Can ChatGPT-like LLMs effectively replace human annotators in NLP tasks. In Workshop Proceedings of the 17th International AAAI Conference on Web and Social Media.
  28. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
  29. Llama 2: Open foundation and fine-tuned chat models.
  30. Petter Törnberg. 2023. ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning. arXiv preprint arXiv:2304.06588.
  31. Learning from the worst: Dynamically generated datasets to improve online hate detection. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1667–1682, Online. Association for Computational Linguistics.
  32. HuggingFace’s Transformers: State-of-the-art natural language processing.
  33. Can large language models transform computational social science? arXiv preprint arXiv:2305.03514.

Summary

  • The paper presents a comprehensive evaluation of GPT-4, GPT-3.5, and LLAMA-2, achieving up to 70% accuracy in distinguishing dehumanizing language from broader hate speech.
  • The paper introduces a method for automatic annotation using the best-performing model, though data-quality challenges underscore the need for expert human input.
  • The paper identifies significant variability in model performance across target groups, calling for more balanced datasets to mitigate bias.

Evaluation of NLP Models in Detecting Dehumanizing Language

Introduction

The paper presents a comprehensive assessment of how well state-of-the-art NLP models detect dehumanizing language in social media content. Dehumanization, a nuanced form of hate speech in which individuals or groups are denied inherent human qualities, creates conditions that can escalate into violence and discrimination against marginalized communities. Despite these critical societal implications, the application of NLP technologies in this domain has been limited, primarily due to the lack of extensive annotated datasets for training and evaluating models.

Analysis of Model Performance

Methodology

Using a dataset from Vidgen et al. (2021) that includes 906 instances of dehumanization, the paper evaluates the efficacy of GPT-4, GPT-3.5, and LLAMA-2 under three prompting settings: zero-shot, few-shot, and explainable prompting. The methodology also incorporates a novel approach that uses the best-performing model to automatically generate annotated data, with the aim of training smaller, more accessible models for broader applications.
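
To make the prompting settings concrete, the sketch below shows what a zero-shot or few-shot classification call might look like. It is a minimal illustration assuming the OpenAI chat completions API; the prompt wording, label set, few-shot examples, and model name are placeholders, not the authors' exact setup.

```python
# Minimal sketch of zero-shot / few-shot prompting for the binary decision
# "dehumanization" vs. "other hate speech". Prompt wording, labels, and the
# model name are illustrative assumptions, not the paper's exact prompts.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Placeholder few-shot examples; the paper draws labelled instances from
# the Vidgen et al. (2021) dataset.
FEW_SHOT_EXAMPLES = [
    ("<post likening a group to vermin>", "dehumanization"),
    ("<post containing a slur but no dehumanizing comparison>", "other hate speech"),
]

SYSTEM_PROMPT = (
    "You label social media posts. Reply with exactly one label: "
    "'dehumanization' or 'other hate speech'. Dehumanization denies a person "
    "or group their human qualities, e.g. by likening them to animals, "
    "vermin, machines, or objects."
)

def classify(post: str, few_shot: bool = False) -> str:
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    if few_shot:
        # Few-shot setting: prepend labelled examples as prior turns.
        for text, label in FEW_SHOT_EXAMPLES:
            messages.append({"role": "user", "content": text})
            messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": post})

    response = client.chat.completions.create(
        model="gpt-4",    # or "gpt-3.5-turbo"; LLAMA-2 would use its own endpoint
        messages=messages,
        temperature=0,    # deterministic labels for evaluation
    )
    return response.choices[0].message.content.strip().lower()
```

An explainable-prompting variant would additionally ask the model for a short justification alongside the label, which can help probe why particular posts are misclassified.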

Findings

The analysis reveals that while GPT models exhibit a promising ability to distinguish dehumanizing language from broader categories of hate speech with up to 70% accuracy, they also display significant biases. These biases manifest as over-sensitivity in classifying certain types of hate speech as dehumanization, particularly against gay and transgender individuals, and under-sensitivity towards dehumanizing language directed at immigrants and refugees. Furthermore, the attempt to automatically annotate a larger dataset for training purposes encountered challenges, as the generated data did not meet the high-quality standards required for effective model training.
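
The per-group error pattern described above can be quantified with a simple breakdown of false positives (other hate speech flagged as dehumanization) and false negatives (missed dehumanization) by target group. The sketch below is a hypothetical illustration: the column names, placeholder rows, and group labels are assumptions, and in practice the table would come from the evaluation set.

```python
# Hypothetical per-target-group error analysis: over-sensitivity shows up as
# a high false-positive rate, under-sensitivity as a high false-negative rate.
import pandas as pd

# Placeholder rows; gold: 1 = dehumanization, 0 = other hate speech;
# pred: the model's prediction in the same encoding.
df = pd.DataFrame({
    "target_group": ["trans people", "trans people", "immigrants", "immigrants"],
    "gold":         [0, 0, 1, 1],
    "pred":         [1, 0, 0, 1],
})

def group_rates(g: pd.DataFrame) -> pd.Series:
    neg, pos = (g.gold == 0), (g.gold == 1)
    return pd.Series({
        "false_positive_rate": (neg & (g.pred == 1)).sum() / max(neg.sum(), 1),
        "false_negative_rate": (pos & (g.pred == 0)).sum() / max(pos.sum(), 1),
        "accuracy": (g.gold == g.pred).mean(),
    })

# One row of rates per target group, sorted by over-sensitivity.
print(df.groupby("target_group").apply(group_rates)
        .sort_values("false_positive_rate", ascending=False))
```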

Implications for Future Research

The findings underscore the necessity of integrating expert human annotations to accurately capture the subtle nuances of dehumanizing language. This is essential not only for improving model performance but also for ensuring that biases do not undermine the effectiveness of these models in practical applications. The paper calls for a collaborative effort between NLP and the social sciences to establish comprehensive annotated corpora that can support in-depth research into the impact of dehumanization in digital communication.

Analyzing Model Limitations

Variability Across Target Groups

A critical takeaway from the paper is the variability in model performance across different target groups. This inconsistency highlights a need for more balanced datasets that encompass a wider spectrum of target groups, which would help in developing models that are sensitive and equitable in identifying dehumanizing language across diverse contexts.

Confusion Between Hate Speech and Dehumanization

The paper also explores the specific categories of hate speech that models frequently misclassify as dehumanization. This confusion underscores the complexity of hate speech and the need for models that can discern nuanced differences between various forms of harmful language.

Concluding Remarks

In sum, the paper presents a crucial examination of the potential and limitations of cutting-edge NLP models in identifying dehumanizing language. It highlights a path forward that involves a synergy between advanced NLP technologies and human expertise, a combination that is paramount in accurately detecting and mitigating the effects of dehumanization on social media platforms. As the field continues to evolve, the interplay between technological advancements and nuanced human judgment remains a central theme in the quest to foster safer and more inclusive digital environments.
