Improving Adversarial Data Collection for German Hate Speech Detection
Introduction
Detecting hate speech is a critical aspect of maintaining the safety and integrity of online spaces. Traditional datasets, derived from social media posts or comment sections, often contain biases that leave trained models lacking robustness and generalizability. This research introduces the German Adversarial Hate speech Dataset (GAHD), which aims to increase the diversity and efficiency of adversarial examples through distinct strategies for supporting annotators.
Dataset Creation and Annotation
GAHD's creation involved a dynamic adversarial data collection (DADC) process spanning four rounds, each employing a distinct strategy to help annotators craft or identify adversarial examples. The dataset comprises approximately 11,000 examples, balanced between hate speech and non-hate speech. Notably, the annotation process relied on a detailed definition of hate speech tailored to the German context, emphasizing cultural nuances and explicitly covering marginalized groups.
Strategies for Adversarial Data Collection
- Unguided Example Generation: In the initial round, annotators freely generated examples, which fostered creativity but also revealed challenges in applying the hate speech definition consistently.
- Translation of English Adversarial Examples: The second round supplied annotators with translated adversarial examples from existing English datasets, which they validated and corrected for the German context.
- Validation of Model-Flagged Newspaper Sentences: The third round drew on sentences from German newspapers that were presumed benign but flagged as hate speech by the target model, providing a rich source of potential adversarial instances.
- Contrastive Example Creation: In the final round, annotators created contrastive variants of previously collected examples, expressly designed to challenge the model's predictions and refine the dataset's ability to test and enhance model robustness.
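The newspaper-based strategy above amounts to harvesting the model's false positives: sentences presumed benign that the current classifier nonetheless flags. A minimal sketch, where `classify` is a hypothetical stand-in for the target model (the actual study used a trained hate speech classifier, not a keyword rule):

```python
def classify(sentence: str) -> str:
    """Toy stand-in for the target model: flags any sentence
    containing a trigger word. Purely illustrative."""
    triggers = {"hass", "angriff"}
    return "hate" if any(t in sentence.lower() for t in triggers) else "not_hate"

def mine_candidates(presumed_benign: list[str]) -> list[str]:
    """Return presumed-benign sentences the model wrongly flags as hate.
    These false positives are candidate adversarial examples that
    human annotators then validate or correct."""
    return [s for s in presumed_benign if classify(s) == "hate"]

news_sentences = [
    "Der Angriff auf die Burg fand 1525 statt.",  # historical, benign
    "Das Wetter wird morgen sonnig.",             # clearly benign
]
candidates = mine_candidates(news_sentences)
```

Only sentences that survive human validation as genuinely non-hateful become adversarial (non-hate) training examples.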
Dynamic Adversarial Data Collection Process
The iterative nature of DADC ensured continuous refinement of the target model, with each round folding the newly collected adversarial examples into the training data. This method not only improved the dataset's quality but also allowed for an examination of how different annotator support strategies affect the efficiency and diversity of the generated examples.
Model Evaluations and Benchmarks
GAHD presented a significant challenge to state-of-the-art hate speech detection models, including commercial APIs and LLMs. Notably, training models on GAHD resulted in substantial improvements in robustness, as evidenced by performance on both in-domain and out-of-domain test sets. The analysis also highlighted the varying effectiveness of adversarial examples generated through different support strategies, underscoring the value of mixing multiple strategies to produce a more resilient and comprehensive dataset.
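Benchmark comparisons of this kind are typically reported with macro-averaged F1, which weights the hate and non-hate classes equally on a balanced test set; the metric choice here is an assumption, not a detail stated above. A stdlib-only sketch:

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 for a single class from its confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def macro_f1(y_true: list[int], y_pred: list[int]) -> float:
    """Average per-class F1 over the label set (0 = not hate, 1 = hate)."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)

score = macro_f1([1, 0, 1, 0], [1, 0, 0, 0])  # one missed hate example
```

Comparing macro-F1 on an in-domain test set against an out-of-domain one is the usual way to quantify the robustness gains reported here.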
Implications and Future Directions
The research demonstrates the viability and benefit of employing diversified strategies in adversarial data collection to improve hate speech detection models. By supporting annotators in generating more diverse and challenging examples, the resulting dataset offers a robust resource for training and evaluating hate speech detection models. Future work could explore additional methods for annotator support, including leveraging LLMs for augmentations and perturbations, to further enhance dataset diversity and model performance.
Conclusion
GAHD marks a significant advancement in the collection of adversarial data for hate speech detection, emphasizing the importance of diverse and efficient example generation. The strategies outlined in this paper not only contribute to the development of more robust models but also offer insights into optimizing the adversarial data collection process for future research.