Basque and Spanish Counter Narrative Generation: Data Creation and Evaluation (2403.09159v1)
Abstract: Counter Narratives (CNs) are non-negative textual responses to Hate Speech (HS) that aim to defuse online hatred and mitigate its spread across media. Despite the recent increase in HS content posted online, research on automatic CN generation has been relatively scarce and has focused predominantly on English. In this paper, we present CONAN-EUS, a new Basque and Spanish dataset for CN generation developed by means of Machine Translation (MT) and professional post-editing. Because it is a parallel corpus, also with respect to the original English CONAN, it enables novel research on multilingual and crosslingual automatic generation of CNs. Our experiments on CN generation with mT5, a multilingual encoder-decoder model, show that generation benefits greatly from training on post-edited data, as opposed to relying on silver MT data only. These results correlate with a qualitative manual evaluation, which indicates that manually revised training data remains crucial for the quality of the generated CNs. Furthermore, multilingual data augmentation improves results over monolingual settings for structurally similar languages such as English and Spanish, while being detrimental for Basque, a language isolate. Similar findings occur in zero-shot crosslingual evaluations, where model transfer (fine-tuning in English and generating in a different target language) outperforms fine-tuning mT5 on machine-translated data for Spanish but not for Basque. This provides an interesting insight into the asymmetry in the multilinguality of generative models, a challenging topic that remains open to research.
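The training settings contrasted in the abstract (post-edited versus silver MT data, multilingual augmentation, and zero-shot model transfer) can be pictured as different selections from one parallel corpus. The following is a minimal sketch under stated assumptions, not the authors' code: the `Pair` class, the setting names, and the placeholder strings are all hypothetical, and the actual data splits and mT5 fine-tuning are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One hate speech / counter narrative pair (hypothetical container)."""
    lang: str     # "en", "es" or "eu"
    quality: str  # "gold" = original or post-edited, "silver" = raw MT output
    hs: str
    cn: str


def training_set(corpus, setting, target):
    """Select training pairs for one illustrative setting (names are assumptions)."""
    if setting == "monolingual_gold":
        # Train only on post-edited data in the target language.
        return [p for p in corpus if p.lang == target and p.quality == "gold"]
    if setting == "monolingual_silver":
        # Train only on raw machine-translated data in the target language.
        return [p for p in corpus if p.lang == target and p.quality == "silver"]
    if setting == "multilingual":
        # Data augmentation: pool the gold data of every language.
        return [p for p in corpus if p.quality == "gold"]
    if setting == "model_transfer":
        # Zero-shot: train on English only, then generate in the target language.
        return [p for p in corpus if p.lang == "en" and p.quality == "gold"]
    raise ValueError(f"unknown setting: {setting}")


# Toy parallel corpus with placeholder text instead of real examples.
corpus = [
    Pair("en", "gold", "<HS 1>", "<CN 1>"),
    Pair("es", "silver", "<HS 1 es, MT>", "<CN 1 es, MT>"),
    Pair("es", "gold", "<HS 1 es, post-edited>", "<CN 1 es, post-edited>"),
    Pair("eu", "silver", "<HS 1 eu, MT>", "<CN 1 eu, MT>"),
    Pair("eu", "gold", "<HS 1 eu, post-edited>", "<CN 1 eu, post-edited>"),
]

for setting in ("monolingual_gold", "monolingual_silver", "multilingual", "model_transfer"):
    langs = [p.lang for p in training_set(corpus, setting, target="eu")]
    print(setting, langs)
```

In this framing, the comparison for Basque described in the abstract is between `model_transfer` (English training data only) and `monolingual_silver` (machine-translated Basque), each evaluated on Basque.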
- Jaione Bengoetxea
- Yi-Ling Chung
- Marco Guerini
- Rodrigo Agerri