A Federated Learning Approach to Privacy Preserving Offensive Language Identification (2404.11470v1)
Abstract: The spread of various forms of offensive speech online is an important concern on social media. While platforms have invested heavily in ways of coping with this problem, the question of privacy remains largely unaddressed. Models that detect offensive language on social media are trained and/or fine-tuned on large amounts of data, often stored on centralized servers. Since most social media data originates from end users, we propose a privacy-preserving decentralized architecture for identifying offensive language online by introducing Federated Learning (FL) in the context of offensive language identification. FL is a decentralized architecture that allows multiple models to be trained locally without sharing data, thereby preserving users' privacy. We propose a model fusion approach to perform FL. We trained multiple deep learning models on four publicly available English benchmark datasets (AHSD, HASOC, HateXplain, OLID) and evaluated their performance in detail. We also present initial cross-lingual experiments in English and Spanish. We show that the proposed model fusion approach outperforms the baselines on all datasets while preserving privacy.
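The abstract describes FL only at a high level: each client trains on data that never leaves it, and only model parameters are sent to a server and fused. The sketch below illustrates that loop under loudly stated assumptions. It swaps the paper's deep learning models for a toy logistic-regression classifier, and it fuses client models by a weighted element-wise average of their parameters (FedAvg-style weighting by client data size); the paper's actual fusion procedure may differ. The names `predict`, `local_train`, `fuse`, `federated_training` and the two toy features are hypothetical, not the authors' code.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical model representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]
# One labelled example: (feature vector, label), label 1 = offensive, 0 = not.
Example = Tuple[List[float], int]


def predict(params: Params, x: Sequence[float]) -> float:
    """Probability that x is offensive under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(params["w"], x)) + params["b"][0]
    return 1.0 / (1.0 + math.exp(-z))


def local_train(params: Params, data: Sequence[Example],
                lr: float = 0.5, epochs: int = 50) -> Params:
    """SGD on one client's private data; only the new parameters are returned."""
    w = list(params["w"])  # copy so the global model is not mutated
    b = params["b"][0]
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            g = 1.0 / (1.0 + math.exp(-z)) - y  # d(log loss)/dz
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return {"w": w, "b": [b]}


def fuse(models: Sequence[Params], weights: Sequence[float]) -> Params:
    """Fuse client models by a weighted element-wise average of parameters."""
    total = float(sum(weights))
    fused: Params = {}
    for name, values in models[0].items():
        fused[name] = [
            sum(wt * m[name][i] for m, wt in zip(models, weights)) / total
            for i in range(len(values))
        ]
    return fused


def federated_training(clients: Sequence[Sequence[Example]], n_features: int,
                       rounds: int = 5) -> Params:
    """Server loop: broadcast the global model, train locally, fuse the results."""
    global_model: Params = {"w": [0.0] * n_features, "b": [0.0]}
    for _ in range(rounds):
        # Each client sees only its own data; the server sees only parameters.
        local_models = [local_train(global_model, data) for data in clients]
        # Weight each client by its number of examples (FedAvg-style).
        global_model = fuse(local_models, [len(data) for data in clients])
    return global_model


if __name__ == "__main__":
    # Toy features: [contains_insult, mentions_user]; each list stays on its client.
    client_a = [([1.0, 0.0], 1), ([0.0, 0.0], 0)]
    client_b = [([1.0, 1.0], 1), ([0.0, 1.0], 0)]
    model = federated_training([client_a, client_b], n_features=2, rounds=1)
    print([round(predict(model, x)) for x, _ in client_a + client_b])
```

Averaging raw parameters is the simplest fusion rule and only makes sense when all clients share the same architecture and initialization, which the server loop guarantees here by broadcasting one global model each round.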
Authors: Marcos Zampieri, Damith Premasiri, Tharindu Ranasinghe