Focal Inferential Infusion Coupled with Tractable Density Discrimination for Implicit Hate Detection (2309.11896v2)
Abstract: Although pretrained large language models (PLMs) have achieved state-of-the-art performance on many NLP tasks, they lack an understanding of the subtle expressions of implicit hate speech. Various attempts have been made to enhance the detection of implicit hate by augmenting external context or enforcing label separation via distance-based metrics. Combining these two approaches, we introduce FiADD, a novel Focused Inferential Adaptive Density Discrimination framework. FiADD enhances the PLM fine-tuning pipeline by bringing the surface form (meaning) of an implicit hate speech instance closer to its implied form while increasing the inter-cluster distance among the various labels. We test FiADD on three implicit hate datasets and observe significant improvement in the two-way and three-way hate classification tasks. We further test the generalizability of FiADD on three other tasks in which surface and implied forms differ (sarcasm, irony, and stance detection) and observe similar performance improvements. Finally, we analyze the generated latent space to understand its evolution under FiADD; this analysis corroborates the advantage of employing FiADD for implicit hate speech detection.
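The abstract describes two pressures on the embedding space. The first pulls each post's surface-form representation toward the representation of its implied form. The second pushes the clusters of different labels apart. The sketch below illustrates these two ingredients by combining a mean-squared "inferential" distance with a simplified magnet loss, the adaptive density discrimination objective of Rippel et al. (2016), using one centroid per label. It is a toy illustration under these assumptions and is not the paper's FiADD loss. The function names, the weight `lam`, the margin `alpha`, and the one-cluster-per-label simplification are all hypothetical, and the sketch applies no focal weighting.

```python
import math
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def inferential_term(surface: List[Vector], implied: List[Vector]) -> float:
    """Hypothetical term: mean squared distance from each surface embedding
    to the embedding of its implied form (zero when they coincide)."""
    return sum(sq_dist(s, i) for s, i in zip(surface, implied)) / len(surface)


def magnet_term(emb: List[Vector], labels: List[Hashable], alpha: float = 1.0) -> float:
    """Simplified magnet loss (Rippel et al., 2016) with ONE cluster per label.
    This is an assumption; the original fits several k-means clusters per class."""
    groups: Dict[Hashable, List[Vector]] = {}
    for e, y in zip(emb, labels):
        groups.setdefault(y, []).append(e)
    if len(groups) < 2:
        raise ValueError("need at least two labels to separate clusters")
    mu = {y: centroid(pts) for y, pts in groups.items()}
    n = len(emb)
    # Shared variance: summed squared distance to own centroid, divided by n - 1.
    var = max(sum(sq_dist(e, mu[y]) for e, y in zip(emb, labels)) / max(n - 1, 1), 1e-8)
    total = 0.0
    for e, y in zip(emb, labels):
        # Log of the numerator: closeness to own centroid, penalized by margin alpha.
        log_num = -sq_dist(e, mu[y]) / (2 * var) - alpha
        # Log of the denominator: closeness to every other label's centroid.
        others = [-sq_dist(e, c) / (2 * var) for k, c in mu.items() if k != y]
        m = max(others)  # log-sum-exp shift for numerical stability
        log_den = m + math.log(sum(math.exp(o - m) for o in others))
        total += max(0.0, log_den - log_num)  # hinge {-log(num / den)}_+
    return total / n


def toy_objective(surface, implied, labels, lam=1.0, alpha=1.0):
    """Hypothetical combination: cluster separation plus lam * inferential pull."""
    return magnet_term(surface, labels, alpha) + lam * inferential_term(surface, implied)


# Usage: well-separated label clusters incur no separation penalty,
# while clusters sharing one centroid are penalized.
separated = [[0, 0], [0, 1], [10, 0], [10, 1]]
mixed = [[0, 0], [1, 0], [0, 1], [1, 1]]
print(magnet_term(separated, [0, 0, 1, 1]))  # 0.0
print(magnet_term(mixed, [0, 1, 1, 0]))      # about 1.0
```

Using a single centroid per label keeps the sketch short. The original magnet loss instead maintains several k-means clusters per class and samples neighbouring clusters during training, and that is where its "adaptive density" behaviour comes from.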