Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection
Abstract: Large language models (LLMs) excel in many diverse applications beyond language generation, e.g., translation, summarization, and sentiment analysis. One intriguing application is text classification, which becomes particularly pertinent in the realm of identifying hateful or toxic speech -- a domain fraught with challenges and ethical dilemmas. Our study has two objectives: first, to offer a literature review on LLMs as classifiers, emphasizing their role in detecting and classifying hateful or toxic content; second, to explore the efficacy of several LLMs in classifying hate speech, identifying which LLMs excel at this task as well as the underlying attributes and training that contribute to an LLM's proficiency (or lack thereof) in discerning hateful content. By combining a comprehensive literature review with an empirical analysis, our paper strives to shed light on the capabilities and constraints of LLMs in the crucial domain of hate speech detection.
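The classification setup the abstract describes can be sketched as a zero-shot prompting loop: wrap the text in an instruction template, send it to an LLM, and normalize the free-form reply to a label. This is a minimal illustrative sketch, not the paper's protocol; the prompt wording, the label set, and the `query_llm` callable are all assumptions standing in for a real chat-completion call.

```python
# Minimal sketch of zero-shot hate speech classification with an LLM.
# The template, labels, and `query_llm` are illustrative assumptions.

PROMPT_TEMPLATE = (
    "Classify the following text as HATEFUL or NOT_HATEFUL. "
    "Reply with exactly one label.\n\nText: {text}\nLabel:"
)

def build_prompt(text: str) -> str:
    """Fill the classification template with the text to be labeled."""
    return PROMPT_TEMPLATE.format(text=text)

def parse_label(reply: str) -> str:
    """Normalize a free-form model reply to one of the two labels."""
    reply = reply.strip().upper()
    return "HATEFUL" if reply.startswith("HATEFUL") else "NOT_HATEFUL"

def classify(text: str, query_llm) -> str:
    """Run one zero-shot classification round trip through the model."""
    return parse_label(query_llm(build_prompt(text)))

# Stubbed model call for demonstration; a real run would send the
# prompt to an actual LLM endpoint instead of this constant reply.
fake_llm = lambda prompt: "NOT_HATEFUL"
print(classify("I love this community.", fake_llm))  # NOT_HATEFUL
```

In practice the interesting variables are exactly the ones the paper compares: which model answers the prompt, how it was instruction-tuned, and how robustly its replies map onto the label set.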