Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
80 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection (2310.18964v3)

Published 29 Oct 2023 in cs.CL

Abstract: In the evolving landscape of online communication, hate speech detection remains a formidable challenge, further compounded by the diversity of digital platforms. This study investigates the effectiveness and adaptability of pre-trained and fine-tuned LLMs in identifying hate speech, to address two central questions: (1) To what extent does the model performance depend on the fine-tuning and training parameters?, (2) To what extent do models generalize to cross-domain hate speech detection? and (3) What are the specific features of the datasets or models that influence the generalization potential? The experiment shows that LLMs offer a huge advantage over the state-of-the-art even without pretraining. Ordinary least squares analyses suggest that the advantage of training with fine-grained hate speech labels is washed away with the increase in dataset size. We conclude with a vision for the future of hate speech detection, emphasizing cross-domain generalizability and appropriate benchmarking practices.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Ahmad Nasir (1 paper)
  2. Aadish Sharma (1 paper)
  3. Kokil Jaidka (24 papers)
Citations (2)