
Outcome-Constrained Large Language Models for Countering Hate Speech (2403.17146v2)

Published 25 Mar 2024 in cs.CL

Abstract: Automatic counterspeech generation methods have been developed to assist efforts in combating hate speech. Existing research focuses on generating counterspeech with linguistic attributes such as being polite, informative, and intent-driven. However, the real impact of counterspeech in online environments is seldom considered. This study aims to develop methods for generating counterspeech constrained by conversation outcomes and evaluate their effectiveness. We experiment with LLMs to incorporate into the text generation process two desired conversation outcomes: low conversation incivility and non-hateful hater reentry. Specifically, we experiment with instruction prompts, LLM finetuning, and LLM reinforcement learning (RL). Evaluation results show that our methods effectively steer the generation of counterspeech toward the desired outcomes. Our analyses, however, show that there are differences in the quality and style depending on the model.
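The abstract names two outcome constraints (low conversation incivility and non-hateful hater reentry) and three ways to impose them: instruction prompts, finetuning, and RL. A minimal sketch of the simplest of these, instruction prompting combined with best-of-n reranking by an outcome scorer, might look as follows. All function names, prompt wording, and the scorer interface are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of outcome-constrained counterspeech generation via
# instruction prompting plus reranking. Prompt wording and the scorer
# interface are illustrative; they do not reproduce the paper's setup.

def build_prompt(hate_post: str, outcome: str) -> str:
    """Embed the desired conversation outcome in the generation instruction."""
    outcome_text = {
        "low_incivility": "keeps the follow-up conversation civil",
        "nonhateful_reentry": "encourages the author to rejoin without hate",
    }[outcome]
    return (
        "Write a counterspeech reply to the post below that "
        f"{outcome_text}.\n\nPost: {hate_post}\nReply:"
    )

def rerank(candidates, outcome_score):
    """Return the candidate a (trained) outcome classifier scores highest."""
    return max(candidates, key=outcome_score)
```

In the finetuning and RL variants described in the abstract, the same outcome signal would instead shape the training objective (e.g. as labels or as a reward) rather than the prompt, but the constraint being optimized is the same.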

Authors (4)
  1. Lingzi Hong (5 papers)
  2. Pengcheng Luo (4 papers)
  3. Eduardo Blanco (26 papers)
  4. Xiaoying Song (3 papers)
Citations (3)