
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models (2206.08325v2)

Published 16 Jun 2022 in cs.CL, cs.AI, and cs.CY

Abstract: LLMs produce human-like text that drives a growing number of applications. However, recent literature and, increasingly, real-world observations have demonstrated that these models can generate language that is toxic, biased, untruthful, or otherwise harmful. Though work to evaluate LLM harms is underway, translating foresight about which harms may arise into rigorous benchmarks is not straightforward. To facilitate this translation, we outline six ways of characterizing harmful text which merit explicit consideration when designing new benchmarks. We then use these characteristics as a lens to identify trends and gaps in existing benchmarks. Finally, we apply them in a case study of the Perspective API, a toxicity classifier that is widely used in harm benchmarks. Our characteristics provide one piece of the bridge that translates between foresight and effective evaluation.

Authors (12)
  1. Maribeth Rauh (10 papers)
  2. John Mellor (9 papers)
  3. Jonathan Uesato (29 papers)
  4. Po-Sen Huang (30 papers)
  5. Johannes Welbl (20 papers)
  6. Laura Weidinger (18 papers)
  7. Sumanth Dathathri (14 papers)
  8. Amelia Glaese (14 papers)
  9. Geoffrey Irving (31 papers)
  10. Iason Gabriel (27 papers)
  11. William Isaac (18 papers)
  12. Lisa Anne Hendricks (37 papers)
Citations (44)