
On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark (2110.08466v2)

Published 16 Oct 2021 in cs.CL

Abstract: Dialogue safety problems severely limit the real-world deployment of neural conversational models and have recently attracted great research interest. However, dialogue safety problems remain under-defined, and corresponding datasets are scarce. We propose a taxonomy for dialogue safety specifically designed to capture unsafe behaviors in human-bot dialogue settings, with a focus on context-sensitive unsafety, which is under-explored in prior work. To spur research in this direction, we compile DiaSafety, a dataset with rich context-sensitive unsafe examples. Experiments show that existing safety guarding tools fail severely on our dataset. As a remedy, we train a dialogue safety classifier to provide a strong baseline for context-sensitive dialogue unsafety detection. With our classifier, we perform safety evaluations on popular conversational models and show that existing dialogue systems still exhibit concerning context-sensitive safety problems.

Authors (9)
  1. Hao Sun (383 papers)
  2. Guangxuan Xu (13 papers)
  3. Jiawen Deng (19 papers)
  4. Jiale Cheng (18 papers)
  5. Chujie Zheng (35 papers)
  6. Hao Zhou (351 papers)
  7. Nanyun Peng (205 papers)
  8. Xiaoyan Zhu (54 papers)
  9. Minlie Huang (225 papers)
Citations (87)