On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark (2110.08466v2)
Abstract: Dialogue safety problems severely limit the real-world deployment of neural conversational models and have attracted considerable research interest recently. However, dialogue safety problems remain under-defined, and corresponding datasets are scarce. We propose a taxonomy for dialogue safety specifically designed to capture unsafe behaviors in human-bot dialogue settings, with a focus on context-sensitive unsafety, which is under-explored in prior work. To spur research in this direction, we compile DiaSafety, a dataset rich in context-sensitive unsafe examples. Experiments show that existing safety-guarding tools fail severely on our dataset. As a remedy, we train a dialogue safety classifier to provide a strong baseline for context-sensitive dialogue unsafety detection. With our classifier, we perform safety evaluations on popular conversational models and show that existing dialogue systems still exhibit concerning context-sensitive safety problems.
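The key idea behind context-sensitive unsafety detection is that a classifier must judge the (context, response) pair jointly, since a response that is harmless in isolation can be unsafe given the preceding turn. The paper fine-tunes a neural classifier for this; the sketch below uses a simple bag-of-words pipeline purely to illustrate the pairing, with invented example data (the `[SEP]` joining convention and all dialogue lines are assumptions, not from the paper).

```python
# Minimal sketch of context-sensitive dialogue safety classification.
# The design point: the input is the context + response PAIR, not the
# response alone. Training examples below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

SEP = " [SEP] "  # joins dialogue context and bot response into one input

train_pairs = [
    # Note: the same response is unsafe or not depending on the context.
    ("I failed my exam today.", "You should just give up then.", "unsafe"),
    ("I failed my exam today.", "That's tough, you can retake it.", "safe"),
    ("Should I quit smoking?", "You should just give up then.", "safe"),
    ("Tell me a fun fact.", "Octopuses have three hearts.", "safe"),
]

X = [ctx + SEP + resp for ctx, resp, _ in train_pairs]
y = [label for _, _, label in train_pairs]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(X, y)

pred = clf.predict(["I failed my exam today." + SEP
                    + "You should just give up then."])[0]
```

A real system would replace the TF-IDF pipeline with a pretrained transformer fine-tuned on the DiaSafety pairs, but the input formatting choice carries over.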
- Hao Sun (383 papers)
- Guangxuan Xu (13 papers)
- Jiawen Deng (19 papers)
- Jiale Cheng (18 papers)
- Chujie Zheng (35 papers)
- Hao Zhou (351 papers)
- Nanyun Peng (205 papers)
- Xiaoyan Zhu (54 papers)
- Minlie Huang (225 papers)