
GrounDial: Human-norm Grounded Safe Dialog Response Generation (2402.08968v1)

Published 14 Feb 2024 in cs.AI

Abstract: Current conversational AI systems based on LLMs are known to generate unsafe responses, agreeing with offensive user input or including toxic content. Previous research aimed to alleviate toxicity by fine-tuning LLMs on manually annotated safe dialogue histories. However, the dependency on additional tuning incurs substantial costs. To remove this dependency, we propose GrounDial, where response safety is achieved by grounding responses in commonsense social rules without requiring fine-tuning. GrounDial's hybrid approach of in-context learning and human-norm-guided decoding makes responses quantitatively and qualitatively safer, even without additional data or tuning.

Authors (6)
  1. Siwon Kim
  2. Shuyang Dai
  3. Mohammad Kachuee
  4. Shayan Ray
  5. Tara Taghavi
  6. Sungroh Yoon