Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models (2404.01295v1)

Published 1 Apr 2024 in cs.CL and cs.AI

Abstract: As LLMs become easily accessible, the trade-off between safety and helpfulness can significantly impact user experience. A model that prioritizes safety will leave users feeling less engaged and assisted, while one that prioritizes helpfulness can potentially cause harm. Possible harms include teaching people how to build a bomb, exposing youth to inappropriate content, and hurting users' mental health. In this work, we propose to balance safety and helpfulness in diverse use cases by controlling both attributes in LLMs. We explore training-free and fine-tuning methods that do not require extra human annotations and analyze the challenges of controlling safety and helpfulness in LLMs. Our experiments demonstrate that our method can rewind a learned model and unlock its controllability.
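
To make the idea of generation-time attribute control concrete, below is a minimal sketch of one generic training-free approach: prepending scalar control tokens for safety and helpfulness to the prompt of an off-the-shelf chat model. The model id, the `[safety=...] [helpfulness=...]` token format, and the `generate_with_controls` helper are all illustrative assumptions, not the paper's actual method or interface.

```python
# Minimal sketch of training-free attribute control via control-token prompting.
# All names here (model id, control-token format) are illustrative assumptions;
# the paper's own method may condition the model very differently.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; any causal chat LLM works

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def generate_with_controls(prompt: str, safety: float, helpfulness: float) -> str:
    """Prepend desired attribute levels as a control prefix, then decode.

    `safety` and `helpfulness` in [0, 1] are hypothetical scalar knobs;
    how numeric levels map to behavior depends entirely on how the
    underlying model was trained or prompted.
    """
    control = f"[safety={safety:.1f}] [helpfulness={helpfulness:.1f}]\n"
    inputs = tokenizer(control + prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Strip the prompt tokens and return only the newly generated text.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example: lean toward safety for a sensitive query.
print(generate_with_controls("How do fireworks work?", safety=0.9, helpfulness=0.6))
```

A fine-tuning variant of the same idea would train on responses labeled with such control prefixes so the model learns to respect them; the sketch above only shows the inference-time interface.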

Authors (8)
  1. Yi-Lin Tuan (18 papers)
  2. Xilun Chen (31 papers)
  3. Eric Michael Smith (20 papers)
  4. Louis Martin (21 papers)
  5. Soumya Batra (4 papers)
  6. Asli Celikyilmaz (81 papers)
  7. William Yang Wang (254 papers)
  8. Daniel M. Bikel (7 papers)
Citations (5)
