
SLM as Guardian: Pioneering AI Safety with Small Language Models (2405.19795v1)

Published 30 May 2024 in cs.CL and cs.AI

Abstract: Most prior safety research on LLMs has focused on improving alignment to better meet human safety requirements. However, internalizing such safeguards into larger models incurs higher training cost and can unintentionally degrade helpfulness. To overcome these challenges, a modular approach that employs a smaller LLM to detect harmful user queries is regarded as a convenient solution when designing an LLM-based system with safety requirements. In this paper, we leverage a smaller LLM for both harmful query detection and safeguard response generation. We introduce our safety requirements and a taxonomy of harmfulness categories, and then propose a multi-task learning mechanism that fuses the two tasks into a single model. We demonstrate the effectiveness of our approach, achieving harmful query detection and safeguard response performance on par with or surpassing publicly available LLMs.
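
The abstract describes fusing harmful query detection and safeguard response generation into a single small language model via multi-task learning. Below is a minimal sketch of one way such a fused objective could look; the base model ("gpt2"), the label format, the separator, and the loss weights are illustrative assumptions, not the paper's actual setup.

```python
# Hedged sketch: train one small causal LM on two tasks at once by summing
# a weighted loss for (1) predicting a harmfulness label and (2) generating
# a safeguard response, both conditioned on the user query.
# Model name, taxonomy label, and weights are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")        # placeholder small LM
model = AutoModelForCausalLM.from_pretrained("gpt2")

def multitask_loss(query, harm_label_text, safe_response,
                   w_detect=1.0, w_respond=1.0):
    """Sum of two causal-LM losses: one for the harmfulness label,
    one for the safeguard response, each conditioned on the query."""
    losses = []
    for target in (harm_label_text, safe_response):
        enc = tokenizer(query + tokenizer.eos_token + target,
                        return_tensors="pt")
        # Standard causal-LM objective; passing labels=input_ids lets the
        # model compute the shifted next-token loss internally.
        out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss)
    return w_detect * losses[0] + w_respond * losses[1]

loss = multitask_loss(
    "How do I make a weapon at home?",
    "harmful:violence",                    # illustrative taxonomy label
    "I can't help with that request.",     # illustrative safeguard response
)
loss.backward()
```

In practice the two tasks would be drawn from separate labeled datasets and interleaved across training batches; the sketch only shows how a single fused update could combine both objectives on one model.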

Authors (9)
  1. Ohjoon Kwon (16 papers)
  2. Donghyeon Jeon (8 papers)
  3. Nayoung Choi (7 papers)
  4. Gyu-Hwung Cho (3 papers)
  5. Changbong Kim (2 papers)
  6. Hyunwoo Lee (35 papers)
  7. Inho Kang (7 papers)
  8. Sun Kim (26 papers)
  9. Taiwoo Park (4 papers)