Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash 94 tok/s
Gemini 2.5 Pro 46 tok/s Pro
GPT-5 Medium 28 tok/s
GPT-5 High 30 tok/s Pro
GPT-4o 91 tok/s
GPT OSS 120B 454 tok/s Pro
Kimi K2 212 tok/s Pro
2000 character limit reached

Reducing the Scope of Language Models (2410.21597v2)

Published 28 Oct 2024 in cs.CL, cs.AI, and cs.LG

Abstract: We now deploy LLMs in a wide variety of user-facing applications. Typically, these deployments have some specific purpose, like answering questions about documentation or acting as coding assistants, but they require general language understanding. Under these circumstances these models should not be able to answer irrelevant requests such as, poetry generation or questions about physics, etc. Instead we would like LLMs to only answer to queries corresponding to desired behavior and refuse all other requests, which we refer to as scoping. We conduct a comprehensive empirical evaluation of potential methods from prompting to fine-tuning to preference learning to a recently proposed method for general alignment called Circuit Breakers (CB). Across three families of LLMs and a broad variety of tasks, we show that it is possible to scope LLMs. We examine scoping for multiple topics, and fine-grained topics. We ablate diversity of irrelevant queries, layer different techniques, conduct adversarial evaluations and more. Among other results, we find that, when diverse examples of irrelevant queries are available, simple supervised fine-tuning produces the best results, but when such diversity is low, Circuit Breakers perform quite well. One can often get the benefits of both methods by layering them in succession. We intend our study to serve as a practitioner's guide to scoping LLMs.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.