- The paper introduces an open-source toolkit that implements programmable runtime safety rails to ensure on-topic and safe LLM dialogues.
- It leverages a custom language, Colang, to define and dynamically enforce dialogue management rules at runtime.
- The toolkit integrates fact-checking, hallucination-detection, and moderation rails to enhance the reliability and trustworthiness of conversational AI applications.
Introduction to NeMo Guardrails
The development of safe and controllable conversational systems built on LLMs is crucial for their successful integration into user-facing applications. LLMs tend to drift from designated topics, produce factually incorrect information, or succumb to malicious prompts that result in unsafe outputs. To mitigate these risks and improve the trustworthiness of conversational agents, the paper introduces NeMo Guardrails, an open-source toolkit for adding "programmable guardrails" to LLM applications. Unlike traditional approaches, which embed guardrails directly into models during training, NeMo Guardrails lets developers add customizable, interpretable rails at runtime, independent of any specific model.
Techniques for Enhancing Conversational AI
The toolkit is designed to keep dialogue with LLMs safe and on-topic through several mechanisms, including:
- Programmable Runtime Engine: Sits between the user and the LLM as a proxy and uses Colang, a custom modeling language, to define rules and dialogue flows (see the Colang sketch after this list).
- Dialogue Management: Interprets and enforces predefined dialogue rules at runtime, and can dynamically generate new flow continuations when user input does not match an existing rule.
- Embedded Rails and Model Alignment: Bridges programmable rails with existing model alignment techniques, which embed safety features during LLM training.
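To make the runtime engine more concrete, below is a minimal Colang sketch of a topical rail. The canonical form names (`ask politics`, `refuse politics`) and the example utterances are illustrative choices, not identifiers fixed by the toolkit; developers define their own.

```colang
# Canonical form: example utterances that define a user intent.
define user ask politics
  "what do you think about the government?"
  "who should I vote for?"

# A predefined bot response for this intent.
define bot refuse politics
  "I'm a support assistant, so I'd rather not discuss politics."

# A dialogue flow the runtime enforces whenever the intent matches.
define flow politics
  user ask politics
  bot refuse politics
```

At runtime, the engine maps an incoming user message to the closest canonical form and executes the matching flow, rather than passing the raw prompt straight to the LLM.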
Safety Rails and Deployments
NeMo Guardrails includes a variety of safety-focused rails such as:
- Fact-Checking Rail: Leverages entailment checks to validate LLM responses against provided evidence (a flow sketch follows this list).
- Hallucination Rail: Uses self-consistency checking, sampling additional responses and testing their agreement, to flag answers the model may have fabricated.
- Moderation Rails: Screen both user inputs and bot outputs to block malicious prompts and harmful responses.
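As a rough illustration of how a safety rail attaches to a dialogue flow, the Colang sketch below invokes a fact-checking action after the bot drafts an answer. The flow name, the `check_facts` action, and its boolean return value are assumptions made for illustration; actual action names and return types depend on the toolkit version and configuration.

```colang
define flow answer report question
  user ask about report
  bot provide report answer

  # Assumed action: returns whether the draft answer is entailed
  # by the retrieved evidence (fact-checking rail).
  $accurate = execute check_facts
  if not $accurate
    # Retract the unsupported answer and fall back to a safe reply.
    bot remove last message
    bot inform answer unknown
```

Hallucination and moderation rails follow a similar pattern: an `execute` step runs a checking action, and the flow decides whether to deliver the draft response or replace it.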
The toolkit also includes evaluation tools that let developers test and measure the effectiveness of both topical and execution rails. These evaluations underscore the importance of comprehensive testing, particularly for safety-related rails, and reinforce that programmable rails are complementary to embedded, alignment-based rails when building a robust LLM-powered conversational system.
In conclusion, NeMo Guardrails equips developers with the necessary infrastructure to build controllable and safe LLM applications, paving the way for more trustworthy and reliable AI interactions in real-world scenarios.