Prompting Is Programming: A Query Language for Large Language Models (2212.06094v3)

Published 12 Dec 2022 in cs.CL and cs.AI

Abstract: LLMs have demonstrated outstanding performance on a wide range of tasks such as question answering and code generation. On a high level, given an input, an LLM can be used to automatically complete the sequence in a statistically-likely way. Based on this, users prompt these models with language instructions or examples, to implement a variety of downstream tasks. Advanced prompting methods can even imply interaction between the LLM, a user, and external tools such as calculators. However, to obtain state-of-the-art performance or adapt LLMs for specific tasks, complex task- and model-specific programs have to be implemented, which may still require ad-hoc interaction. Based on this, we present the novel idea of Language Model Programming (LMP). LMP generalizes LLM prompting from pure text prompts to an intuitive combination of text prompting and scripting. Additionally, LMP allows constraints to be specified over the LLM output. This enables easy adaptation to many tasks while abstracting LLM internals and providing high-level semantics. To enable LMP, we implement LMQL (short for Language Model Query Language), which leverages the constraints and control flow from an LMP prompt to generate an efficient inference procedure that minimizes the number of expensive calls to the underlying LLM. We show that LMQL can capture a wide range of state-of-the-art prompting methods in an intuitive way, especially facilitating interactive flows that are challenging to implement with existing high-level APIs. Our evaluation shows that we retain or increase the accuracy on several downstream tasks, while also significantly reducing the required amount of computation or cost in the case of pay-to-use APIs (26-85% cost savings).

Authors (3)
  1. Luca Beurer-Kellner (8 papers)
  2. Marc Fischer (30 papers)
  3. Martin Vechev (103 papers)
Citations (76)

Summary

  • The paper introduces LMQL as a novel query language that formalizes prompt engineering into programming for streamlined LLM interactions.
  • It presents a SQL-like structure and constrained decoding mechanism that improves efficiency, reducing model calls and yielding 26-85% cost savings on pay-to-use APIs.
  • Evaluations in tasks like question answering and arithmetic reasoning demonstrate LMQL's effectiveness in optimizing language model outputs.

Understanding LLM Programming through LMQL

The paper "Prompting Is Programming: A Query Language for LLMs" introduces the LLM Query Language (LMQL) as a novel interface for interacting with LLMs. The authors propose a paradigm shift from traditional text prompting to a more formalized structure combining text prompts with scripting capabilities, termed LLM Programming (LMP).

The core motivation behind LMQL is to address several key challenges associated with LLMs: the complexity of task- and model-specific prompting programs, inefficient inference due to repeated LM calls, and the lack of high-level mechanisms for the interactive, multi-step workflows that advanced prompting methods require. By abstracting away the intricate details of LLM internals, LMQL provides a high-level query language that streamlines the process of writing and optimizing complex language-based queries.

LLM Programming and LMQL

LMQL combines a declarative, SQL-like structure with imperative scripting capabilities, letting users apply built-in functions and conditional logic when interacting with LLMs. The paper highlights how models can be queried more effectively by scripting interactions and setting constraints on the expected output. This is particularly beneficial for tasks that require context-specific interpretation, such as natural language prompts that demand a programmatic response, or workflows that complement LLMs with external tools for additional computational logic.
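
To make the structure concrete, here is a minimal query written in the style of the paper's published syntax; the model identifier and the exact constraint are illustrative choices, not the paper's only options:

```
# decoder clause: selects the decoding strategy (greedy search here)
argmax
    # prompt template; [PUNCHLINE] is a hole the model fills in
    "A list of good dad jokes. A indicates the punchline.\n"
    "Q: How does a penguin build its house?\n"
    "A: [PUNCHLINE]"
# model clause: which LLM backs the query
from
    "openai/text-davinci-003"
# constraint clause: enforced during decoding, not checked afterwards
where
    len(PUNCHLINE) < 120 and STOPS_AT(PUNCHLINE, "\n")
```

The `where` clause is the key departure from plain prompting: constraints are enforced while the sequence is generated, so invalid continuations are never produced in the first place.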

The procedural execution model of LMQL enables a separation of concerns: developers focus on their interaction logic without diving into the underlying mechanics of an LLM's operation. This is achieved through iterative execution of the query program's body, with special provisions for handling string manipulation and constraint evaluation during decoding.
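
Because the query body is executed top to bottom like a script, ordinary control flow can interleave with generation. The sketch below adapts the paper's packing-list example; the allowed-value list and decoder temperature are illustrative assumptions:

```
sample(temperature=0.8)
    "A list of things not to forget when going to the sea:\n"
    # each loop iteration re-enters the model, appending the
    # completed line to the running prompt before the next call
    for i in range(4):
        "- [THING]\n"
from
    "openai/text-ada-001"
where
    THING in ["volleyball", "sunscreen", "a bathing suit"]
```

The `in` constraint restricts each generated THING to a fixed set of values, while the Python-style loop determines how many items are produced.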

Constrained Decoding and its Implications

A significant contribution of the paper is the efficient constrained decoding mechanism in LMQL. The mechanism relies on custom-defined operator semantics to apply constraints at the token level and validate output in real time through so-called FollowMaps, which conservatively predict whether a constraint can still be satisfied by a candidate continuation. LMQL's eager execution semantics turn constraints into token-masking strategies during sequence decoding, pruning the search space and avoiding the considerable computational overhead of enumerating all permissible continuations of a prompt.
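
From the user's perspective, all of this machinery surfaces as a declarative `where` clause; the runtime compiles it into per-step token masks. A minimal sketch, assuming the INT constraint the paper uses for arithmetic tasks:

```
argmax
    "Q: I have 3 umbrellas and buy 2 more. How many do I have?\n"
    "A: [ANSWER]"
from
    "openai/text-davinci-003"
where
    # INT(ANSWER) is translated, via the follow semantics, into a
    # vocabulary mask that only admits digit tokens at each step
    INT(ANSWER)
```

The same constraint expressed as post-hoc filtering would require repeatedly resampling invalid outputs; masking during decoding rules them out up front.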

This approach, although intricate under the hood, is a powerful way to restrict LLM output, enhancing both accuracy and efficiency. The authors report improved accuracy over standard, unconstrained decoding in application contexts such as question answering, arithmetic reasoning, and interactive multi-part prompting, which collectively embody the scope of LLM programming.

Evaluation and Performance

The paper showcases a variety of use cases where LMQL is preferable to standard LLM APIs, notably in scenarios using ReAct and Chain-of-Thought prompting. Across several evaluations, the authors demonstrate substantial savings in the number of model queries and processed tokens, translating into cost savings of 26-85% compared with conventional LLM interaction methods.
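
As an illustration of the interactive flows the evaluation covers, a ReAct-style loop can be scripted directly in the query body. The sketch below is schematic rather than the paper's exact program: `wikipedia_lookup` is a hypothetical helper, and `question` and `max_steps` are assumed to be supplied as query parameters:

```
sample(temperature=0.7)
    "Q: {question}\n"
    for step in range(max_steps):
        "Thought: [THOUGHT]\n"
        "Action: [ACTION]\n"
        if ACTION.startswith("Lookup"):
            # hypothetical external tool call; the result is
            # spliced back into the prompt as an observation
            result = wikipedia_lookup(ACTION)
            "Observation: {result}\n"
        else:
            break
    "Answer: [ANSWER]"
from
    "openai/text-davinci-003"
where
    STOPS_AT(THOUGHT, "\n") and STOPS_AT(ACTION, "\n")
```

Because intermediate model calls, tool results, and stopping conditions all live in one program, the multi-call choreography that ReAct normally requires in client code collapses into a single query, which is exactly where the reported token and cost savings come from.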

Conclusion and Future Directions

LMQL successfully extends prompt engineering with a programming-oriented structure that simplifies interactions with LLMs and optimizes their usage across a breadth of applications. With practical successes demonstrated in tasks requiring interactive prompting and task-oriented reasoning, LMQL paves the way toward a unified querying protocol for LMs, potentially serving as a standardized interface across LM API vendors.

In future work, this approach could see enhancements such as further integration with diverse LLMs, streamlined extensions for additional prompting operators, and extensive performance analysis across increasingly complex LLM setups. Enabling sandboxing or serverless execution environments could also further its adoption by ensuring secure and efficient deployments in real-world applications.
