
Learning to Compress Prompt in Natural Language Formats (2402.18700v2)

Published 28 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs excel at a wide range of natural language processing tasks, but their abilities are constrained by poor performance on long contexts, slow inference speed, and high compute costs. Deploying LLMs with precise and informative context helps users process large-scale datasets more effectively and cost-efficiently. Existing works rely on compressing long prompt contexts into soft prompts. However, soft prompt compression encounters limitations in transferability across different LLMs, especially API-based LLMs. To this end, this work aims to compress lengthy prompts in the form of natural language with LLM transferability. This poses two challenges: (i) Natural Language (NL) prompts are incompatible with back-propagation, and (ii) NL prompts lack flexibility in imposing length constraints. In this work, we propose a Natural Language Prompt Encapsulation (Nano-Capsulator) framework that compresses original prompts into NL-formatted Capsule Prompts while maintaining prompt utility and transferability. Specifically, to tackle the first challenge, the Nano-Capsulator is optimized by a reward function that interacts with the proposed semantics-preserving loss; to address the second challenge, the reward function additionally features length constraints. Experimental results demonstrate that Capsule Prompts can reduce the original length by 81.4%, decrease inference latency by up to 4.5x, and save 80.1% of budget overheads while providing transferability across diverse LLMs and different datasets.

Learning to Compress Prompt in Natural Language Formats

The paper "Learning to Compress Prompt in Natural Language Formats" explores the challenges and solutions associated with reducing the length of prompts used in LLMs while maintaining their effectiveness. The authors introduce a framework, Nano-Capsulator, that compresses long prompts into shorter, natural language (NL) formatted prompts, termed Capsules. This approach addresses two primary limitations of existing soft prompt compression methods: poor transferability across different LLMs (particularly API-based ones) and a lack of flexibility in imposing length constraints.
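
A minimal sketch of the transferability argument, assuming nothing beyond the abstract: a Capsule is ordinary text, so the same compressed prompt can be sent to any text-in/text-out LLM, whereas a soft prompt is an embedding tensor tied to one model. The type names and the stand-in endpoint below are illustrative, not from the paper.

```python
# Illustrative sketch (not the authors' code) of why NL-formatted Capsules
# transfer across LLMs while soft prompts do not.
from dataclasses import dataclass
from typing import List

@dataclass
class SoftPrompt:
    # Soft prompts are continuous embedding vectors; they are only meaningful
    # to the model they were trained against, and API-based LLMs typically
    # accept no embedding input at all.
    vectors: List[List[float]]

@dataclass
class CapsulePrompt:
    # A Capsule is ordinary natural-language text, so any LLM exposing a
    # text-in/text-out interface can consume it unchanged.
    text: str

def query_llm(prompt: str) -> str:
    """Stand-in for any text-based LLM endpoint, local or API."""
    return f"(model response to: {prompt[:40]}...)"

capsule = CapsulePrompt(text="Condensed context: the passage describes ...")
print(query_llm(capsule.text))   # works: plain text transfers between models
# query_llm(SoftPrompt(...))     # no equivalent call: there is no text to send
```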

Key Contributions

The main contributions of this paper are threefold:

  • Framework Introduction: The authors propose the Nano-Capsulator framework, which involves compressing long prompts into NL Capsules. These Capsules retain a high degree of semantic relevance and offer better transferability across different LLMs and datasets.
  • Optimization Techniques: The compression is achieved by employing a semantic preservation loss and a reward-based optimization to maintain the utility of the compressed prompts.
  • Practical Benefits: Experimental results show that Capsule Prompts reduce the original prompt length by 81.4%, decrease inference latency by up to 4.5 times, and cut budget overheads by 80.1% while preserving performance.

Methodology

The compression is guided by a well-structured optimization process that ensures both semantic fidelity and task utility. The reward-based optimization considers task-specific question-answer pairs to fine-tune the Capsules, maintaining the effectiveness of the prompts under length constraints. The overall loss function integrates a semantic preservation component to ensure the shorter prompts retain the essence of the longer ones.
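
A minimal sketch of how such an objective could be composed, assuming a semantic-similarity score between the original prompt and the Capsule, a downstream QA reward, and a hard length gate. The weighting factor and the gating form are assumptions for illustration; the paper's exact formulation may differ.

```python
def capsule_objective(semantic_sim: float, task_reward: float,
                      capsule_len: int, max_len: int, alpha: float = 1.0) -> float:
    """Toy combination of a semantics-preserving loss with a length-constrained reward.

    semantic_sim: similarity between original prompt and Capsule (1.0 = same meaning)
    task_reward:  downstream utility, e.g. QA accuracy obtained with the Capsule
    capsule_len:  token length of the Capsule; max_len: the length budget
    """
    semantic_loss = 1.0 - semantic_sim               # small when meaning is preserved
    length_ok = 1.0 if capsule_len <= max_len else 0.0
    gated_reward = task_reward * length_ok           # no reward if the budget is violated
    return semantic_loss - alpha * gated_reward      # minimize: preserve meaning, earn reward

# Example: a faithful, within-budget Capsule scores better (lower) than an over-long one.
print(capsule_objective(0.92, 0.80, capsule_len=180, max_len=200))   # approx. -0.72
print(capsule_objective(0.95, 0.85, capsule_len=260, max_len=200))   # approx.  0.05
```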

Experimental Results and Implications

Evaluation Metrics and Datasets

The framework was tested on multiple datasets and LLMs to validate its effectiveness:

  • Few-shot CoT: Using datasets such as CommonsenseQA (CSQA) and GSM8K.
  • Reading Comprehension: Evaluations were conducted on MultiRC and TriviaQA-Long datasets.

The performance metrics include accuracy for individual tasks along with compression rate, latency reduction, and cost savings.
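
For concreteness, a back-of-the-envelope sketch of how these efficiency metrics relate to raw measurements. The token counts, latencies, and costs below are made-up placeholders chosen only to mirror the headline results; they are not taken from the paper's tables.

```python
def efficiency_metrics(orig_prompt_tokens: int, capsule_tokens: int,
                       orig_latency_s: float, capsule_latency_s: float,
                       orig_cost: float, capsule_cost: float) -> dict:
    """Compression rate, latency speedup, and cost saving from raw measurements."""
    return {
        "compression_rate": 1.0 - capsule_tokens / orig_prompt_tokens,
        "latency_speedup": orig_latency_s / capsule_latency_s,
        # Cost saving can differ slightly from the compression rate because API bills
        # also count output tokens, not just the (compressed) prompt.
        "cost_saving": 1.0 - capsule_cost / orig_cost,
    }

# Placeholder figures mirroring the reported 81.4% compression, 4.5x speedup, ~80% saving.
print(efficiency_metrics(3000, 558, 9.0, 2.0, 0.045, 0.009))
```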

Key Findings

  • Effective Compression: Capsules achieve substantial compression rates while retaining similar performance levels across different LLMs such as Vicuna-13B, PaLM, and Claude2.
  • Reduced Cost and Latency: Significant reductions in computational costs and latency were observed. For instance, on the Claude2 API, Capsules saved up to 80.1% of the cost and reduced inference latency by up to 4.5 times.
  • High Transferability: The framework showed strong results even when applied to unseen datasets, suggesting that the approach generalizes well across different domains without requiring retraining.

Discussion

The implications of these findings are significant in both theoretical and practical contexts. Theoretically, this work underscores the potential for improving LLM efficiency without substantial performance trade-offs. Practically, the reduction in computational overhead and cost makes it more feasible to deploy LLMs at scale, especially in industries where cost and speed are critical factors. The ability to maintain performance across various LLMs and datasets also highlights the robustness of the Nano-Capsulator framework.

Looking forward, future research could explore further optimization techniques and adaptations of the Nano-Capsulator framework to extend its applicability to more diverse types of LLM tasks and larger-scale datasets. Integrating this framework with cross-modal architectures such as those involving vision and language tasks could also be a fruitful direction.

Conclusion

The proposed Nano-Capsulator framework offers a promising methodology for prompt compression in LLMs, addressing key challenges in transferability and computational efficiency. The results demonstrate substantial benefits in reducing prompt lengths, cutting costs, and decreasing latency, all while preserving the utility and effectiveness of the prompts. This work marks a step toward more practical and scalable deployment of LLMs and sets a precedent for future work on prompt optimization and compression.

Authors (6)
  1. Yu-Neng Chuang (28 papers)
  2. Tianwei Xing (7 papers)
  3. Chia-Yuan Chang (18 papers)
  4. Zirui Liu (58 papers)
  5. Xun Chen (166 papers)
  6. Xia Hu (186 papers)
Citations (12)