Discrete Prompt Compression with Reinforcement Learning (2308.08758v3)
Abstract: Compressed prompts aid instruction-tuned language models (LMs) in overcoming context window limitations and reducing computational costs. Existing methods, which are primarily based on training embeddings, face various challenges associated with interpretability, the fixed number of embedding tokens, reusability across different LMs, and inapplicability when interacting with black-box APIs. This study proposes prompt compression with reinforcement learning (PCRL), a discrete prompt compression method that addresses these issues. PCRL uses a computationally efficient policy network that edits prompts directly. Its training approach can be applied flexibly to various types of LMs, including both decoder-only and encoder-decoder architectures, and it requires neither gradient access to the LMs nor labeled data. PCRL achieves an average reduction of 24.6% in token count across various instruction prompts while maintaining sufficient performance. In addition, we demonstrate that the learned policy can be transferred to larger LMs, and through a comprehensive analysis, we explore token importance within the prompts. Our code is accessible at https://github.com/nenomigami/PromptCompressor.
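To make the idea of discrete compression and the token-count metric concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an edit can be represented as a per-token keep/drop mask, uses whitespace splitting in place of a real tokenizer, and uses a hand-written mask in place of a learned policy. The function names `apply_keep_mask` and `token_reduction` are hypothetical.

```python
from typing import List, Sequence


def apply_keep_mask(tokens: Sequence[str], keep: Sequence[int]) -> List[str]:
    """Return the tokens whose mask entry is 1 (keep); drop the rest."""
    if len(tokens) != len(keep):
        raise ValueError("tokens and keep mask must have the same length")
    return [tok for tok, k in zip(tokens, keep) if k == 1]


def token_reduction(original: Sequence[str], compressed: Sequence[str]) -> float:
    """Fraction of tokens removed, e.g. 0.25 means 25% fewer tokens."""
    if not original:
        return 0.0
    return 1.0 - len(compressed) / len(original)


# Hypothetical example: whitespace "tokens" and a hand-written mask.
# In a learned setting, the mask would come from a policy network instead.
prompt = "Please kindly write a short summary of the following text".split()
mask = [0, 0, 1, 1, 1, 1, 1, 0, 1, 1]  # assumption: 1 = keep, 0 = drop

compressed = apply_keep_mask(prompt, mask)
print(" ".join(compressed))
print(f"reduction: {token_reduction(prompt, compressed):.1%}")
```

Because the output is still plain text, a compressed prompt of this kind can be read by a person and sent to any LM, including one behind a black-box API, which is the property the abstract contrasts with embedding-based compression.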