Efficient Prompt Optimization Through the Lens of Best Arm Identification (2402.09723v3)
Abstract: The remarkable instruction-following capability of LLMs has sparked growing interest in automatically finding good prompts, i.e., prompt optimization. Most existing works follow the scheme of selecting from a pre-generated pool of candidate prompts. However, these designs mainly focus on the generation strategy, while limited attention has been paid to the selection method. In particular, the cost incurred during selection (e.g., accessing the LLM and evaluating its responses) is rarely considered explicitly. To overcome this limitation, this work provides a principled framework, TRIPLE, to efficiently perform prompt selection under an explicit budget constraint. TRIPLE is built on a novel connection established between prompt optimization and fixed-budget best arm identification (BAI-FB) in multi-armed bandits (MAB); it can thus systematically leverage the rich toolbox of BAI-FB while also incorporating the unique characteristics of prompt optimization. Extensive experiments on multiple well-adopted tasks using various LLMs demonstrate that TRIPLE achieves remarkable performance improvements over baselines while satisfying the limited budget constraints. As an extension, variants of TRIPLE are proposed to efficiently select examples for few-shot prompts, also achieving superior empirical performance.
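The abstract does not spell out which BAI-FB routine TRIPLE instantiates, but the framing — spend a fixed evaluation budget to identify the best prompt in a candidate pool — can be illustrated with Sequential Halving, a classic fixed-budget best-arm-identification algorithm. The sketch below is illustrative, not TRIPLE itself; the `evaluate` callable (e.g., an LLM call scored against a validation example, returning a reward in [0, 1]) and the function names are assumptions.

```python
import math

def sequential_halving(prompts, evaluate, budget):
    """Pick the best prompt under a fixed evaluation budget (Sequential Halving).

    prompts:  list of candidate prompts (the "arms").
    evaluate: callable(prompt) -> reward in [0, 1]; assumed to wrap a scored
              LLM call (hypothetical interface, not from the paper).
    budget:   total number of evaluations allowed across all rounds.
    """
    arms = list(prompts)
    rounds = max(1, math.ceil(math.log2(len(arms))))
    for _ in range(rounds):
        if len(arms) == 1:
            break
        # Split the remaining budget evenly across rounds and surviving arms.
        pulls = max(1, budget // (rounds * len(arms)))
        scored = []
        for arm in arms:
            mean = sum(evaluate(arm) for _ in range(pulls)) / pulls
            scored.append((mean, arm))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Eliminate the worse half; survivors get more pulls next round.
        arms = [arm for _, arm in scored[: max(1, len(arms) // 2)]]
    return arms[0]
```

The design choice this illustrates is the one the abstract emphasizes: rather than spreading the budget uniformly over all candidates, a BAI-FB method concentrates evaluations on promising prompts, so weak candidates consume only a small share of the LLM-access cost.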