Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers (2405.10276v2)
Abstract: Numerous recent works aim to enhance the efficacy of LLMs through strategic prompting. In particular, the Optimization by PROmpting (OPRO) approach achieves state-of-the-art performance by leveraging LLMs as optimizers, where the optimization task is to find instructions that maximize task accuracy. In this paper, we revisit OPRO for automated prompting with relatively small-scale LLMs, such as the LLaMa-2 family and Mistral 7B. Our investigation reveals that OPRO shows limited effectiveness with small-scale LLMs, whose limited inference capabilities constrain their optimization ability. We suggest that future automatic prompt engineering consider both model capabilities and computational costs. Additionally, for small-scale LLMs, we recommend direct instructions that clearly outline objectives and methodologies as robust prompt baselines, ensuring efficient and effective prompt engineering in ongoing research.
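The OPRO procedure described above can be sketched as a simple meta-prompting loop: the optimizer LLM is shown previously tried instructions with their task accuracies and asked to propose a better one. The sketch below is illustrative only; `optimizer_llm` and `score` are hypothetical stand-ins for the real LLM call and task-accuracy evaluation, not the paper's implementation.

```python
# Minimal sketch of an OPRO-style optimization loop. `optimizer_llm` and
# `score` are hypothetical placeholders for an LLM call and a task-accuracy
# evaluator; the real method evaluates instructions on a benchmark task.

def build_meta_prompt(history, top_k=3):
    """Show the top-k (instruction, score) pairs and ask for a better one."""
    top = sorted(history, key=lambda p: p[1], reverse=True)[:top_k]
    lines = [f"Instruction: {ins}\nScore: {sc:.2f}" for ins, sc in top]
    return ("Below are previous instructions with their accuracies.\n"
            + "\n".join(lines)
            + "\nWrite a new instruction that achieves a higher accuracy.")

def opro(optimizer_llm, score, seed_instruction, steps=5):
    """Iteratively ask the optimizer LLM for new instructions, keep the best."""
    history = [(seed_instruction, score(seed_instruction))]
    for _ in range(steps):
        candidate = optimizer_llm(build_meta_prompt(history))
        history.append((candidate, score(candidate)))
    return max(history, key=lambda p: p[1])

# Toy stand-ins so the sketch runs end to end (not real models): the scorer
# rewards longer instructions; the "optimizer" refines the current best.
def toy_score(instruction):
    return min(len(instruction) / 100.0, 1.0)

def toy_optimizer(meta_prompt):
    best = meta_prompt.split("Instruction: ")[1].split("\n")[0]
    return best + " Think step by step."

if __name__ == "__main__":
    best_ins, best_sc = opro(toy_optimizer, toy_score, "Solve the problem.")
    print(best_sc > toy_score("Solve the problem."))  # the loop improved the score
```

With a capable optimizer LLM this loop climbs toward better instructions; the paper's finding is that small-scale LLMs often fail to exploit the score history in the meta-prompt, so the loop stalls.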