
Efficient Prompting Methods for Large Language Models: A Survey (2404.01077v2)

Published 1 Apr 2024 in cs.CL

Abstract: Prompting is a mainstream paradigm for adapting LLMs to specific natural language processing tasks without modifying internal parameters. Therefore, detailed supplementary knowledge needs to be integrated into external prompts, which inevitably brings extra human efforts and computational burdens for practical applications. As an effective solution to mitigate resource consumption, Efficient Prompting Methods have attracted a wide range of attention. We provide mathematical expressions at a high level to deeply discuss Automatic Prompt Engineering for different prompt components and Prompt Compression in continuous and discrete spaces. Finally, we highlight promising future directions to inspire researchers interested in this field.

The paper "Efficient Prompting Methods for LLMs: A Survey" presents a comprehensive overview of methods designed to improve the efficiency of prompting LLMs. Prompting has become a mainstream paradigm for adapting LLMs to specific Natural Language Processing (NLP) tasks, enabling in-context learning. However, the use of lengthy and complex prompts increases computational costs and necessitates manual design efforts. Efficient prompting methods aim to address these challenges by reducing computational burden and optimizing prompt design.

The paper categorizes efficient prompting methods into two main approaches:

  • Prompting with efficient computation
  • Prompting with efficient design

The paper reviews advances in efficient prompting and highlights potential future research directions.

The paper begins with an introduction to the background of prompting, highlighting the evolution of Pre-trained LLMs (PLMs) and the shift in NLP paradigms from fully supervised learning to pre-train and fine-tune, and eventually to pre-train, prompt, and predict. The development of Generative Pre-trained Transformer 3 (GPT-3) and ChatGPT has solidified the "LLM + prompting" paradigm, leading to increased interest in effective prompting methods that conserve resources.

Prompt expressions are categorized into hard prompts (discrete natural language prompts) and soft prompts (continuous learnable vectors). Hard prompts suit generative LLMs such as the GPT series, while soft prompts, such as adapters and prefixes, enable parameter-efficient training by placing learnable vectors at different positions in the embedding space; a minimal prefix-style sketch follows below. Hard prompts face two main challenges: lengthy prompt content and the difficulty of prompt design. Lengthy prompts strain the limited context window of LLMs and increase computational costs, while the discrete nature of natural language makes manual prompt design subjective and heavily reliant on empirical knowledge.
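To make the soft-prompt idea concrete, here is a minimal PyTorch sketch of prefix-style soft prompting: a small matrix of learnable vectors is prepended to a frozen model's token embeddings and is the only thing trained. The base model (`gpt2`), prefix length, and training details are illustrative assumptions, not the survey's reference implementation.

```python
# Minimal prefix-style soft prompt sketch (illustrative assumptions: gpt2, prefix_len=10).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.requires_grad_(False)                       # freeze every base-model parameter

prefix_len = 10
emb_dim = model.get_input_embeddings().embedding_dim
soft_prompt = torch.nn.Parameter(torch.randn(prefix_len, emb_dim) * 0.02)  # the only trainable weights

def forward_with_soft_prompt(input_ids, labels=None):
    tok_emb = model.get_input_embeddings()(input_ids)             # (batch, seq, dim)
    prefix = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    inputs_embeds = torch.cat([prefix, tok_emb], dim=1)           # prepend learnable vectors
    if labels is not None:
        pad = torch.full((input_ids.size(0), prefix_len), -100, dtype=labels.dtype)
        labels = torch.cat([pad, labels], dim=1)                  # ignore prefix positions in the loss
    return model(inputs_embeds=inputs_embeds, labels=labels)

batch = tokenizer("Translate to French: cheese", return_tensors="pt")
out = forward_with_soft_prompt(batch["input_ids"], labels=batch["input_ids"])
out.loss.backward()                               # gradients flow only into soft_prompt
```

In a full training loop, only `soft_prompt` would be handed to the optimizer, which is what makes this form of prompting parameter-efficient.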

The paper then discusses "Prompting with Efficient Computation," which aims to alleviate the economic burden of lengthy prompts on both open-source and closed-source LLMs. This involves prompt compression techniques, categorized into text-to-vector level and text-to-text level approaches, to extract essential information from original prompts while maintaining comparable performance.

  • Knowledge Distillation (KD): This classic compression approach trains a lightweight student model to mimic a better-performing teacher model. Applied to prompting, KD methods compress the natural language information of the hard prompt into the LLM through soft prompt tuning. The loss is typically the Kullback-Leibler divergence between the teacher and student distributions (a minimal sketch of this objective appears after this list).

    $\mathcal{L}_{S}(p^{c}_{S}, c) = \mathbb{E}_{x} \left[ D_{\mathrm{KL}}\!\left( p_{T}(y \mid c, x) \,\|\, p^{c}_{S}(y \mid x) \right) \right]$

    where:

    • $\mathcal{L}_{S}$ is the loss function for the student model
    • $p^{c}_{S}$ is the probability distribution of the student model given prompt $c$
    • $c$ is the hard prompt
    • $p_{T}$ is the probability distribution of the teacher model
    • $y$ is the output
    • $x$ is the input
  • Encoding: This text-to-vector level compression fine-tunes LMs with a cross-entropy objective, compressing the extensive information of hard prompts into concise vectors accessible to the model. Because the compressed context lives in embedding space, semantic information from all modalities in the context can be valuable for prompting LLMs.
  • Filtering: This text-to-text level compression uses a lightweight LM to evaluate the information entropy of different lexical structures in the prompt and filters out redundant content to simplify user prompts. The concept of "self-information" is used to quantify how much information each piece of the prompt carries (a filtering sketch also follows this list).
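As an illustration of the distillation objective above, the following PyTorch sketch matches a student's output distribution (without the hard prompt) to a teacher's distribution (conditioned on the hard prompt) via KL divergence. The logits tensors are hypothetical stand-ins for real model outputs.

```python
# Sketch of the KD objective: KL(teacher with prompt || student without prompt).
# teacher_logits / student_logits are hypothetical stand-ins for real model outputs.
import torch
import torch.nn.functional as F

vocab_size, batch, seq = 50257, 2, 16
teacher_logits = torch.randn(batch, seq, vocab_size)   # p_T(y | c, x): teacher sees the hard prompt c
student_logits = torch.randn(batch, seq, vocab_size, requires_grad=True)  # p_S^c(y | x): student relies on its soft prompt

# F.kl_div expects log-probabilities as input and probabilities as target,
# so the call below computes KL(p_T || p_S).
loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.softmax(teacher_logits, dim=-1),
    reduction="batchmean",
)
loss.backward()   # in practice, gradients would flow only into the student's soft prompt parameters
```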
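And as a sketch of the filtering idea, the snippet below scores every prompt token by its self-information, $-\log p(\text{token} \mid \text{prefix})$, under a small causal LM and keeps only the most informative tokens. The model choice (`gpt2`) and the 70% keep ratio are illustrative assumptions, not the settings of any specific method in the survey.

```python
# Self-information filtering sketch (illustrative; model and keep ratio are assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Please carefully read the following long passage and then answer the question."
ids = tok(prompt, return_tensors="pt")["input_ids"]

with torch.no_grad():
    logits = lm(ids).logits                                   # (1, seq, vocab)
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)         # predictions for tokens 1..seq-1
self_info = -log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)[0]  # -log p(t_i | t_<i)

keep_ratio = 0.7                                              # keep the 70% most informative tokens
k = int(self_info.numel() * keep_ratio)
keep_idx = self_info.topk(k).indices.sort().values            # restore original token order
compressed_ids = torch.cat([ids[:, :1], ids[:, 1:][:, keep_idx]], dim=1)  # always keep the first token
print(tok.decode(compressed_ids[0]))
```

Methods in this family differ mainly in the granularity of filtering (tokens, phrases, or sentences) and in how the information budget is allocated across the prompt.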

"Prompting with Efficient Design" addresses the growing complexity of prompt content by automating prompt optimization based on Prompt Engineering. This involves finding the best natural language prompt within a given search space to maximize task accuracy. The paper explores this problem from the perspectives of traditional mathematical optimization and intelligent algorithmic optimization, dividing this section into gradient-based and evolution-based approaches.

  • Gradient-based methods: These methods involve using gradient-descent algorithms to update parameters in neural networks. However, since hard prompts are discrete, researchers have investigated suitable gradient-based optimization frameworks for open-source and closed-source models separately. For open-source models, fine-tuning can be performed based on the real gradient, while for closed-source models, the gradient can only be imitated for prompting.
  • Evolution-based methods: These methods simulate the biological process of "survival of the fittest," performing random search by sampling the objective function. The approach exploits the diversity of samples in the search space and iteratively explores promising optimization directions; a minimal sketch follows this list.
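To make the evolution-based idea concrete, here is a minimal, self-contained sketch of evolutionary prompt search: a population of candidate instructions is mutated, scored by a fitness function, and the fittest candidates survive into the next generation. The mutation rules and the toy keyword-based fitness below are illustrative placeholders; real methods typically use an LLM to mutate or paraphrase prompts and a development set to score them.

```python
# Minimal evolutionary prompt search sketch.
# The mutations and the toy fitness function are placeholders, not any surveyed method's procedure.
import random

random.seed(0)

def fitness(prompt: str) -> float:
    # Toy stand-in for dev-set accuracy: reward useful task words, lightly penalize length.
    keywords = {"step", "reason", "answer"}
    return sum(word in prompt.lower() for word in keywords) - 0.01 * len(prompt.split())

def mutate(prompt: str) -> str:
    # Toy mutations: append a phrase or randomly drop words (an LLM paraphraser would go here).
    edits = [
        lambda p: p + " Think step by step.",
        lambda p: p + " Give the final answer only.",
        lambda p: " ".join(w for w in p.split() if random.random() > 0.1),
    ]
    return random.choice(edits)(prompt)

population = ["Solve the problem.", "You are a careful reasoner. Solve the problem."]
for generation in range(10):
    offspring = [mutate(random.choice(population)) for _ in range(8)]
    population = sorted(population + offspring, key=fitness, reverse=True)[:4]  # survival of the fittest

print(population[0])   # best prompt found under the toy fitness
```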

Finally, the paper abstracts the efficient prompting paradigm into a multi-objective optimization problem, with the overall objective of compressing prompts to reduce computational complexity while optimizing LLM task accuracy. The overall optimization formulation is defined as $\mathcal{F}_{\text{total}}$:

$\mathcal{F}_{\text{total}} = \lambda_1 \cdot \mathcal{F}_{\text{compression}}(\widetilde{X}) + \lambda_2 \cdot \mathcal{F}_{\text{accuracy}}(\Theta)$

where:

  • $\mathcal{F}_{\text{total}}$ is the total optimization objective
  • $\mathcal{F}_{\text{compression}}$ is the objective for prompt compression
  • $\mathcal{F}_{\text{accuracy}}$ is the objective for task accuracy
  • $\widetilde{X}$ denotes the compressed prompts
  • $\Theta$ denotes the accessible parameters
  • $\lambda_1$ and $\lambda_2$ are weighting factors

$\mathcal{F}_{\text{compression}}(\widetilde{X}) = \min \mathcal{D}\left( Y\!\left(\widetilde{X} \mid \operatorname{argmax} I(\widetilde{X})\right), Y(X) \right)$

where:

  • $\mathcal{D}(\cdot)$ denotes the discrepancy between the outputs before and after prompt compression
  • $I(\cdot)$ denotes the information-entropy metric that quantifies the amount of information
  • $Y$ is the output
  • $X$ is the input
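As a small worked illustration of the weighted combination (with arbitrary numbers, not values from the paper): if $\mathcal{F}_{\text{compression}}(\widetilde{X}) = 0.2$ and $\mathcal{F}_{\text{accuracy}}(\Theta) = 0.9$, then with $\lambda_1 = \lambda_2 = 0.5$ the total objective is $\mathcal{F}_{\text{total}} = 0.5 \cdot 0.2 + 0.5 \cdot 0.9 = 0.55$; adjusting $\lambda_1$ and $\lambda_2$ trades output fidelity after compression against task accuracy.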

The paper concludes by summarizing efficient prompting methods for LLMs, highlighting their connections and abstracting these approaches from a theoretical perspective. It also provides a list of open-source projects and a typology diagram to overview the efficient prompting field.

Authors (7)
  1. Kaiyan Chang
  2. Songcheng Xu
  3. Chenglong Wang
  4. Yingfeng Luo
  5. Tong Xiao
  6. Jingbo Zhu
  7. Xiaoqian Liu