Joint Prompt Optimization of Stacked LLMs using Variational Inference (2306.12509v2)

Published 21 Jun 2023 in cs.CL and cs.LG

Abstract: LLMs can be seen as atomic units of computation mapping sequences to a distribution over sequences. Thus, they can be seen as stochastic language layers in a language network, where the learnable parameters are the natural language prompts at each layer. By stacking two such layers and feeding the output of one layer to the next, we obtain a Deep Language Network (DLN). We first show how to effectively perform prompt optimization for a 1-Layer language network (DLN-1). Then, we present an extension that applies to 2-layer DLNs (DLN-2), where two prompts must be learned. The key idea is to consider the output of the first layer as a latent variable, which requires inference, and prompts to be learned as the parameters of the generative distribution. We first test the effectiveness of DLN-1 in multiple reasoning and natural language understanding tasks. Then, we show that DLN-2 can reach higher performance than a single layer, showing promise that we might reach comparable performance to GPT-4, even when each LLM in the network is smaller and less powerful.

Summary

  • The paper introduces a variational inference framework that jointly optimizes prompts across layered LLMs, significantly enhancing task performance.
  • The method decomposes complex language tasks into sub-tasks via a two-layer Deep Language Network, showing promise of approaching the performance of much larger models such as GPT-4.
  • Empirical results demonstrate that the optimized DLN approach improves accuracy in reasoning and natural language understanding tasks through modular training.

Joint Prompt Optimization of Stacked LLMs Using Variational Inference

The paper "Joint Prompt Optimization of Stacked LLMs Using Variational Inference" explores a novel approach for enhancing the performance of LLMs through a structured, multi-layer configuration termed Deep Language Networks (DLNs). This work addresses the challenge of optimizing natural language prompts, which act as learnable parameters, in multi-layer LLM architectures. The authors propose a method that incorporates variational inference to jointly optimize these prompts across layers, promising improved efficiency and performance compared to traditional approaches.

Overview of the Approach

The research begins by conceptualizing LLM calls as stochastic language layers: each layer takes textual input, applies an LLM conditioned on a layer-specific prompt, and produces textual output. By stacking these layers, a DLN decomposes a complex task into smaller, manageable subtasks solved by sequential LLM calls with layer-specific prompts.
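
To make the stacked-layer view concrete, the sketch below shows a two-layer forward pass. It is a minimal illustration, not the paper's implementation: the `call_llm` helper, the prompt strings, and the function names are all assumptions standing in for whatever LLM client and prompts one actually uses.

```python
# Minimal sketch of a two-layer Deep Language Network (DLN-2) forward pass.
# `call_llm` is a hypothetical stand-in for an LLM completion API, not a real client.

def call_llm(prompt: str, text: str) -> str:
    # Stub: replace with a real LLM call; here it just echoes its inputs so the sketch runs.
    return f"[LLM output for prompt={prompt!r} on input={text!r}]"

def dln2_forward(x: str, prompt_1: str, prompt_2: str) -> str:
    # Layer 1: a learnable natural-language prompt maps the input to intermediate text.
    hidden = call_llm(prompt_1, x)   # this intermediate text is treated as a latent variable
    # Layer 2: a second learnable prompt maps the intermediate text to the final answer.
    return call_llm(prompt_2, hidden)

if __name__ == "__main__":
    answer = dln2_forward(
        x="Premise: ... Question: ...",
        prompt_1="List the facts needed to answer the question.",
        prompt_2="Using the listed facts, answer the question concisely.",
    )
    print(answer)
```

The key point of the sketch is that the only learnable quantities are the two prompt strings; the LLM weights are never touched.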

  1. Single-Layer Optimization (DLN-1): The authors first develop prompt optimization for a single-layer network, building on methodology akin to the Automatic Prompt Engineer (APE), in which an LLM proposes candidate prompts that are then scored and selected. A notable contribution is that the optimized prompt combines an instruction with task-context examples to improve downstream task performance.
  2. Two-Layer Optimization (DLN-2): Extending the framework to two layers treats the intermediate output of the first layer as a latent variable. Variational inference is used to maximize a variational lower bound, so that both layers' prompts are optimized jointly (a sketch of this bound follows the list). The aim is to approach the performance of much larger models such as GPT-4 while using smaller LLMs within the DLN.
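
As a sketch of the bound just described, with notation assumed here rather than taken from the paper (x the input, y the target output, h the first layer's textual output treated as a latent variable, \pi_1 and \pi_2 the two layer prompts, and q a proposal distribution over h), the standard variational lower bound being maximized is:

$$
\log p(y \mid x; \pi_1, \pi_2) \;\ge\; \mathbb{E}_{h \sim q(h \mid x, y)}\big[\log p(y \mid h; \pi_2) + \log p(h \mid x; \pi_1) - \log q(h \mid x, y)\big]
$$

Maximizing the right-hand side jointly with respect to both prompts tightens a bound on the log-likelihood of the correct output, which is what allows the two prompts to be learned together rather than tuned one layer at a time.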

Strong Results and Findings

The paper reports strong empirical results. The DLN models, particularly DLN-2, improve performance across a range of reasoning and natural language understanding tasks, suggesting that hierarchical task decomposition is effective for prompted LLMs. In several settings, the DLN approach outperforms existing prompt optimization techniques and, in some cases, approaches the capabilities of larger LLMs.

  • Numerical Evidence: The paper highlights marked improvements in accuracy for tasks such as sentiment analysis and spatial reasoning when utilizing DLN-2.
  • Competitive Performance: DLN-2 exhibits potential for achieving performance on par with much larger models like GPT-4 by strategically leveraging the stacking of smaller LLMs.

Implications for Future Developments

This paper pushes the boundaries of modularity in LLMs, highlighting the advantages of viewing language processing tasks as networks of interdependent components. Future developments could involve:

  • Enhanced Modular Training: Training LLMs in a modular fashion could reduce the need for large datasets and fine-tuning resources traditionally required for massive LLMs.
  • Adaptive Systems: The modularity of DLNs could aid in building LLM systems that are adaptable and customizable for diverse applications with minimal resource expenditure.
  • Exploration of Deeper Networks: While this paper focuses on one- and two-layer networks, extending the framework to more layers could further leverage the benefits of deep architectures in language processing tasks.

Conclusion

The paper presents a principled approach to optimizing prompts for stacked LLMs, using variational inference to learn the prompts of a multi-layer architecture jointly. The method shows promise not only for improving output quality with smaller underlying models but also for enabling more adaptable, resource-efficient LLM systems. As LLMs continue to evolve, the principles set out in this paper could guide future work on modular and scalable natural language processing systems.
