
Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs (2409.01552v1)

Published 3 Sep 2024 in cs.CL and cs.AI

Abstract: LLMs have shown success in generating high-quality responses. To better align LLMs with human preferences, various works propose specific optimization procedures, which, however, are not applicable to black-box LLMs like GPT-4 whose parameters are inaccessible. For black-box LLMs, performance depends heavily on the quality of the provided prompts. Existing methods to enhance response quality often rely on a prompt refinement model, yet these approaches can suffer from semantic inconsistencies between the refined and original prompts and typically overlook the relationship between them. To address these challenges, we introduce a self-instructed in-context learning framework that enables LLMs to deliver more effective responses by generating reliable derived prompts that construct informative contextual environments. Our approach incorporates a self-instructed reinforcement learning mechanism, allowing direct interaction with the response model during derived prompt generation for better alignment. We then formulate querying as an in-context learning task, using responses from LLMs combined with the derived prompts to establish a contextual demonstration for the original prompt. This strategy ensures alignment with the original query, reduces discrepancies introduced by refined prompts, and maximizes the LLM's in-context learning capability. Extensive experiments demonstrate that the proposed method not only generates more reliable derived prompts but also significantly enhances LLMs' ability to deliver more effective responses, including black-box models such as GPT-4.
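The querying pipeline described in the abstract (derive a prompt, obtain its response, then use the pair as an in-context demonstration for the original query) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `query_llm` is a hypothetical wrapper around any black-box LLM API, the derived-prompt generator (which the paper trains with self-instructed reinforcement learning) is stubbed out, and the demonstration template is an assumed format rather than the paper's exact one.

```python
# Minimal sketch of the in-context querying step described in the abstract.
# Both helper functions below are placeholders (assumptions), not the paper's code.

def query_llm(prompt: str) -> str:
    """Hypothetical wrapper around a black-box LLM API (e.g. GPT-4); replace with a real client."""
    raise NotImplementedError

def generate_derived_prompt(original_prompt: str) -> str:
    """Stub for the derived-prompt generator the paper trains with self-instructed RL."""
    raise NotImplementedError

def answer_with_derived_context(original_prompt: str) -> str:
    # 1. Generate a derived prompt intended to stay semantically close to the query.
    derived_prompt = generate_derived_prompt(original_prompt)

    # 2. Let the black-box response model answer the derived prompt first.
    derived_response = query_llm(derived_prompt)

    # 3. Use the (derived prompt, response) pair as an in-context demonstration
    #    for the original prompt; the template below is an assumed format.
    contextual_query = (
        f"Example question: {derived_prompt}\n"
        f"Example answer: {derived_response}\n\n"
        f"Question: {original_prompt}\n"
        f"Answer:"
    )
    return query_llm(contextual_query)
```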

