Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs (2409.01552v1)

Published 3 Sep 2024 in cs.CL and cs.AI

Abstract: LLMs have shown success in generating high-quality responses. To better align LLMs with human preferences, various works propose specific optimization procedures, which, however, are not suitable for black-box LLMs like GPT-4 because their parameters are inaccessible. In the black-box setting, performance depends heavily on the quality of the provided prompts. Existing methods for enhancing response quality often rely on a prompt refinement model, yet these approaches can suffer from semantic inconsistencies between the refined and original prompts and typically overlook the relationship between them. To address these challenges, we introduce a self-instructed in-context learning framework that enables LLMs to deliver more effective responses by generating reliable derived prompts that construct informative contextual environments. Our approach incorporates a self-instructed reinforcement learning mechanism that interacts directly with the response model during derived prompt generation for better alignment. We then formulate querying as an in-context learning task, combining the LLM's responses with the derived prompts to establish a contextual demonstration for the original prompt. This strategy ensures alignment with the original query, reduces discrepancies from refined prompts, and maximizes the LLM's in-context learning capability. Extensive experiments demonstrate that the proposed method not only generates more reliable derived prompts but also significantly enhances the ability of LLMs, including black-box models such as GPT-4, to deliver more effective responses.
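
The abstract describes a three-step querying pipeline: generate a derived prompt for the original query, obtain the black-box LLM's response to it, and then prepend that prompt-response pair as an in-context demonstration when asking the original query. The sketch below is a minimal illustration of that flow, not the authors' implementation; generate_derived_prompt and query_black_box_llm are hypothetical placeholders (the paper trains the former via self-instructed reinforcement learning against the response model, and the latter stands in for whatever LLM API is used).

```python
def generate_derived_prompt(original_prompt: str) -> str:
    """Stand-in for the paper's self-instructed, RL-trained prompt generator.
    Here we only rephrase the query; the actual method optimizes this step
    using feedback from the response model."""
    return f"Please answer clearly and step by step: {original_prompt}"


def query_black_box_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to a black-box model such as GPT-4.
    Replace with the actual API client in use."""
    raise NotImplementedError("wire this up to your LLM provider")


def answer_with_derived_demonstration(original_prompt: str) -> str:
    # Step 1: derive a semantically related prompt for the original query.
    derived_prompt = generate_derived_prompt(original_prompt)

    # Step 2: let the black-box LLM answer the derived prompt; this pair
    # becomes an informative contextual demonstration.
    derived_response = query_black_box_llm(derived_prompt)

    # Step 3: ask the original query with the demonstration prepended,
    # keeping the final answer aligned with the original prompt.
    in_context_query = (
        f"Example question: {derived_prompt}\n"
        f"Example answer: {derived_response}\n\n"
        f"Question: {original_prompt}\n"
        f"Answer:"
    )
    return query_black_box_llm(in_context_query)
```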

Authors (5)
  1. Zhuo Li (164 papers)
  2. Yuhao Du (18 papers)
  3. Jinpeng Hu (10 papers)
  4. Xiang Wan (94 papers)
  5. Anningzhe Gao (22 papers)
