Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs (2409.01552v1)
Abstract: LLMs have shown success in generating high-quality responses. To better align LLMs with human preferences, various methods have been proposed that rely on specific optimization procedures; these, however, are not applicable to black-box LLMs such as GPT-4, whose parameters are inaccessible. For black-box LLMs, performance depends heavily on the quality of the provided prompts. Existing methods for improving response quality often rely on a prompt refinement model, yet these approaches potentially suffer from semantic inconsistencies between the refined and original prompts and typically overlook the relationship between them. To address these challenges, we introduce a self-instructed in-context learning framework that empowers LLMs to deliver more effective responses by generating reliable derived prompts to construct informative contextual environments. Our approach incorporates a self-instructed reinforcement learning mechanism, enabling direct interaction with the response model during derived prompt generation for better alignment. We then formulate querying as an in-context learning task, using responses from the LLM combined with the derived prompts to establish a contextual demonstration for the original prompt. This strategy ensures alignment with the original query, reduces discrepancies introduced by refined prompts, and maximizes the LLM's in-context learning capability. Extensive experiments demonstrate that the proposed method not only generates more reliable derived prompts but also significantly enhances LLMs' ability to deliver more effective responses, including black-box models such as GPT-4.
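The querying strategy described in the abstract can be illustrated with a minimal sketch: a derived prompt and the black-box model's response to it are assembled into a contextual demonstration that precedes the original prompt. The helper names `generate_derived_prompt` and `query_llm`, as well as the prompt template, are assumptions for illustration and not the paper's actual implementation (in the paper, the derived-prompt generator is trained with self-instructed reinforcement learning).

```python
def build_in_context_query(original_prompt: str,
                           generate_derived_prompt,
                           query_llm) -> str:
    """Assemble an in-context query: a derived prompt and its response
    act as a demonstration placed before the original prompt."""
    # 1. Generate a derived prompt conditioned on the original prompt.
    #    (Hypothetical helper; the paper trains this generator with RL.)
    derived_prompt = generate_derived_prompt(original_prompt)

    # 2. Query the black-box LLM with the derived prompt; the pair
    #    (derived prompt, response) forms the contextual demonstration.
    derived_response = query_llm(derived_prompt)

    # 3. Prepend the demonstration so the model answers the original
    #    query inside an informative context, keeping the original
    #    prompt itself unchanged.
    demonstration = (
        f"Example:\nPrompt: {derived_prompt}\nResponse: {derived_response}\n\n"
    )
    return demonstration + f"Prompt: {original_prompt}\nResponse:"


# Usage sketch: the final answer is obtained by sending the assembled
# in-context prompt back to the same black-box model.
# answer = query_llm(build_in_context_query(user_prompt,
#                                            generate_derived_prompt,
#                                            query_llm))
```

Because the original prompt is preserved verbatim and only augmented with a demonstration, this construction avoids the semantic drift that can arise when a refined prompt replaces the user's query outright.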
Authors: Zhuo Li, Yuhao Du, Jinpeng Hu, Xiang Wan, Anningzhe Gao