PILLOW: Enhancing Efficient Instruction Fine-tuning via Prompt Matching (2312.05621v2)
Abstract: Instruction fine-tuning has conventionally been employed to adapt large language models (LLMs) to a variety of tasks. Nonetheless, this technique often requires substantial computational resources, making it impractical for deployment by individuals or small-scale entities. Recently, Low-Rank Adaptation (LoRA) has emerged as a promising alternative, offering capabilities on par with full fine-tuning at reduced resource overhead. However, attaining satisfactory performance through LoRA fine-tuning alone remains a non-trivial challenge. In this paper, we propose PILLOW, which aims to improve LoRA's performance via a discrimination-based prompting method that leverages LLMs' in-context learning ability. PILLOW incorporates a matching network that selects prompts from a user-defined prompt pool, concatenates the selected prompts with the user instruction as input, and performs inference with the LoRA-fine-tuned LLM. Trained with reinforcement learning, PILLOW achieves performance comparable to typical instruction fine-tuning methods on various evaluation metrics, while using only consumer-grade GPU resources and substantially reducing computational cost.
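To make the described pipeline concrete, below is a minimal sketch of the prompt-matching step, not the authors' released implementation: the names (`MatchingNetwork`, `select_prompts`), the embedding dimension, and the use of random embeddings in place of a real text encoder and a LoRA-fine-tuned model are all illustrative assumptions. A small matching network scores each prompt in a user-defined pool against the incoming instruction, and the top-scoring prompts are prepended to the instruction before inference; in the full method this network is trained with reinforcement learning rather than used untrained as here.

```python
# Hypothetical sketch of PILLOW-style prompt matching (not the authors' code).
import torch
import torch.nn as nn

class MatchingNetwork(nn.Module):
    """Scores (instruction, candidate prompt) embedding pairs."""
    def __init__(self, embed_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, instr_emb: torch.Tensor, prompt_embs: torch.Tensor) -> torch.Tensor:
        # instr_emb: (embed_dim,), prompt_embs: (pool_size, embed_dim)
        instr = instr_emb.expand(prompt_embs.size(0), -1)
        return self.scorer(torch.cat([instr, prompt_embs], dim=-1)).squeeze(-1)

def select_prompts(matcher, instr_emb, prompt_embs, prompt_pool, k=2):
    """Pick the top-k prompts from the pool for the given instruction."""
    with torch.no_grad():
        scores = matcher(instr_emb, prompt_embs)  # shape: (pool_size,)
    top_idx = scores.topk(k).indices.tolist()
    return [prompt_pool[i] for i in top_idx]

if __name__ == "__main__":
    # Toy example: random embeddings stand in for a real text encoder, and the
    # printed string stands in for the input to a LoRA-fine-tuned LLM.
    prompt_pool = [
        "Answer step by step.",
        "Respond concisely in one sentence.",
        "Explain as if to a beginner.",
    ]
    embed_dim = 768
    matcher = MatchingNetwork(embed_dim)
    instr_emb = torch.randn(embed_dim)
    prompt_embs = torch.randn(len(prompt_pool), embed_dim)

    chosen = select_prompts(matcher, instr_emb, prompt_embs, prompt_pool, k=1)
    user_instruction = "Summarize the benefits of low-rank adaptation."
    llm_input = "\n".join(chosen + [user_instruction])
    print(llm_input)  # This concatenation would be passed to the LoRA-tuned model.
```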
- Zhenting Qi
- Xiaoyu Tan
- Shaojie Shi
- Chao Qu
- Yinghui Xu
- Yuan Qi