PILLOW: Enhancing Efficient Instruction Fine-tuning via Prompt Matching (2312.05621v2)

Published 9 Dec 2023 in cs.CL

Abstract: Instruction fine-tuning has conventionally been employed to adapt LLMs to a variety of tasks. Nonetheless, this technique often necessitates substantial computational resources, making it impractical for deployment by individuals or small-scale entities. Recently, Low-Rank Adaptation (LoRA) has become a promising alternative, offering high capabilities on par with full tuning with reduced resource overhead. However, attaining satisfactory performance through the fine-tuning of LoRA is a non-trivial challenge. In this paper, we propose PILLOW, which aims to improve LoRA's performance by a discrimination-based prompting method, leveraging LLMs' In-Context Learning ability. PILLOW incorporates a matching network that selects prompts from a user-defined prompt pool, concatenates the selected prompts with the user instruction as input, and performs inference using the LoRA-fine-tuned LLMs. Trained with Reinforcement Learning, PILLOW exhibits commensurate performance on various evaluation metrics compared with typical instruction fine-tuning methods, utilizing only consumer-grade GPU resources and exhibiting a large reduction in computational costs.
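
To illustrate the pipeline the abstract describes, here is a minimal, hypothetical sketch of the prompt-matching step: a small matching network scores candidate prompts from a user-defined pool against the incoming instruction, the best-scoring prompt is prepended to the instruction, and the concatenated text would then be passed to the LoRA-fine-tuned LLM for generation. All names (PromptMatcher, embed, PROMPT_POOL) and the toy text encoder are illustrative assumptions, not the paper's implementation, and the reinforcement-learning training of the matcher is omitted.

```python
# Illustrative sketch of a PILLOW-style prompt-matching pipeline.
# PromptMatcher, embed, and PROMPT_POOL are hypothetical names, not from the paper.
import torch
import torch.nn as nn

PROMPT_POOL = [
    "You are a helpful assistant. Answer concisely.",
    "Think step by step before giving the final answer.",
    "Respond with a short, factual summary.",
]

EMB_DIM = 64  # toy embedding size for this sketch


def embed(text: str) -> torch.Tensor:
    """Stand-in text encoder; a real system would use a sentence encoder."""
    g = torch.Generator().manual_seed(abs(hash(text)) % (2**31))
    return torch.randn(EMB_DIM, generator=g)


class PromptMatcher(nn.Module):
    """Scores (instruction, candidate prompt) pairs; this is the component
    that would be trained with reinforcement learning."""

    def __init__(self, dim: int = EMB_DIM):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, instr_emb: torch.Tensor, prompt_embs: torch.Tensor) -> torch.Tensor:
        pairs = torch.cat(
            [instr_emb.expand(prompt_embs.size(0), -1), prompt_embs], dim=-1
        )
        return self.scorer(pairs).squeeze(-1)  # one score per candidate prompt


def build_input(instruction: str, matcher: PromptMatcher) -> str:
    """Select the best prompt from the pool and prepend it to the instruction."""
    instr_emb = embed(instruction)
    prompt_embs = torch.stack([embed(p) for p in PROMPT_POOL])
    scores = matcher(instr_emb, prompt_embs)
    best = PROMPT_POOL[int(scores.argmax())]
    # The concatenated text would be fed to the LoRA-fine-tuned LLM (not shown).
    return f"{best}\n\n{instruction}"


if __name__ == "__main__":
    matcher = PromptMatcher()
    print(build_input("Summarize the following article in two sentences.", matcher))
```

In the described approach, only the matcher (and the LoRA adapters) need training, which is what keeps the method within consumer-grade GPU budgets; the base LLM itself stays frozen.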

Authors (6)
  1. Zhenting Qi (19 papers)
  2. Xiaoyu Tan (21 papers)
  3. Shaojie Shi (1 paper)
  4. Chao Qu (39 papers)
  5. Yinghui Xu (48 papers)
  6. Yuan Qi (85 papers)
Citations (8)
