
PILLOW: Enhancing Efficient Instruction Fine-tuning via Prompt Matching (2312.05621v2)

Published 9 Dec 2023 in cs.CL

Abstract: Instruction fine-tuning has conventionally been employed to adapt LLMs to a variety of tasks. Nonetheless, this technique often necessitates substantial computational resources, making it impractical for deployment by individuals or small-scale entities. Recently, Low-Rank Adaptation (LoRA) has become a promising alternative, offering high capabilities on par with full tuning with reduced resource overhead. However, attaining satisfactory performance through the fine-tuning of LoRA is a non-trivial challenge. In this paper, we propose PILLOW, which aims to improve LoRA's performance by a discrimination-based prompting method, leveraging LLMs' In-Context Learning ability. PILLOW incorporates a matching network that selects prompts from a user-defined prompt pool, concatenates the selected prompts with the user instruction as input, and performs inference using the LoRA-fine-tuned LLMs. Trained with Reinforcement Learning, PILLOW exhibits commensurate performance on various evaluation metrics compared with typical instruction fine-tuning methods, utilizing only consumer-grade GPU resources and exhibiting a large reduction in computational costs.
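The abstract outlines an inference pipeline: a matching network scores prompts from a user-defined pool, the selected prompt is prepended to the user instruction, and the combined text is sent to a LoRA-fine-tuned LLM. Below is a minimal sketch of that flow. The embedding dimension, the MLP scorer, the random stand-in embeddings, and the `lora_llm_generate` placeholder are illustrative assumptions, not details from the paper, and the reinforcement-learning training of the matcher is omitted.

```python
# Illustrative sketch of PILLOW-style prompt matching (assumptions noted below;
# not the authors' implementation).
import torch
import torch.nn as nn

class PromptMatcher(nn.Module):
    """Scores each candidate prompt against a user instruction embedding."""
    def __init__(self, emb_dim: int = 64):
        super().__init__()
        # Hypothetical scorer architecture; the paper's matching network may differ.
        self.scorer = nn.Sequential(
            nn.Linear(2 * emb_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, instr_emb: torch.Tensor, prompt_embs: torch.Tensor) -> torch.Tensor:
        # instr_emb: (emb_dim,), prompt_embs: (num_prompts, emb_dim)
        paired = torch.cat([instr_emb.expand_as(prompt_embs), prompt_embs], dim=-1)
        return self.scorer(paired).squeeze(-1)  # one logit per candidate prompt

def lora_llm_generate(text: str) -> str:
    # Placeholder standing in for inference with a LoRA-fine-tuned LLM.
    return f"<model output for: {text[:40]}...>"

# Toy prompt pool; in PILLOW this pool is user-defined.
prompt_pool = [
    "Answer concisely and cite your reasoning.",
    "Rewrite the following text in formal English.",
    "Summarize the following passage in one sentence.",
]

matcher = PromptMatcher()

# Random stand-in embeddings; a real system would encode the texts with an encoder.
instr_emb = torch.randn(64)
prompt_embs = torch.randn(len(prompt_pool), 64)

logits = matcher(instr_emb, prompt_embs)
best = int(torch.argmax(logits))  # greedy selection at inference time; training would
                                  # instead optimize the matcher with an RL objective.

user_instruction = "Summarize: LoRA adapts LLMs with low-rank updates."
model_input = prompt_pool[best] + "\n" + user_instruction
print(lora_llm_generate(model_input))
```

The sketch only shows the discrimination-based selection step; replacing the stand-in embeddings and the generation stub with a real encoder and a LoRA-adapted model would be required to reproduce the described setup.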
