Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation (2310.02842v2)

Published 4 Oct 2023 in cs.CL and cs.AI

Abstract: LLMs can solve a variety of tasks, such as text summarization and mathematical questions, out of the box, but they are often trained with a single task in mind. Due to high computational costs, the current trend is to use prompt instruction tuning to better adapt monolithic, pretrained LLMs to new, but often individual, downstream tasks. How to extend prompt tuning so that it handles heterogeneous tasks and data distributions concomitantly therefore remains a wide-open question. To address this gap, we propose the use of \emph{Mixture of Prompts}, or MoPs, paired with smart gating functionality: the latter, whose design is one of the contributions of this paper, identifies relevant skills embedded in different groups of prompts and dynamically assigns a combined expert (i.e., a collection of prompts) based on the target task. Additionally, MoPs are empirically agnostic to any model compression technique applied for efficiency reasons, as well as to the instruction data source and task composition. In practice, MoPs can simultaneously mitigate prompt-training "interference" in multi-task, multi-source scenarios (e.g., task and data heterogeneity across sources), as well as possible side effects of model approximations. As a highlight, MoPs decrease final perplexity from $\sim 20\%$ up to $\sim 70\%$, compared to baselines, in the federated scenario, and from $\sim 3\%$ up to $\sim 30\%$ in the centralized scenario.
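Below is a minimal, illustrative sketch of the mixture-of-prompts idea described in the abstract: a small gating network scores several groups of trainable soft prompts and prepends their weighted combination to the input embeddings of a frozen LLM. This is not the authors' code; the class and parameter names (MixtureOfPrompts, num_prompt_groups, prompt_len) are assumptions chosen for illustration.

```python
# Minimal sketch of a mixture-of-prompts layer (assumption: not the authors'
# implementation). A gating network scores K groups of trainable soft prompts
# from a pooled input representation, and their weighted mixture is prepended
# to the frozen LLM's input embeddings.
import torch
import torch.nn as nn


class MixtureOfPrompts(nn.Module):
    """Hypothetical mixture-of-prompts layer; all names are illustrative."""

    def __init__(self, embed_dim: int, num_prompt_groups: int = 4, prompt_len: int = 16):
        super().__init__()
        # K groups of soft prompts, each of shape (prompt_len, embed_dim).
        self.prompts = nn.Parameter(
            torch.randn(num_prompt_groups, prompt_len, embed_dim) * 0.02
        )
        # Gating network: pooled input embedding -> weights over the K groups.
        self.gate = nn.Linear(embed_dim, num_prompt_groups)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) from the frozen LLM's embedding layer.
        pooled = input_embeds.mean(dim=1)                   # (batch, embed_dim)
        weights = torch.softmax(self.gate(pooled), dim=-1)  # (batch, K)
        # Per-example weighted combination of the K prompt groups.
        mixed = torch.einsum("bk,kld->bld", weights, self.prompts)
        # Prepend the mixed prompt; only self.prompts and self.gate are trainable.
        return torch.cat([mixed, input_embeds], dim=1)


if __name__ == "__main__":
    mop = MixtureOfPrompts(embed_dim=768)
    dummy = torch.randn(2, 32, 768)   # stand-in for token embeddings
    print(mop(dummy).shape)           # torch.Size([2, 48, 768])
```

In the paper's setting, the gating function is what identifies which prompt "skills" are relevant to the target task; this sketch only shows the weighted-combination mechanics, leaving richer routing signals (e.g., task embeddings) and sparse expert selection as extensions.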

Authors (6)
  1. Chen Dun (16 papers)
  2. Mirian Hipolito Garcia (6 papers)
  3. Guoqing Zheng (25 papers)
  4. Ahmed Hassan Awadallah (50 papers)
  5. Anastasios Kyrillidis (96 papers)
  6. Robert Sim (25 papers)
Citations (6)
