Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models (2310.03123v1)

Published 4 Oct 2023 in cs.LG and cs.AI

Abstract: With the rapid development of pre-trained models (PTMs), efficiently tuning these models for diverse downstream applications has emerged as a pivotal research concern. Although recent investigations into prompt tuning have provided promising avenues, three salient challenges persist: (1) memory constraint: the continuous growth in the size of open-source PTMs renders fine-tuning, even of a fraction of their parameters, challenging for many practitioners; (2) model privacy: existing PTMs often function as public API services, with their parameters inaccessible for effective or tailored fine-tuning; (3) data privacy: fine-tuning PTMs requires high-quality datasets, which are typically localized and not shared publicly. To optimally harness each local dataset while navigating memory constraints and preserving privacy, we propose Federated Black-Box Prompt Tuning (Fed-BBPT). This approach eschews reliance on parameter architectures and private dataset access, instead capitalizing on a central server that aids local users in collaboratively training a prompt generator through regular aggregation. Local users leverage API-driven learning via a zeroth-order optimizer, obviating the need for local PTM deployment. Relative to extensive fine-tuning, Fed-BBPT proficiently sidesteps the memory challenges tied to storing and fine-tuning PTMs on local machines while tapping into comprehensive, high-quality, yet private training datasets. A thorough evaluation across 40 datasets spanning CV and NLP tasks underscores the robustness of our proposed approach.
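
To make the described workflow concrete, the following is a minimal sketch of the Fed-BBPT idea as summarized in the abstract: each client tunes a small prompt generator using only scalar loss values returned by a black-box PTM API (here via a two-point zeroth-order gradient estimate), and a server periodically averages the clients' generator parameters. The function `query_ptm_loss`, the toy quadratic loss, the bias-vector "generator", the step sizes, and the FedAvg-style averaging are all illustrative assumptions, not the paper's actual design.

```python
# Sketch of federated black-box prompt tuning with a zeroth-order optimizer.
# `query_ptm_loss` stands in for a remote PTM API that returns only a loss.
import numpy as np

rng = np.random.default_rng(0)

PROMPT_DIM = 16   # dimensionality of the continuous prompt sent to the API
GEN_PARAMS = 16   # parameters of the toy prompt generator (here: a bias vector)


def query_ptm_loss(prompt: np.ndarray, client_id: int) -> float:
    """Hypothetical black-box API call: returns a scalar task loss, no gradients.
    Simulated here by a quadratic with a client-specific optimum."""
    target = np.sin(np.arange(PROMPT_DIM) + client_id)
    return float(np.sum((prompt - target) ** 2))


def local_zeroth_order_step(theta: np.ndarray, client_id: int,
                            mu: float = 1e-2, lr: float = 0.05,
                            steps: int = 20) -> np.ndarray:
    """Two-point (SPSA-style) gradient estimate of the prompt-generator
    parameters using only loss queries to the black-box model."""
    theta = theta.copy()
    for _ in range(steps):
        u = rng.standard_normal(GEN_PARAMS)              # random perturbation
        loss_plus = query_ptm_loss(theta + mu * u, client_id)
        loss_minus = query_ptm_loss(theta - mu * u, client_id)
        grad_est = (loss_plus - loss_minus) / (2 * mu) * u
        theta -= lr * grad_est                           # local update, no backprop
    return theta


# Server loop: broadcast the generator, let each client tune it locally with
# zeroth-order queries, then average the returned parameters (FedAvg-style).
theta_global = np.zeros(GEN_PARAMS)
for rnd in range(10):
    client_thetas = [local_zeroth_order_step(theta_global, cid) for cid in range(4)]
    theta_global = np.mean(client_thetas, axis=0)
    avg_loss = np.mean([query_ptm_loss(theta_global, cid) for cid in range(4)])
    print(f"round {rnd}: avg loss {avg_loss:.3f}")
```

In this sketch the clients never hold PTM weights or exchange raw data; only low-dimensional generator parameters travel to the server, which is the memory- and privacy-related property the abstract emphasizes.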

Authors (7)
  1. Zihao Lin (22 papers)
  2. Yan Sun (309 papers)
  3. Yifan Shi (15 papers)
  4. Xueqian Wang (99 papers)
  5. Lifu Huang (92 papers)
  6. Li Shen (363 papers)
  7. Dacheng Tao (829 papers)
Citations (7)