MIP: CLIP-based Image Reconstruction from PEFT Gradients (2403.07901v1)

Published 26 Feb 2024 in cs.CV and cs.LG

Abstract: The Contrastive Language-Image Pre-training (CLIP) model, as an effective pre-trained multimodal neural network, has been widely used in distributed machine learning tasks, especially Federated Learning (FL). Typically, CLIP-based FL adopts Parameter-Efficient Fine-Tuning (PEFT) for model training, which fine-tunes only adapter parameters or soft prompts rather than the full model parameters. Although PEFT differs from the traditional training mode, we theoretically show in this paper that the gradients of adapters or soft prompts can still be used to perform image reconstruction attacks. Based on this analysis, we propose Multm-In-Parvo (MIP), a dedicated reconstruction attack method targeting CLIP-based distributed machine learning architectures. Specifically, MIP reconstructs CLIP training images from the gradients of soft prompts or an adapter. In addition, MIP includes a label prediction strategy to accelerate convergence and an inverse gradient estimation mechanism to avoid the vanishing gradient problem on the text encoder. Experimental results show that MIP effectively reconstructs training images from the gradients of soft prompts or adapters of CLIP models.
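The abstract outlines a gradient-inversion workflow: an attacker who observes the gradients of the PEFT parameters (soft prompts or an adapter) optimizes a dummy image until its gradients match the observed ones. The sketch below illustrates only that general idea; it is a hypothetical DLG-style loop with a toy linear encoder and adapter standing in for CLIP's components, and it omits MIP's label prediction strategy and inverse gradient estimation mechanism.

```python
# Hypothetical sketch, not the authors' MIP implementation: a DLG-style
# gradient-inversion loop where only the gradients of a small trainable
# "adapter" (standing in for CLIP's PEFT parameters) are observed.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Frozen backbone stand-in plus a trainable adapter; in PEFT-based FL only
# the adapter's gradients would be shared with the server.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
adapter = nn.Linear(64, 10)
for p in encoder.parameters():
    p.requires_grad_(False)

def adapter_grads(image, label):
    """Gradients of the classification loss w.r.t. the adapter parameters."""
    logits = adapter(encoder(image))
    loss = F.cross_entropy(logits, label)
    return torch.autograd.grad(loss, list(adapter.parameters()), create_graph=True)

# Victim's private data and the gradients the attacker is assumed to observe.
true_image = torch.rand(1, 3, 32, 32)
true_label = torch.tensor([3])
observed = [g.detach() for g in adapter_grads(true_image, true_label)]

# Attacker optimizes a dummy image so its adapter gradients match the
# observed ones (the label is assumed known, e.g. via label prediction).
dummy_image = torch.rand(1, 3, 32, 32, requires_grad=True)
optimizer = torch.optim.Adam([dummy_image], lr=0.1)

for step in range(200):
    optimizer.zero_grad()
    guess = adapter_grads(dummy_image, true_label)
    match_loss = sum(((g - o) ** 2).sum() for g, o in zip(guess, observed))
    match_loss.backward()
    optimizer.step()
    with torch.no_grad():
        dummy_image.clamp_(0.0, 1.0)  # keep the reconstruction in image range

print("final gradient-matching loss:", float(match_loss))
```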
