Federated Learning Empowered by Generative Content (2312.05807v1)
Abstract: Federated learning (FL) enables model training on distributed private data in a privacy-preserving way. However, data heterogeneity significantly limits the performance of current FL methods. In this paper, we propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity by diversifying private data with generative content. FedGC is simple to implement, as it introduces only a one-shot data-generation step. For data generation, we identify three crucial aspects worth exploring (budget allocation, prompt design, and generation guidance) and propose three candidate solutions for each. Specifically, to achieve a better trade-off between data diversity and fidelity in generation guidance, we propose generating data guided simultaneously by prompts and real data. The generated data is then merged with private data to facilitate local model training. Such generative data increases the diversity of each client's data, preventing the client from overfitting its potentially biased private data and thereby alleviating data heterogeneity. We conduct a systematic empirical study of FedGC, covering diverse baselines, datasets, scenarios, and modalities. Interesting findings include: (1) FedGC consistently and significantly enhances the performance of FL methods, even when notable disparities exist between generative and private data; (2) FedGC achieves both better performance and better privacy preservation. We hope this work inspires future research that further explores the potential of enhancing FL with generative content.
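The sketch below illustrates how the one-shot generation step described above could look in practice: each client generates images guided simultaneously by a class-name prompt and its real private images, then merges the generated samples with its private data before local training. It assumes a Stable Diffusion img2img pipeline from the Hugging Face `diffusers` library as the generative model; the helper `build_prompt`, the `strength` setting, and the merging routine are illustrative assumptions, not the paper's released implementation.

```python
# Hypothetical sketch of FedGC-style one-shot data generation on a single
# client, guided by both a prompt and real private images.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline


def build_prompt(class_name: str) -> str:
    # Simple class-name prompt; richer prompt designs are possible.
    return f"a photo of a {class_name}"


def generate_and_merge(private_images, private_labels, class_names,
                       per_sample_budget=1, strength=0.6):
    """Generate data guided by prompts AND real images, then merge.

    `strength` controls how much noise is applied to the real image before
    denoising: lower values stay faithful to the private sample (fidelity),
    higher values let the prompt inject more new content (diversity).
    """
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    gen_images, gen_labels = [], []
    for img, label in zip(private_images, private_labels):
        prompt = build_prompt(class_names[label])
        init = img.resize((512, 512)) if isinstance(img, Image.Image) else img
        for _ in range(per_sample_budget):
            out = pipe(prompt=prompt, image=init,
                       strength=strength, guidance_scale=7.5).images[0]
            gen_images.append(out)
            gen_labels.append(label)

    # Merge generative content with the client's private data; local FL
    # training (e.g., FedAvg's local SGD) then runs on the combined set.
    return list(private_images) + gen_images, list(private_labels) + gen_labels
```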
Authors: Rui Ye, Xinyu Zhu, Jingyi Chai, Siheng Chen, Yanfeng Wang