Position Paper: Assessing Robustness, Privacy, and Fairness in Federated Learning Integrated with Foundation Models (2402.01857v1)
Abstract: Federated Learning (FL), while a breakthrough in decentralized machine learning, faces significant challenges such as limited data availability and variable computational resources, which can stifle model performance and scalability. Integrating Foundation Models (FMs) into FL offers a compelling solution to these issues, with the potential to enrich data and reduce computational demands through pre-training and data augmentation. However, this integration introduces novel issues of robustness, privacy, and fairness that existing research has not sufficiently addressed. We conduct a preliminary investigation of this field by systematically evaluating the implications of FM-FL integration across these three dimensions. We analyze the trade-offs involved, identify the threats and issues the integration introduces, and propose a set of criteria and strategies for navigating these challenges. We further identify potential research directions for advancing this field, laying a foundation for future work on reliable, secure, and equitable FL systems.
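To make the integration concrete, the sketch below illustrates one pattern the paper surveys: resource-constrained FL clients augmenting scarce local data with samples produced by a foundation model before standard FedAvg aggregation. This is a minimal illustration, not the paper's method; `generate_synthetic` is a hypothetical stand-in for FM-based generation, and the logistic-regression clients and FedAvg server loop follow the standard formulation.

```python
# Minimal sketch: FedAvg over clients whose scarce local data is augmented
# with foundation-model-generated synthetic samples. All helper names here
# (generate_synthetic, local_sgd, fedavg) are hypothetical illustrations.
import numpy as np

rng = np.random.default_rng(0)

def generate_synthetic(n, dim):
    """Stand-in for FM-based data augmentation: in practice this would
    query a foundation model; here we just sample a fixed distribution."""
    X = rng.normal(size=(n, dim))
    y = (X.sum(axis=1) > 0).astype(float)
    return X, y

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """One client's local update: logistic regression trained by plain SGD."""
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + np.exp(-xi @ w))
            w = w - lr * (p - yi) * xi
    return w

def fedavg(clients, dim, rounds=10):
    """Server loop: broadcast global weights, then average client updates
    weighted by local dataset size (standard FedAvg aggregation)."""
    w = np.zeros(dim)
    for _ in range(rounds):
        updates, sizes = [], []
        for X, y in clients:
            updates.append(local_sgd(w.copy(), X, y))
            sizes.append(len(y))
        w = np.average(updates, axis=0, weights=sizes)
    return w

dim = 8
# Each client holds only a handful of real samples ...
real_clients = [generate_synthetic(5, dim) for _ in range(4)]
# ... and is enriched with FM-style synthetic data before local training.
synthetic = [generate_synthetic(50, dim) for _ in range(4)]
aug_clients = [
    (np.vstack([X, Xs]), np.concatenate([y, ys]))
    for (X, y), (Xs, ys) in zip(real_clients, synthetic)
]
w = fedavg(aug_clients, dim)
print("Global weights after FL with FM-augmented clients:", np.round(w, 2))
```

Note that this augmentation step is exactly where the paper's concerns arise: a compromised or biased foundation model can inject poisoned, privacy-leaking, or unrepresentative synthetic samples into every client, which then propagate through aggregation.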
Authors: Xi Li, Jiaqi Wang