Collaboratively Learning Federated Models from Noisy Decentralized Data (2409.02189v1)
Abstract: Federated learning (FL) has emerged as a prominent method for collaboratively training machine learning models using local data from edge devices, all while keeping data decentralized. However, accounting for the quality of data contributed by local clients remains a critical challenge in FL, as local data are often susceptible to corruption by various forms of noise and perturbation, which compromises the aggregation process and leads to a subpar global model. In this work, we focus on the problem of noisy data in the input space, an under-explored area compared to label noise. We propose a comprehensive assessment of client input in the gradient space, motivated by the distinct disparity between the gradient-norm distributions of models trained on noisy versus clean input data. Based on this observation, we introduce a straightforward yet effective approach to identify clients with low-quality data at the initial stage of FL. Furthermore, we propose a noise-aware FL aggregation method, Federated Noise-Sifting (FedNS), which can be used as a plug-in alongside widely used FL strategies. Our extensive evaluation on diverse benchmark datasets under different federated settings demonstrates the efficacy of FedNS. The method integrates readily with existing FL strategies, improving the global model's performance by up to 13.68% in IID and 15.85% in non-IID settings when learning from noisy decentralized data.
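The abstract outlines two mechanisms: scoring clients by the gradient norms their local data induce early in training, and down-weighting flagged clients during server aggregation. Below is a minimal sketch of that pipeline under stated assumptions: a plain z-score cutoff stands in for the paper's density-based separation of gradient-norm distributions, and the names `gradient_norm_score`, `sift_clients`, `fedns_aggregate`, and the `noisy_weight` parameter are illustrative, not the authors' exact algorithm.

```python
# Hypothetical sketch of a FedNS-style pipeline inferred from the abstract:
# clients whose early-round gradient norms look atypical are treated as
# having noisy inputs and are down-weighted (not dropped) at aggregation.
import numpy as np

def gradient_norm_score(local_grads):
    """Mean L2 norm of a client's per-batch gradients from the first FL round."""
    return float(np.mean([np.linalg.norm(g) for g in local_grads]))

def sift_clients(scores, z=1.0):
    """Flag clients whose gradient-norm score deviates from the population.

    A simple z-score rule (assumption) replaces the paper's density-based
    separation. Returns a boolean mask: True = treated as clean.
    """
    scores = np.asarray(scores, dtype=float)
    zscores = np.abs(scores - scores.mean()) / (scores.std() + 1e-12)
    return zscores <= z

def fedns_aggregate(updates, n_samples, clean_mask, noisy_weight=0.1):
    """FedAvg-style weighted averaging with flagged clients down-weighted
    (hypothetical stand-in for the FedNS aggregation rule)."""
    w = np.asarray(n_samples, dtype=float)
    w = w * np.where(clean_mask, 1.0, noisy_weight)  # sift, don't exclude
    w = w / w.sum()
    return sum(wi * ui for wi, ui in zip(w, updates))

# Toy usage: three clients, the third with inflated gradient norms, as
# would arise from training on noise-corrupted inputs.
rng = np.random.default_rng(0)
grads = [[rng.normal(0, 1, 100) for _ in range(5)] for _ in range(2)]
grads.append([rng.normal(0, 5, 100) for _ in range(5)])   # "noisy" client
scores = [gradient_norm_score(g) for g in grads]
mask = sift_clients(scores)                       # -> [True, True, False]
updates = [rng.normal(0, 0.01, 10) for _ in grads]        # mock model deltas
global_update = fedns_aggregate(updates, [100, 100, 100], mask)
```

In this toy run, the third client's inflated gradient norms mark it as noisy, so its update contributes only a fraction of its FedAvg weight. Soft down-weighting rather than hard exclusion is one way such a rule could act as a plug-in on top of strategies like FedAvg or FedProx, consistent with the abstract's description of FedNS as a plug-in aggregation method.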