Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization (2405.18890v1)
Abstract: In federated learning (FL), multi-step local updates and data heterogeneity among clients often lead to a loss landscape with sharper minima, degrading the performance of the resulting global model. Prevalent federated approaches incorporate sharpness-aware minimization (SAM) into local training to mitigate this problem. However, the local loss landscapes may not accurately reflect the flatness of the global loss landscape in heterogeneous environments; as a result, minimizing local sharpness with perturbations calculated on client data may not align the efficacy of SAM in FL with that in centralized training. To overcome this challenge, we propose FedLESAM, a novel algorithm that locally estimates the direction of the global perturbation on the client side as the difference between the global models received in the previous active round and the current round. Beyond the improved quality, FedLESAM also speeds up federated SAM-based approaches, since it performs only one backpropagation per iteration. Theoretically, we prove a slightly tighter bound than that of the original FedSAM by ensuring consistent perturbation. Empirically, we conduct comprehensive experiments on four federated benchmark datasets under three partition strategies to demonstrate the superior performance and efficiency of FedLESAM.
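To make the local update concrete, below is a minimal PyTorch sketch of one FedLESAM client step under our reading of the abstract: the perturbation is estimated from the difference between the global model received at the client's previous active round and the current one, scaled to radius rho, so no extra gradient ascent pass is needed. The function name `fedlesam_local_step` and the argument layout (parameter-tensor lists `global_prev` and `global_curr`) are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def fedlesam_local_step(model, loss_fn, batch, global_prev, global_curr, rho, lr):
    """One FedLESAM local step (sketch, hypothetical interface).

    global_prev / global_curr: lists of parameter tensors of the global model
    received at the client's previous active round and the current round.
    """
    with torch.no_grad():
        # Estimated global perturbation: rho * (w_prev - w_curr) / ||w_prev - w_curr||.
        # The past global update moved against the global gradient, so this
        # difference approximates the global ascent direction.
        diff = [p_prev - p_curr for p_prev, p_curr in zip(global_prev, global_curr)]
        norm = torch.sqrt(sum((d ** 2).sum() for d in diff)) + 1e-12
        perturb = [rho * d / norm for d in diff]
        # Apply the perturbation before the single backward pass.
        for p, e in zip(model.parameters(), perturb):
            p.add_(e)
    x, y = batch
    loss = loss_fn(model(x), y)
    model.zero_grad()
    loss.backward()  # the only backpropagation in this iteration
    with torch.no_grad():
        # Undo the perturbation, then step at the original weights with the
        # gradient taken at the perturbed point: w <- w - lr * grad L(w + eps).
        for p, e in zip(model.parameters(), perturb):
            p.sub_(e)
            p.sub_(p.grad, alpha=lr)
    return loss.item()
```

Because the perturbation comes from stored model differences rather than an extra ascent pass, each iteration needs only one backward computation, which is where the claimed speedup over standard federated SAM-based training comes from.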