Revisiting Weighted Aggregation in Federated Learning with Neural Networks (2302.10911v4)
Abstract: In federated learning (FL), a weighted aggregation of local models is conducted to generate the global model, and the aggregation weights are typically normalized (they sum to 1) and proportional to the local data sizes. In this paper, we revisit the weighted aggregation process and gain new insights into the training dynamics of FL. First, we find that the sum of weights can be smaller than 1, causing a global weight shrinking effect (analogous to weight decay) that improves generalization. We explore how the optimal shrinking factor is affected by clients' data heterogeneity and local epochs. Second, we dive into the relative aggregation weights among clients to depict each client's importance. We develop client coherence to study the learning dynamics and find that a critical point exists: before reaching it, more coherent clients play more essential roles in generalization. Based on these insights, we propose an effective method for Federated Learning with Learnable Aggregation Weights, named FedLAW. Extensive experiments verify that our method improves the generalization of the global model by a large margin on different datasets and models.
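To make the aggregation step concrete, the sketch below shows weighted model averaging with a global shrinking factor. It is a minimal NumPy illustration of the idea described in the abstract, not the paper's actual FedLAW implementation: the `aggregate` helper and the `gamma` parameter are hypothetical names introduced here, and in FedLAW both the relative weights and the shrinking factor would be learned rather than fixed.

```python
import numpy as np

def aggregate(local_models, weights, gamma=1.0):
    """Weighted aggregation of local model parameters.

    local_models: list of dicts mapping parameter names to np.ndarray.
    weights: relative client weights; normalized here so they sum to 1.
    gamma: global shrinking factor (hypothetical knob for illustration).
           With gamma < 1 the effective aggregation weights sum to less
           than 1, which is the "global weight shrinking" regime the
           paper relates to weight decay.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # relative weights sum to 1
    global_model = {}
    for name in local_models[0]:
        # Stack each client's copy of this parameter: shape (K, ...)
        stacked = np.stack([m[name] for m in local_models])
        # Weighted average scaled by gamma; effective weights are
        # gamma * weights, whose sum is gamma <= 1.
        global_model[name] = gamma * np.tensordot(weights, stacked, axes=1)
    return global_model
```

Standard FedAvg corresponds to `gamma = 1.0` with data-size-proportional `weights`; setting `gamma` slightly below 1 shrinks the aggregated parameters toward zero each round, analogous to applying weight decay at the server.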