Making Batch Normalization Great in Federated Deep Learning (2303.06530v4)
Abstract: Batch Normalization (BN) is widely used in centralized deep learning to improve convergence and generalization. However, in federated learning (FL) with decentralized data, prior work has observed that training with BN could hinder performance and suggested replacing it with Group Normalization (GN). In this paper, we revisit this substitution by expanding the empirical study conducted in prior work. Surprisingly, we find that BN outperforms GN in many FL settings; the exceptions are high-frequency communication and extreme non-IID regimes. We reinvestigate the factors believed to cause this problem, including the mismatch of BN statistics across clients and the deviation of gradients during local training. We empirically identify a simple practice that reduces the impact of these factors while retaining the strengths of BN. Our approach, which we name FixBN, is easy to implement, incurs no additional training or communication cost, and performs favorably across a wide range of FL settings. We hope that our study can serve as a valuable reference for future practical usage and theoretical analysis in FL.
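The abstract describes FixBN as a simple practice that mitigates cross-client BN statistic mismatch without extra training or communication. The sketch below is an illustrative reading of that idea, not the paper's actual code: a minimal 1-D batch-norm layer that normalizes with mini-batch statistics early in training and, after a `freeze()` call (e.g., at some switch-over round), normalizes with its fixed running statistics instead. The class name, the `freeze()` hook, and the single switch point are all assumptions made for illustration.

```python
import numpy as np

class SimpleBN:
    """Minimal 1-D batch norm sketching the FixBN idea (illustrative, not the
    paper's implementation). Before freezing, it normalizes each batch with
    mini-batch statistics and updates running estimates; after freezing, it
    normalizes with the fixed running statistics, so all clients share the
    same normalization regardless of their local data distribution."""

    def __init__(self, dim, momentum=0.1, eps=1e-5):
        self.running_mean = np.zeros(dim)
        self.running_var = np.ones(dim)
        self.momentum = momentum
        self.eps = eps
        self.frozen = False  # flipped once, e.g., at a chosen FL round

    def freeze(self):
        # After this point, running statistics stop updating and are used
        # for normalization in place of per-batch statistics.
        self.frozen = True

    def __call__(self, x):
        if not self.frozen:
            mu, var = x.mean(axis=0), x.var(axis=0)
            # Exponential moving average of batch statistics.
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mu
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mu, var = self.running_mean, self.running_var
        return (x - mu) / np.sqrt(var + self.eps)
```

Before freezing, each output batch is centered by construction; after freezing, the statistics no longer drift with local batches, which is the property the paper attributes to reduced statistic mismatch across clients.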
- Federated learning based on dynamic regularization. In ICLR, 2021.
- Siloed federated learning for multi-centric histopathology datasets. In DART. Springer, 2020.
- Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
- Fednorm: Modality-based normalization in federated learning for multi-modal liver segmentation. arXiv preprint arXiv:2205.11096, 2022.
- Understanding batch normalization. NeurIPS, 2018.
- Fedat: a high-performance and communication-efficient federated learning system with asynchronous tiers. In SC, 2021.
- On large-cohort training for federated learning. NeurIPS, 2021.
- Fedbe: Making bayesian model ensemble applicable to federated learning. In ICLR, 2021.
- On bridging generic and personalized federated learning for image classification. In ICLR, 2022.
- On the importance and applicability of pre-training for federated learning. In ICLR, 2023.
- Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, 2018.
- The cityscapes dataset. In CVPR Workshop on The Future of Datasets in Vision, 2015.
- Imagenet: A large-scale hierarchical image database. In CVPR, 2009.
- Heterofl: Computation and communication efficient federated learning for heterogeneous clients. In ICLR, 2020.
- Rethinking normalization methods in federated learning. In Proceedings of the 3rd International Workshop on Distributed Machine Learning, 2022.
- Feddna: Federated learning with decoupled normalization-layer aggregation for non-iid data. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2021.
- Dropout vs. batch normalization: an empirical study of their impact to deep learning. Multimedia Tools and Applications, 2020.
- On the convergence of local descent methods in federated learning. arXiv preprint arXiv:1910.14425, 2019.
- Group knowledge transfer: Federated learning of large cnns at the edge. In NeurIPS, 2020.
- Deep residual learning for image recognition. In CVPR, 2016.
- Federated robustness propagation: Sharing adversarial robustness in federated learning. arXiv preprint arXiv:2106.10196, 2021.
- Federated learning of user verification models without sharing embeddings. In ICML. PMLR, 2021.
- The non-iid data quagmire of decentralized machine learning. In ICML. PMLR, 2020.
- Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335, 2019.
- Fedpara: Low-rank hadamard product for communication-efficient federated learning. arXiv preprint arXiv:2108.06098, 2021.
- Fedbs: Learning on non-iid data in federated learning using batch normalization. In 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2021.
- Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML. PMLR, 2015.
- Tsmobn: Interventional generalization for unseen clients in federated learning. arXiv preprint arXiv:2110.09974, 2021.
- Accelerated federated learning with decoupled adaptive optimization. In ICML. PMLR, 2022.
- Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977, 2019.
- Mime: Mimicking centralized stochastic algorithms in federated learning. arXiv preprint arXiv:2008.03606, 2020a.
- Scaffold: Stochastic controlled averaging for federated learning. In ICML, 2020b.
- Federated hyperparameter tuning: Challenges, baselines, and connections to weight-sharing. Advances in Neural Information Processing Systems, 34:19184–19197, 2021.
- Learning multiple layers of features from tiny images. 2009.
- Tiny imagenet visual recognition challenge. CS 231N, 2015.
- Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.
- Feddane: A federated newton-type method. In 2019 53rd Asilomar Conference on Signals, Systems, and Computers, 2019.
- Ditto: Fair and robust federated learning through personalization. arXiv preprint arXiv:2012.04221, 2020a.
- Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 2020b.
- Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems, 2020c.
- On the convergence of fedavg on non-iid data. In ICLR, 2020d.
- FedBN: Federated learning on non-IID features via local batch normalization. In ICLR, 2021.
- Revisiting batch normalization for practical domain adaptation. arXiv preprint arXiv:1603.04779, 2016.
- Variance reduced local sgd with lower communication complexity. arXiv preprint arXiv:1912.12844, 2019.
- Ensemble distillation for robust model fusion in federated learning. In NeurIPS, 2020.
- Personalized federated learning with adaptive batchnorm for healthcare. IEEE Transactions on Big Data, 2022.
- Beyond batchnorm: towards a unified understanding of normalization in deep learning. NeurIPS, 2021.
- Towards understanding regularization in batch normalization. In ICLR, 2019.
- Three approaches for personalization with applications to federated learning. arXiv preprint arXiv:2002.10619, 2020.
- Communication-efficient learning of deep networks from decentralized data. In AISTATS, 2017.
- Fedos: using open-set learning to stabilize training in federated learning. arXiv preprint arXiv:2208.11512, 2022.
- Where to begin? exploring the impact of pre-training and initialization in federated learning. In ICLR, 2023.
- ONNX. Open neural network exchange (onnx) model zoo. https://github.com/onnx/models, 2023.
- PyTorch. Pytorch hub. https://pytorch.org/hub/, 2023.
- Micro-batch training with batch-channel normalization and weight standardization. arXiv preprint arXiv:1903.10520, 2019.
- Adaptive federated optimization. In ICLR, 2021.
- Mobilenetv2: Inverted residuals and linear bottlenecks. In CVPR, 2018.
- How does batch normalization help optimization? NeurIPS, 2018.
- Federated multi-task learning. In NeurIPS, 2017.
- Local SGD converges fast and communicates little. In ICLR, 2019.
- Gradient masked averaging for federated learning. arXiv preprint arXiv:2201.11986, 2022.
- Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.
- Federated learning with matched averaging. In ICLR, 2020.
- Quantized federated learning under transmission delay and outage constraints. IEEE Journal on Selected Areas in Communications, 2021.
- Why batch normalization damage federated learning on non-iid data? arXiv preprint arXiv:2301.02982, 2023.
- Group normalization. In ECCV, 2018.
- A mean field theory of batch normalization. arXiv preprint arXiv:1902.08129, 2019.
- Towards the practical utility of federated learning in the medical domain. arXiv preprint arXiv:2207.03075, 2022.
- Fed2: Feature-aligned federated learning. In KDD, 2021.
- Salvaging federated learning by local adaptation. arXiv preprint arXiv:2002.04758, 2020.
- Federated accelerated stochastic gradient descent. In NeurIPS, 2020.
- What do we mean by generalization in federated learning? In ICLR, 2021.
- Bayesian nonparametric federated learning of neural networks. In ICML, 2019.
- Normalization is all you need: Understanding layer-normalized federated learning under extreme label shift. arXiv preprint arXiv:2308.09565, 2023.
- Fixup initialization: Residual learning without normalization. arXiv preprint arXiv:1901.09321, 2019.
- Federated learning with non-iid data. arXiv preprint arXiv:1806.00582, 2018.
- Design and analysis of uplink and downlink communications for federated learning. IEEE Journal on Selected Areas in Communications, 2020.
- On the convergence properties of a k-step averaging stochastic gradient descent algorithm for nonconvex optimization. arXiv preprint arXiv:1708.01012, 2017.
- Distilled one-shot federated learning. arXiv preprint arXiv:2009.07999, 2020.
- Is normalization indispensable for multi-domain federated learning? arXiv preprint arXiv:2306.05879, 2023.