FedBN: Federated Learning on Non-IID Features via Local Batch Normalization (2102.07623v2)

Published 15 Feb 2021 in cs.LG

Abstract: The emerging paradigm of federated learning (FL) strives to enable collaborative training of deep models on the network edge without centrally aggregating raw data and hence improving data privacy. In most cases, the assumption of independent and identically distributed samples across local clients does not hold for federated learning setups. Under this setting, neural network training performance may vary significantly according to the data distribution and even hurt training convergence. Most of the previous work has focused on a difference in the distribution of labels or client shifts. Unlike those settings, we address an important problem of FL, e.g., different scanners/sensors in medical imaging, different scenery distribution in autonomous driving (highway vs. city), where local clients store examples with different distributions compared to other clients, which we denote as feature shift non-iid. In this work, we propose an effective method that uses local batch normalization to alleviate the feature shift before averaging models. The resulting scheme, called FedBN, outperforms both classical FedAvg, as well as the state-of-the-art for non-iid data (FedProx) on our extensive experiments. These empirical results are supported by a convergence analysis that shows in a simplified setting that FedBN has a faster convergence rate than FedAvg. Code is available at https://github.com/med-air/FedBN.

Authors (5)
  1. Xiaoxiao Li (144 papers)
  2. Meirui Jiang (15 papers)
  3. Xiaofei Zhang (36 papers)
  4. Michael Kamp (24 papers)
  5. Qi Dou (163 papers)
Citations (700)

Summary

Federated Learning on Non-IID Data with Local Batch Normalization

This paper addresses a significant challenge in federated learning (FL) known as feature shift non-IID, where the input data distribution varies across clients rather than the label distribution. This issue is prevalent in real-world scenarios, such as medical imaging and autonomous driving. The proposed method, FedBN, extends existing solutions by incorporating local batch normalization (BN) to combat feature discrepancies.

Background and Motivation

Federated learning enables multiple edge devices to collaboratively train deep learning models without sharing raw data. However, traditional FL methods such as FedAvg suffer from performance degradation when data is non-IID. Previous approaches have primarily focused on label distribution skew, neglecting feature distribution variations, which can be equally detrimental in practice. This paper identifies this gap and provides a tailored solution for cases where local feature distributions are inconsistent across clients.

Proposed Method: FedBN

FedBN modifies classical FedAvg by keeping each client's batch normalization (BN) layers local: their parameters and running statistics are excluded from the global aggregation. Each participating client updates its BN parameters independently, allowing the shared model to adapt to local data characteristics. This harmonizes the variation arising from diverse local data sources during training, improving both convergence and accuracy.
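
To make this aggregation rule concrete, the following is a minimal PyTorch-style sketch of FedBN-style server averaging. It is an illustrative implementation, not the authors' reference code (see the linked repository for that); helper names such as `bn_state_keys` and `fedbn_aggregate` are placeholders.

```python
import copy
import torch
import torch.nn as nn

def bn_state_keys(model: nn.Module) -> set:
    """Collect state_dict keys (weights, biases, running stats) that belong
    to BatchNorm layers; under FedBN these stay local."""
    keys = set()
    for mod_name, mod in model.named_modules():
        if isinstance(mod, nn.modules.batchnorm._BatchNorm):
            for k in mod.state_dict():
                keys.add(f"{mod_name}.{k}" if mod_name else k)
    return keys

def fedbn_aggregate(client_models):
    """FedAvg-style averaging of all non-BN parameters; each client keeps
    its own BN parameters and running statistics."""
    bn_keys = bn_state_keys(client_models[0])
    avg_state = copy.deepcopy(client_models[0].state_dict())
    for name in avg_state:
        if name in bn_keys:
            continue  # BN layers are excluded from global aggregation
        stacked = torch.stack([m.state_dict()[name].float() for m in client_models])
        avg_state[name] = stacked.mean(dim=0)
    # Broadcast the averaged non-BN weights back; BN entries remain local.
    for m in client_models:
        local_state = m.state_dict()
        for name in local_state:
            if name not in bn_keys:
                local_state[name] = avg_state[name]
        m.load_state_dict(local_state)
    return avg_state
```

In a full training loop, each client would run its local epochs, the server would call `fedbn_aggregate` at every communication round, and clients would continue training from the partially updated weights.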

Experimental Evaluation

The authors present extensive experiments on benchmark digit datasets (SVHN, USPS, and MNIST variants) as well as on real-world datasets such as Office-Caltech10 and DomainNet. These evaluations show that FedBN consistently outperforms FedAvg and FedProx, especially under pronounced feature shift. Importantly, it introduces no additional hyperparameters to tune and adds negligible computational cost.

Theoretical Insights

The paper offers a theoretical analysis that supports FedBN's empirical results. Using the neural tangent kernel (NTK) framework, it shows that, in a simplified setting, FedBN achieves a faster convergence rate than FedAvg by keeping the local BN parameters instead of averaging them. Retaining BN parameters locally prevents the training trajectory from being degraded by heterogeneous feature distributions, yielding stable convergence.
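
Schematically, NTK-style analyses of this kind bound the training error by an exponential whose rate is governed by the smallest eigenvalue of the kernel Gram matrix; the display below is an illustrative restatement of that standard form (symbols such as $H^{\infty}$ and $\lambda_{\min}$ follow common NTK notation), not the paper's exact theorem.

```latex
% Illustrative NTK-style convergence bound (standard form, not the paper's exact statement):
% u(t) are the network outputs at time t, y the targets, and H^infty the limiting NTK Gram matrix.
\[
  \lVert \mathbf{y} - \mathbf{u}(t) \rVert_2^2
    \;\le\; \exp\!\bigl(-\lambda_{\min}(H^{\infty})\, t\bigr)\,
            \lVert \mathbf{y} - \mathbf{u}(0) \rVert_2^2 .
\]
% FedBN's argument is that keeping BN parameters local yields a larger effective
% \lambda_{\min} than averaging them, and hence a faster contraction of the error.
```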

Implications and Future Directions

The implications of this work are multifaceted. Practically, FedBN can be implemented in various federated learning scenarios without changes to existing communication protocols, providing an immediate improvement in dealing with feature shift non-IID data. Theoretically, the convergence results pave the way for further exploration of optimization strategies specific to non-IID settings.

Future research could explore the integration of FedBN with other optimization and aggregation strategies, or its applicability in privacy-sensitive domains such as healthcare. Additionally, quantifying the privacy benefits of local BN parameters could be an interesting avenue, since these parameters are never shared with the server.

Conclusion

FedBN represents a meaningful advance for federated learning scenarios with feature shift non-IID data. It demonstrates significant improvements over existing methods without requiring complex modifications to the federated learning pipeline. By addressing the under-explored problem of feature distribution skew, FedBN provides a robust solution with broad practical applicability.
