Federated Learning on Non-IID Data with Local Batch Normalization
This paper addresses a significant challenge in federated learning (FL): feature-shift non-IID data, where the input feature distribution varies across clients even when the label distribution does not. Such shifts are common in real-world settings such as medical imaging and autonomous driving. The proposed method, FedBN, builds on FedAvg by keeping batch normalization (BN) parameters local to each client, using them to absorb cross-client feature discrepancies.
Background and Motivation
Federated learning enables multiple edge devices to collaboratively train deep learning models without sharing raw data. However, traditional FL methods such as FedAvg suffer performance degradation when data is non-IID. Prior work has focused primarily on label distribution skew, neglecting feature distribution variations, which can be equally detrimental in practice. This paper identifies that gap and provides a tailored solution for settings where local feature distributions differ across clients.
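For reference, FedAvg's aggregation step takes a data-size-weighted average of the client models after each round of local training, where $n_k$ is client $k$'s sample count and $n = \sum_k n_k$:

$$ w_{t+1} \;=\; \sum_{k=1}^{K} \frac{n_k}{n}\, w_{t+1}^{k} $$

It is exactly this uniform averaging of every parameter, including the BN layers, that FedBN revisits.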
Proposed Method: FedBN
FedBN modifies classical FedAvg by keeping BN layers local: their parameters and running statistics are excluded from the global aggregation, and each participating client updates them independently. Because every client normalizes activations with its own statistics, much of the feature shift is absorbed inside the BN layers, so the shared, averaged weights operate on better-aligned representations. This improves both convergence speed and final accuracy.
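A minimal sketch of the server-side aggregation in a PyTorch-style setting. Identifying BN entries by the substring "bn" in their state-dict keys is a naming convention assumed for illustration, not part of the paper:

```python
import torch

def aggregate_non_bn(client_states, client_weights):
    """Weighted-average all parameters across clients, skipping batch-norm
    entries (weight, bias, running mean/var), which FedBN keeps local.

    client_states:  list of PyTorch state_dicts, one per client
    client_weights: relative data fractions, summing to 1.0
    """
    avg = {}
    for key in client_states[0]:
        if "bn" in key:  # BN parameters and statistics never leave the client
            continue
        avg[key] = sum(w * s[key].float()
                       for w, s in zip(client_weights, client_states))
    return avg
```

Removing the `if` guard recovers plain FedAvg, in which every key, BN included, is averaged.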
Experimental Evaluation
The authors present extensive experiments on benchmark digit datasets (SVHN, USPS, and MNIST variants) as well as on real-world datasets such as Office-Caltech10 and DomainNet. These evaluations show that FedBN consistently outperforms FedAvg and FedProx, especially under pronounced feature shift. Notably, it introduces no extra hyperparameters to tune and adds negligible computational cost.
Theoretical Insights
The paper backs FedBN's empirical results with a theoretical analysis. Using the neural tangent kernel (NTK) framework, it shows that FedBN achieves a faster convergence rate by keeping the local BN parameters instead of averaging them. This keeps the training trajectory from diverging even under heterogeneous feature distributions, yielding stable convergence.
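Schematically, NTK-style arguments bound the training error by an exponential whose rate is set by the smallest eigenvalue of the limiting kernel Gram matrix; the form below is the generic NTK bound rather than the paper's exact statement:

$$ \| u(t) - y \|_2^2 \;\le\; e^{-\lambda_{\min}(H^{\infty})\, t}\, \| u(0) - y \|_2^2 $$

Here $u(t)$ collects the network outputs on the training set, $y$ the targets, and $H^{\infty}$ the limiting NTK Gram matrix. The paper's analysis argues that keeping BN scaling local yields a larger effective $\lambda_{\min}$ than averaging it, hence the faster convergence.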
Implications and Future Directions
The implications of this work are multifaceted. Practically, FedBN can be deployed in a wide range of federated learning scenarios without changes to existing communication protocols, offering an immediate improvement on feature-shift non-IID data (see the round sketch below). Theoretically, the convergence results pave the way for further work on optimization strategies tailored to non-IID settings.
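To make the drop-in nature concrete, here is a hedged sketch of one communication round; `local_train` and the client lists are hypothetical stand-ins, and `aggregate_non_bn` is the helper sketched earlier:

```python
# One FedBN communication round (illustrative; local_train stands in for
# each client's usual SGD epochs on its private data).
for model, loader in zip(client_models, client_loaders):
    local_train(model, loader, epochs=1)

avg = aggregate_non_bn([m.state_dict() for m in client_models],
                       client_weights)

for model in client_models:
    # strict=False: the partial state_dict omits BN keys, so each client's
    # own BN parameters and running statistics survive the update.
    model.load_state_dict(avg, strict=False)
```

Nothing about the message format changes; the server simply exchanges fewer tensors than under FedAvg.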
Future research could explore integrating FedBN with other optimization and aggregation strategies, or its applicability in privacy-sensitive domains such as healthcare. Quantifying the privacy benefits of local BN parameters is another interesting avenue, since these parameters never leave the client and are absent from the global model.
Conclusion
FedBN is a meaningful advance for federated learning in the presence of feature-shift non-IID data. It delivers significant improvements over existing methods without complex modifications to the federated learning pipeline. By addressing the under-explored problem of feature distribution skew, FedBN offers a robust solution with broad practical applicability.