- The paper empirically demonstrates that Non-IID data significantly degrades model accuracy in decentralized learning across various architectures.
- It identifies that batch normalization exacerbates accuracy loss by relying on local rather than global minibatch statistics.
- It proposes SkewScout, a dynamic communication strategy that reduces communication costs by up to 34.1x while preserving model accuracy.
Decentralized Machine Learning and Non-IID Data Challenges
The paper, "The Non-IID Data Quagmire of Decentralized Machine Learning," addresses a critical issue facing decentralized machine learning: the challenge of training models on non-identically independently distributed (Non-IID) data partitions. This paper offers a comprehensive examination of the problems arising from skewed data distributions in decentralized learning environments, such as federated and geo-distributed systems.
Key Findings
- Pervasiveness of Non-IID Data Issues: The authors empirically demonstrate that skewed label distributions cause significant accuracy degradation across various decentralized learning algorithms and applications. The problem affects many models, from classical architectures like AlexNet to deeper ones such as ResNet.
- Batch Normalization Vulnerabilities: DNNs that use batch normalization are especially susceptible to accuracy loss in Non-IID settings, because batch normalization relies on local minibatch statistics that diverge from the global statistics when labels are skewed (see the sketch after this list).
- Impact of Data Skewness: The degree of label skew strongly influences the difficulty of the learning problem; higher skew produces greater model divergence and accuracy loss.
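The batch normalization failure mode is easy to reproduce. Below is a minimal sketch (not from the paper, assuming PyTorch) in which two replicas see partitions with shifted feature distributions, standing in for label skew: each replica's BatchNorm layer accumulates local running statistics that diverge from each other and from the global statistics.

```python
# Minimal illustration (illustrative, not the paper's code): BatchNorm on
# two skewed partitions accumulates divergent local running statistics.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two hypothetical data partitions with different feature distributions,
# standing in for label-skewed partitions (e.g., one node sees mostly one
# class, the other node mostly another).
partition_a = torch.randn(512, 16) + 2.0   # shifted mean
partition_b = torch.randn(512, 16) - 2.0   # opposite shift

bn_a, bn_b = nn.BatchNorm1d(16), nn.BatchNorm1d(16)

# Simulate local training: each replica normalizes with its own minibatches,
# updating its own running statistics.
for batch in partition_a.split(64):
    bn_a(batch)
for batch in partition_b.split(64):
    bn_b(batch)

# The running means the two replicas would carry into model averaging
# disagree badly, and neither matches the global statistics (~0 here).
print("replica A running mean:", bn_a.running_mean[:4])
print("replica B running mean:", bn_b.running_mean[:4])
print("global mean:", torch.cat([partition_a, partition_b]).mean(0)[:4])
```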
Proposed Solution
The paper introduces SkewScout, an approach designed to mitigate the communication cost and accuracy loss associated with Non-IID data. SkewScout dynamically adjusts the communication frequency between decentralized data partitions based on an estimated accuracy loss, balancing communication savings against model quality; a sketch of this control loop follows.
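The following is a hedged sketch of such a control loop. The function names, the linear loss proxy, and the period-doubling knob are illustrative assumptions for exposition, not the authors' implementation.

```python
# Hedged sketch of a SkewScout-style control loop: estimate how much the
# diverging replicas would cost in accuracy, then adjust how often the
# relaxed-consistency algorithm communicates. All names and constants here
# are illustrative assumptions.
import torch

def model_divergence(params_a, params_b):
    """Total L2 distance between two replicas' parameter tensors."""
    return sum((a - b).norm().item() for a, b in zip(params_a, params_b))

def estimate_accuracy_loss(divergence, scale=0.01):
    """Toy proxy: assume accuracy loss grows with parameter divergence.
    (Purely illustrative; the paper estimates loss from the models.)"""
    return scale * divergence

class CommPeriod:
    """Communication period in local steps: smaller = more communication."""
    def __init__(self, period=8, lo=1, hi=64):
        self.period, self.lo, self.hi = period, lo, hi
    def tighten(self):  # communicate more often when accuracy is at risk
        self.period = max(self.lo, self.period // 2)
    def relax(self):    # save bandwidth while model quality holds
        self.period = min(self.hi, self.period * 2)

def control_step(params_a, params_b, knob, loss_budget=0.05):
    est_loss = estimate_accuracy_loss(model_divergence(params_a, params_b))
    knob.tighten() if est_loss > loss_budget else knob.relax()
    return est_loss

# Usage with dummy tensors standing in for two replicas' parameters:
torch.manual_seed(0)
a = [torch.randn(4, 4), torch.randn(4)]
b = [p + 0.1 * torch.randn_like(p) for p in a]
knob = CommPeriod()
loss = control_step(a, b, knob)
print(f"estimated loss {loss:.3f} -> communication period {knob.period}")
```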
Experimental Insights
The paper provides robust experimental evidence supporting its claims. Extensive evaluations across datasets including CIFAR-10 and ImageNet reinforce the generality of the findings. SkewScout is shown to reduce communication by up to 34.1x while retaining accuracy comparable to Bulk Synchronous Parallel (BSP) training.
Theoretical and Practical Implications
The research underscores the necessity of re-evaluating traditional decentralized learning algorithms, which are strained under Non-IID conditions. Practically, the paper suggests pathways to more efficient model training in real-world settings where Non-IID data is prevalent, such as on-device learning on mobile phones and geo-distributed data processing.
Future Prospects
The authors call for further investigation into alternative normalization techniques and hybrid models that can adapt to diverse data scenarios; they show, for instance, that replacing batch normalization with Group Normalization avoids the minibatch-statistics problem (see the sketch below). Exploring strategies such as multi-task learning and clustering data partitions may provide additional ways to address Non-IID data issues.
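Group Normalization is insensitive to partition composition because it computes statistics per sample over channel groups, using no minibatch statistics at all. A minimal sketch (assuming PyTorch) of this property:

```python
# Hedged illustration: GroupNorm normalizes each sample independently, so
# mixing in a skewed batch cannot perturb the output for other samples.
import torch
import torch.nn as nn

torch.manual_seed(0)
gn = nn.GroupNorm(num_groups=4, num_channels=16)

x = torch.randn(8, 16, 4, 4)           # one batch of samples
shifted = torch.cat([x, x + 5.0])      # same samples mixed with a skewed batch

# The first 8 outputs are identical either way: each sample is normalized
# on its own, unlike BatchNorm, which would mix statistics across the batch.
print(torch.allclose(gn(x), gn(shifted)[:8], atol=1e-6))  # True
```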
This paper significantly contributes to a nuanced understanding of the complexities involved in decentralized machine learning and outlines a pragmatic approach to enhancing model robustness across diverse data settings.