Cross-feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data (2310.15890v3)
Abstract: The current state-of-the-art decentralized learning algorithms mostly assume the data distribution to be Independent and Identically Distributed (IID). However, in practical scenarios, the distributed datasets can have significantly heterogeneous data distributions across the agents. In this work, we present a novel approach for decentralized learning on heterogeneous data, where data-free knowledge distillation through contrastive loss on cross-features is utilized to improve performance. Cross-features for a pair of neighboring agents are the features (i.e., last hidden layer activations) obtained from the data of an agent with respect to the model parameters of the other agent. We demonstrate the effectiveness of the proposed technique through an exhaustive set of experiments on various Computer Vision datasets (CIFAR-10, CIFAR-100, Fashion MNIST, Imagenette, and ImageNet), model architectures, and network topologies. Our experiments show that the proposed method achieves superior performance (0.2-4% improvement in test accuracy) compared to other existing techniques for decentralized learning on heterogeneous data.
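The core idea can be illustrated with a short sketch. Below is a minimal, illustrative implementation (not the authors' released code) of a cross-feature contrastive term for a single agent: the `features()` accessor, the NT-Xent-style formulation, and the temperature value are assumptions introduced here for clarity.

```python
# Minimal sketch of a cross-feature contrastive term for one agent.
# Assumptions (not from the paper's code): models expose a `features()`
# method returning last-hidden-layer activations, and an NT-Xent-style
# loss with an illustrative temperature is used.
import torch
import torch.nn.functional as F

def cross_feature_contrastive_loss(local_model, neighbor_model, x, temperature=0.5):
    """Contrast an agent's own features with cross-features on the same batch.

    Cross-features are activations of the agent's local data `x` under the
    neighbor's model parameters, so no raw data is exchanged (data-free).
    """
    z_local = local_model.features(x)          # (B, D) features under own model
    with torch.no_grad():
        z_cross = neighbor_model.features(x)   # (B, D) cross-features

    z_local = F.normalize(z_local, dim=1)
    z_cross = F.normalize(z_cross, dim=1)

    # Pairwise similarity between local features and cross-features.
    logits = z_local @ z_cross.t() / temperature      # (B, B)

    # Positive pairs sit on the diagonal: the same sample seen by both models.
    targets = torch.arange(x.size(0), device=x.device)
    return F.cross_entropy(logits, targets)
```

In a decentralized training loop, such a term would typically be added to each agent's local task loss for every neighbor whose parameters it receives, alongside the usual gossip-averaging step; the exact weighting and loss formulation used in the paper may differ from this sketch.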
- Distributed delayed stochastic optimization. Advances in Neural Information Processing Systems, 24, 2011.
- Global update tracking: A decentralized learning algorithm for heterogeneous data. arXiv preprint arXiv:2305.04792, 2023.
- Neighborhood gradient clustering: An efficient decentralized learning method for non-IID data distributions. arXiv preprint arXiv:2209.14390, 2022.
- Stochastic gradient push for distributed deep learning. In International Conference on Machine Learning, pages 344–353. PMLR, 2019.
- Decentralized deep learning using momentum-accelerated consensus. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3675–3679. IEEE, 2021.
- Scalable collaborative learning via representation sharing. arXiv preprint arXiv:2211.10943, 2022.
- Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pages 177–186. Springer, 2010.
- Large scale distributed deep networks. Advances in Neural Information Processing Systems, 25, 2012.
- ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
- Cross-gradient aggregation for decentralized learning from non-IID data. In International Conference on Machine Learning, pages 3036–3046. PMLR, 2021.
- Applications of deep learning in intelligent transportation systems. Journal of Big Data Analytics in Transportation, 2(2):115–145, 2020.
- Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
- The non-IID data quagmire of decentralized machine learning. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 4387–4398. PMLR, 13–18 Jul 2020.
- Hamel Husain. Imagenette: A subset of 10 easily classified classes from the ImageNet dataset. https://github.com/fastai/imagenette, 2018.
- An improved analysis of gradient tracking for decentralized machine learning. Advances in Neural Information Processing Systems, 34:11422–11435, 2021.
- A unified theory of decentralized SGD with changing topology and local updates. In International Conference on Machine Learning, pages 5381–5393. PMLR, 2020.
- Federated optimization: Distributed machine learning for on-device intelligence. 2016.
- CIFAR (Canadian Institute for Advanced Research). http://www.cs.toronto.edu/~kriz/cifar.html, 2014.
- Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- Decentralized federated learning via mutual knowledge transfer. IEEE Internet of Things Journal, 9(2):1136–1147, 2021.
- FedMD: Heterogenous federated learning via model distillation. arXiv preprint arXiv:1910.03581, 2019.
- Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. Advances in Neural Information Processing Systems, 30, 2017.
- Quasi-global momentum: Accelerating decentralized deep learning on heterogeneous data. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 6654–6665. PMLR, 18–24 Jul 2021.
- Ensemble distillation for robust model fusion in federated learning. Advances in Neural Information Processing Systems, 33:2351–2363, 2020.
- Evolving normalization-activation layers. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 13539–13550. Curran Associates, Inc., 2020.
- Angelia Nedic. Distributed gradient methods for convex machine learning problems in networks: Distributed optimization. IEEE Signal Processing Magazine, 37(3):92–101, 2020.
- Homogenizing non-IID datasets via in-distribution knowledge distillation for decentralized learning. arXiv preprint arXiv:2304.04326, 2023.
- MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018.
- Momentum tracking: Momentum acceleration for decentralized deep learning on heterogeneous data. arXiv preprint arXiv:2209.15505, 2022.
- RelaySum for decentralized deep learning on heterogeneous data. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 28004–28015. Curran Associates, Inc., 2021.
- Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
- Fast linear iterations for distributed averaging. Systems & Control Letters, 53(1):65–78, 2004.
- A BP-like distributed algorithm for weighted average consensus. In 2019 12th Asian Control Conference (ASCC), pages 728–733. IEEE, 2019.
- Data-free knowledge distillation for heterogeneous federated learning. In International Conference on Machine Learning, pages 12878–12889. PMLR, 2021.