Averaging Rate Scheduler for Decentralized Learning on Heterogeneous Data (2403.03292v1)

Published 5 Mar 2024 in cs.LG and cs.DC

Abstract: State-of-the-art decentralized learning algorithms typically require the data distribution to be Independent and Identically Distributed (IID). However, in practical scenarios, the data distribution across the agents can have significant heterogeneity. In this work, we propose averaging rate scheduling as a simple yet effective way to reduce the impact of heterogeneity in decentralized learning. Our experiments illustrate the superiority of the proposed method (~3% improvement in test accuracy) compared to the conventional approach of employing a constant averaging rate.
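To make the idea concrete, below is a minimal sketch of decentralized SGD with gossip averaging in which the averaging rate gamma is taken from a schedule rather than held constant. Everything here is illustrative: the ring topology, the helper names (`ring_mixing_matrix`, `averaging_rate`, `decentralized_sgd`), and the linear schedule with `gamma_min`/`gamma_max` are assumptions for demonstration only and are not the paper's actual schedule or experimental setup.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring topology:
    each agent averages with itself and its two neighbours."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n] = 1 / 3
        W[i, (i + 1) % n] = 1 / 3
    return W

def averaging_rate(step, total_steps, gamma_min=0.1, gamma_max=1.0):
    """Hypothetical linear schedule (an assumption, not the paper's schedule):
    interpolates the averaging rate from gamma_min to gamma_max over training."""
    return gamma_min + (gamma_max - gamma_min) * step / max(total_steps - 1, 1)

def decentralized_sgd(grads, x0, lr=0.1, total_steps=100):
    """Simulated decentralized SGD with gossip averaging.
    `grads[i](x)` returns agent i's (heterogeneous) stochastic gradient."""
    n = len(grads)
    W = ring_mixing_matrix(n)
    X = np.tile(x0, (n, 1)).astype(float)  # one parameter row per agent
    for t in range(total_steps):
        # Local SGD step on each agent's own (non-IID) data.
        G = np.stack([grads[i](X[i]) for i in range(n)])
        X = X - lr * G
        # Gossip averaging with a scheduled rate gamma_t:
        #   x_i <- x_i + gamma_t * sum_j W_ij (x_j - x_i)
        gamma = averaging_rate(t, total_steps)
        X = X + gamma * (W @ X - X)
    return X

if __name__ == "__main__":
    # Toy heterogeneous setup: agent i's quadratic loss pulls toward target_i.
    rng = np.random.default_rng(0)
    targets = rng.normal(size=(4, 2)) * 5.0
    grads = [lambda x, m=m: x - m for m in targets]
    X = decentralized_sgd(grads, x0=np.zeros(2))
    print("final agent parameters:\n", X)
    print("consensus mean:", X.mean(axis=0))
```

Setting `gamma_min == gamma_max` recovers the conventional constant-averaging-rate baseline the abstract compares against, so the schedule is the only moving part in this sketch.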

References (20)
  1. Global update tracking: A decentralized learning algorithm for heterogeneous data. In Advances in Neural Information Processing Systems, 2023a.
  2. Neighborhood gradient mean: An efficient decentralized learning method for non-iid data. Transactions on Machine Learning Research, 2023b. ISSN 2835-8856.
  3. Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pp.  177–186. Springer, 2010.
  4. Data-heterogeneity-aware mixing for decentralized learning. arXiv preprint arXiv:2204.06477, 2022.
  5. Cross-gradient aggregation for decentralized learning from non-iid data. In International Conference on Machine Learning, pp.  3036–3046. PMLR, 2021.
  6. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  770–778, 2016.
  7. Hamel Husain. Imagenette - a subset of 10 easily classified classes from the imagenet dataset. https://github.com/fastai/imagenette, 2018.
  8. Decentralized deep learning with arbitrary communication compression. arXiv preprint arXiv:1907.09356, 2019.
  9. An improved analysis of gradient tracking for decentralized machine learning. Advances in Neural Information Processing Systems, 34:11422–11435, 2021.
  10. Cifar (canadian institute for advanced research). http://www.cs.toronto.edu/ kriz/cifar.html, 2014.
  11. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  12. Can decentralized algorithms outperform centralized algorithms? a case study for decentralized parallel stochastic gradient descent. Advances in Neural Information Processing Systems, 30, 2017.
  13. Quasi-global momentum: Accelerating decentralized deep learning on heterogeneous data. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp.  6654–6665. PMLR, 18–24 Jul 2021.
  14. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  4510–4520, 2018.
  15. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  16. Momentum tracking: Momentum acceleration for decentralized deep learning on heterogeneous data. arXiv preprint arXiv:2209.15505, 2022.
  17. Deepsqueeze: Decentralization meets error-compensated compression. arXiv preprint arXiv:1907.07346, 2019.
  18. Relaysum for decentralized deep learning on heterogeneous data. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (eds.), Advances in Neural Information Processing Systems, volume 34, pp.  28004–28015. Curran Associates, Inc., 2021.
  19. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
  20. Fast linear iterations for distributed averaging. Systems & Control Letters, 53(1):65–78, 2004.
