Asynchronous Local Computations in Distributed Bayesian Learning (2311.03496v2)

Published 6 Nov 2023 in cs.LG, cs.DC, and cs.MA

Abstract: As ML expands into sensor networking, cooperative robotics, and many other multi-agent systems, the distributed deployment of inference algorithms has received considerable attention. These algorithms involve collaboratively learning unknown parameters from dispersed data collected by multiple agents. Two aspects compete in such algorithms: intra-agent computation and inter-agent communication. Traditionally, algorithms are designed to perform both synchronously. However, certain settings call for frugal use of communication channels, as they may be unreliable, time-consuming, or resource-expensive. In this paper, we propose gossip-based asynchronous communication to leverage fast computations and reduce communication overhead simultaneously. We analyze the effects of multiple (local) intra-agent computations performed by the active agents between successive inter-agent communications. For the local computations, Bayesian sampling via the unadjusted Langevin algorithm (ULA), an MCMC method, is used. Communication is assumed to take place over a connected graph (e.g., as in decentralized learning); however, the results can be extended to coordinated communication with a central server (e.g., federated learning). We theoretically quantify the convergence rates of the resulting process. To demonstrate the efficacy of the proposed algorithm, we present simulations on a toy problem as well as on real-world data sets, training ML models to perform classification tasks. We observe faster initial convergence and improved accuracy, especially in the low-data regime. We achieve, on average, 78% and over 90% classification accuracy on the Gamma Telescope and mHealth data sets from the UCI ML repository, respectively.
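
The abstract sketches the core loop: a pair of agents wakes up, gossips over an edge of the communication graph, and each active agent then runs several local unadjusted Langevin algorithm (ULA) steps on its own data before the next exchange. Below is a minimal Python sketch of that idea under stated assumptions; it is not the authors' implementation. The pairwise-averaging gossip rule, the ring topology, the quadratic toy potentials, the constant step size, and the number of local steps are illustrative choices, not details taken from the paper.

import numpy as np

# Minimal sketch: asynchronous gossip with multiple local ULA steps per activation.
# All modelling choices below (ring graph, pairwise averaging, quadratic local
# potentials, fixed step size) are assumptions for illustration only.

rng = np.random.default_rng(0)

n_agents, dim = 10, 2
step_size = 1e-2          # ULA step size (assumed constant)
local_steps = 5           # local ULA iterations per activation (assumed)
n_rounds = 2000

# Each agent i holds a toy local potential U_i(x) = 0.5 * ||x - mu_i||^2.
mu = rng.normal(size=(n_agents, dim))

def grad_U(i, x):
    """Gradient of agent i's local negative log-likelihood (toy quadratic)."""
    return x - mu[i]

# Ring communication graph: agent i can gossip with agent (i + 1) mod n.
edges = [(i, (i + 1) % n_agents) for i in range(n_agents)]

theta = rng.normal(size=(n_agents, dim))   # each agent's current sample

for _ in range(n_rounds):
    # Asynchronous gossip: one random edge wakes up per round.
    i, j = edges[rng.integers(len(edges))]

    # The two active agents average their states (pairwise gossip rule, assumed).
    avg = 0.5 * (theta[i] + theta[j])
    theta[i] = theta[j] = avg

    # Each active agent then performs several local ULA updates on its own data:
    # theta <- theta - eta * grad_U(theta) + sqrt(2 * eta) * N(0, I).
    for k in (i, j):
        for _ in range(local_steps):
            noise = rng.normal(size=dim)
            theta[k] = (theta[k]
                        - step_size * grad_U(k, theta[k])
                        + np.sqrt(2.0 * step_size) * noise)

# Sanity check: agents should hover around a posterior centred near mu.mean(axis=0).
print("target mean :", mu.mean(axis=0))
print("agent means :", theta.mean(axis=0))

The local update is the standard ULA step; what the paper analyzes is the effect of taking multiple such local steps between successive gossip communications and the convergence rates that result.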

