Learned Finite-Time Consensus for Distributed Optimization (2404.07018v2)
Abstract: Most algorithms for decentralized learning employ a consensus or diffusion mechanism to drive agents to a common solution of a global optimization problem. Generally, this takes the form of linear averaging, contracting at a rate determined by the mixing rate of the underlying network topology. For very sparse graphs, this can create a bottleneck that slows the convergence of the learning algorithm. We show that a sequence of matrices achieving finite-time consensus can be learned for unknown graph topologies in a decentralized manner by solving a constrained matrix factorization problem. We demonstrate numerically the benefit of the resulting scheme on both structured and unstructured graphs.
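To make the finite-time consensus idea concrete, the sketch below shows a classical hand-constructed example rather than the learned factorization from the paper: on a hypercube of n = 4 agents, two sparse averaging matrices W1 and W2, each mixing along one hypercube dimension, multiply out to the exact averaging matrix (1/n)·11ᵀ, so consensus is reached in exactly log₂(n) = 2 communication rounds. All names here are illustrative assumptions.

```python
import numpy as np

# Finite-time consensus on a 2-D hypercube of n = 4 agents
# (illustrative hand-built example, not the paper's learned factorization).
# W1 averages each agent with its neighbor across dimension 0;
# W2 averages across dimension 1. Their product is the exact averaging
# matrix (1/n) * ones, so two rounds suffice.

n = 4
W1 = np.zeros((n, n))
for i in range(n):
    j = i ^ 1            # neighbor across hypercube dimension 0
    W1[i, i] = W1[i, j] = 0.5

W2 = np.zeros((n, n))
for i in range(n):
    j = i ^ 2            # neighbor across hypercube dimension 1
    W2[i, i] = W2[i, j] = 0.5

x = np.array([1.0, 3.0, 5.0, 7.0])  # agents' initial values
x = W2 @ (W1 @ x)                   # two rounds of local averaging
print(x)                            # every agent now holds the mean 4.0
```

The paper's contribution is to recover such a factorized sequence for an *unknown* topology by solving a constrained matrix factorization of (1/n)·11ᵀ, with each factor constrained to the sparsity pattern of the graph, rather than relying on a known structure like the hypercube above.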