Compressed Gradient Tracking for Decentralized Optimization Over General Directed Networks (2106.07243v4)
Abstract: In this paper, we propose two communication-efficient decentralized optimization algorithms over a general directed multi-agent network. The first algorithm, termed Compressed Push-Pull (CPP), combines the gradient tracking Push-Pull method with communication compression. We show that CPP is applicable to a general class of unbiased compression operators and achieves a linear convergence rate for strongly convex and smooth objective functions. The second algorithm is a broadcast-like version of CPP (B-CPP), which also achieves a linear convergence rate under the same conditions on the objective functions. B-CPP can be applied in an asynchronous broadcast setting and further reduces communication costs compared to CPP. Numerical experiments complement the theoretical analysis and confirm the effectiveness of the proposed methods.
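To make the phrase "unbiased compression operators" concrete, the sketch below shows one standard member of that class: rand-k sparsification, which keeps k randomly chosen coordinates of a d-dimensional vector and rescales them by d/k so that the compressed message equals the input in expectation. This is only an illustrative example of such an operator, not the specific compressor or implementation used in the paper.

```python
import numpy as np

def rand_k(x, k, rng=np.random.default_rng()):
    """Unbiased rand-k sparsification (illustrative example, not the paper's code).

    Keeps k uniformly random coordinates of x and scales them by d/k,
    so that E[rand_k(x)] = x: each coordinate is kept with probability k/d
    and carries value (d/k) * x_i when kept.
    """
    d = x.size
    idx = rng.choice(d, size=k, replace=False)  # indices to transmit
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)                 # rescale for unbiasedness
    return out

# Quick empirical check of unbiasedness on a small vector.
x = np.arange(5, dtype=float)
avg = np.mean([rand_k(x, 2) for _ in range(20000)], axis=0)
print(avg)  # approximately [0, 1, 2, 3, 4], i.e. close to x
```

In a decentralized setting, an agent would transmit only the k selected indices and values instead of the full vector, which is where the communication savings of compression-based methods such as CPP and B-CPP come from.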