Robust Decentralized Learning with Local Updates and Gradient Tracking (2405.00965v1)
Abstract: As distributed learning applications such as Federated Learning, the Internet of Things (IoT), and Edge Computing grow, it is critical to address their shortcomings from a theoretical perspective. As an abstraction, we consider decentralized learning over a network of communicating clients or nodes and tackle two major challenges: data heterogeneity and adversarial robustness. We propose a decentralized minimax optimization method built on two key modules: local updates and gradient tracking. Minimax optimization is the key tool enabling adversarial training, which ensures robustness. Local updates are essential in Federated Learning (FL) applications to mitigate the communication bottleneck, and gradient tracking is essential for proving convergence under data heterogeneity. We analyze the performance of the proposed algorithm, Dec-FedTrack, in the nonconvex-strongly-concave minimax setting and prove that it converges to a stationary point. We also conduct numerical experiments to support our theoretical findings.
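The abstract names the algorithm's two modules, local updates and gradient tracking, without spelling out how they fit together. Below is a minimal, illustrative sketch, not the paper's Dec-FedTrack pseudocode: decentralized gradient descent ascent in which each node takes several local descent/ascent steps along tracked directions, corrects its trackers by the change in its own local gradients, and periodically gossips both iterates and trackers over a ring. The quadratic local objectives, ring mixing matrix, step sizes, and number of local steps are all assumptions made for the example.

```python
# Illustrative sketch of decentralized gradient descent ascent with local updates
# and gradient tracking (NOT the authors' Dec-FedTrack implementation).
# Local objectives f_i(x, y) = 0.5*a_i*x^2 + b_i*x*y - 0.5*c_i*y^2 with c_i > 0,
# so each f_i is strongly concave in y; coefficients differ across nodes to
# mimic data heterogeneity.
import numpy as np

rng = np.random.default_rng(0)
n = 8                                  # number of nodes
a = rng.uniform(0.5, 1.5, n)           # heterogeneous curvature in x
b = rng.uniform(-1.0, 1.0, n)          # coupling between x and y
c = rng.uniform(1.0, 2.0, n)           # strong concavity in y

def grads(i, x, y):
    """Local gradients of f_i with respect to x and y at (x, y)."""
    return a[i] * x + b[i] * y, b[i] * x - c[i] * y

# Doubly stochastic mixing matrix of a ring graph: each node averages
# with its two neighbors at every communication round.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

eta_x, eta_y = 0.05, 0.05              # primal/dual step sizes (assumed, untuned)
tau = 5                                # local updates per communication round
rounds = 200

x = rng.standard_normal(n)             # per-node primal iterates
y = rng.standard_normal(n)             # per-node dual iterates
# Gradient trackers, initialized at the local gradients.
gx = np.array([grads(i, x[i], y[i])[0] for i in range(n)])
gy = np.array([grads(i, x[i], y[i])[1] for i in range(n)])

for r in range(rounds):
    for _ in range(tau):               # local updates: no communication here
        for i in range(n):
            old_gx, old_gy = grads(i, x[i], y[i])
            x[i] -= eta_x * gx[i]      # descent on x along the tracked direction
            y[i] += eta_y * gy[i]      # ascent on y along the tracked direction
            new_gx, new_gy = grads(i, x[i], y[i])
            gx[i] += new_gx - old_gx   # gradient-tracking correction
            gy[i] += new_gy - old_gy
    # Communication round: gossip-average both the iterates and the trackers.
    x, y = W @ x, W @ y
    gx, gy = W @ gx, W @ gy

print("consensus error in x:", np.std(x))
print("average primal iterate:", x.mean())
```

The tracking correction (adding the change in each node's own gradient to its tracker) is what keeps the local update direction close to the network-average gradient despite heterogeneous local objectives; without it, repeated local steps drift toward each node's own stationary point rather than the global one.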