Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity (2405.16126v1)
Abstract: This paper considers distributed convex-concave minimax optimization under the second-order similarity. We propose the stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes advantage of the finite-sum structure of the objective via mini-batch client sampling and variance reduction. We prove that SVOGS achieves an $\varepsilon$-duality gap within ${\mathcal O}(\delta D^2/\varepsilon)$ communication rounds, ${\mathcal O}(n+\sqrt{n}\delta D^2/\varepsilon)$ communication complexity, and $\tilde{\mathcal O}(n+(\sqrt{n}\delta+L)D^2/\varepsilon\log(1/\varepsilon))$ local gradient calls, where $n$ is the number of nodes, $\delta$ is the degree of the second-order similarity, $L$ is the smoothness parameter, and $D$ is the diameter of the constraint set. We verify that all of the above complexities (nearly) match the corresponding lower bounds. For the specific $\mu$-strongly-convex-$\mu$-strongly-concave case, our algorithm has upper bounds on communication rounds, communication complexity, and local gradient calls of ${\mathcal O}(\delta/\mu\log(1/\varepsilon))$, ${\mathcal O}((n+\sqrt{n}\delta/\mu)\log(1/\varepsilon))$, and $\tilde{\mathcal O}((n+(\sqrt{n}\delta+L)/\mu)\log(1/\varepsilon))$, respectively, which are also nearly tight. Furthermore, we conduct numerical experiments to demonstrate the empirical advantages of the proposed method.
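The abstract names three algorithmic ingredients: an optimistic/extragradient-type update, mini-batch client sampling, and variance reduction against a periodically refreshed full snapshot. The sketch below is a minimal illustration of how these ingredients can combine on a finite-sum saddle-point operator; it is not the paper's SVOGS method (which additionally uses gradient sliding, i.e., an inexactly solved inner subproblem), and the function names, step size, batch size, snapshot schedule, and toy bilinear problem are all assumptions made for illustration.

```python
import numpy as np


def full_operator(ops, z):
    """Average of all local saddle-point operators F_i(z) = (grad_x f_i, -grad_y f_i)."""
    return np.mean([op(z) for op in ops], axis=0)


def vr_extra_step(ops, z, z_snap, g_snap, batch, eta):
    """One variance-reduced extragradient-style update with mini-batch client sampling.

    Illustrative only: this is not the paper's exact SVOGS update.
    """
    # variance-reduced operator estimate at z built from the sampled clients
    g = g_snap + np.mean([ops[i](z) - ops[i](z_snap) for i in batch], axis=0)
    z_half = z - eta * g  # look-ahead (extrapolation) step
    g_half = g_snap + np.mean([ops[i](z_half) - ops[i](z_snap) for i in batch], axis=0)
    return z - eta * g_half  # corrected step taken from the original point


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, batch_size = 3, 10, 3
    # toy finite-sum bilinear saddle point: min_x max_y (1/n) sum_i x^T A_i y
    mats = [rng.standard_normal((d, d)) for _ in range(n)]
    ops = [lambda z, A=A: np.concatenate([A @ z[d:], -A.T @ z[:d]]) for A in mats]

    z = rng.standard_normal(2 * d)
    z_snap, g_snap = z.copy(), full_operator(ops, z)  # full snapshot = one full synchronization
    for t in range(200):
        batch = rng.choice(n, size=batch_size, replace=False)
        z = vr_extra_step(ops, z, z_snap, g_snap, batch, eta=0.1)
        if (t + 1) % 20 == 0:  # refresh the snapshot occasionally
            z_snap, g_snap = z.copy(), full_operator(ops, z)
    print("operator norm at final iterate:", np.linalg.norm(full_operator(ops, z)))
```

In this sketch, refreshing the snapshot plays the role of contacting all $n$ clients, while each iteration only touches the sampled mini-batch; balancing snapshot frequency against batch size is what typically produces the $\sqrt{n}$ factors in finite-sum communication complexities such as those stated in the abstract.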