
Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity (2405.16126v1)

Published 25 May 2024 in math.OC and cs.LG

Abstract: This paper considers distributed convex-concave minimax optimization under second-order similarity. We propose the stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes advantage of the finite-sum structure of the objective by combining mini-batch client sampling with variance reduction. We prove that SVOGS achieves an $\varepsilon$-duality gap within ${\mathcal O}(\delta D^2/\varepsilon)$ communication rounds, ${\mathcal O}(n+\sqrt{n}\delta D^2/\varepsilon)$ communication complexity, and $\tilde{\mathcal O}(n+(\sqrt{n}\delta+L)D^2/\varepsilon\log(1/\varepsilon))$ local gradient calls, where $n$ is the number of nodes, $\delta$ is the degree of the second-order similarity, $L$ is the smoothness parameter, and $D$ is the diameter of the constraint set. All of these complexities (nearly) match the corresponding lower bounds. For the specific $\mu$-strongly-convex-$\mu$-strongly-concave case, our algorithm has upper bounds on communication rounds, communication complexity, and local gradient calls of ${\mathcal O}(\delta/\mu\log(1/\varepsilon))$, ${\mathcal O}((n+\sqrt{n}\delta/\mu)\log(1/\varepsilon))$, and $\tilde{\mathcal O}((n+(\sqrt{n}\delta+L)/\mu)\log(1/\varepsilon))$ respectively, which are also nearly tight. Furthermore, we conduct numerical experiments to demonstrate the empirical advantages of the proposed method.
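
The abstract names three algorithmic ingredients of SVOGS: mini-batch client sampling, variance reduction, and an optimistic (gradient-sliding) update. As a rough illustration of how such pieces typically fit together, the sketch below runs a variance-reduced optimistic-gradient loop on an assumed toy bilinear saddle problem f_i(x, y) = x^T A_i y. It is not the paper's SVOGS algorithm: the anchor schedule, step size eta, batch size, and the helper names (F_i, F_full, vr_optimistic_step) are all illustrative assumptions, while SVOGS's gradient-sliding inner solver and parameter choices follow the paper's analysis.

import numpy as np

# Illustrative sketch only, NOT the paper's SVOGS method.
rng = np.random.default_rng(0)
n, d = 32, 10                                          # number of clients, problem dimension
A = [rng.standard_normal((d, d)) for _ in range(n)]    # toy bilinear local data (assumed)

def F_i(i, z):
    # Monotone saddle operator of client i for f_i(x, y) = x^T A_i y, with z = (x, y).
    x, y = z[:d], z[d:]
    return np.concatenate([A[i] @ y, -A[i].T @ x])

def F_full(z):
    # Full operator F(z) = (1/n) sum_i F_i(z); one evaluation stands in for a full sync.
    return sum(F_i(i, z) for i in range(n)) / n

def vr_optimistic_step(z, z_prev, w, F_w, eta=0.05, batch=6):
    # One variance-reduced optimistic update around the anchor point w:
    # sampled operators at z (and z_prev) are recentred by F(w), an SVRG-style
    # control variate, and combined in the optimistic form 2*g_k - g_{k-1}.
    S = rng.choice(n, size=batch, replace=False)       # mini-batch client sampling
    g      = F_w + sum(F_i(i, z)      - F_i(i, w) for i in S) / batch
    g_prev = F_w + sum(F_i(i, z_prev) - F_i(i, w) for i in S) / batch
    return z - eta * (2.0 * g - g_prev)

z = rng.standard_normal(2 * d)
z_prev = z.copy()
for epoch in range(20):
    w, F_w = z.copy(), F_full(z)        # communication round: full operator at the anchor
    for _ in range(5):                  # cheap local steps using only sampled clients
        z, z_prev = vr_optimistic_step(z, z_prev, w, F_w), z
    print(epoch, np.linalg.norm(F_full(z)))   # monitor the operator residual ||F(z)||

In this sketch, each evaluation of F_full plays the role of a communication round, while the sampled inner steps stand in for the cheap local work that the complexity bounds above account for separately.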

