Spurious Stationarity and Hardness Results for Mirror Descent (2404.08073v1)
Abstract: Despite the considerable success of Bregman proximal-type algorithms, such as mirror descent, in machine learning, a critical question remains: Can existing stationarity measures, often based on Bregman divergence, reliably distinguish between stationary and non-stationary points? In this paper, we present a groundbreaking finding: all existing stationarity measures necessarily admit spurious stationary points. We further establish an algorithm-independent hardness result: Bregman proximal-type algorithms are unable to escape a spurious stationary point in finitely many steps when the initial point is unfavorable, even for convex problems. Our hardness result highlights an inherent distinction between Euclidean and Bregman geometries, and poses fundamental theoretical and numerical challenges to both the machine learning and optimization communities.
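The boundary behavior behind this hardness result can be illustrated with a minimal sketch (not the paper's construction): mirror descent on the probability simplex with the negative-entropy mirror map, i.e., the multiplicative-weights update. The objective, step size, and starting points below are illustrative assumptions; the sketch only shows that an unfavorable initialization on the boundary pins the iterates to a face of the simplex even for a convex problem, while Bregman-divergence-based progress measures still vanish.

```python
import numpy as np

def entropic_mirror_descent(grad, x0, step=0.1, iters=200):
    """Mirror descent on the simplex with the negative-entropy mirror map
    (multiplicative-weights update). A minimal sketch for illustration,
    not the algorithm or example analyzed in the paper."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # Multiplicative update: any coordinate that starts at 0 stays 0
        # forever, so a boundary initialization confines the iterates to a
        # face of the simplex regardless of the objective.
        x = x * np.exp(-step * grad(x))
        x = x / x.sum()
    return x

# Hypothetical convex objective f(x) = 0.5 * ||x - c||^2 whose constrained
# minimizer is c itself (c lies on the simplex).
c = np.array([0.8, 0.1, 0.1])
grad = lambda x: x - c

print(entropic_mirror_descent(grad, [1/3, 1/3, 1/3]))  # interior start: approaches c
print(entropic_mirror_descent(grad, [0.0, 0.5, 0.5]))  # boundary start: x_1 stays 0
```

From the unfavorable start, consecutive iterates stop moving (so Bregman-based stationarity measures go to zero) while the limit is not a minimizer of the constrained problem, which is the kind of spurious stationarity the abstract refers to.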