Continuized Nesterov Momentum Achieves the $O(\varepsilon^{-7/4})$ Complexity without Additional Mechanisms
Abstract: For first-order optimization of non-convex functions with Lipschitz continuous gradient and Hessian, the best known complexity for reaching an $\varepsilon$-approximation of a stationary point is $O(\varepsilon{-7/4})$. Existing algorithms achieving this bound are based on momentum, but are always complemented with safeguard mechanisms, such as restarts or negative-curvature exploitation steps. Whether such mechanisms are fundamentally necessary has remained an open question. Leveraging the continuized method, we show that a Nesterov momentum algorithm with stochastic parameters alone achieves the same complexity in expectation. This result holds up to a multiplicative stochastic factor with unit expectation and a restriction to a subset of the realizations, both of which are independent of the objective function. We empirically verify that these constitute mild limitations.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.