
A Variable Sample-size Stochastic Quasi-Newton Method for Smooth and Nonsmooth Stochastic Convex Optimization (1804.05368v5)

Published 15 Apr 2018 in math.OC

Abstract: Classical theory for quasi-Newton schemes has focused on smooth deterministic unconstrained optimization while recent forays into stochastic convex optimization have largely resided in smooth, unconstrained, and strongly convex regimes. Naturally, there is a compelling need to address nonsmoothness, the lack of strong convexity, and the presence of constraints. Accordingly, this paper presents a quasi-Newton framework that can process merely convex and possibly nonsmooth (but smoothable) stochastic convex problems. We propose a framework that combines iterative smoothing and regularization with a variance-reduced scheme reliant on using increasing sample-sizes of gradients. We make the following contributions. (i) We develop a regularized and smoothed variable sample-size BFGS update (rsL-BFGS) that generates a sequence of Hessian approximations and can accommodate nonsmooth convex objectives by utilizing iterative regularization and smoothing. (ii) In strongly convex regimes with state-dependent noise, the proposed variable sample-size stochastic quasi-Newton scheme admits a non-asymptotic linear rate of convergence while the oracle complexity of computing an $\epsilon$-solution is $\mathcal{O}(\kappa^{m+1}/\epsilon)$ where $\kappa$ is the condition number and $m\geq 1$. In nonsmooth (but smoothable) regimes, using Moreau smoothing retains the linear convergence rate. To contend with the possible unavailability of Lipschitzian and strong convexity parameters, we also provide sublinear rates. (iii) In merely convex but smooth settings, the regularized VS-SQN scheme rVS-SQN displays a rate of $\mathcal{O}(1/k^{1-\varepsilon})$. When the smoothness requirements are weakened, the rate for the regularized and smoothed VS-SQN scheme worsens to $\mathcal{O}(k^{-1/3})$. Such statements allow for a state-dependent noise assumption under a quadratic growth property.
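The two main ingredients the abstract combines, Moreau smoothing of a nonsmooth convex term and increasing gradient sample sizes for variance reduction, can be illustrated on a toy 1-D problem. The sketch below is not the paper's rsL-BFGS scheme (it uses a plain smoothed-gradient step rather than a quasi-Newton update), and the problem instance, step size, sample-size schedule `Nk = 2*k*k`, and smoothing decay are all illustrative choices; it only shows how a Moreau (Huber) envelope smooths $|x|$ and how growing batch sizes damp gradient noise.

```python
import numpy as np

def moreau_abs(x, eta):
    """Moreau envelope of |x| with parameter eta > 0 (the Huber function)."""
    return np.where(np.abs(x) <= eta, x**2 / (2 * eta), np.abs(x) - eta / 2)

def moreau_abs_grad(x, eta):
    """Gradient of the Moreau envelope of |x|: x/eta clipped to [-1, 1]."""
    return np.clip(x / eta, -1.0, 1.0)

rng = np.random.default_rng(0)

# Toy stochastic problem: minimize E[(x - xi)^2]/2 + lam*|x| with xi ~ N(1, 0.5^2).
# Its minimizer is the soft-threshold of E[xi] = 1 at level lam, i.e. 1 - lam.
lam = 0.3
x = 0.0
eta = 1.0    # smoothing parameter, driven toward 0 across iterations
step = 0.5
for k in range(1, 60):
    Nk = 2 * k * k                       # increasing sample size (variance reduction)
    xi = rng.normal(1.0, 0.5, size=Nk)   # Nk gradient samples at iterate x
    grad = np.mean(x - xi) + lam * moreau_abs_grad(x, eta)
    x -= step * grad
    eta = max(eta * 0.9, 1e-3)           # shrink the smoothing parameter
```

As `Nk` grows, the sample-average gradient concentrates around the true gradient, so a constant step size can be retained; the paper's schemes additionally precondition this step with the rsL-BFGS Hessian approximation.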
