Asynchronous Variance-reduced Block Schemes for Composite Nonconvex Stochastic Optimization: Block-specific Steplengths and Adapted Batch-sizes (1808.02543v4)
Abstract: We consider the minimization of the sum of an expectation-valued, coordinate-wise $L_i$-smooth nonconvex function and a nonsmooth block-separable convex regularizer. We propose an asynchronous variance-reduced algorithm in which, at each iteration, a single block is randomly chosen and its estimates are updated by a proximal variable sample-size stochastic gradient scheme, while the remaining blocks are kept invariant. Notably, each block employs a steplength in accordance with its block-specific Lipschitz constant, while the block-specific batch-sizes are random variables that grow at either a geometric or a polynomial rate with the (random) number of times that block is selected. We show that, for almost every sample path, every limit point is a stationary point, and we establish the ergodic non-asymptotic rate $\mathcal{O}(1/K)$. The iteration and oracle complexity of obtaining an $\epsilon$-stationary point are shown to be $\mathcal{O}(1/\epsilon)$ and $\mathcal{O}(1/\epsilon^2)$, respectively. Furthermore, under a $\mu$-proximal Polyak-{\L}ojasiewicz (PL) condition with batch sizes increasing at a geometric rate, we prove that the suboptimality diminishes at a {\em geometric} rate, the {\em optimal} deterministic rate, while the iteration and oracle complexity of obtaining an $\epsilon$-optimal solution are proven to be $\mathcal{O}((L_{\rm max}/\mu) \ln(1/\epsilon))$ and $\mathcal{O}\left((L_{\rm ave}/\mu)(1/\epsilon)^{1+c}\right)$ with $c \geq 0$, respectively. In pursuit of less aggressive sampling rates, when the batch sizes increase at a polynomial rate of degree $v \geq 1$, the suboptimality decays at a corresponding polynomial rate, while the iteration and oracle complexity of obtaining an $\epsilon$-optimal solution are provably $\mathcal{O}\left(v(1/\epsilon)^{1/v}\right)$ and $\mathcal{O}\left(e^v v^{2v+1}(1/\epsilon)^{1+1/v}\right)$, respectively.
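To make the update rule concrete, below is a minimal Python sketch of the scheme the abstract describes: uniform random block selection, a block-specific steplength $\gamma_i = 1/L_i$, a proximal stochastic gradient step on the chosen block, and batch sizes that grow with the number of times that block has been selected. The toy quadratic loss, $\ell_1$ regularizer, noise model, and the constants `rho` and `v` are illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative sketch (assumed toy problem, not the paper's setup):
# minimize f(x) + r(x), with f(x) = 0.5 * E||A_xi x||^2 for a random
# matrix A_xi, and r(x) = lam * ||x||_1 (block-separable).
import numpy as np

rng = np.random.default_rng(0)

d, n_blocks, lam = 12, 3, 0.05
blocks = np.array_split(np.arange(d), n_blocks)   # partition of coordinates
A_mean = rng.standard_normal((d, d)) / np.sqrt(d)

def sample_grad(x, batch_size, block):
    """Mini-batch estimate of the partial gradient of f over one block."""
    A = A_mean + 0.1 * rng.standard_normal((batch_size, d, d))  # noisy oracle
    g = np.einsum('bij,bik,k->j', A, A, x)  # sum over the batch of A^T A x
    return g[block] / batch_size

def prox_l1(z, t):
    """Proximal operator of t * lam * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)

# Crude block-specific Lipschitz estimates L_i and steplengths gamma_i = 1/L_i.
L = np.array([np.linalg.norm(A_mean[:, blk], 2) ** 2 + 1.0 for blk in blocks])
gamma = 1.0 / L

x = rng.standard_normal(d)
counts = np.zeros(n_blocks, dtype=int)   # times each block has been selected
rho, v = 1.2, 2                          # geometric ratio / polynomial degree

for k in range(200):
    i = rng.integers(n_blocks)           # choose one block uniformly at random
    counts[i] += 1
    # Batch size grows with the number of selections of block i:
    batch = int(np.ceil(counts[i] ** v))        # polynomial rule of degree v
    # batch = int(np.ceil(rho ** counts[i]))    # geometric alternative
    blk = blocks[i]
    g = sample_grad(x, batch, blk)
    # Proximal stochastic gradient step on block i; other blocks unchanged.
    x[blk] = prox_l1(x[blk] - gamma[i] * g, gamma[i])

print("final objective estimate:",
      0.5 * np.linalg.norm(A_mean @ x) ** 2 + lam * np.abs(x).sum())
```

The two commented batch-size rules mirror the abstract's two regimes: the geometric rule underlies the geometric suboptimality decay under the proximal PL condition, while the polynomial rule of degree $v$ trades a slower (polynomial) decay for a less aggressive sampling rate.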