
Fast decentralized non-convex finite-sum optimization with recursive variance reduction (2008.07428v6)

Published 17 Aug 2020 in math.OC, cs.LG, cs.MA, cs.SY, eess.SY, and stat.ML

Abstract: This paper considers decentralized minimization of $N := nm$ smooth non-convex cost functions equally divided over a directed network of $n$ nodes. Specifically, we describe a stochastic first-order gradient method, called GT-SARAH, that employs a SARAH-type variance reduction technique and gradient tracking (GT) to address the stochastic and decentralized nature of the problem. We show that GT-SARAH, with appropriate algorithmic parameters, finds an $\epsilon$-accurate first-order stationary point with $O\big(\max\big\{N^{\frac{1}{2}},\, n(1-\lambda)^{-2},\, n^{\frac{2}{3}}m^{\frac{1}{3}}(1-\lambda)^{-1}\big\}\,L\epsilon^{-2}\big)$ gradient complexity, where $(1-\lambda) \in (0,1]$ is the spectral gap of the network weight matrix and $L$ is the smoothness parameter of the cost functions. This gradient complexity outperforms that of the existing decentralized stochastic gradient methods. In particular, in a big-data regime such that $n = O(N^{\frac{1}{2}}(1-\lambda)^{3})$, this gradient complexity further reduces to $O(N^{\frac{1}{2}}L\epsilon^{-2})$, independent of the network topology, and matches that of the centralized near-optimal variance-reduced methods. Moreover, in this regime GT-SARAH achieves a non-asymptotic linear speedup, in that the total number of gradient computations at each node is reduced by a factor of $1/n$ compared to the centralized near-optimal algorithms that perform all gradient computations at a single node. To the best of our knowledge, GT-SARAH is the first algorithm that achieves this property. In addition, we show that appropriate choices of the local minibatch size balance the trade-off between the gradient and communication complexities of GT-SARAH. Over an infinite time horizon, we establish that all nodes in GT-SARAH asymptotically achieve consensus and converge to a first-order stationary point in the almost sure and mean-squared sense.
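
The abstract describes GT-SARAH only at a high level. The following is a minimal single-machine simulation of a GT-SARAH-style loop, a sketch rather than the authors' exact pseudocode: the function names, the periodic full-gradient restart schedule, and the use of a doubly stochastic mixing matrix `W` are illustrative assumptions (a directed network would instead use the directed weights analyzed in the paper).

```python
import numpy as np

def gt_sarah_sketch(grads, x0, W, alpha=0.01, q=None, T=1000, seed=0):
    """Simulate a GT-SARAH-style method on one machine (illustrative sketch).

    grads : list of n lists; grads[i][s] maps x -> gradient of the s-th
            local component f_{i,s} at x (m components per node).
    x0    : common initial point, shape (d,).
    W     : (n, n) mixing matrix (assumed doubly stochastic here).
    alpha : constant step size; q : inner-loop length; T : iterations.
    """
    rng = np.random.default_rng(seed)
    n, m = len(grads), len(grads[0])
    q = q or m
    x = np.tile(x0, (n, 1))                        # row i = node i's iterate
    full = lambda xi, i: np.mean([g(xi) for g in grads[i]], axis=0)
    v = np.array([full(x[i], i) for i in range(n)])  # local SARAH estimators
    y = v.copy()                                   # gradient trackers
    for t in range(T):
        x_new = W @ x - alpha * y                  # mix with neighbors, descend
        if (t + 1) % q == 0:                       # periodic full-gradient reset
            v_new = np.array([full(x_new[i], i) for i in range(n)])
        else:                                      # recursive SARAH update
            v_new = np.empty_like(v)
            for i in range(n):
                s = rng.integers(m)                # one sampled component
                v_new[i] = grads[i][s](x_new[i]) - grads[i][s](x[i]) + v[i]
        y = W @ y + v_new - v                      # gradient-tracking recursion
        x, v = x_new, v_new
    return x.mean(axis=0)
```

The tracker recursion `y = W @ y + v_new - v` preserves the invariant that the network average of the trackers equals the average of the local SARAH estimators, which is what lets each node's update mimic a centralized variance-reduced step despite only communicating with neighbors.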

Authors (3)
  1. Ran Xin (25 papers)
  2. Usman A. Khan (56 papers)
  3. Soummya Kar (147 papers)
Citations (40)
